Re: [tip:x86/debug] printk: Make the printk*once() variants return a value
On Sat, Jul 09, 2016 at 10:56:55AM -0700, Joe Perches wrote:
> defconfigs both with and without CONFIG_PRINTK build
> properly with the proposed change to this specific patch.

Did you try latest tip/master?

> Borislav, your delightful personality always impresses.
> Never change.

What goes around comes around.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
[char-misc 4.7] mei: me: disable driver on SPT SPS firmware
Sunrise Point PCH with SPS Firmware doesn't expose working MEI interface, we need to quirk it out. Cc: #4.4+ Signed-off-by: Tomas Winkler --- drivers/misc/mei/pci-me.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/misc/mei/pci-me.c b/drivers/misc/mei/pci-me.c index 64e64da6da44..e64464c5c160 100644 --- a/drivers/misc/mei/pci-me.c +++ b/drivers/misc/mei/pci-me.c @@ -85,7 +85,7 @@ static const struct pci_device_id mei_me_pci_tbl[] = { {MEI_PCI_DEVICE(MEI_DEV_ID_SPT, mei_me_pch8_cfg)}, {MEI_PCI_DEVICE(MEI_DEV_ID_SPT_2, mei_me_pch8_cfg)}, - {MEI_PCI_DEVICE(MEI_DEV_ID_SPT_H, mei_me_pch8_cfg)}, + {MEI_PCI_DEVICE(MEI_DEV_ID_SPT_H, mei_me_pch8_sps_cfg)}, {MEI_PCI_DEVICE(MEI_DEV_ID_SPT_H_2, mei_me_pch8_cfg)}, {MEI_PCI_DEVICE(MEI_DEV_ID_BXT_M, mei_me_pch8_cfg)}, -- 2.5.5
Re: Missing include file in include/uapi/linux/errqueue.h?
On Sat, Jul 9, 2016 at 10:36 AM, Brooks Moses wrote: > I've been attempting to qualify the Linux 4.5.2 user-space headers for > a toolchain release, and ran into what looks like a missing include > file in include/uapi/linux/errqueue.h. In particular, > https://github.com/torvalds/linux/commit/f24b9be5957b38bb420b838115040dc2031b7d0c > adds the following to this file: > > +struct scm_timestamping { > + struct timespec ts[3]; > +}; > > However, struct timespec is defined in time.h, which isn't included > either in 4.5.2 or in current head. Is this simply a missing #include > line, or am I misunderstanding something? As a followup: Unfortunately the obvious fix -- adding "#include " -- causes other problems, since linux/time.h is incompatible with the glibc time.h such that including both of them into the same compilation unit causes errors about redefined types. And we, at least, have some programs that want to include linux/errqueue.h and (glibc's) time.h. The fix of adding "#include " to linux/errqueue.h seems to work for us, but I'm not sure that won't cause problems in the other direction for other people. - Brooks
Re: [CRIU] Introspecting userns relationships to other namespaces?
On Fri, Jul 08, 2016 at 10:13:08PM -0500, Eric W. Biederman wrote:
> "W. Trevor King" writes:
>
> > On Thu, Jul 07, 2016 at 08:01:52AM -0700, James Bottomley wrote:
> >> In theory, we could get nsfs to show this information as an option
> >> (just add a show_options entry to the superblock ops), but the
> >> problem is that although each namespace has a parent user_ns,
> >> there's no way to get it without digging in the namespace specific
> >> structure. Probably we should restructure to move it into
> >> ns_common, then we could display it (and enforce all namespaces
> >> having owning user_ns) but it would be a reasonably large (but
> >> mechanical) change.
> >
> > It sounds like everyone is either positive or neutral on this
> > groundwork, even if we haven't decided if/how to expose the
> > information to userspace. I'm happy to work up a patch while the rest
> > of the discussion continues. I'm also happy to let someone else work
> > up the patch, if anyone else is chomping at the bit ;).
>
> I am dubious on moving all of the user namespace members into ns_common.
>
> I would be happy to be proved wrong but I suspect in the cases where we
> actually use that user namespace the code will become uglier. Making
> the ordinary uses uglier to make a rare corner case nicer is the wrong
> trade off.
>
> But feel free to try; it is certainly worth doing if it doesn't make the
> code that uses the user namespaces uglier.

If anyone is interested, I have this patch in my tree:

https://github.com/avagin/linux-task-diag/commit/63b32df68ae8d3a3842bae42bbcae3468db76d85

I can't say that it makes anything uglier.

> Eric
Re: [PATCH v2 03/17] libnvdimm: introduce devm_nvdimm_memremap(), convert nfit_spa_map() users
Hi, [auto build test ERROR on linux-nvdimm/libnvdimm-for-next] [also build test ERROR on v4.7-rc6 next-20160708] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Dan-Williams/replace-pcommit-with-ADR-or-directed-flushing/20160710-113558 base: https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git libnvdimm-for-next config: um-allmodconfig (attached as .config) compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430 reproduce: # save the attached .config to linux build tree make ARCH=um All error/warnings (new ones prefixed by >>): drivers/nvdimm/core.c: In function 'alloc_nvdimm_map': >> drivers/nvdimm/core.c:108:23: error: implicit declaration of function >> 'ioremap' [-Werror=implicit-function-declaration] nvdimm_map->iomem = ioremap(offset, size); ^~~ >> drivers/nvdimm/core.c:108:21: warning: assignment makes pointer from integer >> without a cast [-Wint-conversion] nvdimm_map->iomem = ioremap(offset, size); ^ drivers/nvdimm/core.c: In function 'nvdimm_map_release': >> drivers/nvdimm/core.c:139:3: error: implicit declaration of function >> 'iounmap' [-Werror=implicit-function-declaration] iounmap(nvdimm_map->iomem); ^~~ cc1: some warnings being treated as errors vim +/ioremap +108 drivers/nvdimm/core.c 102 if (!request_mem_region(offset, size, dev_name(&nvdimm_bus->dev))) 103 goto err_request_region; 104 105 if (flags) 106 nvdimm_map->mem = memremap(offset, size, flags); 107 else > 108 nvdimm_map->iomem = ioremap(offset, size); 109 110 if (!nvdimm_map->mem) 111 goto err_map; 112 113 dev_WARN_ONCE(dev, !is_nvdimm_bus_locked(dev), "%s: bus unlocked!", 114 __func__); 115 list_add(&nvdimm_map->list, &nvdimm_bus->mapping_list); 116 117 return nvdimm_map; 118 119 err_map: 120 release_mem_region(offset, size); 121 err_request_region: 122 kfree(nvdimm_map); 123 return NULL; 124 } 125 126 static void nvdimm_map_release(struct kref *kref) 127 { 128 struct nvdimm_bus 
*nvdimm_bus; 129 struct nvdimm_map *nvdimm_map; 130 131 nvdimm_map = container_of(kref, struct nvdimm_map, kref); 132 nvdimm_bus = nvdimm_map->nvdimm_bus; 133 134 dev_dbg(&nvdimm_bus->dev, "%s: %pa\n", __func__, &nvdimm_map->offset); 135 list_del(&nvdimm_map->list); 136 if (nvdimm_map->flags) 137 memunmap(nvdimm_map->mem); 138 else > 139 iounmap(nvdimm_map->iomem); 140 release_mem_region(nvdimm_map->offset, nvdimm_map->size); 141 kfree(nvdimm_map); 142 } --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: Binary data
Re: [RFC PATCH 0/3] doc-rst: customize HTML (RTD) theme
On Tue, 5 Jul 2016 14:55:09 -0300 Mauro Carvalho Chehab wrote: > I hope you don't mind. I'm merging those three patches on my tree > (for now, they're on an experimental tree that I can easily rebase, if > needed). If OK for you, my plan is to merge it on a separate branch, > together with the other patches for Documentation/linux_tv. [Slowly trying to catch back up with the real world; service will continue to be intermittent for a bit yet.] So as far as I can tell, I never got part 1/3, not sure what happened there. In general, my only concern is that we haven't really begun the process of debating the proper bikeshed^Wtheme for the kernel docs. Which is just fine. At some point, we may want to think about it a bit more, but, for now, there is certainly no harm in making what we have work better. Please feel free to include these with your stuff with my acked-by. Thanks, jon
Re: Odd performance results
On 10 July 2016 06:26:39 CEST, "Paul E. McKenney" wrote: >Hello! > >So I ran a quick benchmark which showed stair-step results. I >immediately >thought "Ah, this is due to CPU 0 and 1, 2 and 3, 4 and 5, and 6 and 7 >being threads in a core." Then I thought "Wait, this is an x86!" >Then I dumped out cpu*/topology/thread_siblings_list, getting the >following: > > cpu0/topology/thread_siblings_list: 0-1 > cpu1/topology/thread_siblings_list: 0-1 > cpu2/topology/thread_siblings_list: 2-3 > cpu3/topology/thread_siblings_list: 2-3 > cpu4/topology/thread_siblings_list: 4-5 > cpu5/topology/thread_siblings_list: 4-5 > cpu6/topology/thread_siblings_list: 6-7 > cpu7/topology/thread_siblings_list: 6-7 I'm guessing this is an AMD bulldozer like machine? -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [PATCH v2 08/17] libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()
On Sat, Jul 9, 2016 at 9:47 PM, kbuild test robot wrote: > Hi, > > [auto build test ERROR on linux-nvdimm/libnvdimm-for-next] > [also build test ERROR on next-20160708] > [cannot apply to v4.7-rc6] > [if your patch is applied to the wrong git tree, please drop us a note to > help improve the system] > > url: > https://github.com/0day-ci/linux/commits/Dan-Williams/replace-pcommit-with-ADR-or-directed-flushing/20160710-113558 > base: https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git > libnvdimm-for-next > config: i386-randconfig-r0-201628 (attached as .config) > compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430 > reproduce: > # save the attached .config to linux build tree > make ARCH=i386 Hi kbuild team, Can we add an "i386 allmodconfig" build to the standard "BUILD SUCCESS" notification runs? I had two positive build results on a private branch prior to posting this series, but the i386 runs did not build the nvdimm sub-system. In any event this report is valid, so thank you for that! > > All errors (new ones prefixed by >>): > >drivers/nvdimm/region_devs.c: In function 'nvdimm_flush': >>> drivers/nvdimm/region_devs.c:887:4: error: implicit declaration of function >>> 'writeq' [-Werror=implicit-function-declaration] >writeq(1, ndrd->flush_wpq[i][0]); >^~ >cc1: some warnings being treated as errors > > vim +/writeq +887 drivers/nvdimm/region_devs.c > >881 * writes to avoid the cache via arch_memcpy_to_pmem(). The >882 * final wmb() ensures ordering for the NVDIMM flush write. >883 */ >884 wmb(); >885 for (i = 0; i < nd_region->ndr_mappings; i++) >886 if (ndrd->flush_wpq[i][0]) > > 887 writeq(1, ndrd->flush_wpq[i][0]); >888 wmb(); >889 } >890 EXPORT_SYMBOL_GPL(nvdimm_flush); > > --- > 0-DAY kernel test infrastructureOpen Source Technology Center > https://lists.01.org/pipermail/kbuild-all Intel Corporation
Re: [PATCH v3 0/7] lib: string: add functions to case-convert strings
On 7/8/2016 6:43 PM, Markus Mayer wrote:
> This series introduces a family of generic string case conversion
> functions. This kind of functionality is needed in several places in
> the kernel. Right now, everybody seems to be implementing their own
> copy of this functionality.
>
> Based on the discussion of the previous version of this series[1] and
> the use cases found in the kernel, it does look like having several
> flavours of case conversion functions is beneficial. The use cases fall
> into three categories:
>   - copying a string and converting the case while specifying a
>     maximum length to mimic strlcpy()
>   - copying a string and converting the case without specifying a
>     length to mimic strcpy()
>   - converting the case of a string in-place (i.e. modifying the
>     string that was passed in)
>
> Consequently, I am proposing these new functions:
>   void strlcpytoupper(char *dst, const char *src, size_t len);
>   void strlcpytolower(char *dst, const char *src, size_t len);
>   void strcpytoupper(char *dst, const char *src);
>   void strcpytolower(char *dst, const char *src);
>   void strtoupper(char *s);
>   void strtolower(char *s);

You may want to read the article here:

https://lwn.net/Articles/659214/

and follow up some of the discussion threads on LKML about the best
semantics to advertise for the strlcpy/strscpy variants. It might be
helpful to return some kind of overflow/truncation error from your
copy functions so people can error-check the result.

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com
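To make the truncation-reporting suggestion concrete, here is a minimal userspace sketch of a bounded copy that upper-cases as it goes. The name strscpy_toupper_sketch() and its return convention (characters copied, or -1 on truncation) are assumptions made for illustration, loosely modelled on how strscpy() signals truncation with a negative value; this is not the API proposed in the series.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy src into dst (capacity len bytes, including
 * the terminating NUL), upper-casing each character on the way.
 * Returns the number of characters copied, or -1 if src did not fit.
 * dst is NUL-terminated whenever len > 0.
 */
static long strscpy_toupper_sketch(char *dst, const char *src, size_t len)
{
	size_t i;

	if (len == 0)
		return -1;

	for (i = 0; i < len - 1 && src[i] != '\0'; i++)
		dst[i] = (char)toupper((unsigned char)src[i]);
	dst[i] = '\0';

	/* If we stopped before the end of src, the copy was truncated. */
	return src[i] == '\0' ? (long)i : -1;
}
```

A caller can then treat a negative result as an error instead of silently working with a shortened string, which is the kind of check a void return type rules out.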
Re: [PATCH v2 08/17] libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()
Hi,

[auto build test ERROR on linux-nvdimm/libnvdimm-for-next]
[also build test ERROR on next-20160708]
[cannot apply to v4.7-rc6]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Dan-Williams/replace-pcommit-with-ADR-or-directed-flushing/20160710-113558
base:   https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git libnvdimm-for-next
config: i386-randconfig-r0-201628 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

All errors (new ones prefixed by >>):

   drivers/nvdimm/region_devs.c: In function 'nvdimm_flush':
>> drivers/nvdimm/region_devs.c:887:4: error: implicit declaration of function 'writeq' [-Werror=implicit-function-declaration]
       writeq(1, ndrd->flush_wpq[i][0]);
       ^~
   cc1: some warnings being treated as errors

vim +/writeq +887 drivers/nvdimm/region_devs.c

   881		 * writes to avoid the cache via arch_memcpy_to_pmem(). The
   882		 * final wmb() ensures ordering for the NVDIMM flush write.
   883		 */
   884		wmb();
   885		for (i = 0; i < nd_region->ndr_mappings; i++)
   886			if (ndrd->flush_wpq[i][0])
 > 887				writeq(1, ndrd->flush_wpq[i][0]);
   888		wmb();
   889	}
   890	EXPORT_SYMBOL_GPL(nvdimm_flush);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
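For context on the error above: this i386 configuration has no writeq() declaration in scope. One common way to handle that is to split the 64-bit store into two 32-bit stores; the kernel ships helper headers for this (linux/io-64-nonatomic-lo-hi.h and linux/io-64-nonatomic-hi-lo.h). Whether the series ends up taking that route is not stated in this thread, so the following is only a userspace model of the idea, and write64_lo_hi_sketch() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Hypothetical userspace model of a non-atomic 64-bit store: write the
 * low 32 bits, then the high 32 bits, through a volatile pointer.
 * Real MMIO code would use the kernel's I/O accessors instead.
 */
static void write64_lo_hi_sketch(uint64_t val, volatile uint32_t *addr)
{
	addr[0] = (uint32_t)val;		/* low half first */
	addr[1] = (uint32_t)(val >> 32);	/* then the high half */
}
```

The two halves are not written atomically, so this is only appropriate when the device tolerates seeing the halves arrive separately.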
Odd performance results
Hello! So I ran a quick benchmark which showed stair-step results. I immediately thought "Ah, this is due to CPU 0 and 1, 2 and 3, 4 and 5, and 6 and 7 being threads in a core." Then I thought "Wait, this is an x86!" Then I dumped out cpu*/topology/thread_siblings_list, getting the following: cpu0/topology/thread_siblings_list: 0-1 cpu1/topology/thread_siblings_list: 0-1 cpu2/topology/thread_siblings_list: 2-3 cpu3/topology/thread_siblings_list: 2-3 cpu4/topology/thread_siblings_list: 4-5 cpu5/topology/thread_siblings_list: 4-5 cpu6/topology/thread_siblings_list: 6-7 cpu7/topology/thread_siblings_list: 6-7 Is this now expected behavior or a fluke of my particular laptop? Here is hoping for expected behavior, as it makes NUMA locality the default for a great many workloads. Enlightenment? Thanx, Paul
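The sibling lists quoted above use the sysfs CPU-list notation (comma-separated numbers and ranges such as "0-1"). As a small sketch of how a benchmark harness might turn such a string into a thread count per core, the helper below counts the CPUs named in one list. The function name count_cpus_in_list() is hypothetical, and reading the file from sysfs is left to the caller.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: count the CPUs named in a sysfs-style CPU list
 * such as "0-1", "2" or "0,4-5". A trailing newline is accepted.
 * Returns -1 on malformed input.
 */
static int count_cpus_in_list(const char *list)
{
	int count = 0;

	while (*list != '\0' && *list != '\n') {
		char *end;
		long first = strtol(list, &end, 10);
		long last = first;

		if (end == list)
			return -1;	/* no digits where a number was expected */
		if (*end == '-') {
			const char *p = end + 1;

			last = strtol(p, &end, 10);
			if (end == p || last < first)
				return -1;
		}
		count += (int)(last - first + 1);
		if (*end == ',')
			end++;		/* move on to the next element */
		else if (*end != '\0' && *end != '\n')
			return -1;
		list = end;
	}
	return count;
}
```

For each of the eight entries in the dump above this returns 2, i.e. every CPU reports exactly one sibling, which matches the stair-step pairs seen in the benchmark.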
Re: [PATCH v3 0/7] lib: string: add functions to case-convert strings
On 9 July 2016 at 20:13, Chris Metcalf wrote: > On 7/8/2016 6:43 PM, Markus Mayer wrote: >> >> This series introduces a family of generic string case conversion >> functions. This kind of functionality is needed in several places in >> the kernel. Right now, everybody seems to be implementing their own >> copy of this functionality. >> >> Based on the discussion of the previous version of this series[1] and >> the use cases found in the kernel, it does look like having several >> flavours of case conversion functions is beneficial. The use cases fall >> into three categories: >> - copying a string and converting the case while specifying a >>maximum length to mimic strlcpy() >> - copying a string and converting the case without specifying a >>length to mimic strcpy() >> - converting the case of a string in-place (i.e. modifying the >>string that was passed in) >> >> Consequently, I am proposing these new functions: >> void strlcpytoupper(char *dst, const char *src, size_t len); >> void strlcpytolower(char *dst, const char *src, size_t len); >> void strcpytoupper(char *dst, const char *src); >> void strcpytolower(char *dst, const char *src); >> void strtoupper(char *s); >> void strtolower(char *s); > > > You may want to read the article here: > > https://lwn.net/Articles/659214/ I'll read that. Thanks. > and follow up some of the discussion threads on LKML about the best > semantics to advertise for the strlcpy/strscpy variants. It might be > helpful to return some kind of overflow/truncation error from your > copy functions so people can error-check the result. I am inclined to agree. However, everybody has been telling me that these functions should be void. Originally they weren't. Regards, -Markus
Re: [PATCH 12/13] nvme: switch to use pci_alloc_irq_vectors
On Thu, Jul 07, 2016 at 09:30:19PM +0200, Alexander Gordeev wrote:
> On Mon, Jul 04, 2016 at 05:39:33PM +0900, Christoph Hellwig wrote:
> > @@ -1575,6 +1546,7 @@ static int nvme_dev_add(struct nvme_dev *dev)
> >  		dev->tagset.cmd_size = nvme_cmd_size(dev);
> >  		dev->tagset.flags = BLK_MQ_F_SHOULD_MERGE;
> >  		dev->tagset.driver_data = dev;
> > +		dev->tagset.affinity_mask = to_pci_dev(dev->dev)->irq_affinity;
> >
> >  		if (blk_mq_alloc_tag_set(&dev->tagset))
> >  			return 0;
>
> Are there any post-init uses of blk_mq_tag_set::affinity_mask other than
> calling to blk_mq_alloc_tag_set()? If no, blk_mq_tag_set::affinity_mask
> is redundant, since the mask could be passed as a parameter.

We'll have to look at it in the block code when rebuilding the queue
topology. This isn't currently done, but we'll need it rather soon.
Re: [PATCH 08/13] pci: spread interrupt vectors in pci_alloc_irq_vectors
On Thu, Jul 07, 2016 at 01:05:01PM +0200, Alexander Gordeev wrote:
> irq_create_affinity_mask() bails out with no affinity in case of single
> vector, but alloc_descs() (see below (*)) assigns the whole affinity
> mask. It should be consistent instead.

I don't understand the comment. If we only have one vector (of any
kind) there is no need to create an affinity mask; we'll leave the
interrupt to the existing irq balancing code.

> Actually, I just realized pci_alloc_irq_vectors() should probably call
> irq_create_affinity_mask() and handle it in a consistent way for all four
> cases: MSI-X, multi-MSI, MSI and legacy.

That's what the earlier versions did, but you correctly pointed out
that we should call irq_create_affinity_mask only after we have reduced
the number of vectors to the number that the bridges can route, i.e.
that we have to move it into the pci_enable_msi(x)_range main loop.

> Optionally, the three latter could be dropped for now so you could proceed
> with NVMe.

NVMe cares for all these cases at least in theory.

> (*) In the future IRQ vs CPU mapping 1:N is possible/desirable so I suppose
> this piece of code worth a comment or better - a separate function. In fact,
> this algorithm already exists in alloc_descs(), which makes even more sense
> to factor it out:
>
> 	for (i = 0; i < cnt; i++) {
> 		if (affinity) {
> 			cpu = cpumask_next(cpu, affinity);
> 			if (cpu >= nr_cpu_ids)
> 				cpu = cpumask_first(affinity);
> 			node = cpu_to_node(cpu);
>
> 			/*
> 			 * For single allocations we use the caller provided
> 			 * mask otherwise we use the mask of the target cpu
> 			 */
> 			mask = cnt == 1 ? affinity : cpumask_of(cpu);
> 		}
>
> 		[...]

While these two pieces of code look very similar, there is an important
difference in why and how the mask is calculated. In alloc_descs(),
cnt == 1 is the MSI-X case, where the passed-in affinity is the one for
the MSI-X descriptor, which covers a single vector. In the MSI case,
where we have multiple vectors per descriptor, a different affinity is
assigned to each vector based on a single passed-in mask.
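The wrap-around walk in the quoted loop can be modelled in userspace with a plain 64-bit bitmask standing in for struct cpumask. This is a sketch of the CPU selection only (not of the mask choice discussed in the reply); mask_next(), spread_vectors() and NR_CPUS_SKETCH are hypothetical names, and starting the walk at cpu = -1 is an assumption since the initial value is not shown in the quote.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 64

/*
 * Hypothetical stand-in for cpumask_next(): first set bit after 'cpu',
 * or NR_CPUS_SKETCH if there is none. Pass -1 to start from bit 0.
 */
static int mask_next(int cpu, uint64_t mask)
{
	for (cpu++; cpu < NR_CPUS_SKETCH; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return NR_CPUS_SKETCH;
}

/*
 * Pick a CPU for each of 'cnt' vectors, walking the set bits of
 * 'affinity' and wrapping around once they are exhausted.
 * 'affinity' must be non-zero.
 */
static void spread_vectors(uint64_t affinity, int cnt, int *cpu_of_vec)
{
	int cpu = -1;
	int i;

	for (i = 0; i < cnt; i++) {
		cpu = mask_next(cpu, affinity);
		if (cpu >= NR_CPUS_SKETCH)
			cpu = mask_next(-1, affinity);	/* like cpumask_first() */
		cpu_of_vec[i] = cpu;
	}
}
```

With CPUs 1 and 3 set in the mask and five vectors, the vectors land on CPUs 1, 3, 1, 3, 1.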
Re: [PATCH 07/13] pci: Provide sensible irq vector alloc/free routines
On Wed, Jul 06, 2016 at 10:05:45AM +0200, Alexander Gordeev wrote:
> > +  pci_enable_msi, pci_enable_msi_range, pci_enable_msi_exact, pci_disable_msi,
> > +  pci_msi_vec_count, pci_enable_msix_range, pci_enable_msix_exact,
> > +  pci_disable_msix, pci_msix_vec_count
>
> Description of these functions can be removed when all drivers migrated
> to the new API. Also implementation descriptions + examples would still
> be needed AFAICT.

I disagree - if we deprecate functions, the only thing that should be
mentioned is a "don't use these".

> This function's code almost matches the existing pci_enable_msix_range()
> so pci_enable_msix_range() should be reworked instead IMHO.

That's what earlier versions of the code did. However, because we want
to avoid over-allocating the msix_vectors array (minor) and to get the
vector count of the affinity mask right (major, as pointed out by you
last time), I had to move the allocations inside the helpers that loop
around the actual enablement. I didn't want to change the functions to
a different version of the algorithm just before removing them
relatively soon. But given the strong preference for changing these
simple functions instead of duplicating them, I've changed the patch to
do that now.

> We do not need to keep msix_entry array, since it only needed for
> pci_irq_vector() function. But the same info could be retrieved from
> msi_desc::irq.

Indeed. Avoiding this allocation makes these interfaces quite a bit
simpler. It requires a few prep patches, but I think it's definitely
worth it, so the next version will avoid the need for the msix_entry
array.

> > +	/* use legacy irq if allowed */
> > +	if (min_vecs == 1)
> > +		return 1;
> > +	return -ENOSPC;
>
> The original error code (in vecs) would be overridden with -ENOSPC here.

Ok, fixed.

> > +	WARN_ON_ONCE(!dev->msi_enabled && nr > 0);
> > +	return dev->irq + nr;
>
> I think this function should check irq number existence and return the
> vector number or -EINVAL;

Ok, fixed.

> > +		unsigned int flags)
> > +{
> > +	if (min_vecs > 1)
> > +		return -ENOSPC;
>
> In case CONFIG_PCI_MSI is unset min_vecs > 1 is -EINVAL;

Ok, fixed.
Re: [PATCH] intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate
On Sat, 2016-07-09 at 02:45 +0200, Rafael J. Wysocki wrote: > On Friday, July 08, 2016 12:39:07 PM Srinivas Pandruvada wrote: > > On Fri, 2016-07-08 at 20:42 +0200, Jan Kiszka wrote: > > > If MSR_CONFIG_TDP_CONTROL is locked, we currently try to address > > > some > > > MSR 0x8648 or so. Mask out the relevant level bits 0 and 1. > > > > > > Found while running over the Jailhouse hypervisor which became > > > upset > > > about this strange MSR index. > > > > > > Signed-off-by: Jan Kiszka > > Acked-by: Srinivas Pandruvada > > OK > > Should this go into stable? Better to mark for stable tree 4.4+ Thanks, Srinivas
Re: [PATCH 11/13] blk-mq: allow the driver to pass in an affinity mask
On Mon, Jul 04, 2016 at 11:35:28AM +0200, Alexander Gordeev wrote:
> > mq_map is initialized to zero already, so we don't really need the
> > assignment for queue 0. The reason why this check exists is because
> > we start with queue = -1 and we never want to assign -1 to mq_map.
>
> Would this read better then?
>
> 	int queue = 0;
>
> 	...
>
> 	/* If cpus are offline, map them to first hctx */
> 	for_each_online_cpu(cpu) {
> 		set->mq_map[cpu] = queue;
> 		if (cpumask_test_cpu(cpu, affinity_mask))
> 			queue++;

It would read better, but I don't think it's actually correct.
We'd still assign the 'old' queue to the cpu that is set in the affinity
mask.
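The difference between the two loops is easier to see on a small example. The sketch below models both with a plain bitmask and treats CPUs 0..ncpus-1 as online. The first function is reconstructed from the description in the quote (start at -1, bump first, never store -1), so it is an assumption rather than the patch text; the second follows the loop proposed in the reply. Both function names are hypothetical.

```c
#include <stdint.h>

/*
 * Variant described in the quoted text: bump the queue first, store
 * afterwards, and never store -1. map[] must be zero-initialised.
 */
static void map_bump_then_store(uint32_t affinity, int ncpus, int *map)
{
	int queue = -1;
	int cpu;

	for (cpu = 0; cpu < ncpus; cpu++) {
		if (affinity & (UINT32_C(1) << cpu))
			queue++;
		if (queue >= 0)
			map[cpu] = queue;
	}
}

/* Variant proposed in the reply: store first, bump afterwards. */
static void map_store_then_bump(uint32_t affinity, int ncpus, int *map)
{
	int queue = 0;
	int cpu;

	for (cpu = 0; cpu < ncpus; cpu++) {
		map[cpu] = queue;
		if (affinity & (UINT32_C(1) << cpu))
			queue++;
	}
}
```

With four CPUs and CPUs 0 and 2 set in the mask, the first variant yields the map {0, 0, 1, 1}, while the second yields {0, 1, 1, 2}: each CPU in the mask receives the queue value from before the increment, and CPU 3 ends up with queue index 2 even though only two CPUs are set in the mask.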
[PATCH v2 02/17] nfit: don't override return value of nfit_mem_init
We were needlessly converting nfit_mem_init() errors to -ENOMEM. Signed-off-by: Dan Williams --- drivers/acpi/nfit.c |5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c index d79837b9d07e..f8c1a850effc 100644 --- a/drivers/acpi/nfit.c +++ b/drivers/acpi/nfit.c @@ -2422,10 +2422,9 @@ int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz) if (rc) goto out_unlock; - if (nfit_mem_init(acpi_desc) != 0) { - rc = -ENOMEM; + rc = nfit_mem_init(acpi_desc); + if (rc) goto out_unlock; - } acpi_nfit_init_dsms(acpi_desc);
[PATCH v2 01/17] nfit: always associate flush hints
Before enabling use of flush hints for pmem regions, we need to make sure they are always associated. Move the initialization of nfit_flush out of the block-window specific init path to the general init path. Cc: Ross Zwisler Signed-off-by: Dan Williams --- drivers/acpi/nfit.c | 17 - 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c index 3e54157f02cc..d79837b9d07e 100644 --- a/drivers/acpi/nfit.c +++ b/drivers/acpi/nfit.c @@ -614,7 +614,6 @@ static void nfit_mem_init_bdw(struct acpi_nfit_desc *acpi_desc, { u16 dcr = __to_nfit_memdev(nfit_mem)->region_index; struct nfit_memdev *nfit_memdev; - struct nfit_flush *nfit_flush; struct nfit_bdw *nfit_bdw; struct nfit_idt *nfit_idt; u16 idt_idx, range_index; @@ -647,14 +646,6 @@ static void nfit_mem_init_bdw(struct acpi_nfit_desc *acpi_desc, nfit_mem->idt_bdw = nfit_idt->idt; break; } - - list_for_each_entry(nfit_flush, &acpi_desc->flushes, list) { - if (nfit_flush->flush->device_handle != - nfit_memdev->memdev->device_handle) - continue; - nfit_mem->nfit_flush = nfit_flush; - break; - } break; } } @@ -675,6 +666,7 @@ static int nfit_mem_dcr_init(struct acpi_nfit_desc *acpi_desc, } list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) { + struct nfit_flush *nfit_flush; struct nfit_dcr *nfit_dcr; u32 device_handle; u16 dcr; @@ -721,6 +713,13 @@ static int nfit_mem_dcr_init(struct acpi_nfit_desc *acpi_desc, break; } + list_for_each_entry(nfit_flush, &acpi_desc->flushes, list) { + if (nfit_flush->flush->device_handle != device_handle) + continue; + nfit_mem->nfit_flush = nfit_flush; + break; + } + if (dcr && !nfit_mem->dcr) { dev_err(acpi_desc->dev, "SPA %d missing DCR %d\n", spa->range_index, dcr);
[PATCH v2 03/17] libnvdimm: introduce devm_nvdimm_memremap(), convert nfit_spa_map() users
In preparation for generically mapping flush hint addresses for both the BLK and PMEM use case, provide a generic / reference counted mapping api. Given the fact that a dimm may belong to multiple regions (PMEM and BLK), the flush hint addresses need to be held valid as long as any region associated with the dimm is active. This is similar to the existing BLK-region case where multiple BLK-regions may share an aperture mapping. Up-level this shared / reference-counted mapping capability from the nfit driver to a core nvdimm capability. This eliminates the need for the nd_blk_region.disable() callback. Note that the removal of nfit_spa_map() and related infrastructure is deferred to a later patch. Signed-off-by: Dan Williams --- drivers/acpi/nfit.c | 14 +++-- drivers/nvdimm/core.c | 122 + drivers/nvdimm/nd-core.h |1 include/linux/libnvdimm.h |9 +++ 4 files changed, 139 insertions(+), 7 deletions(-) diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c index f8c1a850effc..b047dbe13bed 100644 --- a/drivers/acpi/nfit.c +++ b/drivers/acpi/nfit.c @@ -1616,7 +1616,8 @@ static void __iomem *__nfit_spa_map(struct acpi_nfit_desc *acpi_desc, * when all region devices referencing the same mapping are disabled / * unbound. 
*/ -static void __iomem *nfit_spa_map(struct acpi_nfit_desc *acpi_desc, +static __maybe_unused void __iomem *nfit_spa_map( + struct acpi_nfit_desc *acpi_desc, struct acpi_nfit_system_address *spa, enum spa_map_type type) { void __iomem *iomem; @@ -1669,7 +1670,6 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus, struct device *dev) { struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus); - struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc); struct nd_blk_region *ndbr = to_nd_blk_region(dev); struct nfit_flush *nfit_flush; struct nfit_blk_mmio *mmio; @@ -1697,8 +1697,8 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus, /* map block aperture memory */ nfit_blk->bdw_offset = nfit_mem->bdw->offset; mmio = &nfit_blk->mmio[BDW]; - mmio->addr.base = nfit_spa_map(acpi_desc, nfit_mem->spa_bdw, - SPA_MAP_APERTURE); + mmio->addr.base = devm_nvdimm_memremap(dev, nfit_mem->spa_bdw->address, +nfit_mem->spa_bdw->length, ARCH_MEMREMAP_PMEM); if (!mmio->addr.base) { dev_dbg(dev, "%s: %s failed to map bdw\n", __func__, nvdimm_name(nvdimm)); @@ -1720,8 +1720,8 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus, nfit_blk->cmd_offset = nfit_mem->dcr->command_offset; nfit_blk->stat_offset = nfit_mem->dcr->status_offset; mmio = &nfit_blk->mmio[DCR]; - mmio->addr.base = nfit_spa_map(acpi_desc, nfit_mem->spa_dcr, - SPA_MAP_CONTROL); + mmio->addr.base = devm_nvdimm_ioremap(dev, nfit_mem->spa_dcr->address, + nfit_mem->spa_dcr->length); if (!mmio->addr.base) { dev_dbg(dev, "%s: %s failed to map dcr\n", __func__, nvdimm_name(nvdimm)); @@ -1748,7 +1748,7 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus, nfit_flush = nfit_mem->nfit_flush; if (nfit_flush && nfit_flush->flush->hint_count != 0) { - nfit_blk->nvdimm_flush = devm_ioremap_nocache(dev, + nfit_blk->nvdimm_flush = devm_nvdimm_ioremap(dev, nfit_flush->flush->hint_address[0], 8); if (!nfit_blk->nvdimm_flush) return -ENOMEM; diff --git 
a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c index 32e4fe2f6274..f9686297ff79 100644 --- a/drivers/nvdimm/core.c +++ b/drivers/nvdimm/core.c @@ -57,6 +57,127 @@ bool is_nvdimm_bus_locked(struct device *dev) } EXPORT_SYMBOL(is_nvdimm_bus_locked); +struct nvdimm_map { + struct nvdimm_bus *nvdimm_bus; + struct list_head list; + resource_size_t offset; + unsigned long flags; + size_t size; + union { + void *mem; + void __iomem *iomem; + }; + struct kref kref; +}; + +static struct nvdimm_map *find_nvdimm_map(struct device *dev, + resource_size_t offset) +{ + struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev); + struct nvdimm_map *nvdimm_map; + + list_for_each_entry(nvdimm_map, &nvdimm_bus->mapping_list, list) + if (nvdimm_map->offset == offset) + return nvdimm_map; + return NULL; +} + +static struct nvdimm_map *alloc_nvdimm_map(struct device *dev, + resource_size_t offset, size_t size, unsigned long flags) +{ + struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev); + struct nvdimm_map *nvdimm_ma
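The shared, reference-counted mapping described in the changelog boils down to a find-or-create lookup keyed by offset plus a release on the last put. The following userspace sketch shows that pattern only; struct map_sketch, map_get() and map_put() are hypothetical names, the real code uses struct kref and the bus mapping list, and locking is omitted here (the patch expects the nvdimm bus lock to be held).

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a shared mapping record. */
struct map_sketch {
	struct map_sketch *next;
	unsigned long offset;	/* key: start of the mapped range */
	int refs;		/* number of active users */
};

static struct map_sketch *map_list;	/* all live mappings */

/* Return the mapping for 'offset', creating it on first use. */
static struct map_sketch *map_get(unsigned long offset)
{
	struct map_sketch *m;

	for (m = map_list; m; m = m->next) {
		if (m->offset == offset) {
			m->refs++;	/* share the existing mapping */
			return m;
		}
	}

	m = calloc(1, sizeof(*m));
	if (!m)
		return NULL;
	m->offset = offset;
	m->refs = 1;
	m->next = map_list;
	map_list = m;
	return m;
}

/* Drop one reference; unlink and free the record when the last user goes. */
static void map_put(struct map_sketch *m)
{
	struct map_sketch **pp;

	if (--m->refs > 0)
		return;
	for (pp = &map_list; *pp; pp = &(*pp)->next) {
		if (*pp == m) {
			*pp = m->next;
			break;
		}
	}
	free(m);
}
```

Two regions that ask for the same offset get the same record back, and the mapping stays valid until both have dropped it, which is the property the flush hint addresses need.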
[PATCH v2 06/17] tools/testing/nvdimm: simulate multiple flush hints per-dimm
Sample nfit data to test the kernel's handling of the multiple flush-hint case. Signed-off-by: Dan Williams --- tools/testing/nvdimm/test/nfit.c | 55 +++--- 1 file changed, 33 insertions(+), 22 deletions(-) diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c index 4fdd139f6e6c..ff09a28890ed 100644 --- a/tools/testing/nvdimm/test/nfit.c +++ b/tools/testing/nvdimm/test/nfit.c @@ -98,6 +98,7 @@ enum { NUM_PM = 3, NUM_DCR = 5, + NUM_HINTS = 8, NUM_BDW = NUM_DCR, NUM_SPA = NUM_PM + NUM_DCR + NUM_BDW, NUM_MEM = NUM_DCR + NUM_BDW + 2 /* spa0 iset */ + 4 /* spa1 iset */, @@ -569,7 +570,8 @@ static int nfit_test0_alloc(struct nfit_test *t) + offsetof(struct acpi_nfit_control_region, window_size) * NUM_DCR + sizeof(struct acpi_nfit_data_region) * NUM_BDW - + sizeof(struct acpi_nfit_flush_address) * NUM_DCR; + + (sizeof(struct acpi_nfit_flush_address) + + sizeof(u64) * NUM_HINTS) * NUM_DCR; int i; t->nfit_buf = test_alloc(t, nfit_size, &t->nfit_dma); @@ -599,7 +601,8 @@ static int nfit_test0_alloc(struct nfit_test *t) return -ENOMEM; sprintf(t->label[i], "label%d", i); - t->flush[i] = test_alloc(t, 8, &t->flush_dma[i]); + t->flush[i] = test_alloc(t, sizeof(u64) * NUM_HINTS, + &t->flush_dma[i]); if (!t->flush[i]) return -ENOMEM; } @@ -633,6 +636,8 @@ static int nfit_test1_alloc(struct nfit_test *t) static void nfit_test0_setup(struct nfit_test *t) { + const int flush_hint_size = sizeof(struct acpi_nfit_flush_address) + + (sizeof(u64) * NUM_HINTS); struct acpi_nfit_desc *acpi_desc; struct acpi_nfit_memory_map *memdev; void *nfit_buf = t->nfit_buf; @@ -640,7 +645,7 @@ static void nfit_test0_setup(struct nfit_test *t) struct acpi_nfit_control_region *dcr; struct acpi_nfit_data_region *bdw; struct acpi_nfit_flush_address *flush; - unsigned int offset; + unsigned int offset, i; /* * spa0 (interleave first half of dimm0 and dimm1, note storage @@ -1126,37 +1131,41 @@ static void nfit_test0_setup(struct nfit_test *t) /* flush0 (dimm0) */ flush = nfit_buf 
+ offset; flush->header.type = ACPI_NFIT_TYPE_FLUSH_ADDRESS; - flush->header.length = sizeof(struct acpi_nfit_flush_address); + flush->header.length = flush_hint_size; flush->device_handle = handle[0]; - flush->hint_count = 1; - flush->hint_address[0] = t->flush_dma[0]; + flush->hint_count = NUM_HINTS; + for (i = 0; i < NUM_HINTS; i++) + flush->hint_address[i] = t->flush_dma[0] + i * sizeof(u64); /* flush1 (dimm1) */ - flush = nfit_buf + offset + sizeof(struct acpi_nfit_flush_address) * 1; + flush = nfit_buf + offset + flush_hint_size * 1; flush->header.type = ACPI_NFIT_TYPE_FLUSH_ADDRESS; - flush->header.length = sizeof(struct acpi_nfit_flush_address); + flush->header.length = flush_hint_size; flush->device_handle = handle[1]; - flush->hint_count = 1; - flush->hint_address[0] = t->flush_dma[1]; + flush->hint_count = NUM_HINTS; + for (i = 0; i < NUM_HINTS; i++) + flush->hint_address[i] = t->flush_dma[1] + i * sizeof(u64); /* flush2 (dimm2) */ - flush = nfit_buf + offset + sizeof(struct acpi_nfit_flush_address) * 2; + flush = nfit_buf + offset + flush_hint_size * 2; flush->header.type = ACPI_NFIT_TYPE_FLUSH_ADDRESS; - flush->header.length = sizeof(struct acpi_nfit_flush_address); + flush->header.length = flush_hint_size; flush->device_handle = handle[2]; - flush->hint_count = 1; - flush->hint_address[0] = t->flush_dma[2]; + flush->hint_count = NUM_HINTS; + for (i = 0; i < NUM_HINTS; i++) + flush->hint_address[i] = t->flush_dma[2] + i * sizeof(u64); /* flush3 (dimm3) */ - flush = nfit_buf + offset + sizeof(struct acpi_nfit_flush_address) * 3; + flush = nfit_buf + offset + flush_hint_size * 3; flush->header.type = ACPI_NFIT_TYPE_FLUSH_ADDRESS; - flush->header.length = sizeof(struct acpi_nfit_flush_address); + flush->header.length = flush_hint_size; flush->device_handle = handle[3]; - flush->hint_count = 1; - flush->hint_address[0] = t->flush_dma[3]; + flush->hint_count = NUM_HINTS; + for (i = 0; i < NUM_HINTS; i++) + flush->hint_address[i] = t->flush_dma[3] + i * 
sizeof(u64); if (t->setup_hotplug) { - offset = offset + sizeof(struct acpi_nfit_flush_address) * 4; +
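For readers following the size arithmetic in the test setup above, here is a minimal user-space sketch of the same idea: each flush table is a fixed header followed by NUM_HINTS 64-bit addresses, and the tables are packed back to back. The struct layout and the names flush_table, flush_table_size(), flush_table_offset() and hint_addr() are hypothetical stand-ins for illustration only, not the ACPI definitions or the test code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a flush-hint table; not the real ACPI header. */
struct flush_table {
	uint16_t type;
	uint16_t length;
	uint32_t device_handle;
	uint16_t hint_count;
	uint8_t reserved[6];
	uint64_t hint_address[];	/* one entry per flush hint */
};

/* Bytes needed for one table that carries 'hints' hint addresses. */
static size_t flush_table_size(unsigned int hints)
{
	return sizeof(struct flush_table) + sizeof(uint64_t) * hints;
}

/* Byte offset of table 'idx' when equally sized tables are packed in a row. */
static size_t flush_table_offset(unsigned int idx, unsigned int hints)
{
	return flush_table_size(hints) * idx;
}

/* Address of hint 'i': consecutive 8-byte slots starting at 'base'. */
static uint64_t hint_addr(uint64_t base, unsigned int i, unsigned int hints)
{
	assert(i < hints);
	return base + i * sizeof(uint64_t);
}
```

The point of the sketch is only that the per-table stride has to grow with the hint count; using the bare header size as the stride, as the pre-patch code did, is only correct for a single hint.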
[PATCH v2 04/17] libnvdimm, nfit: remove nfit_spa_map() infrastructure
Now that all shared mappings are handled by devm_nvdimm_memremap() we no longer need nfit_spa_map() nor do we need to trigger a callback to the bus provider at region disable time. Signed-off-by: Dan Williams --- drivers/acpi/nfit.c | 146 -- drivers/acpi/nfit.h | 21 -- drivers/nvdimm/nd.h |1 drivers/nvdimm/region_devs.c |3 - include/linux/libnvdimm.h|1 5 files changed, 172 deletions(-) diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c index b047dbe13bed..b76c95981547 100644 --- a/drivers/acpi/nfit.c +++ b/drivers/acpi/nfit.c @@ -1509,126 +1509,6 @@ static int acpi_nfit_blk_region_do_io(struct nd_blk_region *ndbr, return rc; } -static void nfit_spa_mapping_release(struct kref *kref) -{ - struct nfit_spa_mapping *spa_map = to_spa_map(kref); - struct acpi_nfit_system_address *spa = spa_map->spa; - struct acpi_nfit_desc *acpi_desc = spa_map->acpi_desc; - - WARN_ON(!mutex_is_locked(&acpi_desc->spa_map_mutex)); - dev_dbg(acpi_desc->dev, "%s: SPA%d\n", __func__, spa->range_index); - if (spa_map->type == SPA_MAP_APERTURE) - memunmap((void __force *)spa_map->addr.aperture); - else - iounmap(spa_map->addr.base); - release_mem_region(spa->address, spa->length); - list_del(&spa_map->list); - kfree(spa_map); -} - -static struct nfit_spa_mapping *find_spa_mapping( - struct acpi_nfit_desc *acpi_desc, - struct acpi_nfit_system_address *spa) -{ - struct nfit_spa_mapping *spa_map; - - WARN_ON(!mutex_is_locked(&acpi_desc->spa_map_mutex)); - list_for_each_entry(spa_map, &acpi_desc->spa_maps, list) - if (spa_map->spa == spa) - return spa_map; - - return NULL; -} - -static void nfit_spa_unmap(struct acpi_nfit_desc *acpi_desc, - struct acpi_nfit_system_address *spa) -{ - struct nfit_spa_mapping *spa_map; - - mutex_lock(&acpi_desc->spa_map_mutex); - spa_map = find_spa_mapping(acpi_desc, spa); - - if (spa_map) - kref_put(&spa_map->kref, nfit_spa_mapping_release); - mutex_unlock(&acpi_desc->spa_map_mutex); -} - -static void __iomem *__nfit_spa_map(struct acpi_nfit_desc *acpi_desc, - 
struct acpi_nfit_system_address *spa, enum spa_map_type type) -{ - resource_size_t start = spa->address; - resource_size_t n = spa->length; - struct nfit_spa_mapping *spa_map; - struct resource *res; - - WARN_ON(!mutex_is_locked(&acpi_desc->spa_map_mutex)); - - spa_map = find_spa_mapping(acpi_desc, spa); - if (spa_map) { - kref_get(&spa_map->kref); - return spa_map->addr.base; - } - - spa_map = kzalloc(sizeof(*spa_map), GFP_KERNEL); - if (!spa_map) - return NULL; - - INIT_LIST_HEAD(&spa_map->list); - spa_map->spa = spa; - kref_init(&spa_map->kref); - spa_map->acpi_desc = acpi_desc; - - res = request_mem_region(start, n, dev_name(acpi_desc->dev)); - if (!res) - goto err_mem; - - spa_map->type = type; - if (type == SPA_MAP_APERTURE) - spa_map->addr.aperture = (void __pmem *)memremap(start, n, - ARCH_MEMREMAP_PMEM); - else - spa_map->addr.base = ioremap_nocache(start, n); - - - if (!spa_map->addr.base) - goto err_map; - - list_add_tail(&spa_map->list, &acpi_desc->spa_maps); - return spa_map->addr.base; - - err_map: - release_mem_region(start, n); - err_mem: - kfree(spa_map); - return NULL; -} - -/** - * nfit_spa_map - interleave-aware managed-mappings of acpi_nfit_system_address ranges - * @nvdimm_bus: NFIT-bus that provided the spa table entry - * @nfit_spa: spa table to map - * @type: aperture or control region - * - * In the case where block-data-window apertures and - * dimm-control-regions are interleaved they will end up sharing a - * single request_mem_region() + ioremap() for the address range. In - * the style of devm nfit_spa_map() mappings are automatically dropped - * when all region devices referencing the same mapping are disabled / - * unbound. 
- */ -static __maybe_unused void __iomem *nfit_spa_map( - struct acpi_nfit_desc *acpi_desc, - struct acpi_nfit_system_address *spa, enum spa_map_type type) -{ - void __iomem *iomem; - - mutex_lock(&acpi_desc->spa_map_mutex); - iomem = __nfit_spa_map(acpi_desc, spa, type); - mutex_unlock(&acpi_desc->spa_map_mutex); - - return iomem; -} - static int nfit_blk_init_interleave(struct nfit_blk_mmio *mmio, struct acpi_nfit_interleave *idt, u16 interleave_ways) { @@ -1773,29 +1653,6 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus, return 0; }
[PATCH v2 09/17] libnvdimm: cycle flush hints
When the NFIT provides multiple flush hint addresses per-dimm it is expressing that the platform is capable of processing multiple flush requests in parallel. There is some fixed cost per flush request, let the cost be shared in parallel on multiple cpus. Since there may not be enough flush hint addresses for each cpu to have one, keep a per-cpu index of the last used hint, hash it with current pid, and assume that access pattern and scheduler randomness will keep the flush-hint usage somewhat staggered across cpus. Cc: Ross Zwisler Signed-off-by: Dan Williams --- drivers/nvdimm/nd.h |1 + drivers/nvdimm/region_devs.c | 17 ++--- 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h index 5912bd6b4234..40476399d227 100644 --- a/drivers/nvdimm/nd.h +++ b/drivers/nvdimm/nd.h @@ -52,6 +52,7 @@ struct nvdimm_drvdata { struct nd_region_data { int ns_count; int ns_active; + unsigned int flush_mask; void __iomem *flush_wpq[0][0]; }; diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c index 46b6e2f7d5f0..4bcb3b6744aa 100644 --- a/drivers/nvdimm/region_devs.c +++ b/drivers/nvdimm/region_devs.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -22,6 +23,7 @@ #include "nd.h" static DEFINE_IDA(region_ida); +static DEFINE_PER_CPU(int, flush_idx); static int nvdimm_map_flush(struct device *dev, struct nvdimm *nvdimm, int dimm, struct nd_region_data *ndrd) @@ -61,7 +63,7 @@ static int nvdimm_map_flush(struct device *dev, struct nvdimm *nvdimm, int dimm, int nd_region_activate(struct nd_region *nd_region) { - int i; + int i, num_flush = 0; struct nd_region_data *ndrd; struct device *dev = &nd_region->dev; size_t flush_data_size = sizeof(void *); @@ -73,6 +75,7 @@ int nd_region_activate(struct nd_region *nd_region) /* at least one null hint slot per-dimm for the "no-hint" case */ flush_data_size += sizeof(void *); + num_flush = min_not_zero(num_flush, nvdimm->num_flush); if 
(!nvdimm->num_flush) continue; flush_data_size += nvdimm->num_flush * sizeof(void *); @@ -84,6 +87,7 @@ int nd_region_activate(struct nd_region *nd_region) return -ENOMEM; dev_set_drvdata(dev, ndrd); + ndrd->flush_mask = (1 << ilog2(num_flush)) - 1; for (i = 0; i < nd_region->ndr_mappings; i++) { struct nd_mapping *nd_mapping = &nd_region->mapping[i]; struct nvdimm *nvdimm = nd_mapping->nvdimm; @@ -872,7 +876,14 @@ EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create); void nvdimm_flush(struct nd_region *nd_region) { struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev); - int i; + int i, idx; + + /* +* Try to encourage some diversity in flush hint addresses +* across cpus assuming a limited number of flush hints. +*/ + idx = this_cpu_read(flush_idx); + idx = this_cpu_add_return(flush_idx, hash_32(current->pid + idx, 8)); /* * The first wmb() is needed to 'sfence' all previous writes @@ -884,7 +895,7 @@ void nvdimm_flush(struct nd_region *nd_region) wmb(); for (i = 0; i < nd_region->ndr_mappings; i++) if (ndrd->flush_wpq[i][0]) - writeq(1, ndrd->flush_wpq[i][0]); + writeq(1, ndrd->flush_wpq[i][idx & ndrd->flush_mask]); wmb(); } EXPORT_SYMBOL_GPL(nvdimm_flush);
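The mask arithmetic in the patch above can be illustrated with a small user-space sketch. It assumes, as the diff suggests, that the region uses the smallest non-zero hint count across its dimms, rounds that down to a power of two, and masks a running index with the result. The helper names (min_not_zero_u(), flush_mask(), pick_hint()) are made up for this example, and the per-cpu counter and pid hash are deliberately left out; the index is just a parameter here. The sketch also requires a non-zero count as a precondition.

```c
#include <assert.h>

/* Smaller of two counts, ignoring zeros; 0 only if both are 0. */
static unsigned int min_not_zero_u(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * Round 'n' down to a power of two and subtract one, mirroring
 * (1 << ilog2(n)) - 1. This sketch requires n to be non-zero.
 */
static unsigned int flush_mask(unsigned int n)
{
	unsigned int log2 = 0;

	assert(n != 0);
	while (n >>= 1)
		log2++;
	return (1u << log2) - 1;
}

/* Map an arbitrary running index onto a valid hint slot. */
static unsigned int pick_hint(unsigned int idx, unsigned int mask)
{
	return idx & mask;
}
```

With 6 hints per dimm the mask is 3, so only the first four slots are ever used; that is the price of selecting a slot with a single AND instead of a modulo.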
[PATCH v2 07/17] libnvdimm: keep region data alive over namespace removal
nd_region device driver data will be used in the namespace i/o path. Re-order nd_region_remove() to ensure this data stays live across namespace device removal. Signed-off-by: Dan Williams --- drivers/nvdimm/region.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/nvdimm/region.c b/drivers/nvdimm/region.c index 333175dac8d5..8f241772ec0b 100644 --- a/drivers/nvdimm/region.c +++ b/drivers/nvdimm/region.c @@ -82,6 +82,8 @@ static int nd_region_remove(struct device *dev) { struct nd_region *nd_region = to_nd_region(dev); + device_for_each_child(dev, NULL, child_unregister); + /* flush attribute readers and disable */ nvdimm_bus_lock(dev); nd_region->ns_seed = NULL; @@ -91,7 +93,6 @@ static int nd_region_remove(struct device *dev) dev_set_drvdata(dev, NULL); nvdimm_bus_unlock(dev); - device_for_each_child(dev, NULL, child_unregister); return 0; }
[PATCH v2 11/17] libnvdimm, pmem: flush posted-write queues on shutdown
Commit writes to media on system shutdown or pmem driver unload. Signed-off-by: Dan Williams --- drivers/nvdimm/bus.c | 16 drivers/nvdimm/pmem.c |8 include/linux/nd.h|1 + 3 files changed, 25 insertions(+) diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c index e4882e63bece..1cc7880320fe 100644 --- a/drivers/nvdimm/bus.c +++ b/drivers/nvdimm/bus.c @@ -136,6 +136,21 @@ static int nvdimm_bus_remove(struct device *dev) return rc; } +static void nvdimm_bus_shutdown(struct device *dev) +{ + struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev); + struct nd_device_driver *nd_drv = NULL; + + if (dev->driver) + nd_drv = to_nd_device_driver(dev->driver); + + if (nd_drv && nd_drv->shutdown) { + nd_drv->shutdown(dev); + dev_dbg(&nvdimm_bus->dev, "%s.shutdown(%s)\n", + dev->driver->name, dev_name(dev)); + } +} + void nd_device_notify(struct device *dev, enum nvdimm_event event) { device_lock(dev); @@ -214,6 +229,7 @@ static struct bus_type nvdimm_bus_type = { .match = nvdimm_bus_match, .probe = nvdimm_bus_probe, .remove = nvdimm_bus_remove, + .shutdown = nvdimm_bus_shutdown, }; static ASYNC_DOMAIN_EXCLUSIVE(nd_async_domain); diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 18cd95719da0..3f3fdb9586b9 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -351,9 +351,16 @@ static int nd_pmem_remove(struct device *dev) { if (is_nd_btt(dev)) nvdimm_namespace_detach_btt(to_nd_btt(dev)); + nvdimm_flush(to_nd_region(dev->parent)); + return 0; } +static void nd_pmem_shutdown(struct device *dev) +{ + nvdimm_flush(to_nd_region(dev->parent)); +} + static void nd_pmem_notify(struct device *dev, enum nvdimm_event event) { struct pmem_device *pmem = dev_get_drvdata(dev); @@ -393,6 +400,7 @@ static struct nd_device_driver nd_pmem_driver = { .probe = nd_pmem_probe, .remove = nd_pmem_remove, .notify = nd_pmem_notify, + .shutdown = nd_pmem_shutdown, .drv = { .name = "nd_pmem", }, diff --git a/include/linux/nd.h b/include/linux/nd.h index 
aee2761d294c..1ecd64643512 100644 --- a/include/linux/nd.h +++ b/include/linux/nd.h @@ -26,6 +26,7 @@ struct nd_device_driver { unsigned long type; int (*probe)(struct device *dev); int (*remove)(struct device *dev); + void (*shutdown)(struct device *dev); void (*notify)(struct device *dev, enum nvdimm_event event); };
[PATCH v2 10/17] libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush()
Given that nvdimm_flush() has higher overhead than wmb_pmem() (pointer chasing through nd_region), and that we otherwise assume a platform has ADR capability when flush hints are not present, move nvdimm_flush() to REQ_FLUSH context. Cc: Ross Zwisler Signed-off-by: Dan Williams --- drivers/nvdimm/pmem.c | 24 +--- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index e303655f243e..18cd95719da0 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -113,6 +113,11 @@ static int pmem_do_bvec(struct pmem_device *pmem, struct page *page, return rc; } +/* account for REQ_FLUSH rename, replace with REQ_PREFLUSH after v4.8-rc1 */ +#ifndef REQ_FLUSH +#define REQ_FLUSH REQ_PREFLUSH +#endif + static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio) { int rc = 0; @@ -121,6 +126,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio) struct bio_vec bvec; struct bvec_iter iter; struct pmem_device *pmem = q->queuedata; + struct nd_region *nd_region = to_region(pmem); + + if (bio->bi_rw & REQ_FLUSH) + nvdimm_flush(nd_region); do_acct = nd_iostat_start(bio, &start); bio_for_each_segment(bvec, bio, iter) { @@ -135,8 +144,8 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio) if (do_acct) nd_iostat_end(bio, start); - if (bio_data_dir(bio)) - nvdimm_flush(to_region(pmem)); + if (bio->bi_rw & REQ_FUA) + nvdimm_flush(nd_region); bio_endio(bio); return BLK_QC_T_NONE; @@ -149,8 +158,6 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector, int rc; rc = pmem_do_bvec(pmem, page, PAGE_SIZE, 0, rw, sector); - if (rw & WRITE) - nvdimm_flush(to_region(pmem)); /* * The ->rw_page interface is subtle and tricky. 
The core @@ -209,9 +216,9 @@ static int pmem_attach_disk(struct device *dev, struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev); struct nd_region *nd_region = to_nd_region(dev->parent); struct vmem_altmap __altmap, *altmap = NULL; + int nid = dev_to_node(dev), has_flush; struct resource *res = &nsio->res; struct nd_pfn *nd_pfn = NULL; - int nid = dev_to_node(dev); struct nd_pfn_sb *pfn_sb; struct pmem_device *pmem; struct resource pfn_res; @@ -237,8 +244,6 @@ static int pmem_attach_disk(struct device *dev, dev_set_drvdata(dev, pmem); pmem->phys_addr = res->start; pmem->size = resource_size(res); - if (nvdimm_has_flush(nd_region) < 0) - dev_warn(dev, "unable to guarantee persistence of writes\n"); if (!devm_request_mem_region(dev, res->start, resource_size(res), dev_name(dev))) { @@ -279,6 +284,11 @@ static int pmem_attach_disk(struct device *dev, return PTR_ERR(addr); pmem->virt_addr = (void __pmem *) addr; + has_flush = nvdimm_has_flush(nd_region); + if (has_flush < 0) + dev_warn(dev, "unable to guarantee persistence of writes\n"); + else if (has_flush > 0) + blk_queue_write_cache(q, true, true); blk_queue_make_request(q, pmem_make_request); blk_queue_physical_block_size(q, PAGE_SIZE); blk_queue_max_hw_sectors(q, UINT_MAX);
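To make the ordering in pmem_make_request() above easier to see, here is a small stand-alone sketch of the decision it encodes: a flush before the data transfer when the request carries the flush flag, a flush after it when the request carries FUA, and no flush at all for a plain write. The flag values and the names sk_plan and sk_plan_request() are hypothetical and exist only for this illustration; they are not the block layer's definitions.

```c
#include <assert.h>

/* Hypothetical flag bits for this sketch; the kernel's REQ_* values differ. */
enum {
	SK_REQ_WRITE    = 1 << 0,
	SK_REQ_PREFLUSH = 1 << 1,	/* called REQ_FLUSH in the patch */
	SK_REQ_FUA      = 1 << 2,
	SK_REQ_ALL      = SK_REQ_WRITE | SK_REQ_PREFLUSH | SK_REQ_FUA,
};

struct sk_plan {
	int flush_before;	/* drain earlier writes before this request's data */
	int flush_after;	/* make this request's data durable before completion */
};

/* Decide where flushes belong for a request carrying 'flags'. */
static struct sk_plan sk_plan_request(unsigned int flags)
{
	struct sk_plan plan = { 0, 0 };

	assert((flags & ~(unsigned int)SK_REQ_ALL) == 0);
	if (flags & SK_REQ_PREFLUSH)
		plan.flush_before = 1;
	if (flags & SK_REQ_FUA)
		plan.flush_after = 1;
	return plan;
}
```

A request with both flags set gets both flushes, which matches the two separate checks in the diff.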
[PATCH v2 12/17] fs/dax: remove wmb_pmem()
Flushing posted-write queues is now deferred to REQ_FLUSH context, or otherwise handled by an ADR event at the platform level. Cc: Ross Zwisler Signed-off-by: Dan Williams --- fs/dax.c |7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 761495bf5eb9..434f421da660 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -147,7 +147,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter, struct buffer_head *bh) { loff_t pos = start, max = start, bh_max = start; - bool hole = false, need_wmb = false; + bool hole = false; struct block_device *bdev = NULL; int rw = iov_iter_rw(iter), rc; long map_len = 0; @@ -213,7 +213,6 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter, if (iov_iter_rw(iter) == WRITE) { len = copy_from_iter_pmem(dax.addr, max - pos, iter); - need_wmb = true; } else if (!hole) len = copy_to_iter((void __force *) dax.addr, max - pos, iter); @@ -230,8 +229,6 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter, dax.addr += len; } - if (need_wmb) - wmb_pmem(); dax_unmap_atomic(bdev, &dax); return (pos == start) ? rc : pos - start; @@ -783,7 +780,6 @@ int dax_writeback_mapping_range(struct address_space *mapping, return ret; } } - wmb_pmem(); return 0; } EXPORT_SYMBOL_GPL(dax_writeback_mapping_range); @@ -1227,7 +1223,6 @@ int __dax_zero_page_range(struct block_device *bdev, sector_t sector, if (dax_map_atomic(bdev, &dax) < 0) return PTR_ERR(dax.addr); clear_pmem(dax.addr + offset, length); - wmb_pmem(); dax_unmap_atomic(bdev, &dax); } return 0;
[PATCH v2 08/17] libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()
nvdimm_flush() is a replacement for the x86 'pcommit' instruction. It is
an optional write flushing mechanism that an nvdimm bus can provide for
the pmem driver to consume. In the case of the NFIT nvdimm-bus-provider
nvdimm_flush() is implemented as a series of flush-hint-address [1]
writes to each dimm in the interleave set (region) that backs the
namespace.

The nvdimm_has_flush() routine relies on platform firmware to describe
the flushing capabilities of a platform. It uses the heuristic of
whether an nvdimm bus provider provides flush address data to return a
ternary result:

      1: flush addresses defined
      0: dimm topology described without flush addresses (assume ADR)
 -errno: no topology information, unable to determine flush mechanism

The pmem driver is expected to take the following actions on this
ternary result:

      1: nvdimm_flush() in response to REQ_FUA / REQ_FLUSH and shutdown
      0: do not set WC or FUA on the queue, take no further action
 -errno: warn and then operate as if nvdimm_has_flush() returned '0'

The caveat of this heuristic is that it cannot distinguish the "dimm
does not have flush address" case from the "platform firmware is broken
and failed to describe a flush address" case. Given we are already
explicitly trusting the NFIT there's not much more we can do beyond
blacklisting broken firmwares if they are ever encountered.
Cc: Ross Zwisler Signed-off-by: Dan Williams --- drivers/acpi/nfit.c | 33 ++--- drivers/acpi/nfit.h |1 - drivers/nvdimm/pmem.c| 27 - drivers/nvdimm/region_devs.c | 55 ++ include/linux/libnvdimm.h|2 ++ 5 files changed, 81 insertions(+), 37 deletions(-) diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c index 6796f780870a..0497175ee6cb 100644 --- a/drivers/acpi/nfit.c +++ b/drivers/acpi/nfit.c @@ -1393,24 +1393,6 @@ static u64 to_interleave_offset(u64 offset, struct nfit_blk_mmio *mmio) return mmio->base_offset + line_offset + table_offset + sub_line_offset; } -static void wmb_blk(struct nfit_blk *nfit_blk) -{ - - if (nfit_blk->nvdimm_flush) { - /* -* The first wmb() is needed to 'sfence' all previous writes -* such that they are architecturally visible for the platform -* buffer flush. Note that we've already arranged for pmem -* writes to avoid the cache via arch_memcpy_to_pmem(). The -* final wmb() ensures ordering for the NVDIMM flush write. -*/ - wmb(); - writeq(1, nfit_blk->nvdimm_flush); - wmb(); - } else - wmb_pmem(); -} - static u32 read_blk_stat(struct nfit_blk *nfit_blk, unsigned int bw) { struct nfit_blk_mmio *mmio = &nfit_blk->mmio[DCR]; @@ -1445,7 +1427,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw, offset = to_interleave_offset(offset, mmio); writeq(cmd, mmio->addr.base + offset); - wmb_blk(nfit_blk); + nvdimm_flush(nfit_blk->nd_region); if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH) readq(mmio->addr.base + offset); @@ -1496,7 +1478,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk, } if (rw) - wmb_blk(nfit_blk); + nvdimm_flush(nfit_blk->nd_region); rc = read_blk_stat(nfit_blk, lane) ? 
-EIO : 0; return rc; @@ -1570,7 +1552,6 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus, { struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus); struct nd_blk_region *ndbr = to_nd_blk_region(dev); - struct nfit_flush *nfit_flush; struct nfit_blk_mmio *mmio; struct nfit_blk *nfit_blk; struct nfit_mem *nfit_mem; @@ -1645,15 +1626,7 @@ static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus, return rc; } - nfit_flush = nfit_mem->nfit_flush; - if (nfit_flush && nfit_flush->flush->hint_count != 0) { - nfit_blk->nvdimm_flush = devm_nvdimm_ioremap(dev, - nfit_flush->flush->hint_address[0], 8); - if (!nfit_blk->nvdimm_flush) - return -ENOMEM; - } - - if (!arch_has_wmb_pmem() && !nfit_blk->nvdimm_flush) + if (nvdimm_has_flush(nfit_blk->nd_region) < 0) dev_warn(dev, "unable to guarantee persistence of writes\n"); if (mmio->line_size == 0) diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h index 9282eb324dcc..9fda77cf81da 100644 --- a/drivers/acpi/nfit.h +++ b/drivers/acpi/nfit.h @@ -183,7 +183,6 @@ struct nfit_blk { u64 bdw_offset; /* post interleave offset */ u64 stat_offset; u64 cmd_offset; - void __iomem *nvdimm_flush; u32 dimm_flags; }; diff --git a/dri
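The ternary contract described in the changelog above can be sketched in a few lines of user-space C. Everything here is an assumption made for illustration: sk_has_flush() stands in for the heuristic (the choice of -ENXIO as the error value is arbitrary), and sk_apply_has_flush() stands in for what a consumer such as the pmem driver is expected to do with the result. Neither is the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* What the consumer ends up configuring; hypothetical for this sketch. */
struct sk_queue_cfg {
	bool write_cache;	/* advertise a volatile write cache */
	bool fua;		/* advertise FUA support */
	bool warned;		/* "unable to guarantee persistence" was logged */
};

/*
 * Ternary heuristic: negative errno when no dimm topology is described,
 * 1 when at least one flush hint exists, 0 otherwise (assume ADR).
 */
static int sk_has_flush(int dimms_described, int hints)
{
	assert(dimms_described >= 0 && hints >= 0);
	if (dimms_described == 0)
		return -ENXIO;	/* arbitrary errno chosen for the sketch */
	return hints > 0 ? 1 : 0;
}

/* Consumer side: warn on error, enable WC/FUA only when hints exist. */
static struct sk_queue_cfg sk_apply_has_flush(int has_flush)
{
	struct sk_queue_cfg cfg = { false, false, false };

	if (has_flush < 0) {
		cfg.warned = true;	/* then behave as if 0 was returned */
	} else if (has_flush > 0) {
		cfg.write_cache = true;
		cfg.fua = true;
	}
	return cfg;
}
```

Note that the error case and the zero case end up with the same queue configuration; the only difference is the warning.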
[PATCH v2 15/17] Revert "KVM: x86: add pcommit support"
This reverts commit 8b3e34e46aca9b6d349b331cd9cf71ccbdc91b2e. Given the deprecation of the pcommit instruction, revert its usage as a vm exit source in kvm. Cc: Xiao Guangrong Cc: Paolo Bonzini Cc: Ross Zwisler Signed-off-by: Dan Williams --- arch/x86/include/asm/vmx.h |1 - arch/x86/include/uapi/asm/vmx.h |4 +--- arch/x86/kvm/cpuid.c|2 +- arch/x86/kvm/cpuid.h|8 arch/x86/kvm/vmx.c | 32 5 files changed, 6 insertions(+), 41 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 14c63c7e8337..a002b07a7099 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -72,7 +72,6 @@ #define SECONDARY_EXEC_SHADOW_VMCS 0x4000 #define SECONDARY_EXEC_ENABLE_PML 0x0002 #define SECONDARY_EXEC_XSAVES 0x0010 -#define SECONDARY_EXEC_PCOMMIT 0x0020 #define SECONDARY_EXEC_TSC_SCALING 0x0200 #define PIN_BASED_EXT_INTR_MASK 0x0001 diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h index 5b15d94a33f8..37fee272618f 100644 --- a/arch/x86/include/uapi/asm/vmx.h +++ b/arch/x86/include/uapi/asm/vmx.h @@ -78,7 +78,6 @@ #define EXIT_REASON_PML_FULL62 #define EXIT_REASON_XSAVES 63 #define EXIT_REASON_XRSTORS 64 -#define EXIT_REASON_PCOMMIT 65 #define VMX_EXIT_REASONS \ { EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \ @@ -127,8 +126,7 @@ { EXIT_REASON_INVVPID, "INVVPID" }, \ { EXIT_REASON_INVPCID, "INVPCID" }, \ { EXIT_REASON_XSAVES,"XSAVES" }, \ - { EXIT_REASON_XRSTORS, "XRSTORS" }, \ - { EXIT_REASON_PCOMMIT, "PCOMMIT" } + { EXIT_REASON_XRSTORS, "XRSTORS" } #define VMX_ABORT_SAVE_GUEST_MSR_FAIL1 #define VMX_ABORT_LOAD_HOST_MSR_FAIL 4 diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 7597b42a8a88..643565364497 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -366,7 +366,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS) | f_invpcid | F(RTM) | f_mpx | F(RDSEED) | F(ADX) | F(SMAP) | 
F(AVX512F) | F(AVX512PF) | F(AVX512ER) | - F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB) | F(PCOMMIT); + F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB); /* cpuid 0xD.1.eax */ const u32 kvm_cpuid_D_1_eax_x86_features = diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index e17a74b1d852..35058c2c0eea 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -144,14 +144,6 @@ static inline bool guest_cpuid_has_rtm(struct kvm_vcpu *vcpu) return best && (best->ebx & bit(X86_FEATURE_RTM)); } -static inline bool guest_cpuid_has_pcommit(struct kvm_vcpu *vcpu) -{ - struct kvm_cpuid_entry2 *best; - - best = kvm_find_cpuid_entry(vcpu, 7, 0); - return best && (best->ebx & bit(X86_FEATURE_PCOMMIT)); -} - static inline bool guest_cpuid_has_rdtscp(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index fb93010beaa4..2e2685424fdc 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2705,8 +2705,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx) SECONDARY_EXEC_APIC_REGISTER_VIRT | SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY | SECONDARY_EXEC_WBINVD_EXITING | - SECONDARY_EXEC_XSAVES | - SECONDARY_EXEC_PCOMMIT; + SECONDARY_EXEC_XSAVES; if (enable_ept) { /* nested EPT: emulate EPT also to L1 */ @@ -3268,7 +3267,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) SECONDARY_EXEC_SHADOW_VMCS | SECONDARY_EXEC_XSAVES | SECONDARY_EXEC_ENABLE_PML | - SECONDARY_EXEC_PCOMMIT | SECONDARY_EXEC_TSC_SCALING; if (adjust_vmx_controls(min2, opt2, MSR_IA32_VMX_PROCBASED_CTLS2, @@ -4856,9 +4854,6 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx) if (!enable_pml) exec_control &= ~SECONDARY_EXEC_ENABLE_PML; - /* Currently, we allow L1 guest to directly run pcommit instruction. 
*/ - exec_control &= ~SECONDARY_EXEC_PCOMMIT; - return exec_control; } @@ -4902,9 +4897,10 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, vmx_exec_control(vmx)); - if (cpu_has_secondary_exec_ctrls()) + if (cpu_has_secondary_exec_ctrls()) {
[PATCH v2 13/17] libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
nsio_rw_bytes() is used to write info block metadata to the namespace, so it should trigger a flush after every write. Replace wmb_pmem() with nvdimm_flush() in this path. Cc: Ross Zwisler Signed-off-by: Dan Williams --- drivers/nvdimm/claim.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c index 9997ff94a132..d5dc80c48b4c 100644 --- a/drivers/nvdimm/claim.c +++ b/drivers/nvdimm/claim.c @@ -240,7 +240,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns, return memcpy_from_pmem(buf, nsio->addr + offset, size); } else { memcpy_to_pmem(nsio->addr + offset, buf, size); - wmb_pmem(); + nvdimm_flush(to_nd_region(ndns->dev.parent)); } return 0;
[PATCH v2 16/17] x86/insn: remove pcommit
The pcommit instruction is being deprecated in favor of either ADR (asynchronous DRAM refresh: flush-on-power-fail) at the platform level, or posted-write-queue flush addresses as defined by the ACPI 6.x NFIT (NVDIMM Firmware Interface Table). Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: x...@kernel.org Cc: Josh Poimboeuf Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Borislav Petkov Cc: Andy Lutomirski Cc: Xiao Guangrong Cc: Adrian Hunter Cc: Ross Zwisler Signed-off-by: Dan Williams --- arch/x86/include/asm/cpufeatures.h |1 arch/x86/include/asm/special_insns.h | 46 arch/x86/lib/x86-opcode-map.txt|2 - tools/objtool/arch/x86/insn/x86-opcode-map.txt |2 - tools/perf/arch/x86/tests/insn-x86-dat-32.c|2 - tools/perf/arch/x86/tests/insn-x86-dat-64.c|2 - tools/perf/arch/x86/tests/insn-x86-dat-src.c |4 -- .../perf/util/intel-pt-decoder/x86-opcode-map.txt |2 - 8 files changed, 3 insertions(+), 58 deletions(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 4a413485f9eb..700d97df7d28 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -225,7 +225,6 @@ #define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */ #define X86_FEATURE_ADX( 9*32+19) /* The ADCX and ADOX instructions */ #define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */ -#define X86_FEATURE_PCOMMIT( 9*32+22) /* PCOMMIT instruction */ #define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */ #define X86_FEATURE_CLWB ( 9*32+24) /* CLWB instruction */ #define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */ diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index d96d04377765..587d7914ea4b 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -253,52 +253,6 @@ static inline void clwb(volatile void *__p) : [pax] "a" (p)); } -/** - * pcommit_sfence() - persistent 
commit and fence - * - * The PCOMMIT instruction ensures that data that has been flushed from the - * processor's cache hierarchy with CLWB, CLFLUSHOPT or CLFLUSH is accepted to - * memory and is durable on the DIMM. The primary use case for this is - * persistent memory. - * - * This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH and PCOMMIT - * with appropriate fencing. - * - * Example: - * void flush_and_commit_buffer(void *vaddr, unsigned int size) - * { - * unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1; - * void *vend = vaddr + size; - * void *p; - * - * for (p = (void *)((unsigned long)vaddr & ~clflush_mask); - * p < vend; p += boot_cpu_data.x86_clflush_size) - * clwb(p); - * - * // SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes - * // MFENCE via mb() also works - * wmb(); - * - * // PCOMMIT and the required SFENCE for ordering - * pcommit_sfence(); - * } - * - * After this function completes the data pointed to by 'vaddr' has been - * accepted to memory and will be durable if the 'vaddr' points to persistent - * memory. - * - * PCOMMIT must always be ordered by an MFENCE or SFENCE, so to help simplify - * things we include both the PCOMMIT and the required SFENCE in the - * alternatives generated by pcommit_sfence(). 
- */ -static inline void pcommit_sfence(void) -{ - alternative(ASM_NOP7, - ".byte 0x66, 0x0f, 0xae, 0xf8\n\t" /* pcommit */ - "sfence", - X86_FEATURE_PCOMMIT); -} - #define nop() asm volatile ("nop") diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt index d388de72eaca..28632ee68377 100644 --- a/arch/x86/lib/x86-opcode-map.txt +++ b/arch/x86/lib/x86-opcode-map.txt @@ -947,7 +947,7 @@ GrpTable: Grp15 4: XSAVE 5: XRSTOR | lfence (11B) 6: XSAVEOPT | clwb (66) | mfence (11B) -7: clflush | clflushopt (66) | sfence (11B) | pcommit (66),(11B) +7: clflush | clflushopt (66) | sfence (11B) EndTable GrpTable: Grp16 diff --git a/tools/objtool/arch/x86/insn/x86-opcode-map.txt b/tools/objtool/arch/x86/insn/x86-opcode-map.txt index d388de72eaca..28632ee68377 100644 --- a/tools/objtool/arch/x86/insn/x86-opcode-map.txt +++ b/tools/objtool/arch/x86/insn/x86-opcode-map.txt @@ -947,7 +947,7 @@ GrpTable: Grp15 4: XSAVE 5: XRSTOR | lfence (11B) 6: XSAVEOPT | clwb (66) | mfence (11B) -7: clflush | clflushopt (66) | sfence (11B) | pcommit (66),(11B) +7: clflush | clflushopt (66) | sfence (11B) EndTable GrpTable: Grp16 diff --git a/tools/perf/arch/x86/tests/insn-x86-dat-32.c b/tools/perf/arch/x86/tests/insn-x86-dat-32.c index 3b491cfe204e..38a48daed154 100644 --- a/tools/perf/arch/x86/tests/insn-
[PATCH v2 17/17] pmem: kill __pmem address space
The __pmem address space was meant to annotate codepaths that touch persistent memory and need to coordinate a call to wmb_pmem(). Now that wmb_pmem() is gone, there is little need to keep this annotation. Cc: Christoph Hellwig Cc: Ross Zwisler Signed-off-by: Dan Williams --- Documentation/filesystems/Locking |2 + arch/powerpc/sysdev/axonram.c |4 +- arch/x86/include/asm/pmem.h | 41 +- drivers/acpi/nfit.h |2 + drivers/block/brd.c |4 +- drivers/nvdimm/pmem.c |6 ++- drivers/nvdimm/pmem.h |4 +- drivers/s390/block/dcssblk.c |6 ++- fs/dax.c |6 ++- include/linux/blkdev.h|6 ++- include/linux/compiler.h |2 - include/linux/nd.h|2 + include/linux/pmem.h | 70 + scripts/checkpatch.pl |1 - tools/testing/nvdimm/pmem-dax.c |2 + 15 files changed, 56 insertions(+), 102 deletions(-) diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 75eea7ce3d7c..d9c37ec4c760 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -395,7 +395,7 @@ prototypes: int (*release) (struct gendisk *, fmode_t); int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long); int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long); - int (*direct_access) (struct block_device *, sector_t, void __pmem **, + int (*direct_access) (struct block_device *, sector_t, void **, unsigned long *); int (*media_changed) (struct gendisk *); void (*unlock_native_capacity) (struct gendisk *); diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c index ff75d70f7285..154cd9110c08 100644 --- a/arch/powerpc/sysdev/axonram.c +++ b/arch/powerpc/sysdev/axonram.c @@ -143,12 +143,12 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio) */ static long axon_ram_direct_access(struct block_device *device, sector_t sector, - void __pmem **kaddr, pfn_t *pfn, long size) + void **kaddr, pfn_t *pfn, long size) { struct axon_ram_bank *bank = device->bd_disk->private_data; loff_t offset = (loff_t)sector << 
AXON_RAM_SECTOR_SHIFT; - *kaddr = (void __pmem __force *) bank->io_addr + offset; + *kaddr = bank->io_addr + offset; *pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV); return bank->size - offset; } diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h index a8cf2a6b14d9..643eba42d620 100644 --- a/arch/x86/include/asm/pmem.h +++ b/arch/x86/include/asm/pmem.h @@ -28,10 +28,9 @@ * Copy data to persistent memory media via non-temporal stores so that * a subsequent pmem driver flush operation will drain posted write queues. */ -static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src, - size_t n) +static inline void arch_memcpy_to_pmem(void *dst, const void *src, size_t n) { - int unwritten; + int rem; /* * We are copying between two kernel buffers, if @@ -39,19 +38,17 @@ static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src, * fault) we would have already reported a general protection fault * before the WARN+BUG. */ - unwritten = __copy_from_user_inatomic_nocache((void __force *) dst, - (void __user *) src, n); - if (WARN(unwritten, "%s: fault copying %p <- %p unwritten: %d\n", - __func__, dst, src, unwritten)) + rem = __copy_from_user_inatomic_nocache(dst, (void __user *) src, n); + if (WARN(rem, "%s: fault copying %p <- %p unwritten: %d\n", + __func__, dst, src, rem)) BUG(); } -static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src, - size_t n) +static inline int arch_memcpy_from_pmem(void *dst, const void *src, size_t n) { if (static_cpu_has(X86_FEATURE_MCE_RECOVERY)) - return memcpy_mcsafe(dst, (void __force *) src, n); - memcpy(dst, (void __force *) src, n); + return memcpy_mcsafe(dst, src, n); + memcpy(dst, src, n); return 0; } @@ -63,15 +60,14 @@ static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src, * Write back a cache range using the CLWB (cache line write back) * instruction. 
*/ -static inline void arch_wb_cache_pmem(void __pmem *addr, size_t size) +static inline void arch_wb_cache_pmem(void *addr, size_t size) { u16 x86_clflush_size = boot_cpu_data.x86_clflush_size; unsigned long clflush_mask = x86_clflush_size - 1; - void *vaddr = (void __force *)addr; - void *vend = vaddr + s
[PATCH v2 14/17] pmem: kill wmb_pmem()
All users have been replaced with flushing in the pmem driver. Cc: Ross Zwisler Signed-off-by: Dan Williams --- arch/x86/include/asm/pmem.h | 36 ++--- include/linux/pmem.h| 47 --- 2 files changed, 6 insertions(+), 77 deletions(-) diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h index fbc5e92e1ecc..a8cf2a6b14d9 100644 --- a/arch/x86/include/asm/pmem.h +++ b/arch/x86/include/asm/pmem.h @@ -26,8 +26,7 @@ * @n: length of the copy in bytes * * Copy data to persistent memory media via non-temporal stores so that - * a subsequent arch_wmb_pmem() can flush cpu and memory controller - * write buffers to guarantee durability. + * a subsequent pmem driver flush operation will drain posted write queues. */ static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src, size_t n) @@ -57,32 +56,12 @@ static inline int arch_memcpy_from_pmem(void *dst, const void __pmem *src, } /** - * arch_wmb_pmem - synchronize writes to persistent memory - * - * After a series of arch_memcpy_to_pmem() operations this drains data - * from cpu write buffers and any platform (memory controller) buffers - * to ensure that written data is durable on persistent memory media. - */ -static inline void arch_wmb_pmem(void) -{ - /* -* wmb() to 'sfence' all previous writes such that they are -* architecturally visible to 'pcommit'. Note, that we've -* already arranged for pmem writes to avoid the cache via -* arch_memcpy_to_pmem(). -*/ - wmb(); - pcommit_sfence(); -} - -/** * arch_wb_cache_pmem - write back a cache range with CLWB * @vaddr: virtual start address * @size: number of bytes to write back * * Write back a cache range using the CLWB (cache line write back) - * instruction. This function requires explicit ordering with an - * arch_wmb_pmem() call. + * instruction. 
*/ static inline void arch_wb_cache_pmem(void __pmem *addr, size_t size) { @@ -113,7 +92,6 @@ static inline bool __iter_needs_pmem_wb(struct iov_iter *i) * @i: iterator with source data * * Copy data from the iterator 'i' to the PMEM buffer starting at 'addr'. - * This function requires explicit ordering with an arch_wmb_pmem() call. */ static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes, struct iov_iter *i) @@ -136,7 +114,6 @@ static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes, * @size: number of bytes to zero * * Write zeros into the memory range starting at 'addr' for 'size' bytes. - * This function requires explicit ordering with an arch_wmb_pmem() call. */ static inline void arch_clear_pmem(void __pmem *addr, size_t size) { @@ -150,14 +127,5 @@ static inline void arch_invalidate_pmem(void __pmem *addr, size_t size) { clflush_cache_range((void __force *) addr, size); } - -static inline bool __arch_has_wmb_pmem(void) -{ - /* -* We require that wmb() be an 'sfence', that is only guaranteed on -* 64-bit builds -*/ - return static_cpu_has(X86_FEATURE_PCOMMIT); -} #endif /* CONFIG_ARCH_HAS_PMEM_API */ #endif /* __ASM_X86_PMEM_H__ */ diff --git a/include/linux/pmem.h b/include/linux/pmem.h index 57d146fe44dd..9e3ea94b8157 100644 --- a/include/linux/pmem.h +++ b/include/linux/pmem.h @@ -26,16 +26,6 @@ * calling these symbols with arch_has_pmem_api() and redirect to the * implementation in asm/pmem.h. 
*/ -static inline bool __arch_has_wmb_pmem(void) -{ - return false; -} - -static inline void arch_wmb_pmem(void) -{ - BUG(); -} - static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src, size_t n) { @@ -101,20 +91,6 @@ static inline int memcpy_from_pmem(void *dst, void __pmem const *src, return default_memcpy_from_pmem(dst, src, size); } -/** - * arch_has_wmb_pmem - true if wmb_pmem() ensures durability - * - * For a given cpu implementation within an architecture it is possible - * that wmb_pmem() resolves to a nop. In the case this returns - * false, pmem api users are unable to ensure durability and may want to - * fall back to a different data consistency model, or otherwise notify - * the user. - */ -static inline bool arch_has_wmb_pmem(void) -{ - return arch_has_pmem_api() && __arch_has_wmb_pmem(); -} - /* * These defaults seek to offer decent performance and minimize the * window between i/o completion and writes being durable on media. @@ -152,7 +128,7 @@ static inline void default_clear_pmem(void __pmem *addr, size_t size) * being effectively evicted from, or never written to, the processor * cache hierarchy after the copy completes. After memcpy_to_pmem() * data may still reside in cpu or platform buffers, so this operation - * must be followed by a wmb_pmem(). + * must be followed by a blkdev_issue_flush()
[PATCH v2 00/17] replace pcommit with ADR or directed flushing
Changes since v1 [1]:

1/ Move flush address data from nvdimm_drvdata to nd_region_data (Greg, Toshi)

2/ Add more detail to cover letter and patch descriptions (Linda, Jeff)

3/ Account for s/REQ_FLUSH/REQ_PREFLUSH/ rename pending in -next.

4/ Add a directed flush at pmem ->remove() and ->shutdown() time.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2016-June/005897.html

---

The pcommit instruction, which has not shipped on any product, is
deprecated. Instead, the expectation is that platforms implement either
ADR, or provide one or more flush addresses per nvdimm.

ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers to
the memory controller on a power-fail event.

Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware Interface
Table (NFIT) sub-structure: "Flush Hint Address Structure". A flush hint
is an mmio address that when written and fenced assures that all previous
posted writes targeting a given dimm have been flushed to media.

Code paths that previously called wmb_pmem() instead must arrange for a
flush request to be sent to the pmem driver. Towards this end, the pmem
driver is converted to advertise itself as having a write cache to
indicate to a filesystem that a flush request must occur before writes
are guaranteed to be on media. See "[PATCH v2 08/17] libnvdimm: introduce
nvdimm_flush() and nvdimm_has_flush()" for details.
--- Dan Williams (17): nfit: always associate flush hints nfit: don't override return value of nfit_mem_init libnvdimm: introduce devm_nvdimm_memremap(), convert nfit_spa_map() users libnvdimm, nfit: remove nfit_spa_map() infrastructure libnvdimm, nfit: move flush hint mapping to region-device driver-data tools/testing/nvdimm: simulate multiple flush hints per-dimm libnvdimm: keep region data alive over namespace removal libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush() libnvdimm: cycle flush hints libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush() libnvdimm, pmem: flush posted-write queues on shutdown fs/dax: remove wmb_pmem() libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes pmem: kill wmb_pmem() Revert "KVM: x86: add pcommit support" x86/insn: remove pcommit pmem: kill __pmem address space Documentation/filesystems/Locking |2 arch/powerpc/sysdev/axonram.c |4 arch/x86/include/asm/cpufeatures.h |1 arch/x86/include/asm/pmem.h| 77 ++- arch/x86/include/asm/special_insns.h | 46 arch/x86/include/asm/vmx.h |1 arch/x86/include/uapi/asm/vmx.h|4 arch/x86/kvm/cpuid.c |2 arch/x86/kvm/cpuid.h |8 - arch/x86/kvm/vmx.c | 32 --- arch/x86/lib/x86-opcode-map.txt|2 drivers/acpi/nfit.c| 230 +++- drivers/acpi/nfit.h| 25 -- drivers/block/brd.c|4 drivers/nvdimm/bus.c | 16 + drivers/nvdimm/claim.c |2 drivers/nvdimm/core.c | 122 +++ drivers/nvdimm/dimm_devs.c |5 drivers/nvdimm/nd-core.h |4 drivers/nvdimm/nd.h| 10 + drivers/nvdimm/pmem.c | 59 - drivers/nvdimm/pmem.h |4 drivers/nvdimm/region.c| 19 +- drivers/nvdimm/region_devs.c | 148 - drivers/s390/block/dcssblk.c |6 - fs/dax.c | 13 - include/linux/blkdev.h |6 - include/linux/compiler.h |2 include/linux/libnvdimm.h | 16 + include/linux/nd.h |3 include/linux/pmem.h | 117 ++ scripts/checkpatch.pl |1 tools/objtool/arch/x86/insn/x86-opcode-map.txt |2 tools/perf/arch/x86/tests/insn-x86-dat-32.c|2 tools/perf/arch/x86/tests/insn-x86-dat-64.c|2 tools/perf/arch/x86/tests/insn-x86-dat-src.c |4 
.../perf/util/intel-pt-decoder/x86-opcode-map.txt |2 tools/testing/nvdimm/pmem-dax.c|2 tools/testing/nvdimm/test/nfit.c | 55 +++-- 39 files changed, 505 insertions(+), 555 deletions(-)
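As a rough illustration of the flush hint mechanism described in the cover letter, the following userspace sketch models a dimm's flush hints as an array of plain memory words. It is not the libnvdimm implementation: struct flush_hints and flush_wpq() are hypothetical names, the C11 fences only stand in for the kernel's wmb(), and the round-robin selection is an assumption based on the "cycle flush hints" patch title.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one dimm's mapped flush hint addresses. */
struct flush_hints {
	volatile uint64_t *hint;	/* array of (simulated) hint registers */
	size_t num;			/* number of hints for this dimm */
	size_t next;			/* next hint to use (round-robin) */
};

/*
 * Write to one flush hint and fence. On real hardware the store would
 * be an mmio write and the fences would be wmb(); here C11 fences and
 * ordinary memory stand in for both.
 */
static void flush_wpq(struct flush_hints *fh)
{
	assert(fh->num > 0);

	/* Order all earlier pmem stores before the hint write. */
	atomic_thread_fence(memory_order_seq_cst);
	/* The value written is arbitrary in this sketch. */
	fh->hint[fh->next] = 1;
	/* Fence the hint write itself. */
	atomic_thread_fence(memory_order_seq_cst);

	/* Cycle through the available hints. */
	fh->next = (fh->next + 1) % fh->num;
}
```

A caller would invoke flush_wpq() once after a batch of writes, in the place where wmb_pmem() used to be called.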
[PATCH v2 05/17] libnvdimm, nfit: move flush hint mapping to region-device driver-data
In preparation for triggering flushes of a DIMM's writes-posted-queue (WPQ) via the pmem driver move mapping of flush hint addresses to the region driver. Since this uses devm_nvdimm_memremap() the flush addresses will remain mapped while any region to which the dimm belongs is active. We need to communicate more information to the nvdimm core to facilitate this mapping, namely each dimm object now carries an array of flush hint address resources. Signed-off-by: Dan Williams --- drivers/acpi/nfit.c | 21 +++ drivers/acpi/nfit.h |1 + drivers/nvdimm/dimm_devs.c |5 ++- drivers/nvdimm/nd-core.h |3 +- drivers/nvdimm/nd.h |8 +++- drivers/nvdimm/region.c | 16 - drivers/nvdimm/region_devs.c | 79 -- include/linux/libnvdimm.h|4 ++ 8 files changed, 119 insertions(+), 18 deletions(-) diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c index b76c95981547..6796f780870a 100644 --- a/drivers/acpi/nfit.c +++ b/drivers/acpi/nfit.c @@ -714,9 +714,24 @@ static int nfit_mem_dcr_init(struct acpi_nfit_desc *acpi_desc, } list_for_each_entry(nfit_flush, &acpi_desc->flushes, list) { + struct acpi_nfit_flush_address *flush; + u16 i; + if (nfit_flush->flush->device_handle != device_handle) continue; nfit_mem->nfit_flush = nfit_flush; + flush = nfit_flush->flush; + nfit_mem->flush_wpq = devm_kzalloc(acpi_desc->dev, + flush->hint_count + * sizeof(struct resource), GFP_KERNEL); + if (!nfit_mem->flush_wpq) + return -ENOMEM; + for (i = 0; i < flush->hint_count; i++) { + struct resource *res = &nfit_mem->flush_wpq[i]; + + res->start = flush->hint_address[i]; + res->end = res->start + 8 - 1; + } break; } @@ -1171,6 +1186,7 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc) int dimm_count = 0; list_for_each_entry(nfit_mem, &acpi_desc->dimms, list) { + struct acpi_nfit_flush_address *flush; unsigned long flags = 0, cmd_mask; struct nvdimm *nvdimm; u32 device_handle; @@ -1204,9 +1220,12 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc) if (nfit_mem->family 
== NVDIMM_FAMILY_INTEL) cmd_mask |= nfit_mem->dsm_mask; + flush = nfit_mem->nfit_flush ? nfit_mem->nfit_flush->flush + : NULL; nvdimm = nvdimm_create(acpi_desc->nvdimm_bus, nfit_mem, acpi_nfit_dimm_attribute_groups, - flags, cmd_mask); + flags, cmd_mask, flush ? flush->hint_count : 0, + nfit_mem->flush_wpq); if (!nvdimm) return -ENOMEM; diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h index 52078475d969..9282eb324dcc 100644 --- a/drivers/acpi/nfit.h +++ b/drivers/acpi/nfit.h @@ -127,6 +127,7 @@ struct nfit_mem { struct list_head list; struct acpi_device *adev; struct acpi_nfit_desc *acpi_desc; + struct resource *flush_wpq; unsigned long dsm_mask; int family; }; diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c index bbde28d3dec5..d9bba5edd8dc 100644 --- a/drivers/nvdimm/dimm_devs.c +++ b/drivers/nvdimm/dimm_devs.c @@ -346,7 +346,8 @@ EXPORT_SYMBOL_GPL(nvdimm_attribute_group); struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data, const struct attribute_group **groups, unsigned long flags, - unsigned long cmd_mask) + unsigned long cmd_mask, int num_flush, + struct resource *flush_wpq) { struct nvdimm *nvdimm = kzalloc(sizeof(*nvdimm), GFP_KERNEL); struct device *dev; @@ -362,6 +363,8 @@ struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data, nvdimm->provider_data = provider_data; nvdimm->flags = flags; nvdimm->cmd_mask = cmd_mask; + nvdimm->num_flush = num_flush; + nvdimm->flush_wpq = flush_wpq; atomic_set(&nvdimm->busy, 0); dev = &nvdimm->dev; dev_set_name(dev, "nmem%d", nvdimm->id); diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h index 790b62cc81ed..6e961f7f43e7 100644 --- a/drivers/nvdimm/nd-core.h +++ b/drivers/nvdimm/nd-core.h @@ -41,7 +41,8 @@ struct nvdimm { unsigned long cmd_mask; s
Re: [PATCH] drivers/mtd/chips/cfi_cmdset_0020.c: Deinline do_write_buffer, save 5316 bytes
On Fri, Apr 08, 2016 at 08:35:43PM +0200, Denys Vlasenko wrote:
> This function compiles to 2554 bytes of machine code.
> In C, the function is almost 200 lines long.
>
> It has only one callsite, but forced inlining that much code
> makes gcc generate significantly worse code. Let gcc itself decide
> what to do.
>
> Signed-off-by: Denys Vlasenko
> CC: David Woodhouse
> CC: Brian Norris
> CC: Dan Carpenter
> CC: Artem Bityutskiy
> CC: linux-...@lists.infradead.org
> CC: linux-kernel@vger.kernel.org

Applied to l2-mtd.git
Re: [PATCH] mtd: Replace if and BUG with BUG_ON
Hi,

On Tue, May 31, 2016 at 07:41:23AM +0200, Julia Lawall wrote:
> On Mon, 30 May 2016, Ezequiel Garcia wrote:
> > On 28 May 2016 at 13:41, Amitoj Kaur Chawla wrote:
> > > Replace if condition and BUG() with a BUG_ON having the conditional
> > > expression of the if statement as argument.
[...]
> > > diff --git a/drivers/mtd/ssfdc.c b/drivers/mtd/ssfdc.c
> > > index daf82ba..41b13d1 100644
> > > --- a/drivers/mtd/ssfdc.c
> > > +++ b/drivers/mtd/ssfdc.c
> > > @@ -380,8 +380,7 @@ static int ssfdcr_readsect(struct mtd_blktrans_dev *dev,
> > > " block_addr=%d\n", logic_sect_no, sectors_per_block, offset,
> > > block_address);
> > >
> > > - if (block_address >= ssfdc->map_len)
> > > - BUG();
> > > + BUG_ON(block_address >= ssfdc->map_len);
> > >
> >
> > I don't want to be rude, but I wonder if there's any value at all in
> > such a patch. It barely improves readability, it barely reduces the
> > LoC, yet it consumes developer time, maintainer time, and changes git
> > per-line authorship (used in git blame).
>
> Actually, I think that this particular patch does improve readability a
> bit. Scanning straight down the code is easier than looking under an if.
> Also, git blame now has a way to go back in history (although I don't
> remember what it is), so the argument that cleaning up the code makes it
> very difficult to find why the nontrivial part of the code is as it is
> doesn't completely hold any more.

I agree it's a small improvement. Not sure I'd worry too much about
git-blame.

Applied to l2-mtd.git.

Brian
[PATCH] x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
From: Rafael J. Wysocki On Intel hardware, native_play_dead() uses mwait_play_dead() by default and only falls back to the other methods if that fails. That also happens during resume from hibernation, when the restore (boot) kernel runs disable_nonboot_cpus() to take all of the CPUs except for the boot one offline. However, that is problematic, because the address passed to __monitor() in mwait_play_dead() is likely to be written to in the last phase of hibernate image restoration and that causes the "dead" CPU to start executing instructions again. Unfortunately, the page containing the address in that CPU's instruction pointer may not be valid any more at that point. First, that page may have been overwritten with image kernel memory contents already, so the instructions the CPU attempts to execute may simply be invalid. Second, the page tables previously used by that CPU may have been overwritten by image kernel memory contents, so the address in its instruction pointer is impossible to resolve then. A report from Varun Koyyalagunta and investigation carried out by Chen Yu show that the latter sometimes happens in practice. To prevent it from happening, modify native_play_dead() to make it use hlt_play_dead() instead of mwait_play_dead() during resume from hibernation which avoids the inadvertent "revivals" of "dead" CPUs. A slightly unpleasant consequence of this change is that if the system is hibernated with one or more CPUs offline, it will generally draw more power after resume than it did before hibernation, because the physical state entered by CPUs via hlt_play_dead() is higher-power than the mwait_play_dead() one in the majority of cases. It is possible to work around this, but it is unclear how much of a problem that's going to be in practice, so the workaround will be implemented later if it turns out to be necessary. Link: https://bugzilla.kernel.org/show_bug.cgi?id=106371 Reported-by: Varun Koyyalagunta Original-by: Chen Yu Signed-off-by: Rafael J. 
Wysocki --- This is a slightly rearranged new version of https://patchwork.kernel.org/patch/9217459/ --- arch/x86/include/asm/cpu.h |6 ++ arch/x86/kernel/smpboot.c |3 +++ arch/x86/power/cpu.c | 21 + kernel/power/hibernate.c |7 ++- kernel/power/power.h |2 ++ 5 files changed, 38 insertions(+), 1 deletion(-) Index: linux-pm/kernel/power/hibernate.c === --- linux-pm.orig/kernel/power/hibernate.c +++ linux-pm/kernel/power/hibernate.c @@ -409,6 +409,11 @@ int hibernation_snapshot(int platform_mo goto Close; } +int __weak hibernate_resume_nonboot_cpu_disable(void) +{ + return disable_nonboot_cpus(); +} + /** * resume_target_kernel - Restore system state from a hibernation image. * @platform_mode: Whether or not to use the platform driver. @@ -433,7 +438,7 @@ static int resume_target_kernel(bool pla if (error) goto Cleanup; - error = disable_nonboot_cpus(); + error = hibernate_resume_nonboot_cpu_disable(); if (error) goto Enable_cpus; Index: linux-pm/kernel/power/power.h === --- linux-pm.orig/kernel/power/power.h +++ linux-pm/kernel/power/power.h @@ -38,6 +38,8 @@ static inline char *check_image_kernel(s } #endif /* CONFIG_ARCH_HIBERNATION_HEADER */ +extern int hibernate_resume_nonboot_cpu_disable(void); + /* * Keep some memory free so that I/O operations can succeed without paging * [Might this be more than 4 MB?] 
Index: linux-pm/arch/x86/power/cpu.c === --- linux-pm.orig/arch/x86/power/cpu.c +++ linux-pm/arch/x86/power/cpu.c @@ -266,6 +266,27 @@ void notrace restore_processor_state(voi EXPORT_SYMBOL(restore_processor_state); #endif +#if defined(CONFIG_HIBERNATION) && defined(CONFIG_HOTPLUG_CPU) +bool force_hlt_play_dead __read_mostly; + +int hibernate_resume_nonboot_cpu_disable(void) +{ + int ret; + + /* +* Ensure that MONITOR/MWAIT will not be used in the "play dead" loop +* during hibernate image restoration, because it is likely that the +* monitored address will be actually written to at that time and then +* the "dead" CPU may start executing instructions from an image +* kernel's page (and that may not be the "play dead" loop any more). +*/ + force_hlt_play_dead = true; + ret = disable_nonboot_cpus(); + force_hlt_play_dead = false; + return ret; +} +#endif + /* * When bsp_check() is called in hibernate and suspend, cpu hotplug * is disabled already. So it's unnessary to handle race condition between Index: linux-pm/arch/x86/kernel/smpboot.c === --- linux-pm.orig/arch/x86/kernel/smpboot.c +++ linux-pm/arch/x86/kernel/smpboot.c
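The flag-based approach in the changelog can be modelled in userspace roughly as follows. This is only a sketch: the smpboot.c hunk that consumes force_hlt_play_dead is not shown above, so play_dead() here is a hypothetical consumer, and disable_nonboot_cpus() is a stub that "offlines" a single simulated CPU.

```c
#include <assert.h>
#include <stdbool.h>

enum dead_method { DEAD_MWAIT, DEAD_HLT };

static bool force_hlt_play_dead;	/* mirrors the flag added in cpu.c */
static enum dead_method last_method;	/* what the simulated CPU chose */

/* Hypothetical consumer of the flag; the real hunk is not shown above. */
static void play_dead(bool mwait_usable)
{
	if (mwait_usable && !force_hlt_play_dead)
		last_method = DEAD_MWAIT;
	else
		last_method = DEAD_HLT;
}

/* Stub for disable_nonboot_cpus(): offlines one simulated CPU. */
static int disable_nonboot_cpus(void)
{
	play_dead(true);
	return 0;
}

/* Same shape as the x86 override in the patch. */
static int hibernate_resume_nonboot_cpu_disable(void)
{
	int ret;

	/* The flag is only meant to be set for the duration of this call. */
	assert(!force_hlt_play_dead);

	force_hlt_play_dead = true;
	ret = disable_nonboot_cpus();
	force_hlt_play_dead = false;
	return ret;
}
```

In this model a normal offline still picks the MWAIT path, while the resume path forces HLT and clears the flag again afterwards, so later CPU offlining is unaffected.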
Re: [PATCH] mtd: nand: brcmnand: Change BUG_ON in brcmnand_send_cmd
On Fri, Jul 08, 2016 at 10:36:39AM -0700, Florian Fainelli wrote:
> Change the BUG_ON() condition in brcmnand_send_cmd() which checks for
> the interrupt status "controller ready" bit to a WARN_ON.
>
> There is no good reason to kill the system when this condition occur
> because we could have systems which listed the NAND controller as
> available (e.g: from Device Tree), but the NAND chip could be
> malfunctioning and not responding.
>
> Signed-off-by: Florian Fainelli

Acked-by: Brian Norris

> ---
> Note that I even hesitated to remove that completely, but there is
> some value in knowing about this condition since it helps figuring
> out what could be wrong.
>
> drivers/mtd/nand/brcmnand/brcmnand.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
> index b6062a2f3dfd..72bdc283778d 100644
> --- a/drivers/mtd/nand/brcmnand/brcmnand.c
> +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
> @@ -1165,7 +1165,7 @@ static void brcmnand_send_cmd(struct brcmnand_host *host, int cmd)
> ctrl->cmd_pending = cmd;
>
> intfc = brcmnand_read_reg(ctrl, BRCMNAND_INTFC_STATUS);
> - BUG_ON(!(intfc & INTFC_CTLR_READY));
> + WARN_ON(!(intfc & INTFC_CTLR_READY));
>
> mb(); /* flush previous writes */
> brcmnand_write_reg(ctrl, BRCMNAND_CMD_START,
> --
> 2.7.4
>
Re: [PATCH v6 00/10] acpi, clocksource: add GTDT driver and GTDT support in arm_arch_timer
On Saturday, July 09, 2016 11:44:47 AM Hanjun Guo wrote:
> On 2016/7/8 21:22, Lorenzo Pieralisi wrote:
> > On Thu, Jul 07, 2016 at 03:58:04PM +0200, Rafael J. Wysocki wrote:
> >
> > [...]
> >
> >>> Anyway let's avoid these petty arguments, I agree there must be some
> >>> sort of ARM64 ACPI maintainership for the reasons you mentioned above.
> >>
> >> To avoid confusion on who's going to push stuff to Linus, I can do
> >> that, but it must be clear whose ACKs are needed for that to happen.
> >> That may be one person or all of you, whatever you decide.
> >
> > I think the reasoning is the same, to avoid confusion and avoid stepping
> > on each other toes it is best to have a single gatekeeper (still
> > multiple maintainer entries to keep patches reviewed correctly), if no
> > one complains I will do that and a) provide ACKs (I will definitely
> > require and request Hanjun and Sudeep ones too appropriately on a per
> > patch basis) and b) send you pull requests.
>
> Fine to me.
>
> >
> > Having a maintainer per file would be farcical, I really do not
>
> Agree, but having three of us in maintainer entries in MAINTAINERS
> file will help the patches be reviewed correctly with more eyes.
>
> > expect that amount of traffic for drivers/acpi/arm64 therefore I
> > really doubt there is any risk of me slowing things down.
> >
> > Does this sound reasonable ? Comments/complaints welcome, please
> > manifest yourselves.
>
> Fair enough. What I'm concern most is land ACPI on ARM64 soundly,
> let's do that :)
>
> OK, let's back to this patch set, Fuwei already prepared a new version
> of patches [1] (moving acpi_gtdt.c to drivers/acpi/arm64/ and add a
> maintainer entries patch), shall we review and comment on this patch
> set for now, or just let Fuwei send out the new version?

Frankly, I don't see a point in discussing the old version only if a new
one is available already. Post it, please.

Thanks,
Rafael
[PATCH] media: solo6x10: increase FRAME_BUF_SIZE
In practice, devices sometimes return frames larger than the current
buffer size, leading to failure in solo_send_desc().

It is not clear what minimal increase in buffer size would be enough, so
this patch roughly doubles it, which should be safe to assume is
sufficient.

Signed-off-by: Andrey Utkin
---
 drivers/media/pci/solo6x10/solo6x10-v4l2-enc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/pci/solo6x10/solo6x10-v4l2-enc.c b/drivers/media/pci/solo6x10/solo6x10-v4l2-enc.c
index 8b1cde5..3991643 100644
--- a/drivers/media/pci/solo6x10/solo6x10-v4l2-enc.c
+++ b/drivers/media/pci/solo6x10/solo6x10-v4l2-enc.c
@@ -33,7 +33,7 @@
 #include "solo6x10-jpeg.h"
 #define MIN_VID_BUFFERS 2
-#define FRAME_BUF_SIZE (196 * 1024)
+#define FRAME_BUF_SIZE (400 * 1024)
 #define MP4_QS 16
 #define DMA_ALIGN 4096
--
2.8.4
[RFC PATCH v3 2/2] mm, thp: convert from optimistic swapin collapsing to conservative
To detect whether khugepaged swapin worthwhile, this patch checks the amount of young pages. There should be at least half of HPAGE_PMD_NR to swapin. Signed-off-by: Ebru Akagunduz Suggested-by: Minchan Kim --- Changes in v2: - Don't change thp design, only notice amount of young pages, if khugepaged needs to swapin (Minchan Kim). - Print out count of referenced pages in __collapse_huge_page_swapin() (Ebru Akagunduz) Changes in v3: - After khugepaged extracted from huge_memory.c, changes moved to khugepaged.c include/trace/events/huge_memory.h | 19 +++ mm/khugepaged.c| 38 +++--- 2 files changed, 34 insertions(+), 23 deletions(-) diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h index 830d47d..04f58ac 100644 --- a/include/trace/events/huge_memory.h +++ b/include/trace/events/huge_memory.h @@ -13,7 +13,7 @@ EM( SCAN_EXCEED_NONE_PTE, "exceed_none_pte") \ EM( SCAN_PTE_NON_PRESENT, "pte_non_present") \ EM( SCAN_PAGE_RO, "no_writable_page") \ - EM( SCAN_NO_REFERENCED_PAGE,"no_referenced_page") \ + EM( SCAN_LACK_REFERENCED_PAGE, "lack_referenced_page") \ EM( SCAN_PAGE_NULL, "page_null")\ EM( SCAN_SCAN_ABORT,"scan_aborted") \ EM( SCAN_PAGE_COUNT,"not_suitable_page_count") \ @@ -47,7 +47,7 @@ SCAN_STATUS TRACE_EVENT(mm_khugepaged_scan_pmd, TP_PROTO(struct mm_struct *mm, struct page *page, bool writable, -bool referenced, int none_or_zero, int status, int unmapped), +int referenced, int none_or_zero, int status, int unmapped), TP_ARGS(mm, page, writable, referenced, none_or_zero, status, unmapped), @@ -55,7 +55,7 @@ TRACE_EVENT(mm_khugepaged_scan_pmd, __field(struct mm_struct *, mm) __field(unsigned long, pfn) __field(bool, writable) - __field(bool, referenced) + __field(int, referenced) __field(int, none_or_zero) __field(int, status) __field(int, unmapped) @@ -108,14 +108,14 @@ TRACE_EVENT(mm_collapse_huge_page, TRACE_EVENT(mm_collapse_huge_page_isolate, TP_PROTO(struct page *page, int none_or_zero, -bool referenced, bool writable, int 
status), +int referenced, bool writable, int status), TP_ARGS(page, none_or_zero, referenced, writable, status), TP_STRUCT__entry( __field(unsigned long, pfn) __field(int, none_or_zero) - __field(bool, referenced) + __field(int, referenced) __field(bool, writable) __field(int, status) ), @@ -138,25 +138,28 @@ TRACE_EVENT(mm_collapse_huge_page_isolate, TRACE_EVENT(mm_collapse_huge_page_swapin, - TP_PROTO(struct mm_struct *mm, int swapped_in, int ret), + TP_PROTO(struct mm_struct *mm, int swapped_in, int referenced, int ret), - TP_ARGS(mm, swapped_in, ret), + TP_ARGS(mm, swapped_in, referenced, ret), TP_STRUCT__entry( __field(struct mm_struct *, mm) __field(int, swapped_in) + __field(int, referenced) __field(int, ret) ), TP_fast_assign( __entry->mm = mm; __entry->swapped_in = swapped_in; + __entry->referenced = referenced; __entry->ret = ret; ), - TP_printk("mm=%p, swapped_in=%d, ret=%d", + TP_printk("mm=%p, swapped_in=%d, referenced=%d, ret=%d", __entry->mm, __entry->swapped_in, + __entry->referenced, __entry->ret) ); diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 5661484..7dbee69 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -27,7 +27,7 @@ enum scan_result { SCAN_EXCEED_NONE_PTE, SCAN_PTE_NON_PRESENT, SCAN_PAGE_RO, - SCAN_NO_REFERENCED_PAGE, + SCAN_LACK_REFERENCED_PAGE, SCAN_PAGE_NULL, SCAN_SCAN_ABORT, SCAN_PAGE_COUNT, @@ -500,8 +500,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, { struct page *page = NULL; pte_t *_pte; - int none_or_zero = 0, result = 0; - bool referenced = false, writable = false; + int none_or_zero = 0, result = 0, referenced = 0; + bool writable = false; for (_pte = pte; _pte < pte+HPAGE_PMD_NR; _pte++, address += PAGE_SIZE) { @@ -580,11 +580,11 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(PageLRU(page), page); - /* If there is n
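The threshold named in the changelog ("at least half of HPAGE_PMD_NR") can be sketched as a small standalone check. This is an illustration, not the khugepaged code: swapin_worthwhile() is a hypothetical helper, and the value 512 for HPAGE_PMD_NR is an assumption (x86-64 with 4K base pages and 2M huge pages). It also shows why the diff changes `referenced` from bool to int: a count is needed to compare against the threshold.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value: 512 ptes per huge page (x86-64, 4K pages, 2M THP). */
#define HPAGE_PMD_NR 512

/*
 * Hypothetical helper: decide whether swapping pages in is worthwhile
 * for a collapse. Require that at least half of the ptes in the range
 * were found young (referenced).
 */
static bool swapin_worthwhile(int referenced)
{
	assert(referenced >= 0 && referenced <= HPAGE_PMD_NR);
	return referenced >= HPAGE_PMD_NR / 2;
}
```

Under these assumptions, a range with 255 young ptes would not trigger swapin, while one with 256 would.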
[PATCH v3 1/2] mm, thp: fix comment inconsistency for swapin readahead functions
After fixing swapin issues, comment lines stayed as in old version. This patch updates the comments. Signed-off-by: Ebru Akagunduz Cc: Hillf Danton --- Changes in v2: - Newly created in this version. Changes in v3: - Replace Reported-by with Cc (Hillf Danton) - Remove RFC tag (Hillf Danton) - After khugepaged extracted from huge_memory.c, changes moved to khugepaged.c mm/khugepaged.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 93d5f87..5661484 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -891,9 +891,10 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm, /* do_swap_page returns VM_FAULT_RETRY with released mmap_sem */ if (ret & VM_FAULT_RETRY) { down_read(&mm->mmap_sem); - /* vma is no longer available, don't continue to swapin */ - if (hugepage_vma_revalidate(mm, address)) + if (hugepage_vma_revalidate(mm, address)) { + /* vma is no longer available, don't continue to swapin */ return false; + } /* check if the pmd is still valid */ if (mm_find_pmd(mm, address) != pmd) return false; @@ -969,7 +970,7 @@ static void collapse_huge_page(struct mm_struct *mm, /* * __collapse_huge_page_swapin always returns with mmap_sem locked. -* If it fails, release mmap_sem and jump directly out. +* If it fails, we release mmap_sem and jump out_nolock. * Continuing to collapse causes inconsistency. */ if (!__collapse_huge_page_swapin(mm, vma, address, pmd)) { -- 1.9.1
[RFC PATCH v3 0/2] mm, thp: convert from optimistic swapin collapsing to conservative
This patch series fixes a comment inconsistency and makes khugepaged
decide whether to swap in based on the number of young pages.

Changes in v2:
 - Don't change the thp design; only take young pages into account
   when swapin is needed
 - Add comment line fixing patch

Changes in v3:
 - Remove revert patch (allocstall), the patch was automatically dropped
 - Set comment line fixing patch as first part of the series
 - Move changes from huge_memory.c to khugepaged.c

Ebru Akagunduz (2):
  mm, thp: fix comment inconsistency for swapin readahead functions
  mm, thp: convert from optimistic swapin collapsing to conservative

 include/trace/events/huge_memory.h | 19 +---
 mm/khugepaged.c                    | 45 +++---
 2 files changed, 38 insertions(+), 26 deletions(-)

--
1.9.1
Re: [PATCH v7 3/4] perf: xgene: Add APM X-Gene SoC Performance Monitoring Unit driver
On Wed, Jul 6, 2016 at 8:07 PM, Tai Nguyen wrote: > Signed-off-by: Tai Nguyen > --- > Documentation/perf/xgene-pmu.txt | 48 ++ > drivers/perf/Kconfig |7 + > drivers/perf/Makefile|1 + > drivers/perf/xgene_pmu.c | 1398 > ++ > 4 files changed, 1454 insertions(+) > create mode 100644 Documentation/perf/xgene-pmu.txt > create mode 100644 drivers/perf/xgene_pmu.c > [...] > diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig > index 04e2653..4d5c5f9 100644 > --- a/drivers/perf/Kconfig > +++ b/drivers/perf/Kconfig > @@ -12,4 +12,11 @@ config ARM_PMU > Say y if you want to use CPU performance monitors on ARM-based > systems. > > +config XGENE_PMU > +depends on PERF_EVENTS && ARCH_XGENE > +bool "APM X-Gene SoC PMU" If the driver is bool, then please avoid using module.h and anything from within it. They are either no-ops when built in, or there are non-modular equivalents available, so it is entirely avoidable, and makes for smaller and better code. > +default n > +help > + Say y if you want to use APM X-Gene SoC performance monitors. > + > endmenu > diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile > index acd2397..b116e98 100644 > --- a/drivers/perf/Makefile > +++ b/drivers/perf/Makefile > @@ -1 +1,2 @@ > obj-$(CONFIG_ARM_PMU) += arm_pmu.o > +obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o [...] ver = { > + .name = "xgene-pmu", > + .of_match_table = xgene_pmu_of_match, > + .acpi_match_table = ACPI_PTR(xgene_pmu_acpi_match), > + }, > +}; > + > +module_platform_driver(xgene_pmu_driver); builtin_platform_driver > + > +MODULE_DESCRIPTION("APM X-Gene SoC PMU driver"); > +MODULE_AUTHOR("Hoan Tran "); > +MODULE_AUTHOR("Tai Nguyen "); > +MODULE_LICENSE("GPL"); As long as this information is at the top of the file, then these can go away too -- just like MODULE_DEVICE_TABLE they are no-op. Paul.
bug in memcg oom-killer results in a hung syscall in another process in the same cgroup
I came across the following issue in kernel 3.16 (Ubuntu 14.04), which was then reproduced in kernel 4.4 LTS: after a couple of memcg oom-kills in a cgroup, a syscall in *another* process in the same cgroup hangs indefinitely.
Reproducing:
  # mkdir -p strace_run
  # mkdir /sys/fs/cgroup/memory/1
  # echo 1073741824 > /sys/fs/cgroup/memory/1/memory.limit_in_bytes
  # echo 0 > /sys/fs/cgroup/memory/1/memory.swappiness
  # for i in $(seq 1000); do ./call-mem-hog /sys/fs/cgroup/memory/1/cgroup.procs & done
Where call-mem-hog is:
  #!/bin/sh
  set -ex
  echo $$ > $1
  echo "Adding $$ to $1"
  strace -ff -tt ./mem-hog 2> strace_run/$$
Initially I thought it was a userspace bug in dash, as it only happened with /bin/sh (which points to dash) and not with bash. I see the following hanging processes:
  USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
  root 20999 0.0 0.0 4508 100 pts/6 S 16:28 0:00 /bin/sh ./call-mem-hog /sys/fs/cgroup/memory/1/cgroup.procs
However, when using strace, I noticed that sometimes there is actually a mem-hog process hanging on the sbrk syscall (memory.oom_control is of course 0, so this is not expected). Sending an ABRT signal to the waiting strace process then resulted in the mem-hog process getting oom-killed by the kernel.
Re: [PATCH v6 3/5] usb: dwc3: add phyif_utmi_quirk
On Saturday, 9 July 2016, 11:38:00, William.wu wrote:
> Dear Heiko & Balbi,
>
> On 2016/7/8 21:29, Felipe Balbi wrote:
> > Hi,
> >
> > Heiko Stuebner writes:
> >> On Thursday, 7 July 2016, 10:54:24, William Wu wrote:
> >>> Add a quirk to configure the core to support the
> >>> UTMI+ PHY with an 8- or 16-bit interface. UTMI+ PHY
> >>> interface is hardware property, and it's platform
> >>> dependent. Normally, the PHYIf can be configured
> >>> during coreconsultant. But for some specific usb
> >>> cores(e.g. rk3399 soc dwc3), the default PHYIf
> >>> configuration value is wrong, so we need to
> >>> reconfigure it by software.
> >>>
> >>> And refer to the dwc3 databook, the GUSB2PHYCFG.USBTRDTIM
> >>> must be set to the corresponding value according to
> >>> the UTMI+ PHY interface.
> >>>
> >>> Signed-off-by: William Wu
> >>> ---
> >>
> >> [...]
> >>
> >>> diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt b/Documentation/devicetree/bindings/usb/dwc3.txt
> >>> index 020b0e9..8d7317d 100644
> >>> --- a/Documentation/devicetree/bindings/usb/dwc3.txt
> >>> +++ b/Documentation/devicetree/bindings/usb/dwc3.txt
> >>> @@ -42,6 +42,10 @@ Optional properties:
> >>>   - snps,dis-u2-freeclk-exists-quirk: when set, clear the u2_freeclk_exists
> >>>     in GUSB2PHYCFG, specify that USB2 PHY doesn't provide
> >>>     a free-running PHY clock.
> >>> + - snps,phyif-utmi-quirk: when set core will set phyif UTMI+ interface.
> >>> + - snps,phyif-utmi: the value to configure the core to support a UTMI+ PHY
> >>> +   with an 8- or 16-bit interface. Value 0 select 8-bit
> >>> +   interface, value 1 select 16-bit interface.
> >>
> >> maybe
> >>
> >>    snps,phyif-utmi-width = <8> or <16>;
> >>
> >> devicetree is about describing the hardware, not the things that get
> >> written to registers :-) . The conversion from the described width to
> >> the register value can easily be done in the driver.
>
> Thanks for your suggestion :-)
> Yes, “snps,phyif-utmi-width = <8> or <16>” is much clearer and easier to understand.
> I had considered the same dts property for phyif-utmi, but for the time being
> I have no good idea about the conversion from the described width to the register value.
>
> For the phyif utmi width configuration, we need to set two fields in the GUSB2PHYCFG register,
> according to the DWC3 USB3.0 controller databook version 3.00a, section 6.3.46 GUSB2PHYCFG:
>
> Bits  | Name      | Description
> ------+-----------+----------------------------------------------------------
> 13:10 | USBTRDTIM | Sets the turnaround time in PHY clocks.
>       |           | 4'h5: When the MAC interface is 16-bit UTMI+
>       |           | 4'h9: When the MAC interface is 8-bit UTMI+/ULPI.
> 3     | PHYIF     | If UTMI+ is selected, the application uses this bit to
>       |           | configure the core to support a UTMI+ PHY with an 8- or
>       |           | 16-bit interface.
>       |           | 1'b0: 8 bits
>       |           | 1'b1: 16 bits
>
> And I think maybe I can try to do this:
> change it in dts:
>     snps,phyif-utmi-width = <8> or <16>;
>
> Then convert to register value like this:
>     device_property_read_u8(dev, "snps,phyif-utmi-width", &phyif_utmi_width);
>     dwc->phyif_utmi = phyif_utmi_width >> 4;
>
> After the conversion, dwc->phyif_utmi value 0 means 8 bits, value 1 means 16 bits,
> and it's easier for us to configure GUSB2PHYCFG.
>
> Is it OK?
or you could just store the actual width value read from the dts and make the core handle accordingly, making everything a bit more explicit. I guess personally I'd do something like:
make dwc->phyif_utmi a regular unsigned int
in probe:
    ret = device_property_read_u8(dev, "snps,phyif-utmi-width", &dwc->phyif_utmi);
    if (ret < 0) {
        dwc->phyif_utmi = 0;
    } else if (dwc->phyif_utmi != 16 && dwc->phyif_utmi != 8) {
        dev_err(dev, "unsupported utmi interface width %d\n", dwc->phyif_utmi);
        return -EINVAL;
    }
when setting your GUSB2PHYCFG register:
    if (dwc->phyif_utmi > 0) {
        reg &= ~(DWC3_GUSB2PHYCFG_PHYIF_MASK | DWC3_GUSB2PHYCFG_USBTRDTIM_MASK);
        usbtrdtim = (dwc->phyif_utmi == 1
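The width-to-register conversion discussed in this message can be sketched in plain C. This is only an illustration built from the databook excerpt quoted in the thread (PHYIF in bit 3, USBTRDTIM in bits 13:10, values 4'h5 and 4'h9); the macro names and the helper utmi_width_to_phycfg() are hypothetical and are not taken from the dwc3 driver.

```c
#include <stdint.h>

/* Field layout as described in the quoted databook excerpt. */
#define PHYCFG_PHYIF            (1u << 3)   /* 0 = 8-bit, 1 = 16-bit UTMI+ */
#define PHYCFG_USBTRDTIM_SHIFT  10
#define PHYCFG_USBTRDTIM_MASK   (0xfu << PHYCFG_USBTRDTIM_SHIFT)
#define USBTRDTIM_UTMI_8_BIT    9u          /* 4'h9 */
#define USBTRDTIM_UTMI_16_BIT   5u          /* 4'h5 */

/*
 * Hypothetical helper: take the interface width as described in the
 * devicetree (8 or 16) and update both fields of a GUSB2PHYCFG value.
 * Returns 0 on success, -1 for an unsupported width.
 */
int utmi_width_to_phycfg(uint32_t reg, unsigned int width, uint32_t *out)
{
    uint32_t usbtrdtim;

    if (width != 8 && width != 16)
        return -1;

    /* Clear both fields first so stale bits cannot survive. */
    reg &= ~(PHYCFG_PHYIF | PHYCFG_USBTRDTIM_MASK);

    if (width == 16) {
        reg |= PHYCFG_PHYIF;
        usbtrdtim = USBTRDTIM_UTMI_16_BIT;
    } else {
        usbtrdtim = USBTRDTIM_UTMI_8_BIT;
    }

    *out = reg | (usbtrdtim << PHYCFG_USBTRDTIM_SHIFT);
    return 0;
}
```

Keeping the described width (8 or 16) as the input means the property describes the hardware, and the register encoding stays private to the driver, which is the point made above.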
Re: [PATCH V2 04/10] firmware: tegra: add IVC library
On Tue, Jul 5, 2016 at 5:04 AM, Joseph Lo wrote: > The Inter-VM communication (IVC) is a communication protocol, which is > designed for interprocessor communication (IPC) or the communication > between the hypervisor and the virtual machine with a guest OS on it. So > it can be translated as inter-virtual memory or inter-virtual machine > communication. The message channels are maintained on the DRAM or SRAM > and the data coherency should be considered. Or the data could be > corrupted or out of date when the remote client checking it. > > Inside the IVC, it maintains memory-based descriptors for the TX/RX > channels and the coherency issue of the counter and payloads. So the > clients can use it to send/receive messages to/from remote ones. > > We introduce it as a library for the firmware drivers, which can use it > for IPC. > > Based-on-the-work-by: > Peter Newman > > Signed-off-by: Joseph Lo > --- > Changes in V2: > - None > --- > drivers/firmware/Kconfig| 1 + > drivers/firmware/Makefile | 1 + > drivers/firmware/tegra/Kconfig | 13 + > drivers/firmware/tegra/Makefile | 1 + > drivers/firmware/tegra/ivc.c| 659 > > include/soc/tegra/ivc.h | 102 +++ > 6 files changed, 777 insertions(+) > create mode 100644 drivers/firmware/tegra/Kconfig > create mode 100644 drivers/firmware/tegra/Makefile > create mode 100644 drivers/firmware/tegra/ivc.c > create mode 100644 include/soc/tegra/ivc.h > > diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig > index 5e618058defe..bbd64ae8c4c6 100644 > --- a/drivers/firmware/Kconfig > +++ b/drivers/firmware/Kconfig > @@ -200,5 +200,6 @@ config HAVE_ARM_SMCCC > source "drivers/firmware/broadcom/Kconfig" > source "drivers/firmware/google/Kconfig" > source "drivers/firmware/efi/Kconfig" > +source "drivers/firmware/tegra/Kconfig" > > endmenu > diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile > index 474bada56fcd..9a4df8171cc4 100644 > --- a/drivers/firmware/Makefile > +++ b/drivers/firmware/Makefile > @@ -24,3 
+24,4 @@ obj-y += broadcom/ > obj-$(CONFIG_GOOGLE_FIRMWARE) += google/ > obj-$(CONFIG_EFI) += efi/ > obj-$(CONFIG_UEFI_CPER)+= efi/ > +obj-y += tegra/ > diff --git a/drivers/firmware/tegra/Kconfig b/drivers/firmware/tegra/Kconfig > new file mode 100644 > index ..1fa3e4e136a5 > --- /dev/null > +++ b/drivers/firmware/tegra/Kconfig > @@ -0,0 +1,13 @@ > +menu "Tegra firmware driver" > + > +config TEGRA_IVC > + bool "Tegra IVC protocol" If this driver is not tristate, then why does the driver include the module.h header below? > + depends on ARCH_TEGRA > + help > + IVC (Inter-VM Communication) protocol is part of the IPC > + (Inter Processor Communication) framework on Tegra. It maintains the > + data and the different commuication channels in SysRAM or RAM and > + keeps the content is synchronization between host CPU and remote > + processors. > + > +endmenu > diff --git a/drivers/firmware/tegra/Makefile b/drivers/firmware/tegra/Makefile > new file mode 100644 > index ..92e2153e8173 > --- /dev/null > +++ b/drivers/firmware/tegra/Makefile > @@ -0,0 +1 @@ > +obj-$(CONFIG_TEGRA_IVC)+= ivc.o > diff --git a/drivers/firmware/tegra/ivc.c b/drivers/firmware/tegra/ivc.c > new file mode 100644 > index ..3e736bb9915a > --- /dev/null > +++ b/drivers/firmware/tegra/ivc.c > @@ -0,0 +1,659 @@ > +/* > + * Copyright (c) 2014-2016, NVIDIA CORPORATION. All rights reserved. > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms and conditions of the GNU General Public License, > + * version 2, as published by the Free Software Foundation. > + * > + * This program is distributed in the hope it will be useful, but WITHOUT > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + * more details. > + */ > + > +#include ^ I'm sure it "works" since module.h includes nearly everything else, but that is less than ideal for exactly the same reason. 
Thanks, Paul. -- > + > +#include > + > +#define IVC_ALIGN 64 > +
Re: [PATCH v2 2/6] clk: mvebu: Add the xtal clock for Armada 3700 SoC
On Thu, Jul 7, 2016 at 6:37 PM, Gregory CLEMENT wrote: > This clock is the parent of all the Armada 3700 clocks. It is a fixed > rate clock which depends on the gpio configuration read when resetting > the SoC. > > Signed-off-by: Gregory CLEMENT > --- > drivers/clk/mvebu/Kconfig| 3 ++ > drivers/clk/mvebu/Makefile | 1 + > drivers/clk/mvebu/armada-37xx-xtal.c | 98 > > 3 files changed, 102 insertions(+) > create mode 100644 drivers/clk/mvebu/armada-37xx-xtal.c > > diff --git a/drivers/clk/mvebu/Kconfig b/drivers/clk/mvebu/Kconfig > index 3165da77d525..fddc8ac5faff 100644 > --- a/drivers/clk/mvebu/Kconfig > +++ b/drivers/clk/mvebu/Kconfig > @@ -24,6 +24,9 @@ config ARMADA_39X_CLK > bool > select MVEBU_CLK_COMMON > > +config ARMADA_37XX_CLK > + bool > + Since the driver is not tristate, can you please remove all modular references from it? With the author and license etc. at the top you can just delete the last three lines, the DEVICE_TABLE and register with builtin_platform_driver, and then no need for module.h either. Either that, or change it to a tristate, if that use case makes sense. Thanks, Paul. -- > config ARMADA_XP_CLK > bool > select MVEBU_CLK_COMMON > diff --git a/drivers/clk/mvebu/Makefile b/drivers/clk/mvebu/Makefile > index 7172ef65693d..4257a36d0219 100644 > --- a/drivers/clk/mvebu/Makefile > +++ b/drivers/clk/mvebu/Makefile > @@ -6,6 +6,7 @@ obj-$(CONFIG_ARMADA_370_CLK)+= armada-370.o > obj-$(CONFIG_ARMADA_375_CLK) += armada-375.o > obj-$(CONFIG_ARMADA_38X_CLK) += armada-38x.o > obj-$(CONFIG_ARMADA_39X_CLK) += armada-39x.o > +obj-$(CONFIG_ARMADA_37XX_CLK) += armada-37xx-xtal.o > obj-$(CONFIG_ARMADA_XP_ [...]
Re: [PATCH 0/9] mm: Hardened usercopy
On 9 Jul 2016 at 14:27, Andy Lutomirski wrote: > On Jul 6, 2016 6:25 PM, "Kees Cook" wrote: > > > > Hi, > > > > This is a start of the mainline port of PAX_USERCOPY[1]. After I started > > writing tests (now in lkdtm in -next) for Casey's earlier port[2], I > > kept tweaking things further and further until I ended up with a whole > > new patch series. To that end, I took Rik's feedback and made a number > > of other changes and clean-ups as well. > > > > I like the series, but I have one minor nit to pick. The effect of > this series is to harden usercopy, but most of the code is really > about infrastructure to validate that a pointed-to object is valid. actually USERCOPY has never been about validating pointers. its sole purpose is to validate the *size* argument of copy*user calls, a very specific form of runtime bounds checking. it's only really relevant for slab objects and the pointer checks (that one might mistake for being a part of the defense mechanism) are only there to determine whether the kernel pointer refers to a slab object or not (the stack part is a small bonus and was never the main goal either). > Might it make sense to call the infrastructure part something else? yes, more bikeshedding will surely help, like the renaming of .data..read_only to .data..ro_after_init which also had nothing to do with init but everything to do with objects being conceptually read-only... > After all, this could be extended in the future for memcpy or even for > some GCC plugin to check pointers passed to ordinary (non-allocator) > functions. what kind of checks are you thinking of here? and more fundamentally, against what kind of threats? as for memcpy, it's the standard mandated memory copying function, what security related properties can it check on its pointer arguments?
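The kind of size validation described in this reply can be pictured with a small userspace sketch. It is not the PAX_USERCOPY code and not the code from the patch series; struct object_bounds and copy_fits_object() are hypothetical names, and the sketch assumes the bounds of the containing object are already known.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one allocated object. */
struct object_bounds {
    uintptr_t start;  /* address of the first byte of the object */
    size_t size;      /* object size in bytes */
};

/*
 * Return true when a copy of n bytes starting at ptr stays inside the
 * object. The comparison is written so that ptr + n is never computed,
 * which avoids wrap-around for very large n.
 */
bool copy_fits_object(const struct object_bounds *obj, uintptr_t ptr, size_t n)
{
    size_t offset;

    if (ptr < obj->start)
        return false;

    offset = ptr - obj->start;
    if (offset > obj->size)
        return false;

    return n <= obj->size - offset;
}
```

The pointer is only used to locate the object; the decision itself is about whether the requested length fits, which matches the emphasis on the size argument above.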
Re: [PATCH 14/14] PCI: xgene: make it explicitly non-modular
[Re: [PATCH 14/14] PCI: xgene: make it explicitly non-modular] On 07/07/2016 (Thu 15:42) Duc Dang wrote: > On Thu, Jul 7, 2016 at 3:35 PM, Tanmay Inamdar wrote: > > > > > > On Sat, Jul 2, 2016 at 4:13 PM, Paul Gortmaker > > wrote: > >> > >> The Kconfig currently controlling compilation of this code is: > >> > >> drivers/pci/host/Kconfig:config PCI_XGENE > >> drivers/pci/host/Kconfig: bool "X-Gene PCIe controller" > >> > >> ...meaning that it currently is not being built as a module by anyone. > >> > >> Lets remove the few trace uses of modular code and macros, so that > >> when reading the driver there is no doubt it is builtin-only. > >> > >> Since module_platform_driver() uses the same init level priority as > >> builtin_platform_driver() the init ordering remains unchanged with > >> this commit. > >> > >> We also delete the MODULE_LICENSE tag etc. since all that information > >> is already contained at the top of the file in the comments. > >> > >> Cc: Tanmay Inamdar > >> Cc: Bjorn Helgaas > >> Cc: linux-...@vger.kernel.org > >> Signed-off-by: Paul Gortmaker > > Thanks for taking care of this, Paul. > > I tested your patch and it worked fine on my X-Gene Mustang board. > > One minor comment below. > > >> --- > >> drivers/pci/host/pci-xgene.c | 8 ++-- > >> 1 file changed, 2 insertions(+), 6 deletions(-) > >> > >> diff --git a/drivers/pci/host/pci-xgene.c b/drivers/pci/host/pci-xgene.c > >> index 7eb20cc76dd3..a81273c23341 100644 > >> --- a/drivers/pci/host/pci-xgene.c > >> +++ b/drivers/pci/host/pci-xgene.c > >> @@ -21,7 +21,7 @@ > >> #include > >> #include > >> #include > >> -#include > >> +#include > > The platform_device.h already has builtin_platform_driver macro > defined. So this init.h is not need? If you look, you will find that platform_device.h does not include the init.h even though it references __init; it can do this w/o error since all the references themselves are in a macro. 
However once code wants to be a consumer of those macros, they will need init.h present. Often you can overlook directly calling it out for inclusion since it gets sourced by another header, but it is best policy to list what gets used. Thanks for testing! Paul. -- > > >> #include > >> #include > >> #include > >> @@ -579,8 +579,4 @@ static struct platform_driver xgene_pcie_driver = { > >> }, > >> .probe = xgene_pcie_probe_bridge, > >> }; > >> -module_platform_driver(xgene_pcie_driver); > >> - > >> -MODULE_AUTHOR("Tanmay Inamdar "); > >> -MODULE_DESCRIPTION("APM X-Gene PCIe driver"); > >> -MODULE_LICENSE("GPL v2"); > >> +builtin_platform_driver(xgene_pcie_driver); > > > > > > Copying Duc. > >> > >> -- > >> 2.8.4 > >> > > > Regards, > Duc Dang.
Re: [PATCH 0/3] ARM: dts: the dts support for rk3288 firefly reload
Hi Randy, On Saturday, 9 July 2016, 23:42:28, ayaka wrote: > On 07/08/2016 05:35 AM, Heiko Stuebner wrote: > > On Thursday, 7 July 2016, 02:22:57, Randy Li wrote: > >> The rk3288 firefly reload is a Rockchip RK3288 based board be found by > >> core board and main board. The regulators are connected in a different > >> way to the previous version of firefly boards, it is necessary to > >> move some common code to uncommon place. > >> > >> I only tested the ethernet and confirmed that works. > >> The usb in this board won't caused by the bugs in the driver. > >> > >> This version follow the suggests from Heiko Stuebner, > >> except the duplicated supply name problem, I don't think > >> it could be fixed in that way. > > > > I've now had a chance to look at that reload board on the firefly site. > > Firefly also is the company name, so a board named that way is not > > necessarily a "variant" :-) . > > > > And looking at the "reload" board this definitely seems to be a very > > different product with it being a system-on-module+baseboard design with > > additional peripherals like that sata bridge, camera interfaces and > > probably > sata bridge is just a SATA to usb bridge and the "reload" bring back the > DVP camera interface and > a HDMI rx chip connected to the other MIPI camera interface. there are always more things to control (reset pins, regulators) and the usb subsystem is currently in the process of getting support for such "embedded" uses. > > more. > > > > As you might've seen, most Rockchip boards are based on some reference- > > design, so are similar in a big part of their core layout. > > Yes, from the evb. But even the main board of evb in rockchip > company have at least 3 versions > as far as I know. > Also the evb is found by power board, main board and core board. > > > So, looking at the vastly different product the reload is, I'd really > > like to have a separate dts for the reload, to not run into more > > confusing differences later on.
> > The main problem is that power connections are different. That is why I > decide to make a > separate dts. If the kernel introduce the override dts, I could have a > better way to implement > it. Just to make sure we're not talking about different things. This was meant to illustrate that even though core layouts often look similar we should not try to connect different product board files unnecessarily, as the small differences will make everything more complicated. The "reload" definitely is a completely different product that only shares the manufacturer (firefly) and the soc (rk3288) with the other product and as I wrote should get its own independent dts file. If anything you could do a split into a reload-core dtsi for the system-on-module part and a baseboard dts that includes that (something like what is done for rk3288-rock2). > > Also, when adding a new board, please also add an entry to > > Documentation/devicetree/bindings/arm/rockchip.txt > > I would send a patch set in a few days. > > > Thanks > > Heiko > > Thank you for your review and your patience again no problem, always nice to have more people play with Rockchip stuff on a mainline kernel :-) Heiko
[PATCH] drm/vc4: remove redundant ret status check
From: Colin Ian King At the current point where ret is being checked for non-zero it has not changed since it was initialized to zero, hence the check and the label unref are redundant and can be removed. Signed-off-by: Colin Ian King --- drivers/gpu/drm/vc4/vc4_drv.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/gpu/drm/vc4/vc4_drv.c b/drivers/gpu/drm/vc4/vc4_drv.c index 54d0471..0e4cf27 100644 --- a/drivers/gpu/drm/vc4/vc4_drv.c +++ b/drivers/gpu/drm/vc4/vc4_drv.c @@ -195,8 +195,6 @@ static int vc4_drm_bind(struct device *dev) vc4_bo_cache_init(drm); drm_mode_config_init(drm); - if (ret) - goto unref; vc4_gem_init(drm); @@ -218,7 +216,6 @@ unbind_all: component_unbind_all(dev, drm); gem_destroy: vc4_gem_destroy(drm); -unref: drm_dev_unref(drm); vc4_bo_cache_destroy(drm); return ret; -- 2.8.1
Re: [PATCH v2 0/6] net: ethernet: bgmac: Add platform device support
From: Jon Mason Date: Thu, 7 Jul 2016 19:08:52 -0400 > David Miller, Please consider including patches 1-5 in net-next Done.
Re: [PATCH] Need proper type casting before assignment, Remove compilation Warning.
From: Arvind Yadav Date: Fri, 8 Jul 2016 00:07:54 +0530 > -Return type of 'qe_muram_alloc' is 'unsigned long', which was being > assigned to ucc_fast_tx_virtual_fifo_base_offset and > ucc_fast_rx_virtual_fifo_base_offset. These variables are 'unsigned int'. > So a proper type cast is needed before the assignment. > > -Passing value in IS_ERR_VALUE() is wrong, as they pass an 'int' > into a function that takes an 'unsigned long' argument. This happens > to work because the type is sign-extended on 64-bit architectures > before it gets converted into an unsigned type. > > -Passing an 'unsigned short' or 'unsigned int' argument into > IS_ERR_VALUE() is guaranteed to be broken, as are 8-bit integers > and types that are wider than 'unsigned long'. > > -Any user that does not pass an 'unsigned long' argument will get > a compilation warning. > > Signed-off-by: Arvind Yadav Your subject line is improperly formed. It must have the subsystem or driver name, followed by a colon ":" and a space. Such as: [PATCH] ucc_geth: Need proper type ...
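The sign-extension point in the quoted changelog can be made concrete with a userspace model. This is a sketch only: is_err_value64() imitates an IS_ERR_VALUE()-style comparison performed in a 64-bit unsigned type, and all function names here are hypothetical rather than kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Model of the range check on a 64-bit architecture: error values are
 * the top MAX_ERRNO values of the 64-bit unsigned range. */
bool is_err_value64(uint64_t x)
{
    return x >= (uint64_t)-MAX_ERRNO;
}

/* A signed 32-bit value is sign-extended when widened, so a negative
 * errno still lands in the top of the 64-bit range. */
bool check_from_int(int32_t v)
{
    return is_err_value64((uint64_t)(int64_t)v);
}

/* An unsigned 32-bit value is zero-extended when widened, so the same
 * bit pattern ends up far below the error range and is missed. */
bool check_from_uint(uint32_t v)
{
    return is_err_value64((uint64_t)v);
}
```

With -12 stored in an int the check fires; with the same bit pattern stored in an unsigned int it does not, which is why storing the result of a function returning 'unsigned long' in an 'unsigned int' and then testing it is fragile.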
Re: [PATCH net-next 0/3] r8152: remove the redundant code
From: Hayes Wang Date: Thu, 7 Jul 2016 15:09:17 +0800 > Remove the unnecessary code. Series applied.
Minor PKRU bug?
is_prefetch in arch/x86/mm/fault.c can be called on a user address that's not readable due to PKRU. This could break it. You might need to add a get_user_exec or similar. --Andy
Re: [PATCH 0/9] mm: Hardened usercopy
On Jul 6, 2016 6:25 PM, "Kees Cook" wrote: > > Hi, > > This is a start of the mainline port of PAX_USERCOPY[1]. After I started > writing tests (now in lkdtm in -next) for Casey's earlier port[2], I > kept tweaking things further and further until I ended up with a whole > new patch series. To that end, I took Rik's feedback and made a number > of other changes and clean-ups as well. > I like the series, but I have one minor nit to pick. The effect of this series is to harden usercopy, but most of the code is really about infrastructure to validate that a pointed-to object is valid. Might it make sense to call the infrastructure part something else? After all, this could be extended in the future for memcpy or even for some GCC plugin to check pointers passed to ordinary (non-allocator) functions.
Re: linux-next: Tree for Jul 8
On Fri, Jul 08, 2016 at 06:03:38PM +1000, Stephen Rothwell wrote: > Hi all, > > Changes since 20160707: > > New trees: netfilter and netfilter-next > > The drm-msm tree gained a conflict against the arm tree. > > The block tree gained conflicts against Linus' and the btrfs-kdave trees. > > The userns tree gained a conflict against Linus' tree. > > Non-merge commits (relative to Linus' tree): 7460 > 6931 files changed, 350754 insertions(+), 147233 deletions(-) > Build results: total: 148 pass: 136 fail: 12 Failed builds: arc:defconfig arc:allnoconfig arc:tb10x_defconfig arc:axs103_defconfig arc:nsim_hs_smp_defconfig arc:vdk_hs38_smp_defconfig arm:allmodconfig arm64:allmodconfig hexagon:defconfig hexagon:allnoconfig mips:ath79_defconfig mips:malta_defconfig Qemu test results: total: 107 pass: 95 fail: 12 Failed tests: arm64:smp:defconfig arm64:nosmp:defconfig mips:malta_defconfig:nosmp mips:malta_defconfig:smp mips64:malta_defconfig:nosmp mips64:malta_defconfig:smp mipsel:malta_defconfig:nosmp mipsel:malta_defconfig:smp mipsel64:malta_defconfig:nosmp mipsel64:malta_defconfig:smp xtensa:dc233c:ml605:generic_kc705_defconfig xtensa:dc233c:kc705:generic_kc705_defconfig Details are available at http://kerneltests.org/builders. Thanks, Guenter
[PATCH v1] module: Fully remove the kernel_module_from_file hook
Fixes: a1db74209483 ("module: replace copy_module_from_fd with kernel version") Signed-off-by: Mickaël Salaün Cc: Mimi Zohar Cc: Kees Cook Cc: Luis R. Rodriguez Cc: Rusty Russell Cc: Linus Torvalds Cc: Greg Kroah-Hartman --- include/linux/lsm_hooks.h | 1 - include/linux/security.h | 1 - 2 files changed, 2 deletions(-) diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 7ae397669d8b..58c777ec8bcf 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -1455,7 +1455,6 @@ union security_list_options { int (*kernel_act_as)(struct cred *new, u32 secid); int (*kernel_create_files_as)(struct cred *new, struct inode *inode); int (*kernel_module_request)(char *kmod_name); - int (*kernel_module_from_file)(struct file *file); int (*kernel_read_file)(struct file *file, enum kernel_read_file_id id); int (*kernel_post_read_file)(struct file *file, char *buf, loff_t size, enum kernel_read_file_id id); diff --git a/include/linux/security.h b/include/linux/security.h index 14df373ff2ca..2b8c7d2a3fd8 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -307,7 +307,6 @@ void security_transfer_creds(struct cred *new, const struct cred *old); int security_kernel_act_as(struct cred *new, u32 secid); int security_kernel_create_files_as(struct cred *new, struct inode *inode); int security_kernel_module_request(char *kmod_name); -int security_kernel_module_from_file(struct file *file); int security_kernel_read_file(struct file *file, enum kernel_read_file_id id); int security_kernel_post_read_file(struct file *file, char *buf, loff_t size, enum kernel_read_file_id id); -- 2.8.1
Re: [Ksummit-discuss] 2016 Kernel Summit Planning Kickoff
On Fri, Jul 08, 2016 at 03:06:07PM -0700, Dmitry Torokhov wrote: > > Last year we had the invite-only session on the 3rd day and what I heard > > from some people that was considered better. People had a chance to > > already solve several things upfront and the invite-only day had less > > issues to discuss. Not sure if this can be changed and if the majority > > of people agree with that conclusion. > > I think that only worked because Korea Linux Forum preceded KS so we > had shared talks first. We'd have to swap KS and Plumbers and I'd > guess it is too late now. The Korea Linux Forum is a much shorter conference, so holding it afterwards probably worked better. With conferences that are longer and/or more intense, we've gotten complaints from Kernel Summit attendees that by the time the invite-only day happened at the tail-end of the week, people were pretty brain-fried. This would have been especially true with the Linux Plumbers Conference, which throws a very nice party at the very end of the conference --- with an open bar, no less. (What this might mean if we tried to hold the Kernel Summit invite-only day afterwards is left to the imagination of the gentle reader. :-) People will be very much encouraged to stay for all of the Plumbers Conference, and not just because of the party at the end of the week. There's no rule that says we have to make all of our decisions on the invite-only day. In fact, it may be good for decisions to be discussed with the wider LPC community before we make a final decision. That's why we'll have spare slots in reserve for people to schedule topic-specific discussions on Wednesday and Thursday. Cheers, - Ted
[PATCH] spi: spi-ti-qspi: clear wlen field while setting word length.
When a word length of 1 byte is selected and the data to write is longer than QSPI_WLEN_MAX_BYTES, the first QSPI_WLEN_MAX_BYTES bytes are transferred and the remainder is transferred byte by byte. In that case the wlen field must be cleared before it is set. Signed-off-by: Prahlad V --- drivers/spi/spi-ti-qspi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/spi/spi-ti-qspi.c b/drivers/spi/spi-ti-qspi.c index 29ea8d2..6c61f54 100644 --- a/drivers/spi/spi-ti-qspi.c +++ b/drivers/spi/spi-ti-qspi.c @@ -276,9 +276,9 @@ static int qspi_write_msg(struct ti_qspi *qspi, struct spi_transfer *t, cmd |= QSPI_WLEN(QSPI_WLEN_MAX_BITS); } else { writeb(*txbuf, qspi->base + QSPI_SPI_DATA_REG); - cmd = qspi->cmd | QSPI_WR_SNGL; xfer_len = wlen; - cmd |= QSPI_WLEN(wlen); + cmd = ((qspi->cmd & ~QSPI_WLEN_MASK) | +QSPI_WLEN(wlen)); } break; case 2: -- 2.5.5
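Why the field has to be cleared before a new length is written can be shown with a tiny model. The layout below (a 7-bit field at bit 0 holding the word length in bits minus one) is an assumption made for the example, and WLEN_MASK, WLEN(), set_wlen_or_only() and set_wlen_cleared() are hypothetical names, not the driver's definitions.

```c
#include <stdint.h>

/* Assumed layout for illustration: 7-bit field at bit 0, value = bits - 1. */
#define WLEN_MASK  0x7fu
#define WLEN(n)    (((uint32_t)(n) - 1u) & WLEN_MASK)

/* OR-ing the new length into the command leaves the old bits in place. */
uint32_t set_wlen_or_only(uint32_t cmd, unsigned int bits)
{
    return cmd | WLEN(bits);
}

/* Clearing the field first gives the intended value and keeps the
 * bits outside the field untouched. */
uint32_t set_wlen_cleared(uint32_t cmd, unsigned int bits)
{
    return (cmd & ~WLEN_MASK) | WLEN(bits);
}
```

If the command word still carries a 128-bit length (field value 0x7f) and an 8-bit length is OR-ed in, the field stays at 0x7f; with the mask applied first it becomes 0x07.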
Re: [CRIU] Introspecting userns relationships to other namespaces?
ebied...@xmission.com (Eric W. Biederman) writes: > Andrew Vagin writes: > >> All these thoughts about security make me think that kcmp is what we >> should use here. It's maybe something like this: >> >> kcmp(pid1, pid2, KCMP_NS_USERNS, fd1, fd2) >> >> - to check if userns of the fd1 namespace is equal to the fd2 userns >> >> kcmp(pid1, pid2, KCMP_NS_PARENT, fd1, fd2) >> >> - to check if a parent namespace of the fd1 pidns is equal to fd pidns. >> >> fd1 and fd2 are file descriptors to namespace files. >> >> So if we want to build a hierarchy, we need to collect all namespaces >> and then enumerate them to check dependencies with help of kcmp. > > That is certainly one way to go. > > There is a funny case where we would want to compare a user namespace > file descriptor to a parent user namespace file descriptor. > > > Grumble, Grumble. I think this may actually be a case for creating ioctls > for these two cases. Now that random nsfs file descriptors are bind > mountable the original reason for using proc files is not as pressing. > > One ioctl for the user namespace that owns a file descriptor. > One ioctl for the parent namespace of a namespace file descriptor. > > We also need some way to get a command file descriptor for a file system > super block. Al Viro has a pet project for cleaning up the mount API > and this might be the ideal excuse to start looking at that. > > (In principle we might be able to run commands through the namespace > file descriptor and using an ioctl feels dirty. But an ioctl that > only uses the fd and request argument does not suffer from the same > problems that ioctls that have to pass additional arguments suffer > from.) Of course it should be an error, perhaps -EINVAL, to get a user namespace owner or parent namespace that is outside of a process's current user namespace or pid namespace. That way things stay bounded within the current namespaces the process is in, which prevents any leak possibilities and keeps CRIU working. Eric
Re: [CRIU] Introspecting userns relationships to other namespaces?
Andrew Vagin writes: > All these thoughts about security make me think that kcmp is what we > should use here. It's maybe something like this: > > kcmp(pid1, pid2, KCMP_NS_USERNS, fd1, fd2) > > - to check if userns of the fd1 namespace is equal to the fd2 userns > > kcmp(pid1, pid2, KCMP_NS_PARENT, fd1, fd2) > > - to check if a parent namespace of the fd1 pidns is equal to fd pidns. > > fd1 and fd2 are file descriptors to namespace files. > > So if we want to build a hierarchy, we need to collect all namespaces > and then enumerate them to check dependencies with help of kcmp. That is certainly one way to go. There is a funny case where we would want to compare a user namespace file descriptor to a parent user namespace file descriptor. Grumble, Grumble. I think this may actually be a case for creating ioctls for these two cases. Now that random nsfs file descriptors are bind mountable the original reason for using proc files is not as pressing. One ioctl for the user namespace that owns a file descriptor. One ioctl for the parent namespace of a namespace file descriptor. We also need some way to get a command file descriptor for a file system super block. Al Viro has a pet project for cleaning up the mount API and this might be the ideal excuse to start looking at that. (In principle we might be able to run commands through the namespace file descriptor and using an ioctl feels dirty. But an ioctl that only uses the fd and request argument does not suffer from the same problems that ioctls that have to pass additional arguments suffer from.) Eric
Re: [v1] ErrHandling:Make IS_ERR_VALUE_U32 as generic API to avoid IS_ERR_VALUE abuses.
Hi, [auto build test WARNING on v4.7-rc6] [also build test WARNING on next-20160708] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Arvind-Yadav/ErrHandling-Make-IS_ERR_VALUE_U32-as-generic-API-to-avoid-IS_ERR_VALUE-abuses/20160709-235356 config: x86_64-randconfig-x007-201628 (attached as .config) compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All warnings (new ones prefixed by >>): In file included from include/uapi/linux/stddef.h:1:0, from include/linux/stddef.h:4, from include/uapi/linux/posix_types.h:4, from include/uapi/linux/types.h:13, from include/linux/types.h:5, from include/linux/mod_devicetable.h:11, from include/linux/pci.h:20, from include/linux/bcma/bcma.h:4, from drivers/bcma/bcma_private.h:8, from drivers/bcma/scan.c:9: drivers/bcma/scan.c: In function 'bcma_get_next_core': include/linux/err.h:23:52: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^ include/linux/compiler.h:151:30: note: in definition of macro '__trace_if' if (__builtin_constant_p(!!(cond)) ? 
!!(cond) : \ ^~~~ >> drivers/bcma/scan.c:361:2: note: in expansion of macro 'if' if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) { ^~ include/linux/err.h:23:29: note: in expansion of macro 'unlikely' #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^~~~ drivers/bcma/scan.c:361:18: note: in expansion of macro 'IS_ERR_VALUE_U32' if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) { ^~~~ include/linux/err.h:23:38: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^ include/linux/compiler.h:151:30: note: in definition of macro '__trace_if' if (__builtin_constant_p(!!(cond)) ? !!(cond) : \ ^~~~ >> drivers/bcma/scan.c:361:2: note: in expansion of macro 'if' if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) { ^~ include/linux/err.h:23:29: note: in expansion of macro 'unlikely' #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^~~~ drivers/bcma/scan.c:361:18: note: in expansion of macro 'IS_ERR_VALUE_U32' if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) { ^~~~ include/linux/err.h:23:52: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^ include/linux/compiler.h:151:30: note: in definition of macro '__trace_if' if (__builtin_constant_p(!!(cond)) ? 
!!(cond) : \ ^~~~ >> drivers/bcma/scan.c:361:2: note: in expansion of macro 'if' if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) { ^~ include/linux/err.h:23:29: note: in expansion of macro 'unlikely' #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^~~~ drivers/bcma/scan.c:361:18: note: in expansion of macro 'IS_ERR_VALUE_U32' if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) { ^~~~ include/linux/err.h:23:38: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^ include/linux/compiler.h:151:30: note: in definition of macro '__trace_if' if (__builtin_constant_p(!!(cond)) ? !!(cond) : \ ^~~~ >> drivers/bcma/scan.c:361:2: note: in expansion of macro 'if' if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) { ^~ include/linux/err.h:23:29: note: in expansion of macro 'unlikely' #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^~~~ drivers/bcma/scan.c:361:18: note: in expansion of macro 'IS_ERR_VALUE_U32'
Re: [tip:x86/debug] printk: Make the printk*once() variants return a value
On Sat, 2016-07-09 at 09:50 +0200, Borislav Petkov wrote: > On Fri, Jul 08, 2016 at 07:40:48PM -0700, Joe Perches wrote: > > This change isn't described in the commit message and there > > doesn't seem to be a need to change this. > How do *you* know? Did *you* actually sit down and build a kernel with > your proposed change before sending a reply? > I'm pretty sure you didn't. defconfigs both with and without CONFIG_PRINTK build properly with the proposed change to this specific patch. > Well, there is a very good reason why I made that change but I'm not > going to tell you. Borislav, your delightful personality always impresses. Never change. If there is a specific reason you know why this 0; value must be added to a do {} while (0) to statement expression macro conversion, it'd be good to write that in the commit message. It'd also be good to remove the useless "do {} while (0);" surrounding a single statement.
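For readers following the do {} while (0) versus statement-expression point: a do {} while (0) block is a statement and has no value, whereas a GNU statement expression evaluates to its last expression, which is why such a macro has to end in a value like "0;" or a result variable. The following is only a user-space sketch of that idea, not the kernel's printk_once(); print_once() and warn_once() are hypothetical names, and it relies on the GCC/Clang statement-expression and ##__VA_ARGS__ extensions.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical user-space stand-in for a printk_once()-style macro.
 * The ({ ... }) block is a GNU statement expression: its value is the
 * value of the last expression inside it, here ret_.
 */
#define print_once(fmt, ...)						\
({									\
	static bool done_;	/* one flag per call site */		\
	bool ret_ = !done_;	/* true only on the first pass */	\
									\
	if (ret_) {							\
		done_ = true;						\
		printf(fmt, ##__VA_ARGS__);				\
	}								\
	ret_;			/* value of the whole macro */		\
})

/* Returns true the first time it is called, false afterwards. */
bool warn_once(void)
{
	return print_once("feature %s is deprecated\n", "foo");
}
```

A caller can then write "if (warn_once()) ..." to run extra one-time work only when the message was actually emitted.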
[tip:x86/urgent] x86/cpu: Fix duplicated X86_BUG(9) macro
Commit-ID: 8709ed4d4b0eab04561c1ec9e6ea50fd1e3897ff Gitweb: http://git.kernel.org/tip/8709ed4d4b0eab04561c1ec9e6ea50fd1e3897ff Author: Dave Hansen AuthorDate: Fri, 17 Jun 2016 17:15:03 -0700 Committer: Ingo Molnar CommitDate: Sat, 9 Jul 2016 14:06:06 +0200 x86/cpu: Fix duplicated X86_BUG(9) macro cpufeatures.h currently defines X86_BUG(9) twice on 32-bit: #define X86_BUG_NULL_SEG X86_BUG(9) /* Nulling a selector preserves the base */ ... #ifdef CONFIG_X86_32 #define X86_BUG_ESPFIX X86_BUG(9) /* "" IRET to 16-bit SS corrupts ESP/RSP high bits */ #endif I think what happened was that this added the X86_BUG_ESPFIX, but in an #ifdef below most of the bugs: 58a5aac53313 x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled Then this came along and added X86_BUG_NULL_SEG, but collided with the earlier one that did the bug below the main block defining all the X86_BUG()s. 7a5d67048745 x86/cpu: Probe the behavior of nulling out a segment at boot time Signed-off-by: Dave Hansen Acked-by: Andy Lutomirski Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: sta...@vger.kernel.org Link: http://lkml.kernel.org/r/20160618001503.cee1b...@viggo.jf.intel.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/cpufeatures.h | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 4a41348..c64b1e9 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -301,10 +301,6 @@ #define X86_BUG_FXSAVE_LEAK X86_BUG(6) /* FXSAVE leaks FOP/FIP/FOP */ #define X86_BUG_CLFLUSH_MONITOR X86_BUG(7) /* AAI65, CLFLUSH required before MONITOR */ #define X86_BUG_SYSRET_SS_ATTRS X86_BUG(8) /* SYSRET doesn't fix up SS attrs */ -#define X86_BUG_NULL_SEG X86_BUG(9) /* Nulling a selector preserves the base */ -#define X86_BUG_SWAPGS_FENCE X86_BUG(10) /* SWAPGS without input dep on GS */ - - #ifdef CONFIG_X86_32 /* * 64-bit kernels don't use X86_BUG_ESPFIX. Make the define conditional @@ -312,5 +308,7 @@ */ #define X86_BUG_ESPFIX X86_BUG(9) /* "" IRET to 16-bit SS corrupts ESP/RSP high bits */ #endif +#define X86_BUG_NULL_SEG X86_BUG(10) /* Nulling a selector preserves the base */ +#define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */ #endif /* _ASM_X86_CPUFEATURES_H */
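Why the duplicate index matters can be shown with a small stand-alone sketch. It is an illustration under simplifying assumptions, not the kernel's implementation: DEMO_BUG(), set_bug() and has_bug() are hypothetical, and the bug flags are modelled as bits of a single 32-bit mask rather than the kernel's capability words.

```c
#include <stdint.h>

/* Hypothetical, simplified numbering: the index is used directly as a
 * bit position in one 32-bit mask. */
#define DEMO_BUG(x)		(x)

/* Before the fix: two different names share index 9 on 32-bit. */
#define OLD_BUG_NULL_SEG	DEMO_BUG(9)
#define OLD_BUG_ESPFIX		DEMO_BUG(9)

/* After the fix: every name has its own index. */
#define NEW_BUG_ESPFIX		DEMO_BUG(9)
#define NEW_BUG_NULL_SEG	DEMO_BUG(10)
#define NEW_BUG_SWAPGS_FENCE	DEMO_BUG(11)

/* Mark a bug as present. */
void set_bug(uint32_t *mask, unsigned int bit)
{
	*mask |= UINT32_C(1) << bit;
}

/* Returns 1 if the bug is marked present, 0 otherwise. */
int has_bug(uint32_t mask, unsigned int bit)
{
	return (int)((mask >> bit) & 1u);
}
```

With the old numbering, setting OLD_BUG_NULL_SEG makes has_bug() report OLD_BUG_ESPFIX as well; with the new numbering the two flags are independent.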
[tip:x86/platform] x86/platform/intel-mid: Rename mrfl.c to mrfld.c
Commit-ID: 62d855d3e725f4e4b0d2786f7cad3f0660a03a59 Gitweb: http://git.kernel.org/tip/62d855d3e725f4e4b0d2786f7cad3f0660a03a59 Author: Andy Shevchenko AuthorDate: Sat, 18 Jun 2016 18:51:34 +0300 Committer: Ingo Molnar CommitDate: Sat, 9 Jul 2016 14:02:09 +0200 x86/platform/intel-mid: Rename mrfl.c to mrfld.c Use mrfld as an abbreviation of Merrifield to be consistent with the rest of the code. In the future we are going to add more files here prefixed with 'mrfld'. Signed-off-by: Andy Shevchenko Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1466265094-146113-1-git-send-email-andriy.shevche...@linux.intel.com Signed-off-by: Ingo Molnar --- arch/x86/platform/intel-mid/Makefile| 2 +- arch/x86/platform/intel-mid/{mrfl.c => mrfld.c} | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/platform/intel-mid/Makefile b/arch/x86/platform/intel-mid/Makefile index aebb5b9..fa021df 100644 --- a/arch/x86/platform/intel-mid/Makefile +++ b/arch/x86/platform/intel-mid/Makefile @@ -1,4 +1,4 @@ -obj-$(CONFIG_X86_INTEL_MID) += intel-mid.o intel_mid_vrtc.o mfld.o mrfl.o pwr.o +obj-$(CONFIG_X86_INTEL_MID) += intel-mid.o intel_mid_vrtc.o mfld.o mrfld.o pwr.o # SFI specific code ifdef CONFIG_X86_INTEL_MID diff --git a/arch/x86/platform/intel-mid/mrfl.c b/arch/x86/platform/intel-mid/mrfld.c similarity index 97% rename from arch/x86/platform/intel-mid/mrfl.c rename to arch/x86/platform/intel-mid/mrfld.c index bd1adc6..59253db 100644 --- a/arch/x86/platform/intel-mid/mrfl.c +++ b/arch/x86/platform/intel-mid/mrfld.c @@ -1,5 +1,5 @@ /* - * mrfl.c: Intel Merrifield platform specific setup code + * Intel Merrifield platform specific setup code * * (C) Copyright 2013 Intel Corporation *
[tip:sched/core] sched/cpuacct: Merge cpuacct_usage_index and cpuacct_stat_index enums
Commit-ID: 9acacc2ac525ef1397af63b15cef7bb77a823c06 Gitweb: http://git.kernel.org/tip/9acacc2ac525ef1397af63b15cef7bb77a823c06 Author: Zhao Lei AuthorDate: Mon, 20 Jun 2016 17:37:18 +0800 Committer: Ingo Molnar CommitDate: Sat, 9 Jul 2016 13:56:15 +0200 sched/cpuacct: Merge cpuacct_usage_index and cpuacct_stat_index enums These two types have similar function, no need to separate them. Signed-off-by: Zhao Lei Cc: KOSAKI Motohiro Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/436748885270d64363c7dc67167507d486c2057a.1466415271.git.zhao...@cn.fujitsu.com Signed-off-by: Ingo Molnar --- kernel/sched/cpuacct.c | 47 --- 1 file changed, 20 insertions(+), 27 deletions(-) diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 41f85c4..74241eb 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -25,15 +25,13 @@ enum cpuacct_stat_index { CPUACCT_STAT_NSTATS, }; -enum cpuacct_usage_index { - CPUACCT_USAGE_USER, /* ... user mode */ - CPUACCT_USAGE_SYSTEM, /* ... kernel mode */ - - CPUACCT_USAGE_NRUSAGE, +static const char * const cpuacct_stat_desc[] = { + [CPUACCT_STAT_USER] = "user", + [CPUACCT_STAT_SYSTEM] = "system", }; struct cpuacct_usage { - u64 usages[CPUACCT_USAGE_NRUSAGE]; + u64 usages[CPUACCT_STAT_NSTATS]; }; /* track cpu usage of a group of tasks and its child groups */ @@ -108,16 +106,16 @@ static void cpuacct_css_free(struct cgroup_subsys_state *css) } static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu, -enum cpuacct_usage_index index) +enum cpuacct_stat_index index) { struct cpuacct_usage *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); u64 data; /* -* We allow index == CPUACCT_USAGE_NRUSAGE here to read +* We allow index == CPUACCT_STAT_NSTATS here to read * the sum of suages. 
*/ - BUG_ON(index > CPUACCT_USAGE_NRUSAGE); + BUG_ON(index > CPUACCT_STAT_NSTATS); #ifndef CONFIG_64BIT /* @@ -126,11 +124,11 @@ static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu, raw_spin_lock_irq(&cpu_rq(cpu)->lock); #endif - if (index == CPUACCT_USAGE_NRUSAGE) { + if (index == CPUACCT_STAT_NSTATS) { int i = 0; data = 0; - for (i = 0; i < CPUACCT_USAGE_NRUSAGE; i++) + for (i = 0; i < CPUACCT_STAT_NSTATS; i++) data += cpuusage->usages[i]; } else { data = cpuusage->usages[index]; @@ -155,7 +153,7 @@ static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val) raw_spin_lock_irq(&cpu_rq(cpu)->lock); #endif - for (i = 0; i < CPUACCT_USAGE_NRUSAGE; i++) + for (i = 0; i < CPUACCT_STAT_NSTATS; i++) cpuusage->usages[i] = val; #ifndef CONFIG_64BIT @@ -165,7 +163,7 @@ static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val) /* return total cpu usage (in nanoseconds) of a group */ static u64 __cpuusage_read(struct cgroup_subsys_state *css, - enum cpuacct_usage_index index) + enum cpuacct_stat_index index) { struct cpuacct *ca = css_ca(css); u64 totalcpuusage = 0; @@ -180,18 +178,18 @@ static u64 __cpuusage_read(struct cgroup_subsys_state *css, static u64 cpuusage_user_read(struct cgroup_subsys_state *css, struct cftype *cft) { - return __cpuusage_read(css, CPUACCT_USAGE_USER); + return __cpuusage_read(css, CPUACCT_STAT_USER); } static u64 cpuusage_sys_read(struct cgroup_subsys_state *css, struct cftype *cft) { - return __cpuusage_read(css, CPUACCT_USAGE_SYSTEM); + return __cpuusage_read(css, CPUACCT_STAT_SYSTEM); } static u64 cpuusage_read(struct cgroup_subsys_state *css, struct cftype *cft) { - return __cpuusage_read(css, CPUACCT_USAGE_NRUSAGE); + return __cpuusage_read(css, CPUACCT_STAT_NSTATS); } static int cpuusage_write(struct cgroup_subsys_state *css, struct cftype *cft, @@ -213,7 +211,7 @@ static int cpuusage_write(struct cgroup_subsys_state *css, struct cftype *cft, } static int __cpuacct_percpu_seq_show(struct seq_file *m, 
-enum cpuacct_usage_index index) +enum cpuacct_stat_index index) { struct cpuacct *ca = css_ca(seq_css(m)); u64 percpu; @@ -229,24 +227,19 @@ static int __cpuacct_percpu_seq_show(struct seq_file *m, static int cpuacct_percpu_user_seq_show(struct seq_file *m, void *V) { - return __cpuacct_percpu_seq_show(m, CPUACCT_USAGE_USER); + return __cpuacct_percpu_seq_show(m, CPUACCT_STAT_USER); } static int cpuacct_percpu_sys_seq_show(struct seq_file *m, v
[tip:sched/core] sched/cpuacct: Use loop to consolidate code in cpuacct_stats_show()
Commit-ID: 8e546bfafb3121ed25c73a0c02311ec58459344a Gitweb: http://git.kernel.org/tip/8e546bfafb3121ed25c73a0c02311ec58459344a Author: Zhao Lei AuthorDate: Mon, 20 Jun 2016 17:37:19 +0800 Committer: Ingo Molnar CommitDate: Sat, 9 Jul 2016 13:56:15 +0200 sched/cpuacct: Use loop to consolidate code in cpuacct_stats_show() In cpuacct_stats_show() we currently have copies of similar code for each cpustat (system/user) variant. Use a loop instead to consolidate the code. This will also work better if we extend the CPUACCT_STAT_NSTATS type. Signed-off-by: Zhao Lei Cc: KOSAKI Motohiro Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/b0597d4224655e9f333f1a6224ed9654c7d7d36a.1466415271.git.zhao...@cn.fujitsu.com Signed-off-by: Ingo Molnar --- kernel/sched/cpuacct.c | 29 ++--- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 74241eb..677cd1a 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -243,27 +243,26 @@ static int cpuacct_percpu_seq_show(struct seq_file *m, void *V) static int cpuacct_stats_show(struct seq_file *sf, void *v) { struct cpuacct *ca = css_ca(seq_css(sf)); + s64 val[CPUACCT_STAT_NSTATS]; int cpu; - s64 val = 0; + int stat; + memset(val, 0, sizeof(val)); for_each_possible_cpu(cpu) { - struct kernel_cpustat *kcpustat = per_cpu_ptr(ca->cpustat, cpu); - val += kcpustat->cpustat[CPUTIME_USER]; - val += kcpustat->cpustat[CPUTIME_NICE]; - } - val = cputime64_to_clock_t(val); - seq_printf(sf, "%s %lld\n", cpuacct_stat_desc[CPUACCT_STAT_USER], val); + u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat; - val = 0; - for_each_possible_cpu(cpu) { - struct kernel_cpustat *kcpustat = per_cpu_ptr(ca->cpustat, cpu); - val += kcpustat->cpustat[CPUTIME_SYSTEM]; - val += kcpustat->cpustat[CPUTIME_IRQ]; - val += kcpustat->cpustat[CPUTIME_SOFTIRQ]; + val[CPUACCT_STAT_USER] += cpustat[CPUTIME_USER]; + val[CPUACCT_STAT_USER] += cpustat[CPUTIME_NICE]; 
+ val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SYSTEM]; + val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_IRQ]; + val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SOFTIRQ]; } - val = cputime64_to_clock_t(val); - seq_printf(sf, "%s %lld\n", cpuacct_stat_desc[CPUACCT_STAT_SYSTEM], val); + for (stat = 0; stat < CPUACCT_STAT_NSTATS; stat++) { + seq_printf(sf, "%s %lld\n", + cpuacct_stat_desc[stat], + cputime64_to_clock_t(val[stat])); + } return 0; }
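The consolidation in the patch above, one pass over the CPUs that fills an array indexed by stat type, can be sketched in plain user-space C. This is only an illustration of the pattern: the enum names, demo_sum_stats() and the flat per-CPU table are hypothetical stand-ins for the kernel's per-cpu data, and the clock_t conversion is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's CPUTIME_* and CPUACCT_STAT_* enums. */
enum demo_cputime { T_USER, T_NICE, T_SYSTEM, T_IRQ, T_SOFTIRQ, T_NR };
enum demo_stat { STAT_USER, STAT_SYSTEM, STAT_NSTATS };

/*
 * Sum the raw per-CPU counters into one total per stat type.
 * percpu holds one row of T_NR counters for each of the ncpus CPUs.
 */
void demo_sum_stats(const uint64_t (*percpu)[T_NR], size_t ncpus,
		    uint64_t val[STAT_NSTATS])
{
	size_t cpu;
	int stat;

	for (stat = 0; stat < STAT_NSTATS; stat++)
		val[stat] = 0;

	/* A single walk over the CPUs updates every stat type. */
	for (cpu = 0; cpu < ncpus; cpu++) {
		const uint64_t *c = percpu[cpu];

		val[STAT_USER] += c[T_USER] + c[T_NICE];
		val[STAT_SYSTEM] += c[T_SYSTEM] + c[T_IRQ] + c[T_SOFTIRQ];
	}
}
```

Adding another stat type then means one more enum entry and one more accumulation line, with no second loop over the CPUs.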
[tip:sched/core] sched/cpuacct: Introduce cpuacct.usage_all to show all CPU stats together
Commit-ID: 277a13e4f0d661678a7084bf97ed96a99c7dac21 Gitweb: http://git.kernel.org/tip/277a13e4f0d661678a7084bf97ed96a99c7dac21 Author: Zhao Lei AuthorDate: Mon, 20 Jun 2016 17:37:20 +0800 Committer: Ingo Molnar CommitDate: Sat, 9 Jul 2016 13:56:15 +0200 sched/cpuacct: Introduce cpuacct.usage_all to show all CPU stats together In current code, we can get cpuacct data from several files, but each file has various limitations. For example: - We can get CPU usage in user and kernel mode via cpuacct.stat, but we can't get detailed data about each CPU. - We can get each CPU's kernel mode usage in cpuacct.usage_percpu_sys, but we can't get user mode usage data at the same time. This patch introduces cpuacct.usage_all, to show all detailed CPU accounting data together: # cat cpuacct.usage_all cpu user system 0 3809760299 5807968992 1 3250329855 454612211 .. Signed-off-by: Zhao Lei Cc: KOSAKI Motohiro Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/7744460969edd7caaf0e903592ee52353ed9bdd6.1466415271.git.zhao...@cn.fujitsu.com Signed-off-by: Ingo Molnar --- kernel/sched/cpuacct.c | 40 1 file changed, 40 insertions(+) diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 677cd1a..bc0b309c 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -240,6 +240,42 @@ static int cpuacct_percpu_seq_show(struct seq_file *m, void *V) return __cpuacct_percpu_seq_show(m, CPUACCT_STAT_NSTATS); } +static int cpuacct_all_seq_show(struct seq_file *m, void *V) +{ + struct cpuacct *ca = css_ca(seq_css(m)); + int index; + int cpu; + + seq_puts(m, "cpu"); + for (index = 0; index < CPUACCT_STAT_NSTATS; index++) + seq_printf(m, " %s", cpuacct_stat_desc[index]); + seq_puts(m, "\n"); + + for_each_possible_cpu(cpu) { + struct cpuacct_usage *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); + + seq_printf(m, "%d", cpu); + + for (index = 0; index < CPUACCT_STAT_NSTATS; index++) { +#ifndef CONFIG_64BIT + /* +* Take rq->lock to make 64-bit read 
safe on 32-bit +* platforms. +*/ + raw_spin_lock_irq(&cpu_rq(cpu)->lock); +#endif + + seq_printf(m, " %llu", cpuusage->usages[index]); + +#ifndef CONFIG_64BIT + raw_spin_unlock_irq(&cpu_rq(cpu)->lock); +#endif + } + seq_puts(m, "\n"); + } + return 0; +} + static int cpuacct_stats_show(struct seq_file *sf, void *v) { struct cpuacct *ca = css_ca(seq_css(sf)); @@ -294,6 +330,10 @@ static struct cftype files[] = { .seq_show = cpuacct_percpu_sys_seq_show, }, { + .name = "usage_all", + .seq_show = cpuacct_all_seq_show, + }, + { .name = "stat", .seq_show = cpuacct_stats_show, },
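The layout of cpuacct.usage_all shown in the commit message (a header row, then one row per CPU with one column per stat) can be reproduced in user space. This is a sketch under assumptions: demo_format_usage_all() and its buffer-based interface are hypothetical, there is no locking, and only the column order "user system" is taken from the message.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_NSTATS 2

/* Column names, in the order shown in the commit message. */
static const char * const demo_stat_desc[DEMO_NSTATS] = { "user", "system" };

/*
 * Format a header line plus one line per CPU into buf.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int demo_format_usage_all(char *buf, size_t len,
			  const uint64_t (*usages)[DEMO_NSTATS], int ncpus)
{
	size_t off = 0;
	int cpu, i, n;

/* Append formatted text to buf, bailing out if it does not fit. */
#define EMIT(...)							\
	do {								\
		n = snprintf(buf + off, len - off, __VA_ARGS__);	\
		if (n < 0 || (size_t)n >= len - off)			\
			return -1;					\
		off += (size_t)n;					\
	} while (0)

	EMIT("cpu");
	for (i = 0; i < DEMO_NSTATS; i++)
		EMIT(" %s", demo_stat_desc[i]);
	EMIT("\n");

	for (cpu = 0; cpu < ncpus; cpu++) {
		EMIT("%d", cpu);
		for (i = 0; i < DEMO_NSTATS; i++)
			EMIT(" %" PRIu64, usages[cpu][i]);
		EMIT("\n");
	}
#undef EMIT
	return (int)off;
}
```

Fed the two sample rows from the commit message, this produces the same three lines of text as the example output there.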
Missing include file in include/uapi/linux/errqueue.h?
Hello! I've been attempting to qualify the Linux 4.5.2 user-space headers for a toolchain release, and ran into what looks like a missing include file in include/uapi/linux/errqueue.h. In particular, https://github.com/torvalds/linux/commit/f24b9be5957b38bb420b838115040dc2031b7d0c adds the following to this file: +struct scm_timestamping { + struct timespec ts[3]; +}; However, struct timespec is defined in time.h, which isn't included either in 4.5.2 or in current head. Is this simply a missing #include line, or am I misunderstanding something? I also note that this is the second user-space header in the Linux 4.5.2 release we've run into that simply fails to compile when included by itself. Is there not a test target that tests for this? Would it be welcome if I were to work on adding one? Thanks, - Brooks
[PATCH] mm: gup: Re-define follow_page_mask output parameter page_mask usage
From: Chen Gang For a pure output parameter: - When callee fails, the caller should not assume the output parameter is still valid. - And callee should not assume the pure output parameter must be provided by caller -- caller has right to pass NULL when caller does not care about it. Signed-off-by: Chen Gang --- include/linux/mm.h | 5 ++--- mm/gup.c | 6 +++--- mm/mlock.c | 2 +- mm/nommu.c | 1 - 4 files changed, 6 insertions(+), 8 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index b21e5f3..5c560fd 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2205,10 +2205,9 @@ struct page *follow_page_mask(struct vm_area_struct *vma, unsigned int *page_mask); static inline struct page *follow_page(struct vm_area_struct *vma, - unsigned long address, unsigned int foll_flags) + unsigned long address, unsigned int foll_flags) { - unsigned int unused_page_mask; - return follow_page_mask(vma, address, foll_flags, &unused_page_mask); + return follow_page_mask(vma, address, foll_flags, NULL); } #define FOLL_WRITE 0x01/* check pte is writable */ diff --git a/mm/gup.c b/mm/gup.c index 96b2b2f..9684b06 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -222,8 +222,6 @@ struct page *follow_page_mask(struct vm_area_struct *vma, struct page *page; struct mm_struct *mm = vma->vm_mm; - *page_mask = 0; - page = follow_huge_addr(mm, address, flags & FOLL_WRITE); if (!IS_ERR(page)) { BUG_ON(flags & FOLL_GET); @@ -298,7 +296,8 @@ struct page *follow_page_mask(struct vm_area_struct *vma, page = follow_trans_huge_pmd(vma, address, pmd, flags); spin_unlock(ptl); - *page_mask = HPAGE_PMD_NR - 1; + if (page_mask) + *page_mask = HPAGE_PMD_NR - 1; return page; } @@ -574,6 +573,7 @@ retry: if (unlikely(fatal_signal_pending(current))) return i ? 
i : -ERESTARTSYS; cond_resched(); + page_mask = 0; page = follow_page_mask(vma, start, foll_flags, &page_mask); if (!page) { int ret; diff --git a/mm/mlock.c b/mm/mlock.c index ef8dc9f..626eb58 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -438,7 +438,7 @@ void munlock_vma_pages_range(struct vm_area_struct *vma, while (start < end) { struct page *page; - unsigned int page_mask; + unsigned int page_mask = 0; unsigned long page_increm; struct pagevec pvec; struct zone *zone; diff --git a/mm/nommu.c b/mm/nommu.c index 95daf81..c1a0a89 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -1749,7 +1749,6 @@ struct page *follow_page_mask(struct vm_area_struct *vma, unsigned long address, unsigned int flags, unsigned int *page_mask) { - *page_mask = 0; return NULL; } -- 1.9.3
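The two rules stated in the changelog (a failing callee leaves the output untouched, and the caller may pass NULL) can be sketched as follows. This is an illustration only: demo_follow() and DEMO_HUGE_NR are hypothetical, and the "page" is just a static string standing in for a struct page.

```c
#include <stddef.h>

#define DEMO_HUGE_NR 512	/* assumed number of small pages per huge page */

/*
 * Hypothetical lookup. Returns a non-NULL "page" on success and NULL on
 * failure. page_mask is a pure output parameter: it may be NULL, and it
 * is written only on the huge-page success path.
 */
const char *demo_follow(int is_huge, int is_mapped, unsigned int *page_mask)
{
	static const char page[] = "page";

	if (!is_mapped)
		return NULL;		/* failure: *page_mask is not touched */

	if (is_huge && page_mask)	/* caller may not care about the mask */
		*page_mask = DEMO_HUGE_NR - 1;

	return page;
}
```

Because the callee no longer zeroes the mask, a caller that does read it has to initialise it first, which is what the patch does in mm/gup.c and mm/mlock.c.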
Re: [PATCH 2/2] drm/vc4: Squash commit for Mario's precise vblank timestamping.
Hi Eric, thanks for all the info and help! Both your patches look good and I have successfully tested them on top of my vblank timestamping patch. So for both: Reviewed-and-tested-by: Mario Kleiner Will you squash 2/2 into my patch or should I resend my patch with yours squashed in? thanks, -mario On 07/08/2016 08:44 PM, Eric Anholt wrote: Read out the DISPBASE registers to decide on the FIFO size. Signed-off-by: Eric Anholt --- Mario: How about this for a squash into your commit? Here are the values I dumped for cob_size: [2.148314] [drm] Scaler 0 size 5232 [2.162239] [drm] Scaler 2 size 2048 [2.172957] [drm] Scaler 1 size 13456 drivers/gpu/drm/vc4/vc4_crtc.c | 23 +-- drivers/gpu/drm/vc4/vc4_regs.h | 18 +- 2 files changed, 38 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c index baf962bce063..3b7db17c356d 100644 --- a/drivers/gpu/drm/vc4/vc4_crtc.c +++ b/drivers/gpu/drm/vc4/vc4_crtc.c @@ -55,6 +55,8 @@ struct vc4_crtc { u8 lut_r[256]; u8 lut_g[256]; u8 lut_b[256]; + /* Size in pixels of the COB memory allocated to this CRTC. */ + u32 cob_size; struct drm_pending_vblank_event *event; }; @@ -195,8 +197,7 @@ int vc4_crtc_get_scanoutpos(struct drm_device *dev, unsigned int crtc_id, *hpos = 0; /* This is the offset we need for translating hvs -> pv scanout pos. */ - /* XXX Find proper formula from hw docs instead of guesstimating? 
*/ - fifo_lines = 2048 * 7 / mode->crtc_hdisplay; + fifo_lines = vc4_crtc->cob_size / mode->crtc_hdisplay; if (fifo_lines > 0) ret |= DRM_SCANOUTPOS_VALID; @@ -873,6 +874,22 @@ static void vc4_set_crtc_possible_masks(struct drm_device *drm, } } +static void +vc4_crtc_get_cob_allocation(struct vc4_crtc *vc4_crtc) +{ + struct drm_device *drm = vc4_crtc->base.dev; + struct vc4_dev *vc4 = to_vc4_dev(drm); + u32 dispbase = HVS_READ(SCALER_DISPBASEX(vc4_crtc->channel)); + /* Top/base are supposed to be 4-pixel aligned, but the + * Raspberry Pi firmware fills the low bits (which are + * presumably ignored). + */ + u32 top = VC4_GET_FIELD(dispbase, SCALER_DISPBASEX_TOP) & ~3; + u32 base = VC4_GET_FIELD(dispbase, SCALER_DISPBASEX_BASE) & ~3; + + vc4_crtc->cob_size = top - base + 4; +} + static int vc4_crtc_bind(struct device *dev, struct device *master, void *data) { struct platform_device *pdev = to_platform_device(dev); @@ -949,6 +966,8 @@ static int vc4_crtc_bind(struct device *dev, struct device *master, void *data) crtc->cursor = cursor_plane; } + vc4_crtc_get_cob_allocation(vc4_crtc); + CRTC_WRITE(PV_INTEN, 0); CRTC_WRITE(PV_INTSTAT, PV_INT_VFP_START); ret = devm_request_irq(dev, platform_get_irq(pdev, 0), diff --git a/drivers/gpu/drm/vc4/vc4_regs.h b/drivers/gpu/drm/vc4/vc4_regs.h index 63cdc28ff7bb..160942a9180e 100644 --- a/drivers/gpu/drm/vc4/vc4_regs.h +++ b/drivers/gpu/drm/vc4/vc4_regs.h @@ -366,7 +366,6 @@ # define SCALER_DISPBKGND_FILL BIT(24) #define SCALER_DISPSTAT0 0x0048 -#define SCALER_DISPBASE0 0x004c # define SCALER_DISPSTATX_MODE_MASK VC4_MASK(31, 30) # define SCALER_DISPSTATX_MODE_SHIFT 30 # define SCALER_DISPSTATX_MODE_DISABLED 0 @@ -379,6 +378,20 @@ # define SCALER_DISPSTATX_FRAME_COUNT_SHIFT 12 # define SCALER_DISPSTATX_LINE_MASK VC4_MASK(11, 0) # define SCALER_DISPSTATX_LINE_SHIFT 0 + +#define SCALER_DISPBASE0 0x004c +/* Last pixel in the COB (display FIFO memory) allocated to this HVS + * channel. Must be 4-pixel aligned (and thus 4 pixels less than the + * next COB base). + */ +# define SCALER_DISPBASEX_TOP_MASK VC4_MASK(31, 16) +# define SCALER_DISPBASEX_TOP_SHIFT 16 +/* First pixel in the COB (display FIFO memory) allocated to this HVS + * channel. Must be 4-pixel aligned. + */ +# define SCALER_DISPBASEX_BASE_MASK VC4_MASK(15, 0) +# define SCALER_DISPBASEX_BASE_SHIFT 0 + #define SCALER_DISPCTRL1 0x0050 #define SCALER_DISPBKGND1 0x0054 #define SCALER_DISPBKGNDX(x) (SCALER_DISPBKGND0 + \ @@ -389,6 +402,9 @@ (x) * (SCALER_DISPSTAT1 - \ SCALER_DISPSTAT0)) #define SCALER_DISPBASE1 0x005c +#define SCALER_DISPBASEX(x) (SCALER_DISPBASE0 + \ + (x) * (SCALER_DISPBASE1 - \ + SCALER_DISPBASE0)) #define SCALER_DISPCTRL2
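The arithmetic in vc4_crtc_get_cob_allocation() and the fifo_lines calculation can be tried out in user space. This is a sketch under assumptions: the DEMO_* masks mirror the TOP (bits 31:16) and BASE (bits 15:0) field layout from the patch, demo_cob_size() and demo_fifo_lines() are hypothetical helpers, and any register value fed to them is made up rather than read from hardware.

```c
#include <stdint.h>

/* Field layout as described in the patch: TOP in bits 31:16, BASE in 15:0. */
#define DEMO_TOP_MASK	0xffff0000u
#define DEMO_TOP_SHIFT	16
#define DEMO_BASE_MASK	0x0000ffffu
#define DEMO_BASE_SHIFT	0

/* Size in pixels of the COB region described by a DISPBASE value. */
uint32_t demo_cob_size(uint32_t dispbase)
{
	/* Drop the two low bits, which firmware may leave set. */
	uint32_t top = ((dispbase & DEMO_TOP_MASK) >> DEMO_TOP_SHIFT) & ~3u;
	uint32_t base = ((dispbase & DEMO_BASE_MASK) >> DEMO_BASE_SHIFT) & ~3u;

	/* top is the last 4-pixel group, so the size includes those 4 pixels. */
	return top - base + 4;
}

/* Whole scanlines that fit in the COB for a given active width. */
uint32_t demo_fifo_lines(uint32_t cob_size, uint32_t hdisplay)
{
	return hdisplay ? cob_size / hdisplay : 0;
}
```

For a 1920-pixel-wide mode, the three sizes Eric dumped (5232, 2048 and 13456) work out to 2, 1 and 7 lines respectively.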
Re: [kernel-hardening] Re: [PATCH 9/9] mm: SLUB hardened usercopy support
On Fri, Jul 8, 2016 at 11:17 PM, wrote: > Yeah, 'ping' dies with a similar traceback going to rawv6_setsockopt(), > and 'trinity' dies a horrid death during initialization because it creates > some sctp sockets to fool around with. The problem in all these cases is that > setsockopt uses copy_from_user() to pull in the option value, and the > allocation > isn't tagged with USERCOPY to whitelist it. Just a note to clear up confusion: this series doesn't include the whitelist protection, so this appears to be either bugs in the slub checker or bugs in the code using the cfq_io_cq cache. I suspect the former. :) -Kees -- Kees Cook Chrome OS & Brillo Security
Re: [v1] ErrHandling:Make IS_ERR_VALUE_U32 as generic API to avoid IS_ERR_VALUE abuses.
Hi, I have submitted a version 2 patch. Please test that one and share the results with us. Thanks, Arvind Yadav On Saturday 09 July 2016 10:08 PM, kbuild test robot wrote: Hi, [auto build test WARNING on v4.7-rc6] [also build test WARNING on next-20160708] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Arvind-Yadav/ErrHandling-Make-IS_ERR_VALUE_U32-as-generic-API-to-avoid-IS_ERR_VALUE-abuses/20160709-235356 config: x86_64-rhel (attached as .config) compiler: gcc-4.9 (Debian 4.9.3-14) 4.9.3 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All warnings (new ones prefixed by >>): In file included from include/uapi/linux/stddef.h:1:0, from include/linux/stddef.h:4, from include/uapi/linux/posix_types.h:4, from include/uapi/linux/types.h:13, from include/linux/types.h:5, from include/linux/mod_devicetable.h:11, from include/linux/pci.h:20, from include/linux/bcma/bcma.h:4, from drivers/bcma/bcma_private.h:8, from drivers/bcma/scan.c:9: drivers/bcma/scan.c: In function 'bcma_get_next_core': include/linux/err.h:23:52: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^ include/linux/compiler.h:170:42: note: in definition of macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ drivers/bcma/scan.c:361:18: note: in expansion of macro 'IS_ERR_VALUE_U32' if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) { ^ include/linux/err.h:23:38: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^ include/linux/compiler.h:170:42: note: in definition of macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ drivers/bcma/scan.c:361:18: note: in expansion of macro 'IS_ERR_VALUE_U32' if (tmp == 0 || 
IS_ERR_VALUE_U32(tmp)) { ^ include/linux/err.h:23:52: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^ include/linux/compiler.h:170:42: note: in definition of macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ drivers/bcma/scan.c:365:19: note: in expansion of macro 'IS_ERR_VALUE_U32' if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) { ^ include/linux/err.h:23:38: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^ include/linux/compiler.h:170:42: note: in definition of macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ drivers/bcma/scan.c:365:19: note: in expansion of macro 'IS_ERR_VALUE_U32' if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) { ^ include/linux/err.h:23:52: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^ include/linux/compiler.h:170:42: note: in definition of macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ drivers/bcma/scan.c:380:8: note: in expansion of macro 'IS_ERR_VALUE_U32' if (IS_ERR_VALUE_U32(tmp)) { ^ include/linux/err.h:23:38: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^ include/linux/compiler.h:170:42: note: in definition of macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ drivers/bcma/scan.c:380:8: note: in expansion of macro 'IS_ERR_VALUE_U32' if (IS_ERR_VALUE_U32(tmp)) {
Re: [PATCH 0/9] mm: Hardened usercopy
On Sat, Jul 9, 2016 at 1:25 AM, Ard Biesheuvel wrote: > On 9 July 2016 at 04:22, Laura Abbott wrote: >> On 07/06/2016 03:25 PM, Kees Cook wrote: >>> >>> Hi, >>> >>> This is a start of the mainline port of PAX_USERCOPY[1]. After I started >>> writing tests (now in lkdtm in -next) for Casey's earlier port[2], I >>> kept tweaking things further and further until I ended up with a whole >>> new patch series. To that end, I took Rik's feedback and made a number >>> of other changes and clean-ups as well. >>> >>> Based on my understanding, PAX_USERCOPY was designed to catch a few >>> classes of flaws around the use of copy_to_user()/copy_from_user(). These >>> changes don't touch get_user() and put_user(), since these operate on >>> constant sized lengths, and tend to be much less vulnerable. There >>> are effectively three distinct protections in the whole series, >>> each of which I've given a separate CONFIG, though this patch set is >>> only the first of the three intended protections. (Generally speaking, >>> PAX_USERCOPY covers what I'm calling CONFIG_HARDENED_USERCOPY (this) and >>> CONFIG_HARDENED_USERCOPY_WHITELIST (future), and PAX_USERCOPY_SLABS covers >>> CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC (future).) >>> >>> This series, which adds CONFIG_HARDENED_USERCOPY, checks that objects >>> being copied to/from userspace meet certain criteria: >>> - if address is a heap object, the size must not exceed the object's >>> allocated size. (This will catch all kinds of heap overflow flaws.) >>> - if address range is in the current process stack, it must be within the >>> current stack frame (if such checking is possible) or at least entirely >>> within the current process's stack. (This could catch large lengths that >>> would have extended beyond the current process stack, or overflows if >>> their length extends back into the original stack.) >>> - if the address range is part of kernel data, rodata, or bss, allow it. 
>>> - if address range is page-allocated, that it doesn't span multiple >>> allocations. >>> - if address is within the kernel text, reject it. >>> - everything else is accepted >>> >>> The patches in the series are: >>> - The core copy_to/from_user() checks, without the slab object checks: >>> 1- mm: Hardened usercopy >>> - Per-arch enablement of the protection: >>> 2- x86/uaccess: Enable hardened usercopy >>> 3- ARM: uaccess: Enable hardened usercopy >>> 4- arm64/uaccess: Enable hardened usercopy >>> 5- ia64/uaccess: Enable hardened usercopy >>> 6- powerpc/uaccess: Enable hardened usercopy >>> 7- sparc/uaccess: Enable hardened usercopy >>> - The heap allocator implementation of object size checking: >>> 8- mm: SLAB hardened usercopy support >>> 9- mm: SLUB hardened usercopy support >>> >>> Some notes: >>> >>> - This is expected to apply on top of -next which contains fixes for the >>> position of _etext on both arm and arm64. >>> >>> - I couldn't detect a measurable performance change with these features >>> enabled. Kernel build times were unchanged, hackbench was unchanged, >>> etc. I think we could flip this to "on by default" at some point. >>> >>> - The SLOB support extracted from grsecurity seems entirely broken. I >>> have no idea what's going on there, I spent my time testing SLAB and >>> SLUB. Having someone else look at SLOB would be nice, but this series >>> doesn't depend on it. >>> >>> Additional features that would be nice, but aren't blocking this series: >>> >>> - Needs more architecture support for stack frame checking (only x86 now). >>> >>> >> >> Even with the SLUB fixup I'm still seeing this blow up on my arm64 system. 
>> This is a >> Fedora rawhide kernel + the patches >> >> [ 0.666700] usercopy: kernel memory exposure attempt detected from >> fc0008b4dd58 () (8 bytes) >> [ 0.666720] CPU: 2 PID: 79 Comm: modprobe Tainted: GW >> 4.7.0-0.rc6.git1.1.hardenedusercopy.fc25.aarch64 #1 >> [ 0.666733] Hardware name: AppliedMicro Mustang/Mustang, BIOS 1.1.0 Nov 24 >> 2015 >> [ 0.666744] Call trace: >> [ 0.666756] [] dump_backtrace+0x0/0x1e8 >> [ 0.666765] [] show_stack+0x24/0x30 >> [ 0.666775] [] dump_stack+0xa4/0xe0 >> [ 0.666785] [] __check_object_size+0x6c/0x230 >> [ 0.666795] [] create_elf_tables+0x74/0x420 >> [ 0.666805] [] load_elf_binary+0x828/0xb70 >> [ 0.666814] [] search_binary_handler+0xb4/0x240 >> [ 0.666823] [] do_execveat_common+0x63c/0x950 >> [ 0.666832] [] do_execve+0x3c/0x50 >> [ 0.666841] [] call_usermodehelper_exec_async+0xe8/0x148 >> [ 0.666850] [] ret_from_fork+0x10/0x50 >> >> This happens on every call to execve. This seems to be the first >> copy_to_user in >> create_elf_tables. I didn't get a chance to debug and I'm going out of town >> all of next week so all I have is the report unfortunately. config attached. >> > > This is a known issue, and a fix is already queued for v4.8 in the arm64 tree: > > 9fdc14c55c arm64: mm: fix location of _etext [0] >
[v2] ErrHandling:Make IS_ERR_VALUE_U32 as generic API to avoid IS_ERR_VALUE abuses.
IS_ERR_VALUE() assumes that its parameter is an unsigned long. It cannot
be used to check whether an 'unsigned int' reflects an error, because the
caller then passes an 'unsigned int' into a macro that takes an
'unsigned long' argument. For a signed 'int' this happens to work, because
the type is sign-extended on 64-bit architectures before it gets converted
into an unsigned type. However, anything that passes an 'unsigned short' or
'unsigned int' argument into IS_ERR_VALUE() is guaranteed to be broken, as
are 8-bit integers and types that are wider than 'unsigned long'.

It would be nice to have a generic helper for users that pass
'unsigned int' arguments.

Signed-off-by: Arvind Yadav
---
 drivers/bcma/scan.c | 1 -
 include/linux/err.h | 2 ++
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/bcma/scan.c b/drivers/bcma/scan.c
index 4a2d1b2..3bc77eb 100644
--- a/drivers/bcma/scan.c
+++ b/drivers/bcma/scan.c
@@ -272,7 +272,6 @@ static struct bcma_device *bcma_find_core_reverse(struct bcma_bus *bus, u16 core
 	return NULL;
 }
 
-#define IS_ERR_VALUE_U32(x) ((x) >= (u32)-MAX_ERRNO)
 
 static int bcma_get_next_core(struct bcma_bus *bus, u32 __iomem **eromptr,
 			      struct bcma_device_id *match, int core_num,
diff --git a/include/linux/err.h b/include/linux/err.h
index 1e35588..e05a63d 100644
--- a/include/linux/err.h
+++ b/include/linux/err.h
@@ -20,6 +20,8 @@
 
 #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)
 
+#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(x) >= (unsigned int)-MAX_ERRNO)
+
 static inline void * __must_check ERR_PTR(long error)
 {
 	return (void *) error;
--
1.9.1
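To make the width problem concrete, here is a minimal user-space sketch (not kernel code). The helper names is_err_value_64() and is_err_value_u32() and the demo function are made up for illustration, and fixed-width types stand in for a 64-bit 'unsigned long' so the result does not depend on the host. It shows that a negative errno stored in a 32-bit unsigned variable is zero-extended and therefore missed by a 64-bit comparison, while a signed int is sign-extended and caught.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Stand-in for a 64-bit IS_ERR_VALUE(): the argument is widened to 64 bits. */
static bool is_err_value_64(uint64_t x)
{
	return x >= (uint64_t)-MAX_ERRNO;
}

/* Stand-in for IS_ERR_VALUE_U32(): the comparison stays 32 bits wide. */
static bool is_err_value_u32(uint32_t x)
{
	return x >= (uint32_t)-MAX_ERRNO;
}

/* Hypothetical demo: the same errno (-22) held in a signed and an unsigned type. */
static void err_value_width_demo(void)
{
	int ret = -22;			/* signed: sign-extended when widened */
	uint32_t reg = (uint32_t)-22;	/* unsigned: zero-extended when widened */

	assert(is_err_value_64(ret));	/* caught */
	assert(!is_err_value_64(reg));	/* missed by the 64-bit comparison */
	assert(is_err_value_u32(reg));	/* caught by the 32-bit comparison */
}
```

The 32-bit helper casts its argument directly to a 32-bit unsigned type. Going through a pointer cast first, as IS_ERR_VALUE() does, would mix pointer-sized and 32-bit values on a 64-bit build.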
Re: [PATCH 0/9] mm: Hardened usercopy
On Fri, Jul 8, 2016 at 7:22 PM, Laura Abbott wrote: > On 07/06/2016 03:25 PM, Kees Cook wrote: >> >> Hi, >> >> This is a start of the mainline port of PAX_USERCOPY[1]. After I started >> writing tests (now in lkdtm in -next) for Casey's earlier port[2], I >> kept tweaking things further and further until I ended up with a whole >> new patch series. To that end, I took Rik's feedback and made a number >> of other changes and clean-ups as well. >> >> Based on my understanding, PAX_USERCOPY was designed to catch a few >> classes of flaws around the use of copy_to_user()/copy_from_user(). These >> changes don't touch get_user() and put_user(), since these operate on >> constant sized lengths, and tend to be much less vulnerable. There >> are effectively three distinct protections in the whole series, >> each of which I've given a separate CONFIG, though this patch set is >> only the first of the three intended protections. (Generally speaking, >> PAX_USERCOPY covers what I'm calling CONFIG_HARDENED_USERCOPY (this) and >> CONFIG_HARDENED_USERCOPY_WHITELIST (future), and PAX_USERCOPY_SLABS covers >> CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC (future).) >> >> This series, which adds CONFIG_HARDENED_USERCOPY, checks that objects >> being copied to/from userspace meet certain criteria: >> - if address is a heap object, the size must not exceed the object's >> allocated size. (This will catch all kinds of heap overflow flaws.) >> - if address range is in the current process stack, it must be within the >> current stack frame (if such checking is possible) or at least entirely >> within the current process's stack. (This could catch large lengths that >> would have extended beyond the current process stack, or overflows if >> their length extends back into the original stack.) >> - if the address range is part of kernel data, rodata, or bss, allow it. >> - if address range is page-allocated, that it doesn't span multiple >> allocations. 
>> - if address is within the kernel text, reject it. >> - everything else is accepted >> >> The patches in the series are: >> - The core copy_to/from_user() checks, without the slab object checks: >> 1- mm: Hardened usercopy >> - Per-arch enablement of the protection: >> 2- x86/uaccess: Enable hardened usercopy >> 3- ARM: uaccess: Enable hardened usercopy >> 4- arm64/uaccess: Enable hardened usercopy >> 5- ia64/uaccess: Enable hardened usercopy >> 6- powerpc/uaccess: Enable hardened usercopy >> 7- sparc/uaccess: Enable hardened usercopy >> - The heap allocator implementation of object size checking: >> 8- mm: SLAB hardened usercopy support >> 9- mm: SLUB hardened usercopy support >> >> Some notes: >> >> - This is expected to apply on top of -next which contains fixes for the >> position of _etext on both arm and arm64. >> >> - I couldn't detect a measurable performance change with these features >> enabled. Kernel build times were unchanged, hackbench was unchanged, >> etc. I think we could flip this to "on by default" at some point. >> >> - The SLOB support extracted from grsecurity seems entirely broken. I >> have no idea what's going on there, I spent my time testing SLAB and >> SLUB. Having someone else look at SLOB would be nice, but this series >> doesn't depend on it. >> >> Additional features that would be nice, but aren't blocking this series: >> >> - Needs more architecture support for stack frame checking (only x86 now). >> >> > > Even with the SLUB fixup I'm still seeing this blow up on my arm64 system. > This is a > Fedora rawhide kernel + the patches Is this on top of -next? The recent _etext change ("arm64: mm: fix location of _etext") is needed to fix the kernel text test for arm64. 
-Kees > > [0.666700] usercopy: kernel memory exposure attempt detected from > fc0008b4dd58 () (8 bytes) > [0.666720] CPU: 2 PID: 79 Comm: modprobe Tainted: GW > 4.7.0-0.rc6.git1.1.hardenedusercopy.fc25.aarch64 #1 > [0.666733] Hardware name: AppliedMicro Mustang/Mustang, BIOS 1.1.0 Nov > 24 2015 > [0.666744] Call trace: > [0.666756] [] dump_backtrace+0x0/0x1e8 > [0.666765] [] show_stack+0x24/0x30 > [0.666775] [] dump_stack+0xa4/0xe0 > [0.666785] [] __check_object_size+0x6c/0x230 > [0.666795] [] create_elf_tables+0x74/0x420 > [0.666805] [] load_elf_binary+0x828/0xb70 > [0.666814] [] search_binary_handler+0xb4/0x240 > [0.666823] [] do_execveat_common+0x63c/0x950 > [0.666832] [] do_execve+0x3c/0x50 > [0.666841] [] > call_usermodehelper_exec_async+0xe8/0x148 > [0.666850] [] ret_from_fork+0x10/0x50 > > This happens on every call to execve. This seems to be the first > copy_to_user in > create_elf_tables. I didn't get a chance to debug and I'm going out of town > all of next week so all I have is the report unfortunately. config attached. > > Thanks, > Laura -- Kees Cook Chrome OS & Brillo Security
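As a rough illustration of why the position of _etext matters for the "reject anything within the kernel text" rule, here is a small user-space sketch. It assumes the text test is a plain range-overlap check; the function names and all addresses are invented for the example and are not taken from the series. If the end-of-text symbol is placed too far out, an ordinary data object that sits just past the real text gets flagged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [ptr, ptr + n) intersects the half-open range [low, high). */
static bool overlaps(uintptr_t ptr, uintptr_t n, uintptr_t low, uintptr_t high)
{
	uintptr_t check_low = ptr;
	uintptr_t check_high = ptr + n;

	/* No overlap if the object lies entirely above or entirely below. */
	if (check_low >= high || check_high <= low)
		return false;
	return true;
}

/* Hypothetical layout: text really ends at 0x2000, but the symbol says 0x3000. */
static void etext_demo(void)
{
	const uintptr_t stext = 0x1000;
	const uintptr_t etext_good = 0x2000;
	const uintptr_t etext_bad = 0x3000;
	const uintptr_t obj = 0x2800;	/* made-up data object past the real text */

	assert(!overlaps(obj, 8, stext, etext_good));	/* copy allowed */
	assert(overlaps(obj, 8, stext, etext_bad));	/* false positive */
}
```

With the correct bound the 8-byte copy passes; with the misplaced bound the same copy is reported as a kernel memory exposure attempt, which is consistent with the report quoted above.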
Re: [PATCH] kbuild: Abort build on bad stack protector flag
On Sat, Jul 9, 2016 at 5:03 AM, Ingo Molnar wrote:
>
> * Kees Cook wrote:
>
>> Before, the stack protector flag was sanity checked before .config had
>> been reprocessed. This meant the build couldn't be aborted early, and
>> only a warning could be emitted followed later by the compiler blowing
>> up with an unknown flag. This has caused a lot of confusion over time,
>> so this splits the flag selection from sanity checking and performs the
>> sanity checking after the make has been restarted from a reprocessed
>> .config, so builds can be aborted as early as possible now.
>>
>> Additionally moves the x86-specific sanity check to the same location,
>> since it suffered from the same warn-then-wait-for-compiler-failure
>> problem.
>>
>> Signed-off-by: Kees Cook
>> ---
>> Makefile | 69 +--
>> arch/x86/Makefile | 8 ---
>> 2 files changed, 42 insertions(+), 35 deletions(-)
>
> What's the status of this patch? I can merge it if Michal acks the main
> Makefile bits.

There's been no feedback yet, but I'd really like to see it landed: it
removes a lot of ambiguity for this option (and creates a place for
future similar options).

-Kees

--
Kees Cook
Chrome OS & Brillo Security
Re: [v1] ErrHandling:Make IS_ERR_VALUE_U32 as generic API to avoid IS_ERR_VALUE abuses.
Hi, [auto build test WARNING on v4.7-rc6] [also build test WARNING on next-20160708] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Arvind-Yadav/ErrHandling-Make-IS_ERR_VALUE_U32-as-generic-API-to-avoid-IS_ERR_VALUE-abuses/20160709-235356 config: x86_64-rhel (attached as .config) compiler: gcc-4.9 (Debian 4.9.3-14) 4.9.3 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All warnings (new ones prefixed by >>): In file included from include/uapi/linux/stddef.h:1:0, from include/linux/stddef.h:4, from include/uapi/linux/posix_types.h:4, from include/uapi/linux/types.h:13, from include/linux/types.h:5, from include/linux/mod_devicetable.h:11, from include/linux/pci.h:20, from include/linux/bcma/bcma.h:4, from drivers/bcma/bcma_private.h:8, from drivers/bcma/scan.c:9: drivers/bcma/scan.c: In function 'bcma_get_next_core': include/linux/err.h:23:52: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^ include/linux/compiler.h:170:42: note: in definition of macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ >> drivers/bcma/scan.c:361:18: note: in expansion of macro 'IS_ERR_VALUE_U32' if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) { ^ >> include/linux/err.h:23:38: warning: cast from pointer to integer of >> different size [-Wpointer-to-int-cast] #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^ include/linux/compiler.h:170:42: note: in definition of macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ >> drivers/bcma/scan.c:361:18: note: in expansion of macro 'IS_ERR_VALUE_U32' if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) { ^ include/linux/err.h:23:52: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] #define IS_ERR_VALUE_U32(x) 
unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^ include/linux/compiler.h:170:42: note: in definition of macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ drivers/bcma/scan.c:365:19: note: in expansion of macro 'IS_ERR_VALUE_U32' if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) { ^ >> include/linux/err.h:23:38: warning: cast from pointer to integer of >> different size [-Wpointer-to-int-cast] #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^ include/linux/compiler.h:170:42: note: in definition of macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ drivers/bcma/scan.c:365:19: note: in expansion of macro 'IS_ERR_VALUE_U32' if (tmp == 0 || IS_ERR_VALUE_U32(tmp)) { ^ include/linux/err.h:23:52: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^ include/linux/compiler.h:170:42: note: in definition of macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ drivers/bcma/scan.c:380:8: note: in expansion of macro 'IS_ERR_VALUE_U32' if (IS_ERR_VALUE_U32(tmp)) { ^ >> include/linux/err.h:23:38: warning: cast from pointer to integer of >> different size [-Wpointer-to-int-cast] #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO) ^ include/linux/compiler.h:170:42: note: in definition of macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ drivers/bcma/scan.c:380:8: note: in expansion of macro 'IS_ERR_VALUE_U32' if (IS_ERR_VALUE_U32(tmp)) { ^ include/linux/err.h:23:52: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] #define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) &g
Re: [PATCH] capabilities: add capability cgroup controller
On 07/08/16 09:13, Petr Mladek wrote: > On Thu 2016-07-07 20:27:13, Topi Miettinen wrote: >> On 07/07/16 09:16, Petr Mladek wrote: >>> On Sun 2016-07-03 15:08:07, Topi Miettinen wrote: The attached patch would make any uses of capabilities generate audit messages. It works for simple tests as you can see from the commit message, but unfortunately the call to audit_cgroup_list() deadlocks the system when booting a full blown OS. There's no deadlock when the call is removed. I guess that in some cases, cgroup_mutex and/or css_set_lock could be already held earlier before entering audit_cgroup_list(). Holding the locks is however required by task_cgroup_from_root(). Is there any way to avoid this? For example, only print some kind of cgroup ID numbers (are there unique and stable IDs, available without locks?) for those cgroups where the task is registered in the audit message? >>> >>> I am not sure if anyone know what really happens here. I suggest to >>> enable lockdep. It might detect possible deadlock even before it >>> really happens, see Documentation/locking/lockdep-design.txt >>> >>> It can be enabled by >>> >>>CONFIG_PROVE_LOCKING=y >>> >>> It depends on >>> >>> CONFIG_DEBUG_KERNEL=y >>> >>> and maybe some more options, see lib/Kconfig.debug >> >> Thanks a lot! 
I caught this stack dump: >> >> starting version 230 >> [3.416647] [ cut here ] >> [3.417310] WARNING: CPU: 0 PID: 95 at >> /home/topi/d/linux.git/kernel/locking/lockdep.c:2871 >> lockdep_trace_alloc+0xb4/0xc0 >> [3.417605] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) >> [3.417923] Modules linked in: >> [3.418288] CPU: 0 PID: 95 Comm: systemd-udevd Not tainted 4.7.0-rc5+ #97 >> [3.418444] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), >> BIOS Debian-1.8.2-1 04/01/2014 >> [3.418726] 0086 7970f3b0 8816fb00 >> 813c9c45 >> [3.418993] 8816fb50 8816fb40 >> 81091e9b >> [3.419176] 0b3705e2c798 0046 0410 >> >> [3.419374] Call Trace: >> [3.419511] [] dump_stack+0x67/0x92 >> [3.419644] [] __warn+0xcb/0xf0 >> [3.419745] [] warn_slowpath_fmt+0x5f/0x80 >> [3.419868] [] lockdep_trace_alloc+0xb4/0xc0 >> [3.419988] [] kmem_cache_alloc_node+0x42/0x600 >> [3.420156] [] ? debug_lockdep_rcu_enabled+0x1d/0x20 >> [3.420170] [] __alloc_skb+0x5b/0x1d0 >> [3.420170] [] audit_log_start+0x29b/0x480 >> [3.420170] [] ? __lock_task_sighand+0x95/0x270 >> [3.420170] [] audit_log_cap_use+0x39/0xf0 >> [3.420170] [] ns_capable+0x45/0x70 >> [3.420170] [] capable+0x17/0x20 >> [3.420170] [] oom_score_adj_write+0x150/0x2f0 >> [3.420170] [] __vfs_write+0x37/0x160 >> [3.420170] [] ? update_fast_ctr+0x17/0x30 >> [3.420170] [] ? percpu_down_read+0x49/0x90 >> [3.420170] [] ? __sb_start_write+0xb7/0xf0 >> [3.420170] [] ? __sb_start_write+0xb7/0xf0 >> [3.420170] [] vfs_write+0xb8/0x1b0 >> [3.420170] [] ? __fget_light+0x66/0x90 >> [3.420170] [] SyS_write+0x58/0xc0 >> [3.420170] [] do_syscall_64+0x5c/0x300 >> [3.420170] [] entry_SYSCALL64_slow_path+0x25/0x25 >> [3.420170] ---[ end trace fb586899fb556a5e ]--- >> [3.447922] random: systemd-udevd urandom read with 3 bits of entropy >> available >> [4.014078] clocksource: Switched to clocksource tsc >> Begin: Loading essential drivers ... done. >> >> This is with qemu and the boot continues normally. 
With real computer, >> there's no such output and system just seems to freeze. >> >> Could it be possible that the deadlock happens because there's some IO >> towards /sys/fs/cgroup, which causes a capability check and that in turn >> causes locking problems when we try to print cgroup list? > > The above warning is printed by the code from > kernel/locking/lockdep.c:2871 > > static void __lockdep_trace_alloc(gfp_t gfp_mask, unsigned long flags) > { > [...] > /* We're only interested __GFP_FS allocations for now */ > if (!(gfp_mask & __GFP_FS)) > return; > > /* >* Oi! Can't be having __GFP_FS allocations with IRQs disabled. >*/ > if (DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))) > return; > > > The backtrace shows that your new audit_log_cap_use() is called > from vfs_write(). You might try to use audit_log_start() with > GFP_NOFS instead of GFP_KERNEL. > > Note that this is rather intuitive advice. I still need to learn a lot > about memory management and kernel in general to be more sure about > a correct solution. Here's what I got now: [ 18.043181] [ 18.044123] == [ 18.044123] [ INFO: possible circular locking dependency detected ] [ 18.044123] 4.7.0-rc5+ #99 Not tainted [ 18.044123]
[PATCH] include: mman: Use bool instead of int for the return value of arch_validate_prot
From: Chen Gang

For a function whose return value is purely boolean, bool is a slightly
better return type than int. Also return the boolean result directly,
since the 'if' statement only performs a boolean check and returns a
boolean result anyway.

Signed-off-by: Chen Gang
---
 arch/powerpc/include/asm/mman.h | 8 +++-----
 include/linux/mman.h            | 2 +-
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
index 2563c43..62e1f47 100644
--- a/arch/powerpc/include/asm/mman.h
+++ b/arch/powerpc/include/asm/mman.h
@@ -31,13 +31,11 @@ static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
 }
 #define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags)
 
-static inline int arch_validate_prot(unsigned long prot)
+static inline bool arch_validate_prot(unsigned long prot)
 {
 	if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_SAO))
-		return 0;
-	if ((prot & PROT_SAO) && !cpu_has_feature(CPU_FTR_SAO))
-		return 0;
-	return 1;
+		return false;
+	return (prot & PROT_SAO) == 0 || cpu_has_feature(CPU_FTR_SAO);
 }
 #define arch_validate_prot(prot) arch_validate_prot(prot)
 
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 33e17f6..634c4c5 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -49,7 +49,7 @@ static inline void vm_unacct_memory(long pages)
  *
  * Returns true if the prot flags are valid
  */
-static inline int arch_validate_prot(unsigned long prot)
+static inline bool arch_validate_prot(unsigned long prot)
 {
 	return (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM)) == 0;
 }
--
1.9.3
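The powerpc hunk folds two early returns into one boolean expression, so it is worth checking that the two forms agree. The following user-space sketch does that by brute force. The DEMO_* flag values are invented for the example (they are not the kernel's), and the CPU feature test is replaced by a plain has_sao parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values only; not taken from any kernel header. */
#define DEMO_PROT_READ	0x01UL
#define DEMO_PROT_WRITE	0x02UL
#define DEMO_PROT_EXEC	0x04UL
#define DEMO_PROT_SEM	0x08UL
#define DEMO_PROT_SAO	0x10UL
#define DEMO_PROT_VALID	(DEMO_PROT_READ | DEMO_PROT_WRITE | DEMO_PROT_EXEC | \
			 DEMO_PROT_SEM | DEMO_PROT_SAO)

/* Old shape: int return value with explicit 0/1 exits. */
static int validate_prot_int(unsigned long prot, bool has_sao)
{
	if (prot & ~DEMO_PROT_VALID)
		return 0;
	if ((prot & DEMO_PROT_SAO) && !has_sao)
		return 0;
	return 1;
}

/* New shape: bool return value, last two exits folded into one expression. */
static bool validate_prot_bool(unsigned long prot, bool has_sao)
{
	if (prot & ~DEMO_PROT_VALID)
		return false;
	return (prot & DEMO_PROT_SAO) == 0 || has_sao;
}

/* Compare both shapes over every 6-bit flag word, with and without SAO. */
static void check_forms_agree(void)
{
	for (unsigned long prot = 0; prot < 64; prot++) {
		for (int sao = 0; sao <= 1; sao++) {
			bool old_result = validate_prot_int(prot, sao != 0) != 0;
			bool new_result = validate_prot_bool(prot, sao != 0);

			assert(old_result == new_result);
		}
	}
}
```

The folded expression is De Morgan's law applied to the second early return: "not (SAO requested and SAO missing)" becomes "SAO not requested, or SAO available".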
Re: [f2fs-dev] [PATCH 3/7] f2fs: drop any block plugging
On Sat, Jul 09, 2016 at 10:28:49AM +0800, Chao Yu wrote: > Hi Jaegeuk, > > On 2016/6/9 1:24, Jaegeuk Kim wrote: > > In f2fs, we don't need to keep block plugging for NODE and DATA writes, > > since > > we already merged bios as much as possible. > > IMO, we can not remove block plug, this is because there are still many > conditions which stops us merging r/w IOs into one bio as we expect, > theoretically, block plug can hold bios as much as possible, then submitting > them into queue in batch, it will reduce racing of grabbing queue->lock during > bio submitting, if we drop them, when syncing nodes or flushing datas, we will > suffer more lock racing. > > Or there are something I am missing, do you suffer any performance issue on > block plug? In the latest patch, I've turned off plugging forcefully, only if the underlying device is SMR drive. And, still I removed other block plugging, since I couldn't see any performance regression. Even in some workloads, I could have seen some inverted IOs due to race condition between plugged and unplugged IOs. 
Thanks, > > Thanks, > > > > > Signed-off-by: Jaegeuk Kim > > --- > > fs/f2fs/checkpoint.c | 4 > > fs/f2fs/data.c | 17 ++--- > > fs/f2fs/gc.c | 5 - > > fs/f2fs/segment.c| 7 +-- > > 4 files changed, 11 insertions(+), 22 deletions(-) > > > > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c > > index 5ddd15c..4179c7b 100644 > > --- a/fs/f2fs/checkpoint.c > > +++ b/fs/f2fs/checkpoint.c > > @@ -897,11 +897,8 @@ static int block_operations(struct f2fs_sb_info *sbi) > > .nr_to_write = LONG_MAX, > > .for_reclaim = 0, > > }; > > - struct blk_plug plug; > > int err = 0; > > > > - blk_start_plug(&plug); > > - > > retry_flush_dents: > > f2fs_lock_all(sbi); > > /* write all the dirty dentry pages */ > > @@ -938,7 +935,6 @@ retry_flush_nodes: > > goto retry_flush_nodes; > > } > > out: > > - blk_finish_plug(&plug); > > return err; > > } > > > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c > > index 30dc448..5f655d0 100644 > > --- a/fs/f2fs/data.c > > +++ b/fs/f2fs/data.c > > @@ -98,10 +98,13 @@ static struct bio *__bio_alloc(struct f2fs_sb_info > > *sbi, block_t blk_addr, > > } > > > > static inline void __submit_bio(struct f2fs_sb_info *sbi, int rw, > > - struct bio *bio) > > + struct bio *bio, enum page_type type) > > { > > - if (!is_read_io(rw)) > > + if (!is_read_io(rw)) { > > atomic_inc(&sbi->nr_wb_bios); > > + if (current->plug && (type == DATA || type == NODE)) > > + blk_finish_plug(current->plug); > > + } > > submit_bio(rw, bio); > > } > > > > @@ -117,7 +120,7 @@ static void __submit_merged_bio(struct f2fs_bio_info > > *io) > > else > > trace_f2fs_submit_write_bio(io->sbi->sb, fio, io->bio); > > > > - __submit_bio(io->sbi, fio->rw, io->bio); > > + __submit_bio(io->sbi, fio->rw, io->bio, fio->type); > > io->bio = NULL; > > } > > > > @@ -235,7 +238,7 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio) > > return -EFAULT; > > } > > > > - __submit_bio(fio->sbi, fio->rw, bio); > > + __submit_bio(fio->sbi, fio->rw, bio, fio->type); > > return 0; > > } > > > > @@ 
-1040,7 +1043,7 @@ got_it: > > */ > > if (bio && (last_block_in_bio != block_nr - 1)) { > > submit_and_realloc: > > - __submit_bio(F2FS_I_SB(inode), READ, bio); > > + __submit_bio(F2FS_I_SB(inode), READ, bio, DATA); > > bio = NULL; > > } > > if (bio == NULL) { > > @@ -1083,7 +1086,7 @@ set_error_page: > > goto next_page; > > confused: > > if (bio) { > > - __submit_bio(F2FS_I_SB(inode), READ, bio); > > + __submit_bio(F2FS_I_SB(inode), READ, bio, DATA); > > bio = NULL; > > } > > unlock_page(page); > > @@ -1093,7 +1096,7 @@ next_page: > > } > > BUG_ON(pages && !list_empty(pages)); > > if (bio) > > - __submit_bio(F2FS_I_SB(inode), READ, bio); > > + __submit_bio(F2FS_I_SB(inode), READ, bio, DATA); > > return 0; > > } > > > > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c > > index 4a03076..67fd285 100644 > > --- a/fs/f2fs/gc.c > > +++ b/fs/f2fs/gc.c > > @@ -777,7 +777,6 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi, > > { > > struct page *sum_page; > > struct f2fs_summary_block *sum; > > - struct blk_plug plug; > > unsigned int segno = start_segno; > > unsigned int end_segno = start_segno + sbi->segs_per_sec; > > int seg_freed = 0; > > @@ -795,8 +794,6 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi, > > unlock_page(sum_page); > > } > > > > - blk_start_plug(&plug); > > - > > for (segno = start_seg
Re: [PATCH v3] f2fs: fix to avoid data update racing between GC and DIO
On Fri, Jul 08, 2016 at 11:50:02PM +0800, Chao Yu wrote: > Hi Jaegeuk, > > On 2016/7/8 11:19, Jaegeuk Kim wrote: > > Hi Chao, > > > > Could you take a look at this in xfstests/generic/013? > > > > [ 502.480850] == > > [ 502.480864] [ INFO: possible circular locking dependency detected ] > > [ 502.480877] 4.7.0-rc1+ #124 Tainted: G OE > > [ 502.480886] --- > > [ 502.480897] fsstress/10729 is trying to acquire lock: > > [ 502.480906] (&sb->s_type->i_mutex_key#18){+.+.+.}, at: > > [] do_blockdev_direct_IO+0x1db/0x2310 > > [ 502.480948] > > [ 502.480948] but task is already holding lock: > > [ 502.480959] (&fi->dio_rwsem){.+.+.+}, at: [] > > f2fs_direct_IO+0xd1/0x3d0 [f2fs] > > [ 502.481003] > > [ 502.481003] which lock already depends on the new lock. > > [ 502.481003] > > [ 502.481018] > > [ 502.481018] the existing dependency chain (in reverse order) is: > > [ 502.481030] > > [ 502.481030] -> #1 (&fi->dio_rwsem){.+.+.+}: > > [ 502.481054][] lock_acquire+0xd3/0x220 > > [ 502.481071][] down_read+0x51/0xa0 > > [ 502.481089][] f2fs_direct_IO+0xd1/0x3d0 [f2fs] > > [ 502.481114][] > > generic_file_direct_write+0xa7/0x160 > > [ 502.481133][] > > __generic_file_write_iter+0xbd/0x1e0 > > [ 502.481149][] f2fs_file_write_iter+0xdb/0x100 > > [f2fs] > > [ 502.481173][] __vfs_write+0xc8/0x140 > > [ 502.481190][] vfs_write+0xb5/0x1b0 > > [ 502.481205][] SyS_write+0x49/0xa0 > > [ 502.481220][] > > entry_SYSCALL_64_fastpath+0x23/0xc1 > > [ 502.481236] > > [ 502.481236] -> #0 (&sb->s_type->i_mutex_key#18){+.+.+.}: > > [ 502.481264][] __lock_acquire+0x161c/0x1940 > > [ 502.481280][] lock_acquire+0xd3/0x220 > > [ 502.481296][] down_write+0x5a/0xc0 > > [ 502.481312][] > > do_blockdev_direct_IO+0x1db/0x2310 > > [ 502.481328][] __blockdev_direct_IO+0x3a/0x40 > > [ 502.481344][] f2fs_direct_IO+0x104/0x3d0 [f2fs] > > [ 502.481368][] > > generic_file_read_iter+0x689/0x7e0 > > [ 502.481384][] __vfs_read+0xc1/0x130 > > [ 502.481399][] vfs_read+0x91/0x140 > > [ 502.481414][] SyS_read+0x49/0xa0 
> > [ 502.481429][] > > entry_SYSCALL_64_fastpath+0x23/0xc1 > > [ 502.481445] > > [ 502.481445] other info that might help us debug this: > > [ 502.481445] > > [ 502.481459] Possible unsafe locking scenario: > > [ 502.481459] > > [ 502.481726]CPU0CPU1 > > [ 502.481987] > > [ 502.482242] lock(&fi->dio_rwsem); > > [ 502.482501] > > lock(&sb->s_type->i_mutex_key#18); > > [ 502.482765]lock(&fi->dio_rwsem); > > [ 502.483025] lock(&sb->s_type->i_mutex_key#18); > > Seems we will suffer ABBA deadlock: > > writerreader > - f2fs_file_write_iter > - down_write(&inode->i_rwsem) > - __generic_file_write_iter > - generic_file_direct_write >- f2fs_direct_IO > - generic_file_read_iter >- f2fs_direct_IO >- down_read(&fi->dio_rwsem) > - __blockdev_direct_IO > - do_blockdev_direct_IO > - down_write(&inode->i_rwsem) > > - down_read(&fi->dio_rwsem) > > What about splitting dio_rwsem to rdio_rwsem/wdio_rwsem for reader/writer to > avoid deadlock? Hmm, how about inode_trylock in GC? > > Thanks, > > > [ 502.483285] > > [ 502.483285] *** DEADLOCK *** > > [ 502.483285] > > [ 502.484018] 1 lock held by fsstress/10729: > > [ 502.484262] #0: (&fi->dio_rwsem){.+.+.+}, at: [] > > f2fs_direct_IO+0xd1/0x3d0 [f2fs] > > > > Thanks, > > > > On Thu, Jul 07, 2016 at 12:49:12PM +0800, Chao Yu wrote: > >> From: Chao Yu > >> > >> Datas in file can be operated by GC and DIO simultaneously, so we will > >> face race case as below: > >> > >> For write case: > >> Thread A Thread B > >> - generic_file_direct_write > >> - invalidate_inode_pages2_range > >> - f2fs_direct_IO > >> - do_blockdev_direct_IO > >>- do_direct_IO > >> - get_more_blocks > >>- f2fs_gc > >> - do_garbage_collect > >> - gc_data_segment > >> - move_data_page > >>- do_write_data_page > >>migrate data block to new block > >> address > >>- dio_bio_submit > >>update user data to old block address > >> > >> For read case: > >> Thread
[PATCH] mm: migrate: Use bool instead of int for the return value of PageMovable
From: Chen Gang

For a function whose return value is purely boolean, bool is a slightly
better return type than int. Also return the boolean result directly,
since the 'if' statement only performs a boolean check and returns a
boolean result anyway.

Signed-off-by: Chen Gang
---
 include/linux/migrate.h | 4 ++--
 mm/compaction.c         | 9 +++------
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index ae8d475..0e366f8 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -72,11 +72,11 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping,
 #endif /* CONFIG_MIGRATION */
 
 #ifdef CONFIG_COMPACTION
-extern int PageMovable(struct page *page);
+extern bool PageMovable(struct page *page);
 extern void __SetPageMovable(struct page *page, struct address_space *mapping);
 extern void __ClearPageMovable(struct page *page);
 #else
-static inline int PageMovable(struct page *page) { return 0; };
+static inline bool PageMovable(struct page *page) { return false; };
 static inline void __SetPageMovable(struct page *page,
 				struct address_space *mapping)
 {
diff --git a/mm/compaction.c b/mm/compaction.c
index 0bd53fb..cfcfe88 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -95,19 +95,16 @@ static inline bool migrate_async_suitable(int migratetype)
 
 #ifdef CONFIG_COMPACTION
 
-int PageMovable(struct page *page)
+bool PageMovable(struct page *page)
 {
 	struct address_space *mapping;
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	if (!__PageMovable(page))
-		return 0;
+		return false;
 
 	mapping = page_mapping(page);
-	if (mapping && mapping->a_ops && mapping->a_ops->isolate_page)
-		return 1;
-
-	return 0;
+	return mapping && mapping->a_ops && mapping->a_ops->isolate_page;
 }
 EXPORT_SYMBOL(PageMovable);
 
--
1.9.3
[v1] ErrHandling:Make IS_ERR_VALUE_U32 as generic API to avoid IS_ERR_VALUE abuses.
IS_ERR_VALUE() assumes that its parameter is an unsigned long. It cannot
be used to check whether an 'unsigned int' reflects an error, because the
caller then passes an 'unsigned int' into a macro that takes an
'unsigned long' argument. For a signed 'int' this happens to work, because
the type is sign-extended on 64-bit architectures before it gets converted
into an unsigned type. However, anything that passes an 'unsigned short' or
'unsigned int' argument into IS_ERR_VALUE() is guaranteed to be broken, as
are 8-bit integers and types that are wider than 'unsigned long'.

It would be nice to have a generic helper for users that pass
'unsigned int' arguments.

Signed-off-by: Arvind Yadav
---
 drivers/bcma/scan.c | 1 -
 include/linux/err.h | 2 ++
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/bcma/scan.c b/drivers/bcma/scan.c
index 4a2d1b2..3bc77eb 100644
--- a/drivers/bcma/scan.c
+++ b/drivers/bcma/scan.c
@@ -272,7 +272,6 @@ static struct bcma_device *bcma_find_core_reverse(struct bcma_bus *bus, u16 core
 	return NULL;
 }
 
-#define IS_ERR_VALUE_U32(x) ((x) >= (u32)-MAX_ERRNO)
 
 static int bcma_get_next_core(struct bcma_bus *bus, u32 __iomem **eromptr,
 			      struct bcma_device_id *match, int core_num,
diff --git a/include/linux/err.h b/include/linux/err.h
index 1e35588..1940af7 100644
--- a/include/linux/err.h
+++ b/include/linux/err.h
@@ -20,6 +20,8 @@
 
 #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)
 
+#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(void *)(x) >= (unsigned int)-MAX_ERRNO)
+
 static inline void * __must_check ERR_PTR(long error)
 {
 	return (void *) error;
--
1.9.1
Re: [PATCH 0/3] ARM: dts: the dts support for rk3288 firefly reload
Thank you for your review, Heiko.

On 07/08/2016 05:35 AM, Heiko Stuebner wrote:
> Hi Randy,
>
> On Thursday, 7 July 2016, 02:22:57, Randy Li wrote:
>> The rk3288 firefly reload is a Rockchip RK3288 based board made up of
>> a core board and a main board. The regulators are connected differently
>> from the previous version of the firefly boards, so it is necessary to
>> move some common code into board-specific files.
>>
>> I only tested the ethernet and confirmed that it works. The usb on this
>> board won't work, caused by bugs in the driver.
>>
>> This version follows the suggestions from Heiko Stuebner, except for the
>> duplicated supply name problem; I don't think it can be fixed that way.
>
> I've now had a chance to look at that reload board on the firefly site.
> Firefly also is the company name, so a board named that way is not
> necessarily a "variant" :-) .
>
> And looking at the "reload" board this definitely seems to be a very
> different product with it being a system-on-module+baseboard design with
> additional peripherals like that sata bridge, camera interfaces and probably

The sata bridge is just a SATA-to-USB bridge, and the "reload" brings back
the DVP camera interface and an HDMI rx chip connected to the other MIPI
camera interface.

> more.
>
> As you might've seen, most Rockchip boards are based on some reference-
> design, so are similar in a big part of their core layout.

Yes, from the evb. But even the main board of the evb at Rockchip has at
least 3 versions, as far as I know. Also, the evb is made up of a power
board, a main board and a core board.

> So, looking at the vastly different product the reload is, I'd really like
> to have a separate dts for the reload, to not run into more confusing
> differences later on.

The main problem is that the power connections are different. That is why I
decided to make a separate dts. If the kernel introduces the override dts, I
would have a better way to implement it.

> Also, when adding a new board, please also add an entry to
> Documentation/devicetree/bindings/arm/rockchip.txt

I will send a patch set in a few days.

> Thanks
> Heiko

Thank you again for your review and your patience.

Randy
Also, when adding a new board, please also add an entry to Documentation/devicetree/bindingd/arm/rockchip.txt I would send a patch set in a few days. Thanks Heiko Thank you for you review and you patient again Randy
Re: [PATCH v3 1/7] lib: string: add functions to case-convert strings
On 9 July 2016 at 05:04, Luis de Bethencourt wrote: > On 08/07/16 23:43, Markus Mayer wrote: >> Add a collection of generic functions to convert strings to lowercase >> or uppercase. >> >> Changing the case of a string (with or without copying it first) seems >> to be a recurring requirement in the kernel that is currently being >> solved by several duplicated implementations doing the same thing. This >> change aims at reducing this code duplication. >> >> The new functions are >> void strlcpytoupper(char *dst, const char *src, size_t len); >> void strlcpytolower(char *dst, const char *src, size_t len); >> void strcpytoupper(char *dst, const char *src); >> void strcpytolower(char *dst, const char *src); >> void strtoupper(char *s); >> void strtolower(char *s); >> >> The "str[l]cpyto*" versions of the function take a destination string >> and a source string as arguments. The "strlcpyto*" versions additionally >> take a length argument like strlcpy() itself. Lastly, the strto* >> functions take a single string argument and modify the passed-in string. >> >> Like strlcpy(), and unlike strncpy(), the functions guarantee NULL >> termination of the destination string. >> >> Signed-off-by: Markus Mayer >> --- >> include/linux/string.h | 40 >> lib/string.c | 38 ++ >> 2 files changed, 78 insertions(+) >> >> diff --git a/include/linux/string.h b/include/linux/string.h >> index 26b6f6a..36c9d14 100644 >> --- a/include/linux/string.h >> +++ b/include/linux/string.h >> @@ -116,6 +116,8 @@ extern void * memchr(const void *,int,__kernel_size_t); >> #endif >> void *memchr_inv(const void *s, int c, size_t n); >> char *strreplace(char *s, char old, char new); >> +extern void strlcpytoupper(char *dst, const char *src, size_t len); >> +extern void strlcpytolower(char *dst, const char *src, size_t len); >> >> extern void kfree_const(const void *x); >> >> @@ -169,4 +171,42 @@ static inline const char *kbasename(const char *path) >> return tail ? 
tail + 1 : path; >> } >> >> +/** >> + * strcpytoupper - Copy string and convert to uppercase. >> + * @dst: The buffer to store the result. >> + * @src: The string to convert to uppercase. >> + */ >> +static inline void strcpytoupper(char *dst, const char *src) >> +{ >> + strlcpytoupper(dst, src, -1); >> +} >> + > > Why not use SIZE_MAX instead of -1? Sure. I'll change all four of them. Thanks. >> +/** >> + * strcpytolower - Copy string and convert to lowercase. >> + * @dst: The buffer to store the result. >> + * @src: The string to convert to lowercase. >> + */ >> +static inline void strcpytolower(char *dst, const char *src) >> +{ >> + strlcpytolower(dst, src, -1); >> +} >> + > > Same here, and the 2 below :) > > Thanks Markus, > Luis > >> +/** >> + * strtoupper - Convert string to uppercase. >> + * @s: The string to operate on. >> + */ >> +static inline void strtoupper(char *s) >> +{ >> + strlcpytoupper(s, s, -1); >> +} >> + >> +/** >> + * strtolower - Convert string to lowercase. >> + * @s: The string to operate on. >> + */ >> +static inline void strtolower(char *s) >> +{ >> + strlcpytolower(s, s, -1); >> +} >> + >> #endif /* _LINUX_STRING_H_ */ >> diff --git a/lib/string.c b/lib/string.c >> index ed83562..fd8c427 100644 >> --- a/lib/string.c >> +++ b/lib/string.c >> @@ -952,3 +952,41 @@ char *strreplace(char *s, char old, char new) >> return s; >> } >> EXPORT_SYMBOL(strreplace); >> + >> +/** >> + * strlcpytoupper - Copy a length-limited string and convert to uppercase. >> + * @dst: The buffer to store the result. >> + * @src: The string to convert to uppercase. >> + * @len: Maximum string length. May be SIZE_MAX (-1) to set no limit. >> + */ >> +void strlcpytoupper(char *dst, const char *src, size_t len) >> +{ >> + size_t i; >> + >> + if (!len) >> + return; >> + >> + for (i = 0; i < len && src[i]; ++i) >> + dst[i] = toupper(src[i]); >> + dst[i < len ? 
i : i - 1] = '\0'; >> +} >> +EXPORT_SYMBOL(strlcpytoupper); >> + >> +/** >> + * strlcpytolower - Copy a length-limited string and convert to lowercase. >> + * @dst: The buffer to store the result. >> + * @src: The string to convert to lowercase. >> + * @len: Maximum string length. May be SIZE_MAX (-1) to set no limit. >> + */ >> +void strlcpytolower(char *dst, const char *src, size_t len) >> +{ >> + size_t i; >> + >> + if (!len) >> + return; >> + >> + for (i = 0; i < len && src[i]; ++i) >> + dst[i] = tolower(src[i]); >> + dst[i < len ? i : i - 1] = '\0'; >> +} >> +EXPORT_SYMBOL(strlcpytolower); >> >
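To make the SIZE_MAX suggestion and the truncation rule concrete, here is a userspace re-statement of the loop from the patch. This is a sketch only: the `sketch_` names are invented here, and the cast to `unsigned char` is a userspace `<ctype.h>` precaution rather than something the kernel patch contains.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the patch's loop: copy at most len bytes, always terminate. */
static void sketch_strlcpytoupper(char *dst, const char *src, size_t len)
{
	size_t i;

	if (!len)
		return;

	for (i = 0; i < len && src[i]; ++i)
		dst[i] = (char)toupper((unsigned char)src[i]);

	/* If the limit was reached, the last copied byte becomes the terminator. */
	dst[i < len ? i : i - 1] = '\0';
}

/* In-place variant; SIZE_MAX spells out "no limit" where the patch passes -1. */
static void sketch_strtoupper(char *s)
{
	sketch_strlcpytoupper(s, s, SIZE_MAX);
}
```

Since `len` is a `size_t`, passing -1 converts to the same value as SIZE_MAX, so the change Luis asks for is about readability, not behaviour. Copying "hello" into a 4-byte buffer with this sketch yields "HEL".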
Re: Re: cgroup: Fix split bio been throttled more than once
Hello, Ming. On Fri, Jul 08, 2016 at 06:35:06PM +0800, Ming Lei wrote: > I am wondering why REQ_THROTTLED is cleared for the original bio > even it has been charged and will be issued to driver, and is it allowed > to throttle and charge the same bio for many times? So, IIUC, the flag is just to prevent the bio from recursing while being issued from blk-throtl after queued there for throttling. We can probably extend the flag. I'm not sure how it'd interact with stacked drivers tho. It'd definitely need to be cleared before traveling down to a lower level device. Thanks. -- tejun
Re: [PATCH v2] Add tw5864 driver
Hi Hans, Thanks for the great help. I believe the issues you highlighted are rectified by now. One chunk of your proposed changes seems to be wrong. Also, I have one non-technical change I want to introduce to this driver; see it at the bottom of this letter ("Also, I decided to document known video quality issues in a printed warning..."). On Fri, Jul 01, 2016 at 03:35:40PM +0200, Hans Verkuil wrote: > On 06/10/2016 12:11 AM, Andrey Utkin wrote: > > + cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS; > > This line can be dropped: the v4l2 core will do this automatically. This does not seem to be so: dropping it resulted in new compliance failures: Required ioctls: fail: v4l2-compliance.cpp(550): dcaps & ~caps test VIDIOC_QUERYCAP: FAIL Allow for multiple opens: test second video open: OK fail: v4l2-compliance.cpp(550): dcaps & ~caps test VIDIOC_QUERYCAP: FAIL I am running the latest v4l-utils from git. This particular failure happens on kernels built from next-20160707 and next-20160609. BTW next-20160707 makes my dev machine hang after a few minutes of uptime, regardless of whether my module is loaded, so for now I am testing the driver on next-20160609. This (running an old linux-next) causes this new failure with the latest v4l-utils: fail: v4l2-test-buffers.cpp(293): g_flags() & V4L2_BUF_FLAG_DONE which is understandable because of a recent commit to v4l-utils flipping the expected behaviour in this regard: commit 7d784c6894b10cdf5ec025c2cd7c320320f5f658 Author: Hans Verkuil Date: Fri Jul 8 23:10:34 2016 +0200 v4l2-compliance: fix a check for the DONE flag This was always set by vb2 drivers due to a bug. It is now cleared again after that bug was fixed, but the test should now be inverted. 
Signed-off-by: Hans Verkuil diff --git a/utils/v4l2-compliance/v4l2-test-buffers.cpp b/utils/v4l2-compliance/v4l2-test-buffers.cpp index fb14170..dc82918 100644 --- a/utils/v4l2-compliance/v4l2-test-buffers.cpp +++ b/utils/v4l2-compliance/v4l2-test-buffers.cpp @@ -290,7 +290,7 @@ int buffer::check(unsigned type, unsigned memory, unsigned index, fail_on_test(g_bytesused(p) > g_length(p)); } fail_on_test(!g_timestamp().tv_sec && !g_timestamp().tv_usec); - fail_on_test(!(g_flags() & V4L2_BUF_FLAG_DONE)); + fail_on_test(g_flags() & V4L2_BUF_FLAG_DONE); fail_on_test((int)g_sequence() < seq.last_seq + 1); if (v4l_type_is_video(g_type())) { fail_on_test(g_field() == V4L2_FIELD_ALTERNATE); So please expect this fail in v4l2-compliance logs of my new submission. Also, I decided to document known video quality issues in a printed warning; I like how it looks now both in code and in dmesg, but checkpatch.pl doesn't like it. See commit at https://github.com/bluecherrydvr/linux/commit/83395b6c5e1e5ceb642c9a04a28db5fc22566c87 The message is split in pieces because otherwise it gets truncated. I'd like some approval or suggestion for rework on this. It looks like this in dmesg: [ 5101.182151] tw5864 :06:07.0: BEWARE OF KNOWN ISSUES WITH VIDEO QUALITY This driver was developed by Bluecherry LLC by deducing behaviour of original manufacturer's driver, from both source code and execution traces. It is known that there are some artifacts on output video with this driver: - on all known hardware samples: random pixels of wrong color (mostly white, red or blue) appearing and disappearing on sequences of P-frames; - on some hardware samples (known with H.264 core version e006:2800): total madness on P-frames: blocks of wrong luminance; blocks of wrong colors "creeping" across the picture. There is a workaround for both issues: avoid P-frames by setting GOP size to 1. 
To do that, run such command on device files created by this driver: for dev in /dev/video*; do v4l2-ctl --device $dev --set-ctrl=video_gop_size=1; done [ 5101.357312] systemd-journald[219]: Compressed data object 850 -> 636 using XZ [ 5101.471071] tw5864 :06:07.0: These issues are not decoding errors; all produced H.264 streams are decoded properly. Streams without P-frames don't have these artifacts so it's not analog-to-digital conversion issues nor internal memory errors; we conclude it's internal H.264 encoder issues. We cannot even check the original driver's behaviour because it has never worked properly at all in our development environment. So these issues may be actually related to firmware or hardware. However it may be that there's just some more register settings missing in the driver which would please the hardware. Manufacturer didn't help much on our inquiries, but feel free to disturb again the support of Intersil (owner of former Techwell). And checkpatch says this: $ ./../../../../scripts/checkpatch.pl -f tw5864-core
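Regarding the VIDIOC_QUERYCAP failure quoted earlier in this mail ("dcaps & ~caps"): the expression checks that every bit in device_caps is also set in capabilities. A minimal sketch of that subset check follows; the flag values and names below are illustrative stand-ins defined only for this example, not taken from the driver or from videodev2.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative capability bits, defined only for this sketch. */
#define SKETCH_CAP_VIDEO_CAPTURE 0x00000001u
#define SKETCH_CAP_STREAMING     0x04000000u
#define SKETCH_CAP_DEVICE_CAPS   0x80000000u

/* Passes only when device_caps is a subset of capabilities. */
static bool sketch_caps_consistent(uint32_t capabilities, uint32_t device_caps)
{
	return (device_caps & ~capabilities) == 0;
}

/* What the line under discussion computes in the driver's querycap handler. */
static uint32_t sketch_fill_capabilities(uint32_t device_caps)
{
	return device_caps | SKETCH_CAP_DEVICE_CAPS;
}
```

If the assignment is dropped and nothing else fills in capabilities, device_caps has bits that capabilities lacks and the check fails, which would match the failure reported above.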
Re:[v1.1,1/3] driver: input :touchscreen : add Raydium crc touch function
Hi Dmitry: > >>input_mt_report_slot_state(ts->input, MT_TOOL_FINGER, state); > >> > >> - if (!state) > >> - continue; > >> - > >> - input_report_abs(ts->input, ABS_MT_POSITION_X, > >> + if (state == 0x01) { > > >Why do we need this change? How is it related to CRC? Do you intend to > >report contact as active but not emit any position data if state is > >neither 0 nor 1? > This has no relationship with the CRC; we just want to make sure points are > reported only when state equals 1. >If active contact only reported when state is 0x01 you need to update >the statements above like this: > > input_mt_report_slot_state(ts->input, MT_TOOL_FINGER, > state == 0x01); > > if (state != 0x01) > continue; > >but I am surprised that your firmware would report anything but 0 for >inactive contact. > >Could you document all possible state values? Actually, our firmware can only report touch points with state 1; nothing is done in the other cases. Can I merge the part you suggested into the CRC version of the patch? Thanks. Jeffrey
Re:[v1.1,3/3] modify raydium firmware update rule
Hi dmitry: >> >> modify raydium touch firmware update rule. >> >> >Why? You need to explain why you are proposing a change (but as I >> >mentioned I see no reason for using custom file names for firmware. Have >> >userspace adjust name as needed by the driver. >> >> >Thanks. >> >> Just want to easy to do firmware update version control in the factory. If >> do this, >> factory do not easy update wrong version. > >Just have your factory image rename firmware to canonical name before >initiating update. There is no need to encumber kernel code with this. Okay Thanks. Jeffrey.
[PATCH v1 1/1] x86/platform/intel-mid: Mark regulators explicitly defined
Intel MID platforms are using explicitly defined regulators. Let regulator core know that we do not have any additional regulators left. This lets it substitute unprovided regulators with dummy ones. Without this change when CONFIG_REGULATOR=y the USB driver fails on getting "vbus" regulator and SDHCI can't get "vmmc" and "vqmmc" regulators either. Signed-off-by: Andy Shevchenko --- arch/x86/platform/intel-mid/intel-mid.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/arch/x86/platform/intel-mid/intel-mid.c b/arch/x86/platform/intel-mid/intel-mid.c index 90bb997..ad10fce 100644 --- a/arch/x86/platform/intel-mid/intel-mid.c +++ b/arch/x86/platform/intel-mid/intel-mid.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -144,6 +145,15 @@ static void intel_mid_arch_setup(void) out: if (intel_mid_ops->arch_setup) intel_mid_ops->arch_setup(); + + /* +* Intel MID platforms are using explicitly defined regulators. +* +* Let regulator core know that we do not have any additional +* regulators left. This lets it substitute unprovided regulators with +* dummy ones. +*/ + regulator_has_full_constraints(); } /* MID systems don't have i8042 controller */ -- 2.8.1
Re: [PATCH] qxl: correctly handling failed allocation
Hi,

[auto build test ERROR on drm/drm-next]
[also build test ERROR on v4.7-rc6 next-20160708]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Insu-Yun/qxl-correctly-handling-failed-allocation/20151230-031647
base: git://people.freedesktop.org/~airlied/linux.git drm-next
config: x86_64-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

All errors (new ones prefixed by >>):

   drivers/gpu/drm/qxl/qxl_kms.c: In function 'qxl_device_init':
>> drivers/gpu/drm/qxl/qxl_kms.c:224:11: error: 'struct qxl_device' has no member named 'memslots'; did you mean 'mem_slots'?
     if (!qdev->memslots)
              ^~

vim +224 drivers/gpu/drm/qxl/qxl_kms.c

   218		(~(uint64_t)0) >> (qdev->slot_id_bits + qdev->slot_gen_bits);
   219
   220		qdev->mem_slots =
   221			kmalloc(qdev->n_mem_slots * sizeof(struct qxl_memslot),
   222				GFP_KERNEL);
   223
 > 224		if (!qdev->memslots)
   225			return -ENOMEM;
   226
   227		idr_init(&qdev->release_idr);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
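The build error above comes down to checking a differently spelled member than the one that was just assigned. A userspace sketch of the intended allocate-then-check pattern follows; the struct and function names are hypothetical, and calloc() stands in for kmalloc() purely for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's structures. */
struct sketch_memslot {
	unsigned long start;
	unsigned long end;
};

struct sketch_device {
	size_t n_mem_slots;
	struct sketch_memslot *mem_slots;
};

/* Allocate the slot array and test the same member that was assigned. */
static int sketch_device_init(struct sketch_device *dev, size_t n)
{
	dev->n_mem_slots = n;
	dev->mem_slots = calloc(n, sizeof(*dev->mem_slots));
	if (!dev->mem_slots)
		return -ENOMEM;

	return 0;
}
```

The only point here is that the NULL check names `mem_slots`, the member written on the previous line, which is the spelling the compiler suggests in the report.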
Re: [tip:x86/asm] x86/entry: Inline enter_from_user_mode()
tip-bot for Paolo Bonzini wrote: >Commit-ID: eec4b1227db153ca16f8f5f285d01fefdce05438 >Gitweb: >http://git.kernel.org/tip/eec4b1227db153ca16f8f5f285d01fefdce05438 >Author: Paolo Bonzini >AuthorDate: Mon, 20 Jun 2016 16:58:30 +0200 >Committer: Ingo Molnar >CommitDate: Sat, 9 Jul 2016 10:44:02 +0200 > >x86/entry: Inline enter_from_user_mode() > >This matches what is already done for prepare_exit_to_usermode(), >and saves about 60 clock cycles (4% speedup) with the benchmark >in the previous commit message. > >Signed-off-by: Paolo Bonzini >Reviewed-by: Rik van Riel >Reviewed-by: Andy Lutomirski >Reviewed-by: Rik van Riel >Reviewed-by: Andy Lutomirski >Reviewed-by: Rik van Riel >Reviewed-by: Andy Lutomirski >Reviewed-by: Rik van Riel >Reviewed-by: Andy Lutomirski >Acked-by: Paolo Bonzini Woohaa, if that amount of review doesn't get this patch upstream I don't know what will ;-) -- Sent from a small device: formatting sucks and brevity is inevitable.
Re: [PATCH net] udp: prevent bugcheck if filter truncates packet too much
On Sat, Jul 9, 2016 at 6:43 AM, Michal Kubecek wrote: > On Sat, Jul 09, 2016 at 11:48:49AM +0200, Daniel Borkmann wrote: >> On 07/09/2016 02:20 AM, Alexei Starovoitov wrote: >> >On Sat, Jul 09, 2016 at 01:31:40AM +0200, Eric Dumazet wrote: >> >>On Fri, 2016-07-08 at 17:52 +0200, Michal Kubecek wrote: >> >>>If socket filter truncates an udp packet below the length of UDP header >> >>>in udpv6_queue_rcv_skb() or udp_queue_rcv_skb(), it will trigger a >> >>>BUG_ON in skb_pull_rcsum(). This BUG_ON (and therefore a system crash if >> >>>kernel is configured that way) can be easily enforced by an unprivileged >> >>>user which was reported as CVE-2016-6162. For a reproducer, see >> >>>http://seclists.org/oss-sec/2016/q3/8 >> >>> >> >>>Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before >> >>>queueing") >> >>>Reported-by: Marco Grassi >> >>>Signed-off-by: Michal Kubecek >> >>>--- > >> >>Acked-by: Eric Dumazet >> > >> >this is incomplete fix. Please do not apply. See discussion at >> >security@kernel >> >> Ohh well, didn't see it earlier before starting the discussion at >> security@... >> >> I'm okay if we take this for now as a quick band aid and find a better >> way how to deal with the underlying issue long-term so that it's >> /guaranteed/ that it doesn't bite us any further in such fragile ways. > > Agreed. As rc7 is due in a day or two, rushing a complex and intrusive > solution in might be too risky. Acked-by: Willem de Bruijn Thanks, Michal.
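The diff itself is not quoted in this part of the thread, so the following is not the patch. It is only a userspace sketch of the kind of guard the description implies: refuse to strip the UDP header when the filter has left fewer bytes than the header occupies. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_UDP_HDR_LEN 8	/* a UDP header is 8 bytes long */

/* Hypothetical stand-in for a packet buffer after the socket filter ran. */
struct sketch_pkt {
	const unsigned char *data;
	size_t len;
};

/*
 * Strip the UDP header. Returns false (caller should drop the packet)
 * when the remaining length is shorter than the header, instead of
 * pulling past the end of the data.
 */
static bool sketch_pull_udp_header(struct sketch_pkt *p)
{
	if (p->len < SKETCH_UDP_HDR_LEN)
		return false;

	p->data += SKETCH_UDP_HDR_LEN;
	p->len -= SKETCH_UDP_HDR_LEN;
	return true;
}
```

In this sketch a packet truncated to 4 bytes is rejected and left untouched, while a 16-byte packet is accepted and left with 8 bytes of payload.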
Re: [PATCH v2] input: tablet: pegasus_notetaker: USB PM fixes
Am 2016-07-08 um 23:08 schrieb Dmitry Torokhov: > On Tue, Jun 28, 2016 at 06:17:13PM +0200, Martin Kepplinger wrote: >> Am 2016-06-23 um 19:18 schrieb Dmitry Torokhov: >>> Hi Martin, >>> >>> On Tue, Jun 14, 2016 at 01:20:15PM +0200, Martin Kepplinger wrote: static int pegasus_reset_resume(struct usb_interface *intf) { + struct pegasus *pegasus = usb_get_intfdata(intf); + + if (pegasus->dev->users) + pegasus_set_mode(pegasus, PEN_MODE_XY, NOTETAKER_LED_MOUSE); + return pegasus_resume(intf); >>> >>> Hmm, we need to take input mutex when using pegasus->dev->users, how >>> about the version below instead? >>> >>> Thanks. >>> >> >> Sorry for the delay, give me a few more days to test and confirm this or >> come up with a final patch. > > Martin, did you have time to try out this version of the patch? > > Thanks! > This patch doesn't seem to work as is. Holidays get in the way, but you can expect a working patch(set) next week. martin
Re: [PATCH v3 1/7] lib: string: add functions to case-convert strings
On 08/07/16 23:43, Markus Mayer wrote: > Add a collection of generic functions to convert strings to lowercase > or uppercase. > > Changing the case of a string (with or without copying it first) seems > to be a recurring requirement in the kernel that is currently being > solved by several duplicated implementations doing the same thing. This > change aims at reducing this code duplication. > > The new functions are > void strlcpytoupper(char *dst, const char *src, size_t len); > void strlcpytolower(char *dst, const char *src, size_t len); > void strcpytoupper(char *dst, const char *src); > void strcpytolower(char *dst, const char *src); > void strtoupper(char *s); > void strtolower(char *s); > > The "str[l]cpyto*" versions of the function take a destination string > and a source string as arguments. The "strlcpyto*" versions additionally > take a length argument like strlcpy() itself. Lastly, the strto* > functions take a single string argument and modify the passed-in string. > > Like strlcpy(), and unlike strncpy(), the functions guarantee NULL > termination of the destination string. > > Signed-off-by: Markus Mayer > --- > include/linux/string.h | 40 > lib/string.c | 38 ++ > 2 files changed, 78 insertions(+) > > diff --git a/include/linux/string.h b/include/linux/string.h > index 26b6f6a..36c9d14 100644 > --- a/include/linux/string.h > +++ b/include/linux/string.h > @@ -116,6 +116,8 @@ extern void * memchr(const void *,int,__kernel_size_t); > #endif > void *memchr_inv(const void *s, int c, size_t n); > char *strreplace(char *s, char old, char new); > +extern void strlcpytoupper(char *dst, const char *src, size_t len); > +extern void strlcpytolower(char *dst, const char *src, size_t len); > > extern void kfree_const(const void *x); > > @@ -169,4 +171,42 @@ static inline const char *kbasename(const char *path) > return tail ? tail + 1 : path; > } > > +/** > + * strcpytoupper - Copy string and convert to uppercase. > + * @dst: The buffer to store the result. 
> + * @src: The string to convert to uppercase. > + */ > +static inline void strcpytoupper(char *dst, const char *src) > +{ > + strlcpytoupper(dst, src, -1); > +} > + Why not use SIZE_MAX instead of -1? > +/** > + * strcpytolower - Copy string and convert to lowercase. > + * @dst: The buffer to store the result. > + * @src: The string to convert to lowercase. > + */ > +static inline void strcpytolower(char *dst, const char *src) > +{ > + strlcpytolower(dst, src, -1); > +} > + Same here, and the 2 below :) Thanks Markus, Luis > +/** > + * strtoupper - Convert string to uppercase. > + * @s: The string to operate on. > + */ > +static inline void strtoupper(char *s) > +{ > + strlcpytoupper(s, s, -1); > +} > + > +/** > + * strtolower - Convert string to lowercase. > + * @s: The string to operate on. > + */ > +static inline void strtolower(char *s) > +{ > + strlcpytolower(s, s, -1); > +} > + > #endif /* _LINUX_STRING_H_ */ > diff --git a/lib/string.c b/lib/string.c > index ed83562..fd8c427 100644 > --- a/lib/string.c > +++ b/lib/string.c > @@ -952,3 +952,41 @@ char *strreplace(char *s, char old, char new) > return s; > } > EXPORT_SYMBOL(strreplace); > + > +/** > + * strlcpytoupper - Copy a length-limited string and convert to uppercase. > + * @dst: The buffer to store the result. > + * @src: The string to convert to uppercase. > + * @len: Maximum string length. May be SIZE_MAX (-1) to set no limit. > + */ > +void strlcpytoupper(char *dst, const char *src, size_t len) > +{ > + size_t i; > + > + if (!len) > + return; > + > + for (i = 0; i < len && src[i]; ++i) > + dst[i] = toupper(src[i]); > + dst[i < len ? i : i - 1] = '\0'; > +} > +EXPORT_SYMBOL(strlcpytoupper); > + > +/** > + * strlcpytolower - Copy a length-limited string and convert to lowercase. > + * @dst: The buffer to store the result. > + * @src: The string to convert to lowercase. > + * @len: Maximum string length. May be SIZE_MAX (-1) to set no limit. 
> + */ > +void strlcpytolower(char *dst, const char *src, size_t len) > +{ > + size_t i; > + > + if (!len) > + return; > + > + for (i = 0; i < len && src[i]; ++i) > + dst[i] = tolower(src[i]); > + dst[i < len ? i : i - 1] = '\0'; > +} > +EXPORT_SYMBOL(strlcpytolower); >
Re: [LEDE-DEV] DHCP via bridge in case of IPv4
Hi Aaron,

On Sat, 2016-07-09 at 07:47 -0400, Aaron Z wrote:
> On Sat, Jul 9, 2016 at 4:37 AM, Alexey Brodkin wrote:
> > Hello,
> >
> > I was playing with quite simple bridged setup on different boards with very recent kernels (4.6.3 as of this writing) and found one interesting behavior that I cannot yet understand and googling didn't help here as well.
> >
> > My setup is pretty simple:
> >
> >  +-----------+   +--------------+   +-----------------------+
> >  | HOST      |   |  "Dumb AP"   |   | Wireless client       |
> >  | with DHCP |<->(eth0)   (wlan0)<->| attempting to         |
> >  | server    |   |\    br0    / |   | get settings via DHCP |
> >  +-----------+   +--------------+   +-----------------------+
> >
> > * HOST is my laptop with DHCP server that works for sure.
> > * "Dumb AP" is a separate board (I tried ARM-based Wandboard and ARC-based AXS10x boards but results are exactly the same) with wired (eth0) and wireless (wlan0) network controllers bridged together (br0). That "br0" bridge flawlessly gets its settings from DHCP server on host.
> > * Wireless client could be either a smartphone or another laptop etc but what's important it should be configured to get network settings by DHCP as well.
> >
> > So what happens "br0" always gets network settings from DHCP server on HOST. That's fine. But wireless client only reliably gets settings from DHCP server if IPv6 is enabled on "Dumb AP" board. If IPv6 is disabled I may see that wireless client sends "DHCP Discover" then server replies with "DHCP Offer" but that offer never reaches wireless client.
>
> Do you have WDS enabled? If not, DHCP has issues in that scenario:
> https://wiki.openwrt.org/doc/howto/clientmode

I don't have WDS enabled. I tried to keep the setup as simple as possible.

Still, from what I see in the Wiki article above, the problem happens when there are 4 devices in the chain, right? Because as it says:

--->8---
The 802.11 standard only uses three MAC addresses for frames transmitted between the Access Point and the Station. Frames transmitted from the Station to the AP don't include the ethernet source MAC of the requesting host and response frames are missing the destination ethernet MAC to address the target host behind the client bridge.
--->8---

But in my case I only have 3 devices in the chain, so I would think it's something other than the issue described in the article.

Anyways thanks for the hint.

-Alexey