Re: [RFC PATCH 0/7] mm: Get rid of vmalloc_sync_(un)mappings()
On Sat, May 9, 2020 at 10:52 AM Joerg Roedel wrote: > > On Fri, May 08, 2020 at 04:49:17PM -0700, Andy Lutomirski wrote: > > On Fri, May 8, 2020 at 2:36 PM Joerg Roedel wrote: > > > > > > On Fri, May 08, 2020 at 02:33:19PM -0700, Andy Lutomirski wrote: > > > > On Fri, May 8, 2020 at 7:40 AM Joerg Roedel wrote: > > > > > > > What's the maximum on other system types? It might make more sense to > > > > take the memory hit and pre-populate all the tables at boot so we > > > > never have to sync them. > > > > > > Need to look it up for 5-level paging, with 4-level paging its 64 pages > > > to pre-populate the vmalloc area. > > > > > > But that would not solve the problem on x86-32, which needs to > > > synchronize unmappings on the PMD level. > > > > What changes in this series with x86-32? > > This series sets ARCH_PAGE_TABLE_SYNC_MASK to PGTBL_PMD_MODIFIED, so > that the synchronization happens every time PMD(s) in the vmalloc areas > are changed. Before this series this synchronization only happened at > arbitrary places calling vmalloc_sync_(un)mappings(). > > > We already do that synchronization, right? IOW, in the cases where > > the vmalloc *fault* code does anything at all, we should have a small > > bound for how much memory to preallocate and, if we preallocate it, > > then there is nothing to sync and nothing to fault. And we have the > > benefit that we never need to sync anything on 64-bit, which is kind > > of nice. > > Don't really get you here, what is pre-allocated and why is there no > need to sync and fault then? > > > Do we actually need PMD-level things for 32-bit? What if we just > > outlawed huge pages in the vmalloc space on 32-bit non-PAE? > > Disallowing huge-pages would at least remove the need to sync > unmappings, but we still need to sync new PMD entries. Remember that the > size of the vmalloc area on 32 bit is dynamic and depends on the VM-split > and the actual amount of RAM on the system. > > A machine wit 512MB of RAM and a 1G/3G split will have around 2.5G of > VMALLOC address space. And if we want to avoid vmalloc-faults there, we > need to pre-allocate all PTE pages for that area (and the amount of PTE > pages needed increases when RAM decreases). > > On a machine with 512M of RAM we would need ca. 1270+ PTE pages, which > is around 5M (or 1% of total system memory). I can never remember which P?D name goes with which level and which machine type, but I don't think I agree with your math regardless. On x86, there are two fundamental situations that can occur: 1. Non-PAE. There is a single 4k top-level page table per mm, and this table contains either 512 or 1024 entries total. Of those entries, some fraction (half or less) control the kernel address space, and some fraction of *that* is for vmalloc space. Those entries are the *only* thing that needs syncing -- all mms will either have null (not present) in those slots or will have pointers to the *same* next-level-down directories. 2. PAE. Depending on your perspective, there could be a grand total of four top-level paging pointers, of which one (IIRC) is for the kernel. That points to the same place for all mms. Or, if you look at it the other way, PAE is just like #1 except that the top-level table has only four entries and only one points to VMALLOC space. So, unless I'm missing something here, there is an absolute maximum of 512 top-level entries that ever need to be synchronized. Now, there's an additional complication. On x86_64, we have a rule: those entries that need to be synced start out null and may, during the lifetime of the system, change *once*. They are never unmapped or modified after being allocated. This means that those entries can only ever point to a page *table* and not to a ginormous page. So, even if the hardware were to support ginormous pages (which, IIRC, it doesn't), we would be limited to merely immense and not ginormous pages in the vmalloc range. On x86_32, I don't think we have this rule right now. And this means that it's possible for one of these pages to be unmapped or modified. So my suggestion is that just apply the x86_64 rule to x86_32 as well. The practical effect will be that 2-level-paging systems will not be able to use huge pages in the vmalloc range, since the rule will be that the vmalloc-relevant entries in the top-level table must point to page *tables* instead of huge pages. On top of this, if we preallocate these entries, then the maximum amount of memory we can possibly waste is 4k * (entries pointing to vmalloc space - entries actually used for vmalloc space). I don't know what this number typically is, but I don't think it's very large. Preallocating means that vmalloc faults *and* synchronization go away entirely. All of the page tables used for vmalloc will be entirely shared by all mms, so all that's needed to modify vmalloc mappings is to update init_mm and, if needed, flush TLBs. No other page tables will need modification at all. On
Re: [PATCH 2/2] fpga: zynq: Add AFI config driver
Hi Nava, On Mon, May 04, 2020 at 11:55:23AM +, Nava kishore Manne wrote: > Hi Mortiz, > > Thanks for proving the comments. > Please find my response inline. > > > -Original Message- > > From: Moritz Fischer [mailto:m...@kernel.org] > > Sent: Thursday, April 23, 2020 8:59 AM > > To: Nava kishore Manne > > Cc: m...@kernel.org; Michal Simek ; linux- > > f...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linux- > > ker...@vger.kernel.org; chinnikishore...@gmail.com > > Subject: Re: [PATCH 2/2] fpga: zynq: Add AFI config driver > > > > Hi Nava, > > > > On Wed, Apr 15, 2020 at 03:54:50PM +0530, Nava kishore Manne wrote: > > > This patch Adds AFI config driver. This is useful for the PS to PL > > > configuration for the fpga manager On zynq platform. > > > > > > Signed-off-by: Nava kishore Manne > > > --- > > > drivers/fpga/Kconfig| 8 + > > > drivers/fpga/Makefile | 1 + > > > drivers/fpga/zynq-afi.c | 81 > > > + > > > 3 files changed, 90 insertions(+) > > > create mode 100644 drivers/fpga/zynq-afi.c > > > > > > diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig index > > > 474f304e..60982a0 100644 > > > --- a/drivers/fpga/Kconfig > > > +++ b/drivers/fpga/Kconfig > > > @@ -214,4 +214,12 @@ config FPGA_MGR_ZYNQMP_FPGA > > > to configure the programmable logic(PL) through PS > > > on ZynqMP SoC. > > > > > > +config FPGA_MGR_ZYNQ_AFI_FPGA > > > + bool "Xilinx AFI FPGA" > > > + depends on FPGA_MGR_ZYNQ_FPGA > > Curious. How does this dependency play in here? > > > + help > > > + Zynq AFI driver support for writing to the AFI registers > > > + for configuring the PS_PL interface. For some of the bitstream > > > + or designs to work the PS to PL interfaces need to be configured > > > + like the data bus-width etc. > > > endif # FPGA > > > diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile index > > > 312b937..d115e29 100644 > > > --- a/drivers/fpga/Makefile > > > +++ b/drivers/fpga/Makefile > > > @@ -26,6 +26,7 @@ obj-$(CONFIG_FPGA_BRIDGE) += fpga- > > bridge.o > > > obj-$(CONFIG_SOCFPGA_FPGA_BRIDGE)+= altera-hps2fpga.o altera- > > fpga2sdram.o > > > obj-$(CONFIG_ALTERA_FREEZE_BRIDGE) += altera-freeze-bridge.o > > > obj-$(CONFIG_XILINX_PR_DECOUPLER)+= xilinx-pr-decoupler.o > > > +obj-$(CONFIG_FPGA_MGR_ZYNQ_AFI_FPGA) += zynq-afi.o > > > > > > # High Level Interfaces > > > obj-$(CONFIG_FPGA_REGION)+= fpga-region.o > > > diff --git a/drivers/fpga/zynq-afi.c b/drivers/fpga/zynq-afi.c new > > > file mode 100644 index 000..7ce0d08 > > > --- /dev/null > > > +++ b/drivers/fpga/zynq-afi.c > > > @@ -0,0 +1,81 @@ > > > +// SPDX-License-Identifier: GPL-2.0 > > > +/* > > > + * Xilinx FPGA AFI driver. > > > + * Copyright (c) 2018 Xilinx Inc. > > > + */ > > > + > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > + > > > +/* Registers and special values for doing register-based operations */ > > > +#define AFI_RDCHAN_CTRL_OFFSET 0x00 > > > +#define AFI_WRCHAN_CTRL_OFFSET 0x14 > > > + > > > +#define AFI_BUSWIDTH_MASK0x01 > > > + > > > +/** > > > + * struct afi_fpga - AFI register description > > > + * @membase: pointer to register struct > > > + * @afi_width: AFI bus width to be written > > > + */ > > > +struct zynq_afi_fpga { > > > + void __iomem*membase; > > > + u32 afi_width; > > > +}; > > > + > > > +static int zynq_afi_fpga_probe(struct platform_device *pdev) { > > > + struct zynq_afi_fpga *afi_fpga; > > > + struct resource *res; > > > + u32 reg_val; > > > + u32 val; > > > + > > > + afi_fpga = devm_kzalloc(>dev, sizeof(*afi_fpga), GFP_KERNEL); > > > + if (!afi_fpga) > > > + return -ENOMEM; > > > + > > > + res = platform_get_resource(pdev, IORESOURCE_MEM, 0); > > > + afi_fpga->membase = devm_ioremap_resource(>dev, res); > > > + if (IS_ERR(afi_fpga->membase)) > > > + return PTR_ERR(afi_fpga->membase); > > > + > > > + val = device_property_read_u32(>dev, "xlnx,afi-width", > > > +_fpga->afi_width); > > > + if (val) { > > > + dev_err(>dev, "Fail to get the afi bus width\n"); > > > + return -EINVAL; > > > + } > > > + > > > + reg_val = readl(afi_fpga->membase + AFI_RDCHAN_CTRL_OFFSET); > > > + reg_val &= ~AFI_BUSWIDTH_MASK; > > > + writel(reg_val | afi_fpga->afi_width, > > > +afi_fpga->membase + AFI_RDCHAN_CTRL_OFFSET); > > > + reg_val = readl(afi_fpga->membase + AFI_WRCHAN_CTRL_OFFSET); > > > + reg_val &= ~AFI_BUSWIDTH_MASK; > > > + writel(reg_val | afi_fpga->afi_width, > > > +afi_fpga->membase + AFI_WRCHAN_CTRL_OFFSET); > > > + > > > + return 0; > > > +} > > > + > > > +static const struct of_device_id zynq_afi_fpga_ids[] = { > > > + { .compatible = "xlnx,zynq-afi-fpga" }, > > > + { }, > > > +}; > > > +MODULE_DEVICE_TABLE(of, zynq_afi_fpga_ids); > > > + > > > +static struct platform_driver
Re: [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint
On 05/09, Chao Yu wrote: > On 2020/5/9 0:10, Jaegeuk Kim wrote: > > Hi Sayali, > > > > In order to address the perf regression, how about this? > > > >>From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001 > > From: Jaegeuk Kim > > Date: Fri, 8 May 2020 09:08:37 -0700 > > Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint > > > > There could be a scenario where f2fs_sync_node_pages gets > > called during checkpoint, which in turn tries to flush > > inline data and calls iput(). This results in deadlock as > > iput() tries to hold cp_rwsem, which is already held at the > > beginning by checkpoint->block_operations(). > > > > Call stack : > > > > Thread AThread B > > f2fs_write_checkpoint() > > - block_operations(sbi) > > - f2fs_lock_all(sbi); > > - down_write(>cp_rwsem); > > > > - open() > > - igrab() > > - write() write inline data > > - unlink() > > - f2fs_sync_node_pages() > > - if (is_inline_node(page)) > > - flush_inline_data() > >- ilookup() > > page = f2fs_pagecache_get_page() > > if (!page) > > goto iput_out; > > iput_out: > > -close() > > -iput() > >iput(inode); > >- f2fs_evict_inode() > > - f2fs_truncate_blocks() > > - f2fs_lock_op() > >- down_read(>cp_rwsem); > > > > Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to > > inline_data") > > Signed-off-by: Sayali Lokhande > > Signed-off-by: Jaegeuk Kim > > --- > > fs/f2fs/node.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > > index 1db8cabf727ef..626d7daca09de 100644 > > --- a/fs/f2fs/node.c > > +++ b/fs/f2fs/node.c > > @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, > > goto continue_unlock; > > } > > > > - /* flush inline_data */ > > - if (is_inline_node(page)) { > > + /* flush inline_data, if it's not sync path. */ > > + if (do_balance && is_inline_node(page)) { > > IIRC, this flow was designed to avoid running out of free space issue > during checkpoint: > > 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") > > The sceanrio is: > 1. create fully node blocks > 2. flush node blocks > 3. write inline_data for all the node blocks again > 4. flush node blocks redundantly > > I guess this may cause failing one case of fstest. Yeah, actually I was hitting 204 failure, and thus, revised like this. Now, I don't see any regression in fstest. >From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001 From: Sayali Lokhande Date: Thu, 30 Apr 2020 16:28:29 +0530 Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint There could be a scenario where f2fs_sync_node_pages gets called during checkpoint, which in turn tries to flush inline data and calls iput(). This results in deadlock as iput() tries to hold cp_rwsem, which is already held at the beginning by checkpoint->block_operations(). Call stack : Thread AThread B f2fs_write_checkpoint() - block_operations(sbi) - f2fs_lock_all(sbi); - down_write(>cp_rwsem); - open() - igrab() - write() write inline data - unlink() - f2fs_sync_node_pages() - if (is_inline_node(page)) - flush_inline_data() - ilookup() page = f2fs_pagecache_get_page() if (!page) goto iput_out; iput_out: -close() -iput() iput(inode); - f2fs_evict_inode() - f2fs_truncate_blocks() - f2fs_lock_op() - down_read(>cp_rwsem); Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") Signed-off-by: Sayali Lokhande Signed-off-by: Jaegeuk Kim --- fs/f2fs/checkpoint.c | 9 - fs/f2fs/f2fs.h | 4 ++-- fs/f2fs/node.c | 10 +- 3 files changed, 15 insertions(+), 8 deletions(-) diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index d49f7a01d8a26..928aea4ff663d 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi) }; int err = 0, cnt = 0; + /* +* Let's flush node pages first to flush inline_data. +* We'll actually guarantee everything below under f2fs_lock_all. +*/ + f2fs_sync_node_pages(sbi, , false, false, FS_CP_NODE_IO); + retry_flush_quotas: f2fs_lock_all(sbi); if (__need_flush_quota(sbi)) { @@ -1222,7 +1228,8 @@ static int block_operations(struct f2fs_sb_info *sbi) if (get_pages(sbi, F2FS_DIRTY_NODES)) {
Re: [PATCH v3] kernel: add panic_on_taint
On Sat, May 09, 2020 at 09:57:37AM -0400, Rafael Aquini wrote: > Analogously to the introduction of panic_on_warn, this patch > introduces a kernel option named panic_on_taint in order to > provide a simple and generic way to stop execution and catch > a coredump when the kernel gets tainted by any given taint flag. > > This is useful for debugging sessions as it avoids rebuilding > the kernel to explicitly add calls to panic() or BUG() into > code sites that introduce the taint flags of interest. > Another, perhaps less frequent, use for this option would be > as a mean for assuring a security policy (in paranoid mode) > case where no single taint is allowed for the running system. > > Suggested-by: Qian Cai > Signed-off-by: Rafael Aquini Reviewed-by: Kees Cook -- Kees Cook
Re: [PATCH] fpga: zynqmp: fix modular build
On Tue, May 05, 2020 at 04:00:11PM +0200, Arnd Bergmann wrote: > Two symbols need to be exported to allow the zynqmp-fpga module > to get loaded dynamically: > > ERROR: modpost: "zynqmp_pm_fpga_load" [drivers/fpga/zynqmp-fpga.ko] undefined! > ERROR: modpost: "zynqmp_pm_fpga_get_status" [drivers/fpga/zynqmp-fpga.ko] > undefined! > > To ensure this is done correctly, also fix the Kconfig dependency > to only allow building the fpga driver when the firmware driver is > either disabled, or when it is reachable. With that, the dependency > on the SoC itself can be removed, and there are no surprises when > the fpga driver is built-in but the firmware a module. > > Fixes: 4db8180ffe7c ("firmware: xilinx: Remove eemi ops for fpga related > APIs") > Signed-off-by: Arnd Bergmann > --- > drivers/fpga/Kconfig | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig > index b2408a710662..7cd5a29fc437 100644 > --- a/drivers/fpga/Kconfig > +++ b/drivers/fpga/Kconfig > @@ -208,7 +208,7 @@ config FPGA_DFL_PCI > > config FPGA_MGR_ZYNQMP_FPGA > tristate "Xilinx ZynqMP FPGA" > - depends on ARCH_ZYNQMP || COMPILE_TEST > + depends on ZYNQMP_FIRMWARE || (!ZYNQMP_FIRMWARE && COMPILE_TEST) > help > FPGA manager driver support for Xilinx ZynqMP FPGAs. > This driver uses the processor configuration port(PCAP) > -- > 2.26.0 > Applied to fixes, Thanks
My Dear in the lord
My Dear in the lord My name is Mrs. Mina A. Brunel I am a Norway Citizen who is living in Burkina Faso, I am married to Mr. Brunel Patrice, a politician who owns a small gold company in Burkina Faso; He died of Leprosy and Radesyge, in the year February 2010, During his lifetime he deposited the sum of € 8.5 Million Euro) Eight million, Five hundred thousand Euros in a bank in Rome the capital city of Italy in Southern Europe. The money was from the sale of his company and death benefits payment and entitlements of my deceased husband by his company. I am sending you this message with heavy tears in my eyes and great sorrow in my heart, and also praying that it will reach you in good health because I am not in good health, I sleep every night without knowing if I may be alive to see the next day. I am suffering from long time cancer and presently I am partially suffering from Leprosy, which has become difficult for me to move around. I was married to my late husband for more than 6 years without having a child and my doctor confided that I have less chance to live, having to know when the cup of death will come, I decided to contact you to claim the fund since I don't have any relation I grew up from an orphanage home. I have decided to donate this money for the support of helping Motherless babies/Less privileged/Widows and churches also to build the house of God because I am dying and diagnosed with cancer for about 3 years ago. I have decided to donate from what I have inherited from my late husband to you for the good work of Almighty God; I will be going in for an operation surgery soon. Now I want you to stand as my next of kin to claim the funds for charity purposes. Because of this money remains unclaimed after my death, the bank executives or the government will take the money as unclaimed fund and maybe use it for selfishness and worthless ventures, I need a very honest person who can claim this money and use it for Charity works, for orphanages, widows and also build schools and churches for less privilege that will be named after my late husband and my name. I need your urgent answer to know if you will be able to execute this project, and I will give you more information on how the fund will be transferred to your bank account or online banking. Thanks Mrs. Mina A. Brunel
Re: [PATCH v8 8/8] powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32
Le 09/05/2020 à 17:54, Christophe Leroy a écrit : Le 28/04/2020 à 18:05, Arnd Bergmann a écrit : On Tue, Apr 28, 2020 at 3:16 PM Christophe Leroy wrote: Provides __kernel_clock_gettime64() on vdso32. This is the 64 bits version of __kernel_clock_gettime() which is y2038 compliant. Signed-off-by: Christophe Leroy Looks good to me Reviewed-by: Arnd Bergmann There was a bug on ARM for the corresponding function, so far it is unclear if this was a problem related to particular hardware, the 32-bit kernel code, or the common implementation of clock_gettime64 in the vdso library, see https://github.com/richfelker/musl-cross-make/issues/96 Just to be sure that powerpc is not affected by the same issue, can you confirm that repeatedly calling clock_gettime64 on powerpc32, alternating between vdso and syscall, results in monotically increasing times? I think that's one of the things vdsotest checks, so yes that's ok I think. Here is the full result with vdsotest: gettimeofday: syscall: 3715 nsec/call gettimeofday:libc: 794 nsec/call gettimeofday:vdso: 947 nsec/call getcpu: syscall: 1614 nsec/call getcpu:libc: 484 nsec/call getcpu:vdso: 184 nsec/call clock-gettime64-realtime-coarse: syscall: 3152 nsec/call clock-gettime64-realtime-coarse:libc: not tested clock-gettime64-realtime-coarse:vdso: 653 nsec/call clock-getres-realtime-coarse: syscall: 2963 nsec/call clock-getres-realtime-coarse:libc: 745 nsec/call clock-getres-realtime-coarse:vdso: 553 nsec/call clock-gettime-realtime-coarse: syscall: 5120 nsec/call clock-gettime-realtime-coarse:libc: 731 nsec/call clock-gettime-realtime-coarse:vdso: 577 nsec/call clock-gettime64-realtime: syscall: 3719 nsec/call clock-gettime64-realtime:libc: not tested clock-gettime64-realtime:vdso: 976 nsec/call clock-getres-realtime: syscall: 2496 nsec/call clock-getres-realtime:libc: 745 nsec/call clock-getres-realtime:vdso: 546 nsec/call clock-gettime-realtime: syscall: 4800 nsec/call clock-gettime-realtime:libc: 1080 nsec/call clock-gettime-realtime:vdso: 1798 nsec/call clock-gettime64-boottime: syscall: 4132 nsec/call clock-gettime64-boottime:libc: not tested clock-gettime64-boottime:vdso: 975 nsec/call clock-getres-boottime: syscall: 2497 nsec/call clock-getres-boottime:libc: 730 nsec/call clock-getres-boottime:vdso: 546 nsec/call clock-gettime-boottime: syscall: 3728 nsec/call clock-gettime-boottime:libc: 1079 nsec/call clock-gettime-boottime:vdso: 941 nsec/call clock-gettime64-tai: syscall: 4148 nsec/call clock-gettime64-tai:libc: not tested clock-gettime64-tai:vdso: 955 nsec/call clock-getres-tai: syscall: 2494 nsec/call clock-getres-tai:libc: 730 nsec/call clock-getres-tai:vdso: 545 nsec/call clock-gettime-tai: syscall: 3729 nsec/call clock-gettime-tai:libc: 1079 nsec/call clock-gettime-tai:vdso: 927 nsec/call clock-gettime64-monotonic-raw: syscall: 3677 nsec/call clock-gettime64-monotonic-raw:libc: not tested clock-gettime64-monotonic-raw:vdso: 1032 nsec/call clock-getres-monotonic-raw: syscall: 2494 nsec/call clock-getres-monotonic-raw:libc: 745 nsec/call clock-getres-monotonic-raw:vdso: 545 nsec/call clock-gettime-monotonic-raw: syscall: 3276 nsec/call clock-gettime-monotonic-raw:libc: 1140 nsec/call clock-gettime-monotonic-raw:vdso: 1002 nsec/call clock-gettime64-monotonic-coarse: syscall: 4099 nsec/call clock-gettime64-monotonic-coarse:libc: not tested clock-gettime64-monotonic-coarse:vdso: 653 nsec/call clock-getres-monotonic-coarse: syscall: 2962 nsec/call clock-getres-monotonic-coarse:libc: 745 nsec/call clock-getres-monotonic-coarse:vdso: 545 nsec/call clock-gettime-monotonic-coarse: syscall: 4297 nsec/call clock-gettime-monotonic-coarse:libc: 730 nsec/call clock-gettime-monotonic-coarse:vdso: 592 nsec/call clock-gettime64-monotonic: syscall: 3863 nsec/call clock-gettime64-monotonic:libc: not tested clock-gettime64-monotonic:vdso: 975 nsec/call clock-getres-monotonic: syscall: 2494 nsec/call clock-getres-monotonic:libc: 745 nsec/call clock-getres-monotonic:vdso: 545 nsec/call clock-gettime-monotonic: syscall: 3465 nsec/call clock-gettime-monotonic:libc: 1079 nsec/call clock-gettime-monotonic:vdso: 927 nsec/call Christophe
Re: [PATCH] staging: vc04_services: interface: vchiq_arm: vchiq_connected.c: Block comments should align the * on each line
Hi Greg Yea, thanks for the feedback. Will fix and resend... John On Sat, 9 May 2020 at 18:11, Greg KH wrote: > > On Sat, May 09, 2020 at 02:07:14PM +0100, John Oldman wrote: > > Coding style issue > > Your subject line needs to be much shorter, don't you think? > > Please fix up and resend. > > greg k-h
Re: [PATCH 00/15] net: taint when the device driver firmware crashes
On Sat, 9 May 2020 04:35:37 + Luis Chamberlain wrote: > Device driver firmware can crash, and sometimes, this can leave your > system in a state which makes the device or subsystem completely > useless. Detecting this by inspecting /proc/sys/kernel/tainted instead > of scraping some magical words from the kernel log, which is driver > specific, is much easier. So instead this series provides a helper which > lets drivers annotate this and shows how to use this on networking > drivers. > > My methodology for finding when firmware crashes is to git grep for > "crash" and then doing some study of the code to see if this indeed > a place where the firmware crashes. In some places this is quite > obvious. > > I'm starting off with networking first, if this gets merged later on I > can focus on the other drivers, but I already have some work done on > other subsytems. > > Review, flames, etc are greatly appreciated. Tainting itself may be useful, but that's just the first step. I'd much rather see folks start using the devlink health infrastructure. Devlink is netlink based, but it's _not_ networking specific (many of its optional features obviously are, but don't let that mislead you). With devlink health we get (a) a standard notification on the failure; (b) information/state dump in a (somewhat) structured form, which can be collected & shared with vendors; (c) automatic remediation (usually device reset of some scope). Now regarding the tainting - as I said it may be useful, but don't we have to define what constitutes a "firmware crash"? There are many failure modes, some perfectly recoverable (e.g. processing queue hang), some mere bugs (e.g. device fails to initialize some functions). All of them may impact the functioning of the system. How do we choose those that taint?
Re: [PATCH v1 2/3] armv8: gpio: add gpio feature
> Oh, and... u-b...@linux.nxdi.nxp.com bounces because that domain is > not resolvable - I guess that is internal to NXP, and this patch > should have remained within NXP and not been posted publically. Yeah, realized it just after sending my reply. It is for internal NXP list. Had it been for open source U-boot, Author would have chosen "u-b...@lists.denx.de" ? But, I still believe Author did it unknowingly and we all should be easy on him/her. Thanks -Amit
Re: [PATCH v1 2/3] armv8: gpio: add gpio feature
On Sat, May 09, 2020 at 07:18:45PM +0100, Russell King - ARM Linux admin wrote: > On Sat, May 09, 2020 at 11:34:59PM +0530, Amit Tomer wrote: > > > From what I can tell, these patches are not for the kernel. The > > > filenames don't match th kernel layout. > > > > These files looks to be from U-boot, and must be intended for U-boot > > as I see U-boot mailing address in recipient's address? > > So why is it copied to: > > devicet...@vger.kernel.org - a kernel mailing list > linux-kernel@vger.kernel.org - the main kernel mailing list > linux-g...@vger.kernel.org - the gpio driver kernel mailing list > linux-arm-ker...@lists.infradead.org - the ARM kernel mailing list > > Given that it includes four kernel mailing lists (ok, devicetree > may be argued to have a wider application), then I don't think the > conclusion that "it's for u-boot, because there's _one_ u-boot > mailing list in the recipients" is particularly obvious. > > The author really needs to state that up front if they're sending > it to a wide audeience, rather than leaving people to guess, thereby > potentially wasting their time. > > Not only did Andrew review the patch as if it were for the kernel, > but I also wasted time on this as well when I double-took the > ifdefs, and wanted to check the current driver in the kernel. Oh, and... u-b...@linux.nxdi.nxp.com bounces because that domain is not resolvable - I guess that is internal to NXP, and this patch should have remained within NXP and not been posted publically. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up
Re: [PATCH v1 2/3] armv8: gpio: add gpio feature
On Sat, May 09, 2020 at 11:34:59PM +0530, Amit Tomer wrote: > > From what I can tell, these patches are not for the kernel. The > > filenames don't match th kernel layout. > > These files looks to be from U-boot, and must be intended for U-boot > as I see U-boot mailing address in recipient's address? So why is it copied to: devicet...@vger.kernel.org - a kernel mailing list linux-kernel@vger.kernel.org - the main kernel mailing list linux-g...@vger.kernel.org - the gpio driver kernel mailing list linux-arm-ker...@lists.infradead.org - the ARM kernel mailing list Given that it includes four kernel mailing lists (ok, devicetree may be argued to have a wider application), then I don't think the conclusion that "it's for u-boot, because there's _one_ u-boot mailing list in the recipients" is particularly obvious. The author really needs to state that up front if they're sending it to a wide audeience, rather than leaving people to guess, thereby potentially wasting their time. Not only did Andrew review the patch as if it were for the kernel, but I also wasted time on this as well when I double-took the ifdefs, and wanted to check the current driver in the kernel. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up
Re: [PATCH] net/sonic: Fix some resource leaks in error handling paths
On Sat, 9 May 2020 18:47:08 +0200 Christophe JAILLET wrote: > Le 09/05/2020 à 03:54, Jakub Kicinski a écrit : > > On Fri, 8 May 2020 19:25:57 +0200 Christophe JAILLET wrote: > >> @@ -527,8 +531,9 @@ static int mac_sonic_platform_remove(struct > >> platform_device *pdev) > >>struct sonic_local* lp = netdev_priv(dev); > >> > >>unregister_netdev(dev); > >> - dma_free_coherent(lp->device, SIZEOF_SONIC_DESC * > >> SONIC_BUS_SCALE(lp->dma_bitmode), > >> -lp->descriptors, lp->descriptors_laddr); > >> + dma_free_coherent(lp->device, > >> +SIZEOF_SONIC_DESC * SONIC_BUS_SCALE(lp->dma_bitmode), > >> +lp->descriptors, lp->descriptors_laddr); > >>free_netdev(dev); > >> > >>return 0; > > This is a white-space only change, right? Since this is a fix we should > > avoid making cleanups which are not strictly necessary. > > Right. > > The reason of this clean-up is that I wanted to avoid a checkpatch > warning with the proposed patch and I felt that having the same layout > in the error handling path of the probe function and in the remove > function was clearer. > So I updated also the remove function. I understand the motivation is good. > Fell free to ignore this hunk if not desired. I will not sent a V2 only > for that. That's not how it works. Busy maintainers don't have time to hand edit patches. I'm not applying this to the networking tree and I'm tossing it from patchwork. Please address the basic feedback. Thank you.
Re: [PATCH v1 2/3] armv8: gpio: add gpio feature
> From what I can tell, these patches are not for the kernel. The > filenames don't match th kernel layout. These files looks to be from U-boot, and must be intended for U-boot as I see U-boot mailing address in recipient's address?
Re: [PATCH v2] net: Protect INET_ADDR_COOKIE on 32-bit architectures
On Sat, 9 May 2020 10:13:22 +0200 Stephen Kitt wrote: > Hi, > > Thanks for taking the time to review my patch. > > On Fri, 8 May 2020 20:50:25 -0700, Jakub Kicinski wrote: > > On Fri, 8 May 2020 14:04:57 +0200 Stephen Kitt wrote: > > > Commit c7228317441f ("net: Use a more standard macro for > > > INET_ADDR_COOKIE") added a __deprecated marker to the cookie name on > > > 32-bit architectures, with the intent that the compiler would flag > > > uses of the name. However since commit 771c035372a0 ("deprecate the > > > '__deprecated' attribute warnings entirely and for good"), > > > __deprecated doesn't do anything and should be avoided. > > > > > > This patch changes INET_ADDR_COOKIE to declare a dummy struct so that > > > any subsequent use of the cookie's name will in all likelihood break > > > the build. It also removes the __deprecated marker. > > > > I think the macro is supposed to cause a warning when the variable > > itself is accessed. And I don't think that happens with your patch > > applied. > > Yes, the warning is what was lost when __deprecated lost its meaning. I was > trying to preserve that, or rather extend it so that the build would break if > the cookie was used on 32-bit architectures, and my patch ensures it does if > the cookie is used in a comparison or assignment, but ... > > > + kfree(); > > I hadn’t thought of taking a pointer to it. > > If we want to preserve the use of the macro with a semi-colon, which is what > Joe’s patch introduced (along with the deprecation warning), we still need > some sort of declaration which can’t be used. Perhaps > > #define INET_ADDR_COOKIE(__name, __saddr, __daddr) \ > struct __name {} __attribute__((unused)) > > would be better — it declares the cookie as a struct, not a variable, so then > the build fails if the cookie is used as anything other than a struct. If > anyone does try to use it as a struct, the build will fail on 64-bit > architectures... > > CC net/ipv4/inet_hashtables.o > net/ipv4/inet_hashtables.c: In function ‘__inet_lookup_established’: > net/ipv4/inet_hashtables.c:362:9: error: ‘acookie’ undeclared (first use in > this function) > kfree(); > ^~~ > net/ipv4/inet_hashtables.c:362:9: note: each undeclared identifier is > reported only once for each function it appears in > make[2]: *** [scripts/Makefile.build:267: net/ipv4/inet_hashtables.o] Error 1 > make[1]: *** [scripts/Makefile.build:488: net/ipv4] Error 2 > make: *** Makefile:1722: net] Error 2 Hm. That does seem better. Although thinking about it - we will not get a warning when someone declares a variable with the same name.. What if we went back to your original proposal of an empty struct but added in an extern in front? That way we should get linker error on pointer references.
Re: [RFC PATCH 0/7] mm: Get rid of vmalloc_sync_(un)mappings()
On Sat, May 09, 2020 at 11:25:16AM +0200, Peter Zijlstra wrote: > Right; it's just that the moment you do trigger it, it'll iterate that > pgd_list and that is potentially huge. Then again, that's not a new > problem. > > I suppose we can deal with it if/when it becomes an actual problem. > > I had a quick look and I think it might be possible to make it an RCU > managed list. We'd have to remove the pgd_list entry in > put_task_struct_rcu_user(). Then we can play games in sync_global_*() > and use RCU iteration. New tasks (which can be missed in the RCU > iteration) will already have a full clone of the PGD anyway. Right, it should not be that difficult to rcu-protect the pgd-list, but we can try that when it actually becomes a problem. Joerg
Fw:
Gorące pozdrowienia, Wysłałem ci ten list miesiąc temu, ale nie jestem pewien, czy go otrzymałeś, ponieważ nie otrzymałem od ciebie żadnej odpowiedzi, dlatego ponownie go wysyłam. Jestem adwokatem Chan Leap Phala, osobistym prawnikiem mojego zmarłego klienta przed jego przedwczesną śmiercią z rodziną. Otrzymałem od jego banku mandat na dostarczenie / przedstawienie bliskich krewnych kwoty jego funduszu o wartości Siedemdziesiąt Osiem Milionów Pięćset Sto Osiemdziesięciu Tysięcy Dolarów Amerykańskich 78 580 000 USD, więc skontaktowałem się z Tobą. Po nieudanej próbie znalezienia krewnego mojego zmarłego klienta postanowiłem się z tobą skontaktować, ponieważ ma on to samo nazwisko i narodowość. Proszę o kontakt w celu uzyskania dalszych informacji: (chan.l.ph...@gmail.com) Z poważaniem Barrister Chan Leap PHALA Prawnik / Adwokaci E-mail: chan.l.ph...@gmail.com
Re: [RFC PATCH 0/7] mm: Get rid of vmalloc_sync_(un)mappings()
On Fri, May 08, 2020 at 04:49:17PM -0700, Andy Lutomirski wrote: > On Fri, May 8, 2020 at 2:36 PM Joerg Roedel wrote: > > > > On Fri, May 08, 2020 at 02:33:19PM -0700, Andy Lutomirski wrote: > > > On Fri, May 8, 2020 at 7:40 AM Joerg Roedel wrote: > > > > > What's the maximum on other system types? It might make more sense to > > > take the memory hit and pre-populate all the tables at boot so we > > > never have to sync them. > > > > Need to look it up for 5-level paging, with 4-level paging its 64 pages > > to pre-populate the vmalloc area. > > > > But that would not solve the problem on x86-32, which needs to > > synchronize unmappings on the PMD level. > > What changes in this series with x86-32? This series sets ARCH_PAGE_TABLE_SYNC_MASK to PGTBL_PMD_MODIFIED, so that the synchronization happens every time PMD(s) in the vmalloc areas are changed. Before this series this synchronization only happened at arbitrary places calling vmalloc_sync_(un)mappings(). > We already do that synchronization, right? IOW, in the cases where > the vmalloc *fault* code does anything at all, we should have a small > bound for how much memory to preallocate and, if we preallocate it, > then there is nothing to sync and nothing to fault. And we have the > benefit that we never need to sync anything on 64-bit, which is kind > of nice. Don't really get you here, what is pre-allocated and why is there no need to sync and fault then? > Do we actually need PMD-level things for 32-bit? What if we just > outlawed huge pages in the vmalloc space on 32-bit non-PAE? Disallowing huge-pages would at least remove the need to sync unmappings, but we still need to sync new PMD entries. Remember that the size of the vmalloc area on 32 bit is dynamic and depends on the VM-split and the actual amount of RAM on the system. A machine wit 512MB of RAM and a 1G/3G split will have around 2.5G of VMALLOC address space. And if we want to avoid vmalloc-faults there, we need to pre-allocate all PTE pages for that area (and the amount of PTE pages needed increases when RAM decreases). On a machine with 512M of RAM we would need ca. 1270+ PTE pages, which is around 5M (or 1% of total system memory). Regards, Joerg
[PATCH] drm/exynos: fimc: Add support for S5PV210 variant
S5PV210 can be trivially supported by this driver. Only one of its FIMC devices (#2) supports the same scaling values as Exynos4, and it is marked by mainscaler-ext in the DTS (as it is for all of the Exynos4 devices). It's limits are the same as that of id's 0-2 of Exynos4 so we don't even need to change the device id check. It has been tested with a modified libdrm's test from https://github.com/tobiasjakobi/libdrm/tree/ippv2 Signed-off-by: Jonathan Bakker --- drivers/gpu/drm/exynos/exynos_drm_fimc.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/exynos/exynos_drm_fimc.c b/drivers/gpu/drm/exynos/exynos_drm_fimc.c index 29ab8be8604c..63e1b8ccb8e9 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_fimc.c +++ b/drivers/gpu/drm/exynos/exynos_drm_fimc.c @@ -89,6 +89,7 @@ struct fimc_scaler { * @regs: memory mapped io registers. * @lock: locking of operations. * @clocks: fimc clocks. + * @num_clocks: number of fimc clocks * @sc: scaler infomations. * @pol: porarity of writeback. * @id: fimc id. @@ -107,6 +108,7 @@ struct fimc_context { void __iomem*regs; spinlock_t lock; struct clk *clocks[FIMC_CLKS_MAX]; + int num_clocks; struct fimc_scaler sc; int id; int irq; @@ -1183,7 +1185,7 @@ static int fimc_setup_clocks(struct fimc_context *ctx) for (i = 0; i < FIMC_CLKS_MAX; i++) ctx->clocks[i] = ERR_PTR(-EINVAL); - for (i = 0; i < FIMC_CLKS_MAX; i++) { + for (i = 0; i < ctx->num_clocks; i++) { if (i == FIMC_CLK_WB_A || i == FIMC_CLK_WB_B) dev = fimc_dev->parent; else @@ -1210,6 +1212,9 @@ int exynos_drm_check_fimc_device(struct device *dev) { int id = of_alias_get_id(dev->of_node, "fimc"); + if (!of_property_read_bool(dev->of_node, "samsung,mainscaler-ext")) + return -ENODEV; + if (id >= 0 && (BIT(id) & fimc_mask)) return 0; return -ENODEV; @@ -1277,6 +1282,11 @@ static int fimc_probe(struct platform_device *pdev) ctx->dev = dev; ctx->id = of_alias_get_id(dev->of_node, "fimc"); + if (of_device_is_compatible(dev->of_node, "samsung,s5pv210-fimc")) + ctx->num_clocks = 2; + else + ctx->num_clocks = FIMC_CLKS_MAX; + /* construct formats/limits array */ num_formats = ARRAY_SIZE(fimc_formats) + ARRAY_SIZE(fimc_tiled_formats); formats = devm_kcalloc(dev, num_formats, sizeof(*formats), @@ -1409,6 +1419,7 @@ static const struct dev_pm_ops fimc_pm_ops = { static const struct of_device_id fimc_of_match[] = { { .compatible = "samsung,exynos4210-fimc" }, { .compatible = "samsung,exynos4212-fimc" }, + { .compatible = "samsung,s5pv210-fimc" }, { }, }; MODULE_DEVICE_TABLE(of, fimc_of_match); -- 2.20.1
Mutual business proposal
Greeting, My Name is Regina Daniel am a Business Consultant and I represent a group of company based in Gulf Region that wish to invest between US$10,000,000.00 TO US$550,000,000. 00 in foreign investment depending on your investment capacity based on the amount you can invest and manage. We are currently seeking means of expanding and relocating our business interest abroad in the following sectors: oil/Gas, real estate, construction stock, mining, transportation, health sector, tobacco or Communication Services. Also in Agriculture or any other viable sector. If you think you have a solid background and idea of making good profit in any of the mentioned business sectors or any other business in your country, please write me for possible business co-operation and the investment amount you can handle. Let me hear from you. Yours sincerely, Mr Regina Daniel
Re: WARNING in memtype_reserve
Hi, On 5/9/20 2:20 AM, syzbot wrote: Hello, syzbot found the following crash on: HEAD commit:d5eeab8d Merge tag 'scsi-fixes' of git://git.kernel.org/pu.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=1509363210 kernel config: https://syzkaller.appspot.com/x/.config?x=b0212dbee046bc1f dashboard link: https://syzkaller.appspot.com/bug?extid=353be47c9ce21b68b7ed compiler: gcc (GCC) 9.0.0 20181231 (experimental) userspace arch: i386 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=168ee02c10 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=119f378810 The bug was bisected to: commit 2bef9aed6f0e22391c8d4570749b1acc9bc3981e Author: Jeremy Linton Date: Mon May 4 20:13:48 2020 + usb: usbfs: correct kernel->user page attribute mismatch bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1701f16810 final crash:https://syzkaller.appspot.com/x/report.txt?x=1481f16810 console output: https://syzkaller.appspot.com/x/log.txt?x=1081f16810 IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+353be47c9ce21b68b...@syzkaller.appspotmail.com Fixes: 2bef9aed6f0e ("usb: usbfs: correct kernel->user page attribute mismatch") [ cut here ] memtype_reserve failed: [mem 0xff000-0x8fff], req write-back WARNING: CPU: 1 PID: 7025 at arch/x86/mm/pat/memtype.c:589 memtype_reserve+0x69f/0x820 arch/x86/mm/pat/memtype.c:589 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 7025 Comm: syz-executor254 Not tainted 5.7.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 panic+0x2e3/0x75c kernel/panic.c:221 __warn.cold+0x2f/0x35 kernel/panic.c:582 report_bug+0x27b/0x2f0 lib/bug.c:195 fixup_bug arch/x86/kernel/traps.c:175 [inline] fixup_bug arch/x86/kernel/traps.c:170 [inline] do_error_trap+0x12b/0x220 arch/x86/kernel/traps.c:267 do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:286 invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027 RIP: 0010:memtype_reserve+0x69f/0x820 arch/x86/mm/pat/memtype.c:589 Code: 48 8b 2c ed c0 00 29 88 e8 ae ad 3e 00 48 8d 4b ff 49 89 e8 4c 89 e2 48 c7 c6 20 01 29 88 48 c7 c7 80 f9 28 88 e8 79 e8 0f 00 <0f> 0b 41 bf ea ff ff ff e9 03 fc ff ff 41 bf ea ff ff ff e9 f8 fb RSP: 0018:c900015e7790 EFLAGS: 00010282 RAX: RBX: 9000 RCX: RDX: RSI: 815ce181 RDI: f520002bcee4 RBP: 8828ff40 R08: 888097ce85c0 R09: ed1015ce45f1 R10: 8880ae722f83 R11: ed1015ce45f0 R12: 000ff000 R13: 1920002bcef8 R14: dc00 R15: reserve_pfn_range+0x173/0x470 arch/x86/mm/pat/memtype.c:941 track_pfn_remap+0x18b/0x280 arch/x86/mm/pat/memtype.c:1033 remap_pfn_range+0x202/0xbf0 mm/memory.c:2130 dma_direct_mmap+0x197/0x260 kernel/dma/direct.c:453 dma_mmap_attrs+0xfe/0x150 kernel/dma/mapping.c:237 usbdev_mmap+0x3ae/0x730 drivers/usb/core/devio.c:254 call_mmap include/linux/fs.h:1912 [inline] mmap_region+0xafb/0x1540 mm/mmap.c:1772 do_mmap+0x849/0x1160 mm/mmap.c:1545 do_mmap_pgoff include/linux/mm.h:2553 [inline] vm_mmap_pgoff+0x197/0x200 mm/util.c:506 ksys_mmap_pgoff+0x457/0x5b0 mm/mmap.c:1595 do_syscall_32_irqs_on arch/x86/entry/common.c:337 [inline] do_fast_syscall_32+0x270/0xe90 arch/x86/entry/common.c:396 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139 Kernel Offset: disabled Rebooting in 86400 seconds.. --- This bug is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkal...@googlegroups.com. syzbot will keep track of this bug report. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. For information about bisection process see: https://goo.gl/tpsmEJ#bisection syzbot can test patches for this bug, for details see: https://goo.gl/tpsmEJ#testing-patches Adding a range check in dma_direct_mmap() looks like a fine fix. I'm curious how the mapping attribute changed though. I expected that would be rare on x86 (ISA devices/etc). What is the GCE USB controller here? Its not the same as the virtual qemu/pci/xhci right?
Re: [PATCH] parisc: suppress error messages for 'make clean'
* Masahiro Yamada : > Hi Helge, > > On Sat, May 9, 2020 at 6:46 AM Helge Deller wrote: > > > > * Masahiro Yamada : > > > On Sat, Apr 25, 2020 at 2:47 PM Masahiro Yamada > > > wrote: > > > > > > > > 'make ARCH=parisc clean' emits a tons of error messages as follows: > > > > > > > > $ make ARCH=parisc clean > > > > gcc: error: unrecognized command line option '-mno-space-regs' > > > > gcc: error: unrecognized command line option '-mfast-indirect-calls'; > > > > did you mean '-mforce-indirect-call'? > > > > gcc: error: unrecognized command line option '-mdisable-fpregs' > > > > gcc: error: missing argument to '-Wframe-larger-than=' > > > > gcc: error: unrecognized command line option '-mno-space-regs' > > > > gcc: error: unrecognized command line option '-mfast-indirect-calls'; > > > > did you mean '-mforce-indirect-call'? > > > > gcc: error: unrecognized command line option '-mdisable-fpregs' > > > > gcc: error: missing argument to '-Wframe-larger-than=' > > > > ... > > > > > > > > You can supporess them except '-Wframe-larger-than' by setting correct > > > > CROSS_COMPILE=, but we should not require any compiler for cleaning. > > > > > > > > This $(shell ...) is evaluated so many times because LIBGCC is exported. > > > > Use the ':=' operator to evaluate it just once, and sink the stderr. > > > > > > > > > > Applied to linux-kbuild. > > > > That patch breaks then building the boot loader/compressor: > > ... > > hppa-linux-gnu-ld-X -e startup --as-needed -T > > arch/parisc/boot/compressed/vmlinux.lds arch/parisc/boot/compressed/head.o > > arch/parisc/boot/compressed/real2.o arch/parisc/boot/compressed/firmware.o > > arch/parisc/boot/compressed/misc.o arch/parisc/boot/compressed/piggy.o -o > > arch/parisc/boot/compressed/vmlinux > > hppa-linux-gnu-ld: arch/parisc/boot/compressed/misc.o: in function > > `dec_vli': > > (.text+0x104): undefined reference to `__ashldi3' > > hppa-linux-gnu-ld: arch/parisc/boot/compressed/misc.o: in function > > `lzma_len': > > (.text+0x2b0): undefined reference to `$$mulI' > > hppa-linux-gnu-ld: (.text+0x344): undefined reference to `$$mulI' > > hppa-linux-gnu-ld: (.text+0x3f8): undefined reference to `$$mulI' > > > > > > The patch below works, but I wonder if it's possible to avoid > > to examine LIBGCC twice? > > > > Helge > > > Sorry for the breakage. > > How about moving LIBGCC below ? Good idea. The patch below does work for me. We do not need $KBUILD_CFLAGS to get the libgcc.a filename, so we do not need to pipe the output to /dev/null either. Can you try if that works, and if yes, can you apply it? Helge diff --git a/arch/parisc/Makefile b/arch/parisc/Makefile index 628cd8bb7ad8..fadbbd010337 100644 --- a/arch/parisc/Makefile +++ b/arch/parisc/Makefile @@ -21,8 +21,6 @@ KBUILD_IMAGE := vmlinuz NM = sh $(srctree)/arch/parisc/nm CHECKFLAGS += -D__hppa__=1 -LIBGCC = $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name) -export LIBGCC ifdef CONFIG_64BIT UTS_MACHINE:= parisc64 @@ -110,6 +108,8 @@ cflags-$(CONFIG_PA8X00) += -march=2.0 -mschedule=8000 head-y := arch/parisc/kernel/head.o KBUILD_CFLAGS += $(cflags-y) +LIBGCC := $(shell $(CC) -print-libgcc-file-name) +export LIBGCC kernel-y := mm/ kernel/ math-emu/
[PATCH v11 04/18] x86/entry/64: Clean up paranoid exit
From: Andy Lutomirski All that paranoid exit needs to do is to disable IRQs, handle IRQ tracing, then restore CR3, and restore GS base. Simply do those actions in that order. Cleaning up the spaghetti code. Signed-off-by: Andy Lutomirski Signed-off-by: Chang S. Bae Signed-off-by: Sasha Levin Reviewed-by: Tony Luck Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Dave Hansen Cc: Tony Luck Cc: Andi Kleen Cc: Vegard Nossum --- arch/x86/entry/entry_64.S | 26 -- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 0e9504fabe526..3adb3c8e2409b 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1266,19 +1266,25 @@ SYM_CODE_END(paranoid_entry) SYM_CODE_START_LOCAL(paranoid_exit) UNWIND_HINT_REGS DISABLE_INTERRUPTS(CLBR_ANY) + + /* +* The order of operations is important. IRQ tracing requires +* kernel GS base and CR3. RESTORE_CR3 requires kernel GS base. +* +* NB to anyone to try to optimize this code: this code does +* not execute at all for exceptions from user mode. Those +* exceptions go through error_exit instead. +*/ TRACE_IRQS_OFF_DEBUG - testl %ebx, %ebx /* swapgs needed? */ - jnz .Lparanoid_exit_no_swapgs - TRACE_IRQS_IRETQ - /* Always restore stashed CR3 value (see paranoid_entry) */ - RESTORE_CR3 scratch_reg=%rbx save_reg=%r14 + RESTORE_CR3 scratch_reg=%rax save_reg=%r14 + + /* If EBX is 0, SWAPGS is required */ + testl %ebx, %ebx + jnz restore_regs_and_return_to_kernel + + /* We are returning to a context with user GS base */ SWAPGS_UNSAFE_STACK jmp restore_regs_and_return_to_kernel -.Lparanoid_exit_no_swapgs: - TRACE_IRQS_IRETQ_DEBUG - /* Always restore stashed CR3 value (see paranoid_entry) */ - RESTORE_CR3 scratch_reg=%rbx save_reg=%r14 - jmp restore_regs_and_return_to_kernel SYM_CODE_END(paranoid_exit) /* -- 2.20.1
[PATCH v11 10/18] x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions
From: "Chang S. Bae" Add CPU feature conditional FS/GS base access to the relevant helper functions. That allows accelerating certain FS/GS base operations in subsequent changes. Note, that while possible, the user space entry/exit GS base operations are not going to use the new FSGSBASE instructions. The reason is that it would require additional storage for the user space value which adds more complexity to the low level code and experiments have shown marginal benefit. This may be revisited later but for now the SWAPGS based handling in the entry code is preserved except for the paranoid entry/exit code. Suggested-by: Tony Luck Signed-off-by: Chang S. Bae Signed-off-by: Sasha Levin Reviewed-by: Tony Luck Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Dave Hansen Cc: Tony Luck Cc: Andi Kleen Cc: Andrew Cooper --- arch/x86/include/asm/fsgsbase.h | 27 +++ arch/x86/kernel/process_64.c| 58 + 2 files changed, 70 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h index fdd1177499b40..aefd53767a5d4 100644 --- a/arch/x86/include/asm/fsgsbase.h +++ b/arch/x86/include/asm/fsgsbase.h @@ -49,35 +49,32 @@ static __always_inline void wrgsbase(unsigned long gsbase) asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory"); } +#include + /* Helper functions for reading/writing FS/GS base */ static inline unsigned long x86_fsbase_read_cpu(void) { unsigned long fsbase; - rdmsrl(MSR_FS_BASE, fsbase); + if (static_cpu_has(X86_FEATURE_FSGSBASE)) + fsbase = rdfsbase(); + else + rdmsrl(MSR_FS_BASE, fsbase); return fsbase; } -static inline unsigned long x86_gsbase_read_cpu_inactive(void) -{ - unsigned long gsbase; - - rdmsrl(MSR_KERNEL_GS_BASE, gsbase); - - return gsbase; -} - static inline void x86_fsbase_write_cpu(unsigned long fsbase) { - wrmsrl(MSR_FS_BASE, fsbase); + if (static_cpu_has(X86_FEATURE_FSGSBASE)) + wrfsbase(fsbase); + else + wrmsrl(MSR_FS_BASE, fsbase); } -static inline void x86_gsbase_write_cpu_inactive(unsigned long gsbase) -{ - wrmsrl(MSR_KERNEL_GS_BASE, gsbase); -} +extern unsigned long x86_gsbase_read_cpu_inactive(void); +extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase); #endif /* CONFIG_X86_64 */ diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 5ef9d8f25b0e8..aaa65f284b9b9 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -328,6 +328,64 @@ static unsigned long x86_fsgsbase_read_task(struct task_struct *task, return base; } +unsigned long x86_gsbase_read_cpu_inactive(void) +{ + unsigned long gsbase; + + if (static_cpu_has(X86_FEATURE_FSGSBASE)) { + bool need_restore = false; + unsigned long flags; + + /* +* We read the inactive GS base value by swapping +* to make it the active one. But we cannot allow +* an interrupt while we switch to and from. +*/ + if (!irqs_disabled()) { + local_irq_save(flags); + need_restore = true; + } + + native_swapgs(); + gsbase = rdgsbase(); + native_swapgs(); + + if (need_restore) + local_irq_restore(flags); + } else { + rdmsrl(MSR_KERNEL_GS_BASE, gsbase); + } + + return gsbase; +} + +void x86_gsbase_write_cpu_inactive(unsigned long gsbase) +{ + if (static_cpu_has(X86_FEATURE_FSGSBASE)) { + bool need_restore = false; + unsigned long flags; + + /* +* We write the inactive GS base value by swapping +* to make it the active one. But we cannot allow +* an interrupt while we switch to and from. +*/ + if (!irqs_disabled()) { + local_irq_save(flags); + need_restore = true; + } + + native_swapgs(); + wrgsbase(gsbase); + native_swapgs(); + + if (need_restore) + local_irq_restore(flags); + } else { + wrmsrl(MSR_KERNEL_GS_BASE, gsbase); + } +} + unsigned long x86_fsbase_read_task(struct task_struct *task) { unsigned long fsbase; -- 2.20.1
[PATCH v11 16/18] x86/fsgsbase/64: Enable FSGSBASE on 64bit by default and add a chicken bit
From: Andy Lutomirski Now that FSGSBASE is fully supported, remove unsafe_fsgsbase, enable FSGSBASE by default, and add nofsgsbase to disable it. While this changes userspace visible ABI, we could not find a project that would be affected by this. Few projects were contacted for input and ack: - 5-level EPT: http://lkml.kernel.org/r/9ddf602b-6c8b-8c1e-ab46-07ed12366...@redhat.com - rr: https://mail.mozilla.org/pipermail/rr-dev/2018-March/000616.html - CRIU: https://lists.openvz.org/pipermail/criu/2018-March/040654.html Signed-off-by: Andy Lutomirski Signed-off-by: Chang S. Bae Signed-off-by: Sasha Levin Reviewed-by: Tony Luck Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Dave Hansen Cc: Tony Luck Cc: Andi Kleen --- .../admin-guide/kernel-parameters.txt | 3 +- arch/x86/kernel/cpu/common.c | 32 --- 2 files changed, 15 insertions(+), 20 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index af3aaade195b8..1924845c879c2 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3033,8 +3033,7 @@ no5lvl [X86-64] Disable 5-level paging mode. Forces kernel to use 4-level paging instead. - unsafe_fsgsbase [X86] Allow FSGSBASE instructions. This will be - replaced with a nofsgsbase flag. + nofsgsbase [X86] Disables FSGSBASE instructions. no_console_suspend [HW] Never suspend the console diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 4224760c74e27..0d480cbadc7dc 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -418,21 +418,21 @@ static void __init setup_cr_pinning(void) static_key_enable(_pinning.key); } -/* - * Temporary hack: FSGSBASE is unsafe until a few kernel code paths are - * updated. This allows us to get the kernel ready incrementally. - * - * Once all the pieces are in place, these will go away and be replaced with - * a nofsgsbase chicken flag. - */ -static bool unsafe_fsgsbase; - -static __init int setup_unsafe_fsgsbase(char *arg) +static __init int x86_nofsgsbase_setup(char *arg) { - unsafe_fsgsbase = true; + /* Require an exact match without trailing characters. */ + if (strlen(arg)) + return 0; + + /* Do not emit a message if the feature is not present. */ + if (!boot_cpu_has(X86_FEATURE_FSGSBASE)) + return 1; + + setup_clear_cpu_cap(X86_FEATURE_FSGSBASE); + pr_info("FSGSBASE disabled via kernel command line\n"); return 1; } -__setup("unsafe_fsgsbase", setup_unsafe_fsgsbase); +__setup("nofsgsbase", x86_nofsgsbase_setup); /* * Protection Keys are not available in 32-bit mode. @@ -1495,12 +1495,8 @@ static void identify_cpu(struct cpuinfo_x86 *c) setup_umip(c); /* Enable FSGSBASE instructions if available. */ - if (cpu_has(c, X86_FEATURE_FSGSBASE)) { - if (unsafe_fsgsbase) - cr4_set_bits(X86_CR4_FSGSBASE); - else - clear_cpu_cap(c, X86_FEATURE_FSGSBASE); - } + if (cpu_has(c, X86_FEATURE_FSGSBASE)) + cr4_set_bits(X86_CR4_FSGSBASE); /* * The vendor-specific functions might have changed features. -- 2.20.1
[PATCH v11 12/18] x86/fsgsbase/64: move save_fsgs to header file
Given copy_thread_tls() is now shared between 32 and 64 bit and we need to use save_fsgs() there, move it to a header file. Signed-off-by: Sasha Levin --- arch/x86/kernel/process.h| 68 arch/x86/kernel/process_64.c | 68 2 files changed, 68 insertions(+), 68 deletions(-) diff --git a/arch/x86/kernel/process.h b/arch/x86/kernel/process.h index 1d0797b2338a2..e21b6669a3851 100644 --- a/arch/x86/kernel/process.h +++ b/arch/x86/kernel/process.h @@ -37,3 +37,71 @@ static inline void switch_to_extra(struct task_struct *prev, prev_tif & _TIF_WORK_CTXSW_PREV)) __switch_to_xtra(prev, next); } + +enum which_selector { + FS, + GS +}; + +/* + * Saves the FS or GS base for an outgoing thread if FSGSBASE extensions are + * not available. The goal is to be reasonably fast on non-FSGSBASE systems. + * It's forcibly inlined because it'll generate better code and this function + * is hot. + */ +static __always_inline void save_base_legacy(struct task_struct *prev_p, + unsigned short selector, + enum which_selector which) +{ + if (likely(selector == 0)) { + /* +* On Intel (without X86_BUG_NULL_SEG), the segment base could +* be the pre-existing saved base or it could be zero. On AMD +* (with X86_BUG_NULL_SEG), the segment base could be almost +* anything. +* +* This branch is very hot (it's hit twice on almost every +* context switch between 64-bit programs), and avoiding +* the RDMSR helps a lot, so we just assume that whatever +* value is already saved is correct. This matches historical +* Linux behavior, so it won't break existing applications. +* +* To avoid leaking state, on non-X86_BUG_NULL_SEG CPUs, if we +* report that the base is zero, it needs to actually be zero: +* see the corresponding logic in load_seg_legacy. +*/ + } else { + /* +* If the selector is 1, 2, or 3, then the base is zero on +* !X86_BUG_NULL_SEG CPUs and could be anything on +* X86_BUG_NULL_SEG CPUs. In the latter case, Linux +* has never attempted to preserve the base across context +* switches. +* +* If selector > 3, then it refers to a real segment, and +* saving the base isn't necessary. +*/ + if (which == FS) + prev_p->thread.fsbase = 0; + else + prev_p->thread.gsbase = 0; + } +} + +static __always_inline void save_fsgs(struct task_struct *task) +{ + savesegment(fs, task->thread.fsindex); + savesegment(gs, task->thread.gsindex); + if (static_cpu_has(X86_FEATURE_FSGSBASE)) { + /* +* If FSGSBASE is enabled, we can't make any useful guesses +* about the base, and user code expects us to save the current +* value. Fortunately, reading the base directly is efficient. +*/ + task->thread.fsbase = rdfsbase(); + task->thread.gsbase = x86_gsbase_read_cpu_inactive(); + } else { + save_base_legacy(task, task->thread.fsindex, FS); + save_base_legacy(task, task->thread.gsindex, GS); + } +} diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index e066750be89a0..4be88124d81ea 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -145,74 +145,6 @@ void release_thread(struct task_struct *dead_task) WARN_ON(dead_task->mm); } -enum which_selector { - FS, - GS -}; - -/* - * Saves the FS or GS base for an outgoing thread if FSGSBASE extensions are - * not available. The goal is to be reasonably fast on non-FSGSBASE systems. - * It's forcibly inlined because it'll generate better code and this function - * is hot. - */ -static __always_inline void save_base_legacy(struct task_struct *prev_p, -unsigned short selector, -enum which_selector which) -{ - if (likely(selector == 0)) { - /* -* On Intel (without X86_BUG_NULL_SEG), the segment base could -* be the pre-existing saved base or it could be zero. On AMD -* (with X86_BUG_NULL_SEG), the segment base could be almost -* anything. -* -* This branch is very hot (it's hit twice on almost every -* context switch between 64-bit programs), and avoiding -
[PATCH v11 13/18] x86/fsgsbase/64: Use FSGSBASE instructions on thread copy and ptrace
From: "Chang S. Bae" When FSGSBASE is enabled, copying threads and reading FS/GS base using ptrace must read the actual values. When copying a thread, use fsgs_save() and copy the saved values. For ptrace, the bases must be read from memory regardless of the selector if FSGSBASE is enabled. Suggested-by: Andy Lutomirski Signed-off-by: Chang S. Bae Signed-off-by: Sasha Levin Reviewed-by: Tony Luck Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Dave Hansen Cc: Tony Luck Cc: Andi Kleen --- arch/x86/kernel/process.c| 10 ++ arch/x86/kernel/process_64.c | 6 -- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 9da70b279dad8..3ebb56cc2cfee 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -127,6 +127,7 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp, struct inactive_task_frame *frame; struct fork_frame *fork_frame; struct pt_regs *childregs; + struct task_struct *me = current; int ret = 0; childregs = task_pt_regs(p); @@ -140,10 +141,11 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp, memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps)); #ifdef CONFIG_X86_64 - savesegment(gs, p->thread.gsindex); - p->thread.gsbase = p->thread.gsindex ? 0 : current->thread.gsbase; - savesegment(fs, p->thread.fsindex); - p->thread.fsbase = p->thread.fsindex ? 0 : current->thread.fsbase; + save_fsgs(me); + p->thread.fsindex = me->thread.fsindex; + p->thread.fsbase = me->thread.fsbase; + p->thread.gsindex = me->thread.gsindex; + p->thread.gsbase = me->thread.gsbase; savesegment(es, p->thread.es); savesegment(ds, p->thread.ds); #else diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 4be88124d81ea..57cdbbb0381ac 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -346,7 +346,8 @@ unsigned long x86_fsbase_read_task(struct task_struct *task) if (task == current) fsbase = x86_fsbase_read_cpu(); - else if (task->thread.fsindex == 0) + else if (static_cpu_has(X86_FEATURE_FSGSBASE) || +(task->thread.fsindex == 0)) fsbase = task->thread.fsbase; else fsbase = x86_fsgsbase_read_task(task, task->thread.fsindex); @@ -360,7 +361,8 @@ unsigned long x86_gsbase_read_task(struct task_struct *task) if (task == current) gsbase = x86_gsbase_read_cpu_inactive(); - else if (task->thread.gsindex == 0) + else if (static_cpu_has(X86_FEATURE_FSGSBASE) || +(task->thread.gsindex == 0)) gsbase = task->thread.gsbase; else gsbase = x86_fsgsbase_read_task(task, task->thread.gsindex); -- 2.20.1
[PATCH v11 14/18] x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation
From: Tony Luck Before enabling FSGSBASE the kernel could safely assume that the content of GS base was a user address. Thus any speculative access as the result of a mispredicted branch controlling the execution of SWAPGS would be to a user address. So systems with speculation-proof SMAP did not need to add additional LFENCE instructions to mitigate. With FSGSBASE enabled a hostile user can set GS base to a kernel address. So they can make the kernel speculatively access data they wish to leak via a side channel. This means that SMAP provides no protection. Add FSGSBASE as an additional condition to enable the fence-based SWAPGS mitigation. Signed-off-by: Tony Luck Signed-off-by: Chang S. Bae Signed-off-by: Sasha Levin Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Dave Hansen Cc: Tony Luck Cc: Andi Kleen --- arch/x86/kernel/cpu/bugs.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index ed54b3b21c396..487603ea51cd1 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -450,14 +450,12 @@ static void __init spectre_v1_select_mitigation(void) * If FSGSBASE is enabled, the user can put a kernel address in * GS, in which case SMAP provides no protection. * -* [ NOTE: Don't check for X86_FEATURE_FSGSBASE until the -* FSGSBASE enablement patches have been merged. ] -* * If FSGSBASE is disabled, the user can only put a user space * address in GS. That makes an attack harder, but still * possible if there's no SMAP protection. */ - if (!smap_works_speculatively()) { + if (boot_cpu_has(X86_FEATURE_FSGSBASE) || + !smap_works_speculatively()) { /* * Mitigation can be provided from SWAPGS itself or * PTI as the CR3 write in the Meltdown mitigation -- 2.20.1
[PATCH v11 05/18] x86/entry/64: Switch CR3 before SWAPGS in paranoid entry
From: "Chang S. Bae" When FSGSBASE is enabled, the GS base handling in paranoid entry will need to retrieve the kernel GS base which requires that the kernel page table is active. As the CR3 switch to the kernel page tables (PTI is active) does not depend on kernel GS base, move the CR3 switch in front of the GS base handling. Comment the EBX content while at it. No functional change. Signed-off-by: Chang S. Bae Signed-off-by: Sasha Levin Reviewed-by: Tony Luck Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Dave Hansen Cc: Tony Luck Cc: Andi Kleen Cc: Vegard Nossum --- arch/x86/entry/entry_64.S | 31 +++ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 3adb3c8e2409b..7f27626f8426f 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1220,15 +1220,7 @@ SYM_CODE_START_LOCAL(paranoid_entry) cld PUSH_AND_CLEAR_REGS save_ret=1 ENCODE_FRAME_POINTER 8 - movl$1, %ebx - movl$MSR_GS_BASE, %ecx - rdmsr - testl %edx, %edx - js 1f /* negative -> in kernel */ - SWAPGS - xorl%ebx, %ebx -1: /* * Always stash CR3 in %r14. This value will be restored, * verbatim, at exit. Needed if paranoid_entry interrupted @@ -1238,16 +1230,31 @@ SYM_CODE_START_LOCAL(paranoid_entry) * This is also why CS (stashed in the "iret frame" by the * hardware at entry) can not be used: this may be a return * to kernel code, but with a user CR3 value. +* +* Switching CR3 does not depend on kernel GS base so it can +* be done before switching to the kernel GS base. This is +* required for FSGSBASE because the kernel GS base has to +* be retrieved from a kernel internal table. */ SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14 + /* EBX = 1 -> kernel GSBASE active, no restore required */ + movl$1, %ebx /* -* The above SAVE_AND_SWITCH_TO_KERNEL_CR3 macro doesn't do an -* unconditional CR3 write, even in the PTI case. So do an lfence -* to prevent GS speculation, regardless of whether PTI is enabled. +* The kernel-enforced convention is a negative GS base indicates +* a kernel value. No SWAPGS needed on entry and exit. */ - FENCE_SWAPGS_KERNEL_ENTRY + movl$MSR_GS_BASE, %ecx + rdmsr + testl %edx, %edx + jns .Lparanoid_entry_swapgs + ret +.Lparanoid_entry_swapgs: + SWAPGS + FENCE_SWAPGS_KERNEL_ENTRY + /* EBX = 0 -> SWAPGS required on exit */ + xorl%ebx, %ebx ret SYM_CODE_END(paranoid_entry) -- 2.20.1
[PATCH v11 09/18] x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions
From: Andi Kleen [ luto: Rename the variables from FS and GS to FSBASE and GSBASE and make safe to include on 32-bit kernels. ] Signed-off-by: Andi Kleen Signed-off-by: Andy Lutomirski Signed-off-by: Chang S. Bae Signed-off-by: Thomas Gleixner Signed-off-by: Sasha Levin Reviewed-by: Andy Lutomirski Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Andy Lutomirski Cc: Dave Hansen Cc: Tony Luck Cc: H. Peter Anvin Cc: Andi Kleen --- arch/x86/include/asm/fsgsbase.h | 30 ++ 1 file changed, 30 insertions(+) diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h index bca4c743de77c..fdd1177499b40 100644 --- a/arch/x86/include/asm/fsgsbase.h +++ b/arch/x86/include/asm/fsgsbase.h @@ -19,6 +19,36 @@ extern unsigned long x86_gsbase_read_task(struct task_struct *task); extern void x86_fsbase_write_task(struct task_struct *task, unsigned long fsbase); extern void x86_gsbase_write_task(struct task_struct *task, unsigned long gsbase); +/* Must be protected by X86_FEATURE_FSGSBASE check. */ + +static __always_inline unsigned long rdfsbase(void) +{ + unsigned long fsbase; + + asm volatile("rdfsbase %0" : "=r" (fsbase) :: "memory"); + + return fsbase; +} + +static __always_inline unsigned long rdgsbase(void) +{ + unsigned long gsbase; + + asm volatile("rdgsbase %0" : "=r" (gsbase) :: "memory"); + + return gsbase; +} + +static __always_inline void wrfsbase(unsigned long fsbase) +{ + asm volatile("wrfsbase %0" :: "r" (fsbase) : "memory"); +} + +static __always_inline void wrgsbase(unsigned long gsbase) +{ + asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory"); +} + /* Helper functions for reading/writing FS/GS base */ static inline unsigned long x86_fsbase_read_cpu(void) -- 2.20.1
[PATCH v11 08/18] x86/entry/64: Document GSBASE handling in the paranoid path
From: "Chang S. Bae" On FSGSBASE systems, the way to handle GS base in the paranoid path is different from the existing SWAPGS-based entry/exit path handling. Document the reason and what has to be done for FSGSBASE enabled systems. Signed-off-by: Chang S. Bae Signed-off-by: Sasha Levin Reviewed-by: Tony Luck Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Dave Hansen Cc: Tony Luck Cc: Andi Kleen --- Documentation/x86/entry_64.rst | 9 + 1 file changed, 9 insertions(+) diff --git a/Documentation/x86/entry_64.rst b/Documentation/x86/entry_64.rst index a48b3f6ebbe87..0499a40723af3 100644 --- a/Documentation/x86/entry_64.rst +++ b/Documentation/x86/entry_64.rst @@ -108,3 +108,12 @@ We try to only use IST entries and the paranoid entry code for vectors that absolutely need the more expensive check for the GS base - and we generate all 'normal' entry points with the regular (faster) paranoid=0 variant. + +On FSGSBASE systems, however, user space can set GS without kernel +interaction. It means the value of GS base itself does not imply anything, +whether a kernel value or a user space value. So, there is no longer a safe +way to check whether the exception is entering from user mode or kernel +mode in the paranoid entry code path. So the GS base value needs to be read +out, saved and the kernel GS base value written. On exit, the saved GS base +value needs to be restored unconditionally. The non-paranoid entry/exit +code still uses SWAPGS unconditionally as the state is known. -- 2.20.1
[PATCH v11 07/18] x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit
From: "Chang S. Bae" Without FSGSBASE, user space cannot change GS base other than through a PRCTL. The kernel enforces that the user space GS base value is positive as negative values are used for detecting the kernel space GS base value in the paranoid entry code. If FSGSBASE is enabled, user space can set arbitrary GS base values without kernel intervention, including negative ones, which breaks the paranoid entry assumptions. To avoid this, paranoid entry needs to unconditionally save the current GS base value independent of the interrupted context, retrieve and write the kernel GS base and unconditionally restore the saved value on exit. The restore happens either in paranoid exit or in the special exit path of the NMI low level code. All other entry code paths which use unconditional SWAPGS are not affected as they do not depend on the actual content. The new logic for paranoid entry, when FSGSBASE is enabled, removes SWAPGS and replaces with unconditional WRGSBASE. Hence no fences are needed. Suggested-by: H. Peter Anvin Suggested-by: Andy Lutomirski Suggested-by: Thomas Gleixner Signed-off-by: Chang S. Bae Signed-off-by: Sasha Levin Reviewed-by: Tony Luck Acked-by: Tom Lendacky Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Dave Hansen Cc: Tony Luck Cc: Andi Kleen Cc: Tom Lendacky Cc: Vegard Nossum --- arch/x86/entry/calling.h | 6 +++ arch/x86/entry/entry_64.S | 78 ++- 2 files changed, 75 insertions(+), 9 deletions(-) diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h index 0eb134e18b7a9..5f3a8ecaddc2d 100644 --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -340,6 +340,12 @@ For 32-bit we have the following conventions - kernel is built with #endif .endm +.macro SAVE_AND_SET_GSBASE scratch_reg:req save_reg:req + rdgsbase \save_reg + GET_PERCPU_BASE \scratch_reg + wrgsbase \scratch_reg +.endm + #endif /* CONFIG_X86_64 */ .macro STACKLEAK_ERASE diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 7f27626f8426f..a4fd01c8f2970 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -38,6 +38,7 @@ #include #include #include +#include #include #include "calling.h" @@ -1211,9 +1212,14 @@ idtentry machine_check do_mce has_error_code=0paranoid=1 #endif /* - * Save all registers in pt_regs, and switch gs if needed. - * Use slow, but surefire "are we in kernel?" check. - * Return: ebx=0: need swapgs on exit, ebx=1: otherwise + * Save all registers in pt_regs. Return GS base related information + * in EBX depending on the availability of the FSGSBASE instructions: + * + * FSGSBASER/EBX + * N0 -> SWAPGS on exit + * 1 -> no SWAPGS on exit + * + * YGS base value at entry, must be restored in paranoid_exit */ SYM_CODE_START_LOCAL(paranoid_entry) UNWIND_HINT_FUNC @@ -1238,7 +1244,29 @@ SYM_CODE_START_LOCAL(paranoid_entry) */ SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14 - /* EBX = 1 -> kernel GSBASE active, no restore required */ + /* +* Handling GS base depends on the availability of FSGSBASE. +* +* Without FSGSBASE the kernel enforces that negative GS base +* values indicate kernel GS base. With FSGSBASE no assumptions +* can be made about the GS base value when entering from user +* space. + */ + ALTERNATIVE "jmp .Lparanoid_entry_checkgs", "", X86_FEATURE_FSGSBASE + + /* +* Read the current GS base and store it in %rbx unconditionally, +* retrieve and set the current CPUs kernel GS base. The stored value +* has to be restored in paranoid_exit unconditionally. +* +* This unconditional write of GS base ensures no subsequent load +* based on a mispredicted GS base. +*/ + SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx + ret + +.Lparanoid_entry_checkgs: + /* EBX = 1 -> kernel GS base active, no restore required */ movl$1, %ebx /* * The kernel-enforced convention is a negative GS base indicates @@ -1265,10 +1293,17 @@ SYM_CODE_END(paranoid_entry) * * We may be returning to very strange contexts (e.g. very early * in syscall entry), so checking for preemption here would - * be complicated. Fortunately, we there's no good reason - * to try to handle preemption here. + * be complicated. Fortunately, there's no good reason to try + * to handle preemption here. + * + * R/EBX contains the GS base related information depending on the + * availability of the FSGSBASE instructions: + * + * FSGSBASER/EBX + * N0 -> SWAPGS on exit + * 1 -> no SWAPGS on exit * - * On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it) + * YUser space GS base, must be
[PATCH v11 15/18] selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE
From: "Chang S. Bae" This validates that GS selector and base are independently preserved in ptrace commands. Suggested-by: Andy Lutomirski Signed-off-by: Chang S. Bae Signed-off-by: Sasha Levin Reviewed-by: Tony Luck Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Dave Hansen Cc: Tony Luck Cc: Andi Kleen --- tools/testing/selftests/x86/fsgsbase.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c index 950a48b2e3662..9a4349813a30a 100644 --- a/tools/testing/selftests/x86/fsgsbase.c +++ b/tools/testing/selftests/x86/fsgsbase.c @@ -465,7 +465,7 @@ static void test_ptrace_write_gsbase(void) wait(); if (WSTOPSIG(status) == SIGTRAP) { - unsigned long gs; + unsigned long gs, base; unsigned long gs_offset = USER_REGS_OFFSET(gs); unsigned long base_offset = USER_REGS_OFFSET(gs_base); @@ -481,6 +481,7 @@ static void test_ptrace_write_gsbase(void) err(1, "PTRACE_POKEUSER"); gs = ptrace(PTRACE_PEEKUSER, child, gs_offset, NULL); + base = ptrace(PTRACE_PEEKUSER, child, base_offset, NULL); /* * In a non-FSGSBASE system, the nonzero selector will load @@ -501,8 +502,14 @@ static void test_ptrace_write_gsbase(void) */ if (gs == 0) printf("\tNote: this is expected behavior on older kernels.\n"); + } else if (have_fsgsbase && (base != 0xFF)) { + nerrs++; + printf("[FAIL]\tGSBASE changed to %lx\n", base); } else { - printf("[OK]\tGS remained 0x%hx\n", *shared_scratch); + printf("[OK]\tGS remained 0x%hx", *shared_scratch); + if (have_fsgsbase) + printf(" and GSBASE changed to 0xFF"); + printf("\n"); } } -- 2.20.1
[PATCH v11 11/18] x86/fsgsbase/64: Use FSGSBASE in switch_to() if available
From: Andy Lutomirski With the new FSGSBASE instructions, FS/GS base can be efficiently read and written in __switch_to(). Use that capability to preserve the full state. This will enable user code to do whatever it wants with the new instructions without any kernel-induced gotchas. (There can still be architectural gotchas: movl %gs,%eax; movl %eax,%gs may change GS base if WRGSBASE was used, but users are expected to read the CPU manual before doing things like that.) This is a considerable speedup. It seems to save about 100 cycles per context switch compared to the baseline 4.6-rc1 behavior on a Skylake laptop. [ chang: 5~10% performance improvements were seen by a context switch benchmark that ran threads with different FS/GS base values (to the baseline 4.16). ] Signed-off-by: Andy Lutomirski Signed-off-by: Chang S. Bae Signed-off-by: Sasha Levin Reviewed-by: Tony Luck Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Dave Hansen Cc: Tony Luck Cc: Andi Kleen --- arch/x86/kernel/process_64.c | 34 -- 1 file changed, 28 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index aaa65f284b9b9..e066750be89a0 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -199,8 +199,18 @@ static __always_inline void save_fsgs(struct task_struct *task) { savesegment(fs, task->thread.fsindex); savesegment(gs, task->thread.gsindex); - save_base_legacy(task, task->thread.fsindex, FS); - save_base_legacy(task, task->thread.gsindex, GS); + if (static_cpu_has(X86_FEATURE_FSGSBASE)) { + /* +* If FSGSBASE is enabled, we can't make any useful guesses +* about the base, and user code expects us to save the current +* value. Fortunately, reading the base directly is efficient. +*/ + task->thread.fsbase = rdfsbase(); + task->thread.gsbase = x86_gsbase_read_cpu_inactive(); + } else { + save_base_legacy(task, task->thread.fsindex, FS); + save_base_legacy(task, task->thread.gsindex, GS); + } } #if IS_ENABLED(CONFIG_KVM) @@ -279,10 +289,22 @@ static __always_inline void load_seg_legacy(unsigned short prev_index, static __always_inline void x86_fsgsbase_load(struct thread_struct *prev, struct thread_struct *next) { - load_seg_legacy(prev->fsindex, prev->fsbase, - next->fsindex, next->fsbase, FS); - load_seg_legacy(prev->gsindex, prev->gsbase, - next->gsindex, next->gsbase, GS); + if (static_cpu_has(X86_FEATURE_FSGSBASE)) { + /* Update the FS and GS selectors if they could have changed. */ + if (unlikely(prev->fsindex || next->fsindex)) + loadseg(FS, next->fsindex); + if (unlikely(prev->gsindex || next->gsindex)) + loadseg(GS, next->gsindex); + + /* Update the bases. */ + wrfsbase(next->fsbase); + x86_gsbase_write_cpu_inactive(next->gsbase); + } else { + load_seg_legacy(prev->fsindex, prev->fsbase, + next->fsindex, next->fsbase, FS); + load_seg_legacy(prev->gsindex, prev->gsbase, + next->gsindex, next->gsbase, GS); + } } static unsigned long x86_fsgsbase_read_task(struct task_struct *task, -- 2.20.1
[PATCH v11 17/18] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2
From: Andi Kleen The kernel needs to explicitly enable FSGSBASE. So, the application needs to know if it can safely use these instructions. Just looking at the CPUID bit is not enough because it may be running in a kernel that does not enable the instructions. One way for the application would be to just try and catch the SIGILL. But that is difficult to do in libraries which may not want to overwrite the signal handlers of the main application. Enumerate the enabled FSGSBASE capability in bit 1 of AT_HWCAP2 in the ELF aux vector. AT_HWCAP2 is already used by PPC for similar purposes. The application can access it open coded or by using the getauxval() function in newer versions of glibc. Signed-off-by: Andi Kleen Signed-off-by: Chang S. Bae Signed-off-by: Sasha Levin Reviewed-by: Tony Luck Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Dave Hansen Cc: Tony Luck Cc: Andi Kleen --- arch/x86/include/uapi/asm/hwcap2.h | 3 +++ arch/x86/kernel/cpu/common.c | 4 +++- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/uapi/asm/hwcap2.h b/arch/x86/include/uapi/asm/hwcap2.h index 8b2effe6efb82..5fdfcb47000f9 100644 --- a/arch/x86/include/uapi/asm/hwcap2.h +++ b/arch/x86/include/uapi/asm/hwcap2.h @@ -5,4 +5,7 @@ /* MONITOR/MWAIT enabled in Ring 3 */ #define HWCAP2_RING3MWAIT (1 << 0) +/* Kernel allows FSGSBASE instructions available in Ring 3 */ +#define HWCAP2_FSGSBASEBIT(1) + #endif diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 0d480cbadc7dc..b5a086ea34258 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1495,8 +1495,10 @@ static void identify_cpu(struct cpuinfo_x86 *c) setup_umip(c); /* Enable FSGSBASE instructions if available. */ - if (cpu_has(c, X86_FEATURE_FSGSBASE)) + if (cpu_has(c, X86_FEATURE_FSGSBASE)) { cr4_set_bits(X86_CR4_FSGSBASE); + elf_hwcap2 |= HWCAP2_FSGSBASE; + } /* * The vendor-specific functions might have changed features. -- 2.20.1
[PATCH v11 18/18] Documentation/x86/64: Add documentation for GS/FS addressing mode
From: Thomas Gleixner Explain how the GS/FS based addressing can be utilized in user space applications along with the differences between the generic prctl() based GS/FS base control and the FSGSBASE version available on newer CPUs. Originally-by: Andi Kleen Signed-off-by: Thomas Gleixner Signed-off-by: Chang S. Bae Signed-off-by: Sasha Levin Reviewed-by: Tony Luck Reviewed-by: Randy Dunlap Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Dave Hansen Cc: Tony Luck Cc: Andi Kleen Cc: Randy Dunlap Cc: Jonathan Corbet --- Documentation/x86/x86_64/fsgs.rst | 199 + Documentation/x86/x86_64/index.rst | 1 + 2 files changed, 200 insertions(+) create mode 100644 Documentation/x86/x86_64/fsgs.rst diff --git a/Documentation/x86/x86_64/fsgs.rst b/Documentation/x86/x86_64/fsgs.rst new file mode 100644 index 0..50960e09e1f66 --- /dev/null +++ b/Documentation/x86/x86_64/fsgs.rst @@ -0,0 +1,199 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Using FS and GS segments in user space applications +=== + +The x86 architecture supports segmentation. Instructions which access +memory can use segment register based addressing mode. The following +notation is used to address a byte within a segment: + + Segment-register:Byte-address + +The segment base address is added to the Byte-address to compute the +resulting virtual address which is accessed. This allows to access multiple +instances of data with the identical Byte-address, i.e. the same code. The +selection of a particular instance is purely based on the base-address in +the segment register. + +In 32-bit mode the CPU provides 6 segments, which also support segment +limits. The limits can be used to enforce address space protections. + +In 64-bit mode the CS/SS/DS/ES segments are ignored and the base address is +always 0 to provide a full 64bit address space. The FS and GS segments are +still functional in 64-bit mode. + +Common FS and GS usage +-- + +The FS segment is commonly used to address Thread Local Storage (TLS). FS +is usually managed by runtime code or a threading library. Variables +declared with the '__thread' storage class specifier are instantiated per +thread and the compiler emits the FS: address prefix for accesses to these +variables. Each thread has its own FS base address so common code can be +used without complex address offset calculations to access the per thread +instances. Applications should not use FS for other purposes when they use +runtimes or threading libraries which manage the per thread FS. + +The GS segment has no common use and can be used freely by +applications. GCC and Clang support GS based addressing via address space +identifiers. + +Reading and writing the FS/GS base address +-- + +There exist two mechanisms to read and write the FS/GS base address: + + - the arch_prctl() system call + + - the FSGSBASE instruction family + +Accessing FS/GS base with arch_prctl() +-- + + The arch_prctl(2) based mechanism is available on all 64-bit CPUs and all + kernel versions. + + Reading the base: + + arch_prctl(ARCH_GET_FS, ); + arch_prctl(ARCH_GET_GS, ); + + Writing the base: + + arch_prctl(ARCH_SET_FS, fsbase); + arch_prctl(ARCH_SET_GS, gsbase); + + The ARCH_SET_GS prctl may be disabled depending on kernel configuration + and security settings. + +Accessing FS/GS base with the FSGSBASE instructions +--- + + With the Ivy Bridge CPU generation Intel introduced a new set of + instructions to access the FS and GS base registers directly from user + space. These instructions are also supported on AMD Family 17H CPUs. The + following instructions are available: + + === === + RDFSBASE %reg Read the FS base register + RDGSBASE %reg Read the GS base register + WRFSBASE %reg Write the FS base register + WRGSBASE %reg Write the GS base register + === === + + The instructions avoid the overhead of the arch_prctl() syscall and allow + more flexible usage of the FS/GS addressing modes in user space + applications. This does not prevent conflicts between threading libraries + and runtimes which utilize FS and applications which want to use it for + their own purpose. + +FSGSBASE instructions enablement + + The instructions are enumerated in CPUID leaf 7, bit 0 of EBX. If + available /proc/cpuinfo shows 'fsgsbase' in the flag entry of the CPUs. + + The availability of the instructions does not enable them + automatically. The kernel has to enable them explicitly in CR4. The + reason for this is that older kernels make assumptions about the values in + the GS register and enforce them when GS base is set via + arch_prctl(). Allowing user
[PATCH v11 06/18] x86/entry/64: Introduce the FIND_PERCPU_BASE macro
From: "Chang S. Bae" GS base is used to find per-CPU data in the kernel. But when GS base is unknown, the per-CPU base can be found from the per_cpu_offset table with a CPU NR. The CPU NR is extracted from the limit field of the CPUNODE entry in GDT, or by the RDPID instruction. This is a prerequisite for using FSGSBASE in the low level entry code. Also, add the GAS-compatible RDPID macro as binutils 2.21 does not support it. Support is added in version 2.27. Suggested-by: H. Peter Anvin Signed-off-by: Chang S. Bae Signed-off-by: Sasha Levin Reviewed-by: Tony Luck Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Dave Hansen Cc: Tony Luck Cc: Andi Kleen Cc: Vegard Nossum --- arch/x86/entry/calling.h| 34 ++ arch/x86/include/asm/inst.h | 15 +++ 2 files changed, 49 insertions(+) diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h index 0789e13ece905..0eb134e18b7a9 100644 --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -6,6 +6,7 @@ #include #include #include +#include /* @@ -347,6 +348,39 @@ For 32-bit we have the following conventions - kernel is built with #endif .endm +#ifdef CONFIG_SMP + +/* + * CPU/node NR is loaded from the limit (size) field of a special segment + * descriptor entry in GDT. + */ +.macro LOAD_CPU_AND_NODE_SEG_LIMIT reg:req + movq$__CPUNODE_SEG, \reg + lsl \reg, \reg +.endm + +/* + * Fetch the per-CPU GS base value for this processor and put it in @reg. + * We normally use %gs for accessing per-CPU data, but we are setting up + * %gs here and obviously can not use %gs itself to access per-CPU data. + */ +.macro GET_PERCPU_BASE reg:req + ALTERNATIVE \ + "LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \ + "RDPID \reg", \ + X86_FEATURE_RDPID + andq$VDSO_CPUNODE_MASK, \reg + movq__per_cpu_offset(, \reg, 8), \reg +.endm + +#else + +.macro GET_PERCPU_BASE reg:req + movqpcpu_unit_offsets(%rip), \reg +.endm + +#endif /* CONFIG_SMP */ + /* * This does 'call enter_from_user_mode' unless we can avoid it based on * kernel config or using the static jump infrastructure. diff --git a/arch/x86/include/asm/inst.h b/arch/x86/include/asm/inst.h index f5a796da07f88..d063841a17e39 100644 --- a/arch/x86/include/asm/inst.h +++ b/arch/x86/include/asm/inst.h @@ -306,6 +306,21 @@ .endif MODRM 0xc0 movq_r64_xmm_opd1 movq_r64_xmm_opd2 .endm + +.macro RDPID opd + REG_TYPE rdpid_opd_type \opd + .if rdpid_opd_type == REG_TYPE_R64 + R64_NUM rdpid_opd \opd + .else + R32_NUM rdpid_opd \opd + .endif + .byte 0xf3 + .if rdpid_opd > 7 + PFX_REX rdpid_opd 0 + .endif + .byte 0x0f, 0xc7 + MODRM 0xc0 rdpid_opd 0x7 +.endm #endif #endif -- 2.20.1
[PATCH v11 02/18] selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base write
From: "Chang S. Bae" The test validates that the selector is not changed when a ptracer writes the ptracee's GS base. Originally-by: Andy Lutomirski Signed-off-by: Chang S. Bae Signed-off-by: Sasha Levin Reviewed-by: Tony Luck Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Dave Hansen Cc: Tony Luck Cc: Andi Kleen --- tools/testing/selftests/x86/fsgsbase.c | 21 +++-- 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c index 15a329da59fa3..950a48b2e3662 100644 --- a/tools/testing/selftests/x86/fsgsbase.c +++ b/tools/testing/selftests/x86/fsgsbase.c @@ -465,7 +465,7 @@ static void test_ptrace_write_gsbase(void) wait(); if (WSTOPSIG(status) == SIGTRAP) { - unsigned long gs, base; + unsigned long gs; unsigned long gs_offset = USER_REGS_OFFSET(gs); unsigned long base_offset = USER_REGS_OFFSET(gs_base); @@ -481,7 +481,6 @@ static void test_ptrace_write_gsbase(void) err(1, "PTRACE_POKEUSER"); gs = ptrace(PTRACE_PEEKUSER, child, gs_offset, NULL); - base = ptrace(PTRACE_PEEKUSER, child, base_offset, NULL); /* * In a non-FSGSBASE system, the nonzero selector will load @@ -489,11 +488,21 @@ static void test_ptrace_write_gsbase(void) * selector value is changed or not by the GSBASE write in * a ptracer. */ - if (gs == 0 && base == 0xFF) { - printf("[OK]\tGS was reset as expected\n"); - } else { + if (gs != *shared_scratch) { nerrs++; - printf("[FAIL]\tGS=0x%lx, GSBASE=0x%lx (should be 0, 0xFF)\n", gs, base); + printf("[FAIL]\tGS changed to %lx\n", gs); + + /* +* On older kernels, poking a nonzero value into the +* base would zero the selector. On newer kernels, +* this behavior has changed -- poking the base +* changes only the base and, if FSGSBASE is not +* available, this may not effect. +*/ + if (gs == 0) + printf("\tNote: this is expected behavior on older kernels.\n"); + } else { + printf("[OK]\tGS remained 0x%hx\n", *shared_scratch); } } -- 2.20.1
[PATCH v11 01/18] x86/ptrace: Prevent ptrace from clearing the FS/GS selector
From: "Chang S. Bae" When a ptracer writes a ptracee's FS/GS base with a different value, the selector is also cleared. While this behavior is incorrect as the selector should be preserved, most userspace applications did not notice that as they do not use non-zero segments to begin with. Instead, with this patch, when a tracee sets the base we will let it do so without clearing the selector. The change above means that a tracee that already has a selector set will fail in an attempt to set the base - the change won't stick and the value will be instead based on the value of the selector. As with the above, we haven't found userspace that would be affected by this change. Suggested-by: Andy Lutomirski Signed-off-by: Chang S. Bae [sasha: rewrite commit message] Signed-off-by: Sasha Levin Reviewed-by: Tony Luck Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Dave Hansen Cc: Tony Luck Cc: Andi Kleen --- arch/x86/kernel/ptrace.c | 17 ++--- 1 file changed, 2 insertions(+), 15 deletions(-) diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c index f0e1ddbc2fd78..cc56efb75d275 100644 --- a/arch/x86/kernel/ptrace.c +++ b/arch/x86/kernel/ptrace.c @@ -380,25 +380,12 @@ static int putreg(struct task_struct *child, case offsetof(struct user_regs_struct,fs_base): if (value >= TASK_SIZE_MAX) return -EIO; - /* -* When changing the FS base, use do_arch_prctl_64() -* to set the index to zero and to set the base -* as requested. -* -* NB: This behavior is nonsensical and likely needs to -* change when FSGSBASE support is added. -*/ - if (child->thread.fsbase != value) - return do_arch_prctl_64(child, ARCH_SET_FS, value); + x86_fsbase_write_task(child, value); return 0; case offsetof(struct user_regs_struct,gs_base): - /* -* Exactly the same here as the %fs handling above. -*/ if (value >= TASK_SIZE_MAX) return -EIO; - if (child->thread.gsbase != value) - return do_arch_prctl_64(child, ARCH_SET_GS, value); + x86_gsbase_write_task(child, value); return 0; #endif } -- 2.20.1
[PATCH v11 03/18] x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
From: Andy Lutomirski This is temporary. It will allow the next few patches to be tested incrementally. Setting unsafe_fsgsbase is a root hole. Don't do it. Signed-off-by: Andy Lutomirski Signed-off-by: Chang S. Bae Signed-off-by: Sasha Levin Reviewed-by: Tony Luck Cc: Thomas Gleixner Cc: Borislav Petkov Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Dave Hansen Cc: Tony Luck Cc: Andi Kleen --- .../admin-guide/kernel-parameters.txt | 3 +++ arch/x86/kernel/cpu/common.c | 24 +++ 2 files changed, 27 insertions(+) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 7bc83f3d9bdfe..af3aaade195b8 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3033,6 +3033,9 @@ no5lvl [X86-64] Disable 5-level paging mode. Forces kernel to use 4-level paging instead. + unsafe_fsgsbase [X86] Allow FSGSBASE instructions. This will be + replaced with a nofsgsbase flag. + no_console_suspend [HW] Never suspend the console Disable suspending of consoles during suspend and diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index bed0cb83fe245..4224760c74e27 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -418,6 +418,22 @@ static void __init setup_cr_pinning(void) static_key_enable(_pinning.key); } +/* + * Temporary hack: FSGSBASE is unsafe until a few kernel code paths are + * updated. This allows us to get the kernel ready incrementally. + * + * Once all the pieces are in place, these will go away and be replaced with + * a nofsgsbase chicken flag. + */ +static bool unsafe_fsgsbase; + +static __init int setup_unsafe_fsgsbase(char *arg) +{ + unsafe_fsgsbase = true; + return 1; +} +__setup("unsafe_fsgsbase", setup_unsafe_fsgsbase); + /* * Protection Keys are not available in 32-bit mode. */ @@ -1478,6 +1494,14 @@ static void identify_cpu(struct cpuinfo_x86 *c) setup_smap(c); setup_umip(c); + /* Enable FSGSBASE instructions if available. */ + if (cpu_has(c, X86_FEATURE_FSGSBASE)) { + if (unsafe_fsgsbase) + cr4_set_bits(X86_CR4_FSGSBASE); + else + clear_cpu_cap(c, X86_FEATURE_FSGSBASE); + } + /* * The vendor-specific functions might have changed features. * Now we do "generic changes." -- 2.20.1
[PATCH v11 00/18] Enable FSGSBASE instructions
Benefits: Currently a user process that wishes to read or write the FS/GS base must make a system call. But recent X86 processors have added new instructions for use in 64-bit mode that allow direct access to the FS and GS segment base addresses. The operating system controls whether applications can use these instructions with a %cr4 control bit. In addition to benefits to applications, performance improvements to the OS context switch code are possible by making use of these instructions. A third party reported out promising performance numbers out of their initial benchmarking of the previous version of this patch series [9]. Enablement check: The kernel provides information about the enabled state of FSGSBASE to applications using the ELF_AUX vector. If the HWCAP2_FSGSBASE bit is set in the AUX vector, the kernel has FSGSBASE instructions enabled and applications can use them. Kernel changes: Major changes made in the kernel are in context switch, paranoid path, and ptrace. In a context switch, a task's FS/GS base will be secured regardless of its selector. In the paranoid path, GS base is unconditionally overwritten to the kernel GS base on entry and the original GS base is restored on exit. Ptrace includes divergence of FS/GS index and base values. Security: For mitigating the Spectre v1 SWAPGS issue, LFENCE instructions were added on most kernel entries. Those patches are dependent on previous behaviors that users couldn't load a kernel address into the GS base. These patches change that assumption since the user can load any address into GS base. The changes to the kernel entry path in this patch series take account of the SWAPGS issue. Changes from v10: - Rewrite the commit message for patch #1. - Document communication/acks from userspace projects that are potentially affected by this. Andi Kleen (2): x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2 Andy Lutomirski (4): x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE x86/entry/64: Clean up paranoid exit x86/fsgsbase/64: Use FSGSBASE in switch_to() if available x86/fsgsbase/64: Enable FSGSBASE on 64bit by default and add a chicken bit Chang S. Bae (9): x86/ptrace: Prevent ptrace from clearing the FS/GS selector selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base write x86/entry/64: Switch CR3 before SWAPGS in paranoid entry x86/entry/64: Introduce the FIND_PERCPU_BASE macro x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit x86/entry/64: Document GSBASE handling in the paranoid path x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions x86/fsgsbase/64: Use FSGSBASE instructions on thread copy and ptrace selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE Sasha Levin (1): x86/fsgsbase/64: move save_fsgs to header file Thomas Gleixner (1): Documentation/x86/64: Add documentation for GS/FS addressing mode Tony Luck (1): x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation .../admin-guide/kernel-parameters.txt | 2 + Documentation/x86/entry_64.rst| 9 + Documentation/x86/x86_64/fsgs.rst | 199 ++ Documentation/x86/x86_64/index.rst| 1 + arch/x86/entry/calling.h | 40 arch/x86/entry/entry_64.S | 131 +--- arch/x86/include/asm/fsgsbase.h | 45 +++- arch/x86/include/asm/inst.h | 15 ++ arch/x86/include/uapi/asm/hwcap2.h| 3 + arch/x86/kernel/cpu/bugs.c| 6 +- arch/x86/kernel/cpu/common.c | 22 ++ arch/x86/kernel/process.c | 10 +- arch/x86/kernel/process.h | 68 ++ arch/x86/kernel/process_64.c | 142 +++-- arch/x86/kernel/ptrace.c | 17 +- tools/testing/selftests/x86/fsgsbase.c| 24 ++- 16 files changed, 605 insertions(+), 129 deletions(-) create mode 100644 Documentation/x86/x86_64/fsgs.rst -- 2.20.1
Re: [PATCH v1 2/3] armv8: gpio: add gpio feature
On Sat, May 09, 2020 at 05:33:15PM +0200, Andrew Lunn wrote: > On Sat, May 09, 2020 at 06:39:55PM +0800, Hui Song wrote: > > From: "hui.song" > > > > add one struct mpc8xxx_gpio_plat to enable gpio feature. > > > > Signed-off-by: hui.song > > --- > > .../include/asm/arch-fsl-layerscape/gpio.h| 22 +++ > > 1 file changed, 22 insertions(+) > > create mode 100644 arch/arm/include/asm/arch-fsl-layerscape/gpio.h > > > > diff --git a/arch/arm/include/asm/arch-fsl-layerscape/gpio.h > > b/arch/arm/include/asm/arch-fsl-layerscape/gpio.h > > new file mode 100644 > > index 00..d8dd750a72 > > --- /dev/null > > +++ b/arch/arm/include/asm/arch-fsl-layerscape/gpio.h > > @@ -0,0 +1,22 @@ > > +/* SPDX-License-Identifier: GPL-2.0+ */ > > +/* > > + * Copyright 2014 Freescale Semiconductor, Inc. > > + */ > > + > > +/* > > + * Dummy header file to enable CONFIG_OF_CONTROL. > > + * If CONFIG_OF_CONTROL is enabled, lib/fdtdec.c is compiled. > > + * It includes via , so those SoCs that > > enable > > + * OF_CONTROL must have arch/gpio.h. > > + */ > > This does not seem right. You would expect each sub arch to have a > subdirectory in arch/arm/include/asm/ when in fact none do. >From what I can tell, these patches are not for the kernel. The filenames don't match th kernel layout. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up
Re: [PATCH net] net: broadcom: Imply BROADCOM_PHY for BCMGENET
On 5/9/2020 12:38 AM, Geert Uytterhoeven wrote: Hi Florian, Thanks for your patch! On Sat, May 9, 2020 at 12:32 AM Florian Fainelli wrote: The GENET controller on the Raspberry Pi 4 (2711) is typically interfaced with an external Broadcom PHY via a RGMII electrical interface. To make sure that delays are properly configured at the PHY side, ensure that we get a chance to have the dedicated Broadcom PHY driver (CONFIG_BROADCOM_PHY) enabled for this to happen. I guess it can be interfaced to a different external PHY, too? Yes, although this has not happened yet to the best of my knowledge. Fixes: 402482a6a78e ("net: bcmgenet: Clear ID_MODE_DIS in EXT_RGMII_OOB_CTRL when not needed") Reported-by: Marek Szyprowski Signed-off-by: Florian Fainelli --- a/drivers/net/ethernet/broadcom/Kconfig +++ b/drivers/net/ethernet/broadcom/Kconfig @@ -69,6 +69,7 @@ config BCMGENET select BCM7XXX_PHY select MDIO_BCM_UNIMAC select DIMLIB + imply BROADCOM_PHY if ARCH_BCM2835 Which means support for the BROADCOM_PHY is always included on ARCH_BCM2835, even if a different PHY is used? It is included by default on and can be deselected if needed, which is exactly what we want here, a sane default, but without the inflexibility of "select". help This driver supports the built-in Ethernet MACs found in the Broadcom BCM7xxx Set Top Box family chipset. Gr{oetje,eeting}s, Geert -- Florian
Re: [PATCH] staging: vc04_services: interface: vchiq_arm: vchiq_connected.c: Block comments should align the * on each line
On Sat, May 09, 2020 at 02:07:14PM +0100, John Oldman wrote: > Coding style issue Your subject line needs to be much shorter, don't you think? Please fix up and resend. greg k-h
Re: [PATCH v2] usb: raw-gadget: fix gadget endpoint selection
On Sat, May 09, 2020 at 11:02:13AM +0300, Felipe Balbi wrote: > > Hi, > > Andrey Konovalov writes: > >> here you're changing userspace ABI. Aren't we going to possibly break > >> some existing applications? > > > > Hi Felipe, > > > > I've been working on tests for Raw Gadget for the last few weeks [1], > > which revealed a few problems with the interface. This isn't yet > > included into any released kernel, so my hope that changing the ABI is > > OK during the rc stage. > > Fair enough. If that's okay with Greg, it's okay with me, but then how > do we include it into the -rc seen as it's not really a fix? > > Greg, are you okay with me including such large patches during the -rc? > They essentially add new IOCTLs to the raw-gadget interface. Yes, as the driver only went in for -rc1, it's fine to add fixes like this so late as we don't want to ship a -final with it in broken form. thanks, greg k-h
Re: [PATCH] staging: vt6656: vt6655: clean Makefiles
On Sat, May 09, 2020 at 11:07:27AM +0200, Matej Dujava wrote: > This patch is removing CFLAGS that are defining flags that are not used. You are also modifying the indentation and moving lines around for no reason :( Please only do one thing for a patch, and always describe everything you do in the changelog text. Can you fix this up and send a v2? thanks, greg k-h
Re: [PATCH] Driver: Adding helper macro for platform_driver boilerplate.
On Sat, May 09, 2020 at 03:37:16PM +0530, Harshal Chaudhari wrote: > From: Harshal > > For simple module that contain a single platform_driver without any > additional setup code then ends up being a block > of duplicated boilerplate. > > This patch add a new micro, module_platform_driver(), which replace the > module_init()/module_exit() registrations > with template functions. Please properly wrap your changelog text. And fixup the Subject line to match other patches for this driver. And the text needs to be revised a bit to say what this really is doing, as what you describe is not correct. > > Signed-off-by: harshal chaudhari Finally, this does not match your "From:" line for the patch, which means I couldn't take it even if all of the above was correct :( Please fix up and resend. thanks, greg k-h
[PATCH fixes] powerpc/40x: Make more space for system call exception
When CONFIG_VIRT_CPU_ACCOUNTING is selected, system call exception handler doesn't fit below 0xd00 and build fails. As exception 0xd00 doesn't exist and is never generated by 40x, comment it out in order to get more space for system call exception. Reported-by: kbuild test robot Fixes: 9e27086292aa ("powerpc/32: Warn and return ENOSYS on syscalls from kernel") Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_40x.S | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S index 9bb663977e84..2cec543c38f0 100644 --- a/arch/powerpc/kernel/head_40x.S +++ b/arch/powerpc/kernel/head_40x.S @@ -344,8 +344,9 @@ _ENTRY(saved_ksp_limit) /* 0x0C00 - System Call Exception */ START_EXCEPTION(0x0C00, SystemCall) SYSCALL_ENTRY 0xc00 +/* Trap_0D is commented out to get more space for system call exception */ - EXCEPTION(0x0D00, Trap_0D, unknown_exception, EXC_XFER_STD) +/* EXCEPTION(0x0D00, Trap_0D, unknown_exception, EXC_XFER_STD) */ EXCEPTION(0x0E00, Trap_0E, unknown_exception, EXC_XFER_STD) EXCEPTION(0x0F00, Trap_0F, unknown_exception, EXC_XFER_STD) -- 2.25.0
Re: [PATCH -next v2] locking/osq_lock: annotate a data race in osq_lock
> On May 9, 2020, at 12:12 PM, Paul E. McKenney wrote: > > Ah, and I forgot to ask. Why "if (data_race(prev->next == node)" instead > of "if (data_race(prev->next) == node"? I think the one you suggested is slightly better to point out the exact race. Do you want me to resend or you could squash it instead?
Re: [PATCH] net/sonic: Fix some resource leaks in error handling paths
Le 09/05/2020 à 03:54, Jakub Kicinski a écrit : On Fri, 8 May 2020 19:25:57 +0200 Christophe JAILLET wrote: @@ -527,8 +531,9 @@ static int mac_sonic_platform_remove(struct platform_device *pdev) struct sonic_local* lp = netdev_priv(dev); unregister_netdev(dev); - dma_free_coherent(lp->device, SIZEOF_SONIC_DESC * SONIC_BUS_SCALE(lp->dma_bitmode), - lp->descriptors, lp->descriptors_laddr); + dma_free_coherent(lp->device, + SIZEOF_SONIC_DESC * SONIC_BUS_SCALE(lp->dma_bitmode), + lp->descriptors, lp->descriptors_laddr); free_netdev(dev); return 0; This is a white-space only change, right? Since this is a fix we should avoid making cleanups which are not strictly necessary. Right. The reason of this clean-up is that I wanted to avoid a checkpatch warning with the proposed patch and I felt that having the same layout in the error handling path of the probe function and in the remove function was clearer. So I updated also the remove function. Fell free to ignore this hunk if not desired. I will not sent a V2 only for that. CJ
Re: [PATCH 01/15] taint: add module firmware crash taint support
On Sat, May 09, 2020 at 11:18:29AM -0400, Rafael Aquini wrote: > We are still missing the documentation bits for this > new flag, though. Ah yeah sorry about that. > How about having a blurb similar to: > > diff --git a/Documentation/admin-guide/tainted-kernels.rst > b/Documentation/admin-guide/tainted-kernels.rst > index 71e9184a9079..5c6a9e2478b0 100644 > --- a/Documentation/admin-guide/tainted-kernels.rst > +++ b/Documentation/admin-guide/tainted-kernels.rst > @@ -100,6 +100,7 @@ Bit Log Number Reason that got the kernel tainted > 15 _/K 32768 kernel has been live patched > 16 _/X 65536 auxiliary taint, defined for and used by distros > 17 _/T 131072 kernel was built with the struct randomization plugin > + 18 _/Q 262144 driver firmware crash annotation > === === == > > Note: The character ``_`` is representing a blank in this table to make > reading > @@ -162,3 +163,7 @@ More detailed explanation for tainting > produce extremely unusual kernel structure layouts (even performance > pathological ones), which is important to know when debugging. Set at > build time. > + > + 18) ``Q`` Device drivers might annotate the kernel with this taint, in cases > + their firmware might have crashed leaving the driver in a crippled and > + potentially useless state. Sure, I'll modify it a bit to add the use case to help with support issues, ie, to help rule out firmware issues. I'm starting to think that to make this even more usesul later we may want to add a uevent to add_taint() so that userspace can decide to look into this, ignore it, or report something to the user, say on their desktop. Luis
Re: [EXT] [PATCH 09/15] qed: use new module_firmware_crashed()
On Sat, May 09, 2020 at 09:32:51AM +0300, Igor Russkikh wrote: > > > This makes use of the new module_firmware_crashed() to help > > annotate when firmware for device drivers crash. When firmware > > crashes devices can sometimes become unresponsive, and recovery > > sometimes requires a driver unload / reload and in the worst cases > > a reboot. > > > > Using a taint flag allows us to annotate when this happens clearly. > > > > Cc: Ariel Elior > > Cc: gr-everest-linux...@marvell.com > > Signed-off-by: Luis Chamberlain > > --- > > drivers/net/ethernet/qlogic/qed/qed_debug.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/drivers/net/ethernet/qlogic/qed/qed_debug.c > > b/drivers/net/ethernet/qlogic/qed/qed_debug.c > > index f4eebaabb6d0..9cc6287b889b 100644 > > --- a/drivers/net/ethernet/qlogic/qed/qed_debug.c > > +++ b/drivers/net/ethernet/qlogic/qed/qed_debug.c > > @@ -7854,6 +7854,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void > > *buffer) > > REGDUMP_HEADER_SIZE, > > _size); > > if (!rc) { > > + module_firmware_crashed(); > > *(u32 *)((u8 *)buffer + offset) = > > qed_calc_regdump_header(cdev, > > PROTECTION_OVERRIDE, > > cur_engine, > > @@ -7869,6 +7870,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void > > *buffer) > > rc = qed_dbg_fw_asserts(cdev, (u8 *)buffer + offset + > > REGDUMP_HEADER_SIZE, > > _size); > > if (!rc) { > > + module_firmware_crashed(); > > *(u32 *)((u8 *)buffer + offset) = > > qed_calc_regdump_header(cdev, FW_ASSERTS, > > cur_engine, > > feature_size, > > @@ -7906,6 +7908,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void > > *buffer) > > rc = qed_dbg_grc(cdev, (u8 *)buffer + offset + > > REGDUMP_HEADER_SIZE, _size); > > if (!rc) { > > + module_firmware_crashed(); > > *(u32 *)((u8 *)buffer + offset) = > > qed_calc_regdump_header(cdev, GRC_DUMP, > > cur_engine, > > > Hi Luis, > > qed_dbg_all_data is being used to gather debug dump from device. Failures > inside it may happen due to various reasons, but they normally do not indicate > FW failure. > > So I think its not a good place to insert this call. > Its hard to find exact good place to insert it in qed. Is there a way to check if what happened was indeed a fw crash? > One more thing is that AFAIU taint flag gets permanent on kernel, but for > example our device can recover itself from some FW crashes, thus it'd be > transparent for user. Similar things are *supposed* to recoverable with other device, however this can also sometimes lead to a situation where devices are not usable anymore, and require a full driver unload / load. > Whats the logical purpose of module_firmware_crashed? Does it mean fatal > unrecoverable error on device? Its just to annotate on the module and kernel that this has happened. I take it you may agree that, firmware crashing *often* is not good design, and these issues should be reported to / fixed by vendors. In cases where driver bugs are reported it is good to see if a firmware crash has happened before, so that during analysis this is ruled out. Luis
Re: [PATCH] usb: roles: Switch on role-switch uevent reporting
On 09/05/2020 04:24, Wen Yang wrote: 在 2020/5/9 上午12:29, Bryan O'Donoghue 写道: Right now we don't report to user-space a role switch when doing a usb_role_switch_set_role() despite having registered the uevent callbacks. This patch switches on the notifications allowing user-space to see role-switch change notifications and subsequently determine the current controller data-role. example: PFX=/devices/platform/soc/78d9000.usb/ci_hdrc.0 root@somebox# udevadm monitor -p KERNEL[49.894994] change $PFX/usb_role/ci_hdrc.0-role-switch (usb_role) ACTION=change DEVPATH=$PFX/usb_role/ci_hdrc.0-role-switch SUBSYSTEM=usb_role DEVTYPE=usb_role_switch USB_ROLE_SWITCH=ci_hdrc.0-role-switch SEQNUM=2432 Cc: Heikki Krogerus Cc: Greg Kroah-Hartman Cc: Chunfeng Yun Cc: Suzuki K Poulose Cc: Alexandre Belloni Cc: Wen Yang Cc: chenqiwu Cc: linux-kernel@vger.kernel.org Signed-off-by: Bryan O'Donoghue --- drivers/usb/roles/class.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/usb/roles/class.c b/drivers/usb/roles/class.c index 5b17709821df..27d92af29635 100644 --- a/drivers/usb/roles/class.c +++ b/drivers/usb/roles/class.c @@ -49,8 +49,10 @@ int usb_role_switch_set_role(struct usb_role_switch *sw, enum usb_role role) mutex_lock(>lock); ret = sw->set(sw, role); - if (!ret) + if (!ret) { sw->role = role; + kobject_uevent(>dev.kobj, KOBJ_CHANGE); + } mutex_unlock(>lock); Hi, we may also need to deal with the return value of kobject_uevent(). For an KOBJ_ADD you'd return an error. grep -r "= kobject_uevent(" * drivers/misc/cxl/sysfs.c: rc = kobject_uevent(>kobj, KOBJ_ADD); drivers/uio/uio.c: ret = kobject_uevent(>kobj, KOBJ_ADD); drivers/uio/uio.c: ret = kobject_uevent(>kobj, KOBJ_ADD); drivers/visorbus/visorchipset.c: res = kobject_uevent(_dev->acpi_device->dev.kobj, KOBJ_ONLINE); drivers/visorbus/visorchipset.c: int res = kobject_uevent(_dev->acpi_device->dev.kobj, For a KOBJ_CHANGE I guess we could print an error if (kobject_uevent(>dev.kobj, KOBJ_CHANGE) dev_err(>dev, "failed to signal USB role-switch uevent\n"); Nobody else seems that bothered about it. grep -r "if (kobject_uevent(" * Should we move it under the line mutex_unlock(>lock)? I think probably not. the mutex serializes the notification. In theory outside the mutex you could get an out-of-order notification. The main reason I put it where it is, is we already test ret and should only notify the change, when the role-switch has suceeded. As I say, in theory anyway, the mutex enforces the signalling, whatever about the reception, of the role-switch change, so IMO inside the bounds of the mutex is the right place to put it. --- bod
Re: [PATCH v5 0/4] Introduce the for_each_set_clump macro
On Tue, May 5, 2020 at 8:24 PM William Breathitt Gray wrote: > > On Tue, May 05, 2020 at 04:51:56PM +0300, Andy Shevchenko wrote: > > On Mon, May 4, 2020 at 5:41 PM William Breathitt Gray > > wrote: > > > On Mon, May 04, 2020 at 02:41:09PM +0300, Andy Shevchenko wrote: > > > > On Sun, May 03, 2020 at 04:38:36AM +0530, Syed Nayyar Waris wrote: > > > > ... > > > > > > Looking into the last patches where we have examples I still do not see > > > > a > > > > benefit of variadic clump sizes. power of 2 sizes would make sense (and > > > > be > > > > optimized accordingly (64-bit, 32-bit). > > > > > > > > -- > > > > With Best Regards, > > > > Andy Shevchenko > > > > > > There is of course benefit in defining for_each_set_clump with clump > > > sizes of powers of 2 (we can optimize for 32 and 64 bit sizes and avoid > > > boundary checks that we know will not occur), but at the very least the > > > variable size bitmap_set_value and bitmap_get_value provide significant > > > benefit for the readability of the gpio-xilinx code: > > > > > > bitmap_set_value(old, state[0], 0, width[0]); > > > bitmap_set_value(old, state[1], width[0], width[1]); > > > ... > > > state[0] = bitmap_get_value(new, 0, width[0]); > > > state[1] = bitmap_get_value(new, width[0], width[1]); > > > > > > These lines are simple and clear to read: we know immediately what they > > > do. But if we did not have bitmap_set_value/bitmap_get_value, we'd have > > > to use several bitwise operations for each line; the obfuscation of the > > > code would be an obvious hinderance here. > > > > Do I understand correctly that width[0] and width[1] may not be power > > of two and it's actually the case? > > > > -- > > With Best Regards, > > Andy Shevchenko > > I'm under the impression that width[0] and width[1] are arbitrarily > chosen by the user and could be any integer. I have never used this > hardware so I'm hoping one of the gpio-xilinx or GPIO subsystem > maintainers in this thread will respond with some guidance. > > If the values of width[0] and width[1] are restricted to powers of 2, > then I agree that there is no need for generic bitmap_set_value and > bitmap_get_value functions and we can instead use more optimized power > of 2 versions. > > William Breathitt Gray Regarding the question that whether width[0] and width[1] can have any value or they are restricted to power-of-2. Referring to the document (This xilinx GPIO IP was mentioned in the gpio-xilinx.c file): https://www.xilinx.com/support/documentation/ip_documentation/axi_gpio/v2_0/pg144-axi-gpio.pdf On page 8, we can see that the GPIO widths for the 2 channels can have values different from power-of-2.For example: 5, 15 etc. So, I think we should keep the 'for_each_set_clump', 'bitmap_get_value' and 'bitmap_set_value' as completely generic. I am proceeding further for my next patchset submission keeping above findings in mind. If you guys think something else or would like to add something, let me know. Regards Syed Nayyar Waris
Re: [PATCH v2 3/3] platform/x86: Intel PMT Telemetry capability driver
On Fri, 2020-05-08 at 12:57 +0300, Andy Shevchenko wrote: > On Fri, May 8, 2020 at 5:18 AM David E. Box < > david.e@linux.intel.com> wrote: > > PMT Telemetry is a capability of the Intel Platform Monitoring > > Technology. > > The Telemetry capability provides access to device telemetry > > metrics that > > provide hardware performance data to users from continuous, memory > > mapped, > > read-only register spaces. > > > > Register mappings are not provided by the driver. Instead, a GUID > > is read > > from a header for each endpoint. The GUID identifies the device and > > is to > > be used with an XML, provided by the vendor, to discover the > > available set > > of metrics and their register mapping. This allows firmware > > updates to > > modify the register space without needing to update the driver > > every time > > with new mappings. Firmware writes a new GUID in this case to > > specify the > > new mapping. Software tools with access to the associated XML file > > can > > then interpret the changes. > > > > This module manages access to all PMT Telemetry endpoints on a > > system, > > regardless of the device exporting them. It creates a pmt_telemetry > > class > > to manage the list. For each endpoint, sysfs files provide GUID and > > size > > information as well as a pointer to the parent device the telemetry > > comes > > from. Software may discover the association between endpoints and > > devices > > by iterating through the list in sysfs, or by looking for the > > existence of > > ABI needs documentation. We will be releasing a Linux software spec for PMT. We are waiting on public release of the PMT spec. For this patch we did document the sysfs class ABI. > > > the class folder under the device of interest. A device node of > > the same > > name allows software to then map the telemetry space for direct > > access. > > ... > > > +config INTEL_PMT_TELEM > > TELEMETRY > > ... > > > +obj-$(CONFIG_INTEL_PMT_TELEM) += intel_pmt_telem.o > > telemetry > > (Inside the file it's fine to have telem) > > ... > > > + priv->dvsec = dev_get_platdata(>dev); > > + if (!priv->dvsec) { > > + dev_err(>dev, "Platform data not found\n"); > > + return -ENODEV; > > + } > > I don't see how is it being used? Good catch :). This was initially used to pass the DVSEC info from the pci device to the telemetry driver. But with changes all of the needed info is now read from the driver's memory resource. It was unnoticed that dvsec fields are no longer used. Will remove in next version. Okay on other comments. David
Re: [PATCH 2/2] drm/panfrost: add devfreq regulator support
Hi Steven, On Thu, 7 May 2020 at 16:30, Steven Price wrote: > > On 02/05/2020 23:07, Clément Péron wrote: > > Hi Steven, > > > > On Tue, 14 Apr 2020 at 15:10, Steven Price wrote: > >> > >> Hi Clément, > >> > >> On 13/04/2020 18:28, Clément Péron wrote: > >>> Hi Steven, > >>> > > > > Since you've got a reproduction - can you get a backtrace where the > regulator is getting disabled? Regulator is disabled from regulator_late_cleanup() [ 33.757650] vdd-gpu: disabling [ 33.760718] CPU: 2 PID: 31 Comm: kworker/2:1 Not tainted 5.7.0-rc2-next-20200424 #8 [ 33.768362] Hardware name: Beelink GS1 (DT) [ 33.772553] Workqueue: events regulator_init_complete_work_function [ 33.778813] Call trace: [ 33.781261] dump_backtrace+0x0/0x1a0 [ 33.784922] show_stack+0x18/0x30 [ 33.788238] dump_stack+0xc0/0x114 [ 33.791638] regulator_late_cleanup+0x164/0x1f0 [ 33.796165] class_for_each_device+0x64/0xe0 [ 33.800431] regulator_init_complete_work_function+0x4c/0x60 [ 33.806084] process_one_work+0x19c/0x330 [ 33.810090] worker_thread+0x4c/0x430 [ 33.813748] kthread+0x138/0x160 [ 33.816973] ret_from_fork+0x10/0x24 the use_count is at 0... I have check and the regulator_get is called and regulator_put is never called for vdd-gpu. Not sure what is happening here... > > > - The Cooling map is not probe correctly : > > [2.545756] panfrost 180.gpu: [drm:panfrost_devfreq_init > > [panfrost]] Failed to register cooling device > > Introduce in this commit : > > https://github.com/clementperon/linux/commit/0252c38fd55ad78366ac4b1714e285c88db34557 > > > > Do you have an hint about what I'm missing ? > > Sorry, my knowledge of the cooling framework is very limited. What > you've got looks plausible, but I'm afraid I can't really help beyond > that! As before - can you try adding some printk()s in e.g. > of_devfreq_cooling_register_power() and find out where it is bailing out? Dumb issue, I was missing the CONFIG_DEVFREQ_THERMAL -_-, I will make a patch to enable it in arm64 defconfig. Regards, Clement > > Steve
Re: [PATCH for-5.7] io_uring: fix zero len do_splice()
On 5/9/20 10:07 AM, Pavel Begunkov wrote: > On 04/05/2020 23:00, Pavel Begunkov wrote: >> do_splice() doesn't expect len to be 0. Just always return 0 in this >> case as splice(2) do. >> > > If it was missed, may you take a look? I reattached the patch btw killing > reported warnings. Thanks for re-sending, I'll queue it up for 5.7. -- Jens Axboe
Re: [PATCH -next v2] locking/osq_lock: annotate a data race in osq_lock
On Sat, May 09, 2020 at 09:01:53AM -0400, Qian Cai wrote: > > > > On May 9, 2020, at 12:33 AM, Paul E. McKenney wrote: > > > > On Fri, May 08, 2020 at 04:59:05PM -0400, Qian Cai wrote: > >> > >> > >>> On Feb 11, 2020, at 8:54 AM, Qian Cai wrote: > >>> > >>> prev->next could be accessed concurrently as noticed by KCSAN, > >>> > >>> write (marked) to 0x9d3370dbbe40 of 8 bytes by task 3294 on cpu 107: > >>> osq_lock+0x25f/0x350 > >>> osq_wait_next at kernel/locking/osq_lock.c:79 > >>> (inlined by) osq_lock at kernel/locking/osq_lock.c:185 > >>> rwsem_optimistic_spin > >>> > >>> > >>> read to 0x9d3370dbbe40 of 8 bytes by task 3398 on cpu 100: > >>> osq_lock+0x196/0x350 > >>> osq_lock at kernel/locking/osq_lock.c:157 > >>> rwsem_optimistic_spin > >>> > >>> > >>> Since the write only stores NULL to prev->next and the read tests if > >>> prev->next equals to this_cpu_ptr(_node). Even if the value is > >>> shattered, the code is still working correctly. Thus, mark it as an > >>> intentional data race using the data_race() macro. > >>> > >>> Signed-off-by: Qian Cai > >> > >> Hmm, this patch has been dropped from linux-next from some reasons. > >> > >> Paul, can you pick this up along with KCSAN fixes? > >> > >> https://lore.kernel.org/lkml/1581429255-12542-1-git-send-email-...@lca.pw/ > > > > I have queued it on -rcu, but it is too late for v5.8 via the -rcu tree, > > so this is v5.9 at the earliest. Plus I would need an ack from one of > > the locking folks. > > Peter, Will, can you give an ACK? This v2 should incorporate all the feedback > from Peter, > > https://lore.kernel.org/lkml/20200211124753.gp14...@hirez.programming.kicks-ass.net/ > > V5.9 is fine. All I care about is it is always in linux-next (so the testing > bots won’t trigger this over and over again) and to be in mainline at some > point in the future. Ah, and I forgot to ask. Why "if (data_race(prev->next == node)" instead of "if (data_race(prev->next) == node"? Thanx, Paul > >>> --- > >>> > >>> v2: insert some code comments. > >>> > >>> kernel/locking/osq_lock.c | 6 +- > >>> 1 file changed, 5 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c > >>> index 1f7734949ac8..f733bcd99e8a 100644 > >>> --- a/kernel/locking/osq_lock.c > >>> +++ b/kernel/locking/osq_lock.c > >>> @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock) > >>>*/ > >>> > >>> for (;;) { > >>> - if (prev->next == node && > >>> + /* > >>> + * cpu_relax() below implies a compiler barrier which would > >>> + * prevent this comparison being optimized away. > >>> + */ > >>> + if (data_race(prev->next == node) && > >>> cmpxchg(>next, node, NULL) == node) > >>> break; > >>> > >>> -- > >>> 1.8.3.1 >
Re: [PATCH for-5.7] io_uring: fix zero len do_splice()
On 04/05/2020 23:00, Pavel Begunkov wrote: > do_splice() doesn't expect len to be 0. Just always return 0 in this > case as splice(2) do. > If it was missed, may you take a look? I reattached the patch btw killing reported warnings. --- From: Pavel Begunkov Subject: [PATCH] io_uring: fix zero len do_splice() do_splice() doesn't expect len to be 0. Do the check in io_splice() exactly as in sys_splice(). Fixes: 7d67af2c0134 ("io_uring: add splice(2) support") Reported-by: Jann Horn Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 65458eda2127..70d8943c337b 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2772,16 +2772,16 @@ static int io_splice(struct io_kiocb *req, bool force_nonblock) struct file *out = sp->file_out; unsigned int flags = sp->flags & ~SPLICE_F_FD_IN_FIXED; loff_t *poff_in, *poff_out; - long ret; + long ret = 0; if (force_nonblock) return -EAGAIN; poff_in = (sp->off_in == -1) ? NULL : >off_in; poff_out = (sp->off_out == -1) ? NULL : >off_out; - ret = do_splice(in, poff_in, out, poff_out, sp->len, flags); - if (force_nonblock && ret == -EAGAIN) - return -EAGAIN; + + if (sp->len) + ret = do_splice(in, poff_in, out, poff_out, sp->len, flags); io_put_file(req, in, (sp->flags & SPLICE_F_FD_IN_FIXED)); req->flags &= ~REQ_F_NEED_CLEANUP; -- 2.24.0
Re: [PATCH RFC tip/core/rcu] Add shrinker to shift to fast/inefficient GP mode
On Sat, May 09, 2020 at 11:54:40AM +0300, Konstantin Khlebnikov wrote: > On 08/05/2020 17.46, Paul E. McKenney wrote: > > On Fri, May 08, 2020 at 12:00:28PM +0300, Konstantin Khlebnikov wrote: > > > On 07/05/2020 22.09, Paul E. McKenney wrote: > > > > On Thu, May 07, 2020 at 02:31:02PM -0400, Johannes Weiner wrote: > > > > > On Thu, May 07, 2020 at 10:09:03AM -0700, Paul E. McKenney wrote: > > > > > > On Thu, May 07, 2020 at 01:00:06PM -0400, Johannes Weiner wrote: > > > > > > > On Wed, May 06, 2020 at 05:55:35PM -0700, Andrew Morton wrote: > > > > > > > > On Wed, 6 May 2020 17:42:40 -0700 "Paul E. McKenney" > > > > > > > > wrote: > > > > > > > > > > > > > > > > > This commit adds a shrinker so as to inform RCU when memory > > > > > > > > > is scarce. > > > > > > > > > RCU responds by shifting into the same fast and inefficient > > > > > > > > > mode that is > > > > > > > > > used in the presence of excessive numbers of RCU callbacks. > > > > > > > > > RCU remains > > > > > > > > > in this state for one-tenth of a second, though this time > > > > > > > > > window can be > > > > > > > > > extended by another call to the shrinker. > > > > > > > > > > > > > > We may be able to use shrinkers here, but merely being invoked > > > > > > > does > > > > > > > not carry a reliable distress signal. > > > > > > > > > > > > > > Shrinkers get invoked whenever vmscan runs. It's a useful > > > > > > > indicator > > > > > > > for when to age an auxiliary LRU list - test references, clear and > > > > > > > rotate or reclaim stale entries. The urgency, and what can and > > > > > > > cannot > > > > > > > be considered "stale", is encoded in the callback frequency and > > > > > > > scan > > > > > > > counts, and meant to be relative to the VM's own rate of aging: > > > > > > > "I've > > > > > > > tested X percent of mine for recent use, now you go and test the > > > > > > > same > > > > > > > share of your pool." It doesn't translate well to other > > > > > > > interpretations of the callbacks, although people have tried. > > > > > > > > > > > > Would it make sense for RCU to interpret two invocations within > > > > > > (say) > > > > > > 100ms of each other as indicating urgency? (Hey, I had to ask!) > > > > > > > > > > It's the perfect number for one combination of CPU, storage device, > > > > > and shrinker implementation :-) > > > > > > > > Woo-hoo!!! > > > > > > > > But is that one combination actually in use anywhere? ;-) > > > > > > > > > > > > > If it proves feasible, a later commit might add a function > > > > > > > > > call directly > > > > > > > > > indicating the end of the period of scarce memory. > > > > > > > > > > > > > > > > (Cc David Chinner, who often has opinions on shrinkers ;)) > > > > > > > > > > > > > > > > It's a bit abusive of the intent of the slab shrinkers, but I > > > > > > > > don't > > > > > > > > immediately see a problem with it. Always returning 0 from > > > > > > > > ->scan_objects might cause a problem in some situations(?). > > > > > > > > > > > > > > > > Perhaps we should have a formal "system getting low on memory, > > > > > > > > please > > > > > > > > do something" notification API. > > > > > > > > > > > > > > It's tricky to find a useful definition of what low on memory > > > > > > > means. In the past we've used sc->priority cutoffs, the vmpressure > > > > > > > interface (reclaimed/scanned - reclaim efficiency cutoffs), oom > > > > > > > notifiers (another reclaim efficiency cutoff). But none of these > > > > > > > reliably capture "distress", and they vary highly between > > > > > > > different > > > > > > > hardware setups. It can be hard to trigger OOM itself on fast IO > > > > > > > devices, even when the machine is way past useful (where useful is > > > > > > > somewhat subjective to the user). Userspace OOM implementations > > > > > > > that > > > > > > > consider userspace health (also subjective) are getting more > > > > > > > common. > > > > > > > > > > > > > > > How significant is this? How much memory can RCU consume? > > > > > > > > > > > > > > I think if rcu can end up consuming a significant share of > > > > > > > memory, one > > > > > > > way that may work would be to do proper shrinker integration and > > > > > > > track > > > > > > > the age of its objects relative to the age of other allocations > > > > > > > in the > > > > > > > system. I.e. toss them all on a clock list with "new" bits and > > > > > > > shrink > > > > > > > them at VM velocity. If the shrinker sees objects with new bit > > > > > > > set, > > > > > > > clear and rotate. If it sees objects without them, we know > > > > > > > rcu_heads > > > > > > > outlive cache pages etc. and should probably cycle faster too. > > > > > > > > > > > > It would be easy for RCU to pass back (or otherwise use) the age of > > > > > > the > > > > > > current grace period, if that would help. > > > > > > > > > > > > Tracking the age of individual callbacks is out of the
Re: [RFC] splice/tee: len=0 fast path after validity check
On 09/05/2020 17:42, Jens Axboe wrote: > On 5/9/20 2:46 AM, Pavel Begunkov wrote: >> When len=0, splice() and tee() return 0 even if specified fds are >> invalid, hiding errors from users. Move len=0 optimisation later after >> basic validity checks. >> >> before: >> splice(len=0, fd_in=-1, ...) == 0; >> >> after: >> splice(len=0, fd_in=-1, ...) == -EBADF; > > I'm not sure what the purpose of this would be. It probably should have > been done that way from the beginning, but it wasn't. While there's > very little risk of breaking any applications due to this change, it > also seems like a pointless exercise at this point. > > So my suggestion would be to just leave it alone. Ok -- Pavel Begunkov
Re: [PATCH v8 8/8] powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32
Le 28/04/2020 à 18:05, Arnd Bergmann a écrit : On Tue, Apr 28, 2020 at 3:16 PM Christophe Leroy wrote: Provides __kernel_clock_gettime64() on vdso32. This is the 64 bits version of __kernel_clock_gettime() which is y2038 compliant. Signed-off-by: Christophe Leroy Looks good to me Reviewed-by: Arnd Bergmann There was a bug on ARM for the corresponding function, so far it is unclear if this was a problem related to particular hardware, the 32-bit kernel code, or the common implementation of clock_gettime64 in the vdso library, see https://github.com/richfelker/musl-cross-make/issues/96 Just to be sure that powerpc is not affected by the same issue, can you confirm that repeatedly calling clock_gettime64 on powerpc32, alternating between vdso and syscall, results in monotically increasing times? I think that's one of the things vdsotest checks, so yes that's ok I think. Christophe
VERY URGENT
FROM MR.SÉBASTIEN TONI AUDIT& ACCOUNT MANAGER BANK OF AFRICA (B.O.A) OUAGADOUGOU BURKINA FASO WEST AFRICA. Dear Friend, With due respect, I have decided to contact you on abusinesstransaction that will be beneficial to both of us. At the bank last account and auditing evaluation, my staffs came across an old account which was being maintained by a foreign client who we learn was among the deceased passengers of motor accident on November.2003, the deceased was unable to run this account since his death. Theaccount has remained dormant without the knowledge of his family since it was put in a safe deposit account in the bank for future investment by the client. Since his demise, even the members of his family haven't applied for claims over this fund and it has been in the safe deposit account until I discovered that it cannot be claimed since our client isaforeign national and we are sure that he has no next of kin here to file claims over the money. As the director of the department, this discovery was brought to my office so as to decide what is to bedone.I decided to seek ways through which to transfer this money out of the bank and out of the country too. The total amount in the account is 18.6 million with my positions as staffs of the bank, I am handicapped because I cannot operate foreign accounts and cannot lay bonafide claim over this money. The client was a foreign national and you will only be asked to act as his next of kin and I will supply you with all the necessary information and bank data to assist you in being able to transfer this money to any bank of your choice where this money could be transferred into.The total sum will be shared as follows: 50% for me, 50% for you and expenses incidental occur during the transfer will be incur by both of us. The transfer is risk free on both sides hence you are going to follow my instruction till the fund transfer to your account. Since I work in this bank that is why you should be confident in the success of this transaction because you will be updated with information as at when desired. I will wish you to keep this transaction secret and confidential as I am hoping to retire with my share of this money at the end of transaction which will be when this money is safety in your account. I will then come over to your country for sharing according to the previously agreed percentages. You might even have to advise me on possibilities of investment in your country or elsewhere of our choice. May God help you to help me to a restive retirement,Amen,And You have to contact me through my private e-mail at(sebastient...@gmail.com)Please for further information and inquires feel free to contact me back immediately for more explanation and better understanding I want you to assure me your capability of handling this project with trust by providing me your following information details such as: (1)NAME.. (2)AGE: (3)SEX:. (4)PHONE NUMBER:. (5)OCCUPATION:. (6)YOUR COUNTRY:. Yours sincerely, Mr.Sébastien Toni
Re: ioremap() called early from pnv_pci_init_ioda_phb()
Le 08/05/2020 à 19:41, Qian Cai a écrit : On May 8, 2020, at 10:39 AM, Qian Cai wrote: Booting POWER9 PowerNV has this message, "ioremap() called early from pnv_pci_init_ioda_phb+0x420/0xdfc. Use early_ioremap() instead” but use the patch below will result in leaks because it will never call early_iounmap() anywhere. However, it looks me it was by design that phb->regs mapping would be there forever where it would be used in pnv_ioda_get_inval_reg(), so is just that check_early_ioremap_leak() initcall too strong? --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -36,6 +36,7 @@ #include #include #include +#include #include @@ -3827,7 +3828,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np, /* Get registers */ if (!of_address_to_resource(np, 0, )) { phb->regs_phys = r.start; - phb->regs = ioremap(r.start, resource_size()); + phb->regs = early_ioremap(r.start, resource_size()); if (phb->regs == NULL) pr_err(" Failed to map registers !\n”); This will also trigger a panic with debugfs reads, so isn’t that this commit bogus at least for powerpc64? d538aadc2718 (“powerpc/ioremap: warn on early use of ioremap()") No d538aadc2718 is not bogus. That's the point, we want to remove all early usages of ioremap() in order to remove the hack with the ioremap_bot stuff and all, and stick to the generic ioremap logic. In order to do so, all early use of ioremap() has to be converted to early_ioremap() or to fixmap or anything else that allows to do ioremaps before the slab is ready. early_ioremap() is for temporary mappings necessary at boottime. For long lasting mappings, another method is to be used. Now, the point is that other architectures like for instance x86 don't seem to have to use early_ioremap() much. Powerpc is for instance doing early mappings for PCI. Seems like x86 initialises PCI once slab is ready. Can't powerpc do the same ? Christophe
Re: [PATCH net-next 1/2] ath10k: fix gcc-10 zero-length-bounds warnings
Arnd, On Sat, May 09, 2020 at 02:06:32PM +0200, Arnd Bergmann wrote: > gcc-10 started warning about out-of-bounds access for zero-length > arrays: > > In file included from drivers/net/wireless/ath/ath10k/core.h:18, > from drivers/net/wireless/ath/ath10k/htt_rx.c:8: > drivers/net/wireless/ath/ath10k/htt_rx.c: In function > 'ath10k_htt_rx_tx_fetch_ind': > drivers/net/wireless/ath/ath10k/htt.h:1683:17: warning: array subscript 65535 > is outside the bounds of an interior zero-length array 'struct > htt_tx_fetch_record[0]' [-Wzero-length-bounds] > 1683 | return (void *)>records[le16_to_cpu(ind->num_records)]; > | ^~~~ > drivers/net/wireless/ath/ath10k/htt.h:1676:29: note: while referencing > 'records' > 1676 | struct htt_tx_fetch_record records[0]; > | ^~~ > > Make records[] a flexible array member to allow this, moving it behind > the other zero-length member that is not accessed in a way that gcc > warns about. > > Fixes: 3ba225b506a2 ("treewide: Replace zero-length array with flexible-array > member") This treewide patch no longer contains changes for ath10k. I removed them since Monday (05/04/2020). So, this "Fixes" tag does not apply. Thanks -- Gustavo > Fixes: 22e6b3bc5d96 ("ath10k: add new htt definitions") > Signed-off-by: Arnd Bergmann > --- > drivers/net/wireless/ath/ath10k/htt.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/wireless/ath/ath10k/htt.h > b/drivers/net/wireless/ath/ath10k/htt.h > index 8f3710cf28f4..aa056a186402 100644 > --- a/drivers/net/wireless/ath/ath10k/htt.h > +++ b/drivers/net/wireless/ath/ath10k/htt.h > @@ -1673,8 +1673,8 @@ struct htt_tx_fetch_ind { > __le32 token; > __le16 num_resp_ids; > __le16 num_records; > - struct htt_tx_fetch_record records[0]; > __le32 resp_ids[0]; /* ath10k_htt_get_tx_fetch_ind_resp_ids() */ > + struct htt_tx_fetch_record records[]; > } __packed; > > static inline void * > -- > 2.26.0 >
Re: [PATCH v5] x86: bitops: fix build regression
On Sat, May 9, 2020 at 8:20 AM Andy Shevchenko wrote: > > On Fri, May 8, 2020 at 9:35 PM Nick Desaulniers > wrote: > > > > This is easily reproducible via CC=clang+CONFIG_STAGING=y+CONFIG_VT6656=m. > > > > It turns out that if your config tickles __builtin_constant_p via > > differences in choices to inline or not, these statements produce > > invalid assembly: > > > > $ cat foo.c > > long a(long b, long c) { > > asm("orb\t%1, %0" : "+q"(c): "r"(b)); > > return c; > > } > > $ gcc foo.c > > foo.c: Assembler messages: > > foo.c:2: Error: `%rax' not allowed with `orb' > > > > Use the `%b` "x86 Operand Modifier" to instead force register allocation > > to select a lower-8-bit GPR operand. > > > > The "q" constraint only has meaning on -m32 otherwise is treated as > > "r". Not all GPRs have low-8-bit aliases for -m32. > > > > Looks very good! > One question though, does it work with minimum supported version of gcc? Yes. The operand width modifiers have been around a long time but not well documented until more recently. -- Brian Gerst
Re: ioremap() called early from pnv_pci_init_ioda_phb()
> On May 9, 2020, at 4:38 AM, Nicholas Piggin wrote: > > Your patch to use early_ioremap is faulting? I wonder why? Yes, I don’t know the reasons either. I suppose not many places in other parts of the kernel which keep using those addresses from early_ioremap() after system booted. Otherwise, we would see those leak warnings elsewhere. Thus, we probably have to audit the code, and if still necessary, call early_ioremap() and then early_iounmap() followed by a ioremap() once slab allocator is available?
Re: [PATCH v1 2/3] armv8: gpio: add gpio feature
On Sat, May 09, 2020 at 06:39:55PM +0800, Hui Song wrote: > From: "hui.song" > > add one struct mpc8xxx_gpio_plat to enable gpio feature. > > Signed-off-by: hui.song > --- > .../include/asm/arch-fsl-layerscape/gpio.h| 22 +++ > 1 file changed, 22 insertions(+) > create mode 100644 arch/arm/include/asm/arch-fsl-layerscape/gpio.h > > diff --git a/arch/arm/include/asm/arch-fsl-layerscape/gpio.h > b/arch/arm/include/asm/arch-fsl-layerscape/gpio.h > new file mode 100644 > index 00..d8dd750a72 > --- /dev/null > +++ b/arch/arm/include/asm/arch-fsl-layerscape/gpio.h > @@ -0,0 +1,22 @@ > +/* SPDX-License-Identifier: GPL-2.0+ */ > +/* > + * Copyright 2014 Freescale Semiconductor, Inc. > + */ > + > +/* > + * Dummy header file to enable CONFIG_OF_CONTROL. > + * If CONFIG_OF_CONTROL is enabled, lib/fdtdec.c is compiled. > + * It includes via , so those SoCs that enable > + * OF_CONTROL must have arch/gpio.h. > + */ This does not seem right. You would expect each sub arch to have a subdirectory in arch/arm/include/asm/ when in fact none do. Andrew
Re: [EXT] Re: [PATCH net-next 3/5] net: mvpp2: cls: Use RSS contexts to handle RSS tables
On Sat, May 9, 2020 at 4:49 PM Russell King - ARM Linux admin wrote: > > On Sat, May 09, 2020 at 02:51:05PM +0100, Russell King - ARM Linux admin > wrote: > > On Sat, May 09, 2020 at 03:14:05PM +0200, Matteo Croce wrote: > > > On Sat, May 9, 2020 at 1:45 PM Russell King - ARM Linux admin > > > wrote: > > > > > > > > On Sat, May 09, 2020 at 11:15:58AM +, Stefan Chulski wrote: > > > > > > > > > > > > > > > > -Original Message- > > > > > > From: Matteo Croce > > > > > > Sent: Saturday, May 9, 2020 3:13 AM > > > > > > To: David S . Miller > > > > > > Cc: Maxime Chevallier ; netdev > > > > > > ; LKML ; > > > > > > Antoine > > > > > > Tenart ; Thomas Petazzoni > > > > > > ; gregory.clem...@bootlin.com; > > > > > > miquel.ray...@bootlin.com; Nadav Haklai ; Stefan > > > > > > Chulski ; Marcin Wojtas ; > > > > > > Linux > > > > > > ARM ; Russell King - ARM > > > > > > Linux admin > > > > > > > > > > > > Subject: [EXT] Re: [PATCH net-next 3/5] net: mvpp2: cls: Use RSS > > > > > > contexts to > > > > > > handle RSS tables > > > > > > > > > > > > Hi, > > > > > > > > > > > > What do you think about temporarily disabling it like this? > > > > > > > > > > > > --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c > > > > > > +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c > > > > > > @@ -5775,7 +5775,8 @@ static int mvpp2_port_probe(struct > > > > > > platform_device > > > > > > *pdev, > > > > > > NETIF_F_HW_VLAN_CTAG_FILTER; > > > > > > > > > > > > if (mvpp22_rss_is_supported()) { > > > > > > - dev->hw_features |= NETIF_F_RXHASH; > > > > > > + if (port->phy_interface != PHY_INTERFACE_MODE_SGMII) > > > > > > + dev->hw_features |= NETIF_F_RXHASH; > > > > > > dev->features |= NETIF_F_NTUPLE; > > > > > > } > > > > > > > > > > > > > > > > > > David, is this "workaround" too bad to get accepted? > > > > > > > > > > Not sure that RSS related to physical interface(SGMII), better just > > > > > remove NETIF_F_RXHASH as "workaround". > > > > > > > > Hmm, I'm not sure this is the right way forward. This patch has the > > > > effect of disabling: > > > > > > > > d33ec4525007 ("net: mvpp2: add an RSS classification step for each > > > > flow") > > > > > > > > but the commit you're pointing at which caused the regression is: > > > > > > > > 895586d5dc32 ("net: mvpp2: cls: Use RSS contexts to handle RSS tables") > > > > > > > > > > > > > > Hi, > > > > > > When git bisect pointed to 895586d5dc32 ("net: mvpp2: cls: Use RSS > > > contexts to handle RSS tables"), which was merged > > > almost an year after d33ec4525007 ("net: mvpp2: add an RSS > > > classification step for each flow"), so I assume that between these > > > two commits either the feature was working or it was disable and we > > > didn't notice > > > > > > Without knowing what was happening, which commit should my Fixes tag > > > point to? > > > > Let me make sure that I get this clear: > > > > - Prior to 895586d5dc32, you can turn on and off rxhash without issue > > on any port. > > - After 895586d5dc32, turning rxhash on eth2 prevents reception. > > > > Prior to 895586d5dc32, with rxhash on, it looks like hashing using > > CRC32 is supported but only one context. So, if it's possible to > > enable rxhash on any port on the mcbin without 895586d5dc32, and the > > port continues to work, I'd say the bug was introduced by > > 895586d5dc32. > > > > Of course, that would be reinforced if there was a measurable > > difference in performance due to rxhash on each port. > > I've just run this test, but I can detect no difference in performance > with or without 895586d5dc32 on eth0 or eth2 on the mcbin (apart from > eth2 stopping working with 895586d5dc32 applied.) I tested this by > reverting almost all changes to the mvpp2 driver between 5.6 and that > commit. > > That's not too surprising; I'm using my cex7 platform with the Mellanox > card in for one end of the 10G link, and that platform doesn't seem to > be able to saturdate a 10G link - it only seems to manage around 4Gbps. > > -- > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up > Well it depends on the traffic type. I used to generate 5k flows with T-Rex and an Intel X710 card. This way t-rex changes the UDP port of every packet: root@macchiatobin:~# tcpdump -tnni eth0 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes IP 16.0.0.18.9874 > 48.0.0.81.2001: UDP, length 18 IP 16.0.0.248.56289 > 48.0.192.56.2001: UDP, length 18 IP 16.0.0.154.44965 > 48.0.128.26.2001: UDP, length 18 IP 16.0.0.23.31363 > 48.0.0.86.2001: UDP, length 18 IP 16.0.0.192.1674 > 48.0.192.63.2001: UDP, length 18 IP 16.0.0.155.62370 > 48.0.128.27.2001: UDP, length 18 IP 16.0.0.30.22126 > 48.0.0.93.2001: UDP, length
Re: [PATCH v1 1/3] gpio: mpc8xxx: support fsl-layerscape platform.
On Sat, May 09, 2020 at 06:39:54PM +0800, Hui Song wrote: > From: "hui.song" > > Make the MPC8XXX gpio driver to support the fsl-layerscape. > > Signed-off-by: hui.song > --- > drivers/gpio/mpc8xxx_gpio.c | 59 + > 1 file changed, 59 insertions(+) > > diff --git a/drivers/gpio/mpc8xxx_gpio.c b/drivers/gpio/mpc8xxx_gpio.c > index 1dfd22522c..466f5f50cf 100644 > --- a/drivers/gpio/mpc8xxx_gpio.c > +++ b/drivers/gpio/mpc8xxx_gpio.c > @@ -12,6 +12,8 @@ > #include > #include > #include > +#include > +#include > > struct ccsr_gpio { > u32 gpdir; > @@ -20,6 +22,7 @@ struct ccsr_gpio { > u32 gpier; > u32 gpimr; > u32 gpicr; > + u32 gpibe; > }; > > struct mpc8xxx_gpio_data { > @@ -49,31 +52,51 @@ inline u32 gpio_mask(uint gpio) > > static inline u32 mpc8xxx_gpio_get_val(struct ccsr_gpio *base, u32 mask) > { > +#if CONFIG_ARM > + return in_le32(>gpdat) & mask; > +#else > return in_be32(>gpdat) & mask; > +#endif > } Hi Hui Did the hardware engineers really change the endinness of the register? Forget about the CPU here, did the register change endinness? In general, you should not need to use #if like this, so long as the register itself is still the same. There are functions which will do the correct thing depending on if the CPU is big or little endian. > @@ -147,13 +183,29 @@ static int mpc8xxx_gpio_ofdata_to_platdata(struct > udevice *dev) > { > struct mpc8xxx_gpio_plat *plat = dev_get_platdata(dev); > fdt_addr_t addr; > + u32 i; > +#if CONFIG_ARM > + u32 reg[4]; > + > + dev_read_u32_array(dev, "reg", reg, 4); > +#else > u32 reg[2]; > > dev_read_u32_array(dev, "reg", reg, 2); > +#endif > + > +#if CONFIG_ARM > + for (i = 0; i < 2; i++) > + reg[i] = be32_to_cpu(reg[i]); > +#endif > addr = dev_translate_address(dev, reg); > > plat->addr = addr; > +#if CONFIG_ARM > + plat->size = reg[3]; > +#else > plat->size = reg[1]; > +#endif > plat->ngpios = dev_read_u32_default(dev, "ngpios", 32); So you are extending the DT binding. You need to document this. And it should really have a different compatible string, since the binding is not compatible between the two variants. > > return 0; > @@ -187,6 +239,7 @@ static int mpc8xxx_gpio_platdata_to_priv(struct udevice > *dev) > static int mpc8xxx_gpio_probe(struct udevice *dev) > { > struct gpio_dev_priv *uc_priv = dev_get_uclass_priv(dev); > + struct device_node const *np = dev->node.np; > struct mpc8xxx_gpio_data *data = dev_get_priv(dev); > char name[32], *str; > > @@ -198,6 +251,12 @@ static int mpc8xxx_gpio_probe(struct udevice *dev) > if (!str) > return -ENOMEM; > > + if (of_device_is_compatible(np, "fsl,qoriq-gpio", NULL, NULL)) { > + unsigned long gpibe = data->addr + sizeof(struct ccsr_gpio); > + > + out_be32(gpibe, 0x); That is an odd way to determine the address of a register. Andrew
Re: [PATCH 1/1] firmware: arm_scmi/mailbox: ignore notification for tx done using irq
On Sat, May 09, 2020 at 04:54:57PM +0800, joe_zhu...@126.com wrote: > From: Joe Zhu > > If mailbox uses IRQ method, it already notified framework with > mbox_chan_txdone() in ISR. > > Signed-off-by: Joe Zhu > --- > drivers/firmware/arm_scmi/mailbox.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/firmware/arm_scmi/mailbox.c > b/drivers/firmware/arm_scmi/mailbox.c > index 73077bbc4ad9..303a5dc42429 100644 > --- a/drivers/firmware/arm_scmi/mailbox.c > +++ b/drivers/firmware/arm_scmi/mailbox.c > @@ -9,6 +9,7 @@ > #include > #include > #include > +#include This is an indication that something is wrong. The mailbox controller and client interfaces are very clear. You need to use mailbox controller interface when implementing a mailbox controller and use only client interface when implementing a mailbox client. > #include > #include > #include > @@ -147,7 +148,8 @@ static void mailbox_mark_txdone(struct scmi_chan_info > *cinfo, int ret) >* Unfortunately, we have to kick the mailbox framework after we have >* received our message. >*/ > - mbox_client_txdone(smbox->chan, ret); > + if (!smbox->chan->mbox->txdone_irq) > + mbox_client_txdone(smbox->chan, ret); If this patch is to avoid getting "Client can't run the TX ticker" error messages, then you make need to fix that with something like below: Regards, Sudeep -->8 diff --git i/drivers/mailbox/mailbox.c w/drivers/mailbox/mailbox.c index 0b821a5b2db8..5a78a0adcce4 100644 --- i/drivers/mailbox/mailbox.c +++ w/drivers/mailbox/mailbox.c @@ -189,7 +189,9 @@ EXPORT_SYMBOL_GPL(mbox_chan_txdone); void mbox_client_txdone(struct mbox_chan *chan, int r) { if (unlikely(!(chan->txdone_method & TXDONE_BY_ACK))) { - dev_err(chan->mbox->dev, "Client can't run the TX ticker\n"); + if (unlikely(!(chan->txdone_method & TXDONE_BY_IRQ))) + dev_err(chan->mbox->dev, + "Client can't run the TX ticker\n"); return; }
Re: [PATCH 01/15] taint: add module firmware crash taint support
On Sat, May 09, 2020 at 04:35:38AM +, Luis Chamberlain wrote: > Device driver firmware can crash, and sometimes, this can leave your > system in a state which makes the device or subsystem completely > useless. Detecting this by inspecting /proc/sys/kernel/tainted instead > of scraping some magical words from the kernel log, which is driver > specific, is much easier. So instead provide a helper which lets drivers > annotate this. > > Once this happens, scrapers can easily look for modules taint flags > for a firmware crash. This will taint both the kernel and respective > calling module. > > The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as this > fact should in no way shape or form affect lockdep. This taint is device > driver specific. > > Signed-off-by: Luis Chamberlain > --- > include/linux/kernel.h| 3 ++- > include/linux/module.h| 13 + > include/trace/events/module.h | 3 ++- > kernel/module.c | 5 +++-- > kernel/panic.c| 1 + > 5 files changed, 21 insertions(+), 4 deletions(-) > > diff --git a/include/linux/kernel.h b/include/linux/kernel.h > index 04a5885cec1b..19e1541c82c7 100644 > --- a/include/linux/kernel.h > +++ b/include/linux/kernel.h > @@ -601,7 +601,8 @@ extern enum system_states { > #define TAINT_LIVEPATCH 15 > #define TAINT_AUX16 > #define TAINT_RANDSTRUCT 17 > -#define TAINT_FLAGS_COUNT18 > +#define TAINT_FIRMWARE_CRASH 18 > +#define TAINT_FLAGS_COUNT19 > We are still missing the documentation bits for this new flag, though. How about having a blurb similar to: diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst index 71e9184a9079..5c6a9e2478b0 100644 --- a/Documentation/admin-guide/tainted-kernels.rst +++ b/Documentation/admin-guide/tainted-kernels.rst @@ -100,6 +100,7 @@ Bit Log Number Reason that got the kernel tainted 15 _/K 32768 kernel has been live patched 16 _/X 65536 auxiliary taint, defined for and used by distros 17 _/T 131072 kernel was built with the struct randomization plugin + 18 _/Q 262144 driver firmware crash annotation === === == Note: The character ``_`` is representing a blank in this table to make reading @@ -162,3 +163,7 @@ More detailed explanation for tainting produce extremely unusual kernel structure layouts (even performance pathological ones), which is important to know when debugging. Set at build time. + + 18) ``Q`` Device drivers might annotate the kernel with this taint, in cases + their firmware might have crashed leaving the driver in a crippled and + potentially useless state. > struct taint_flag { > char c_true;/* character printed when tainted */ > diff --git a/include/linux/module.h b/include/linux/module.h > index 2c2e988bcf10..221200078180 100644 > --- a/include/linux/module.h > +++ b/include/linux/module.h > @@ -697,6 +697,14 @@ static inline bool is_livepatch_module(struct module > *mod) > bool is_module_sig_enforced(void); > void set_module_sig_enforced(void); > > +void add_taint_module(struct module *mod, unsigned flag, > + enum lockdep_ok lockdep_ok); > + > +static inline void module_firmware_crashed(void) > +{ > + add_taint_module(THIS_MODULE, TAINT_FIRMWARE_CRASH, LOCKDEP_STILL_OK); > +} > + > #else /* !CONFIG_MODULES... */ > > static inline struct module *__module_address(unsigned long addr) > @@ -844,6 +852,11 @@ void *dereference_module_function_descriptor(struct > module *mod, void *ptr) > return ptr; > } > > +static inline void module_firmware_crashed(void) > +{ > + add_taint(TAINT_FIRMWARE_CRASH, LOCKDEP_STILL_OK); > +} > + > #endif /* CONFIG_MODULES */ > > #ifdef CONFIG_SYSFS > diff --git a/include/trace/events/module.h b/include/trace/events/module.h > index 097485c73c01..b749ea25affd 100644 > --- a/include/trace/events/module.h > +++ b/include/trace/events/module.h > @@ -26,7 +26,8 @@ struct module; > { (1UL << TAINT_OOT_MODULE),"O" }, \ > { (1UL << TAINT_FORCED_MODULE), "F" }, \ > { (1UL << TAINT_CRAP), "C" }, \ > - { (1UL << TAINT_UNSIGNED_MODULE), "E" }) > + { (1UL << TAINT_UNSIGNED_MODULE), "E" }, \ > + { (1UL << TAINT_FIRMWARE_CRASH),"Q" }) > > TRACE_EVENT(module_load, > > diff --git a/kernel/module.c b/kernel/module.c > index 80faaf2116dd..f98e8c25c6b4 100644 > --- a/kernel/module.c > +++ b/kernel/module.c > @@ -325,12 +325,13 @@ static inline int strong_try_module_get(struct module > *mod) > return -ENOENT; > } > > -static inline void add_taint_module(struct module *mod, unsigned flag, > - enum lockdep_ok lockdep_ok) > +void
Re: [PATCH v12 02/11] lib/test_linear_ranges: add a test for the 'linear_ranges'
On Fri, 2020-05-08 at 18:17 +0100, Mark Brown wrote: > On Fri, May 08, 2020 at 06:40:43PM +0300, Matti Vaittinen wrote: > > Add a KUnit test for the linear_ranges helper. > > > > Signed-off-by: Matti Vaittinen > > Reviewed-by: Brendan Higgins > > This now generates: > > WARNING: modpost: missing MODULE_LICENSE() in lib/linear_ranges.o > see include/linux/module.h for more information > > when the tests are built as a module and select the library. I sent a fix as an incremental patch. Please let me know if it is not the way to go. And Sorry for the trouble! Best Regards Matti Vaittinen
[PATCH] lib: linear_ranges: Add missing MODULE_LICENSE()
When linear_ranges is compiled as module we get warning about missing MODULE_LICENSE(). Fix it by adding MODULE_LICENSE("GPL") as is suggested by SPDX and EXPORTs. Signed-off-by: Matti Vaittinen --- I saw Mark applied the linear-ranges patch. So I sent this fix as incremental patch - but I still use the same Linus tree as a base of this fix - the linear-ranges file should be unchanged in regulator tree. If this does not apply I can clone regulator tree and create this fix on it. I don't know if this is the correct way to fix this as the linear-ranges should be merged to power-supply tree. I guess we can either: - Use this patch to fix regulator tree and create fixed tag for power-supply(?) - Add this fix in the original series and resend whole series(?) - re-create the series and drop the already applied patches. Add this fix as a fixup patch in new series and apply it to power-supply tree after the linear-ranges from regulator is merged to power-supply. Please adviece me if this patch is not the way to go. Oh, and I am really sorry for the trouble. I saw I had regulators=y in all of my compilations due to some pincontrol dependencies. So linear-ranges was not built as module in any of my test compilations :( Thanks for testing Mark! lib/linear_ranges.c | 4 1 file changed, 4 insertions(+) diff --git a/lib/linear_ranges.c b/lib/linear_ranges.c index d1336c75ccd7..9495ef3572b7 100644 --- a/lib/linear_ranges.c +++ b/lib/linear_ranges.c @@ -12,6 +12,7 @@ #include #include #include +#include /** * linear_range_values_in_range - return the amount of values in a range @@ -239,3 +240,6 @@ int linear_range_get_selector_high(const struct linear_range *r, return 0; } EXPORT_SYMBOL_GPL(linear_range_get_selector_high); + +MODULE_DESCRIPTION("linear-ranges helper"); +MODULE_LICENSE("GPL"); -- 2.21.0 -- Matti Vaittinen, Linux device drivers ROHM Semiconductors, Finland SWDC Kiviharjunlenkki 1E 90220 OULU FINLAND ~~~ "I don't think so," said Rene Descartes. Just then he vanished ~~~ Simon says - in Latin please. ~~~ "non cogito me" dixit Rene Descarte, deinde evanescavit ~~~ Thanks to Simon Glass for the translation =]
Re: remove a few uses of ->queuedata
On Sat, May 9, 2020 at 1:24 AM Christoph Hellwig wrote: > > On Fri, May 08, 2020 at 11:04:45AM -0700, Dan Williams wrote: > > On Fri, May 8, 2020 at 9:16 AM Christoph Hellwig wrote: > > > > > > Hi all, > > > > > > various bio based drivers use queue->queuedata despite already having > > > set up disk->private_data, which can be used just as easily. This > > > series cleans them up to only use a single private data pointer. > > > > ...but isn't the queue pretty much guaranteed to be cache hot and the > > gendisk cache cold? I'm not immediately seeing what else needs the > > gendisk in the I/O path. Is there another motivation I'm missing? > > ->private_data is right next to the ->queue pointer, pat0 and part_tbl > which are all used in the I/O submission path (generic_make_request / > generic_make_request_checks). This is mostly a prep cleanup patch to > also remove the pointless queue argument from ->make_request - then > ->queue is an extra dereference and extra churn. Ah ok. If the changelogs had been filled in with something like "In preparation for removing @q from make_request_fn, stop using ->queuedata", I probably wouldn't have looked twice. For the nvdimm/ driver updates you can add: Reviewed-by: Dan Williams ...or just let me know if you want me to pick those up through the nvdimm tree.
Re: [PATCH] drm/panel: visionox-rm69299: Add module license
在 2020/5/9 14:36, Randy Dunlap 写道: On 5/8/20 11:30 PM, Jason Yan wrote: Fix the following build warning: WARNING: modpost: missing MODULE_LICENSE() in drivers/gpu/drm/panel/panel-visionox-rm69299.o see include/linux/module.h for more information Signed-off-by: Jason Yan --- drivers/gpu/drm/panel/panel-visionox-rm69299.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/panel/panel-visionox-rm69299.c b/drivers/gpu/drm/panel/panel-visionox-rm69299.c index 25fe8b0bb040..ec279ffdbd94 100644 --- a/drivers/gpu/drm/panel/panel-visionox-rm69299.c +++ b/drivers/gpu/drm/panel/panel-visionox-rm69299.c @@ -300,3 +300,4 @@ static struct mipi_dsi_driver visionox_rm69299_driver = { module_mipi_dsi_driver(visionox_rm69299_driver); MODULE_DESCRIPTION("Visionox RM69299 DSI Panel Driver"); +MODULE_LICENSE("GPL"); I sent a patch for this yesterday. I'm sorry. Please ignore this one. Thanks,
Re: [PATCH v2] kernel: add panic_on_taint
On Sat, May 09, 2020 at 03:48:54AM +, Luis Chamberlain wrote: > On Fri, May 08, 2020 at 08:47:19AM -0400, Rafael Aquini wrote: > > On Thu, May 07, 2020 at 10:25:58PM +, Luis Chamberlain wrote: > > > On Thu, May 07, 2020 at 06:06:06PM -0400, Rafael Aquini wrote: > > > > On Thu, May 07, 2020 at 08:33:40PM +, Luis Chamberlain wrote: > > > > > I *think* that a cmdline route to enable this would likely remove the > > > > > need for the kernel config for this. But even with Vlastimil's work > > > > > merged, I think we'd want yet-another value to enable / disable this > > > > > feature. Do we need yet-another-taint flag to tell us that this > > > > > feature > > > > > was enabled? > > > > > > > > > > > > > I guess it makes sense to get rid of the sysctl interface for > > > > proc_on_taint, and only keep it as a cmdline option. > > > > > > That would be easier to support and k3eps this simple. > > > > > > > But the real issue seems to be, regardless we go with a cmdline-only > > > > option > > > > or not, the ability of proc_taint() to set any arbitrary taint flag > > > > other than just marking the kernel with TAINT_USER. > > > > > > I think we would have no other option but to add a new TAINT flag so > > > that we know that the taint flag was modified by a user. Perhaps just > > > re-using TAINT_USER when proc_taint() would suffice. > > > > > > > We might not need an extra taint flag if, perhaps, we could make these > > two features mutually exclusive. The idea here is that bitmasks added > > via panic_on_taint get filtered out in proc_taint(), so a malicious > > user couldn't exploit the latter interface to easily panic the system, > > when the first one is also in use. > > I get it, however I I can still see the person who enables enabling > panic-on-tain wanting to know if proc_taint() was used. So even if > it was not on their mask, if it was modified that seems like important > information for a bug report analysis. > For that purpose (tracking user taints) I think sth between these lines would work: diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 8a176d8727a3..651a82c13621 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2602,6 +2602,9 @@ int proc_douintvec(struct ctl_table *table, int write, do_proc_douintvec_conv, NULL); } +/* track which taint bits were set by the user */ +static unsigned long user_tainted; + /* * Taint values can only be increased * This means we can safely use a temporary. @@ -2629,11 +2632,20 @@ static int proc_taint(struct ctl_table *table, int write, */ int i; for (i = 0; i < BITS_PER_LONG && tmptaint >> i; i++) { - if ((tmptaint >> i) & 1) + if ((tmptaint >> i) & 1) { + set_bit(i, _tainted); add_taint(i, LOCKDEP_STILL_OK); + } } } + /* +* Users with SYS_ADMIN capability can fiddle with any arbitrary +* taint flag through this interface. +* If that's the case, we also need to mark the kernel "tainted by user" +*/ + add_taint(TAINT_USER, LOCKDEP_STILL_OK); + return err; } I don't think, though, it's panic_on_taint work to track that. I posted a v3 for this feature with a way to select if one wants to avoid user forced taints triggering panic() for flags also set for panic_on_taint. Cheers, -- Rafael
Re: [EXT] Re: [PATCH net-next 3/5] net: mvpp2: cls: Use RSS contexts to handle RSS tables
On Sat, May 09, 2020 at 02:51:05PM +0100, Russell King - ARM Linux admin wrote: > On Sat, May 09, 2020 at 03:14:05PM +0200, Matteo Croce wrote: > > On Sat, May 9, 2020 at 1:45 PM Russell King - ARM Linux admin > > wrote: > > > > > > On Sat, May 09, 2020 at 11:15:58AM +, Stefan Chulski wrote: > > > > > > > > > > > > > -Original Message- > > > > > From: Matteo Croce > > > > > Sent: Saturday, May 9, 2020 3:13 AM > > > > > To: David S . Miller > > > > > Cc: Maxime Chevallier ; netdev > > > > > ; LKML ; Antoine > > > > > Tenart ; Thomas Petazzoni > > > > > ; gregory.clem...@bootlin.com; > > > > > miquel.ray...@bootlin.com; Nadav Haklai ; Stefan > > > > > Chulski ; Marcin Wojtas ; > > > > > Linux > > > > > ARM ; Russell King - ARM Linux > > > > > admin > > > > > > > > > > Subject: [EXT] Re: [PATCH net-next 3/5] net: mvpp2: cls: Use RSS > > > > > contexts to > > > > > handle RSS tables > > > > > > > > > > Hi, > > > > > > > > > > What do you think about temporarily disabling it like this? > > > > > > > > > > --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c > > > > > +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c > > > > > @@ -5775,7 +5775,8 @@ static int mvpp2_port_probe(struct > > > > > platform_device > > > > > *pdev, > > > > > NETIF_F_HW_VLAN_CTAG_FILTER; > > > > > > > > > > if (mvpp22_rss_is_supported()) { > > > > > - dev->hw_features |= NETIF_F_RXHASH; > > > > > + if (port->phy_interface != PHY_INTERFACE_MODE_SGMII) > > > > > + dev->hw_features |= NETIF_F_RXHASH; > > > > > dev->features |= NETIF_F_NTUPLE; > > > > > } > > > > > > > > > > > > > > > David, is this "workaround" too bad to get accepted? > > > > > > > > Not sure that RSS related to physical interface(SGMII), better just > > > > remove NETIF_F_RXHASH as "workaround". > > > > > > Hmm, I'm not sure this is the right way forward. This patch has the > > > effect of disabling: > > > > > > d33ec4525007 ("net: mvpp2: add an RSS classification step for each flow") > > > > > > but the commit you're pointing at which caused the regression is: > > > > > > 895586d5dc32 ("net: mvpp2: cls: Use RSS contexts to handle RSS tables") > > > > > > > > > > Hi, > > > > When git bisect pointed to 895586d5dc32 ("net: mvpp2: cls: Use RSS > > contexts to handle RSS tables"), which was merged > > almost an year after d33ec4525007 ("net: mvpp2: add an RSS > > classification step for each flow"), so I assume that between these > > two commits either the feature was working or it was disable and we > > didn't notice > > > > Without knowing what was happening, which commit should my Fixes tag point > > to? > > Let me make sure that I get this clear: > > - Prior to 895586d5dc32, you can turn on and off rxhash without issue > on any port. > - After 895586d5dc32, turning rxhash on eth2 prevents reception. > > Prior to 895586d5dc32, with rxhash on, it looks like hashing using > CRC32 is supported but only one context. So, if it's possible to > enable rxhash on any port on the mcbin without 895586d5dc32, and the > port continues to work, I'd say the bug was introduced by > 895586d5dc32. > > Of course, that would be reinforced if there was a measurable > difference in performance due to rxhash on each port. I've just run this test, but I can detect no difference in performance with or without 895586d5dc32 on eth0 or eth2 on the mcbin (apart from eth2 stopping working with 895586d5dc32 applied.) I tested this by reverting almost all changes to the mvpp2 driver between 5.6 and that commit. That's not too surprising; I'm using my cex7 platform with the Mellanox card in for one end of the 10G link, and that platform doesn't seem to be able to saturdate a 10G link - it only seems to manage around 4Gbps. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up
Re: [PATCH] memcg: expose root cgroup's memory.stat
On Sat, May 9, 2020 at 10:06 PM Shakeel Butt wrote: > > On Fri, May 8, 2020 at 2:44 PM Johannes Weiner wrote: > > > > On Fri, May 08, 2020 at 10:06:30AM -0700, Shakeel Butt wrote: > > > One way to measure the efficiency of memory reclaim is to look at the > > > ratio (pgscan+pfrefill)/pgsteal. However at the moment these stats are > > > not updated consistently at the system level and the ratio of these are > > > not very meaningful. The pgsteal and pgscan are updated for only global > > > reclaim while pgrefill gets updated for global as well as cgroup > > > reclaim. > > > > > > Please note that this difference is only for system level vmstats. The > > > cgroup stats returned by memory.stat are actually consistent. The > > > cgroup's pgsteal contains number of reclaimed pages for global as well > > > as cgroup reclaim. So, one way to get the system level stats is to get > > > these stats from root's memory.stat, so, expose memory.stat for the root > > > cgroup. > > > > > > from Johannes Weiner: > > > There are subtle differences between /proc/vmstat and > > > memory.stat, and cgroup-aware code that wants to watch the full > > > hierarchy currently has to know about these intricacies and > > > translate semantics back and forth. > > > > > > Generally having the fully recursive memory.stat at the root > > > level could help a broader range of usecases. > > > > The changelog begs the question why we don't just "fix" the > > system-level stats. It may be useful to include the conclusions from > > that discussion, and why there is value in keeping the stats this way. > > > > Right. Andrew, can you please add the following para to the changelog? > > Why not fix the stats by including both the global and cgroup reclaim > activity instead of exposing root cgroup's memory.stat? The reason is > the benefit of having metrics exposing the activity that happens > purely due to machine capacity rather than localized activity that > happens due to the limits throughout the cgroup tree. Additionally > there are userspace tools like sysstat(sar) which reads these stats to > inform about the system level reclaim activity. So, we should not > break such use-cases. > Acked-by: Yafang Shao > > > Signed-off-by: Shakeel Butt > > > Suggested-by: Johannes Weiner > > > > Acked-by: Johannes Weiner > > Thanks a lot. -- Thanks Yafang
Re: [RFC] splice/tee: len=0 fast path after validity check
On 5/9/20 2:46 AM, Pavel Begunkov wrote: > When len=0, splice() and tee() return 0 even if specified fds are > invalid, hiding errors from users. Move len=0 optimisation later after > basic validity checks. > > before: > splice(len=0, fd_in=-1, ...) == 0; > > after: > splice(len=0, fd_in=-1, ...) == -EBADF; I'm not sure what the purpose of this would be. It probably should have been done that way from the beginning, but it wasn't. While there's very little risk of breaking any applications due to this change, it also seems like a pointless exercise at this point. So my suggestion would be to just leave it alone. -- Jens Axboe
Re: [RFC PATCH 00/13] Core scheduling v5
On Tue, 2020-04-14 at 16:21 +0200, Peter Zijlstra wrote: > On Wed, Mar 04, 2020 at 04:59:50PM +, vpillai wrote: > > > > - Investigate the source of the overhead even when no tasks are > > tagged: > > https://lkml.org/lkml/2019/10/29/242 > > - explain why we're all still doing this > > Seriously, what actual problems does it solve? The patch-set still > isn't > L1TF complete and afaict it does exactly nothing for MDS. > Hey Peter! Late to the party, I know... But I'm replying anyway. At least, you'll have the chance to yell at me for this during OSPM. ;-P > Like I've written many times now, back when the world was simpler and > all we had to worry about was L1TF, core-scheduling made some sense, > but > how does it make sense today? > Indeed core-scheduling alone doesn't even completely solve L1TF. There are the interrupts and the VMEXITs issues. Both are being discussed in this thread and, FWIW, my personal opinion is that the way to go is what Alex says here: <79529592-5d60-2a41-fbb6-4a5f8279f...@amazon.com> (E.g., when he mentions solution 4 "Create a "safe" page table which runs with HT enabled", etc). But let's stick to your point: if it were only for L1TF, then fine, but it's all pointless because of MDS. My answer to this is very much focused on my usecase, which is virtualization. I know you hate us, and you surely have your good reasons, but you know... :-) Correct me if I'm wrong, but I think that the "nice" thing of L1TF is that it allows a VM to spy on another VM or on the host, but it does not allow a regular task to spy on another task or on the kernel (well, it would, but it's easily mitigated). The bad thing about MDS is that it instead allow *all* of that. Now, one thing that we absolutely want to avoid in virt is that a VM is able to spy on other VMs or on the host. Sure, we also care about tasks running in our VMs to be safe, but, really, inter-VM and VM-to-host isolation is the primary concern of an hypervisor. And how a VM (or stuff running inside a VM) can spy on another VM or on the host, via L1TF or MDS? Well, if the attacker VM and the victim VM --or if the attacker VM and the host-- are running on the same core. If they're not, it can't... which is basically an L1TF-only looking scenario. So, in virt, core-scheduling: 1) is the *only* way (aside from no-EPT) to prevent attacker VM to spy on victim VM, if they're running concurrently, both in guest mode, on the same core (and that's, of course, because with core-scheduling they just won't be doing that :-) ) 2) interrupts and VMEXITs needs being taken care of --which was the case already when, as you said "we had only L1TF". Once that is done we will effectively prevent all VM to VM and VM to host attack scenarios. Sure, it will still be possible, for instance, for task_A in VM1 to spy on task_B, also in VM1. This seems to be, AFAIUI, Joel's usecase, so I'm happy to leave it to him to defend that, as he's doing already (but indeed I'm very happy to see that it is also getting attention). Now, of course saying anything like "works for my own usecase so let's go for it" does not fly. But since you were asking whether and how this feature could make sense today, suppose that: 1) we get core-scheduling, 2) we find a solution for irqs and VMEXITs, as we would have to if there was only L1TF, 3) we manage to make the overhead of core-scheduling close to zero when it's there (I mean, enabled at compile time) but not used (I mean, no tagging of tasks, or whatever). That would mean that virt people can enable core-scheduling, and achieve good inter-VM and VM-to-host isolation, without imposing overhead to other use cases, that would leave core-scheduling disabled. And this is something that I would think it makes sense. Of course, we're not there... because even when this series will give us point 1, we will also need 2 and we need to make sure we also satisfy 3 (and we weren't, last time I checked ;-P). But I think it's worth keeping trying. I'd also add a couple of more ideas, still about core-scheduling in virt, but from a different standpoint than security: - if I tag vcpu0 and vcpu1 together[*], then vcpu2 and vcpu3 together, then vcpu4 and vcpu5 together, then I'm sure that each pair will always be scheduled on the same core. At which point I can define an SMT virtual topology, for the VM, that will make sense, even without pinning the vcpus; - if I run VMs from different customers, when vcpu2 of VM1 and vcpu1 of VM2 run on the same core, they influence each others' performance. If, e.g., I bill basing on time spent on CPUs, it means customer A's workload, running in VM1, may influence the billing of customer B, who owns VM2. With core scheduling, if I tag all the vcpus of each VM together, I won't have this any longer. [*] with "tag together" I mean let them have the same tag which, ATM would be "put them in the same cgroup and enable cpu.tag".
Re: ofono for d4: less hcked and more working version was Re: USB networking news, ofono for d4: less hacked version
* Pavel Machek [200508 10:03]: > I pushed new version of ofono: I'm still not sure about those incoming > sms (but _some_ sms are received). Rest should be better. Please check that you have applied commit 738b150ecefb ("ARM: dts: omap4-droid4: Fix occasional lost wakeirq for uart1"), otherwise incoming SMS may not always show up, and GPS can stop producing data. Hmm for ofono motchat, why not handle the U part directly in motchat and use just a timestamp based ID there? Regards, Tony
Re: [PATCH 4/6] exec: Run sync_mm_rss before taking exec_update_mutex
Kees Cook writes: > $ git grep exec_mm_release > fs/exec.c: exec_mm_release(tsk, old_mm); > include/linux/sched/mm.h:extern void exec_mm_release(struct task_struct *, > struct mm_struct *); > kernel/fork.c:void exec_mm_release(struct task_struct *tsk, struct mm_struct > *mm) > > kernel/fork.c: > > void exit_mm_release(struct task_struct *tsk, struct mm_struct *mm) > { > futex_exit_release(tsk); > mm_release(tsk, mm); > } > > void exec_mm_release(struct task_struct *tsk, struct mm_struct *mm) > { > futex_exec_release(tsk); > mm_release(tsk, mm); > } > > $ git grep exit_mm_release > include/linux/sched/mm.h:extern void exit_mm_release(struct task_struct *, > struct mm_struct *); > kernel/exit.c: exit_mm_release(current, mm); > kernel/fork.c:void exit_mm_release(struct task_struct *tsk, struct mm_struct > *mm) > > kernel/exit.c: > > exit_mm_release(current, mm); > if (!mm) > return; > sync_mm_rss(mm); > > It looks to me like both exec_mm_release() and exit_mm_release() could > easily have the sync_mm_rss(...) folded into their function bodies and > removed from the callers. *shrug* Well it would have to be all of: if (mm) sync_mm_rss(mm); I remember reading through exit_mm_release and seeing that nothing actually depended upon a non-NULL mm. Unless you have clear_child_tid set. I am not up to speed on that part of the mm layer right now to know if it is a good idea to put sync_mm_rss in exit_mm_release but at a quick look it feels like it. Eric
[PATCH] mm: fix LRU balancing effect of new transparent huge pages
From: Johannes Weiner Currently, THP are counted as single pages until they are split right before being swapped out. However, at that point the VM is already in the middle of reclaim, and adjusting the LRU balance then is useless. Always account THP by the number of basepages, and remove the fixup from the splitting path. Signed-off-by: Johannes Weiner Signed-off-by: Shakeel Butt --- Revived the patch from https://lore.kernel.org/patchwork/patch/685703/ mm/swap.c | 23 +-- 1 file changed, 9 insertions(+), 14 deletions(-) diff --git a/mm/swap.c b/mm/swap.c index 4eb179ee0b72..b75c0ce90418 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -262,14 +262,14 @@ void rotate_reclaimable_page(struct page *page) } } -static void update_page_reclaim_stat(struct lruvec *lruvec, -int file, int rotated) +static void update_page_reclaim_stat(struct lruvec *lruvec, int file, +int rotated, int nr_pages) { struct zone_reclaim_stat *reclaim_stat = >reclaim_stat; - reclaim_stat->recent_scanned[file]++; + reclaim_stat->recent_scanned[file] += nr_pages; if (rotated) - reclaim_stat->recent_rotated[file]++; + reclaim_stat->recent_rotated[file] += nr_pages; } static void __activate_page(struct page *page, struct lruvec *lruvec, @@ -288,7 +288,7 @@ static void __activate_page(struct page *page, struct lruvec *lruvec, __count_vm_events(PGACTIVATE, nr_pages); __count_memcg_events(lruvec_memcg(lruvec), PGACTIVATE, nr_pages); - update_page_reclaim_stat(lruvec, file, 1); + update_page_reclaim_stat(lruvec, file, 1, nr_pages); } } @@ -546,7 +546,7 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec, __count_vm_events(PGDEACTIVATE, nr_pages); __count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_pages); } - update_page_reclaim_stat(lruvec, file, 0); + update_page_reclaim_stat(lruvec, file, 0, nr_pages); } static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec, @@ -564,7 +564,7 @@ static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec, __count_vm_events(PGDEACTIVATE, nr_pages); __count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_pages); - update_page_reclaim_stat(lruvec, file, 0); + update_page_reclaim_stat(lruvec, file, 0, nr_pages); } } @@ -590,7 +590,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec, __count_vm_events(PGLAZYFREE, nr_pages); __count_memcg_events(lruvec_memcg(lruvec), PGLAZYFREE, nr_pages); - update_page_reclaim_stat(lruvec, 1, 0); + update_page_reclaim_stat(lruvec, 1, 0, nr_pages); } } @@ -899,8 +899,6 @@ EXPORT_SYMBOL(__pagevec_release); void lru_add_page_tail(struct page *page, struct page *page_tail, struct lruvec *lruvec, struct list_head *list) { - const int file = 0; - VM_BUG_ON_PAGE(!PageHead(page), page); VM_BUG_ON_PAGE(PageCompound(page_tail), page); VM_BUG_ON_PAGE(PageLRU(page_tail), page); @@ -926,9 +924,6 @@ void lru_add_page_tail(struct page *page, struct page *page_tail, add_page_to_lru_list_tail(page_tail, lruvec, page_lru(page_tail)); } - - if (!PageUnevictable(page)) - update_page_reclaim_stat(lruvec, file, PageActive(page_tail)); } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ @@ -973,7 +968,7 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec, if (page_evictable(page)) { lru = page_lru(page); update_page_reclaim_stat(lruvec, page_is_file_lru(page), -PageActive(page)); +PageActive(page), nr_pages); if (was_unevictable) __count_vm_events(UNEVICTABLE_PGRESCUED, nr_pages); } else { -- 2.26.2.645.ge9eca65c58-goog
[PATCH net] mvpp2: enable rxhash only on the first port
Currently rxhash only works on the first port of the CP (Communication Processor). Enabling it on other ports completely blocks packet reception. This patch only adds rxhash as supported feature to the first port, so rxhash can't be enabled on other ports: # ethtool -K eth0 rxhash on # ethtool -K eth1 rxhash on # ethtool -K eth2 rxhash on Cannot change receive-hashing Could not change any device features # ethtool -K eth3 rxhash on Cannot change receive-hashing Could not change any device features Fixes: 895586d5dc32 ("net: mvpp2: cls: Use RSS contexts to handle RSS tables") Signed-off-by: Matteo Croce --- drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c index 2b5dad2ec650..ba71583c7ae3 100644 --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c @@ -5423,7 +5423,8 @@ static int mvpp2_port_probe(struct platform_device *pdev, NETIF_F_HW_VLAN_CTAG_FILTER; if (mvpp22_rss_is_supported()) { - dev->hw_features |= NETIF_F_RXHASH; + if (port->id == 0) + dev->hw_features |= NETIF_F_RXHASH; dev->features |= NETIF_F_NTUPLE; } -- 2.26.2
[PATCH] gpiolib: notify user-space about line status changes after flags are set
From: Bartosz Golaszewski Currently we emit the REQUESTED line state event after the line is requested but before the flags are configured. This is obviously wrong as we want to pass the updated lineinfo to user-space together with the event. Since the flags can be configured in different ways depending on how the line is being requested - we need to call the notifier chain in different places separately. Fixes: 51c1064e82e7 ("gpiolib: add new ioctl() for monitoring changes in line info") Signed-off-by: Bartosz Golaszewski --- This comes late in the release cycle but I only recently got down to writing libgpiod support for this new ioctl(). When writing test cases I noticed this doesn't really work as expected. This patch fixes the issue I identified this week. There may be more coming the following week though... drivers/gpio/gpiolib.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 40f2d7f69be2..550751a6e51c 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -729,6 +729,10 @@ static int linehandle_create(struct gpio_device *gdev, void __user *ip) if (ret) goto out_free_descs; } + + atomic_notifier_call_chain(>gdev->notifier, + GPIOLINE_CHANGED_REQUESTED, desc); + dev_dbg(>dev, "registered chardev handle for line %d\n", offset); } @@ -1083,6 +1087,9 @@ static int lineevent_create(struct gpio_device *gdev, void __user *ip) if (ret) goto out_free_desc; + atomic_notifier_call_chain(>gdev->notifier, + GPIOLINE_CHANGED_REQUESTED, desc); + le->irq = gpiod_to_irq(desc); if (le->irq <= 0) { ret = -ENODEV; @@ -2975,8 +2982,6 @@ static int gpiod_request_commit(struct gpio_desc *desc, const char *label) } done: spin_unlock_irqrestore(_lock, flags); - atomic_notifier_call_chain(>gdev->notifier, - GPIOLINE_CHANGED_REQUESTED, desc); return ret; } @@ -4938,6 +4943,9 @@ struct gpio_desc *__must_check gpiod_get_index(struct device *dev, return ERR_PTR(ret); } + atomic_notifier_call_chain(>gdev->notifier, + GPIOLINE_CHANGED_REQUESTED, desc); + return desc; } EXPORT_SYMBOL_GPL(gpiod_get_index); @@ -5003,6 +5011,9 @@ struct gpio_desc *fwnode_get_named_gpiod(struct fwnode_handle *fwnode, return ERR_PTR(ret); } + atomic_notifier_call_chain(>gdev->notifier, + GPIOLINE_CHANGED_REQUESTED, desc); + return desc; } EXPORT_SYMBOL_GPL(fwnode_get_named_gpiod); -- 2.25.0
Re: linux-next 20200508 - build failure in kernel/resource.c w/ SPARSEMEM=n
I think the problem is the select of CONFIG_DEVICE_PRIVATE. Jason is looking into it already.
Re: [PATCH 1/4] dma-mapping: move the remaining DMA API calls out of line
On 09/05/2020 18:19, Christoph Hellwig wrote: > On Tue, May 05, 2020 at 02:18:37PM +1000, Alexey Kardashevskiy wrote: >> >> >> On 17/04/2020 17:58, Christoph Hellwig wrote: >>> On Wed, Apr 15, 2020 at 09:21:37PM +1000, Alexey Kardashevskiy wrote: And the fact they were exported leaves possibility that there is a driver somewhere relying on these symbols or distro kernel won't build because the symbol disappeared from exports (I do not know what KABI guarantees or if mainline kernel cares). >>> >>> We absolutely do not care. In fact for abuses of APIs that drivers >>> should not use we almost care to make them private and break people >>> abusing them. >> >> ok :) >> I do not care in particular but some might, a line separated with empty lines in the commit log would do. >>> >>> I'll add a blurb for the next version. >> >> >> Has it gone anywhere? Thanks, > > I've been hoping for the sg_buf helpers to land first, as they need > backporting and would conflict. Do you urgently need the series? Nah, not that urgent. Thanks, -- Alexey
Re: [PATCH] memcg: expose root cgroup's memory.stat
On Fri, May 8, 2020 at 2:44 PM Johannes Weiner wrote: > > On Fri, May 08, 2020 at 10:06:30AM -0700, Shakeel Butt wrote: > > One way to measure the efficiency of memory reclaim is to look at the > > ratio (pgscan+pfrefill)/pgsteal. However at the moment these stats are > > not updated consistently at the system level and the ratio of these are > > not very meaningful. The pgsteal and pgscan are updated for only global > > reclaim while pgrefill gets updated for global as well as cgroup > > reclaim. > > > > Please note that this difference is only for system level vmstats. The > > cgroup stats returned by memory.stat are actually consistent. The > > cgroup's pgsteal contains number of reclaimed pages for global as well > > as cgroup reclaim. So, one way to get the system level stats is to get > > these stats from root's memory.stat, so, expose memory.stat for the root > > cgroup. > > > > from Johannes Weiner: > > There are subtle differences between /proc/vmstat and > > memory.stat, and cgroup-aware code that wants to watch the full > > hierarchy currently has to know about these intricacies and > > translate semantics back and forth. > > > > Generally having the fully recursive memory.stat at the root > > level could help a broader range of usecases. > > The changelog begs the question why we don't just "fix" the > system-level stats. It may be useful to include the conclusions from > that discussion, and why there is value in keeping the stats this way. > Right. Andrew, can you please add the following para to the changelog? Why not fix the stats by including both the global and cgroup reclaim activity instead of exposing root cgroup's memory.stat? The reason is the benefit of having metrics exposing the activity that happens purely due to machine capacity rather than localized activity that happens due to the limits throughout the cgroup tree. Additionally there are userspace tools like sysstat(sar) which reads these stats to inform about the system level reclaim activity. So, we should not break such use-cases. > > Signed-off-by: Shakeel Butt > > Suggested-by: Johannes Weiner > > Acked-by: Johannes Weiner Thanks a lot.
Re: [PATCH 3/3] mm: swap: fix update_page_reclaim_stat for huge pages
On Fri, May 8, 2020 at 2:51 PM Johannes Weiner wrote: > > On Fri, May 08, 2020 at 02:22:15PM -0700, Shakeel Butt wrote: > > Currently update_page_reclaim_stat() updates the lruvec.reclaim_stats > > just once for a page irrespective if a page is huge or not. Fix that by > > passing the hpage_nr_pages(page) to it. > > > > Signed-off-by: Shakeel Butt > > https://lore.kernel.org/patchwork/patch/685703/ > > Laughs, then cries. > What happened to that patch? Fell through the cracks? > > @@ -928,7 +928,7 @@ void lru_add_page_tail(struct page *page, struct page > > *page_tail, > > } > > > > if (!PageUnevictable(page)) > > - update_page_reclaim_stat(lruvec, file, PageActive(page_tail)); > > + update_page_reclaim_stat(lruvec, file, PageActive(page_tail), > > 1); > > The change to __pagevec_lru_add_fn() below makes sure the tail pages > are already accounted. This would make them count twice. > Yes, you are right. I will just re-send your patch after rebase. > > #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ > > > > @@ -973,7 +973,7 @@ static void __pagevec_lru_add_fn(struct page *page, > > struct lruvec *lruvec, > > if (page_evictable(page)) { > > lru = page_lru(page); > > update_page_reclaim_stat(lruvec, page_is_file_lru(page), > > - PageActive(page)); > > + PageActive(page), nr_pages); > > if (was_unevictable) > > __count_vm_events(UNEVICTABLE_PGRESCUED, nr_pages); > > } else {
Re: [PATCHv3] w1: omap-hdq: Simplify driver with PM runtime autosuspend
* H. Nikolaus Schaller [200509 11:48]: > I have found another small bug and a dev_dbg format weakness > and now it seems to work well even if I remove or reinsert the > battery while read operations are ongoing. OK great. > Still I need more time to fix up the patch(es). Well it's ready when it's ready :) Tony
[PATCH v3] kernel: add panic_on_taint
Analogously to the introduction of panic_on_warn, this patch introduces a kernel option named panic_on_taint in order to provide a simple and generic way to stop execution and catch a coredump when the kernel gets tainted by any given taint flag. This is useful for debugging sessions as it avoids rebuilding the kernel to explicitly add calls to panic() or BUG() into code sites that introduce the taint flags of interest. Another, perhaps less frequent, use for this option would be as a mean for assuring a security policy (in paranoid mode) case where no single taint is allowed for the running system. Suggested-by: Qian Cai Signed-off-by: Rafael Aquini --- Changelog: * v2: get rid of unnecessary/misguided compiler hints (Luis) * v2: enhance documentation text for the new kernel parameter (Randy) * v3: drop sysctl interface, keep it only as a kernel parameter (Luis) Documentation/admin-guide/kdump/kdump.rst | 10 + .../admin-guide/kernel-parameters.txt | 15 +++ include/linux/kernel.h| 2 + kernel/panic.c| 40 +++ kernel/sysctl.c | 9 - 5 files changed, 75 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst index ac7e131d2935..de3cf6d377cc 100644 --- a/Documentation/admin-guide/kdump/kdump.rst +++ b/Documentation/admin-guide/kdump/kdump.rst @@ -521,6 +521,16 @@ will cause a kdump to occur at the panic() call. In cases where a user wants to specify this during runtime, /proc/sys/kernel/panic_on_warn can be set to 1 to achieve the same behaviour. +Trigger Kdump on add_taint() + + +The kernel parameter, panic_on_taint, calls panic() from within add_taint(), +whenever the value set in this bitmask matches with the bit flag being set +by add_taint(). This will cause a kdump to occur at the panic() call. +In cases where a user wants to specify this during runtime, +/proc/sys/kernel/panic_on_taint can be set to a respective bitmask value +to achieve the same behaviour. + Contact === diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 7bc83f3d9bdf..4a69fe49a70d 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3404,6 +3404,21 @@ panic_on_warn panic() instead of WARN(). Useful to cause kdump on a WARN(). + panic_on_taint= [KNL] conditionally panic() in add_taint() + Format: + Specifies, as a string, the TAINT flag set that will + compose a bitmask for calling panic() when the kernel + gets tainted. + See Documentation/admin-guide/tainted-kernels.rst for + details on the taint flags that users can pick to + compose the bitmask to assign to panic_on_taint. + When the string is prefixed with a '-' the bitmask + set in panic_on_taint will be mutually exclusive + with the sysctl knob kernel.tainted, and any attempt + to write to that sysctl will fail with -EINVAL for + any taint value that masks with the flags set for + this option. + crash_kexec_post_notifiers Run kdump after running panic-notifiers and dumping kmsg. This only for the users who doubt kdump always diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 9b7a8d74a9d6..66bc102cb59a 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -528,6 +528,8 @@ extern int panic_on_oops; extern int panic_on_unrecovered_nmi; extern int panic_on_io_nmi; extern int panic_on_warn; +extern unsigned long panic_on_taint; +extern bool panic_on_taint_exclusive; extern int sysctl_panic_on_rcu_stall; extern int sysctl_panic_on_stackoverflow; diff --git a/kernel/panic.c b/kernel/panic.c index b69ee9e76cb2..65c62f8a1de8 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include #include @@ -44,6 +45,8 @@ static int pause_on_oops_flag; static DEFINE_SPINLOCK(pause_on_oops_lock); bool crash_kexec_post_notifiers; int panic_on_warn __read_mostly; +unsigned long panic_on_taint; +bool panic_on_taint_exclusive = false; int panic_timeout = CONFIG_PANIC_TIMEOUT; EXPORT_SYMBOL_GPL(panic_timeout); @@ -434,6 +437,11 @@ void add_taint(unsigned flag, enum lockdep_ok lockdep_ok) pr_warn("Disabling lock debugging due to kernel taint\n"); set_bit(flag, _mask); + + if (tainted_mask & panic_on_taint) { + panic_on_taint = 0; + panic("panic_on_taint set
Re: ofono for d4: less hcked and more working version was Re: USB networking news, ofono for d4: less hacked version
* Pavel Machek [200508 10:03]: > Hi! > > > > But I might be confused. I recall some audio patches were needed for > > > basic phone calls (setting up mixers to connect gsm<->audio), but > > > those worked before gsmux support was enabled. (Maybe some hardcoded > > > commands were needed to be sent to gsmmux somewhere). > > > > We're currently reconfiguring the TDM transport that based on the > > unsolicited messages on dlci1. I still need to figure out how to add > > that back while keeping the serdev-ngsm driver generic. > > Is it really neccessary? I believe I was simply configuring codecs for > voice call and left them like that. Yes something needs to call set_tdm_slot() for voice calls. AFAIK we have only the dlci1 messages indicating a voice call is happening. So I got that added directly into the snd-soc-motmdm driver where it now listens to notifications on dlci1 while we still have /dev/gsmtty1 for userspace. Looks like I still need to fix the mixer controls before I can test voice calls again though.. Will post the patches as soon as I get that done. I'm not yet sure what's the preferred way to do notifications with ALSA, but based on the earlier comments the piece of code calling set_tdm_slot() should not be the codec driver. Instead, it should be called by either snd-soc-audio-graph-card or a custom card driver for the device. > I pushed new version of ofono: I'm still not sure about those incoming > sms (but _some_ sms are received). Rest should be better. OK Regards, Tony
Re: [EXT] Re: [PATCH net-next 3/5] net: mvpp2: cls: Use RSS contexts to handle RSS tables
On Sat, May 09, 2020 at 03:14:05PM +0200, Matteo Croce wrote: > On Sat, May 9, 2020 at 1:45 PM Russell King - ARM Linux admin > wrote: > > > > On Sat, May 09, 2020 at 11:15:58AM +, Stefan Chulski wrote: > > > > > > > > > > -Original Message- > > > > From: Matteo Croce > > > > Sent: Saturday, May 9, 2020 3:13 AM > > > > To: David S . Miller > > > > Cc: Maxime Chevallier ; netdev > > > > ; LKML ; Antoine > > > > Tenart ; Thomas Petazzoni > > > > ; gregory.clem...@bootlin.com; > > > > miquel.ray...@bootlin.com; Nadav Haklai ; Stefan > > > > Chulski ; Marcin Wojtas ; Linux > > > > ARM ; Russell King - ARM Linux > > > > admin > > > > > > > > Subject: [EXT] Re: [PATCH net-next 3/5] net: mvpp2: cls: Use RSS > > > > contexts to > > > > handle RSS tables > > > > > > > > Hi, > > > > > > > > What do you think about temporarily disabling it like this? > > > > > > > > --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c > > > > +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c > > > > @@ -5775,7 +5775,8 @@ static int mvpp2_port_probe(struct platform_device > > > > *pdev, > > > > NETIF_F_HW_VLAN_CTAG_FILTER; > > > > > > > > if (mvpp22_rss_is_supported()) { > > > > - dev->hw_features |= NETIF_F_RXHASH; > > > > + if (port->phy_interface != PHY_INTERFACE_MODE_SGMII) > > > > + dev->hw_features |= NETIF_F_RXHASH; > > > > dev->features |= NETIF_F_NTUPLE; > > > > } > > > > > > > > > > > > David, is this "workaround" too bad to get accepted? > > > > > > Not sure that RSS related to physical interface(SGMII), better just > > > remove NETIF_F_RXHASH as "workaround". > > > > Hmm, I'm not sure this is the right way forward. This patch has the > > effect of disabling: > > > > d33ec4525007 ("net: mvpp2: add an RSS classification step for each flow") > > > > but the commit you're pointing at which caused the regression is: > > > > 895586d5dc32 ("net: mvpp2: cls: Use RSS contexts to handle RSS tables") > > > > > > Hi, > > When git bisect pointed to 895586d5dc32 ("net: mvpp2: cls: Use RSS > contexts to handle RSS tables"), which was merged > almost an year after d33ec4525007 ("net: mvpp2: add an RSS > classification step for each flow"), so I assume that between these > two commits either the feature was working or it was disable and we > didn't notice > > Without knowing what was happening, which commit should my Fixes tag point to? Let me make sure that I get this clear: - Prior to 895586d5dc32, you can turn on and off rxhash without issue on any port. - After 895586d5dc32, turning rxhash on eth2 prevents reception. Prior to 895586d5dc32, with rxhash on, it looks like hashing using CRC32 is supported but only one context. So, if it's possible to enable rxhash on any port on the mcbin without 895586d5dc32, and the port continues to work, I'd say the bug was introduced by 895586d5dc32. Of course, that would be reinforced if there was a measurable difference in performance due to rxhash on each port. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up
[PATCH 0/4] arm64: kgdb/kdb: Fix single-step debugging issues
This patch set is to fix several issues of single-step debugging in kgdb/kdb on arm64. It seems that these issues have been shelved a very long time, but i still hope to solve them, as the single-step debugging is an useful feature. Note: Based on patch "arm64: cacheflush: Fix KGDB trap detection", https://www.spinics.net/lists/arm-kernel/msg803741.html Wei Li (4): arm64: kgdb: Fix single-step exception handling oops arm64: Extract kprobes_save_local_irqflag() and kprobes_restore_local_irqflag() arm64: kgdb: Fix single-stepping into the irq handler wrongly arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step arch/arm64/include/asm/debug-monitors.h | 6 ++ arch/arm64/kernel/debug-monitors.c | 28 - arch/arm64/kernel/kgdb.c| 16 +++--- arch/arm64/kernel/probes/kprobes.c | 28 ++--- 4 files changed, 48 insertions(+), 30 deletions(-) -- 2.17.1
[PATCH 2/4] arm64: Extract kprobes_save_local_irqflag() and kprobes_restore_local_irqflag()
PSTATE.I and PSTATE.D are very important for single-step working. Without disabling interrupt on local CPU, there is a chance of interrupt occurrence in the period of exception return and start of out-of-line single-step, that result in wrongly single stepping into the interrupt handler. And if D bit is set then, it results into undefined exception and when it's handler enables dbg then single-step exception is generated, not as expected. As they are maintained well in kprobes_save_local_irqflag() and kprobes_restore_local_irqflag() for kprobe module, extract them as kernel_prepare_single_step() and kernel_cleanup_single_step() for general use. Signed-off-by: Wei Li --- arch/arm64/include/asm/debug-monitors.h | 4 arch/arm64/kernel/debug-monitors.c | 26 +++ arch/arm64/kernel/probes/kprobes.c | 28 ++--- 3 files changed, 32 insertions(+), 26 deletions(-) diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h index 7619f473155f..b62469f3475b 100644 --- a/arch/arm64/include/asm/debug-monitors.h +++ b/arch/arm64/include/asm/debug-monitors.h @@ -113,6 +113,10 @@ void user_fastforward_single_step(struct task_struct *task); void kernel_enable_single_step(struct pt_regs *regs); void kernel_disable_single_step(void); int kernel_active_single_step(void); +void kernel_prepare_single_step(unsigned long *flags, + struct pt_regs *regs); +void kernel_cleanup_single_step(unsigned long flags, + struct pt_regs *regs); #ifdef CONFIG_HAVE_HW_BREAKPOINT int reinstall_suspended_bps(struct pt_regs *regs); diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c index 48222a4760c2..25ce6b5a52d2 100644 --- a/arch/arm64/kernel/debug-monitors.c +++ b/arch/arm64/kernel/debug-monitors.c @@ -429,6 +429,32 @@ int kernel_active_single_step(void) } NOKPROBE_SYMBOL(kernel_active_single_step); +/* + * Interrupts need to be disabled before single-step mode is set, and not + * reenabled until after single-step mode ends. + * Without disabling interrupt on local CPU, there is a chance of + * interrupt occurrence in the period of exception return and start of + * out-of-line single-step, that result in wrongly single stepping + * into the interrupt handler. + */ +void kernel_prepare_single_step(unsigned long *flags, + struct pt_regs *regs) +{ + *flags = regs->pstate & DAIF_MASK; + regs->pstate |= PSR_I_BIT; + /* Unmask PSTATE.D for enabling software step exceptions. */ + regs->pstate &= ~PSR_D_BIT; +} +NOKPROBE_SYMBOL(kernel_prepare_single_step); + +void kernel_cleanup_single_step(unsigned long flags, + struct pt_regs *regs) +{ + regs->pstate &= ~DAIF_MASK; + regs->pstate |= flags; +} +NOKPROBE_SYMBOL(kernel_cleanup_single_step); + /* ptrace API */ void user_enable_single_step(struct task_struct *task) { diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c index d1c95dcf1d78..c655b6b543e3 100644 --- a/arch/arm64/kernel/probes/kprobes.c +++ b/arch/arm64/kernel/probes/kprobes.c @@ -168,30 +168,6 @@ static void __kprobes set_current_kprobe(struct kprobe *p) __this_cpu_write(current_kprobe, p); } -/* - * Interrupts need to be disabled before single-step mode is set, and not - * reenabled until after single-step mode ends. - * Without disabling interrupt on local CPU, there is a chance of - * interrupt occurrence in the period of exception return and start of - * out-of-line single-step, that result in wrongly single stepping - * into the interrupt handler. - */ -static void __kprobes kprobes_save_local_irqflag(struct kprobe_ctlblk *kcb, - struct pt_regs *regs) -{ - kcb->saved_irqflag = regs->pstate & DAIF_MASK; - regs->pstate |= PSR_I_BIT; - /* Unmask PSTATE.D for enabling software step exceptions. */ - regs->pstate &= ~PSR_D_BIT; -} - -static void __kprobes kprobes_restore_local_irqflag(struct kprobe_ctlblk *kcb, - struct pt_regs *regs) -{ - regs->pstate &= ~DAIF_MASK; - regs->pstate |= kcb->saved_irqflag; -} - static void __kprobes set_ss_context(struct kprobe_ctlblk *kcb, unsigned long addr) { @@ -227,7 +203,7 @@ static void __kprobes setup_singlestep(struct kprobe *p, set_ss_context(kcb, slot); /* mark pending ss */ /* IRQs and single stepping do not mix well. */ - kprobes_save_local_irqflag(kcb, regs); + kernel_prepare_single_step(>saved_irqflag, regs); kernel_enable_single_step(regs); instruction_pointer_set(regs, slot); } else { @@ -414,7 +390,7 @@ kprobe_single_step_handler(struct pt_regs *regs,
[PATCH 3/4] arm64: kgdb: Fix single-stepping into the irq handler wrongly
After the single-step exception handling oops is fixed, when we execute single-step in kdb/kgdb, we may see it jumps to the irq_handler (where PSTATE.D is cleared) instead of the next instruction. Add the prepare and cleanup work for single-step when enabling and disabling to maintain the PSTATE.I and PSTATE.D carefully. Before this patch: * kdb: Entering kdb (current=0x8000119e2dc0, pid 0) on processor 0 due to Keyboard Entry [0]kdb> bp printk Instruction(i) BP #0 at 0x8000101486cc (printk) is enabled addr at 8000101486cc, hardtype=0 installed=0 [0]kdb> g / # echo h > /proc/sysrq-trigger Entering kdb (current=0xfada65c0, pid 267) on processor 0 due to Breakpoint @ 0x8000101486cc [0]kdb> ss Entering kdb (current=0xfada65c0, pid 267) on processor 0 due to SS trap @ 0x800010082ab8 [0]kdb> 0x800010082ab8 0x800010082ab8 = 0x800010082ab8 (el1_irq+0x78) [0]kdb> 0x800010082ab0 <+112>: nop 0x800010082ab4 <+116>: msr daifclr, #0xd 0x800010082ab8 <+120>: adrpx1, 0x8000113a7000 0x800010082abc <+124>: ldr x1, [x1, #1408] * kgdb: (gdb) target remote 127.1:23002 Remote debugging using 127.1:23002 arch_kgdb_breakpoint () at /home/liwei/main_code/linux/arch/arm64/include/asm/kgdb.h:21 21 asm ("brk %0" : : "I" (KGDB_COMPILED_DBG_BRK_IMM)); (gdb) b printk Breakpoint 1 at 0x8000101486cc: file /home/liwei/main_code/linux/kernel/printk/printk.c, line 2076. (gdb) c Continuing. [New Thread 287] [Switching to Thread 283] Thread 177 hit Breakpoint 1, printk (fmt=0x80001130c9d8 "\001\066sysrq: HELP : ") at /home/liwei/main_code/linux/kernel/printk/printk.c:2076 2076{ (gdb) stepi el1_irq () at /home/liwei/main_code/linux/arch/arm64/kernel/entry.S:608 608 irq_handler (gdb) After this patch: * kdb: Entering kdb (current=0x8000119d2dc0, pid 0) on processor 0 due to Keyboard Entry [0]kdb> bp printk Instruction(i) BP #0 at 0x80001014874c (printk) is enabled addr at 80001014874c, hardtype=0 installed=0 [0]kdb> g / # echo h > /proc/sysrq-trigger Entering kdb (current=0xfa6948c0, pid 265) on processor 0 due to Breakpoint @ 0x80001014874c [0]kdb> ss Entering kdb (current=0xfa6948c0, pid 265) on processor 0 due to SS trap @ 0x800010148750 [0]kdb> * kgdb: (gdb) target remote 127.1:23002 Remote debugging using 127.1:23002 arch_kgdb_breakpoint () at /home/liwei/main_code/linux/arch/arm64/include/asm/kgdb.h:21 21 asm ("brk %0" : : "I" (KGDB_COMPILED_DBG_BRK_IMM)); (gdb) b printk Breakpoint 1 at 0x80001014874c: file /home/liwei/main_code/linux/kernel/printk/printk.c, line 2076. (gdb) c Continuing. [New Thread 277] [Switching to Thread 276] Thread 171 hit Breakpoint 1, printk (fmt=0x8000112fc130 "\001\066sysrq: HELP : ") at /home/liwei/main_code/linux/kernel/printk/printk.c:2076 2076{ (gdb) stepi 0x800010148750 2076{ (gdb) Fixes: 44679a4f142b ("arm64: KGDB: Add step debugging support") Signed-off-by: Wei Li --- arch/arm64/kernel/kgdb.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c index 1a157ca33262..3910ac06c261 100644 --- a/arch/arm64/kernel/kgdb.c +++ b/arch/arm64/kernel/kgdb.c @@ -100,6 +100,8 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = { { "fpcr", 4, -1 }, }; +static DEFINE_PER_CPU(unsigned long, kgdb_ss_flags); + char *dbg_get_reg(int regno, void *mem, struct pt_regs *regs) { if (regno >= DBG_MAX_REG_NUM || regno < 0) @@ -200,8 +202,11 @@ int kgdb_arch_handle_exception(int exception_vector, int signo, /* * Received continue command, disable single step */ - if (kernel_active_single_step()) + if (kernel_active_single_step()) { + kernel_cleanup_single_step(per_cpu(kgdb_ss_flags, + raw_smp_processor_id()), linux_regs); kernel_disable_single_step(); + } err = 0; break; @@ -221,8 +226,12 @@ int kgdb_arch_handle_exception(int exception_vector, int signo, /* * Enable single step handling */ - if (!kernel_active_single_step()) + if (!kernel_active_single_step()) { + kernel_prepare_single_step(_cpu(kgdb_ss_flags, + raw_smp_processor_id()), linux_regs); kernel_enable_single_step(linux_regs); + } + err = 0; break; default: -- 2.17.1
[PATCH 4/4] arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
After fixing wrongly single-stepping into the irq handler, when we execute single-step in kdb/kgdb, we can see only the first step can work. Refer to the ARM Architecture Reference Manual (ARM DDI 0487E.a) D2.12, i think PSTATE.SS=1 should be set each step for transferring the PE to the 'Active-not-pending' state. The problem here is PSTATE.SS=1 is not set since the second single-step. After the first single-step, the PE transferes to the 'Inactive' state, with PSTATE.SS=0 and MDSCR.SS=1, thus PSTATE.SS won't be set to 1 due to kernel_active_single_step()=true. Then the PE transferes to the 'Active-pending' state when ERET and returns to the debugger by step exception. Before this patch: * kdb: Entering kdb (current=0x8000119d2dc0, pid 0) on processor 0 due to Keyboard Entry [0]kdb> bp printk Instruction(i) BP #0 at 0x80001014874c (printk) is enabled addr at 80001014874c, hardtype=0 installed=0 [0]kdb> g / # echo h > /proc/sysrq-trigger Entering kdb (current=0xfa6948c0, pid 265) on processor 3 due to Breakpoint @ 0x80001014874c [3]kdb> ss Entering kdb (current=0xfa6948c0, pid 265) on processor 3 due to SS trap @ 0x800010148750 [3]kdb> ss Entering kdb (current=0xfa6948c0, pid 265) on processor 3 due to SS trap @ 0x800010148750 [3]kdb> ss Entering kdb (current=0xfa6948c0, pid 265) on processor 3 due to SS trap @ 0x800010148750 [3]kdb> * kgdb: (gdb) target remote 127.1:23002 Remote debugging using 127.1:23002 arch_kgdb_breakpoint () at /home/liwei/main_code/linux/arch/arm64/include/asm/kgdb.h:21 21 asm ("brk %0" : : "I" (KGDB_COMPILED_DBG_BRK_IMM)); (gdb) b printk Breakpoint 1 at 0x80001014874c: file /home/liwei/main_code/linux/kernel/printk/printk.c, line 2076. (gdb) c Continuing. [New Thread 277] [Switching to Thread 276] Thread 171 hit Breakpoint 1, printk (fmt=0x8000112fc130 "\001\066sysrq: HELP : ") at /home/liwei/main_code/linux/kernel/printk/printk.c:2076 2076{ (gdb) stepi 0x800010148750 2076{ (gdb) stepi 0x800010148750 2076{ (gdb) stepi 0x800010148750 2076{ (gdb) After this patch: * kdb: Entering kdb (current=0x8000119d2dc0, pid 0) on processor 0 due to Keyboard Entry [0]kdb> bp printk Instruction(i) BP #0 at 0x80001014874c (printk) is enabled addr at 80001014874c, hardtype=0 installed=0 [0]kdb> g / # echo h > /proc/sysrq-trigger Entering kdb (current=0xfa800040, pid 264) on processor 2 due to Breakpoint @ 0x80001014874c [2]kdb> ss Entering kdb (current=0xfa800040, pid 264) on processor 2 due to SS trap @ 0x800010148750 [2]kdb> ss Entering kdb (current=0xfa800040, pid 264) on processor 2 due to SS trap @ 0x800010148754 [2]kdb> ss Entering kdb (current=0xfa800040, pid 264) on processor 2 due to SS trap @ 0x800010148758 [2]kdb> * kgdb: (gdb) target remote 127.1:23002 Remote debugging using 127.1:23002 arch_kgdb_breakpoint () at /home/liwei/main_code/linux/arch/arm64/include/asm/kgdb.h:21 21 asm ("brk %0" : : "I" (KGDB_COMPILED_DBG_BRK_IMM)); (gdb) b printk Breakpoint 1 at 0x80001014874c: file /home/liwei/main_code/linux/kernel/printk/printk.c, line 2076. (gdb) c Continuing. [New Thread 281] [New Thread 280] [Switching to Thread 281] Thread 174 hit Breakpoint 1, printk (fmt=0x8000112fc138 "\001\066sysrq: HELP : ") at /home/liwei/main_code/linux/kernel/printk/printk.c:2076 2076{ (gdb) stepi 0x800010148750 2076{ (gdb) stepi 2080va_start(args, fmt); (gdb) stepi 0x800010148758 2080va_start(args, fmt); (gdb) Fixes: 44679a4f142b ("arm64: KGDB: Add step debugging support") Signed-off-by: Wei Li --- arch/arm64/include/asm/debug-monitors.h | 2 ++ arch/arm64/kernel/debug-monitors.c | 2 +- arch/arm64/kernel/kgdb.c| 3 ++- 3 files changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h index b62469f3475b..a48b507c89ee 100644 --- a/arch/arm64/include/asm/debug-monitors.h +++ b/arch/arm64/include/asm/debug-monitors.h @@ -78,6 +78,8 @@ struct step_hook { int (*fn)(struct pt_regs *regs, unsigned int esr); }; +void set_regs_spsr_ss(struct pt_regs *regs); + void register_user_step_hook(struct step_hook *hook); void unregister_user_step_hook(struct step_hook *hook); diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c index 25ce6b5a52d2..7a58233677de 100644 --- a/arch/arm64/kernel/debug-monitors.c +++ b/arch/arm64/kernel/debug-monitors.c @@ -141,7 +141,7 @@ postcore_initcall(debug_monitors_init); /* * Single step API and exception handling. */ -static void set_regs_spsr_ss(struct pt_regs *regs) +void set_regs_spsr_ss(struct pt_regs *regs) { regs->pstate |= DBG_SPSR_SS; } diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c index
[PATCH 1/4] arm64: kgdb: Fix single-step exception handling oops
After entering kdb due to breakpoint, when we execute 'ss' or 'go' (will delay installing breakpoints, do single-step first), it won't work correctly, and it will enter kdb due to oops. It's because the reason gotten in kdb_stub() is not as expected, and it seems that the ex_vector for single-step should be 0, like what arch powerpc/sh/parisc has implemented. Before the patch: Entering kdb (current=0x8000119e2dc0, pid 0) on processor 0 due to Keyboard Entry [0]kdb> bp printk Instruction(i) BP #0 at 0x8000101486cc (printk) is enabled addr at 8000101486cc, hardtype=0 installed=0 [0]kdb> g / # echo h > /proc/sysrq-trigger Entering kdb (current=0xfa878040, pid 266) on processor 3 due to Breakpoint @ 0x8000101486cc [3]kdb> ss Entering kdb (current=0xfa878040, pid 266) on processor 3 Oops: (null) due to oops @ 0x800010082ab8 CPU: 3 PID: 266 Comm: sh Not tainted 5.7.0-rc4-13839-gf0e5ad491718 #6 Hardware name: linux,dummy-virt (DT) pstate: 0085 (nzcv daIf -PAN -UAO) pc : el1_irq+0x78/0x180 lr : __handle_sysrq+0x80/0x190 sp : 800015003bf0 x29: 800015003d20 x28: fa878040 x27: x26: 80001126b1f0 x25: 800011b6a0d8 x24: x23: 8025 x22: 8000101486cc x21: 800015003d30 x20: x19: 8000119f2000 x18: x17: x16: x15: x14: x13: x12: x11: x10: x9 : x8 : 800015003e50 x7 : 0002 x6 : 380b9990 x5 : 8000106e99e8 x4 : fadd83c0 x3 : x2 : 800011b6a0d8 x1 : 800011b6a000 x0 : 80001130c9d8 Call trace: el1_irq+0x78/0x180 printk+0x0/0x84 write_sysrq_trigger+0xb0/0x118 proc_reg_write+0xb4/0xe0 __vfs_write+0x18/0x40 vfs_write+0xb0/0x1b8 ksys_write+0x64/0xf0 __arm64_sys_write+0x14/0x20 el0_svc_common.constprop.2+0xb0/0x168 do_el0_svc+0x20/0x98 el0_sync_handler+0xec/0x1a8 el0_sync+0x140/0x180 [3]kdb> After the patch: Entering kdb (current=0x8000119e2dc0, pid 0) on processor 0 due to Keyboard Entry [0]kdb> bp printk Instruction(i) BP #0 at 0x8000101486cc (printk) is enabled addr at 8000101486cc, hardtype=0 installed=0 [0]kdb> g / # echo h > /proc/sysrq-trigger Entering kdb (current=0xfa852bc0, pid 268) on processor 0 due to Breakpoint @ 0x8000101486cc [0]kdb> g Entering kdb (current=0xfa852bc0, pid 268) on processor 0 due to Breakpoint @ 0x8000101486cc [0]kdb> ss Entering kdb (current=0xfa852bc0, pid 268) on processor 0 due to SS trap @ 0x800010082ab8 [0]kdb> Fixes: 44679a4f142b ("arm64: KGDB: Add step debugging support") Signed-off-by: Wei Li --- arch/arm64/kernel/kgdb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c index 43119922341f..1a157ca33262 100644 --- a/arch/arm64/kernel/kgdb.c +++ b/arch/arm64/kernel/kgdb.c @@ -252,7 +252,7 @@ static int kgdb_step_brk_fn(struct pt_regs *regs, unsigned int esr) if (!kgdb_single_step) return DBG_HOOK_ERROR; - kgdb_handle_exception(1, SIGTRAP, 0, regs); + kgdb_handle_exception(0, SIGTRAP, 0, regs); return DBG_HOOK_HANDLED; } NOKPROBE_SYMBOL(kgdb_step_brk_fn); -- 2.17.1
Re: [PATCH 2/2] drm/bridge: Add ITE IT6251 bridge driver
Hi Richard, Thank you for the patch. On Sat, May 09, 2020 at 01:17:32PM +0200, s...@48.io wrote: > From: Marek Vasut > > Add driver for the ITE IT6251 LVDS-to-eDP bridge. > > Signed-off-by: Marek Vasut > Signed-off-by: Richard Marko > Cc: Daniel Vetter > Cc: Sean Cross > To: dri-de...@lists.freedesktop.org > --- > drivers/gpu/drm/bridge/Kconfig | 12 + > drivers/gpu/drm/bridge/Makefile | 1 + > drivers/gpu/drm/bridge/ite-it6251.c | 582 > 3 files changed, 595 insertions(+) > create mode 100644 drivers/gpu/drm/bridge/ite-it6251.c > > diff --git a/drivers/gpu/drm/bridge/Kconfig b/drivers/gpu/drm/bridge/Kconfig > index aaed2347ace9..934896a4ab2d 100644 > --- a/drivers/gpu/drm/bridge/Kconfig > +++ b/drivers/gpu/drm/bridge/Kconfig > @@ -38,6 +38,18 @@ config DRM_DISPLAY_CONNECTOR > on ARM-based platforms. Saying Y here when this driver is not needed > will not cause any issue. > > +config DRM_ITE_IT6251 > + tristate "ITE IT6251 LVDS/eDP bridge" > + depends on OF > + select DRM_KMS_HELPER > + select DRM_PANEL > + select REGMAP_I2C > + help > + Driver for ITE IT6251 LVDS-eDP bridge chip driver. This is used > + in Novena open-hardware laptop with eDP based panel. > + IT6251 supports LVDS input and DisplayPort 1.1a output, > + resolution up to 1080P and 10-bit color depth. > + > config DRM_LVDS_CODEC > tristate "Transparent LVDS encoders and decoders support" > depends on OF > diff --git a/drivers/gpu/drm/bridge/Makefile b/drivers/gpu/drm/bridge/Makefile > index 6fb062b5b0f0..4c195dc42fce 100644 > --- a/drivers/gpu/drm/bridge/Makefile > +++ b/drivers/gpu/drm/bridge/Makefile > @@ -1,6 +1,7 @@ > # SPDX-License-Identifier: GPL-2.0 > obj-$(CONFIG_DRM_CDNS_DSI) += cdns-dsi.o > obj-$(CONFIG_DRM_DISPLAY_CONNECTOR) += display-connector.o > +obj-$(CONFIG_DRM_ITE_IT6251) += ite-it6251.o > obj-$(CONFIG_DRM_LVDS_CODEC) += lvds-codec.o > obj-$(CONFIG_DRM_MEGACHIPS_STDP_GE_B850V3_FW) += > megachips-stdp-ge-b850v3-fw.o > obj-$(CONFIG_DRM_NXP_PTN3460) += nxp-ptn3460.o > diff --git a/drivers/gpu/drm/bridge/ite-it6251.c > b/drivers/gpu/drm/bridge/ite-it6251.c > new file mode 100644 > index ..b8534fb62c9d > --- /dev/null > +++ b/drivers/gpu/drm/bridge/ite-it6251.c > @@ -0,0 +1,582 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * Copyright (C) 2014 Sean Cross > + * > + * Rework for mainline: Marek Vasut > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +struct it6251_bridge { > + struct i2c_client *client; > + struct i2c_client *lvds_client; > + struct regmap *regmap; > + struct regmap *lvds_regmap; > + struct regulator*regulator; > + > + struct drm_connectorconnector; > + struct drm_bridge bridge; > + struct drm_panel*panel; > +}; > + > +/* Register definitions */ > +#define IT6251_VENDOR_ID_LOW 0x00 > +#define IT6251_VENDOR_ID_HIGH0x01 > +#define IT6251_DEVICE_ID_LOW 0x02 > +#define IT6251_DEVICE_ID_HIGH0x03 > +#define IT6251_SYSTEM_STATUS 0x0d > +#define IT6251_SYSTEM_STATUS_RINTSTATUS BIT(0) > +#define IT6251_SYSTEM_STATUS_RHPDSTATUS BIT(1) > +#define IT6251_SYSTEM_STATUS_RVIDEOSTABLEBIT(2) > +#define IT6251_SYSTEM_STATUS_RPLL_IOLOCK BIT(3) > +#define IT6251_SYSTEM_STATUS_RPLL_XPLOCK BIT(4) > +#define IT6251_SYSTEM_STATUS_RPLL_SPLOCK BIT(5) > +#define IT6251_SYSTEM_STATUS_RAUXFREQ_LOCK BIT(6) > +#define IT6251_REF_STATE 0x0e > +#define IT6251_REF_STATE_MAIN_LINK_DISABLED BIT(0) > +#define IT6251_REF_STATE_AUX_CHANNEL_READBIT(1) > +#define IT6251_REF_STATE_CR_PATTERN BIT(2) > +#define IT6251_REF_STATE_EQ_PATTERN BIT(3) > +#define IT6251_REF_STATE_NORMAL_OPERATIONBIT(4) > +#define IT6251_REF_STATE_MUTED BIT(5) > +#define IT6251_RPCLK_CNT_LOW 0x13 > +#define IT6251_RPCLK_CNT_HIGH0x14 > +#define IT6251_RPC_REQ 0x2b > +#define IT6251_RPC_REQ_RPC_FIFOFULL BIT(6) > +#define IT6251_RPC_REQ_RPC_FIFOEMPTY BIT(7) > +#define IT6251_PCLK_CNT_LOW 0x57 > +#define IT6251_PCLK_CNT_HIGH 0x58 > +#define IT6251_DPHDEW_LOW0xa5 > +#define IT6251_DPHDEW_HIGH 0xa6 > +#define IT6251_DPVDEW_LOW0xaf > +#define IT6251_DPVDEW_HIGH 0xb0 > +#define IT6251_LVDS_PORT_ADDR0xfd > +#define IT6251_LVDS_PORT_CTRL0xfe > +#define IT6251_LVDS_PORT_CTRL_EN BIT(0) > + > +/* > + *