Re: [PATCH 3/4] mm: /proc/sys/vm/stat_refresh skip checking known negative stats
Mon, Mar 01, 2021 at 02:08:17PM -0800, Hugh Dickins wrote: > On Sun, 28 Feb 2021, Roman Gushchin wrote: > > On Thu, Feb 25, 2021 at 03:14:03PM -0800, Hugh Dickins wrote: > > > vmstat_refresh() can occasionally catch nr_zone_write_pending and > > > nr_writeback when they are transiently negative. The reason is partly > > > that the interrupt which decrements them in test_clear_page_writeback() > > > can come in before __test_set_page_writeback() got to increment them; > > > but transient negatives are still seen even when that is prevented, and > > > we have not yet resolved why (Roman believes that it is an unavoidable > > > consequence of the refresh scheduled on each cpu). But those stats are > > > not buggy, they have never been seen to drift away from 0 permanently: > > > so just avoid the annoyance of showing a warning on them. > > > > > > Similarly avoid showing a warning on nr_free_cma: CMA users have seen > > > that one reported negative from /proc/sys/vm/stat_refresh too, but it > > > does drift away permanently: I believe that's because its incrementation > > > and decrementation are decided by page migratetype, but the migratetype > > > of a pageblock is not guaranteed to be constant. > > > > > > Use switch statements so we can most easily add or remove cases later. > > > > I'm OK with the code, but I can't fully agree with the commit log. I don't > > think > > there is any mystery around negative values. Let me copy-paste the > > explanation > > from my original patch: > > > > These warnings* are generated by the vmstat_refresh() function, which > > assumes that atomic zone and numa counters can't go below zero. > > However, > > on a SMP machine it's not quite right: due to per-cpu caching it can in > > theory be as low as -(zone threshold) * NR_CPUs. > > > > For instance, let's say all cma pages are in use and NR_FREE_CMA_PAGES > > reached 0. 
Then we've reclaimed a small number of cma pages on each CPU > > except CPU0, so that most percpu NR_FREE_CMA_PAGES counters are slightly > > positive (the atomic counter is still 0). Then somebody on CPU0 > > consumes > > all these pages. The number of pages can easily exceed the threshold > > and > > a negative value will be committed to the atomic counter. > > > > * warnings about negative NR_FREE_CMA_PAGES > > Hi Roman, thanks for your Acks on the others - and indeed this > is the one on which disagreement was more to be expected. > > I certainly wanted (and included below) a Link to your original patch; > and even wondered whether to paste your description into mine. > But I read it again and still have issues with it. > > Mainly, it does not convey at all, that touching stat_refresh adds the > per-cpu counts into the global atomics, resetting per-cpu counts to 0. > Which does not invalidate your explanation: races might still manage > to underflow; but it does take the "easily" out of "can easily exceed". Hi Hugh! It could be that "easily" simple comes from the scale (number of machines). > > Since I don't use CMA on any machine, I cannot be sure, but it looked > like a bad example to rely upon, because of its migratetype-based > accounting. If you use /proc/sys/vm/stat_refresh frequently enough, > without suppressing the warning, I guess that uncertainty could be > resolved by checking whether nr_free_cma is seen with negative value > in consecutive refreshes - which would tend to support my migratetype > theory - or only singly - which would support your raciness theory. > > > > > Actually, the same is almost true for ANY other counter. What differs CMA, > > dirty > > and write pending counters is that they can reach 0 value under normal > > conditions. > > Other counters are usually not reaching values small enough to see negative > > values > > on a reasonable sized machine. 
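[Editorial sketch, not part of the thread.] The CMA scenario quoted above can be played through in a small user-space model. Everything here is an assumption made for illustration: the names (sim_counter, sim_mod, sim_refresh), the CPU count, and the threshold of 8 are hypothetical, and the real vmstat code has locking and ordering details this toy ignores. It only shows that a global counter can go negative while the true total is zero, and that folding the per-cpu deltas brings it back.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NR_CPUS   4   /* assumed CPU count for the illustration */
#define SIM_THRESHOLD 8   /* assumed per-cpu stat threshold */

struct sim_counter {
	long global;            /* stands in for the global atomic counter */
	int  pcp[SIM_NR_CPUS];  /* per-cpu deltas not yet folded in */
};

/* Apply a delta on one CPU; fold into the global only past the threshold. */
static void sim_mod(struct sim_counter *c, int cpu, int delta)
{
	int x;

	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	x = c->pcp[cpu] + delta;
	if (x > SIM_THRESHOLD || x < -SIM_THRESHOLD) {
		c->global += x;
		x = 0;
	}
	c->pcp[cpu] = x;
}

/* Fold every per-cpu delta into the global, roughly what a refresh does. */
static long sim_refresh(struct sim_counter *c)
{
	for (size_t cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		c->global += c->pcp[cpu];
		c->pcp[cpu] = 0;
	}
	return c->global;
}

/*
 * The quoted scenario: CPUs 1..3 each free SIM_THRESHOLD pages (their
 * deltas stay cached), then CPU0 consumes all of them one by one.
 * Returns the global value as seen before any refresh.
 */
static long sim_cma_scenario(struct sim_counter *c)
{
	for (int cpu = 1; cpu < SIM_NR_CPUS; cpu++)
		for (int i = 0; i < SIM_THRESHOLD; i++)
			sim_mod(c, cpu, +1);

	for (int i = 0; i < (SIM_NR_CPUS - 1) * SIM_THRESHOLD; i++)
		sim_mod(c, 0, -1);

	return c->global;
}
```

Starting from a zeroed struct, sim_cma_scenario() leaves the global at -18 with these parameters, and a following sim_refresh() returns 0. The model says nothing about whether a negative value can still be observed after the fold, which is the point under discussion in the thread.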
> > Looking through /proc/vmstat now, yes, I can see that there are fewer > counters which hover near 0 than I had imagined: more have a positive > bias, or are monotonically increasing. And I'd be lying if I said I'd > never seen any others than nr_writeback or nr_zone_write_pending caught > negative. But what are you asking for? Should the patch be changed, to > retry the refresh_vm_stats() before warning, if it sees any negative? > Depends on how terrible one line in dmesg is considered! > > > > > Does it makes sense? > > I'm not sure: you were not asking for the patch to be changed, but > its commit log: and I better not say "Roman believes that it is an > unavoidable consequence of the refresh scheduled on each cpu" if > that's untrue (or unclear: now it reads to me as if we're accusing > the refresh of messing things up, whereas it's the non-atomic nature > of the refresh which leaves it vulnerable to races). I think we both agree that for some counters going slightly into negative is possible and isn't an indication of an error, if only they don't become too negative. For other
Re: Question about the "EXPERIMENTAL" tag for dax in XFS
On Mon, Mar 1, 2021 at 2:47 PM Dave Chinner wrote: > > On Mon, Mar 01, 2021 at 12:55:53PM -0800, Dan Williams wrote: > > On Sun, Feb 28, 2021 at 2:39 PM Dave Chinner wrote: > > > > > > On Sat, Feb 27, 2021 at 03:40:24PM -0800, Dan Williams wrote: > > > > On Sat, Feb 27, 2021 at 2:36 PM Dave Chinner > > > > wrote: > > > > > On Fri, Feb 26, 2021 at 02:41:34PM -0800, Dan Williams wrote: > > > > > > On Fri, Feb 26, 2021 at 1:28 PM Dave Chinner > > > > > > wrote: > > > > > > > On Fri, Feb 26, 2021 at 12:59:53PM -0800, Dan Williams wrote: > > > > > it points to, check if it points to the PMEM that is being removed, > > > > > grab the page it points to, map that to the relevant struct page, > > > > > run collect_procs() on that page, then kill the user processes that > > > > > map that page. > > > > > > > > > > So why can't we walk the ptescheck the physical pages that they > > > > > map to and if they map to a pmem page we go poison that > > > > > page and that kills any user process that maps it. > > > > > > > > > > i.e. I can't see how unexpected pmem device unplug is any different > > > > > to an MCE delivering a hwpoison event to a DAX mapped page. > > > > > > > > I guess the tradeoff is walking a long list of inodes vs walking a > > > > large array of pages. > > > > > > Not really. You're assuming all a filesystem has to do is invalidate > > > everything if a device goes away, and that's not true. Finding if an > > > inode has a mapping that spans a specific device in a multi-device > > > filesystem can be a lot more complex than that. Just walking inodes > > > is easy - determining whihc inodes need invalidation is the hard > > > part. > > > > That inode-to-device level of specificity is not needed for the same > > reason that drop_caches does not need to be specific. If the wrong > > page is unmapped a re-fault will bring it back, and re-fault will fail > > for the pages that are successfully removed. 
> > > > > That's where ->corrupt_range() comes in - the filesystem is already > > > set up to do reverse mapping from physical range to inode(s) > > > offsets... > > > > Sure, but what is the need to get to that level of specificity with > > the filesystem for something that should rarely happen in the course > > of normal operation outside of a mistake? > > Dan, you made this mistake with the hwpoisoning code that we're > trying to fix that here. Hard coding a 1:1 physical address to > inode/offset into the DAX mapping was a bad mistake. It's also one > that should never have occurred because it's *obviously wrong* to > filesystem developers and has been for a long time. I admit that mistake. The traditional memory error handling model assumptions around page->mapping were broken by DAX, I'm not trying to repeat that mistake. I feel we're talking past each other on the discussion of the proposals. > Now we have the filesytem people providing a mechanism for the pmem > devices to tell the filesystems about physical device failures so > they can handle such failures correctly themselves. Having the > device go away unexpectedly from underneath a mounted and active > filesystem is a *device failure*, not an "unplug event". It's the same difference to the physical page, all mappings to that page need to be torn down. I'm happy to call an fs callback and let each filesystem do what it wants with a "every pfn in this dax device needs to be unmapped". I'm looking at the ->corrupted_range() patches trying to map it to this use case and I don't see how, for example a realtime-xfs over DM over multiple PMEM gets the notification to the right place. bd_corrupted_range() uses get_super() which get the wrong answer for both realtime-xfs and DM. I'd flip that arrangement around and have the FS tell the block device "if something happens to you, here is the super_block to notify". 
So to me this looks like a fs_dax_register_super() helper that plumbs
the superblock through an arbitrary stack of block devices to the leaf
block-device that might want to send a notification up when a global
unmap operation needs to be performed. I naively think that
"for_each_inode() unmap_mapping_range(&inode->i_mapping)" is
sufficient as a generic implementation; that does not preclude XFS
from overriding that generic implementation and handling it directly
if it so chooses.

> The mistake you made was not understanding how filesystems work,
> nor actually asking filesystem developers what they actually needed.

You're going too far here, but that's off topic.

> You're doing the same thing here - you're telling us what you think
> the solution filesystems need is.

No, I'm not, I'm trying to understand tradeoffs. I apologize if this
is coming across as not listening.

> Please listen when we say "that is
> not sufficient" because we don't want to be backed into a corner
> that we have to fix ourselves again before we can enable some basic
> filesystem functionality that we should have been able to support on
> DAX from the start...

That's
Re: [PATCH 05/25] x86/sgx: Introduce virtual EPC for use by KVM guests
On Mon, 2021-03-01 at 08:21 -0800, Sean Christopherson wrote:
> On Mon, Mar 01, 2021, Kai Huang wrote:
> > +	/*
> > +	 * SECS pages are "pinned" by child pages, an unpinned once all
>
> s/an/and

Thanks!

> > +	 * children have been EREMOVE'd.  A child page in this instance
> > +	 * may have pinned an SECS page encountered in an earlier release(),
> > +	 * creating a zombie.  Since some children were EREMOVE'd above,
> > +	 * try to EREMOVE all zombies in the hopes that one was unpinned.
> > +	 */
> > +	mutex_lock(&zombie_secs_pages_lock);
> > +	list_for_each_entry_safe(epc_page, tmp, &zombie_secs_pages, list) {
> > +		/*
> > +		 * Speculatively remove the page from the list of zombies,
> > +		 * if the page is successfully EREMOVE it will be added to
> > +		 * the list of free pages.  If EREMOVE fails, throw the page
> > +		 * on the local list, which will be spliced on at the end.
> > +		 */
> > +		list_del(&epc_page->list);
> > +
> > +		if (sgx_vepc_free_page(epc_page))
> > +			list_add_tail(&epc_page->list, &secs_pages);
> > +	}
> > +
> > +	if (!list_empty(&secs_pages))
> > +		list_splice_tail(&secs_pages, &zombie_secs_pages);
> > +	mutex_unlock(&zombie_secs_pages_lock);
> > +
> > +	kfree(vepc);
> > +
> > +	return 0;
> > +}
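[Editorial sketch, not part of the thread.] The "speculatively unlink, retry, and put failures back" pattern in the quoted hunk can be illustrated in plain user-space C. This is a minimal sketch under stated assumptions: it uses a hypothetical singly linked list rather than the kernel's list_head API, a hypothetical `pinned` flag stands in for EREMOVE failing, and there is no locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a zombie SECS page. */
struct zombie {
	struct zombie *next;
	int pinned;            /* nonzero: freeing would still fail */
};

/* Returns nonzero on failure, mirroring the convention in the quoted code. */
static int try_free(struct zombie *z)
{
	return z->pinned;
}

/*
 * Walk the zombie list once. Each entry is unlinked first; entries that
 * still cannot be freed are appended to a local list, which becomes the
 * new zombie list. *freed counts the entries that were released.
 */
static struct zombie *retry_zombies(struct zombie *zombies, unsigned int *freed)
{
	struct zombie *still = NULL;
	struct zombie **tail = &still;

	assert(freed != NULL);
	while (zombies) {
		struct zombie *z = zombies;

		zombies = z->next;   /* speculative removal from the old list */
		z->next = NULL;

		if (try_free(z)) {
			*tail = z;       /* keep it, preserving the original order */
			tail = &z->next;
		} else {
			(*freed)++;
		}
	}
	return still;
}
```

Collecting the failures on a local list keeps the walk simple: nothing is re-added to the list being iterated, so an entry is visited at most once per pass.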
Re: [PATCH 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
On Mon, 2021-03-01 at 09:29 -0800, Sean Christopherson wrote: > On Mon, Mar 01, 2021, Kai Huang wrote: > > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c > > index 7449ef33f081..a7dc86e87a09 100644 > > --- a/arch/x86/kernel/cpu/sgx/encl.c > > +++ b/arch/x86/kernel/cpu/sgx/encl.c > > @@ -381,6 +381,26 @@ const struct vm_operations_struct sgx_vm_ops = { > > .access = sgx_vma_access, > > }; > > > > > > > > > > +static void sgx_encl_free_epc_page(struct sgx_epc_page *epc_page) > > +{ > > + int ret; > > + > > + WARN_ON_ONCE(epc_page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED); > > + > > + ret = __eremove(sgx_get_epc_virt_addr(epc_page)); > > + if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret)) { > > This can be ENCLS_WARN, especially if you're printing a separate error message > about leaking the page. That being said, I'm not sure a seperate error > message > is a good idea. If other stuff gets dumped to the kernel log between the WARN > and the pr_err_once(), it may not be clear to admins that the two events are > directly connected. It's even possible the prints could come from two > different > CPUs. Good point. Thanks for educating me :) > > Why not dump a short blurb in the WARN itself? The error message can be > thrown > in a define if the line length is too obnoxious (it's ~109 chars if embedded > directly). > > #define EREMOVE_ERROR_MESSAGE \ > "EREMOVE returned %d (0x%x). EPC page leaked, reboot recommended." > > if (WARN_ONCE(ret, EREMOVE_ERROR_MESSAGE, ret, ret)) Will do in your way. Thanks! > > > + /* > > +* Give a message to remind EPC page is leaked, and requires > > +* machine reboot to get leaked pages back. This can be improved > > +* in the future by adding stats of leaked pages, etc. > > +*/ > > + pr_err_once("EPC page is leaked. 
Require machine reboot to get > > leaked pages back.\n"); > > + return; > > + } > > + > > + sgx_free_epc_page(epc_page); > > +} > > + > > /** > > * sgx_encl_release - Destroy an enclave instance > > * @kref: address of a kref inside _encl
Re: [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions
On Mon, Mar 01, 2021 at 05:23:47PM -0700, Dave Jiang wrote:

> So after looking at the code in vfio_pci_intrs.c, I agree that the set_irqs
> code between VFIO_PCI and this driver can be made in common. Given that Alex
> doesn't want a vfio_pci device embedded in the driver,

idxd isn't a vfio_pci so it would be improper to do something like
that here anyhow.

> I think we'll need some sort of generic VFIO device that can be used
> from the vfio_pci side and vfio_mdev side to pass down in order to
> have common support library functions.

Why do you need more layers?

Just make some helper functions to manage this and build them into
their own struct and function family. All this needs is some callback
for the end driver to hook in the raw device programming and some
entry points to direct the emulation access to the module.

It should be fully self-contained and completely unrelated to vfio_pci.

Jason
Re: [PATCH v5 0/5] mm/hugetlb: Early cow on fork, and a few cleanups
On Mon, Mar 01, 2021 at 04:28:46PM -0800, Andrew Morton wrote: > On Mon, 1 Mar 2021 09:11:51 -0500 Peter Xu wrote: > > > On Wed, Feb 17, 2021 at 06:35:42PM -0500, Peter Xu wrote: > > > v5: > > > - patch 4: change "int cow" into "bool cow" > > > - collect r-bs for Jason > > > > Andrew, > > > > I just noticed 5.12-rc1 has released; is this series still possible to make > > it > > for 5.12, or needs to wait for 5.13? > > > > It has taken a while to settle down. What is the case for > fast-tracking it into 5.12? IIRC hugetlb users and fork and DMA will get the unexpected VA corruption that triggered all this work. Jason
Re: [PATCH v5 0/5] mm/hugetlb: Early cow on fork, and a few cleanups
On Mon, 1 Mar 2021 09:11:51 -0500 Peter Xu wrote: > On Wed, Feb 17, 2021 at 06:35:42PM -0500, Peter Xu wrote: > > v5: > > - patch 4: change "int cow" into "bool cow" > > - collect r-bs for Jason > > Andrew, > > I just noticed 5.12-rc1 has released; is this series still possible to make it > for 5.12, or needs to wait for 5.13? > It has taken a while to settle down. What is the case for fast-tracking it into 5.12?
Re: [PATCH net] hv_netvsc: Fix validation in netvsc_linkstatus_callback()
Hello: This patch was applied to netdev/net.git (refs/heads/master): On Mon, 1 Mar 2021 19:25:30 +0100 you wrote: > Contrary to the RNDIS protocol specification, certain (pre-Fe) > implementations of Hyper-V's vSwitch did not account for the status > buffer field in the length of an RNDIS packet; the bug was fixed in > newer implementations. Validate the status buffer fields using the > length of the 'vmtransfer_page' packet (all implementations), that > is known/validated to be less than or equal to the receive section > size and not smaller than the length of the RNDIS message. > > [...] Here is the summary with links: - [net] hv_netvsc: Fix validation in netvsc_linkstatus_callback() https://git.kernel.org/netdev/net/c/3946688edbc5 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
Re: [PATCH V3 XRT Alveo 03/18] fpga: xrt: xclbin file helper functions
Hi Tom,

On 02/28/2021 08:54 AM, Tom Rix wrote:
> On 2/26/21 1:23 PM, Lizhi Hou wrote:
> > Hi Tom,
> >
> snip
>
> > > I also do not see a pragma pack, usually this is set to 1 so the
> > > compiler does not shuffle elements, increase size etc.
> > This data structure is shared with other tools. And the structure is
> > well defined with reasonable alignment. It is compatible with all
> > compilers we have tested. So pragma pack is not necessary.
>
> You cannot possibly have tested all the configurations since the
> kernel supports many arches and compilers.
>
> If the tested existing alignment is ok, pragma pack should be a noop
> on your tested configurations, and it helps cover the untested
> configurations.

Got it. I will add pragma pack(1).

Lizhi

> Tom
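[Editorial sketch, not part of the thread.] The point about pragma pack being a no-op on an already well-aligned layout can be checked with a few lines of C. The structures below are hypothetical and are not the real xclbin definitions; they only contrast a layout with implicit padding holes against one whose fields are ordered and padded explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with holes: the compiler may pad after 'kind'. */
struct hdr_holes {
	uint8_t  kind;
	uint64_t offset;
	uint16_t flags;
};

/* Hypothetical layout with explicit padding: no holes left to insert. */
struct hdr_explicit {
	uint64_t offset;
	uint16_t flags;
	uint8_t  kind;
	uint8_t  reserved[5];
};

#pragma pack(push, 1)
struct hdr_holes_packed {
	uint8_t  kind;
	uint64_t offset;
	uint16_t flags;
};

struct hdr_explicit_packed {
	uint64_t offset;
	uint16_t flags;
	uint8_t  kind;
	uint8_t  reserved[5];
};
#pragma pack(pop)

/* Packing removes every hole, so the size is the sum of the fields. */
static_assert(sizeof(struct hdr_holes_packed) == 11,
	      "packed layout should have no padding");
static_assert(offsetof(struct hdr_holes_packed, offset) == 1,
	      "packed field follows the previous one directly");

/* With explicit padding, packing changes nothing. */
static_assert(sizeof(struct hdr_explicit_packed) == 16,
	      "explicitly padded layout is 16 bytes when packed");
```

The size of the unpacked `hdr_holes` depends on the ABI, which is the concern raised above: the on-disk format would silently follow whatever the compiler chose. For `hdr_explicit`, packed and unpacked sizes are expected to agree on common ABIs, so the pragma costs nothing there and pins the layout everywhere else.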
Re: [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions
On 2/10/2021 4:59 PM, Jason Gunthorpe wrote: On Fri, Feb 05, 2021 at 01:53:24PM -0700, Dave Jiang wrote: <-- cut for brevity --> +static int vdcm_idxd_set_msix_trigger(struct vdcm_idxd *vidxd, + unsigned int index, unsigned int start, + unsigned int count, uint32_t flags, + void *data) +{ + int i, rc = 0; + + if (count > VIDXD_MAX_MSIX_ENTRIES - 1) + count = VIDXD_MAX_MSIX_ENTRIES - 1; + + if (count == 0 && (flags & VFIO_IRQ_SET_DATA_NONE)) { + /* Disable all MSIX entries */ + for (i = 0; i < VIDXD_MAX_MSIX_ENTRIES; i++) { + rc = msix_trigger_unregister(vidxd, i); + if (rc < 0) + return rc; + } + return 0; + } + + for (i = 0; i < count; i++) { + if (flags & VFIO_IRQ_SET_DATA_EVENTFD) { + u32 fd = *(u32 *)(data + i * sizeof(u32)); + + rc = msix_trigger_register(vidxd, fd, i); + if (rc < 0) + return rc; + } else if (flags & VFIO_IRQ_SET_DATA_NONE) { + rc = msix_trigger_unregister(vidxd, i); + if (rc < 0) + return rc; + } + } + return rc; +} + +static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, + unsigned int index, unsigned int start, + unsigned int count, void *data) +{ + int (*func)(struct vdcm_idxd *vidxd, unsigned int index, + unsigned int start, unsigned int count, uint32_t flags, + void *data) = NULL; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + switch (index) { + case VFIO_PCI_INTX_IRQ_INDEX: + dev_warn(dev, "intx interrupts not supported.\n"); + break; + case VFIO_PCI_MSI_IRQ_INDEX: + dev_dbg(dev, "msi interrupt.\n"); + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { + case VFIO_IRQ_SET_ACTION_MASK: + case VFIO_IRQ_SET_ACTION_UNMASK: + break; + case VFIO_IRQ_SET_ACTION_TRIGGER: + func = vdcm_idxd_set_msix_trigger; This would be a good place to insert a common VFIO helper library to take care of the MSI-X emulation for IMS. Hi Jason, So after looking at the code in vfio_pci_intrs.c, I agree that the set_irqs code between VFIO_PCI and this driver can be made in common. 
Given that Alex doesn't want a vfio_pci device embedded in the driver, I think we'll need some sort of generic VFIO device that can be used from the vfio_pci side and vfio_mdev side to pass down in order to have common support library functions. Do you have any thoughts on how to do this cleanly architecturally? Also, with vfio_pci common split [1] still being worked on, do you think we can defer the work on making the interrupt setup code common until the vfio_pci split work settles? Thanks! [1]: https://lore.kernel.org/kvm/20210201162828.5938-1-mgurto...@nvidia.com/
Re: [PATCH v3 1/8] mm: Remove special swap entry functions
On Tuesday, 2 March 2021 4:46:42 AM AEDT Jason Gunthorpe wrote:

> I wish you could come up with a more descriptive word than special
> here
>
> What I understand is this is true when the swap_offset is a pfn?

Correct, and that points to a better name. Maybe is_pfn_swap_entry()? In which
case adding a helper as Christoph suggested makes some more sense. Eg:
pfn_swap_entry_to_page()

> > -static inline struct page *migration_entry_to_page(swp_entry_t entry)
> > -{
> > -	struct page *p = pfn_to_page(swp_offset(entry));
> > -	/*
> > -	 * Any use of migration entries may only occur while the
> > -	 * corresponding page is locked
> > -	 */
> > -	BUG_ON(!PageLocked(compound_head(p)));
> > -	return p;
>
> And this constraint has been completely lost?

Yes, sorry I should have called that out. I didn't think losing the check was
a big deal, but I can add some checks to some of the call sites which would
catch a page being incorrectly unlocked.

> A comment in front of the is_special_entry explaining all the rules
> would help a lot

Will add one.

> Transformation looks fine otherwise

Thanks.

 - Alistair

> Jason
linux-next: build failure after merge of the powerpc-fixes tree
Hi all,

After merging the powerpc-fixes tree, today's linux-next build (powerpc
allyesconfig) failed like this:

drivers/net/ethernet/ibm/ibmvnic.c:5399:13: error: conflicting types for 'ibmvnic_remove'
 5399 | static void ibmvnic_remove(struct vio_dev *dev)
      |             ^~
drivers/net/ethernet/ibm/ibmvnic.c:81:12: note: previous declaration of 'ibmvnic_remove' was here
   81 | static int ibmvnic_remove(struct vio_dev *);
      |            ^~

Caused by commit

  1bdd1e6f9320 ("vio: make remove callback return void")

I have applied the following patch for today:

From: Stephen Rothwell
Date: Tue, 2 Mar 2021 11:06:37 +1100
Subject: [PATCH] vio: fix for make remove callback return void

Signed-off-by: Stephen Rothwell
---
 drivers/net/ethernet/ibm/ibmvnic.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index eb39318766f6..fe3201ba2034 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -78,7 +78,6 @@ MODULE_LICENSE("GPL");
 MODULE_VERSION(IBMVNIC_DRIVER_VERSION);
 
 static int ibmvnic_version = IBMVNIC_INITIAL_VERSION;
-static int ibmvnic_remove(struct vio_dev *);
 static void release_sub_crqs(struct ibmvnic_adapter *, bool);
 static int ibmvnic_reset_crq(struct ibmvnic_adapter *);
 static int ibmvnic_send_crq_init(struct ibmvnic_adapter *);
-- 
2.30.0

-- 
Cheers,
Stephen Rothwell
[PATCH] c6x: Remove stale symlink 'scripts/dtc/include-prefixes/c6x'
Remove stale symlink 'scripts/dtc/include-prefixes/c6x'

Signed-off-by: Victor Erminpour
---
 scripts/dtc/include-prefixes/c6x | 1 -
 1 file changed, 1 deletion(-)
 delete mode 120000 scripts/dtc/include-prefixes/c6x

diff --git a/scripts/dtc/include-prefixes/c6x b/scripts/dtc/include-prefixes/c6x
deleted file mode 120000
index 49ded4cae2be..
--- a/scripts/dtc/include-prefixes/c6x
+++ /dev/null
@@ -1 +0,0 @@
-../../../arch/c6x/boot/dts
\ No newline at end of file
[PATCH] NFS: fs_context: validate UDP retrans to prevent shift out-of-bounds
Fix shift out-of-bounds in xprt_calc_majortimeo(). This is caused by a garbage timeout (retrans) mount option being passed to nfs mount, in this case from syzkaller. If the protocol is XPRT_TRANSPORT_UDP, then 'retrans' is a shift value for a 64-bit long integer, so 'retrans' cannot be >= 64. If it is >= 64, fail the mount and return an error. Fixes: 9954bf92c0cd ("NFS: Move mount parameterisation bits into their own file") Reported-by: syzbot+ba2e91df8f7480941...@syzkaller.appspotmail.com Reported-by: syzbot+f3a0fa110fd630ab5...@syzkaller.appspotmail.com Signed-off-by: Randy Dunlap Cc: Trond Myklebust Cc: Anna Schumaker Cc: linux-...@vger.kernel.org Cc: David Howells Cc: Al Viro Cc: sta...@vger.kernel.org --- fs/nfs/fs_context.c | 12 1 file changed, 12 insertions(+) --- lnx-512-rc1.orig/fs/nfs/fs_context.c +++ lnx-512-rc1/fs/nfs/fs_context.c @@ -974,6 +974,15 @@ static int nfs23_parse_monolithic(struct sizeof(mntfh->data) - mntfh->size); /* +* for proto == XPRT_TRANSPORT_UDP, which is what uses +* to_exponential, implying shift: limit the shift value +* to BITS_PER_LONG (majortimeo is unsigned long) +*/ + if (!(data->flags & NFS_MOUNT_TCP)) /* this will be UDP */ + if (data->retrans >= 64) /* shift value is too large */ + goto out_invalid_data; + + /* * Translate to nfs_fs_context, which nfs_fill_super * can deal with. */ @@ -1073,6 +1082,9 @@ out_no_address: out_invalid_fh: return nfs_invalf(fc, "NFS: invalid root filehandle"); + +out_invalid_data: + return nfs_invalf(fc, "NFS: invalid binary mount data"); } #if IS_ENABLED(CONFIG_NFS_V4)
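[Editorial sketch, not part of the patch.] The reason for the bound is that in C, shifting by a count greater than or equal to the width of the promoted left operand is undefined behaviour, so the shift count has to be validated before it is used. The helper names below (retrans_shift_is_valid, exponential_timeout) are hypothetical, and this is a user-space illustration rather than the sunrpc code. It also only checks the shift count, not whether the shifted value overflows.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Width of unsigned long on this build, the user-space analogue of BITS_PER_LONG. */
#define ULONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/* A shift count is only usable if it is strictly below the operand width. */
static bool retrans_shift_is_valid(unsigned int retrans)
{
	return retrans < ULONG_BITS;
}

/*
 * Compute initval << retrans, refusing shift counts that would be
 * undefined behaviour. Returns false and leaves *out untouched when
 * the count is rejected.
 */
static bool exponential_timeout(unsigned long initval, unsigned int retrans,
				unsigned long *out)
{
	assert(out != NULL);
	if (!retrans_shift_is_valid(retrans))
		return false;
	*out = initval << retrans;
	return true;
}
```

Deriving the limit from the type width rather than hard-coding 64 keeps the check correct on builds where unsigned long is 32 bits wide, which is what the patch's code comment about BITS_PER_LONG alludes to.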
RE: [PATCH 4.19 055/247] soc: aspeed: snoop: Add clock control logic
> -Original Message- > From: Joel Stanley > Sent: Monday, March 1, 2021 2:44 PM > To: Greg Kroah-Hartman ; John Wang > ; Yoo, Jae Hyun > > Cc: Linux Kernel Mailing List ; > sta...@vger.kernel.org; Vernon Mauery ; > Sasha Levin > Subject: Re: [PATCH 4.19 055/247] soc: aspeed: snoop: Add clock control logic > > On Mon, 1 Mar 2021 at 16:37, Greg Kroah-Hartman > wrote: > > > > From: Jae Hyun Yoo > > > > [ Upstream commit 3f94cf15583be554df7aaa651b8ff8e1b68fbe51 ] > > > > If LPC SNOOP driver is registered ahead of lpc-ctrl module, LPC SNOOP > > block will be enabled without heart beating of LCLK until lpc-ctrl > > enables the LCLK. This issue causes improper handling on host > > interrupts when the host sends interrupt in that time frame. > > Then kernel eventually forcibly disables the interrupt with dumping > > stack and printing a 'nobody cared this irq' message out. > > > > To prevent this issue, all LPC sub-nodes should enable LCLK > > individually so this patch adds clock control logic into the LPC SNOOP > > driver. > > Jae, John; with this backported do we need to also provide a corresponding > device tree change for the stable tree, otherwise this driver will no longer > probe? Right. The second patch https://lore.kernel.org/linux-arm-kernel/20201208091748.1920-2-wangzhiqiang...@bytedance.com/ John submitted should be applied to stable tree too to make this module be probed correctly. 
> > > > Fixes: 3772e5da4454 ("drivers/misc: Aspeed LPC snoop output using misc > > chardev") > > Signed-off-by: Jae Hyun Yoo > > Signed-off-by: Vernon Mauery > > Signed-off-by: John Wang > > Reviewed-by: Joel Stanley > > Link: > > https://lore.kernel.org/r/20201208091748.1920-1-wangzhiqiang.bj@byteda > > nce.com > > Signed-off-by: Joel Stanley > > Signed-off-by: Sasha Levin > > --- > > drivers/misc/aspeed-lpc-snoop.c | 30 +++- > -- > > 1 file changed, 27 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/misc/aspeed-lpc-snoop.c > > b/drivers/misc/aspeed-lpc-snoop.c index c10be21a1663d..b4a776bf44bc5 > > 100644 > > --- a/drivers/misc/aspeed-lpc-snoop.c > > +++ b/drivers/misc/aspeed-lpc-snoop.c > > @@ -15,6 +15,7 @@ > > */ > > > > #include > > +#include > > #include > > #include > > #include > > @@ -71,6 +72,7 @@ struct aspeed_lpc_snoop_channel { struct > > aspeed_lpc_snoop { > > struct regmap *regmap; > > int irq; > > + struct clk *clk; > > struct aspeed_lpc_snoop_channel chan[NUM_SNOOP_CHANNELS]; }; > > > > @@ -286,22 +288,42 @@ static int aspeed_lpc_snoop_probe(struct > platform_device *pdev) > > return -ENODEV; > > } > > > > + lpc_snoop->clk = devm_clk_get(dev, NULL); > > + if (IS_ERR(lpc_snoop->clk)) { > > + rc = PTR_ERR(lpc_snoop->clk); > > + if (rc != -EPROBE_DEFER) > > + dev_err(dev, "couldn't get clock\n"); > > + return rc; > > + } > > + rc = clk_prepare_enable(lpc_snoop->clk); > > + if (rc) { > > + dev_err(dev, "couldn't enable clock\n"); > > + return rc; > > + } > > + > > rc = aspeed_lpc_snoop_config_irq(lpc_snoop, pdev); > > if (rc) > > - return rc; > > + goto err; > > > > rc = aspeed_lpc_enable_snoop(lpc_snoop, dev, 0, port); > > if (rc) > > - return rc; > > + goto err; > > > > /* Configuration of 2nd snoop channel port is optional */ > > if (of_property_read_u32_index(dev->of_node, "snoop-ports", > >1, ) == 0) { > > rc = aspeed_lpc_enable_snoop(lpc_snoop, dev, 1, port); > > - if (rc) > > + if (rc) { > > aspeed_lpc_disable_snoop(lpc_snoop, 0); > 
> + goto err; > > + } > > } > > > > + return 0; > > + > > +err: > > + clk_disable_unprepare(lpc_snoop->clk); > > + > > return rc; > > } > > > > @@ -313,6 +335,8 @@ static int aspeed_lpc_snoop_remove(struct > platform_device *pdev) > > aspeed_lpc_disable_snoop(lpc_snoop, 0); > > aspeed_lpc_disable_snoop(lpc_snoop, 1); > > > > + clk_disable_unprepare(lpc_snoop->clk); > > + > > return 0; > > } > > > > -- > > 2.27.0 > > > > > >
Upper bound mode for kernel timers
Hi Thomas, As discussed on IRC: We had a report of a regression in the TCP keepalive timer. The user had a 3600s keepalive timer for preventing firewall disconnects (on a 3650s interval). They observed keepalive timers coming in up to four minutes late, causing unexpected disconnects. The regression was observed to have come from the timer wheel rewrite from almost five years ago: 500462a9de65 ("timers: Switch to a non-cascading wheel") As you mentioned, with a HZ of 1000, the granularity for a one-hour timer is four minutes, which matches the seen behavior. To "fix" it, the user can just lower the timeout value by four minutes, but that's a workaround, because the keepalive timer isn't working as advertised. One potential fix would be an "upper bound mode" in the timer, i.e. give the user a way to specify that the given 'expires' value is an upper bound rather than a lower bound. As you graciously offered, if you or Anna-Maria can implement that new interface, we (Artem or I) can write up a patch to use it for the keepalive timer. -- Josh
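[Editorial sketch, not part of the mail.] The "four minutes at one hour" figure can be reproduced with a little arithmetic. The constants below (8x clock shift per level, 64 buckets per level, 9 levels) are my recollection of the non-cascading wheel in kernel/time/timer.c and should be treated as assumptions, as should the helper names; the real code also handles rounding and wheel offsets that are left out here.

```c
#include <assert.h>

#define LVL_CLK_SHIFT 3                    /* assumed: each level is 8x coarser */
#define LVL_BITS      6                    /* assumed: 64 buckets per level */
#define LVL_SIZE      (1UL << LVL_BITS)
#define LVL_DEPTH     9                    /* assumed: depth used when HZ > 100 */

/* First delta (in jiffies) that no longer fits in level n - 1. */
static unsigned long lvl_start(unsigned int n)
{
	assert(n >= 1 && n < LVL_DEPTH);
	return (LVL_SIZE - 1) << ((n - 1) * LVL_CLK_SHIFT);
}

/* Wheel level that a timer expiring 'delta' jiffies from now lands in. */
static unsigned int wheel_level(unsigned long delta)
{
	unsigned int lvl;

	for (lvl = 0; lvl < LVL_DEPTH - 1; lvl++)
		if (delta < lvl_start(lvl + 1))
			return lvl;
	return LVL_DEPTH - 1;
}

/* Bucket width of that level, in jiffies. */
static unsigned long wheel_granularity(unsigned long delta)
{
	return 1UL << (wheel_level(delta) * LVL_CLK_SHIFT);
}
```

With HZ=1000 a 3600 s timeout is 3,600,000 jiffies, which this model places in level 6 with a bucket width of 262,144 jiffies, about 262 s or a little over four minutes. That matches the lateness described above and is the amount the workaround subtracts from the timeout.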
Re: [PATCH v3 5/8] mm: Device exclusive memory access
On Fri, Feb 26, 2021 at 06:18:29PM +1100, Alistair Popple wrote:

> +/**
> + * make_device_exclusive_range() - Mark a range for exclusive use by a device
> + * @mm: mm_struct of associated target process
> + * @start: start of the region to mark for exclusive device access
> + * @end: end address of region
> + * @pages: returns the pages which were successfully marked for exclusive access
> + *
> + * Returns: number of pages successfully marked for exclusive access
> + *
> + * This function finds the ptes mapping page(s) to the given address range and
> + * replaces them with special swap entries preventing userspace CPU access. On
> + * fault these entries are replaced with the original mapping after calling MMU
> + * notifiers.
> + */
> +int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
> +				unsigned long end, struct page **pages)
> +{
> +	long npages = (end - start) >> PAGE_SHIFT;
> +	long i;
> +
> +	npages = get_user_pages_remote(mm, start, npages,
> +				       FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
> +				       pages, NULL, NULL);
> +	for (i = 0; i < npages; i++) {
> +		if (!trylock_page(pages[i])) {
> +			put_page(pages[i]);
> +			pages[i] = NULL;
> +			continue;
> +		}
> +
> +		if (!try_to_protect(pages[i])) {

Isn't this racy? get_user_pages returns the ptes at an instant in
time, they could have already been changed to something else?

I would think you'd want to switch to the swap entry atomically under
the PTLs?

Jason
[PATCH v2] docs: filesystem: Update smaps vm flag list to latest
We've missed a few documentation updates when adding new VM_* flags. Add the
missing pieces so they'll be in sync now.

Signed-off-by: Peter Xu
---
v2:
- rebase
---
 Documentation/filesystems/proc.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 48fbfc336ebf..81bfe3c800cc 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -540,7 +540,9 @@ encoded manner. The codes are the following:
     ac    area is accountable
     nr    swap space is not reserved for the area
     ht    area uses huge tlb pages
+    sf    synchronous page fault
     ar    architecture specific flag
+    wf    wipe on fork
     dd    do not include area into core dump
     sd    soft dirty flag
     mm    mixed map area
@@ -549,6 +551,8 @@ encoded manner. The codes are the following:
     mg    mergable advise flag
     bt    arm64 BTI guarded page
     mt    arm64 MTE allocation tags are enabled
+    um    userfaultfd missing tracking
+    uw    userfaultfd wr-protect tracking
     ==    =======================================
 
 Note that there is no guarantee that every flag and associated mnemonic will
-- 
2.26.2
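[Editorial sketch, not part of the patch.] For readers who consume the VmFlags line of /proc/PID/smaps programmatically, the two-letter codes in the hunks above map naturally onto a lookup table. The sketch below is a hypothetical user-space helper (vmflag_table and vmflag_meaning are made-up names) and covers only the mnemonics visible in this patch, not the full list in proc.rst.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct vmflag_desc {
	const char *code;      /* two-letter mnemonic from VmFlags */
	const char *meaning;   /* description as given in proc.rst */
};

/* Only the entries that appear in the patch context and additions. */
static const struct vmflag_desc vmflag_table[] = {
	{ "ac", "area is accountable" },
	{ "nr", "swap space is not reserved for the area" },
	{ "ht", "area uses huge tlb pages" },
	{ "sf", "synchronous page fault" },
	{ "ar", "architecture specific flag" },
	{ "wf", "wipe on fork" },
	{ "dd", "do not include area into core dump" },
	{ "sd", "soft dirty flag" },
	{ "mm", "mixed map area" },
	{ "mg", "mergable advise flag" },
	{ "bt", "arm64 BTI guarded page" },
	{ "mt", "arm64 MTE allocation tags are enabled" },
	{ "um", "userfaultfd missing tracking" },
	{ "uw", "userfaultfd wr-protect tracking" },
};

/* Return the description for a mnemonic, or NULL if it is not in the table. */
static const char *vmflag_meaning(const char *code)
{
	size_t i;

	assert(code != NULL);
	for (i = 0; i < sizeof(vmflag_table) / sizeof(vmflag_table[0]); i++)
		if (strcmp(vmflag_table[i].code, code) == 0)
			return vmflag_table[i].meaning;
	return NULL;
}
```

Returning NULL for unknown codes matters here: as the documentation itself notes, there is no guarantee that every flag and mnemonic stays present, so a parser should tolerate codes it does not recognise.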
Re: [PATCH] docs: filesystem: Update smaps vm flag list to latest
On Mon, Mar 01, 2021 at 03:17:13PM -0700, Jonathan Corbet wrote: > Peter Xu writes: > > > We've missed a few documentation when adding new VM_* flags. Add the > > missing > > pieces so they'll be in sync now. > > > > Signed-off-by: Peter Xu > > --- > > Documentation/filesystems/proc.rst | 5 + > > 1 file changed, 5 insertions(+) > > So this patch doesn't apply; what version of the kernel did you generate > it against? Could you redo against current kernels, please? Sure. "mt" just got added, hence conflicted, but the rest are still missing. Reposting. Thanks, -- Peter Xu
Re: [x86, build] 6dafca9780: WARNING:at_arch/x86/kernel/ftrace.c:#ftrace_verify_code
On Mon, Mar 1, 2021 at 3:45 PM Steven Rostedt wrote: > > On Mon, 1 Mar 2021 14:14:51 -0800 > Sami Tolvanen wrote: > > > Basically, the problem is that ftrace_replace_code() expects to find > > ideal_nops[NOP_ATOMIC5] here, which in this case is 66:66:66:66:90, > > while objtool has replaced the __fentry__ call with 0f:1f:44:00:00. > > > > As ideal_nops changes depending on kernel config and hardware, when > > CC_USING_NOP_MCOUNT is defined we could either change > > ftrace_nop_replace() to always use P6_NOP5, or skip > > ftrace_verify_code() in ftrace_replace_code() for > > FTRACE_UPDATE_MAKE_CALL. > > So I hacked up the code to get -mnop-record to work on x86, and checked the > vmlinux and it gives me: > > 81bc6120 : > 81bc6120: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > 81bc6125: 55 push %rbp > 81bc6126: 65 48 8b 2c 25 c0 7d 01 00 mov > %gs:0x17dc0,%rbp 81bc612b: R_X86_64_32S current_task > 81bc612f: 53 push %rbx > 81bc6130: 48 8b 45 18 mov0x18(%rbp),%rax > > > Which is the 0f:1f:44:00:00, and it works fine for me. > > Now, that could be because the ideal_nops[NOP_ATOMIC5] is the same, which > would explain this. > > No, we should *not* change ftrace_nop_replace() to always use any P6_NOP5, > as there was a reason we did this. Because not all nops are the same, and > this gets called for *every* function that is traced. > > No, we should not skip ftrace_verify_code() *ever*. (/me was just > referencing on twitter the scenario where ftrace bricked e1000e cards). > > This is probably why I never was much for the compiler conversion into nops, > because it may chose the wrong one for the architecture. Sure, makes sense. Should we just skip the conversion in objtool then and let the kernel deal with it? > What we could do, is if the nop chosen by the compiler is not the ideal > nop, to go back and modify all the nops added by the compiler to the ideal > one, which would keep it using the most efficient one. > > Or, add something like this: > [...] 
> ret = ftrace_verify_code(rec->ip, old); > + > + if (__is_defined(CC_USING_NOP_MCOUNT) && ret && old_nop) { > + /* Compiler could have put in P6_NOP5 */ > + old = P6_NOP5; > + ret = ftrace_verify_code(rec->ip, old); > + } > + Wouldn't that still hit WARN(1) in the initial ftrace_verify_code() call if ideal_nops doesn't match? Sami
[PATCH v2 1/5] userfaultfd: support minor fault handling for shmem
Modify the userfaultfd register API to allow registering shmem VMAs in minor mode. Modify the shmem mcopy implementation to support UFFDIO_CONTINUE in order to resolve such faults. Combine the shmem mcopy handler functions into a single shmem_mcopy_atomic_pte, which takes a mode parameter. This matches how the hugetlbfs implementation is structured, and lets us remove a good chunk of boilerplate. Signed-off-by: Axel Rasmussen --- fs/userfaultfd.c | 6 +-- include/linux/shmem_fs.h | 26 - include/uapi/linux/userfaultfd.h | 4 +- mm/memory.c | 8 +-- mm/shmem.c | 92 +++- mm/userfaultfd.c | 27 +- 6 files changed, 79 insertions(+), 84 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 14f92285d04f..9f3b8684cf3c 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -1267,8 +1267,7 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma, } if (vm_flags & VM_UFFD_MINOR) { - /* FIXME: Add minor fault interception for shmem. */ - if (!is_vm_hugetlb_page(vma)) + if (!(is_vm_hugetlb_page(vma) || vma_is_shmem(vma))) return false; } @@ -1941,7 +1940,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx, /* report all available features and ioctls to userland */ uffdio_api.features = UFFD_API_FEATURES; #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR - uffdio_api.features &= ~UFFD_FEATURE_MINOR_HUGETLBFS; + uffdio_api.features &= + ~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM); #endif uffdio_api.ioctls = UFFD_API_IOCTLS; ret = -EFAULT; diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index d82b6f396588..f0919c3722e7 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -9,6 +9,7 @@ #include #include #include +#include /* inode in-kernel data */ @@ -122,21 +123,16 @@ static inline bool shmem_file(struct file *file) extern bool shmem_charge(struct inode *inode, long pages); extern void shmem_uncharge(struct inode *inode, long pages); +#ifdef CONFIG_USERFAULTFD #ifdef CONFIG_SHMEM -extern int 
shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, - struct vm_area_struct *dst_vma, - unsigned long dst_addr, - unsigned long src_addr, - struct page **pagep); -extern int shmem_mfill_zeropage_pte(struct mm_struct *dst_mm, - pmd_t *dst_pmd, - struct vm_area_struct *dst_vma, - unsigned long dst_addr); -#else -#define shmem_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, dst_addr, \ - src_addr, pagep)({ BUG(); 0; }) -#define shmem_mfill_zeropage_pte(dst_mm, dst_pmd, dst_vma, \ -dst_addr) ({ BUG(); 0; }) -#endif +int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, + struct vm_area_struct *dst_vma, + unsigned long dst_addr, unsigned long src_addr, + enum mcopy_atomic_mode mode, struct page **pagep); +#else /* !CONFIG_SHMEM */ +#define shmem_mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, \ + src_addr, mode, pagep)({ BUG(); 0; }) +#endif /* CONFIG_SHMEM */ +#endif /* CONFIG_USERFAULTFD */ #endif diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h index bafbeb1a2624..47d9790d863d 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -31,7 +31,8 @@ UFFD_FEATURE_MISSING_SHMEM | \ UFFD_FEATURE_SIGBUS |\ UFFD_FEATURE_THREAD_ID | \ - UFFD_FEATURE_MINOR_HUGETLBFS) + UFFD_FEATURE_MINOR_HUGETLBFS | \ + UFFD_FEATURE_MINOR_SHMEM) #define UFFD_API_IOCTLS\ ((__u64)1 << _UFFDIO_REGISTER | \ (__u64)1 << _UFFDIO_UNREGISTER | \ @@ -196,6 +197,7 @@ struct uffdio_api { #define UFFD_FEATURE_SIGBUS(1<<7) #define UFFD_FEATURE_THREAD_ID (1<<8) #define UFFD_FEATURE_MINOR_HUGETLBFS (1<<9) +#define UFFD_FEATURE_MINOR_SHMEM (1<<10) __u64 features; __u64 ioctls; diff --git a/mm/memory.c b/mm/memory.c index c8e357627318..a1e5ff55027e 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3929,9 +3929,11 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf) * something). */ if (vma->vm_ops->map_pages && fault_around_bytes >>
[PATCH v2 5/5] userfaultfd/selftests: exercise minor fault handling shmem support
Enable test_uffdio_minor for test_type == TEST_SHMEM, and modify the test slightly to pass in / check for the right feature flags.

Signed-off-by: Axel Rasmussen
---
 tools/testing/selftests/vm/userfaultfd.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 5183ddb3080d..f31e9a4edc55 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -1410,7 +1410,7 @@ static int userfaultfd_minor_test(void)
 	void *expected_page;
 	char c;
 	struct uffd_stats stats = { 0 };
-	uint64_t features = UFFD_FEATURE_MINOR_HUGETLBFS;
+	uint64_t req_features, features_out;

 	if (!test_uffdio_minor)
 		return 0;
@@ -1418,10 +1418,18 @@ static int userfaultfd_minor_test(void)
 	printf("testing minor faults: ");
 	fflush(stdout);

-	if (uffd_test_ctx_clear() || uffd_test_ctx_init_ext(&features))
+	if (test_type == TEST_HUGETLB)
+		req_features = UFFD_FEATURE_MINOR_HUGETLBFS;
+	else if (test_type == TEST_SHMEM)
+		req_features = UFFD_FEATURE_MINOR_SHMEM;
+	else
+		return 1;
+
+	features_out = req_features;
+	if (uffd_test_ctx_clear() || uffd_test_ctx_init_ext(&features_out))
 		return 1;
-	/* If kernel reports the feature isn't supported, skip the test. */
-	if (!(features & UFFD_FEATURE_MINOR_HUGETLBFS)) {
+	/* If kernel reports required features aren't supported, skip test. */
+	if ((features_out & req_features) != req_features) {
 		printf("skipping test due to lack of feature support\n");
 		fflush(stdout);
 		return 0;
@@ -1431,7 +1439,7 @@ static int userfaultfd_minor_test(void)
 	uffdio_register.range.len = nr_pages * page_size;
 	uffdio_register.mode = UFFDIO_REGISTER_MODE_MINOR;
 	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register)) {
-		fprintf(stderr, "register failure\n");
+		perror("register failure");
 		exit(1);
 	}

@@ -1695,6 +1703,7 @@ static void set_test_type(const char *type)
 		map_shared = true;
 		test_type = TEST_SHMEM;
 		uffd_test_ops = &shmem_uffd_test_ops;
+		test_uffdio_minor = true;
 	} else {
 		fprintf(stderr, "Unknown test type: %s\n", type); exit(1);
 	}
--
2.30.1.766.gb4fecdf3b7-goog
[PATCH v2 3/5] userfaultfd/selftests: create alias mappings in the shmem test
Previously, we just allocated two shm areas: area_src and area_dst. With this commit, change this so we also allocate area_src_alias, and area_dst_alias. area_*_alias and area_* (respectively) point to the same underlying physical pages, but are different VMAs. In a future commit in this series, we'll leverage this setup to exercise minor fault handling support for shmem, just like we do in the hugetlb_shared test. Signed-off-by: Axel Rasmussen --- tools/testing/selftests/vm/userfaultfd.c | 29 +--- 1 file changed, 26 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c index 859398efb4fe..4a18590fe0f8 100644 --- a/tools/testing/selftests/vm/userfaultfd.c +++ b/tools/testing/selftests/vm/userfaultfd.c @@ -298,8 +298,9 @@ static int shmem_release_pages(char *rel_area) static void shmem_allocate_area(void **alloc_area) { - unsigned long offset = - alloc_area == (void **)_src ? 0 : nr_pages * page_size; + void *area_alias = NULL; + bool is_src = alloc_area == (void **)_src; + unsigned long offset = is_src ? 
0 : nr_pages * page_size; *alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, offset); @@ -308,12 +309,34 @@ static void shmem_allocate_area(void **alloc_area) goto fail; } + area_alias = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE, + MAP_SHARED, shm_fd, offset); + if (area_alias == MAP_FAILED) { + perror("mmap of memfd alias failed"); + goto fail_munmap; + } + + if (is_src) + area_src_alias = area_alias; + else + area_dst_alias = area_alias; + return; +fail_munmap: + if (munmap(*alloc_area, nr_pages * page_size) < 0) { + perror("munmap of memfd failed\n"); + exit(1); + } fail: *alloc_area = NULL; } +static void shmem_alias_mapping(__u64 *start, size_t len, unsigned long offset) +{ + *start = (unsigned long)area_dst_alias + offset; +} + struct uffd_test_ops { unsigned long expected_ioctls; void (*allocate_area)(void **alloc_area); @@ -341,7 +364,7 @@ static struct uffd_test_ops shmem_uffd_test_ops = { .expected_ioctls = SHMEM_EXPECTED_IOCTLS, .allocate_area = shmem_allocate_area, .release_pages = shmem_release_pages, - .alias_mapping = noop_alias_mapping, + .alias_mapping = shmem_alias_mapping, }; static struct uffd_test_ops hugetlb_uffd_test_ops = { -- 2.30.1.766.gb4fecdf3b7-goog
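The aliasing idea in this patch (two VMAs backed by the same memfd pages) can be sketched in a few lines of standalone userspace C. This is an illustration under stated assumptions, not the selftest's code: the function name memfd_alias_check() is hypothetical, only one page is mapped, and the raw memfd_create syscall is used (instead of the glibc wrapper the selftest calls) so the sketch does not depend on feature-test macros.

```c
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Map the same memfd offset twice and check that a write through the
 * first mapping is visible through the second.
 * Returns 1 if the mappings alias, 0 if not, -1 on setup failure.
 */
int memfd_alias_check(void)
{
	long page = sysconf(_SC_PAGESIZE);
	int fd, result = -1;
	char *area, *alias;

	assert(page > 0);

	/* Raw syscall; equivalent to memfd_create("alias-demo", 0). */
	fd = (int)syscall(SYS_memfd_create, "alias-demo", 0);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, page))
		goto out_close;

	area = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (area == MAP_FAILED)
		goto out_close;

	/* Second mapping of the same file offset: a different VMA. */
	alias = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (alias == MAP_FAILED)
		goto out_unmap;

	strcpy(area, "hello");
	result = (area != alias) && strcmp(alias, "hello") == 0;

	munmap(alias, page);
out_unmap:
	munmap(area, page);
out_close:
	close(fd);
	return result;
}
```

The two pointers differ, yet both see the same bytes, which is the property the minor-fault test later relies on: one mapping can be registered with userfaultfd while the other is used to populate the page cache.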
[PATCH v2 2/5] userfaultfd/selftests: use memfd_create for shmem test type
This is a preparatory commit. In the future, we want to be able to setup alias mappings for area_src and area_dst in the shmem test, like we do in the hugetlb_shared test. With a VMA obtained via mmap(MAP_ANONYMOUS | MAP_SHARED), it isn't clear how to do this. So, mmap() with an fd, so we can create alias mappings. Use memfd_create instead of actually passing in a tmpfs path like hugetlb does, since it's more convenient / simpler to run, and works just as well. Future commits will: 1. Setup the alias mappings. 2. Extend our tests to actually take advantage of this, to test new userfaultfd behavior being introduced in this series. Also, a small fix in the area we're changing: when the hugetlb setup fails in main(), pass in the right argv[] so we actually print out the hugetlb file path. Signed-off-by: Axel Rasmussen --- tools/testing/selftests/vm/userfaultfd.c | 35 1 file changed, 30 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c index f5ab5e0312e7..859398efb4fe 100644 --- a/tools/testing/selftests/vm/userfaultfd.c +++ b/tools/testing/selftests/vm/userfaultfd.c @@ -85,6 +85,7 @@ static bool test_uffdio_wp = false; static bool test_uffdio_minor = false; static bool map_shared; +static int shm_fd; static int huge_fd; static char *huge_fd_off0; static unsigned long long *count_verify; @@ -297,12 +298,20 @@ static int shmem_release_pages(char *rel_area) static void shmem_allocate_area(void **alloc_area) { + unsigned long offset = + alloc_area == (void **)_src ? 
0 : nr_pages * page_size; + *alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE, - MAP_ANONYMOUS | MAP_SHARED, -1, 0); + MAP_SHARED, shm_fd, offset); if (*alloc_area == MAP_FAILED) { - fprintf(stderr, "shared memory mmap failed\n"); - *alloc_area = NULL; + perror("mmap of memfd failed"); + goto fail; } + + return; + +fail: + *alloc_area = NULL; } struct uffd_test_ops { @@ -1672,15 +1681,31 @@ int main(int argc, char **argv) usage(); huge_fd = open(argv[4], O_CREAT | O_RDWR, 0755); if (huge_fd < 0) { - fprintf(stderr, "Open of %s failed", argv[3]); + fprintf(stderr, "Open of %s failed", argv[4]); perror("open"); exit(1); } if (ftruncate(huge_fd, 0)) { - fprintf(stderr, "ftruncate %s to size 0 failed", argv[3]); + fprintf(stderr, "ftruncate %s to size 0 failed", argv[4]); perror("ftruncate"); exit(1); } + } else if (test_type == TEST_SHMEM) { + shm_fd = memfd_create(argv[0], 0); + if (shm_fd < 0) { + perror("memfd_create"); + exit(1); + } + if (ftruncate(shm_fd, nr_pages * page_size * 2)) { + perror("ftruncate"); + exit(1); + } + if (fallocate(shm_fd, + FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, + nr_pages * page_size * 2)) { + perror("fallocate"); + exit(1); + } } printf("nr_pages: %lu, nr_pages_per_cpu: %lu\n", nr_pages, nr_pages_per_cpu); -- 2.30.1.766.gb4fecdf3b7-goog
[PATCH v2 4/5] userfaultfd/selftests: reinitialize test context in each test
Currently, the context (fds, mmap-ed areas, etc.) are global. Each test mutates this state in some way, in some cases really "clobbering it" (e.g., the events test mremap-ing area_dst over the top of area_src, or the minor faults tests overwriting the count_verify values in the test areas). We run the tests in a particular order, each test is careful to make the right assumptions about its starting state, etc. But, this is fragile. It's better for a test's success or failure to not depend on what some other prior test case did to the global state. To that end, clear and reinitialize the test context at the start of each test case, so whatever prior test cases did doesn't affect future tests. This is particularly relevant to this series because the events test's mremap of area_dst screws up assumptions the minor fault test was relying on. This wasn't a problem for hugetlb, as we don't mremap in that case. Signed-off-by: Axel Rasmussen --- tools/testing/selftests/vm/userfaultfd.c | 249 ++- 1 file changed, 151 insertions(+), 98 deletions(-) diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c index 4a18590fe0f8..5183ddb3080d 100644 --- a/tools/testing/selftests/vm/userfaultfd.c +++ b/tools/testing/selftests/vm/userfaultfd.c @@ -89,7 +89,8 @@ static int shm_fd; static int huge_fd; static char *huge_fd_off0; static unsigned long long *count_verify; -static int uffd, uffd_flags, finished, *pipefd; +static int uffd = -1; +static int uffd_flags, finished, *pipefd; static char *area_src, *area_src_alias, *area_dst, *area_dst_alias; static char *zeropage; pthread_attr_t attr; @@ -376,6 +377,146 @@ static struct uffd_test_ops hugetlb_uffd_test_ops = { static struct uffd_test_ops *uffd_test_ops; +static int userfaultfd_open(uint64_t *features) +{ + struct uffdio_api uffdio_api; + + uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK); + if (uffd < 0) { + fprintf(stderr, + "userfaultfd syscall not available in this kernel\n"); + 
return 1; + } + uffd_flags = fcntl(uffd, F_GETFD, NULL); + + uffdio_api.api = UFFD_API; + uffdio_api.features = *features; + if (ioctl(uffd, UFFDIO_API, _api)) { + fprintf(stderr, "UFFDIO_API failed.\nPlease make sure to " + "run with either root or ptrace capability.\n"); + return 1; + } + if (uffdio_api.api != UFFD_API) { + fprintf(stderr, "UFFDIO_API error: %" PRIu64 "\n", + (uint64_t)uffdio_api.api); + return 1; + } + + *features = uffdio_api.features; + return 0; +} + +static int uffd_test_ctx_init_ext(uint64_t *features) +{ + unsigned long nr, cpu; + + uffd_test_ops->allocate_area((void **)_src); + if (!area_src) + return 1; + uffd_test_ops->allocate_area((void **)_dst); + if (!area_dst) + return 1; + + if (uffd_test_ops->release_pages(area_src)) + return 1; + + if (uffd_test_ops->release_pages(area_dst)) + return 1; + + if (userfaultfd_open(features)) + return 1; + + count_verify = malloc(nr_pages * sizeof(unsigned long long)); + if (!count_verify) { + perror("count_verify"); + return 1; + } + + for (nr = 0; nr < nr_pages; nr++) { + *area_mutex(area_src, nr) = + (pthread_mutex_t)PTHREAD_MUTEX_INITIALIZER; + count_verify[nr] = *area_count(area_src, nr) = 1; + /* +* In the transition between 255 to 256, powerpc will +* read out of order in my_bcmp and see both bytes as +* zero, so leave a placeholder below always non-zero +* after the count, to avoid my_bcmp to trigger false +* positives. 
+*/ + *(area_count(area_src, nr) + 1) = 1; + } + + pipefd = malloc(sizeof(int) * nr_cpus * 2); + if (!pipefd) { + perror("pipefd"); + return 1; + } + for (cpu = 0; cpu < nr_cpus; cpu++) { + if (pipe2([cpu * 2], O_CLOEXEC | O_NONBLOCK)) { + perror("pipe"); + return 1; + } + } + + return 0; +} + +static inline int uffd_test_ctx_init(uint64_t features) +{ + return uffd_test_ctx_init_ext(); +} + +static inline int munmap_area(void **area) +{ + if (*area) { + if (munmap(*area, nr_pages * page_size)) { + perror("munmap"); + return 1; + } + } + + *area = NULL; + return 0; +} + +static int uffd_test_ctx_clear(void) +{ + int ret = 0; + size_t i; + + if (pipefd) { + for (i = 0; i < nr_cpus * 2;
[PATCH v2 0/5] userfaultfd: support minor fault handling for shmem
Base This series is based on top of my series which adds minor fault handling for hugetlbfs [1]. (And, therefore, it is based on 5.12-rc1 and Peter Xu's series for disabling huge pmd sharing as well.) [1] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmus...@google.com/T/#t Changelog = v1->v2: - For UFFDIO_CONTINUE, don't mess with page flags. Just use find_lock_page to get a locked page from the page cache, instead of doing __SetPageLocked. This fixes a VM_BUG_ON v1 hit when handling minor faults for THP-backed shmem (a tmpfs mounted with huge=always). Overview See my original series linked above for a detailed overview of minor fault handling in general. The feature in this series works exactly like the hugetblfs version (from userspace's perspective). I'm sending this as a separate series because: - The original minor fault handling series has a full set of R-Bs, and seems close to being merged. So, it seems reasonable to start looking at this next step, which extends the basic functionality. - shmem is different enough that this series may require some additional work before it's ready, and I don't want to delay the original series unnecessarily by bundling them together. Use Case In some cases it is useful to have VM memory backed by tmpfs instead of hugetlbfs. So, this feature will be used to support the same VM live migration use case described in my original series. Additionally, Android folks (Lokesh Gidra ) hope to optimize the Android Runtime garbage collector using this feature: "The plan is to use userfaultfd for concurrently compacting the heap. With this feature, the heap can be shared-mapped at another location where the GC-thread(s) could continue the compaction operation without the need to invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads get faults on the heap, UFFDIO_CONTINUE can be used to resume execution. 
Furthermore, this feature enables updating references in the 'non-moving' portion of the heap efficiently. Without this feature, uneccessary page copying (ioctl(UFFDIO_COPY)) would be required." Axel Rasmussen (5): userfaultfd: support minor fault handling for shmem userfaultfd/selftests: use memfd_create for shmem test type userfaultfd/selftests: create alias mappings in the shmem test userfaultfd/selftests: reinitialize test context in each test userfaultfd/selftests: exercise minor fault handling shmem support fs/userfaultfd.c | 6 +- include/linux/shmem_fs.h | 26 +- include/uapi/linux/userfaultfd.h | 4 +- mm/memory.c | 8 +- mm/shmem.c | 92 +++ mm/userfaultfd.c | 27 +- tools/testing/selftests/vm/userfaultfd.c | 322 +++ 7 files changed, 295 insertions(+), 190 deletions(-) -- 2.30.1.766.gb4fecdf3b7-goog
Re: [x86, build] 6dafca9780: WARNING:at_arch/x86/kernel/ftrace.c:#ftrace_verify_code
On Mon, 1 Mar 2021 14:14:51 -0800 Sami Tolvanen wrote: > Basically, the problem is that ftrace_replace_code() expects to find > ideal_nops[NOP_ATOMIC5] here, which in this case is 66:66:66:66:90, > while objtool has replaced the __fentry__ call with 0f:1f:44:00:00. > > As ideal_nops changes depending on kernel config and hardware, when > CC_USING_NOP_MCOUNT is defined we could either change > ftrace_nop_replace() to always use P6_NOP5, or skip > ftrace_verify_code() in ftrace_replace_code() for > FTRACE_UPDATE_MAKE_CALL. So I hacked up the code to get -mnop-record to work on x86, and checked the vmlinux and it gives me: 81bc6120 : 81bc6120: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 81bc6125: 55 push %rbp 81bc6126: 65 48 8b 2c 25 c0 7d 01 00 mov%gs:0x17dc0,%rbp 81bc612b: R_X86_64_32S current_task 81bc612f: 53 push %rbx 81bc6130: 48 8b 45 18 mov0x18(%rbp),%rax Which is the 0f:1f:44:00:00, and it works fine for me. Now, that could be because the ideal_nops[NOP_ATOMIC5] is the same, which would explain this. No, we should *not* change ftrace_nop_replace() to always use any P6_NOP5, as there was a reason we did this. Because not all nops are the same, and this gets called for *every* function that is traced. No, we should not skip ftrace_verify_code() *ever*. (/me was just referencing on twitter the scenario where ftrace bricked e1000e cards). This is probably why I never was much for the compiler conversion into nops, because it may chose the wrong one for the architecture. What we could do, is if the nop chosen by the compiler is not the ideal nop, to go back and modify all the nops added by the compiler to the ideal one, which would keep it using the most efficient one. 
Or, add something like this: diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c index 7edbd5ee5ed4..aef3ea53f931 100644 --- a/arch/x86/kernel/ftrace.c +++ b/arch/x86/kernel/ftrace.c @@ -152,12 +152,19 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr) { unsigned long ip = rec->ip; const char *new, *old; + int ret; old = ftrace_nop_replace(); new = ftrace_call_replace(ip, addr); /* Should only be called when module is loaded */ - return ftrace_modify_code_direct(rec->ip, old, new); + ret = ftrace_modify_code_direct(rec->ip, old, new); + if (__is_defined(CC_USING_NOP_MCOUNT) && ret) { + /* Compiler could have put in P6_NOP5 */ + old = P6_NOP5; + ret = ftrace_modify_code_direct(rec->ip, old, new); + } + return ret; } /* @@ -199,6 +206,8 @@ void ftrace_replace_code(int enable) int ret; for_ftrace_rec_iter(iter) { + bool old_nop = false; + rec = ftrace_rec_iter_record(iter); switch (ftrace_test_record(rec, enable)) { @@ -208,6 +217,7 @@ void ftrace_replace_code(int enable) case FTRACE_UPDATE_MAKE_CALL: old = ftrace_nop_replace(); + old_nop = true; break; case FTRACE_UPDATE_MODIFY_CALL: @@ -217,6 +227,13 @@ void ftrace_replace_code(int enable) } ret = ftrace_verify_code(rec->ip, old); + + if (__is_defined(CC_USING_NOP_MCOUNT) && ret && old_nop) { + /* Compiler could have put in P6_NOP5 */ + old = P6_NOP5; + ret = ftrace_verify_code(rec->ip, old); + } + if (ret) { ftrace_bug(ret, rec); return; -- Steve
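The fallback logic in the diff above can be modelled in userspace to make the control flow easier to follow. This is only a sketch: verify_code() is a stand-in for ftrace_verify_code() that just compares bytes, the array and function names are hypothetical, and it does not model any warning the real verification path may print on the first mismatch. The two byte sequences are the ones quoted in this thread.

```c
#include <assert.h>
#include <string.h>

#define NOP_SIZE 5

/* One possible ideal_nops[NOP_ATOMIC5] value, as quoted in the thread. */
const unsigned char ideal_nop_example[NOP_SIZE] = { 0x66, 0x66, 0x66, 0x66, 0x90 };
/* The nop found at the __fentry__ site in the report (P6_NOP5). */
const unsigned char p6_nop5[NOP_SIZE] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Stand-in for ftrace_verify_code(): 0 on match, -1 on mismatch. */
static int verify_code(const unsigned char *code, const unsigned char *expected)
{
	return memcmp(code, expected, NOP_SIZE) == 0 ? 0 : -1;
}

/*
 * Verify a call site against the ideal nop first; if that fails and the
 * build used compiler-generated nops, retry against P6_NOP5.
 */
int verify_nop_site(const unsigned char *code, const unsigned char *ideal,
		    int cc_using_nop_mcount)
{
	int ret;

	assert(code && ideal);

	ret = verify_code(code, ideal);
	if (cc_using_nop_mcount && ret)
		ret = verify_code(code, p6_nop5);
	return ret;
}
```

The point of the two-step check is that verification is never skipped: a site is accepted only if it holds one of the two known nop encodings.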
Re: [PATCH v2] KVM: x86: Revise guest_fpu xcomp_bv field
On Thu, Feb 25, 2021, Jing Liu wrote:
> XCOMP_BV[63] field indicates that the save area is in the compacted
> format and XCOMP_BV[62:0] indicates the states that have space allocated
> in the save area, including both XCR0 and XSS bits enabled by the host
> kernel. Use xfeatures_mask_all for calculating xcomp_bv and reuse
> XCOMP_BV_COMPACTED_FORMAT defined by kernel.
>
> Signed-off-by: Jing Liu
> ---
>  arch/x86/kvm/x86.c | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1b404e4d7dd8..f115493f577d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4435,8 +4435,6 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
>  	return 0;
>  }
>
> -#define XSTATE_COMPACTION_ENABLED (1ULL << 63)
> -
>  static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
>  {
>  	struct xregs_state *xsave = &vcpu->arch.guest_fpu->state.xsave;
> @@ -4494,7 +4492,8 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
>  	/* Set XSTATE_BV and possibly XCOMP_BV. */
>  	xsave->header.xfeatures = xstate_bv;
>  	if (boot_cpu_has(X86_FEATURE_XSAVES))
> -		xsave->header.xcomp_bv = host_xcr0 | XSTATE_COMPACTION_ENABLED;
> +		xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT |
> +					 xfeatures_mask_all;

Doesn't fill_xsave also need to be updated? Not with xfeatures_mask_all, but to account for arch.ia32_xss? I believe it's a nop with the current code, since supported_xss is zero, but it should be fixed, no?

>
>  	/*
>  	 * Copy each region from the non-compacted offset to the
> @@ -9912,9 +9911,6 @@ static void fx_init(struct kvm_vcpu *vcpu)
>  		return;
>
>  	fpstate_init(&vcpu->arch.guest_fpu->state);
> -	if (boot_cpu_has(X86_FEATURE_XSAVES))
> -		vcpu->arch.guest_fpu->state.xsave.header.xcomp_bv =
> -			host_xcr0 | XSTATE_COMPACTION_ENABLED;

Ugh, this _really_ needs a comment in the changelog. It took me a while to realize fpstate_init() does exactly what the new fill_xsave() is doing.

And isn't the code in load_xsave() redundant and can be removed? Any code that uses get_xsave_addr() would have a dependency on load_xsave() if it's not redundant, and I can't see how that would work.

>
>  	/*
>  	 * Ensure guest xcr0 is valid for loading
> --
> 2.18.4
>
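As a small illustration of the XCOMP_BV layout the changelog describes (bit 63 as the compacted-format flag, bits 62:0 as the feature mask), here is a hedged userspace sketch. The function name make_xcomp_bv() is hypothetical, and the macro is redefined locally for the example rather than taken from kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 63 of XCOMP_BV: the save area uses the compacted format. */
#define XCOMP_BV_COMPACTED_FORMAT (UINT64_C(1) << 63)

/*
 * Build an XCOMP_BV value from a mask of enabled state components
 * (XCR0 and XSS bits). The mask must fit in bits 62:0.
 */
uint64_t make_xcomp_bv(uint64_t feature_mask)
{
	/* Bit 63 is the format flag, not a state component. */
	assert((feature_mask & XCOMP_BV_COMPACTED_FORMAT) == 0);
	return XCOMP_BV_COMPACTED_FORMAT | feature_mask;
}
```

For example, a mask of 0x7 (three low component bits, chosen arbitrarily here) yields 0x8000000000000007.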
Re: [PATCH] sysctl: use min() helper for namecmp()
On Sun, Feb 28, 2021 at 04:44:22PM +0900, Masahiro Yamada wrote: > (CC: Andrew Morton) > > A friendly reminder. > > > This is just a minor clean-up. > > If nobody picks it up, > I hope perhaps Andrew Morton will do. > > This patch: > https://lore.kernel.org/patchwork/patch/1360092/ > > > > > > On Mon, Jan 4, 2021 at 5:33 PM Masahiro Yamada wrote: > > > > Make it slightly readable by using min(). > > > > Signed-off-by: Masahiro Yamada Acked-by: Kees Cook Feel free to take this via your tree Masahiro. Thanks! -Kees > > --- > > > > fs/proc/proc_sysctl.c | 7 +-- > > 1 file changed, 1 insertion(+), 6 deletions(-) > > > > diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c > > index 317899222d7f..86341c0f0c40 100644 > > --- a/fs/proc/proc_sysctl.c > > +++ b/fs/proc/proc_sysctl.c > > @@ -94,14 +94,9 @@ static void sysctl_print_dir(struct ctl_dir *dir) > > > > static int namecmp(const char *name1, int len1, const char *name2, int > > len2) > > { > > - int minlen; > > int cmp; > > > > - minlen = len1; > > - if (minlen > len2) > > - minlen = len2; > > - > > - cmp = memcmp(name1, name2, minlen); > > + cmp = memcmp(name1, name2, min(len1, len2)); > > if (cmp == 0) > > cmp = len1 - len2; > > return cmp; > > -- > > 2.27.0 > > > > > -- > Best Regards > Masahiro Yamada -- Kees Cook Reviewed-by: Kees Cook -- Kees Cook
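For reference, the cleaned-up comparison can be tried outside the kernel. The sketch below mirrors the patched namecmp(), with a local min_int() helper standing in for the kernel's min() macro (an assumption made so that the example compiles in userspace).

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-in for the kernel's min() macro. */
static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Compare two length-delimited names: first by their common prefix,
 * then, if the prefixes are equal, by length (shorter sorts first).
 */
int namecmp(const char *name1, int len1, const char *name2, int len2)
{
	int cmp;

	assert(len1 >= 0 && len2 >= 0);

	cmp = memcmp(name1, name2, (size_t)min_int(len1, len2));
	if (cmp == 0)
		cmp = len1 - len2;
	return cmp;
}
```

With this ordering, "ab" sorts before "abc", and names of equal length are ordered by their bytes.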
Re: [PATCH net] net: dsa: tag_mtk: fix 802.1ad VLAN egress
Hello: This patch was applied to netdev/net.git (refs/heads/master): On Tue, 2 Mar 2021 00:01:59 +0800 you wrote: > A different TPID bit is used for 802.1ad VLAN frames. > > Reported-by: Ilario Gelmetti > Fixes: f0af34317f4b ("net: dsa: mediatek: combine MediaTek tag with VLAN tag") > Signed-off-by: DENG Qingfang > --- > net/dsa/tag_mtk.c | 19 +-- > 1 file changed, 13 insertions(+), 6 deletions(-) Here is the summary with links: - [net] net: dsa: tag_mtk: fix 802.1ad VLAN egress https://git.kernel.org/netdev/net/c/9200f515c41f You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
[tip:x86/platform] BUILD SUCCESS 2430915f8291212f2bd2155176b817c34a18a2b1
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/platform
branch HEAD: 2430915f8291212f2bd2155176b817c34a18a2b1  x86/platform/uv: Fix indentation warning in Documentation/ABI/testing/sysfs-firmware-sgi_uv

elapsed time: 720m

configs tested: 95
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm       defconfig
arm64     defconfig
arm64     allyesconfig
arm       allyesconfig
arm       allmodconfig
arm       moxart_defconfig
m68k      q40_defconfig
powerpc   katmai_defconfig
alpha     defconfig
ia64      alldefconfig
powerpc   makalu_defconfig
powerpc   chrp32_defconfig
i386      allyesconfig
mips      jmr3927_defconfig
arc       nsim_700_defconfig
arm       nhk8815_defconfig
arm       zeus_defconfig
mips      cu1830-neo_defconfig
sh        rsk7269_defconfig
mips      mpc30x_defconfig
arm       versatile_defconfig
sparc     defconfig
sparc64   defconfig
sh        apsh4ad0a_defconfig
powerpc   canyonlands_defconfig
sh        sh7710voipgw_defconfig
mips      decstation_r4k_defconfig
ia64      allmodconfig
ia64      defconfig
ia64      allyesconfig
m68k      allmodconfig
m68k      defconfig
m68k      allyesconfig
nios2     defconfig
arc       allyesconfig
nds32     allnoconfig
c6x       allyesconfig
nds32     defconfig
nios2     allyesconfig
csky      defconfig
alpha     allyesconfig
xtensa    allyesconfig
h8300     allyesconfig
arc       defconfig
sh        allmodconfig
parisc    defconfig
s390      allyesconfig
s390      allmodconfig
parisc    allyesconfig
s390      defconfig
sparc     allyesconfig
i386      tinyconfig
i386      defconfig
mips      allyesconfig
mips      allmodconfig
powerpc   allyesconfig
powerpc   allmodconfig
powerpc   allnoconfig
i386      randconfig-a006-20210228
i386      randconfig-a005-20210228
i386      randconfig-a004-20210228
i386      randconfig-a003-20210228
i386      randconfig-a001-20210228
i386      randconfig-a002-20210228
x86_64    randconfig-a013-20210301
x86_64    randconfig-a016-20210301
x86_64    randconfig-a015-20210301
x86_64    randconfig-a014-20210301
x86_64    randconfig-a012-20210301
x86_64    randconfig-a011-20210301
i386      randconfig-a016-20210301
i386      randconfig-a012-20210301
i386      randconfig-a014-20210301
i386      randconfig-a013-20210301
i386      randconfig-a011-20210301
i386      randconfig-a015-20210301
riscv     nommu_k210_defconfig
riscv     allyesconfig
riscv     nommu_virt_defconfig
riscv     allnoconfig
riscv     defconfig
riscv     rv32_defconfig
riscv     allmodconfig
x86_64    allyesconfig
x86_64    rhel-7.6-kselftests
x86_64    defconfig
x86_64    rhel-8.3
x86_64    rhel-8.3-kbuiltin
x86_64    kexec

clang tested configs:
x86_64    randconfig-a006-20210301
x86_64    randconfig-a001-20210301
x86_64    randconfig-a004-20210301
x86_64    randconfig-a002-20210301
x86_64    randconfig-a005-20210301
x86_64    randconfig-a003-20210301

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
Re: [PATCH] spi: cadence-quadspi: add missing of_node_put
On Mon, 15 Feb 2021 19:04:25 +0800, angkery wrote: > Fix OF node leaks by calling of_node_put in > for_each_available_child_of_node when the cycle returns. > > Generated by: scripts/coccinelle/iterators/for_each_child.cocci Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next Thanks! [1/1] spi: cadence-quadspi: add missing of_node_put commit: 44233a5ba2511b85da3c055a0ab7c28976544e47 All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark
Re: [PATCH] scsi: ufs: Fix incorrect ufshcd_state after ufshcd_reset_and_restore()
On Mon, Mar 01 2021 at 11:19 -0800, Adrian Hunter wrote:

If ufshcd_probe_hba() fails it sets ufshcd_state to UFSHCD_STATE_ERROR, however, if it is called again, as it is within a loop in ufshcd_reset_and_restore(), and succeeds, then it will not set the state back to UFSHCD_STATE_OPERATIONAL unless the state was UFSHCD_STATE_RESET.

That can result in the state being UFSHCD_STATE_ERROR even though ufshcd_reset_and_restore() is successful and returns zero.

Fix by initializing the state to UFSHCD_STATE_RESET in the start of each loop in ufshcd_reset_and_restore(). If there is an error, ufshcd_reset_and_restore() will change the state to UFSHCD_STATE_ERROR, otherwise ufshcd_probe_hba() will have set the state appropriately.

Fixes: 4db7a2360597 ("scsi: ufs: Fix concurrency of error handler and other error recovery paths")
Signed-off-by: Adrian Hunter
---

Reviewed-by: Asutosh Das

 drivers/scsi/ufs/ufshcd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 77161750c9fb..91a403afe038 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -7031,6 +7031,8 @@ static int ufshcd_reset_and_restore(struct ufs_hba *hba)
 	spin_unlock_irqrestore(hba->host->host_lock, flags);

 	do {
+		hba->ufshcd_state = UFSHCD_STATE_RESET;
+
 		/* Reset the attached device */
 		ufshcd_device_reset(hba);

--
2.17.1
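For readers following along, the failure mode in the commit message can be reproduced in a small userspace model. This is only a sketch, not the driver code: `model_hba`, `probe_model()`, `reset_and_restore_model()` and the `probes_left_to_fail` knob are hypothetical stand-ins. They keep just the one property the patch depends on, namely that a successful probe promotes the state to OPERATIONAL only when it starts from RESET.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the driver states; not the real enum. */
enum model_state { STATE_RESET, STATE_OPERATIONAL, STATE_ERROR };

struct model_hba {
	enum model_state state;
	int probes_left_to_fail;	/* hypothetical knob: fail this many probes */
};

/*
 * Models the behaviour described in the commit message: a failed probe
 * sets ERROR, a successful probe only promotes RESET to OPERATIONAL and
 * leaves any other state untouched.
 */
static int probe_model(struct model_hba *hba)
{
	if (hba->probes_left_to_fail > 0) {
		hba->probes_left_to_fail--;
		hba->state = STATE_ERROR;
		return -1;
	}
	if (hba->state == STATE_RESET)
		hba->state = STATE_OPERATIONAL;
	return 0;
}

/*
 * Retry loop. With apply_fix set, the state is reinitialised to RESET at
 * the top of every iteration, which is what the patch adds.
 */
static int reset_and_restore_model(struct model_hba *hba, bool apply_fix)
{
	int retries = 3;
	int err;

	do {
		if (apply_fix)
			hba->state = STATE_RESET;
		err = probe_model(hba);
	} while (err && --retries);

	if (err)
		hba->state = STATE_ERROR;
	return err;
}
```

With one failing probe followed by a successful one, the unfixed loop returns 0 while the state is still ERROR; with the per-iteration reset it ends in OPERATIONAL.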
Re: seccomp: Delay filter activation
On Mon, Mar 01, 2021 at 02:21:56PM +0100, Christian Brauner wrote: > On Mon, Mar 01, 2021 at 12:09:09PM +0100, Christian Brauner wrote: > > On Sat, Feb 20, 2021 at 01:31:57AM -0800, Sargun Dhillon wrote: > > > We've run into a problem where attaching a filter can be quite messy > > > business because the filter itself intercepts sendmsg, and other > > > syscalls related to exfiltrating the listener FD. I believe that this > > > problem set has been brought up before, and although there are > > > "simpler" methods of exfiltrating the listener, like clone3 or > > > pidfd_getfd, but these are still less than ideal. I'm trying to make sure I understand: the target process would like to have a filter attached that blocks sendmsg, but that would mean it has no way to send the listener FD to its manager? And you'd want to have listening working for sendmsg (otherwise you could do it with two filters, I imagine)? > > int fd_filter = seccomp(SECCOMP_SET_MODE_FILTER, > > SECCOMP_FILTER_DETACHED, ); > > > > BARRIER_WAIT_SETUP_DONE; > > > > int ret = seccomp(SECCOMP_ATTACH_FILTER, 0, INT_TO_PTR(fd_listener)); > > This obviously should've been sm like: > > struct seccomp_filter_attach { > union { > __s32 pidfd; > __s32 pid; > }; > __u32 fd_filter; > }; > > and then > > int ret = seccomp(SECCOMP_ATTACH_FILTER, 0, seccomp_filter_attach); Given the difficulty with TSYNC, I'm not excited about adding an "apply this filter to another process" API. :) The prior thread was here: https://lore.kernel.org/lkml/20201029075841.GB29881@ircssh-2.c.rugged-nimbus-611.internal/ But I haven't had time to follow up. Both Andy and Sargun discuss filter "replacement", but I'm not a fan of that, since I'd really like to keep the "additive-only" property of seccomp. So, I'm still back to wanting an answer to my questions at the end of https://lore.kernel.org/lkml/202010281503.3D1FCFE0@keescook/ Namely, how to best indicate the point of execution where "delayed" filters become applied? 
If we require supporting the "2b" (launched oblivious target) case (which I think we must), we need to signal it externally, or via an automatic trip point. Since synchronizing with an oblivious target is rather nasty (e.g. involving ptrace or at least ptrace access checking), I'd rather create a predefined trip point. Having it be "execve" limits the utility of this feature for cooperating targets, though, so I think "apply on exec" isn't great.

struct seccomp_filter_attach_trigger {
	u64 nr;
	unsigned char *filter;
};

seccomp(SECCOMP_ATTACH_FILTER_TRIGGER, 0, seccomp_filter_attach_trigger);

After "nr" is evaluated (but before it runs), seccomp installs the filter. And by "installs", I'm not sure if it needs to keep it in a queue, with separate ref counting, or if it should be in the main filter stack, but have an "alive" toggle, or what.

--
Kees Cook
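To make the ordering in the trip-point idea concrete, here is a toy userspace model of the "alive" toggle variant. Everything in it is an assumption made for illustration: `delayed_filter`, `syscall_enter_model()` and the single `blocked_nr` field do not exist in the kernel, and a real filter is a BPF program rather than one blocked syscall number. The only thing the sketch tries to show is that the triggering syscall is still judged without the new filter, and every later syscall is judged with it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one delayed filter; not a kernel structure. */
struct delayed_filter {
	uint64_t trigger_nr;	/* syscall number that arms the filter */
	uint64_t blocked_nr;	/* the only syscall this toy filter denies */
	bool alive;		/* the "alive" toggle from the proposal */
};

/*
 * Returns true if the syscall is allowed. The filter is consulted first
 * and the trigger is checked afterwards, so the triggering call itself
 * is evaluated as if the delayed filter were not yet installed.
 */
static bool syscall_enter_model(struct delayed_filter *f, uint64_t nr)
{
	bool allowed = !(f->alive && nr == f->blocked_nr);

	if (!f->alive && nr == f->trigger_nr)
		f->alive = true;
	return allowed;
}
```

In this model a task can still use the to-be-blocked syscall (for example to hand over the listener FD) right up to the trigger, which is the property the thread is after; whether the real thing should live in a queue or in the main filter stack is left open, as in the mail above.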
Re: [PATCH] spi: rockchip: avoid objtool warning
On Thu, 25 Feb 2021 13:55:34 +0100, Arnd Bergmann wrote: > Building this file with clang leads to an unreachable code path > causing a warning from objtool: > > drivers/spi/spi-rockchip.o: warning: objtool: > rockchip_spi_transfer_one()+0x2e0: sibling call from callable instruction > with modified stack frame > > Use BUG() instead of unreachable() to avoid the undefined behavior > if it does happen. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next Thanks! [1/1] spi: rockchip: avoid objtool warning commit: d86e880f7a7c5b64a650146a1353f98750863f21 Thanks, Mark
Re: [PATCH] spi: atmel: Drop unused variable
On Thu, 18 Feb 2021 15:28:40 +0200, Tudor Ambarus wrote: > The DMA cap mask is no longer used since: > commit 7758e390699f ("spi: atmel: remove compat for non DT board when > requesting dma chan") > Drop it now. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next Thanks! [1/1] spi: atmel: Drop unused variable commit: c5f754fd0a31d2c6f2f8d11f3db1427b5566f1e7 Thanks, Mark
Re: [PATCH] [v2] spi: rockchip: avoid objtool warning
On Fri, 26 Feb 2021 15:00:48 +0100, Arnd Bergmann wrote: > Building this file with clang leads to an unreachable code path > causing a warning from objtool: > > drivers/spi/spi-rockchip.o: warning: objtool: > rockchip_spi_transfer_one()+0x2e0: sibling call from callable instruction > with modified stack frame > > Change the unreachable() into an error return that can be > handled if it ever happens, rather than silently crashing > the kernel. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next Thanks! [1/1] spi: rockchip: avoid objtool warning commit: d86e880f7a7c5b64a650146a1353f98750863f21 Thanks, Mark
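The v2 approach, an error return from the "can't happen" branch instead of unreachable(), looks roughly like the following when taken out of the driver. This is a sketch under assumptions: `width_to_field()` and the field values 0, 1 and 2 are made up for illustration and are not taken from spi-rockchip.c.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: translate a transfer width in bits into a
 * register field value. The default branch used to be the kind of place
 * where unreachable() would sit; returning -EINVAL gives the compiler a
 * defined path and lets the caller fail the transfer cleanly.
 */
static int width_to_field(unsigned int bits_per_word, unsigned int *field)
{
	switch (bits_per_word) {
	case 4:
		*field = 0;
		break;
	case 8:
		*field = 1;
		break;
	case 16:
		*field = 2;
		break;
	default:
		/* Unsupported width: report it, leave *field untouched. */
		return -EINVAL;
	}
	return 0;
}
```

A caller would propagate the negative return value instead of programming the controller with an undefined field.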
Re: [PATCH] spi: omap2-mcspi: Activate pinctrl idle state during runtime suspend
On Mon, 22 Feb 2021 03:32:43 +0100, Alexander Sverdlin wrote: > Set the (optional) idle pinctrl state during runtime suspend. This is the > same schema used in PL022 driver and can help with HW designs sharing > the SPI lines for different purposes. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next Thanks! [1/1] spi: omap2-mcspi: Activate pinctrl idle state during runtime suspend commit: 9923f8e3039ed0361c2476d5d3c5195c7f766504 Thanks, Mark
Re: [PATCH] qcom: spmi-regulator: Add support for ULT LV_P50 and ULT P300
On Thu, 25 Feb 2021 22:35:13 +0100, Konrad Dybcio wrote: > The ULT LV_P50 shares the same configuration as the other ULT LV_Pxxx > and the ULT P300 shares the same as the other ULT Pxxx. > > These two regulator types are found on PM8950 and its variants. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git for-next Thanks! [1/1] qcom: spmi-regulator: Add support for ULT LV_P50 and ULT P300 commit: b15d870510c0a3910c9980ebceab885a390af60c Thanks, Mark
Re: [PATCH] regulator: pf8x00: Use regulator_map_voltage_ascend for pf8x00_buck7_ops
On Tue, 16 Feb 2021 14:01:28 +0800, Axel Lin wrote: > The voltages in pf8x00_sw7_voltages are in ascendant order, so use > regulator_map_voltage_ascend. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git for-next Thanks! [1/1] regulator: pf8x00: Use regulator_map_voltage_ascend for pf8x00_buck7_ops commit: 6930ab7ac03c1be5d1944473cbf327c9d4d14ce4 Thanks, Mark
Re: [PATCH] [v2] Input: Add "Share" button to Microsoft Xbox One controller.
Hi Cameron,

I was first thinking of adding a new XTYPE, but then realized it is still an Xbox One, just a model with an extra button, so adding MAP_SHARE_BUTTON avoids adding a new XTYPE there.

Changed the name to "Microsoft Xbox One X pad" and removed the {}. Please review again, thanks!

Chris

On Sat, Feb 27, 2021 at 6:01 PM Cameron Gutman wrote:
>
> On 2/24/21 11:32 PM, Chris Ye wrote:
> > Add "Share" button input capability and input event mapping for
> > Microsoft Xbox One controller.
> > Fixed Microsoft Xbox One controller share button not working under USB
> > connection.
> >
> > Signed-off-by: Chris Ye
> > ---
> >  drivers/input/joystick/xpad.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/input/joystick/xpad.c b/drivers/input/joystick/xpad.c
> > index 9f0d07dcbf06..0c3374091aff 100644
> > --- a/drivers/input/joystick/xpad.c
> > +++ b/drivers/input/joystick/xpad.c
> > @@ -79,6 +79,7 @@
> >  #define MAP_DPAD_TO_BUTTONS (1 << 0)
> >  #define MAP_TRIGGERS_TO_BUTTONS (1 << 1)
> >  #define MAP_STICKS_TO_NULL (1 << 2)
> > +#define MAP_SHARE_BUTTON (1 << 3)
> >  #define DANCEPAD_MAP_CONFIG (MAP_DPAD_TO_BUTTONS | \
> >  MAP_TRIGGERS_TO_BUTTONS | MAP_STICKS_TO_NULL)
> >
> > @@ -130,6 +131,7 @@ static const struct xpad_device {
> >  { 0x045e, 0x02e3, "Microsoft X-Box One Elite pad", 0, XTYPE_XBOXONE },
> >  { 0x045e, 0x02ea, "Microsoft X-Box One S pad", 0, XTYPE_XBOXONE },
> >  { 0x045e, 0x0719, "Xbox 360 Wireless Receiver", MAP_DPAD_TO_BUTTONS,
> > XTYPE_XBOX360W },
> > + { 0x045e, 0x0b12, "Microsoft X-Box One X pad", MAP_SHARE_BUTTON,
> > XTYPE_XBOXONE },
>
> Let's use 'Xbox' for new entries instead of 'X-Box'. There was an effort to
> standardize on 'Xbox' (which is what Microsoft uses), but changing device
> names can impact userspace which may use these names in mapping heuristics
> (SDL does this). We can at least not make the problem worse though.
>
> >  { 0x046d, 0xc21d, "Logitech Gamepad F310", 0, XTYPE_XBOX360 },
> >  { 0x046d, 0xc21e, "Logitech Gamepad F510", 0, XTYPE_XBOX360 },
> >  { 0x046d, 0xc21f, "Logitech Gamepad F710", 0, XTYPE_XBOX360 },
> > @@ -862,6 +864,8 @@ static void xpadone_process_packet(struct usb_xpad
> > *xpad, u16 cmd, unsigned char
> >  /* menu/view buttons */
> >  input_report_key(dev, BTN_START, data[4] & 0x04);
> >  input_report_key(dev, BTN_SELECT, data[4] & 0x08);
> > + if (xpad->mapping & MAP_SHARE_BUTTON)
> > + input_report_key(dev, KEY_RECORD, data[22] & 0x01);
> >
>
> I was worried adding a button to an existing supported gamepad like this
> might cause a breaking change to SDL's gamepad mapping for this gamepad,
> since SDL assigns each present button an index rather than using the keycodes
> directly (adding a new one could change the old indices). Fortunately, SDL
> always processes buttons in the BTN_GAMEPAD range first, so this new button
> ends up at the end of the list anyway.
>
> >  /* buttons A,B,X,Y */
> >  input_report_key(dev, BTN_A,data[4] & 0x10);
> > @@ -1669,9 +1673,12 @@ static int xpad_init_input(struct usb_xpad *xpad)
> >
> >  /* set up model-specific ones */
> >  if (xpad->xtype == XTYPE_XBOX360 || xpad->xtype == XTYPE_XBOX360W ||
> > - xpad->xtype == XTYPE_XBOXONE) {
> > + xpad->xtype == XTYPE_XBOXONE) {
> >  for (i = 0; xpad360_btn[i] >= 0; i++)
> >  input_set_capability(input_dev, EV_KEY,
> > xpad360_btn[i]);
> > + if (xpad->mapping & MAP_SHARE_BUTTON) {
> > + input_set_capability(input_dev, EV_KEY, KEY_RECORD);
> > + }
>
> Style nit: Drop the unneeded {} here
>
> >  } else {
> >  for (i = 0; xpad_btn[i] >= 0; i++)
> >  input_set_capability(input_dev, EV_KEY, xpad_btn[i]);
> >
> LGTM, other than the minor changes suggested above.
>
>
> Regards,
> Cameron
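The per-device mapping flag and the packet test from the quoted patch can be looked at in isolation. The sketch below assumes a hypothetical `pad_model` struct and `share_pressed()` helper; only the `MAP_SHARE_BUTTON` bit and the `data[22] & 0x01` test come from the patch. The length guard is an addition of this sketch and is not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SHARE_BUTTON (1u << 3)	/* same bit as in the quoted patch */

/* Hypothetical stand-in for the per-device state; not struct usb_xpad. */
struct pad_model {
	uint8_t mapping;
};

/*
 * Returns true when the Share button is reported as pressed. Devices
 * without MAP_SHARE_BUTTON never report it, so existing pads keep their
 * current button set.
 */
static bool share_pressed(const struct pad_model *pad,
			  const uint8_t *data, size_t len)
{
	if (!(pad->mapping & MAP_SHARE_BUTTON))
		return false;
	if (len <= 22)		/* guard added here; the patch reads data[22] directly */
		return false;
	return data[22] & 0x01;
}
```

Keeping the check behind a mapping bit is what lets the new pad stay XTYPE_XBOXONE, as Chris describes above, rather than needing a new XTYPE.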
Re: [PATCH v2] regulator: add missing call to of_node_put()
On Fri, 26 Feb 2021 09:39:35 +0800, Yang Li wrote: > In one of the error paths of the for_each_child_of_node() loop, > add missing call to of_node_put(). > > Fix the following coccicheck warning: > ./drivers/regulator/scmi-regulator.c:343:1-23: WARNING: Function > "for_each_child_of_node" should have of_node_put() before return around > line 347. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git for-next Thanks! [1/1] regulator: add missing call to of_node_put() commit: 755a74fc655ee95ce37bb0f552cbd39b52978a05 Thanks, Mark
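Why an early return from such a loop leaks a reference is easier to see with the refcounts written out. The following is a userspace model only: `node_model`, `next_child()` and `walk_children()` are hypothetical names, and the iterator is a simplified imitation of the contract the OF child iterators follow (drop the reference on the previous child, take one on the next).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_node; only the refcount. */
struct node_model {
	int refcount;
};

static void node_get(struct node_model *n) { n->refcount++; }
static void node_put(struct node_model *n) { n->refcount--; }

/*
 * Imitates the iterator contract: release 'prev', then take a reference
 * on the next child, or return NULL when the children are exhausted.
 */
static struct node_model *next_child(struct node_model *children,
				     size_t count, struct node_model *prev)
{
	size_t next = prev ? (size_t)(prev - children) + 1 : 0;

	if (prev)
		node_put(prev);
	if (next >= count)
		return NULL;
	node_get(&children[next]);
	return &children[next];
}

/*
 * Walks the children and bails out at index 'bad'. With put_on_exit set,
 * the reference held on the current child is dropped before returning,
 * which corresponds to the call the patch adds.
 */
static int walk_children(struct node_model *children, size_t count,
			 size_t bad, int put_on_exit)
{
	struct node_model *child;

	for (child = next_child(children, count, NULL); child;
	     child = next_child(children, count, child)) {
		if ((size_t)(child - children) == bad) {
			if (put_on_exit)
				node_put(child);
			return -1;
		}
	}
	return 0;
}
```

A loop that runs to completion balances itself, because the final iterator call drops the last reference; only the early exit needs the explicit put.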
Re: [PATCH v2] regulator: pca9450: Clear PRESET_EN bit to fix BUCK1/2/3 voltage setting
On Mon, 22 Feb 2021 12:52:20 +0100, Schrempf Frieder wrote: > The driver uses the DVS registers PCA9450_REG_BUCKxOUT_DVS0 to set the > voltage for the buck regulators 1, 2 and 3. This has no effect as the > PRESET_EN bit is set by default and therefore the preset values are used > instead, which are set to 850 mV. > > To fix this we clear the PRESET_EN bit at time of initialization. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git for-next Thanks! [1/1] regulator: pca9450: Clear PRESET_EN bit to fix BUCK1/2/3 voltage setting commit: 66f9f2d5d94f374605d829b9e690e8cdc9d0d05d Thanks, Mark
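Clearing one control bit while preserving the rest of a register is a plain read-modify-write with a mask. A minimal sketch of that arithmetic follows, with loud caveats: `MODEL_PRESET_EN` and its 0x80 position are assumed here purely for illustration (check the PCA9450 datasheet or the driver header for the real value), and `update_bits()` is a local helper written for this example, not the regmap API.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PRESET_EN 0x80u	/* assumed bit position, illustration only */

/*
 * Masked update: bits inside 'mask' are taken from 'val', all other bits
 * keep their old value.
 */
static uint8_t update_bits(uint8_t old, uint8_t mask, uint8_t val)
{
	return (uint8_t)((old & ~mask) | (val & mask));
}

/* Clear the preset-enable bit so the DVS voltage registers take effect. */
static uint8_t clear_preset_en(uint8_t reg)
{
	return update_bits(reg, MODEL_PRESET_EN, 0);
}
```

In a driver the same operation would go through the bus accessor once at initialization, which matches what the commit message describes.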
Re: [PATCH] ASoC: fsl_xcvr: move reset assert into runtime_resume
On Mon, 22 Feb 2021 17:09:50 +0800, Shengjiu Wang wrote: > Move reset assert into runtime_resume since we > cannot rely on reset assert state when the device > is put out from suspend. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next Thanks! [1/1] ASoC: fsl_xcvr: move reset assert into runtime_resume commit: 0f780e4bef4587f07060109040955d6b6aa179a2 Thanks, Mark
Re: [PATCH 0/4] drop unneeded snd_soc_dai_set_drvdata
On Sat, 13 Feb 2021 11:19:03 +0100, Julia Lawall wrote: > snd_soc_dai_set_drvdata is not needed when the set data comes from > snd_soc_dai_get_drvdata or dev_get_drvdata. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next Thanks! [1/4] ASoC: mmp-sspa: drop unneeded snd_soc_dai_set_drvdata commit: 131036ffae211a9cc3bfb053fadce87484e13fc5 [2/4] ASoC: mxs-saif: drop unneeded snd_soc_dai_set_drvdata commit: 7150186f1edb2fa94554be1bec26aa65a7df3388 [3/4] ASoC: sun4i-i2s: drop unneeded snd_soc_dai_set_drvdata commit: 0c34af2d5c9ba5103637c33c4f52d658172b991d [4/4] ASoC: fsl: drop unneeded snd_soc_dai_set_drvdata commit: eb9db3066cdb57dbfd1fb3d85ca143ad5d719bfb Thanks, Mark
Re: [PATCH] ASoC: Intel: boards: sof-wm8804: add check for PLL setting
On Fri, 26 Feb 2021 18:56:53 +, Colin King wrote: > Currently the return from snd_soc_dai_set_pll is not checking for > failure, this is the only driver in the kernel that ignores this, > so it probably should be added for sake of completeness. Fix this > by adding an error return check. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next Thanks! [1/1] ASoC: Intel: boards: sof-wm8804: add check for PLL setting commit: e067855b814600248234a2a7283a7a9006e5aadc Thanks, Mark
Re: [PATCH] sound: soc/uniphier: Simplify the return expression of uniphier_aio_startup
On Wed, 24 Feb 2021 16:54:07 +0800, dingsen...@163.com wrote: > Simplify the return expression in the aio-cpu.c. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next Thanks! [1/1] sound: soc/uniphier: Simplify the return expression of uniphier_aio_startup commit: e3fdb6288dd08d965dea4bf00186e20f79153b2b Thanks, Mark
[tip:locking/urgent] BUILD SUCCESS 8b97c027dfe4ba195be08fd0e18f716005763b8a
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git locking/urgent branch HEAD: 8b97c027dfe4ba195be08fd0e18f716005763b8a static_call: Fix the module key fixup elapsed time: 722m configs tested: 95 configs skipped: 2 The following configs have been built successfully. More configs may be tested in the coming days. gcc tested configs: arm defconfig arm64 defconfig arm64 allyesconfig arm allyesconfig arm allmodconfig arm moxart_defconfig m68k q40_defconfig powerpc katmai_defconfig alpha defconfig ia64 alldefconfig powerpc makalu_defconfig powerpc chrp32_defconfig i386 allyesconfig mips jmr3927_defconfig arc nsim_700_defconfig arm nhk8815_defconfig arm zeus_defconfig mips cu1830-neo_defconfig sh rsk7269_defconfig mips mpc30x_defconfig arm versatile_defconfig sparc defconfig sparc64 defconfig sh apsh4ad0a_defconfig powerpc canyonlands_defconfig sh sh7710voipgw_defconfig mips decstation_r4k_defconfig ia64 allmodconfig ia64 defconfig ia64 allyesconfig m68k allmodconfig m68k defconfig m68k allyesconfig nios2 defconfig arc allyesconfig nds32 allnoconfig c6x allyesconfig nds32 defconfig nios2 allyesconfig csky defconfig alpha allyesconfig xtensa allyesconfig h8300 allyesconfig arc defconfig sh allmodconfig parisc defconfig s390 allyesconfig s390 allmodconfig parisc allyesconfig s390 defconfig sparc allyesconfig i386 tinyconfig i386 defconfig mips allyesconfig mips allmodconfig powerpc allyesconfig powerpc allmodconfig powerpc allnoconfig i386 randconfig-a006-20210228 i386 randconfig-a005-20210228 i386 randconfig-a004-20210228 i386 randconfig-a003-20210228 i386 randconfig-a001-20210228 i386 randconfig-a002-20210228 x86_64 randconfig-a013-20210301 x86_64 randconfig-a016-20210301 x86_64 randconfig-a015-20210301 x86_64 randconfig-a014-20210301 x86_64 randconfig-a012-20210301 x86_64 randconfig-a011-20210301 i386 randconfig-a016-20210301 i386 randconfig-a012-20210301 i386 randconfig-a014-20210301 i386 randconfig-a013-20210301 i386 randconfig-a011-20210301 i386
randconfig-a015-20210301 riscv nommu_k210_defconfig riscv allyesconfig riscv nommu_virt_defconfig riscv allnoconfig riscv defconfig riscv rv32_defconfig riscv allmodconfig x86_64 allyesconfig x86_64 rhel-7.6-kselftests x86_64 defconfig x86_64 rhel-8.3 x86_64 rhel-8.3-kbuiltin x86_64 kexec clang tested configs: x86_64 randconfig-a006-20210301 x86_64 randconfig-a001-20210301 x86_64 randconfig-a004-20210301 x86_64 randconfig-a002-20210301 x86_64 randconfig-a005-20210301 x86_64 randconfig-a003-20210301 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
Re: [PATCH 0/4] ASoC: rt*: Constify static structs
On Wed, 24 Feb 2021 22:19:14 +0100, Rikard Falkeborn wrote: > Constify a number of static structs that are never modified in RealTek > codecs. The most important patches are the first two, which constify > snd_soc_dai_ops and sdw_slave_ops, both of which contain function pointers. > The other two patches are for good measure, since I was already touching > the code there. > > When doing this, I discovered sound/soc/codecs/rt1016.c is not in a > Makefile, so there is not really any way to build it (I added it locally to > the Makefile to compile-test my changes). Is this expected or an oversight? > > [...] Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next Thanks! [1/4] ASoC: rt*: Constify static struct sdw_slave_ops commit: 3ebb1b951880d3152547ac4018bfcce0fd7810bd [2/4] ASoC: rt*: Constify static struct snd_soc_dai_ops commit: 84732dd4ff3ad28cc65eedfa3061fe3808e8469b [3/4] ASoC: rt*: Constify static struct acpi_device_id commit: c85ca92c716bd04981ebcd2c67cd03f96748859e [4/4] ASoc: rt5631: Constify static struct coeff_clk_div commit: 39f9eb61307061eed197eae651ef56cb3544f9b2 Thanks, Mark
Re: [PATCH] ASoC: constify of_phandle_args in snd_soc_get_dai_name()
On Sun, 21 Feb 2021 16:30:24 +0100, Krzysztof Kozlowski wrote: > The pointer to of_phandle_args passed to snd_soc_get_dai_name() and > of_xlate_dai_name() implementations is not modified. Since it is being > used only to translate passed OF node to a DAI name, it should not be > modified, so mark it as const for correctness and safer code. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next Thanks! [1/1] ASoC: constify of_phandle_args in snd_soc_get_dai_name() commit: 54928c5c63c83afd5a1c2a91802a9c37e9a4ff88 Thanks, Mark
Re: [PATCH] ASoC: fsl_sai: Add pm qos cpu latency support
On Mon, 22 Feb 2021 16:40:20 +0800, Shengjiu Wang wrote: > On SoCs such as i.MX7ULP, cpuidle has some levels which > may disable system/bus clocks, so need to add pm_qos to > prevent cpuidle from entering low level idles and make sure > system/bus clocks are enabled when sai is active. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next Thanks! [1/1] ASoC: fsl_sai: Add pm qos cpu latency support commit: 6d85d770c171972c0f33f74b84bf0fedc111e89f Thanks, Mark
Re: [PATCH 0/9] ASoC: fsl: remove cppcheck warnings
On Fri, 19 Feb 2021 17:29:28 -0600, Pierre-Louis Bossart wrote: > Nothing critical and no functional changes. > > The only change that needs attention is the 'fsl_ssi: remove > unnecessary tests' patch, where variables are set to zero, then tested to > set register fields. Either the tests are indeed redundant or the > entire programming sequence is incorrect. > > [...] Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next Thanks! [1/9] ASoC: fsl: fsl_asrc: remove useless assignment commit: ca289c2c70c131dc2d4a37e5f6f5c71acfc7cb8b [2/9] ASoC: fsl: fsl_dma: remove unused variable commit: faff74679f510b9e469238b8ff610eb2b8ad5602 [3/9] ASoC: fsl: fsl_easrc: remove useless assignments commit: e80382fe721f71100cd49e209fbac260042a0106 [4/9] ASoC: fsl: fsl_esai: clarify expression commit: e7347520a4323fafea1df84abb29ae979c595931 [5/9] ASoC: fsl: fsl_ssi: remove unnecessary tests commit: e06a8f1a7c4ceb9f3f804bbe5e2fd25230bc91b1 [6/9] ASoC: fsl: imx-hdmi: remove unused structure members commit: 40e2c4450a34429b6343a7c8f80b4c6715bbd393 [7/9] ASoC: fsl: mpc5200: signed parameter in snprintf format commit: 5a6d43108095c2bb94947ccf3f53a7e71ae5774e [8/9] ASoC: fsl: mpc8610: remove useless assignment commit: 3fb0dcec3e60466afd6a3d770c06a8a879160f68 [9/9] ASoC: fsl: p1022_ds: remove useless assignment commit: bafe21c9d01b3f39d26ff6271905c5c9ef00dc44 Thanks, Mark
[tip:perf/urgent] BUILD SUCCESS a8abc881981762631a22568d5e4b2c0ce4aeb15c
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/urgent
branch HEAD: a8abc881981762631a22568d5e4b2c0ce4aeb15c perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR

elapsed time: 721m

configs tested: 95
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm defconfig
arm64 defconfig
arm64 allyesconfig
arm allyesconfig
arm allmodconfig
arm moxart_defconfig
m68k q40_defconfig
powerpc katmai_defconfig
alpha defconfig
ia64 alldefconfig
powerpc makalu_defconfig
powerpc chrp32_defconfig
i386 allyesconfig
mips jmr3927_defconfig
arc nsim_700_defconfig
arm nhk8815_defconfig
arm zeus_defconfig
mips cu1830-neo_defconfig
sh rsk7269_defconfig
mips mpc30x_defconfig
arm versatile_defconfig
sparc defconfig
sparc64 defconfig
sh apsh4ad0a_defconfig
powerpc canyonlands_defconfig
sh sh7710voipgw_defconfig
mips decstation_r4k_defconfig
ia64 allmodconfig
ia64 defconfig
ia64 allyesconfig
m68k allmodconfig
m68k defconfig
m68k allyesconfig
nios2 defconfig
arc allyesconfig
nds32 allnoconfig
c6x allyesconfig
nds32 defconfig
nios2 allyesconfig
csky defconfig
alpha allyesconfig
xtensa allyesconfig
h8300 allyesconfig
arc defconfig
sh allmodconfig
parisc defconfig
s390 allyesconfig
s390 allmodconfig
parisc allyesconfig
s390 defconfig
sparc allyesconfig
i386 tinyconfig
i386 defconfig
mips allyesconfig
mips allmodconfig
powerpc allyesconfig
powerpc allmodconfig
powerpc allnoconfig
i386 randconfig-a006-20210228
i386 randconfig-a005-20210228
i386 randconfig-a004-20210228
i386 randconfig-a003-20210228
i386 randconfig-a001-20210228
i386 randconfig-a002-20210228
x86_64 randconfig-a013-20210301
x86_64 randconfig-a016-20210301
x86_64 randconfig-a015-20210301
x86_64 randconfig-a014-20210301
x86_64 randconfig-a012-20210301
x86_64 randconfig-a011-20210301
i386 randconfig-a016-20210301
i386 randconfig-a012-20210301
i386 randconfig-a014-20210301
i386 randconfig-a013-20210301
i386 randconfig-a011-20210301
i386 randconfig-a015-20210301
riscv nommu_k210_defconfig
riscv allyesconfig
riscv nommu_virt_defconfig
riscv allnoconfig
riscv defconfig
riscv rv32_defconfig
riscv allmodconfig
x86_64 allyesconfig
x86_64 rhel-7.6-kselftests
x86_64 defconfig
x86_64 rhel-8.3
x86_64 rhel-8.3-kbuiltin
x86_64 kexec

clang tested configs:
x86_64 randconfig-a006-20210301
x86_64 randconfig-a001-20210301
x86_64 randconfig-a004-20210301
x86_64 randconfig-a002-20210301
x86_64 randconfig-a005-20210301
x86_64 randconfig-a003-20210301

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
Re: [PATCH][next] ASoC: codecs: lpass-rx-macro: remove redundant initialization of variable hph_pwr_mode
On Mon, 15 Feb 2021 20:05:01 +, Colin King wrote: > The variable hph_pwr_mode is being initialized with a value that is > never read and it is being updated later with a new value. The > initialization is redundant and can be removed. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next Thanks! [1/1] ASoC: codecs: lpass-rx-macro: remove redundant initialization of variable hph_pwr_mode commit: 7f7d1c4fce10ca68e87165898e6232353e4be1af All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark
[tip:sched/urgent] BUILD SUCCESS fba111913e51a934eaad85734254eab801343836
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/urgent
branch HEAD: fba111913e51a934eaad85734254eab801343836 sched/membarrier: fix missing local execution of ipi_sync_rq_state()

elapsed time: 720m

configs tested: 95
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm defconfig
arm64 defconfig
arm64 allyesconfig
arm allyesconfig
arm allmodconfig
arm moxart_defconfig
m68k q40_defconfig
powerpc katmai_defconfig
alpha defconfig
ia64 alldefconfig
powerpc makalu_defconfig
powerpc chrp32_defconfig
i386 allyesconfig
mips jmr3927_defconfig
arc nsim_700_defconfig
arm nhk8815_defconfig
arm zeus_defconfig
mips cu1830-neo_defconfig
sh rsk7269_defconfig
mips mpc30x_defconfig
arm versatile_defconfig
sparc defconfig
sparc64 defconfig
sh apsh4ad0a_defconfig
powerpc canyonlands_defconfig
sh sh7710voipgw_defconfig
mips decstation_r4k_defconfig
ia64 allmodconfig
ia64 defconfig
ia64 allyesconfig
m68k allmodconfig
m68k defconfig
m68k allyesconfig
nios2 defconfig
arc allyesconfig
nds32 allnoconfig
c6x allyesconfig
nds32 defconfig
nios2 allyesconfig
csky defconfig
alpha allyesconfig
xtensa allyesconfig
h8300 allyesconfig
arc defconfig
sh allmodconfig
parisc defconfig
s390 allyesconfig
s390 allmodconfig
parisc allyesconfig
s390 defconfig
sparc allyesconfig
i386 tinyconfig
i386 defconfig
mips allyesconfig
mips allmodconfig
powerpc allyesconfig
powerpc allmodconfig
powerpc allnoconfig
i386 randconfig-a006-20210228
i386 randconfig-a005-20210228
i386 randconfig-a004-20210228
i386 randconfig-a003-20210228
i386 randconfig-a001-20210228
i386 randconfig-a002-20210228
x86_64 randconfig-a013-20210301
x86_64 randconfig-a016-20210301
x86_64 randconfig-a015-20210301
x86_64 randconfig-a014-20210301
x86_64 randconfig-a012-20210301
x86_64 randconfig-a011-20210301
i386 randconfig-a016-20210301
i386 randconfig-a012-20210301
i386 randconfig-a014-20210301
i386 randconfig-a013-20210301
i386 randconfig-a011-20210301
i386 randconfig-a015-20210301
riscv nommu_k210_defconfig
riscv allyesconfig
riscv nommu_virt_defconfig
riscv allnoconfig
riscv defconfig
riscv rv32_defconfig
riscv allmodconfig
x86_64 allyesconfig
x86_64 rhel-7.6-kselftests
x86_64 defconfig
x86_64 rhel-8.3
x86_64 rhel-8.3-kbuiltin
x86_64 kexec

clang tested configs:
x86_64 randconfig-a006-20210301
x86_64 randconfig-a001-20210301
x86_64 randconfig-a004-20210301
x86_64 randconfig-a002-20210301
x86_64 randconfig-a005-20210301
x86_64 randconfig-a003-20210301

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
[PATCH] [v3] Input: Add "Share" button to Microsoft Xbox One controller.
Add "Share" button input capability and input event mapping for Microsoft Xbox One controller. Fixed Microsoft Xbox One controller share button not working under USB connection. Signed-off-by: Chris Ye --- drivers/input/joystick/xpad.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/input/joystick/xpad.c b/drivers/input/joystick/xpad.c index 9f0d07dcbf06..b51c0e381cc9 100644 --- a/drivers/input/joystick/xpad.c +++ b/drivers/input/joystick/xpad.c @@ -79,6 +79,7 @@ #define MAP_DPAD_TO_BUTTONS(1 << 0) #define MAP_TRIGGERS_TO_BUTTONS(1 << 1) #define MAP_STICKS_TO_NULL (1 << 2) +#define MAP_SHARE_BUTTON (1 << 3) #define DANCEPAD_MAP_CONFIG(MAP_DPAD_TO_BUTTONS | \ MAP_TRIGGERS_TO_BUTTONS | MAP_STICKS_TO_NULL) @@ -130,6 +131,7 @@ static const struct xpad_device { { 0x045e, 0x02e3, "Microsoft X-Box One Elite pad", 0, XTYPE_XBOXONE }, { 0x045e, 0x02ea, "Microsoft X-Box One S pad", 0, XTYPE_XBOXONE }, { 0x045e, 0x0719, "Xbox 360 Wireless Receiver", MAP_DPAD_TO_BUTTONS, XTYPE_XBOX360W }, + { 0x045e, 0x0b12, "Microsoft Xbox One X pad", MAP_SHARE_BUTTON, XTYPE_XBOXONE }, { 0x046d, 0xc21d, "Logitech Gamepad F310", 0, XTYPE_XBOX360 }, { 0x046d, 0xc21e, "Logitech Gamepad F510", 0, XTYPE_XBOX360 }, { 0x046d, 0xc21f, "Logitech Gamepad F710", 0, XTYPE_XBOX360 }, @@ -862,6 +864,8 @@ static void xpadone_process_packet(struct usb_xpad *xpad, u16 cmd, unsigned char /* menu/view buttons */ input_report_key(dev, BTN_START, data[4] & 0x04); input_report_key(dev, BTN_SELECT, data[4] & 0x08); + if (xpad->mapping & MAP_SHARE_BUTTON) + input_report_key(dev, KEY_RECORD, data[22] & 0x01); /* buttons A,B,X,Y */ input_report_key(dev, BTN_A,data[4] & 0x10); @@ -1669,9 +1673,11 @@ static int xpad_init_input(struct usb_xpad *xpad) /* set up model-specific ones */ if (xpad->xtype == XTYPE_XBOX360 || xpad->xtype == XTYPE_XBOX360W || - xpad->xtype == XTYPE_XBOXONE) { + xpad->xtype == XTYPE_XBOXONE) { for (i = 0; xpad360_btn[i] >= 0; i++) input_set_capability(input_dev, 
EV_KEY, xpad360_btn[i]); + if (xpad->mapping & MAP_SHARE_BUTTON) + input_set_capability(input_dev, EV_KEY, KEY_RECORD); } else { for (i = 0; xpad_btn[i] >= 0; i++) input_set_capability(input_dev, EV_KEY, xpad_btn[i]); -- 2.30.1.766.gb4fecdf3b7-goog
Re: [PATCH net] net: l2tp: reduce log level when passing up invalid packets
On 2/23/21 10:47 AM, Tom Parkin wrote: On Mon, Feb 22, 2021 at 14:31:38 -0800, Jakub Kicinski wrote: On Mon, 22 Feb 2021 17:40:16 +0100 Matthias Schiffer wrote: This will not be sufficient for my usecase: To stay compatible with older versions of fastd, I can't set the T flag in the first packet of the handshake, as it won't be known whether the peer has a new enough fastd version to understand packets that have this bit set. Luckily, the second handshake byte is always 0 in fastd's protocol, so these packets fail the tunnel version check and are passed to userspace regardless. I'm aware that this usecase is far outside of the original intentions of the code and can only be described as a hack, but I still consider this a regression in the kernel, as it was working fine in the past, without visible warnings. I'm sorry, but for the reasons stated above I disagree about it being a regression. Hmm, is it common for protocol implementations in the kernel to warn about invalid packets they receive? While L2TP uses connected sockets and thus usually no unrelated packets end up in the socket, a simple UDP port scan originating from the configured remote address/port will trigger the "short packet" warning now (nmap uses a zero-length payload for UDP scans by default). Log spam caused by a malicious party might also be a concern. Indeed, seems like appropriate counters would be a good fit here? The prints are both potentially problematic for security and lossy. Yes, I agree with this argument. Sounds good, I'll send an updated patch adding a counter for invalid packets. By now I've found another project affected by the kernel warnings: https://github.com/wlanslovenija/tunneldigger/issues/160
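For illustration only, here is a minimal userspace sketch of the "count instead of print" idea discussed above. It is not the l2tp code and not the promised patch: the struct, the function name, the minimum header length and the version value are all assumptions made for this sketch. The only detail taken from the thread is that the version check looks at the second header byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-tunnel statistics; field names are illustrative only. */
struct tunnel_stats {
	uint64_t rx_packets;	/* packets that passed the checks */
	uint64_t rx_invalid;	/* too short, or wrong version */
};

#define HDR_MIN_LEN	6	/* assumed minimum header length for this sketch */
#define HDR_VER_MASK	0x0f	/* assumed: version in the low nibble of byte 1 */
#define HDR_VERSION	2	/* assumed version value for this sketch */

/*
 * Classify one received datagram. Returns 1 if the header looks valid and
 * 0 if the packet should be passed up to userspace untouched. Invalid
 * packets bump a counter instead of emitting a log line, so a port scan
 * cannot flood the log.
 */
int tunnel_rx_check(struct tunnel_stats *st, const uint8_t *data, size_t len)
{
	assert(st != NULL);

	if (len < HDR_MIN_LEN || data == NULL) {
		st->rx_invalid++;
		return 0;
	}
	if ((data[1] & HDR_VER_MASK) != HDR_VERSION) {
		st->rx_invalid++;
		return 0;
	}
	st->rx_packets++;
	return 1;
}
```

A counter of this kind never loses events to rate limiting and can be read on demand, which is the property the thread asks for; how such a counter would be exported is left open here.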
[RFC PATCH v4 3/3] scheduler: Add cluster scheduler level for x86
From: Tim Chen There are x86 CPU architectures (e.g. Jacobsville) where L2 cache is shared among a cluster of cores instead of being exclusive to one single core. To prevent oversubscription of L2 cache, load should be balanced between such L2 clusters, especially for tasks with no shared data. Also with cluster scheduling policy where tasks are woken up in the same L2 cluster, we will benefit from keeping tasks related to each other and likely sharing data in the same L2 cluster. Add CPU masks of CPUs sharing the L2 cache so we can build such L2 cluster scheduler domain. Signed-off-by: Tim Chen Signed-off-by: Barry Song --- arch/x86/Kconfig| 8 arch/x86/include/asm/smp.h | 7 +++ arch/x86/include/asm/topology.h | 1 + arch/x86/kernel/cpu/cacheinfo.c | 1 + arch/x86/kernel/cpu/common.c| 3 +++ arch/x86/kernel/smpboot.c | 43 - 6 files changed, 62 insertions(+), 1 deletion(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index d3338a8..40110de 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1009,6 +1009,14 @@ config NR_CPUS This is purely to save memory: each supported CPU adds about 8KB to the kernel image. +config SCHED_CLUSTER + bool "Cluster scheduler support" + default n + help +Cluster scheduler support improves the CPU scheduler's decision +making when dealing with machines that have clusters of CPUs +sharing L2 cache. If unsure say N here.
+ config SCHED_SMT def_bool y if SMP diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h index c0538f8..9cbc4ae 100644 --- a/arch/x86/include/asm/smp.h +++ b/arch/x86/include/asm/smp.h @@ -16,7 +16,9 @@ DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_die_map); /* cpus sharing the last level cache: */ DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map); +DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_l2c_shared_map); DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id); +DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_l2c_id); DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number); static inline struct cpumask *cpu_llc_shared_mask(int cpu) @@ -24,6 +26,11 @@ static inline struct cpumask *cpu_llc_shared_mask(int cpu) return per_cpu(cpu_llc_shared_map, cpu); } +static inline struct cpumask *cpu_l2c_shared_mask(int cpu) +{ + return per_cpu(cpu_l2c_shared_map, cpu); +} + DECLARE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid); DECLARE_EARLY_PER_CPU_READ_MOSTLY(u32, x86_cpu_to_acpiid); DECLARE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_bios_cpu_apicid); diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h index 9239399..2a11ccc 100644 --- a/arch/x86/include/asm/topology.h +++ b/arch/x86/include/asm/topology.h @@ -103,6 +103,7 @@ static inline void setup_node_to_cpumask_map(void) { } #include extern const struct cpumask *cpu_coregroup_mask(int cpu); +extern const struct cpumask *cpu_clustergroup_mask(int cpu); #define topology_logical_package_id(cpu) (cpu_data(cpu).logical_proc_id) #define topology_physical_package_id(cpu) (cpu_data(cpu).phys_proc_id) diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c index 3ca9be4..0d03a71 100644 --- a/arch/x86/kernel/cpu/cacheinfo.c +++ b/arch/x86/kernel/cpu/cacheinfo.c @@ -846,6 +846,7 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c) l2 = new_l2; #ifdef CONFIG_SMP per_cpu(cpu_llc_id, cpu) = l2_id; + per_cpu(cpu_l2c_id, cpu) = l2_id; #endif } diff --git a/arch/x86/kernel/cpu/common.c 
b/arch/x86/kernel/cpu/common.c index 35ad848..fb08c73 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -78,6 +78,9 @@ /* Last level cache ID of each logical CPU */ DEFINE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id) = BAD_APICID; +/* L2 cache ID of each logical CPU */ +DEFINE_PER_CPU_READ_MOSTLY(u16, cpu_l2c_id) = BAD_APICID; + /* correctly size the local cpu masks */ void __init setup_cpu_local_masks(void) { diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 02813a7..c85ffa8 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -101,6 +101,8 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map); +DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_l2c_shared_map); + /* Per CPU bogomips and other parameters */ DEFINE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info); EXPORT_PER_CPU_SYMBOL(cpu_info); @@ -501,6 +503,21 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o) return topology_sane(c, o, "llc"); } +static bool match_l2c(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o) +{ + int cpu1 = c->cpu_index, cpu2 = o->cpu_index; + + /* Do not match if we do not have a valid APICID for cpu: */ + if (per_cpu(cpu_l2c_id, cpu1) == BAD_APICID) + return false; + + /* Do not match if
RE: [PATCH v3 6/8] mm: Selftests for exclusive device memory
> From: Alistair Popple > Sent: Thursday, February 25, 2021 11:19 PM > To: linux...@kvack.org; nouv...@lists.freedesktop.org; > bske...@redhat.com; a...@linux-foundation.org > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; dri- > de...@lists.freedesktop.org; John Hubbard ; Ralph > Campbell ; jgli...@redhat.com; Jason Gunthorpe > ; h...@infradead.org; dan...@ffwll.ch; Alistair Popple > > Subject: [PATCH v3 6/8] mm: Selftests for exclusive device memory > > Adds some selftests for exclusive device memory. > > Signed-off-by: Alistair Popple One minor nit below, but you can add Tested-by: Ralph Campbell Reviewed-by: Ralph Campbell > +static int dmirror_exclusive(struct dmirror *dmirror, > + struct hmm_dmirror_cmd *cmd) > +{ > + unsigned long start, end, addr; > + unsigned long size = cmd->npages << PAGE_SHIFT; > + struct mm_struct *mm = dmirror->notifier.mm; > + struct page *pages[64]; > + struct dmirror_bounce bounce; > + unsigned long next; > + int ret; > + > + start = cmd->addr; > + end = start + size; > + if (end < start) > + return -EINVAL; > + > + /* Since the mm is for the mirrored process, get a reference first. */ > + if (!mmget_not_zero(mm)) > + return -EINVAL; > + > + mmap_read_lock(mm); > + for (addr = start; addr < end; addr = next) { > + int i, mapped; > + > + if (end < addr + (64 << PAGE_SHIFT)) > + next = end; > + else > + next = addr + (64 << PAGE_SHIFT); I suggest using ARRAY_SIZE(pages) instead of '64' to make the meaning clear.
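As a sketch of the nit above, the chunked walk can take its chunk size from the array declaration instead of repeating the literal 64. This is a userspace illustration only: the function name is made up, and ARRAY_SIZE and PAGE_SHIFT are defined locally as stand-ins for the kernel's own definitions (a 4 KiB page size is assumed).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; the kernel provides its own ARRAY_SIZE and PAGE_SHIFT. */
#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
#define PAGE_SHIFT	12	/* assumed 4 KiB pages for this sketch */

/*
 * Walk [start, end) in chunks sized by the pages[] array and return the
 * number of chunks visited. The chunk size follows the array declaration,
 * so changing the array length cannot leave a stale constant behind.
 */
unsigned long count_chunks(unsigned long start, unsigned long end)
{
	void *pages[64];
	const unsigned long chunk = ARRAY_SIZE(pages) << PAGE_SHIFT;
	unsigned long addr, next, chunks = 0;

	(void)pages;	/* only the array's size matters in this sketch */
	assert(start <= end);

	for (addr = start; addr < end; addr = next) {
		if (end < addr + chunk)
			next = end;
		else
			next = addr + chunk;
		chunks++;
	}
	return chunks;
}
```

The loop body mirrors the quoted hunk; only the source of the constant changes.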
[RFC PATCH v4 2/3] scheduler: add scheduler level for clusters
ARM64 chip Kunpeng 920 has 6 or 8 clusters in each NUMA node, and each cluster has 4 cpus. All clusters share L3 cache data, but each cluster has local L3 tag. On the other hand, each cluster shares some internal system bus. This means cache coherence overhead inside one cluster is much less than the overhead across clusters.

This patch adds the sched_domain for clusters. On Kunpeng 920, without this patch, domain0 of cpu0 would be MC with cpu0~cpu23; with this patch, MC becomes domain1, and a new domain0 "CLS" includes cpu0-cpu3.

This will help spread unrelated tasks among clusters, thus decreasing contention and improving throughput. For example, the stream benchmark improves by around 4.3%~6.3% with this patch:

w/o patch:
$ numactl -N 0 /usr/lib/lmbench/bin/stream -P 12 -M 1024M -N 5
STREAM copy latency: 3.36 nanoseconds
STREAM copy bandwidth: 57072.50 MB/sec
STREAM scale latency: 3.40 nanoseconds
STREAM scale bandwidth: 56542.52 MB/sec
STREAM add latency: 5.10 nanoseconds
STREAM add bandwidth: 56482.83 MB/sec
STREAM triad latency: 5.14 nanoseconds
STREAM triad bandwidth: 56069.52 MB/sec

w/ patch:
$ numactl -N 0 /usr/lib/lmbench/bin/stream -P 12 -M 1024M -N 5
STREAM copy latency: 3.22 nanoseconds
STREAM copy bandwidth: 59660.96 MB/sec -> +4.5%
STREAM scale latency: 3.25 nanoseconds
STREAM scale bandwidth: 59002.29 MB/sec -> +4.3%
STREAM add latency: 4.80 nanoseconds
STREAM add bandwidth: 60036.62 MB/sec -> +6.3%
STREAM triad latency: 4.86 nanoseconds
STREAM triad bandwidth: 59228.30 MB/sec -> +5.6%

On the other hand, while doing WAKE_AFFINE, this patch will try to find a core in the target cluster before scanning the whole llc domain. So it helps gather related tasks within one cluster.
We ran the hackbench command below with the -g parameter ranging from 2 to 14; for each value of g, we ran the command 10 times and took the average time:

$ numactl -N 0 hackbench -p -T -l 2 -g $1

hackbench reports the time needed to complete a certain number of message transmissions between a certain number of tasks, for example:

$ numactl -N 0 hackbench -p -T -l 2 -g 10
Running in threaded mode with 10 groups using 40 file descriptors each (== 400 tasks)
Each sender will pass 2 messages of 100 bytes
Time: 8.874

The result of hackbench w/ and w/o the patch is below:

g=     2       4       6       8       10      12       14
w/o:   1.9596  4.0506  5.9654  8.0068  9.8147  11.4900  13.1163
w/ :   1.9362  3.9197  5.6570  7.1376  8.5263  10.0512  11.3256
               +3.3%   +5.2%   +10.9%  +13.2%  +12.8%   +13.7%

Signed-off-by: Barry Song
---
-v4:
* rebased to tip/sched/core with the latest unified code of select_idle_cpu
* also added benchmark data of spreading unrelated tasks
* avoided the iteration of sched_domain by moving to static_key (addressing Vincent's comment)

arch/arm64/Kconfig | 7 + include/linux/sched/cluster.h | 19 include/linux/sched/sd_flags.h | 9 ++ include/linux/sched/topology.h | 7 + include/linux/topology.h | 7 + kernel/sched/core.c| 18 kernel/sched/fair.c| 66 +- kernel/sched/sched.h | 1 + kernel/sched/topology.c| 6 9 files changed, 126 insertions(+), 14 deletions(-) create mode 100644 include/linux/sched/cluster.h diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index f39568b..158b0fa 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -971,6 +971,13 @@ config SCHED_MC making when dealing with multi-core CPU chips at a cost of slightly increased overhead in some places. If unsure say N here. +config SCHED_CLUSTER + bool "Cluster scheduler support" + help + Cluster scheduler support improves the CPU scheduler's decision + making when dealing with machines that have clusters(sharing internal + bus or sharing LLC cache tag). If unsure say N here.
+ config SCHED_SMT bool "SMT scheduler support" help diff --git a/include/linux/sched/cluster.h b/include/linux/sched/cluster.h new file mode 100644 index 000..ea6c475 --- /dev/null +++ b/include/linux/sched/cluster.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_SCHED_CLUSTER_H +#define _LINUX_SCHED_CLUSTER_H + +#include + +#ifdef CONFIG_SCHED_CLUSTER +extern struct static_key_false sched_cluster_present; + +static __always_inline bool sched_cluster_active(void) +{ + return static_branch_likely(&sched_cluster_present); +} +#else +static inline bool sched_cluster_active(void) { return false; } + +#endif + +#endif diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h index 34b21e9..fc3c894 100644 --- a/include/linux/sched/sd_flags.h +++ b/include/linux/sched/sd_flags.h @@ -100,6 +100,15 @@ SD_FLAG(SD_SHARE_CPUCAPACITY, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS) /* + * Domain members share CPU cluster resources (i.e. llc cache tags) + *
[RFC PATCH v4 0/3] scheduler: expose the topology of clusters and add cluster scheduler
ARM64 server chip Kunpeng 920 has 6 or 8 clusters in each NUMA node, and each cluster has 4 cpus. All clusters share L3 cache data while each cluster has local L3 tag. On the other hand, each cluster will share some internal system bus. This means cache is much more affine inside one cluster than across clusters.

[ASCII diagram: clusters of four CPUs (CPU0-CPU3), each cluster with its own L3 tag, all clusters sharing the L3 data]

There is a similar need for clustering in x86. Some x86 cores could share L2 caches, which is similar to the cluster in Kunpeng 920 (e.g. on Jacobsville there are 6 clusters of 4 Atom cores, each cluster sharing a separate L2, and 24 cores sharing L3).

Having a sched_domain for clusters will bring two aspects of improvement:

1. spreading unrelated tasks among clusters, which decreases the contention of resources and improves the throughput. Unrelated tasks might be put randomly without cluster sched_domain:

[ASCII diagram: task 1 and task 2 both in cluster1, none in cluster2]

but with cluster sched_domain, they are likely to spread due to LB:

[ASCII diagram, truncated: one task in each cluster]
[RFC PATCH v4 1/3] topology: Represent clusters of CPUs within a die.
From: Jonathan Cameron

Both ACPI and DT provide the ability to describe additional layers of topology between that of individual cores and higher level constructs such as the level at which the last level cache is shared. In ACPI this can be represented in PPTT as a Processor Hierarchy Node Structure [1] that is the parent of the CPU cores and in turn has a parent Processor Hierarchy Node Structure representing a higher level of topology.

For example Kunpeng 920 has 6 or 8 clusters in each NUMA node, and each cluster has 4 cpus. All clusters share L3 cache data, but each cluster has local L3 tag. On the other hand, each cluster shares some internal system bus.

[ASCII diagram, truncated: clusters of four CPUs (CPU0-CPU3), each cluster with its own L3 tag, all clusters sharing the L3 data]
Re: [PATCH v6 08/12] fork: Clear PASID for new mm
Hi Fenghua,

On Thu, 25 Feb 2021 22:17:11 +, Fenghua Yu wrote:

> Hi, Jean,
>
> On Wed, Feb 24, 2021 at 11:19:27AM +0100, Jean-Philippe Brucker wrote:
> > Hi Fenghua,
> >
> > [Trimmed the Cc list]
> >
> > On Mon, Jul 13, 2020 at 04:48:03PM -0700, Fenghua Yu wrote:
> > > When a new mm is created, its PASID should be cleared, i.e. the PASID
> > > is initialized to its init state 0 on both ARM and X86.
> >
> > I just noticed this patch was dropped in v7, and am wondering whether we
> > could still upstream it. Does x86 need a child with a new address space
> > (!CLONE_VM) to inherit the PASID of the parent? That doesn't make much
> > sense with regard to IOMMU structures - same PASID indexing multiple
> > PGDs?
>
> You are right: x86 should clear mm->pasid when a new mm is created.
> This patch somehow got lost :(
>
> >
> > Currently iommu_sva_alloc_pasid() assumes mm->pasid is always
> > initialized to 0 and fails on forked tasks. I'm trying to figure out
> > how to fix this. Could we clear the pasid on fork or does it break the
> > x86 model?
>
> x86 calls ioasid_alloc() instead of iommu_sva_alloc_pasid(). So

We should consolidate at some point; there is no need to store pasid in two places.

> functionality is not a problem without this patch on x86. But I think

I feel the reason that x86 doesn't care is that mm->pasid is not used unless bind_mm is called. For forked children, even if mm->pasid is non-zero, it has no effect since it is not loaded onto MSRs. Perhaps you could also add a check or WARN_ON(!mm->pasid) in load_pasid()?

> we do need to have this patch in the kernel because PASID is per addr
> space and two addr spaces shouldn't have the same PASID.
>

Agreed.

> Who will accept this patch?
>
> Thanks.
>
> -Fenghua

Thanks,

Jacob
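To make the point about fork concrete, here is a small userspace sketch of "clear the PASID when a new address space is created". It does not use the kernel's mm_struct or any real kernel helper: struct mm_sketch and both functions are hypothetical names introduced for this illustration, and 0 is taken as the "no PASID" state as stated in the quoted patch description.

```c
#include <assert.h>
#include <stddef.h>

#define INIT_PASID	0	/* "no PASID assigned", per the thread */

/* Hypothetical, heavily reduced stand-in for an address-space descriptor. */
struct mm_sketch {
	unsigned int pasid;
	unsigned long pgd;	/* stand-in for the page-table root */
};

/* Reset the PASID of a freshly created address space. */
void mm_sketch_init_pasid(struct mm_sketch *mm)
{
	mm->pasid = INIT_PASID;
}

/*
 * Create a child address space from a parent. The page-table root differs,
 * and the PASID is cleared rather than inherited, so two address spaces
 * never end up with the same PASID.
 */
void mm_sketch_dup(struct mm_sketch *child, const struct mm_sketch *parent,
		   unsigned long child_pgd)
{
	assert(child != NULL && parent != NULL);

	*child = *parent;		/* copy everything else */
	child->pgd = child_pgd;		/* new address space, new root */
	mm_sketch_init_pasid(child);	/* do not inherit the parent's PASID */
}
```

With the reset in place, an allocator that expects a zero PASID on a new address space, as described for iommu_sva_alloc_pasid() above, would see the state it assumes.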
[PATCH] usb: serial: io_edgeport: fix memory leak in edge_startup
syzbot found a memory leak in edge_startup(). The problem was that when usb_submit_urb() returned an error, nothing was cleaned up. Reported-by: syzbot+59f777bdcbdd7eea5...@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin --- drivers/usb/serial/io_edgeport.c | 26 -- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/drivers/usb/serial/io_edgeport.c b/drivers/usb/serial/io_edgeport.c index a493670c06e6..68401adcffde 100644 --- a/drivers/usb/serial/io_edgeport.c +++ b/drivers/usb/serial/io_edgeport.c @@ -3003,26 +3003,32 @@ static int edge_startup(struct usb_serial *serial) response = -ENODEV; } - usb_free_urb(edge_serial->interrupt_read_urb); - kfree(edge_serial->interrupt_in_buffer); - - usb_free_urb(edge_serial->read_urb); - kfree(edge_serial->bulk_in_buffer); - - kfree(edge_serial); - - return response; + goto error; } /* start interrupt read for this edgeport this interrupt will * continue as long as the edgeport is connected */ response = usb_submit_urb(edge_serial->interrupt_read_urb, GFP_KERNEL); - if (response) + if (response) { dev_err(ddev, "%s - Error %d submitting control urb\n", __func__, response); + + goto error; + } } return response; + +error: + usb_free_urb(edge_serial->interrupt_read_urb); + kfree(edge_serial->interrupt_in_buffer); + + usb_free_urb(edge_serial->read_urb); + kfree(edge_serial->bulk_in_buffer); + + kfree(edge_serial); + + return response; } -- 2.25.1
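The shape of the fix, a single error label that every failing step jumps to, can be sketched in plain userspace C. Everything below is illustrative: the struct, the function names and the submit_rc parameter (which simulates the return value of the final submit step) are assumptions for this sketch, with malloc/free standing in for the URB and buffer allocations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical device state; the two buffers stand in for URBs and buffers. */
struct dev_state {
	char *interrupt_buf;
	char *bulk_buf;
};

/*
 * Allocate the state, then run a final step that may fail. Every failure
 * after the first allocation jumps to one label that releases everything,
 * so a late error cannot leak the earlier allocations.
 * submit_rc simulates the result of the last step (0 means success).
 */
int dev_startup(struct dev_state **out, int submit_rc)
{
	struct dev_state *st;
	int rc;

	assert(out != NULL);
	*out = NULL;

	st = calloc(1, sizeof(*st));
	if (!st)
		return -1;

	st->interrupt_buf = malloc(64);
	st->bulk_buf = malloc(64);
	if (!st->interrupt_buf || !st->bulk_buf) {
		rc = -1;
		goto error;
	}

	rc = submit_rc;
	if (rc)
		goto error;

	*out = st;
	return 0;

error:
	free(st->interrupt_buf);	/* free(NULL) is a no-op */
	free(st->bulk_buf);
	free(st);
	return rc;
}

/* Release a successfully started device. */
void dev_shutdown(struct dev_state *st)
{
	if (!st)
		return;
	free(st->interrupt_buf);
	free(st->bulk_buf);
	free(st);
}
```

Keeping the cleanup in one place means a new failure path only needs a goto, not another copy of the free sequence.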
RE: [PATCH v3 5/8] mm: Device exclusive memory access
> From: Alistair Popple
> Sent: Thursday, February 25, 2021 11:18 PM
> To: linux...@kvack.org; nouv...@lists.freedesktop.org;
> bske...@redhat.com; a...@linux-foundation.org
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; dri-
> de...@lists.freedesktop.org; John Hubbard ; Ralph
> Campbell ; jgli...@redhat.com; Jason Gunthorpe
> ; h...@infradead.org; dan...@ffwll.ch; Alistair Popple
>
> Subject: [PATCH v3 5/8] mm: Device exclusive memory access
>
> Some devices require exclusive write access to shared virtual memory (SVM)
> ranges to perform atomic operations on that memory. This requires CPU page
> tables to be updated to deny access whilst atomic operations are occurring.
>
> In order to do this introduce a new swap entry type (SWP_DEVICE_EXCLUSIVE).
> When a SVM range needs to be marked for exclusive access by a device all page
> table mappings for the particular range are replaced with device exclusive
> swap entries. This causes any CPU access to the page to result in a fault.
>
> Faults are resolved by replacing the faulting entry with the original
> mapping. This results in MMU notifiers being called which a driver uses to
> update access permissions such as revoking atomic access. After notifiers
> have been called the device will no longer have exclusive access to the
> region.
>
> Signed-off-by: Alistair Popple
> ---
> Documentation/vm/hmm.rst | 15
> include/linux/rmap.h | 3 +
> include/linux/swap.h | 4 +-
> include/linux/swapops.h | 44 ++-
> mm/hmm.c | 5 ++
> mm/memory.c | 108 +-
> mm/mprotect.c| 8 ++
> mm/page_vma_mapped.c | 9 ++-
> mm/rmap.c| 163 +++
> 9 files changed, 352 insertions(+), 7 deletions(-)

...

> +int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
> + unsigned long end, struct page **pages) {
> + long npages = (end - start) >> PAGE_SHIFT;
> + long i;

Nit: you should use unsigned long for 'i' and 'npages' to match start/end.
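A tiny userspace sketch of what the nit asks for: the page count and the loop index use the same unsigned long type as start and end, so nothing is converted between signed and unsigned. The function names are invented for this illustration and PAGE_SHIFT is defined locally under the assumption of 4 KiB pages.

```c
#include <assert.h>

#define PAGE_SHIFT	12	/* assumed 4 KiB pages; the kernel defines its own */

/*
 * Number of whole pages in [start, end). All quantities stay unsigned long,
 * matching the address types, so no signed/unsigned conversion occurs.
 */
unsigned long range_npages(unsigned long start, unsigned long end)
{
	assert(start <= end);
	return (end - start) >> PAGE_SHIFT;
}

/* Visit each page index with an unsigned counter; return how many were seen. */
unsigned long visit_pages(unsigned long start, unsigned long end)
{
	unsigned long npages = range_npages(start, end);
	unsigned long i, seen = 0;

	for (i = 0; i < npages; i++)
		seen++;
	return seen;
}
```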
Re: [PATCH 4.19 055/247] soc: aspeed: snoop: Add clock control logic
On Mon, 1 Mar 2021 at 16:37, Greg Kroah-Hartman wrote: > > From: Jae Hyun Yoo > > [ Upstream commit 3f94cf15583be554df7aaa651b8ff8e1b68fbe51 ] > > If LPC SNOOP driver is registered ahead of lpc-ctrl module, LPC > SNOOP block will be enabled without heart beating of LCLK until > lpc-ctrl enables the LCLK. This issue causes improper handling on > host interrupts when the host sends interrupt in that time frame. > Then kernel eventually forcibly disables the interrupt with > dumping stack and printing a 'nobody cared this irq' message out. > > To prevent this issue, all LPC sub-nodes should enable LCLK > individually so this patch adds clock control logic into the LPC > SNOOP driver. Jae, John; with this backported do we need to also provide a corresponding device tree change for the stable tree, otherwise this driver will no longer probe? > > Fixes: 3772e5da4454 ("drivers/misc: Aspeed LPC snoop output using misc > chardev") > Signed-off-by: Jae Hyun Yoo > Signed-off-by: Vernon Mauery > Signed-off-by: John Wang > Reviewed-by: Joel Stanley > Link: > https://lore.kernel.org/r/20201208091748.1920-1-wangzhiqiang...@bytedance.com > Signed-off-by: Joel Stanley > Signed-off-by: Sasha Levin > --- > drivers/misc/aspeed-lpc-snoop.c | 30 +++--- > 1 file changed, 27 insertions(+), 3 deletions(-) > > diff --git a/drivers/misc/aspeed-lpc-snoop.c b/drivers/misc/aspeed-lpc-snoop.c > index c10be21a1663d..b4a776bf44bc5 100644 > --- a/drivers/misc/aspeed-lpc-snoop.c > +++ b/drivers/misc/aspeed-lpc-snoop.c > @@ -15,6 +15,7 @@ > */ > > #include > +#include > #include > #include > #include > @@ -71,6 +72,7 @@ struct aspeed_lpc_snoop_channel { > struct aspeed_lpc_snoop { > struct regmap *regmap; > int irq; > + struct clk *clk; > struct aspeed_lpc_snoop_channel chan[NUM_SNOOP_CHANNELS]; > }; > > @@ -286,22 +288,42 @@ static int aspeed_lpc_snoop_probe(struct > platform_device *pdev) > return -ENODEV; > } > > + lpc_snoop->clk = devm_clk_get(dev, NULL); > + if (IS_ERR(lpc_snoop->clk)) { > + rc 
= PTR_ERR(lpc_snoop->clk); > + if (rc != -EPROBE_DEFER) > + dev_err(dev, "couldn't get clock\n"); > + return rc; > + } > + rc = clk_prepare_enable(lpc_snoop->clk); > + if (rc) { > + dev_err(dev, "couldn't enable clock\n"); > + return rc; > + } > + > rc = aspeed_lpc_snoop_config_irq(lpc_snoop, pdev); > if (rc) > - return rc; > + goto err; > > rc = aspeed_lpc_enable_snoop(lpc_snoop, dev, 0, port); > if (rc) > - return rc; > + goto err; > > /* Configuration of 2nd snoop channel port is optional */ > if (of_property_read_u32_index(dev->of_node, "snoop-ports", > 1, &port) == 0) { > rc = aspeed_lpc_enable_snoop(lpc_snoop, dev, 1, port); > - if (rc) > + if (rc) { > aspeed_lpc_disable_snoop(lpc_snoop, 0); > + goto err; > + } > } > > + return 0; > + > +err: > + clk_disable_unprepare(lpc_snoop->clk); > + > return rc; > } > > @@ -313,6 +335,8 @@ static int aspeed_lpc_snoop_remove(struct platform_device > *pdev) > aspeed_lpc_disable_snoop(lpc_snoop, 0); > aspeed_lpc_disable_snoop(lpc_snoop, 1); > > + clk_disable_unprepare(lpc_snoop->clk); > + > return 0; > } > > -- > 2.27.0 > > >
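For anyone following the error-path change in the quoted patch, here is a minimal userspace sketch of the same goto-based unwind: once the clock is on, every later failure goes through one label that turns it off again. The `fake_clk` type and the `fake_*` helpers are hypothetical stand-ins for illustration, not the clk framework API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a clock; tracks enable/disable balance. */
struct fake_clk {
	int enable_count;
};

static int fake_clk_enable(struct fake_clk *clk)
{
	clk->enable_count++;
	return 0;
}

static void fake_clk_disable(struct fake_clk *clk)
{
	/* Disabling a clock that was never enabled would be a bug. */
	assert(clk->enable_count > 0);
	clk->enable_count--;
}

/* Hypothetical setup step; fails with -1 when asked to. */
static int fake_step(bool fail)
{
	return fail ? -1 : 0;
}

/*
 * Same shape as the patched probe: after the clock is enabled, any
 * later failure jumps to err so the clock is disabled before returning.
 */
int fake_probe(struct fake_clk *clk, bool fail_irq, bool fail_snoop)
{
	int rc;

	rc = fake_clk_enable(clk);
	if (rc)
		return rc;	/* nothing to unwind yet */

	rc = fake_step(fail_irq);
	if (rc)
		goto err;

	rc = fake_step(fail_snoop);
	if (rc)
		goto err;

	return 0;

err:
	fake_clk_disable(clk);
	return rc;
}
```

With a zero-initialised `struct fake_clk`, a successful `fake_probe()` leaves `enable_count` at 1 and a failing one leaves it at 0, which is the balance the `err:` label in the patch is there to preserve.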
Re: [PATCH 5.10 000/661] 5.10.20-rc2 review
On 3/1/21 11:37 AM, Greg Kroah-Hartman wrote: > This is the start of the stable review cycle for the 5.10.20 release. > There are 661 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Wed, 03 Mar 2021 19:34:53 +. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.20-rc2.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-5.10.y > and the diffstat can be found below. > > thanks, > > greg k-h On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels: Tested-by: Florian Fainelli -- Florian
Re: [PATCHv2 3/4] coresight: etm4x: Add support to exclude kernel mode tracing
Hi, On Mon, Mar 1, 2021 at 11:05 AM Sai Prakash Ranjan wrote: > > On production systems with ETMs enabled, it is preferred to exclude > kernel mode(NS EL1) tracing for security concerns and support only > userspace(NS EL0) tracing. Perf subsystem interface uses the newly > introduced kernel config CONFIG_EXCLUDE_KERNEL_PMU_TRACE to exclude > kernel mode tracing, but there is an additional interface via sysfs > for ETMs which also needs to be handled to exclude kernel > mode tracing. So we use this same generic kernel config to handle > the sysfs mode of tracing. This config is disabled by default and > would not affect the current configuration which has both kernel and > userspace tracing enabled by default. > > Tested-by: Denis Nikitin > Signed-off-by: Sai Prakash Ranjan > --- > drivers/hwtracing/coresight/coresight-etm4x-core.c | 6 +- > drivers/hwtracing/coresight/coresight-etm4x-sysfs.c | 6 ++ > 2 files changed, 11 insertions(+), 1 deletion(-) Not that I'm an expert in the perf subsystem, but the concern I had with v1 is now addressed. FWIW this seems fine to me now. Reviewed-by: Douglas Anderson > --- a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c > +++ b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c > @@ -296,6 +296,12 @@ static ssize_t mode_store(struct device *dev, > if (kstrtoul(buf, 16, &val)) > return -EINVAL; > > + if (IS_ENABLED(CONFIG_EXCLUDE_KERNEL_PMU_TRACE) && (!(val & > ETM_MODE_EXCL_KERN))) { > + dev_warn(dev, > + "Kernel mode tracing is not allowed, check your > kernel config\n"); slight nit that I think your string needs to be indented by 1 space. ;-)
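As a rough illustration of the condition in the quoted hunk, the sketch below models it in plain userspace C. The `DEMO_MODE_EXCL_KERN` bit position and the function name are assumptions made for the example, and `exclude_kernel_cfg` stands in for `IS_ENABLED(CONFIG_EXCLUDE_KERNEL_PMU_TRACE)`. What the real `mode_store()` does after the warning is not visible in the quote, so the sketch only reports whether the warning branch would be taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit position, for illustration only. */
#define DEMO_MODE_EXCL_KERN (1UL << 30)

/*
 * Returns true when the requested mode would hit the warning branch:
 * the config excludes kernel tracing, but the mode written via sysfs
 * does not carry the exclude-kernel bit.
 */
bool demo_mode_triggers_warning(bool exclude_kernel_cfg, unsigned long val)
{
	return exclude_kernel_cfg && !(val & DEMO_MODE_EXCL_KERN);
}

/* Usage example covering the interesting combinations. */
void demo_mode_example(void)
{
	/* Config off: any mode is accepted silently. */
	assert(!demo_mode_triggers_warning(false, 0));
	/* Config on, exclude-kernel bit set: fine. */
	assert(!demo_mode_triggers_warning(true, DEMO_MODE_EXCL_KERN));
	/* Config on, exclude-kernel bit clear: warning branch. */
	assert(demo_mode_triggers_warning(true, 0));
}
```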
Re: [PATCH v1 03/15] powerpc/uaccess: Remove __get/put_user_inatomic()
Christophe Leroy writes: > Since commit 662bbcb2747c ("mm, sched: Allow uaccess in atomic with > pagefault_disable()"), __get/put_user() can be used in atomic parts > of the code, therefore the __get/put_user_inatomic() introduced > by commit e68c825bb016 ("[POWERPC] Add inatomic versions of __get_user > and __put_user") have become useless. I spent some time chasing these macro definitions. Let me see if I understand you. __get_user(x, ptr) becomes __get_user_nocheck(..., true) __get_user_inatomic() become __get_user_nosleep() The difference between how __get_user_nosleep() and __get_user_nocheck(..., true) operate is that __get_user_nocheck calls might_fault() and __get_user_nosleep() does not. If I understand the commit you reference and mm/memory.c, you're saying that we can indeed call might_fault() when page faults are disabled, because __might_fault() checks if page faults are disabled and does not fire a warning if it is called with page faults disabled. Therefore, it is safe to remove our _inatomic version that does not call might_fault and just to call might_fault unconditionally. Is that right? I haven't checked changes you made to the various .c files in fine detail but they appear to be entirely mechanical. > powerpc is the only one having such functions. There is a real > intention not to have to provide such _inatomic() helpers, see the > comment in might_fault() in mm/memory.c introduced by > commit 3ee1afa308f2 ("x86: some lock annotations for user > copy paths, v2"): > > /* >* it would be nicer only to annotate paths which are not under >* pagefault_disable, however that requires a larger audit and >* providing helpers like get_user_atomic. >*/ > I'm not fully sure I understand what you're saying in this part of the commit message. 
Kind regards, Daniel > > Signed-off-by: Christophe Leroy > --- > arch/powerpc/include/asm/uaccess.h| 37 --- > arch/powerpc/kernel/align.c | 32 > .../kernel/hw_breakpoint_constraints.c| 2 +- > arch/powerpc/kernel/traps.c | 2 +- > 4 files changed, 18 insertions(+), 55 deletions(-) > > diff --git a/arch/powerpc/include/asm/uaccess.h > b/arch/powerpc/include/asm/uaccess.h > index a08c482b1315..01aea0df4dd0 100644 > --- a/arch/powerpc/include/asm/uaccess.h > +++ b/arch/powerpc/include/asm/uaccess.h > @@ -53,11 +53,6 @@ static inline bool __access_ok(unsigned long addr, > unsigned long size) > #define __put_user(x, ptr) \ > __put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr))) > > -#define __get_user_inatomic(x, ptr) \ > - __get_user_nosleep((x), (ptr), sizeof(*(ptr))) > -#define __put_user_inatomic(x, ptr) \ > - __put_user_nosleep((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr))) > - > #ifdef CONFIG_PPC64 > > #define ___get_user_instr(gu_op, dest, ptr) \ > @@ -92,9 +87,6 @@ static inline bool __access_ok(unsigned long addr, unsigned > long size) > #define __get_user_instr(x, ptr) \ > ___get_user_instr(__get_user, x, ptr) > > -#define __get_user_instr_inatomic(x, ptr) \ > - ___get_user_instr(__get_user_inatomic, x, ptr) > - > extern long __put_user_bad(void); > > #define __put_user_size(x, ptr, size, retval)\ > @@ -141,20 +133,6 @@ __pu_failed: > \ > __pu_err; \ > }) > > -#define __put_user_nosleep(x, ptr, size) \ > -({ \ > - long __pu_err; \ > - __typeof__(*(ptr)) __user *__pu_addr = (ptr); \ > - __typeof__(*(ptr)) __pu_val = (x); \ > - __typeof__(size) __pu_size = (size);\ > - \ > - __chk_user_ptr(__pu_addr); \ > - __put_user_size(__pu_val, __pu_addr, __pu_size, __pu_err); \ > - \ > - __pu_err; \ > -}) > - > - > /* > * We don't tell gcc that we are accessing memory, but this is OK > * because we do not write to any memory gcc knows about, so there > @@ -320,21 +298,6 @@ do { > \ > __gu_err; \ > }) > > -#define __get_user_nosleep(x, ptr, size) \ > -({ \ > - 
long __gu_err; \ > - __long_type(*(ptr)) __gu_val; \ > - __typeof__(*(ptr)) __user *__gu_addr = (ptr); \ > - __typeof__(size) __gu_size = (size);\ > -
Re: [PATCHv2 2/4] perf evsel: Print warning for excluding kernel mode instruction tracing
Hi, On Mon, Mar 1, 2021 at 11:05 AM Sai Prakash Ranjan wrote: > > Add a warning message to check CONFIG_EXCLUDE_KERNEL_HW_ITRACE kernel > config which excludes kernel mode instruction tracing to help perf tool > users identify the perf event open failure when they attempt kernel mode > tracing with this config enabled. > > Tested-by: Denis Nikitin > Signed-off-by: Sai Prakash Ranjan > --- > tools/perf/util/evsel.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) I'm not really knowledgeable at all about the perf subsystem so my review doesn't hold a lot of weight. However, Sai's patch seems sane to me. Reviewed-by: Douglas Anderson
Re: Question about the "EXPERIMENTAL" tag for dax in XFS
On Mon, Mar 01, 2021 at 12:55:53PM -0800, Dan Williams wrote: > On Sun, Feb 28, 2021 at 2:39 PM Dave Chinner wrote: > > > > On Sat, Feb 27, 2021 at 03:40:24PM -0800, Dan Williams wrote: > > > On Sat, Feb 27, 2021 at 2:36 PM Dave Chinner wrote: > > > > On Fri, Feb 26, 2021 at 02:41:34PM -0800, Dan Williams wrote: > > > > > On Fri, Feb 26, 2021 at 1:28 PM Dave Chinner > > > > > wrote: > > > > > > On Fri, Feb 26, 2021 at 12:59:53PM -0800, Dan Williams wrote: > > > > it points to, check if it points to the PMEM that is being removed, > > > > grab the page it points to, map that to the relevant struct page, > > > > run collect_procs() on that page, then kill the user processes that > > > > map that page. > > > > > > > > So why can't we walk the ptes, check the physical pages that they > > > > map to and if they map to a pmem page we go poison that > > > > page and that kills any user process that maps it. > > > > > > > > i.e. I can't see how unexpected pmem device unplug is any different > > > > to an MCE delivering a hwpoison event to a DAX mapped page. > > > > > > I guess the tradeoff is walking a long list of inodes vs walking a > > > large array of pages. > > > > Not really. You're assuming all a filesystem has to do is invalidate > > everything if a device goes away, and that's not true. Finding if an > > inode has a mapping that spans a specific device in a multi-device > > filesystem can be a lot more complex than that. Just walking inodes > > is easy - determining which inodes need invalidation is the hard > > part. > > That inode-to-device level of specificity is not needed for the same > reason that drop_caches does not need to be specific. If the wrong > page is unmapped a re-fault will bring it back, and re-fault will fail > for the pages that are successfully removed. > > > That's where ->corrupt_range() comes in - the filesystem is already > > set up to do reverse mapping from physical range to inode(s) > > offsets... 
> > Sure, but what is the need to get to that level of specificity with > the filesystem for something that should rarely happen in the course > of normal operation outside of a mistake? Dan, you made this mistake with the hwpoisoning code, and we're trying to fix that here. Hard coding a 1:1 physical address to inode/offset into the DAX mapping was a bad mistake. It's also one that should never have occurred because it's *obviously wrong* to filesystem developers and has been for a long time. Now we have the filesystem people providing a mechanism for the pmem devices to tell the filesystems about physical device failures so they can handle such failures correctly themselves. Having the device go away unexpectedly from underneath a mounted and active filesystem is a *device failure*, not an "unplug event". The mistake you made was not understanding how filesystems work, nor actually asking filesystem developers what they actually needed. You're doing the same thing here - you're telling us what you think the solution filesystems need is. Please listen when we say "that is not sufficient" because we don't want to be backed into a corner that we have to fix ourselves again before we can enable some basic filesystem functionality that we should have been able to support on DAX from the start... > > > There's likely always more pages than inodes, but perhaps it's more > > > efficient to walk the 'struct page' array than sb->s_inodes? > > > > I really don't see why you seem to be telling us that invalidation is an > > either/or choice. There's more ways to convert physical block > > address -> inode file offset and mapping index than brute force > > inode cache walks > > Yes, but I was trying to map it to an existing mechanism and the > internals of drop_pagecache_sb() are, in coarse terms, close to what > needs to happen here. No. 
drop_pagecache_sb() is not a relevant model for telling a filesystem that the block device underneath has gone away, nor for a device to ensure that access protections that *are managed by the filesystem* are enforced/revoked sanely. drop_pagecache_sb() is a brute-force model for invalidating user data mappings that the filesystem performs in response to such a notification. It only needs this brute-force approach if it has no other way to find active DAX mappings across the range of the device that has gone away. But this model doesn't work for direct mapped metadata, journals or any other internal direct filesystem mappings that aren't referenced by inodes that the filesystem might be using. The filesystem still needs to invalidate all those mappings and prevent further access to them, even from within the kernel itself. Filesystems are way more complex than pure DAX devices, and hence handle errors and failure events differently. Unlike DAX devices, we have both internal and external references to the DAX device, and we can have both external and internal direct maps. Invalidating user data mappings is all dax devices need to do on unplug,
Re: [PATCHv2 1/4] perf/core: Add support to exclude kernel mode PMU tracing
Hi, On Mon, Mar 1, 2021 at 11:05 AM Sai Prakash Ranjan wrote: > > Hardware assisted tracing families such as ARM Coresight, Intel PT > provides rich tracing capabilities including instruction level > tracing and accurate timestamps which are very useful for profiling > and also pose a significant security risk. One such example of > security risk is when kernel mode tracing is not excluded and these > hardware assisted tracing can be used to analyze cryptographic code > execution. In this case, even the root user must not be able to infer > anything. > > To explain it more clearly in the words of a security team member > (credits: Mattias Nissler), > > "Consider a system where disk contents are encrypted and the encryption > key is set up by the user when mounting the file system. From that point > on the encryption key resides in the kernel. It seems reasonable to > expect that the disk encryption key be protected from exfiltration even > if the system later suffers a root compromise (or even against insiders > that have root access), at least as long as the attacker doesn't > manage to compromise the kernel." > > Here the idea is to protect such important information from all users > including root users since root privileges does not have to mean full > control over the kernel [1] and root compromise does not have to be > the end of the world. > > But "Peter said even the regular counters can be used for full branch > trace, the information isn't as accurate as PT and friends and not easier > but is good enough to infer plenty". This would mean that a global tunable > config for all kernel mode pmu tracing is more appropriate than the one > targeting the hardware assisted instruction tracing. > > Currently we can exclude kernel mode tracing via perf_event_paranoid > sysctl but it has following limitations, > > * No option to restrict kernel mode instruction tracing by the >root user. 
> * Not possible to restrict kernel mode instruction tracing when the >hardware assisted tracing IPs like ARM Coresight ETMs use an >additional interface via sysfs for tracing in addition to perf >interface. > > So introduce a new config CONFIG_EXCLUDE_KERNEL_PMU_TRACE to exclude > kernel mode pmu tracing which will be generic and applicable to all > hardware tracing families and which can also be used with other > interfaces like sysfs in case of ETMs. > > [1] https://lwn.net/Articles/796866/ > > Suggested-by: Suzuki K Poulose > Suggested-by: Al Grant > Tested-by: Denis Nikitin > Link: > https://lore.kernel.org/lkml/20201015124522.1876-1-saiprakash.ran...@codeaurora.org/ > Signed-off-by: Sai Prakash Ranjan > --- > init/Kconfig | 11 +++ > kernel/events/core.c | 3 +++ > 2 files changed, 14 insertions(+) I'm not really knowledgeable at all about the perf subsystem so my review doesn't hold a lot of weight. However, Sai's patch seems sane to me. Reviewed-by: Douglas Anderson
Re: [PATCHv2 4/4] coresight: etm3x: Add support to exclude kernel mode tracing
Hi, On Mon, Mar 1, 2021 at 11:05 AM Sai Prakash Ranjan wrote: > > On production systems with ETMs enabled, it is preferred to exclude > kernel mode(NS EL1) tracing for security concerns and support only > userspace(NS EL0) tracing. Perf subsystem interface uses the newly > introduced kernel config CONFIG_EXCLUDE_KERNEL_PMU_TRACE to exclude > kernel mode tracing, but there is an additional interface > via sysfs for ETMs which also needs to be handled to exclude kernel > mode tracing. So we use this same generic kernel config to handle > the sysfs mode of tracing. This config is disabled by default and > would not affect the current configuration which has both kernel and > userspace tracing enabled by default. > > Signed-off-by: Sai Prakash Ranjan > --- > drivers/hwtracing/coresight/coresight-etm3x-core.c | 3 +++ > drivers/hwtracing/coresight/coresight-etm3x-sysfs.c | 6 ++ > 2 files changed, 9 insertions(+) Reviewed-by: Douglas Anderson > diff --git a/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c > b/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c > index e8c7649f123e..f522fc2e01b3 100644 > --- a/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c > +++ b/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c > @@ -116,6 +116,12 @@ static ssize_t mode_store(struct device *dev, > if (ret) > return ret; > > + if (IS_ENABLED(CONFIG_EXCLUDE_KERNEL_PMU_TRACE) && (!(val & > ETM_MODE_EXCL_KERN))) { > + dev_warn(dev, > + "Kernel mode tracing is not allowed, check your > kernel config\n"); Same nit as in patch #3 that the above string should be indented by 1 more space.
[PATCH v9 4/6] userfaultfd: add UFFDIO_CONTINUE ioctl
This ioctl is how userspace ought to resolve "minor" userfaults. The idea is, userspace is notified that a minor fault has occurred. It might change the contents of the page using its second non-UFFD mapping, or not. Then, it calls UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". Note that it doesn't make much sense to use UFFDIO_{COPY,ZEROPAGE} for MINOR registered VMAs. ZEROPAGE maps the VMA to the zero page; but in the minor fault case, we already have some pre-existing underlying page. Likewise, UFFDIO_COPY isn't useful if we have a second non-UFFD mapping. We'd just use memcpy() or similar instead. It turns out hugetlb_mcopy_atomic_pte() already does very close to what we want, if an existing page is provided via `struct page **pagep`. We already special-case the behavior a bit for the UFFDIO_ZEROPAGE case, so just extend that design: add an enum for the three modes of operation, and make the small adjustments needed for the MCOPY_ATOMIC_CONTINUE case. (Basically, look up the existing page, and avoid adding the existing page to the page cache or calling set_page_huge_active() on it.) Reviewed-by: Peter Xu Signed-off-by: Axel Rasmussen --- fs/userfaultfd.c | 67 include/linux/hugetlb.h | 3 ++ include/linux/userfaultfd_k.h| 18 + include/uapi/linux/userfaultfd.h | 21 +- mm/hugetlb.c | 40 --- mm/userfaultfd.c | 37 +++--- 6 files changed, 156 insertions(+), 30 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index ba35cafa8b0d..14f92285d04f 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -1487,6 +1487,10 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_WP)) ioctls_out &= ~((__u64)1 << _UFFDIO_WRITEPROTECT); + /* CONTINUE ioctl is only supported for MINOR ranges. 
*/ + if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR)) + ioctls_out &= ~((__u64)1 << _UFFDIO_CONTINUE); + /* * Now that we scanned all vmas we can already tell * userland which ioctls methods are guaranteed to @@ -1840,6 +1844,66 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx, return ret; } +static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg) +{ + __s64 ret; + struct uffdio_continue uffdio_continue; + struct uffdio_continue __user *user_uffdio_continue; + struct userfaultfd_wake_range range; + + user_uffdio_continue = (struct uffdio_continue __user *)arg; + + ret = -EAGAIN; + if (READ_ONCE(ctx->mmap_changing)) + goto out; + + ret = -EFAULT; + if (copy_from_user(&uffdio_continue, user_uffdio_continue, + /* don't copy the output fields */ + sizeof(uffdio_continue) - (sizeof(__s64)))) + goto out; + + ret = validate_range(ctx->mm, &uffdio_continue.range.start, + uffdio_continue.range.len); + if (ret) + goto out; + + ret = -EINVAL; + /* double check for wraparound just in case. */ + if (uffdio_continue.range.start + uffdio_continue.range.len <= + uffdio_continue.range.start) { + goto out; + } + if (uffdio_continue.mode & ~UFFDIO_CONTINUE_MODE_DONTWAKE) + goto out; + + if (mmget_not_zero(ctx->mm)) { + ret = mcopy_continue(ctx->mm, uffdio_continue.range.start, + uffdio_continue.range.len, + &ctx->mmap_changing); + mmput(ctx->mm); + } else { + return -ESRCH; + } + + if (unlikely(put_user(ret, &user_uffdio_continue->mapped))) + return -EFAULT; + if (ret < 0) + goto out; + + /* len == 0 would wake all */ + BUG_ON(!ret); + range.len = ret; + if (!(uffdio_continue.mode & UFFDIO_CONTINUE_MODE_DONTWAKE)) { + range.start = uffdio_continue.range.start; + wake_userfault(ctx, &range); + } + ret = range.len == uffdio_continue.range.len ? 
0 : -EAGAIN; + +out: + return ret; +} + static inline unsigned int uffd_ctx_features(__u64 user_features) { /* @@ -1927,6 +1991,9 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd, case UFFDIO_WRITEPROTECT: ret = userfaultfd_writeprotect(ctx, arg); break; + case UFFDIO_CONTINUE: + ret = userfaultfd_continue(ctx, arg); + break; } return ret; } diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 7b86bf809d7a..1d3246b31a41 100644 ---
Re: [PATCH V6] x86/mm: Tracking linear mapping split events
Hello, On Thu, Feb 18, 2021 at 03:57:44PM -0800, Saravanan D wrote: > To help with debugging the sluggishness caused by TLB miss/reload, > we introduce monotonic hugepage [direct mapped] split event counts since > system state: SYSTEM_RUNNING to be displayed as part of > /proc/vmstat in x86 servers ... > Signed-off-by: Saravanan D > Acked-by: Tejun Heo > Acked-by: Johannes Weiner > Acked-by: Dave Hansen Andrew, do you mind picking this one up? It has enough acks and can go through either mm or x86 tree. Thank you. -- tejun
Re: [PATCH v2 1/2] tty/serial: Add rx-tx-swap OF option to stm32-usart
On 3/1/21 11:28 AM, Fabrice Gasnier wrote: On 2/27/21 5:41 PM, Martin Devera wrote: STM32 F7/H7 usarts supports RX & TX pin swapping. Add option to turn it on. Tested on STM32MP157. Signed-off-by: Martin Devera --- drivers/tty/serial/stm32-usart.c | 3 ++- drivers/tty/serial/stm32-usart.h | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c index b3675cf25a69..3650c8798061 100644 --- a/drivers/tty/serial/stm32-usart.c +++ b/drivers/tty/serial/stm32-usart.c @@ -758,7 +758,7 @@ static void stm32_usart_set_termios(struct uart_port *port, cr1 = USART_CR1_TE | USART_CR1_RE; if (stm32_port->fifoen) cr1 |= USART_CR1_FIFOEN; - cr2 = 0; + cr2 = stm32_port->swap ? USART_CR2_SWAP : 0; Hi Martin, Same could be done in the startup routine, that enables the port for reception (as described in Documentation/driver-api/serial/driver.rst) Hello Fabrice, I already incorporated all your comments but I'm struggling with the one above. The code must be in stm32_usart_set_termios too, because CR2 is modified. What is the reason to have it in startup() ? Is it because USART can be started without calling set_termios at all ? Like to reuse bootloader's last settings ? Thanks, Martin
[PATCH v9 5/6] userfaultfd: update documentation to describe minor fault handling
Reword / reorganize things a little bit into "lists", so new features / modes / ioctls can sort of just be appended. Describe how UFFDIO_REGISTER_MODE_MINOR and UFFDIO_CONTINUE can be used to intercept and resolve minor faults. Make it clear that COPY and ZEROPAGE are used for MISSING faults, whereas CONTINUE is used for MINOR faults. Reviewed-by: Peter Xu Signed-off-by: Axel Rasmussen --- Documentation/admin-guide/mm/userfaultfd.rst | 107 --- 1 file changed, 66 insertions(+), 41 deletions(-) diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst index 65eefa66c0ba..3aa38e8b8361 100644 --- a/Documentation/admin-guide/mm/userfaultfd.rst +++ b/Documentation/admin-guide/mm/userfaultfd.rst @@ -63,36 +63,36 @@ the generic ioctl available. The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl defines what memory types are supported by the ``userfaultfd`` and what -events, except page fault notifications, may be generated. - -If the kernel supports registering ``userfaultfd`` ranges on hugetlbfs -virtual memory areas, ``UFFD_FEATURE_MISSING_HUGETLBFS`` will be set in -``uffdio_api.features``. Similarly, ``UFFD_FEATURE_MISSING_SHMEM`` will be -set if the kernel supports registering ``userfaultfd`` ranges on shared -memory (covering all shmem APIs, i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, -``MAP_SHARED``, ``memfd_create``, etc). - -The userland application that wants to use ``userfaultfd`` with hugetlbfs -or shared memory need to set the corresponding flag in -``uffdio_api.features`` to enable those features. - -If the userland desires to receive notifications for events other than -page faults, it has to verify that ``uffdio_api.features`` has appropriate -``UFFD_FEATURE_EVENT_*`` bits set. These events are described in more -detail below in `Non-cooperative userfaultfd`_ section. 
- -Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should -be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to -register a memory range in the ``userfaultfd`` by setting the +events, except page fault notifications, may be generated: + +- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events + other than page faults are supported. These events are described in more + detail below in the `Non-cooperative userfaultfd`_ section. + +- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM`` + indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING`` + registrations for hugetlbfs and shared memory (covering all shmem APIs, + i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``, + etc) virtual memory areas, respectively. + +- ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports + ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory + areas. + +The userland application should set the feature flags it intends to use +when invoking the ``UFFDIO_API`` ioctl, to request that those features be +enabled if supported. + +Once the ``userfaultfd`` API has been enabled the ``UFFDIO_REGISTER`` +ioctl should be invoked (if present in the returned ``uffdio_api.ioctls`` +bitmask) to register a memory range in the ``userfaultfd`` by setting the uffdio_register structure accordingly. The ``uffdio_register.mode`` bitmask will specify to the kernel which kind of faults to track for -the range (``UFFDIO_REGISTER_MODE_MISSING`` would track missing -pages). The ``UFFDIO_REGISTER`` ioctl will return the +the range. The ``UFFDIO_REGISTER`` ioctl will return the ``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve userfaults on the range registered. Not all ioctls will necessarily be -supported for all memory types depending on the underlying virtual -memory backend (anonymous memory vs tmpfs vs real filebacked -mappings). 
+supported for all memory types (e.g. anonymous memory vs. shmem vs. +hugetlbfs), or all types of intercepted faults. Userland can use the ``uffdio_register.ioctls`` to manage the virtual address space in the background (to add or potentially also remove @@ -100,21 +100,46 @@ memory from the ``userfaultfd`` registered range). This means a userfault could be triggering just before userland maps in the background the user-faulted page. -The primary ioctl to resolve userfaults is ``UFFDIO_COPY``. That -atomically copies a page into the userfault registered range and wakes -up the blocked userfaults -(unless ``uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE`` is set). -Other ioctl works similarly to ``UFFDIO_COPY``. They're atomic as in -guaranteeing that nothing can see an half copied page since it'll -keep userfaulting until the copy has finished. +Resolving Userfaults + + +There are three basic ways to resolve userfaults: + +- ``UFFDIO_COPY`` atomically copies some existing page contents from + userspace. + +- ``UFFDIO_ZEROPAGE`` atomically zeros the new page. +
Re: [PATCH v2 1/4] KVM: vmx/pmu: Add MSR_ARCH_LBR_DEPTH emulation for Arch LBR
On Wed, Feb 03, 2021, Like Xu wrote: > @@ -348,10 +352,26 @@ static bool intel_pmu_handle_lbr_msrs_access(struct > kvm_vcpu *vcpu, > return true; > } > > +/* > + * Check if the requested depth values is supported > + * based on the bits [0:7] of the guest cpuid.1c.eax. > + */ > +static bool arch_lbr_depth_is_valid(struct kvm_vcpu *vcpu, u64 depth) > +{ > + struct kvm_cpuid_entry2 *best; > + > + best = kvm_find_cpuid_entry(vcpu, 0x1c, 0); > + if (depth && best) > + return (best->eax & 0xff) & (1ULL << (depth / 8 - 1)); I believe this will generate undefined behavior if depth > 64. Or if depth < 8. And I believe this check also needs to enforce that depth is a multiple of 8. For each bit n set in this field, the IA32_LBR_DEPTH.DEPTH value 8*(n+1) is supported. Thus it's impossible for 0-7, 9-15, etc... to be legal depths. > + > + return false; > +} > +
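To make the review comment concrete, here is a minimal standalone sketch of a depth check that rejects out-of-range and non-multiple-of-8 values before shifting. It is not the KVM code: `supported_mask` is an assumed parameter standing in for bits 7:0 of the guest's CPUID.1CH:EAX, and the function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit n set in supported_mask means depth 8 * (n + 1) is supported,
 * so only multiples of 8 in the range 8..64 can map onto bits 0..7.
 */
bool lbr_depth_is_valid(uint8_t supported_mask, uint64_t depth)
{
	/* Reject anything that would shift by a negative or oversized count. */
	if (depth < 8 || depth > 64 || depth % 8)
		return false;

	return (supported_mask & (1u << (depth / 8 - 1))) != 0;
}

/* Usage example: a mask of 0x0b advertises depths 8, 16 and 32. */
void lbr_depth_example(void)
{
	assert(lbr_depth_is_valid(0x0b, 8));
	assert(lbr_depth_is_valid(0x0b, 16));
	assert(lbr_depth_is_valid(0x0b, 32));
	assert(!lbr_depth_is_valid(0x0b, 24));	/* bit 2 is clear */
	assert(!lbr_depth_is_valid(0x0b, 12));	/* not a multiple of 8 */
	assert(!lbr_depth_is_valid(0x0b, 0));	/* below the minimum */
	assert(!lbr_depth_is_valid(0xff, 72));	/* beyond bit 7 */
}
```

Doing the range and alignment checks first means the shift count is always between 0 and 7, so the undefined-behavior cases mentioned above cannot be reached.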
Re: [PATCH 1/1] docs: arm: /chosen node parameters
Heinrich Schuchardt writes: > Add missing items to table of parameters set in the /chosen node by the EFI > stub. > > Signed-off-by: Heinrich Schuchardt > --- > Documentation/arm/uefi.rst | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/Documentation/arm/uefi.rst b/Documentation/arm/uefi.rst > index f732f957421f..9b0b5e458a1e 100644 > --- a/Documentation/arm/uefi.rst > +++ b/Documentation/arm/uefi.rst > @@ -64,4 +64,11 @@ linux,uefi-mmap-desc-size 32-bit Size in bytes of each > entry in the UEFI > memory map. > > linux,uefi-mmap-desc-ver32-bit Version of the mmap descriptor format. > + > +linux,initrd-start 64-bit Physical start address of an initrd > + > +linux,initrd-end64-bit Physical end address of an initrd > + > +kaslr-seed 64-bit Entropy used to randomize the kernel > image > + base address location. > == == > === Applied, thanks. jon
Re: [PATCH v1 01/15] powerpc/uaccess: Remove __get_user_allowed() and unsafe_op_wrap()
On Tue, Mar 02, 2021 at 09:02:54AM +1100, Daniel Axtens wrote: > Checkpatch does have one check that is relevant: > > CHECK: Macro argument reuse 'p' - possible side-effects? > #36: FILE: arch/powerpc/include/asm/uaccess.h:482: > +#define unsafe_get_user(x, p, e) do { > \ > + if (unlikely(__get_user_nocheck((x), (p), sizeof(*(p)), false)))\ > + goto e; \ > +} while (0) sizeof (of something other than a VLA) does not evaluate its operand. The checkpatch warning is incorrect (well, it does say "possible" -- it just didn't find a possible problem here). You can write bla = sizeof *p++; and p is *not* incremented. Segher
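A small standalone check of the point about sizeof, for anyone who wants to see it run. The function names are made up for the example; the only thing it relies on is that the operand of sizeof is not evaluated when it is not a variable-length array. Some compilers may warn about the unevaluated `p++`, which is rather the point.

```c
#include <assert.h>
#include <stddef.h>

/* Returns how far p moved after `sizeof *p++`. */
ptrdiff_t sizeof_operand_side_effect(void)
{
	int buf[4] = { 0 };
	int *p = buf;
	size_t n = sizeof *p++;	/* operand is not evaluated, p stays put */

	assert(n == sizeof(int));
	return p - buf;
}

/* Same pointer, but with the increment actually evaluated. */
ptrdiff_t evaluated_operand_side_effect(void)
{
	int buf[4] = { 0 };
	int *p = buf;
	int v = *p++;		/* evaluated: p advances by one element */

	(void)v;
	return p - buf;
}
```

The first function reports that `p` did not move, the second that it advanced by one element, which is why passing `(p)` to both the access and `sizeof(*(p))` in the macro does not evaluate `p` twice.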
Re: [PATCH] mmc: Try power cycling card if command request times out
Hi Adrian! Thank you for your comments! On Mon, Mar 01, 2021 at 11:40:03AM +0100, Adrian Hunter wrote: > On 1/03/21 10:50 am, Ulf Hansson wrote: > > + Adrian > > > > On Tue, 16 Feb 2021 at 23:43, Mårten Lindahl > > wrote: > >> > >> Sometimes SD cards that has been run for a long time enters a state > >> where it cannot by itself be recovered, but needs a power cycle to be > >> operational again. Card status analysis has indicated that the card can > >> end up in a state where all external commands are ignored by the card > >> since it is halted by data timeouts. > >> > >> If the card has been heavily used for a long time it can be weared out, > >> and should typically be replaced. But on some tests, it shows that the > >> card can still be functional after a power cycle, but as it requires an > >> operator to do it, the card can remain in a non-operational state for a > >> long time until the problem has been observed by the operator. > >> > >> This patch adds function to power cycle the card in case it does not > >> respond to a command, and then resend the command if the power cycle > >> was successful. This procedure will be tested 1 time before giving up, > >> and resuming host operation as normal. > > > > I assume the context above is all about the ioctl interface? > > > > So, when the card enters this non functional state, have you tried > > just reading a block through the regular I/O interface. Does it > > trigger a power cycle of the card - and then makes it functional > > again? > > > >> > >> Signed-off-by: Mårten Lindahl > >> --- > >> Please note: This might not be the way we want to handle these cases, > >> but at least it lets us start the discussion. In which cases should the > >> mmc framework deal with error messages like ETIMEDOUT, and in which > >> cases should it be handled by userspace? > >> The mmc framework tries to recover a failed block request > >> (mmc_blk_mq_rw_recovery) which may end up in a HW reset of the card. 
> >> Would it be an idea to act in a similar way when an ioctl times out? > > > > Maybe, it's a good idea to allow the similar reset for ioctls as we do > > for regular I/O requests. My concern with this though, is that we > > might allow user space to trigger a HW resets a bit too easily - and > > that could damage the card. > > > > Did you consider this? > > > >> > >> drivers/mmc/core/block.c | 20 ++-- > >> 1 file changed, 18 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c > >> index 42e27a298218..d007b2af64d6 100644 > >> --- a/drivers/mmc/core/block.c > >> +++ b/drivers/mmc/core/block.c > >> @@ -976,6 +976,7 @@ static inline void mmc_blk_reset_success(struct > >> mmc_blk_data *md, int type) > >> */ > >> static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request > >> *req) > >> { > >> + int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE; > >> struct mmc_queue_req *mq_rq; > >> struct mmc_card *card = mq->card; > >> struct mmc_blk_data *md = mq->blkdata; > >> @@ -983,7 +984,7 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, > >> struct request *req) > >> bool rpmb_ioctl; > >> u8 **ext_csd; > >> u32 status; > >> - int ret; > >> + int ret, retry = 1; > >> int i; > >> > >> mq_rq = req_to_mmc_queue_req(req); > >> @@ -994,9 +995,24 @@ static void mmc_blk_issue_drv_op(struct mmc_queue > >> *mq, struct request *req) > >> case MMC_DRV_OP_IOCTL_RPMB: > > SD cards do not have RPMB. Did you mean eMMC? > No, you are right. This action should be excluded from 'case MMC_DRV_OP_IOCTL_RPMB'. 
> > >> idata = mq_rq->drv_op_data; > >> for (i = 0, ret = 0; i < mq_rq->ioc_count; i++) { > >> +cmd_do: > >> ret = __mmc_blk_ioctl_cmd(card, md, idata[i]); > >> - if (ret) > >> + if (ret == -ETIMEDOUT) { > >> + dev_warn(mmc_dev(card->host), > >> +"error %d sending command\n", > >> ret); > >> +cmd_reset: > >> + mmc_blk_reset_success(md, type); > > mmc_blk_reset_success() is called upon success, not failure. The reset will > not be attempted twice in a row, for a given type, without a "success" in > between. > Ok, yes I see. This line and the cmd_reset label should be removed, and if mmc_blk_reset fails we should break, not retry. Kind regards Mårten > >> + if (retry--) { > >> + dev_warn(mmc_dev(card->host), > >> +"power cycling card\n"); > >> + if (mmc_blk_reset > >> + (md, card->host, type)) > >> + goto cmd_reset; > >> +
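The control flow agreed on in this exchange (resend the command at most once, only after a successful reset, and stop if the reset itself fails) can be sketched in plain userspace C. This is a minimal sketch, not the driver code: `send_cmd` and `reset_card` are hypothetical callbacks standing in for __mmc_blk_ioctl_cmd() and mmc_blk_reset(), and `struct fake_card` is a toy model that exists only to exercise the helper.

```c
#include <errno.h>

/* Hypothetical stand-ins for __mmc_blk_ioctl_cmd() and mmc_blk_reset(). */
typedef int (*send_cmd_fn)(void *ctx);
typedef int (*reset_card_fn)(void *ctx);

/*
 * Send a command. On -ETIMEDOUT, try one reset and resend the command
 * only if that reset succeeded. Returns the result of the last send.
 */
static int send_with_one_reset(send_cmd_fn send_cmd, reset_card_fn reset_card,
			       void *ctx)
{
	int retry = 1;	/* at most one reset-and-resend attempt */
	int ret;

	for (;;) {
		ret = send_cmd(ctx);
		if (ret != -ETIMEDOUT || !retry--)
			break;
		if (reset_card(ctx))
			break;	/* reset failed: give up instead of retrying */
	}
	return ret;
}

/* Toy card model, only there to exercise the helper above. */
struct fake_card {
	int hung;	/* non-zero: every command times out */
	int reset_ok;	/* non-zero: a reset un-hangs the card */
	int sends;	/* number of commands sent */
	int resets;	/* number of resets attempted */
};

static int fake_send(void *ctx)
{
	struct fake_card *c = ctx;

	c->sends++;
	return c->hung ? -ETIMEDOUT : 0;
}

static int fake_reset(void *ctx)
{
	struct fake_card *c = ctx;

	c->resets++;
	if (!c->reset_ok)
		return -EIO;
	c->hung = 0;
	return 0;
}
```

Keeping the retry budget in one local counter means a card that still times out after the reset cannot trigger a second reset, which is in line with the concern raised above about user space causing HW resets too easily.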
[PATCH v9 6/6] userfaultfd/selftests: add test exercising minor fault handling
Fix a dormant bug in userfaultfd_events_test(), where we did `return faulting_process(0)` instead of `exit(faulting_process(0))`. This caused the forked process to keep running, trying to execute any further test cases after the events test in parallel with the "real" process.

Add a simple test case which exercises minor faults. In short, it does the following:

1. "Sets up" an area (area_dst) and a second shared mapping to the same underlying pages (area_dst_alias).
2. Register one of these areas with userfaultfd, in minor fault mode.
3. Start a second thread to handle any minor faults.
4. Populate the underlying pages with the non-UFFD-registered side of the mapping. Basically, memset() each page with some arbitrary contents.
5. Then, using the UFFD-registered mapping, read all of the page contents, asserting that the contents match expectations (we expect the minor fault handling thread can modify the page contents before resolving the fault).

The minor fault handling thread, upon receiving an event, flips all the bits (~) in that page, just to prove that it can modify it in some arbitrary way. Then it issues a UFFDIO_CONTINUE ioctl, to set up the mapping and resolve the fault. The reading thread should wake up and see this modification.

Currently the minor fault test is only enabled in hugetlb_shared mode, as this is the only configuration the kernel feature supports.
Reviewed-by: Peter Xu Signed-off-by: Axel Rasmussen --- tools/testing/selftests/vm/userfaultfd.c | 164 ++- 1 file changed, 158 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c index 92b8ec423201..f5ab5e0312e7 100644 --- a/tools/testing/selftests/vm/userfaultfd.c +++ b/tools/testing/selftests/vm/userfaultfd.c @@ -81,6 +81,8 @@ static volatile bool test_uffdio_copy_eexist = true; static volatile bool test_uffdio_zeropage_eexist = true; /* Whether to test uffd write-protection */ static bool test_uffdio_wp = false; +/* Whether to test uffd minor faults */ +static bool test_uffdio_minor = false; static bool map_shared; static int huge_fd; @@ -96,6 +98,7 @@ struct uffd_stats { int cpu; unsigned long missing_faults; unsigned long wp_faults; + unsigned long minor_faults; }; /* pthread_mutex_t starts at page offset 0 */ @@ -153,17 +156,19 @@ static void uffd_stats_reset(struct uffd_stats *uffd_stats, uffd_stats[i].cpu = i; uffd_stats[i].missing_faults = 0; uffd_stats[i].wp_faults = 0; + uffd_stats[i].minor_faults = 0; } } static void uffd_stats_report(struct uffd_stats *stats, int n_cpus) { int i; - unsigned long long miss_total = 0, wp_total = 0; + unsigned long long miss_total = 0, wp_total = 0, minor_total = 0; for (i = 0; i < n_cpus; i++) { miss_total += stats[i].missing_faults; wp_total += stats[i].wp_faults; + minor_total += stats[i].minor_faults; } printf("userfaults: %llu missing (", miss_total); @@ -172,6 +177,9 @@ static void uffd_stats_report(struct uffd_stats *stats, int n_cpus) printf("\b), %llu wp (", wp_total); for (i = 0; i < n_cpus; i++) printf("%lu+", stats[i].wp_faults); + printf("\b), %llu minor (", minor_total); + for (i = 0; i < n_cpus; i++) + printf("%lu+", stats[i].minor_faults); printf("\b)\n"); } @@ -328,7 +336,7 @@ static struct uffd_test_ops shmem_uffd_test_ops = { }; static struct uffd_test_ops hugetlb_uffd_test_ops = { - .expected_ioctls = 
UFFD_API_RANGE_IOCTLS_BASIC, + .expected_ioctls = UFFD_API_RANGE_IOCTLS_BASIC & ~(1 << _UFFDIO_CONTINUE), .allocate_area = hugetlb_allocate_area, .release_pages = hugetlb_release_pages, .alias_mapping = hugetlb_alias_mapping, @@ -362,6 +370,22 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool wp) } } +static void continue_range(int ufd, __u64 start, __u64 len) +{ + struct uffdio_continue req; + + req.range.start = start; + req.range.len = len; + req.mode = 0; + + if (ioctl(ufd, UFFDIO_CONTINUE, &req)) { + fprintf(stderr, + "UFFDIO_CONTINUE failed for address 0x%" PRIx64 "\n", + (uint64_t)start); + exit(1); + } +} + static void *locking_thread(void *arg) { unsigned long cpu = (unsigned long) arg; @@ -569,8 +593,32 @@ static void uffd_handle_page_fault(struct uffd_msg *msg, } if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP) { + /* Write protect page faults */ wp_range(uffd, msg->arg.pagefault.address, page_size, false); stats->wp_faults++; + } else if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_MINOR) { +
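The data check described in the commit message can be sketched without any userfaultfd calls: the populating side fills a page with one byte value, the minor fault handler inverts every byte before resolving the fault, and the reader therefore expects the complement. The helper names below are illustrative assumptions and do not come from the selftest.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* What the minor fault handler does to a page before resolving the fault. */
static void flip_page(uint8_t *page, size_t len)
{
	for (size_t i = 0; i < len; i++)
		page[i] = (uint8_t)~page[i];
}

/* Returns 1 if every byte of the page is the complement of 'fill'. */
static int page_is_flipped(const uint8_t *page, size_t len, uint8_t fill)
{
	const uint8_t expect = (uint8_t)~fill;

	for (size_t i = 0; i < len; i++)
		if (page[i] != expect)
			return 0;
	return 1;
}
```

In the real test the flip happens in the handler thread and the check happens in the reading thread through the UFFD-registered mapping; here both operate on a plain buffer.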
[PATCH v9 3/6] userfaultfd: hugetlbfs: only compile UFFD helpers if config enabled
For background, mm/userfaultfd.c provides a general mcopy_atomic implementation. But some types of memory (i.e., hugetlb and shmem) need a slightly different implementation, so they provide their own helpers for this. In other words, userfaultfd is the only caller of these functions. This patch achieves two things: 1. Don't spend time compiling code which will end up never being referenced anyway (a small build time optimization). 2. In patches later in this series, we extend the signature of these helpers with UFFD-specific state (a mode enumeration). Once this happens, we *have to* either not compile the helpers, or unconditionally define the UFFD-only state (which seems messier to me). This includes the declarations in the headers, as otherwise they'd yield warnings about implicitly defining the type of those arguments. Reviewed-by: Mike Kravetz Reviewed-by: Peter Xu Signed-off-by: Axel Rasmussen --- include/linux/hugetlb.h | 4 mm/hugetlb.c| 2 ++ 2 files changed, 6 insertions(+) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index c0b10f0c7f23..7b86bf809d7a 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -134,11 +134,13 @@ void hugetlb_show_meminfo(void); unsigned long hugetlb_total_pages(void); vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, unsigned int flags); +#ifdef CONFIG_USERFAULTFD int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte, struct vm_area_struct *dst_vma, unsigned long dst_addr, unsigned long src_addr, struct page **pagep); +#endif /* CONFIG_USERFAULTFD */ bool hugetlb_reserve_pages(struct inode *inode, long from, long to, struct vm_area_struct *vma, vm_flags_t vm_flags); @@ -310,6 +312,7 @@ static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb, BUG(); } +#ifdef CONFIG_USERFAULTFD static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte, struct vm_area_struct *dst_vma, @@ -320,6 +323,7 @@ static inline 
int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, BUG(); return 0; } +#endif /* CONFIG_USERFAULTFD */ static inline pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, unsigned long sz) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 61fd15185f0a..4422dab8db9a 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4618,6 +4618,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, return ret; } +#ifdef CONFIG_USERFAULTFD /* * Used by userfaultfd UFFDIO_COPY. Based on mcopy_atomic_pte with * modifications for huge pages. @@ -4748,6 +4749,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, put_page(page); goto out; } +#endif /* CONFIG_USERFAULTFD */ static void record_subpages_vmas(struct page *page, struct vm_area_struct *vma, int refs, struct page **pages, -- 2.30.1.766.gb4fecdf3b7-goog
[PATCH v9 2/6] userfaultfd: disable huge PMD sharing for MINOR registered VMAs
As the comment says: for the MINOR fault use case, although the page might be present and populated in the other (non-UFFD-registered) half of the mapping, it may be out of date, and we explicitly want userspace to get a minor fault so it can check and potentially update the page's contents. Huge PMD sharing would prevent these faults from occurring for suitably aligned areas, so disable it upon UFFD registration. Reviewed-by: Peter Xu Reviewed-by: Mike Kravetz Signed-off-by: Axel Rasmussen --- include/linux/userfaultfd_k.h | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 0390e5ac63b3..e060d5f77cc5 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -56,12 +56,19 @@ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma, } /* - * Never enable huge pmd sharing on uffd-wp registered vmas, because uffd-wp - * protect information is per pgtable entry. + * Never enable huge pmd sharing on some uffd registered vmas: + * + * - VM_UFFD_WP VMAs, because write protect information is per pgtable entry. + * + * - VM_UFFD_MINOR VMAs, because otherwise we would never get minor faults for + * VMAs which share huge pmds. (If you have two mappings to the same + * underlying pages, and fault in the non-UFFD-registered one with a write, + * with huge pmd sharing this would *also* setup the second UFFD-registered + * mapping, and we'd not get minor faults.) */ static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma) { - return vma->vm_flags & VM_UFFD_WP; + return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR); } static inline bool userfaultfd_missing(struct vm_area_struct *vma) -- 2.30.1.766.gb4fecdf3b7-goog
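The predicate in this patch is a single mask test: sharing is disabled if either the write-protect or the minor registration flag is set. A minimal userspace sketch follows. The DEMO_* bit positions are placeholders chosen for illustration and are not the kernel's VM_UFFD_* values.

```c
#include <stdbool.h>

/* Placeholder bit positions; the kernel's VM_UFFD_* values differ. */
#define DEMO_VM_UFFD_MISSING	(1UL << 0)
#define DEMO_VM_UFFD_WP		(1UL << 1)
#define DEMO_VM_UFFD_MINOR	(1UL << 2)

/* Mirrors the shape of uffd_disable_huge_pmd_share() after the patch. */
static bool demo_disable_huge_pmd_share(unsigned long vm_flags)
{
	/* Any non-zero result converts to true when returned as bool. */
	return vm_flags & (DEMO_VM_UFFD_WP | DEMO_VM_UFFD_MINOR);
}
```

A VMA registered only in MISSING mode still allows sharing under this check, which matches the comment in the patch.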
[PATCH v9 0/6] userfaultfd: add minor fault handling
Base This series is based on v5.12-rc1. Additionally, this series depends on Peter Xu's series to allow disabling huge pmd sharing. [1] https://lore.kernel.org/patchwork/cover/1382204/ Changelog = v8->v9: - Removed an unneeded double !! from a VM_BUG_ON check in handle_userfault. - Introduced a handle_userfault helper in hugetlb.c, to reduce repetition. - Rebased to v5.12-rc1, which has Mike's hugetlb changes which originally motivated rebasing onto akpm's tree (so, it also applies cleanly to akpm's tree). v7->v8: - Check CONFIG_HAVE_ARCH_USERFAULTFD_MINOR instead of commenting in userfaultfd_register. - Remove redundant "ret = -EINVAL;" in userfaultfd_register. - Revert removing trailing \ in include/trace/events/mmflags.h. - Don't set "*pagep = NULL" in the is_continue case in hugetlb_mcopy_atomic_pte. v6->v7: - Based upon discussion, switched back to the VM_* flags approach which was used in v5, instead of implementing this as an API feature. Switched to using a high bit (instead of brokenly conflicting with VM_LOCKED), which implies introducing CONFIG_HAVE_ARCH_USERFAULTFD_MINOR and selecting it only on 64-bit architectures (x86_64 and arm64 for now). v5->v6: - Fixed the condition guarding a second case where we unlock_page() in hugetlb_mcopy_atomic_pte(). - Significantly refactored how minor registration works. Because there are no VM_* flags available to use, it has to be a userfaultfd API feature, rather than a registration mode. This has a few knock on consequences worth calling out: - userfaultfd_minor() can no longer be inline, because we have to inspect the userfaultfd_ctx, which is only defined in fs/userfaultfd.c. This means slightly more overhead (1 function call) on all hugetlbfs minor faults. - vma_can_userfault() no longer changes. It seems valid to me to create an FD with the minor fault feature enabled, and then register e.g. 
some non-hugetlbfs region in MISSING mode, fully expecting to not get any minor faults for it, alongside some other region which you *do* want minor faults for. So, at registration time, either should be accepted. - Since I'm no longer adding a new registration mode, I'm no longer introducing __VM_UFFD_FLAGS or UFFD_API_REGISTER_MODES, and all the related cleanups have been reverted. v4->v5: - Typo fix in the documentation update. - Removed comment in vma_can_userfault. The same information is better covered in the documentation update, so the comment is unnecessary (and slightly confusing as written). - Reworded comment for MCOPY_ATOMIC_CONTINUE mode. - For non-shared CONTINUE, only make the PTE(s) non-writable, don't change flags on the VMA. - In hugetlb_mcopy_atomic_pte, always unlock the page in MCOPY_ATOMIC_CONTINUE, even if we don't have VM_SHARED. - In hugetlb_mcopy_atomic_pte, introduce "bool is_continue" to make that kind of mode check more terse. - Merged two nested if()s into a single expression in __mcopy_atomic_hugetlb. - Moved "return -EINVAL if MCOPY_CONTINUE isn't supported for this vma type" up one level, into __mcopy_atomic. - Rebased onto linux-next/akpm, instead of the latest 5.11 RC. Resolved conflicts with Mike's recent hugetlb changes. v3->v4: - Relaxed restriction for minor registration to allow any hugetlb VMAs, not just those with VM_SHARED. Fixed setting VM_WRITE flag in a CONTINUE ioctl for non-VM_SHARED VMAs. - Reordered if() branches in hugetlb_mcopy_atomic_pte, so the conditions are simpler and easier to read. - Reverted most of the mfill_atomic_pte change (the anon / shmem path). Just return -EINVAL for CONTINUE, and set zeropage = (mode == MCOPY_ATOMIC_ZEROPAGE), so we can keep the delta small. - Split out adding #ifdef CONFIG_USERFAULTFD to a separate patch (instead of lumping it together with adding UFFDIO_CONTINUE). - Fixed signature of hugetlb_mcopy_atomic_pte for !CONFIG_HUGETLB_PAGE (signature must be the same in either case). 
- Rebased onto a newer version of Peter's patches to disable huge PMD sharing. v2->v3: - Added #ifdef CONFIG_USERFAULTFD around hugetlb helper functions, to fix build errors when building without CONFIG_USERFAULTFD set. v1->v2: - Fixed a bug in the hugetlb_mcopy_atomic_pte retry case. We now plumb in the enum mcopy_atomic_mode, so we can differentiate between the three cases this function needs to handle: 1) We're doing a COPY op, and need to allocate a page, add to cache, etc. 2) We're doing a COPY op, but allocation in this function failed previously; we're in the retry path. The page was allocated, but not e.g. added to page cache, so that still needs to be done. 3) We're doing a CONTINUE op, we need to look up an existing page instead of allocating a new one. - Rebased onto a newer version of Peter's patches to disable huge PMD sharing, which fixes syzbot complaints on some non-x86 architectures. - Moved __VM_UFFD_FLAGS into
[PATCH v9 1/6] userfaultfd: add minor fault registration mode
This feature allows userspace to intercept "minor" faults. By "minor" faults, I mean the following situation:

Let there exist two mappings (i.e., VMAs) to the same page(s). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page.

This commit adds the new registration mode, and sets the relevant flag on the VMAs being registered. In the hugetlb fault path, if we find that we have huge_pte_none(), but find_lock_page() does indeed find an existing page, then we have a "minor" fault, and if the VMA has the userfaultfd registration flag, we call into userfaultfd to handle it.

This is implemented as a new registration mode, instead of an API feature. This is because the alternative implementation has significant drawbacks [1].

However, doing it this way requires we allocate a VM_* flag for the new registration mode. On 32-bit systems, there are no unused bits, so this feature is only supported on architectures with CONFIG_ARCH_USES_HIGH_VMA_FLAGS. When attempting to register a VMA in MINOR mode on 32-bit architectures, we return -EINVAL.
[1] https://lore.kernel.org/patchwork/patch/1380226/ Reviewed-by: Peter Xu Signed-off-by: Axel Rasmussen --- arch/arm64/Kconfig | 1 + arch/x86/Kconfig | 1 + fs/proc/task_mmu.c | 3 ++ fs/userfaultfd.c | 78 ++- include/linux/mm.h | 7 +++ include/linux/userfaultfd_k.h| 15 +- include/trace/events/mmflags.h | 7 +++ include/uapi/linux/userfaultfd.h | 15 +- init/Kconfig | 5 ++ mm/hugetlb.c | 79 +--- 10 files changed, 149 insertions(+), 62 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 1f212b47a48a..ce6044273ef1 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -208,6 +208,7 @@ config ARM64 select SWIOTLB select SYSCTL_EXCEPTION_TRACE select THREAD_INFO_IN_TASK + select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD help ARM 64-bit (AArch64) Linux support. diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 2792879d398e..7f71b71ed372 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -164,6 +164,7 @@ config X86 select HAVE_ARCH_TRANSPARENT_HUGEPAGE select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if X86_64 select HAVE_ARCH_USERFAULTFD_WP if X86_64 && USERFAULTFD + select HAVE_ARCH_USERFAULTFD_MINOR if X86_64 && USERFAULTFD select HAVE_ARCH_VMAP_STACK if X86_64 select HAVE_ARCH_WITHIN_STACK_FRAMES select HAVE_ASM_MODVERSIONS diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 3cec6fbef725..e1c9095ebe70 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -661,6 +661,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) [ilog2(VM_PKEY_BIT4)] = "", #endif #endif /* CONFIG_ARCH_HAS_PKEYS */ +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR + [ilog2(VM_UFFD_MINOR)] = "ui", +#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ }; size_t i; diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index e5ce3b4e6c3d..ba35cafa8b0d 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -197,24 +197,21 @@ static inline struct uffd_msg userfault_msg(unsigned long address, msg_init(&msg); msg.event = UFFD_EVENT_PAGEFAULT; 
msg.arg.pagefault.address = address; + /* +* These flags indicate why the userfault occurred: +* - UFFD_PAGEFAULT_FLAG_WP indicates a write protect fault. +* - UFFD_PAGEFAULT_FLAG_MINOR indicates a minor fault. +* - Neither of these flags being set indicates a MISSING fault. +* +* Separately, UFFD_PAGEFAULT_FLAG_WRITE indicates it was a write +* fault. Otherwise, it was a read fault. +*/ if (flags & FAULT_FLAG_WRITE) - /* -* If UFFD_FEATURE_PAGEFAULT_FLAG_WP was set in the -* uffdio_api.features and UFFD_PAGEFAULT_FLAG_WRITE -* was not set in a UFFD_EVENT_PAGEFAULT, it means it -* was a read fault, otherwise if set it means it's -* a write fault. -*/ msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WRITE; if (reason & VM_UFFD_WP) - /* -* If UFFD_FEATURE_PAGEFAULT_FLAG_WP was set in the -* uffdio_api.features and UFFD_PAGEFAULT_FLAG_WP was -* not set in a
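The new comment in userfault_msg() describes how a receiver can tell the three fault reasons apart: WP set means write protect, MINOR set means minor, neither means MISSING, and WRITE is reported independently. A minimal sketch of that decoding on the userspace side follows. The DEMO_* constants are local placeholders for illustration; a real program would take UFFD_PAGEFAULT_FLAG_* from <linux/userfaultfd.h> instead.

```c
#include <stdint.h>

/* Local placeholder bits, standing in for UFFD_PAGEFAULT_FLAG_*. */
#define DEMO_PAGEFAULT_FLAG_WRITE	(1u << 0)
#define DEMO_PAGEFAULT_FLAG_WP		(1u << 1)
#define DEMO_PAGEFAULT_FLAG_MINOR	(1u << 2)

enum demo_fault_kind {
	DEMO_FAULT_MISSING,
	DEMO_FAULT_WP,
	DEMO_FAULT_MINOR,
};

/* Why did the fault occur? Neither WP nor MINOR set means MISSING. */
static enum demo_fault_kind demo_classify_fault(uint64_t flags)
{
	if (flags & DEMO_PAGEFAULT_FLAG_WP)
		return DEMO_FAULT_WP;
	if (flags & DEMO_PAGEFAULT_FLAG_MINOR)
		return DEMO_FAULT_MINOR;
	return DEMO_FAULT_MISSING;
}

/* Separately: was it a write access (1) or a read access (0)? */
static int demo_fault_is_write(uint64_t flags)
{
	return (flags & DEMO_PAGEFAULT_FLAG_WRITE) != 0;
}
```

The branch order follows the handler in the selftest of this series, which checks the WP flag first and the MINOR flag second.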
Re: [PATCH 1/2] fs: eventpoll: fix comments & kernel-doc notation
Randy Dunlap writes: > Use the documented kernel-doc format for function Return: descriptions. > Begin constant values in kernel-doc comments with '%'. > > Remove kernel-doc "/**" from 2 functions that are not documented with > kernel-doc notation. > > Fix typos, punctuation, & grammar. > > Also fix a few kernel-doc warnings: > > ../fs/eventpoll.c:1883: warning: Function parameter or member 'ep' not > described in 'ep_loop_check_proc' > ../fs/eventpoll.c:1883: warning: Excess function parameter 'priv' description > in 'ep_loop_check_proc' > ../fs/eventpoll.c:1932: warning: Function parameter or member 'ep' not > described in 'ep_loop_check' > ../fs/eventpoll.c:1932: warning: Excess function parameter 'from' description > in 'ep_loop_check' > > Signed-off-by: Randy Dunlap > Cc: Jonathan Corbet > Cc: linux-...@vger.kernel.org > Cc: Andrew Morton > Cc: Alexander Viro > --- > Jon: Al says that he is OK with one of you merging this fs/ > (only comments) patch. > > fs/eventpoll.c | 52 +++ > 1 file changed, 26 insertions(+), 26 deletions(-) Both patches applied, thanks. jon
Re: [PATCH 5.4 000/338] 5.4.102-rc2 review
On 3/1/21 11:47 AM, Greg Kroah-Hartman wrote: > This is the start of the stable review cycle for the 5.4.102 release. > There are 338 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Wed, 03 Mar 2021 19:43:25 +. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.102-rc2.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-5.4.y > and the diffstat can be found below. > > thanks, > > greg k-h On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels: Tested-by: Florian Fainelli -- Florian
Re: [PATCH] Documentation: ioctl: add entry for nsfs.h
Randy Dunlap writes:

> All userspace ioctls major/magic number should be documented in
> Documentation/userspace-api/ioctl/ioctl-number.rst, so add
> the entry for nsfs.h.
>
> Signed-off-by: Randy Dunlap
> Cc: Andrey Vagin
> Cc: Serge Hallyn
> Cc: Eric W. Biederman
> Cc: linux-...@vger.kernel.org
> Cc: Jonathan Corbet
> ---
> Feel free to modify the patch as needed.
>
> Probably don't need to backport:
> # Fixes: 6786741dbf99 ("nsfs: add ioctl to get an owning user namespace for
> ns file descriptor")
>
> Documentation/userspace-api/ioctl/ioctl-number.rst |1 +
> 1 file changed, 1 insertion(+)

Applied (rather belatedly, sorry).

Thanks,

jon
Re: [PATCH v1 02/15] powerpc/uaccess: Define ___get_user_instr() for ppc32
Hi Christophe,

> +#else /* !CONFIG_PPC64 */
> +#define ___get_user_instr(gu_op, dest, ptr) \
> +	gu_op((dest).val, (u32 __user *)(ptr))
> +#endif /* CONFIG_PPC64 */
>
> #define get_user_instr(x, ptr) \
> 	___get_user_instr(get_user, x, ptr)
> @@ -91,18 +95,6 @@ static inline bool __access_ok(unsigned long addr, unsigned long size)
> #define __get_user_instr_inatomic(x, ptr) \
> 	___get_user_instr(__get_user_inatomic, x, ptr)
>
> -#else /* !CONFIG_PPC64 */
> -#define get_user_instr(x, ptr) \
> -	get_user((x).val, (u32 __user *)(ptr))
> -
> -#define __get_user_instr(x, ptr) \
> -	__get_user_nocheck((x).val, (u32 __user *)(ptr), sizeof(u32), true)
> -
> -#define __get_user_instr_inatomic(x, ptr) \
> -	__get_user_nosleep((x).val, (u32 __user *)(ptr), sizeof(u32))
> -
> -#endif /* CONFIG_PPC64 */

The previous version of __get_user_instr called __get_user_nocheck, this version calls __get_user. Likewise __get_user_instr_inatomic called __get_user_nosleep and now it calls __get_user_inatomic. I was confused by this until I chased the macro definitions and realised that both names refer to the same thing:

#define __get_user(x, ptr) \
	__get_user_nocheck((x), (ptr), sizeof(*(ptr)), true)
#define __get_user_inatomic(x, ptr) \
	__get_user_nosleep((x), (ptr), sizeof(*(ptr)))

(I don't think you need to do anything here, I'm just documenting what I considered while reviewing your patch.)

As such:
Reviewed-by: Daniel Axtens

Kind regards,
Daniel

> -
> extern long __put_user_bad(void);
>
> #define __put_user_size(x, ptr, size, retval)\
> --
> 2.25.0
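The refactoring reviewed here relies on passing the name of an accessor macro as an argument to a generic macro, so that each variant differs only in which accessor it forwards to. A minimal userspace sketch of that pattern follows. All demo_* names are hypothetical, and the two accessors are simplified stand-ins for get_user() and __get_user(), not their real implementations.

```c
#include <errno.h>
#include <stdint.h>

struct demo_inst {
	uint32_t val;
};

/* Simplified stand-ins for a checked and an unchecked accessor. */
#define demo_get_checked(x, ptr) \
	((ptr) ? ((x) = *(ptr), 0) : -EFAULT)
#define demo_get_unchecked(x, ptr) \
	((x) = *(ptr), 0)

/* Generic form: the accessor to use is passed in as 'gu_op'. */
#define demo_get_instr_with(gu_op, dest, ptr) \
	gu_op((dest).val, (const uint32_t *)(ptr))

/* The variants differ only in the accessor they forward to. */
#define demo_get_instr(x, ptr) \
	demo_get_instr_with(demo_get_checked, x, ptr)
#define demo_get_instr_nocheck(x, ptr) \
	demo_get_instr_with(demo_get_unchecked, x, ptr)
```

Because `gu_op` is substituted before the result is rescanned, the accessor name expands like any other function-like macro, which is why the variants can share one definition.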
Re: [PATCH] docs: networking: bonding.rst Fix a typo in bonding.rst
Hello: This patch was applied to netdev/net.git (refs/heads/master): On Mon, 1 Mar 2021 21:28:23 +0900 you wrote: > This patch fixes a spelling typo in bonding.rst. > > Signed-off-by: Masanari Iida > --- > Documentation/networking/bonding.rst | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Here is the summary with links: - docs: networking: bonding.rst Fix a typo in bonding.rst https://git.kernel.org/netdev/net/c/2353db75c3db You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
Re: [PATCH] Documentation: Replace more lkml.org links with lore
Kees Cook writes: > As started by commit 05a5f51ca566 ("Documentation: Replace lkml.org > links with lore"), replace a few more scattered lkml.org links with > lore to better use a single source that's more likely to stay available > long-term. > > Signed-off-by: Kees Cook > --- > CREDITS| 2 +- > tools/scripts/Makefile.include | 3 ++- > 2 files changed, 3 insertions(+), 2 deletions(-) I've (rather belatedly) applied this, thanks. jon
[tip:timers/urgent] BUILD SUCCESS 05f7fcc675f50001a30b8938c05d11ca9f599f8c
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers/urgent branch HEAD: 05f7fcc675f50001a30b8938c05d11ca9f599f8c hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event() elapsed time: 731m configs tested: 124 configs skipped: 2 The following configs have been built successfully. More configs may be tested in the coming days. gcc tested configs: arm64allyesconfig arm allyesconfig arm allmodconfig arm defconfig arm64 defconfig arm moxart_defconfig m68kq40_defconfig powerpc katmai_defconfig alpha defconfig ia64 alldefconfig powerpc makalu_defconfig sh se7724_defconfig mips xway_defconfig armrealview_defconfig mipsvocore2_defconfig powerpc mpc832x_rdb_defconfig powerpc walnut_defconfig m68kmvme16x_defconfig armvexpress_defconfig powerpc chrp32_defconfig i386 allyesconfig mipsjmr3927_defconfig arcnsim_700_defconfig arm nhk8815_defconfig arm sama5_defconfig powerpc maple_defconfig sh alldefconfig sh kfr2r09_defconfig powerpc mpc834x_itxgp_defconfig riscvalldefconfig arm spitz_defconfig powerpcwarp_defconfig xtensa common_defconfig armneponset_defconfig sh magicpanelr2_defconfig armzeus_defconfig mips cu1830-neo_defconfig sh rsk7269_defconfig mips mpc30x_defconfig arm versatile_defconfig nios2alldefconfig powerpc ebony_defconfig powerpc mpc8313_rdb_defconfig riscvnommu_virt_defconfig powerpc mpc834x_mds_defconfig sparc defconfig sparc64 defconfig shapsh4ad0a_defconfig powerpc canyonlands_defconfig sh sh7710voipgw_defconfig mips decstation_r4k_defconfig ia64 allmodconfig ia64defconfig ia64 allyesconfig m68k allmodconfig m68kdefconfig m68k allyesconfig nios2 defconfig arc allyesconfig nds32 allnoconfig c6x allyesconfig nds32 defconfig nios2allyesconfig cskydefconfig alphaallyesconfig xtensa allyesconfig h8300allyesconfig arc defconfig sh allmodconfig parisc defconfig s390 allyesconfig s390 allmodconfig parisc allyesconfig s390defconfig sparcallyesconfig i386 tinyconfig i386defconfig mips allyesconfig mips allmodconfig 
powerpc allyesconfig powerpc allmodconfig powerpc allnoconfig i386 randconfig-a006-20210228 i386 randconfig-a005-20210228 i386 randconfig-a004-20210228 i386 randconfig-a003-20210228 i386 randconfig-a001-20210228 i386 randconfig-a002-20210228 i386 randconfig-a005-20210301 i386 randconfig-a003-20210301 i386 randconfig-a002-20210301 i386 randconfig-a004-20210301 i386 randconfig-a006-20210301 i386 randconfig-a001-20210301 x86_64 randconfig-a013-20210301 x86_64 randconfig-a016-20210301 x86_64 randconfig-a015-20210301 x86_64 randconfig-a014-20210301 x86_64 randconfig-a012-20210301 x86_64 randconfig-a011
Re: [PATCH v1] microblaze: tag highmem_setup() with __meminit
On Mon, Mar 01, 2021 at 12:47:49PM +0100, David Hildenbrand wrote:
> With commit a0cd7a7c4bc0 ("mm: simplify free_highmem_page() and
> free_reserved_page()") the kernel test robot complains about a warning:
>
> WARNING: modpost: vmlinux.o(.text.unlikely+0x23ac): Section mismatch in
> reference from the function highmem_setup() to the function
> .meminit.text:memblock_is_reserved()
>
> This has been broken ever since microblaze added highmem support,
> because memblock_is_reserved() was already tagged with "__init" back then -
> most probably the function always got inlined, so we never stumbled over
> it.

It might be good to point out that we need __meminit instead of __init because the microblaze platform does not define CONFIG_ARCH_KEEP_MEMBLOCK, and __init_memblock then falls back to __meminit. (I had to go and look as I was puzzled :-) )

Reviewed-by: Oscar Salvador

>
> Reported-by: kernel test robot
> Fixes: 2f2f371f8907 ("microblaze: Highmem support")
> Cc: Andrew Morton
> Cc: Michal Simek
> Cc: Mike Rapoport
> Cc: Andrew Morton
> Cc: Thomas Gleixner
> Cc: Arvind Sankar
> Cc: Ira Weiny
> Cc: Randy Dunlap
> Cc: Oscar Salvador
> Cc: Anshuman Khandual
> Signed-off-by: David Hildenbrand
> ---
> arch/microblaze/mm/init.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c
> index 181e48782e6c..05cf1fb3f5ff 100644
> --- a/arch/microblaze/mm/init.c
> +++ b/arch/microblaze/mm/init.c
> @@ -52,7 +52,7 @@ static void __init highmem_init(void)
> pkmap_page_table = virt_to_kpte(PKMAP_BASE);
> }
>
> -static void highmem_setup(void)
> +static void __meminit highmem_setup(void)
> {
> unsigned long pfn;
>
> --
> 2.29.2
>
>

--
Oscar Salvador
SUSE L3
Re: [PATCH] docs: filesystem: Update smaps vm flag list to latest
Peter Xu writes: > We've missed a few documentation when adding new VM_* flags. Add the missing > pieces so they'll be in sync now. > > Signed-off-by: Peter Xu > --- > Documentation/filesystems/proc.rst | 5 + > 1 file changed, 5 insertions(+) So this patch doesn't apply; what version of the kernel did you generate it against? Could you redo against current kernels, please? Thanks, jon
Re: [x86, build] 6dafca9780: WARNING:at_arch/x86/kernel/ftrace.c:#ftrace_verify_code
On Sun, Feb 28, 2021 at 11:25 PM kernel test robot wrote: > > > Greeting, > > FYI, we noticed the following commit (built with clang-13): > > commit: 6dafca97803309c3cb5148d449bfa711e41ddef2 ("x86, build: use objtool > mcount") > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master Thanks for the report, I'm able to reproduce the warning. > [4.764496] [ ftrace bug ] > [4.764847] ftrace failed to modify > [4.764852] do_sys_open (kbuild/src/consumer/fs/open.c:1186) > [4.765483] actual: 0f:1f:44:00:00 > [4.765784] Setting ftrace call site to call ftrace function > [4.766193] ftrace record flags: 5001 > [4.766490] (1) R > [4.766490] expected tramp: 81037af0 > [4.766959] [ cut here ] Basically, the problem is that ftrace_replace_code() expects to find ideal_nops[NOP_ATOMIC5] here, which in this case is 66:66:66:66:90, while objtool has replaced the __fentry__ call with 0f:1f:44:00:00. As ideal_nops changes depending on kernel config and hardware, when CC_USING_NOP_MCOUNT is defined we could either change ftrace_nop_replace() to always use P6_NOP5, or skip ftrace_verify_code() in ftrace_replace_code() for FTRACE_UPDATE_MAKE_CALL. Steven, Peter, any thoughts? Sami
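The mismatch described above can be illustrated with the two 5-byte encodings quoted in the report. The sketch below models only the first suggested option (always expecting the 0f:1f:44:00:00 form when the compiler or objtool emitted NOPs). It is a userspace illustration under that assumption, and the demo_* names are hypothetical and do not correspond to kernel functions.

```c
#include <string.h>

#define DEMO_NOP_SIZE 5

/* The two 5-byte NOP encodings quoted in the report. */
static const unsigned char demo_nop_66[DEMO_NOP_SIZE] = {
	0x66, 0x66, 0x66, 0x66, 0x90
};
static const unsigned char demo_nop_0f1f[DEMO_NOP_SIZE] = {
	0x0f, 0x1f, 0x44, 0x00, 0x00
};

/*
 * Which bytes should verification expect at a call site? With NOPs
 * emitted at build time, expect the fixed 0f:1f:44:00:00 form instead
 * of whatever the running system considers ideal.
 */
static const unsigned char *demo_expected_nop(int using_nop_mcount,
					      const unsigned char *ideal_nop)
{
	return using_nop_mcount ? demo_nop_0f1f : ideal_nop;
}

/* Returns 1 if the call site holds exactly the expected bytes. */
static int demo_site_matches(const unsigned char *site,
			     const unsigned char *expected)
{
	return memcmp(site, expected, DEMO_NOP_SIZE) == 0;
}
```

With the ideal NOP set to the 66:66:66:66:90 form, a site holding 0f:1f:44:00:00 fails the comparison unless the expected bytes are chosen as in `demo_expected_nop()`, which is the situation the warning reports.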
Re: [PATCH v3 03/25] mm/vmstat: Add folio stat wrappers
On Mon, Mar 01, 2021 at 04:17:39PM -0500, Zi Yan wrote: > On 28 Jan 2021, at 2:03, Matthew Wilcox (Oracle) wrote: > > Allow page counters to be more readily modified by callers which have > > a folio. Name these wrappers with 'stat' instead of 'state' as requested > > Shouldn’t we change the stats with folio_nr_pages(folio) here? And all > changes below. Otherwise one folio is always counted as a single page. That's a good point. Looking through the changes in my current folio tree (which doesn't get as far as the thp tree did; ie doesn't yet allocate multi-page folios, so hasn't been tested with anything larger than a single page), the callers are ... @@ -2698,3 +2698,3 @@ int clear_page_dirty_for_io(struct page *page) - if (TestClearPageDirty(page)) { - dec_lruvec_page_state(page, NR_FILE_DIRTY); - dec_zone_page_state(page, NR_ZONE_WRITE_PENDING); + if (TestClearFolioDirty(folio)) { + dec_lruvec_folio_stat(folio, NR_FILE_DIRTY); + dec_zone_folio_stat(folio, NR_ZONE_WRITE_PENDING); @@ -2432,3 +2433,3 @@ void account_page_dirtied(struct page *page, struct address_space *mapping) - __inc_lruvec_page_state(page, NR_FILE_DIRTY); - __inc_zone_page_state(page, NR_ZONE_WRITE_PENDING); - __inc_node_page_state(page, NR_DIRTIED); + __inc_lruvec_folio_stat(folio, NR_FILE_DIRTY); + __inc_zone_folio_stat(folio, NR_ZONE_WRITE_PENDING); + __inc_node_folio_stat(folio, NR_DIRTIED); @@ -891 +890 @@ noinline int __add_to_page_cache_locked(struct page *page, - __inc_lruvec_page_state(page, NR_FILE_PAGES); + __inc_lruvec_folio_stat(folio, NR_FILE_PAGES); @@ -2759,2 +2759,2 @@ int test_clear_page_writeback(struct page *page) - dec_zone_page_state(page, NR_ZONE_WRITE_PENDING); - inc_node_page_state(page, NR_WRITTEN); + dec_zone_folio_stat(folio, NR_ZONE_WRITE_PENDING); + inc_node_folio_stat(folio, NR_WRITTEN); I think it's clear from this that I haven't found all the places that I need to change yet ;-) Looking at the places I did change in the thp tree, there are changes like this: @@ -860,27 +864,30 @@ noinline int __add_to_page_cache_locked(struct page *page, - if (!huge) - __inc_lruvec_page_state(page, NR_FILE_PAGES); + if (!huge) { + __mod_lruvec_page_state(page, NR_FILE_PAGES, nr); + if (nr > 1) + __mod_node_page_state(page_pgdat(page), + NR_FILE_THPS, nr); + } ... but I never did do some of the changes which the above changes imply are needed. So the thp tree probably had all kinds of bad statistics that I never noticed. So ... at least some of the users are definitely going to want to cache the 'nr_pages' and use it multiple times, including calling __mod_node_folio_state(), but others should do what you suggested. Thanks! I'll make that change.
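The change agreed on above can be sketched in userspace as follows. Everything here is a stand-in: struct folio, the counter array and the inc/dec/mod helper names are hypothetical and only mimic the shape of the kernel wrappers; folio_nr_pages() is the one name taken from the thread, modeled here as a power of two of an assumed order field. The idea is that an increment or decrement on a folio adjusts the counter by the number of pages in the folio rather than by one.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's struct folio. */
struct folio {
	unsigned int order;	/* folio spans 2^order pages */
};

/* Hypothetical subset of the stat items named in the thread. */
enum stat_item { NR_FILE_PAGES, NR_FILE_DIRTY, NR_STAT_ITEMS };

/* Toy global counters; the kernel uses per-cpu and per-node state. */
static long vm_stat[NR_STAT_ITEMS];

static long folio_nr_pages(const struct folio *folio)
{
	return 1L << folio->order;
}

/* Adjust a counter by an explicit page count (callers may cache nr). */
static void mod_folio_stat(enum stat_item item, long nr)
{
	vm_stat[item] += nr;
}

/* Count every page of the folio, not just one. */
static void inc_folio_stat(const struct folio *folio, enum stat_item item)
{
	assert(folio);
	mod_folio_stat(item, folio_nr_pages(folio));
}

static void dec_folio_stat(const struct folio *folio, enum stat_item item)
{
	assert(folio);
	mod_folio_stat(item, -folio_nr_pages(folio));
}
```

Callers that touch several counters for the same folio could compute folio_nr_pages() once and pass the cached value to mod_folio_stat() directly, which is the pattern the thp-tree hunk quoted above uses with 'nr'.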
Re: [PATCH 0/8] USB Audio Gadget part 2: Feedback endpoint, Volume/Mute support
Hi Ruslan, thanks a lot for your quick answer. > -Original Message- > From: Ruslan Bilovol > Sent: Monday, March 1, 2021 22:34 > To: Johannes Freyberger > Cc: Felipe Balbi ; Jonathan Corbet ; > Greg Kroah-Hartman ; Glenn Schmottlach > ; linux-...@vger.kernel.org; linux- > ker...@vger.kernel.org; Linux USB > Subject: Re: [PATCH 0/8] USB Audio Gadget part 2: Feedback endpoint, > Volume/Mute support > > Hi Johannes, > > On Mon, Mar 1, 2021 at 6:49 PM Johannes Freyberger > wrote: > > > > Hi Ruslan, > > > > thanks for all your efforts to make the USB Audio Gadget work in Win10 > > using UAC2. Meanwhile I managed to apply and compile your previous > > modifications and now my Raspberry PI shows up in the Windows Device > > Manager as a valid > > UAC2 audio device. Unfortunately it still doesn't work to transfer any > > audio as it seems the audio endpoints or the topology is not working. > > Are you testing my previous version of the patches on some older kernel? > > Just for records - these two patch sets (part 1 and part 2) are based on > Greg's > usb-next branch (commit b5a12546e779d4f5586f58e60e0ef5070a833a64 > which is based on v5.11-rc5 tag). I retested them today with a BBB board and > it works fine under Win 10. Also I rebased these two patchsets today against > latest Greg's usb-next branch which is now Linus's v5.12-rc1 tag and again it > works fine under Win10 - both Volume/Mute controls and audio streaming. > > These patches have been tested previously on Raspberry PI 4 running v5.9 > and v5.10 stable kernels. The only issues I've seen were because of > Raspberry's DWC2 DMA issue in the driver that I described in this cover > letter. > However if you disable volume/mute controls, it won't affect you. > > > I checked it > > with some tools and found one providing some information on the USB > > part (it's called UVCview.exe and is part of the Windows Driver Kit). 
> > Here's the output which I hope can give some hints on the problems > > still existing in this driver: > > From the output below I see UAC2 descriptors are completely screwed up > (or UVCview.exe doesn't show them correctly). Windows is very strict to the > descriptors and doesn't allow devices to start in case of any issues. > So if it appears as a valid UAC2 device in Device Manager, most likely > UVCview.exe doesn't decode UAC2 descriptors well. > You are right, they really look screwed up. Meanwhile I found another similar tool which also knows Audio 2.0 and here everything looks fine ( https://www.uwe-sieber.de/usbtreeview.html#download ) > Could you please also apply these patches to the latest kernel (v5.12-rc1) and > test? Yes, I'd like to do this and I want to apologize for my newbie questions in advance. But I have to admit I'm rather new to Linux, Kernel compiling etc. and I followed the description on https://www.raspberrypi.org/documentation/linux/kernel/building.md and then applied your patches - partially I had to do some modifications by hand as the sources had changed. The version I downloaded via "git clone --depth=1 https://github.com/raspberrypi/linux; seems to be Linux 5.10.17-v7l. And I cannot see the version you mention at https://github.com/raspberrypi/linux/branches . Where can I get the version v5.12-rc1 for these tests? > > Thanks, > Ruslan > Thanks to you for helping beginners like me, best regards, Johannes > > > > ---===>Device Information<===--- English product name: > > "Linux USB Audio Gadget" > > > > ConnectionStatus: > > Current Config Value: 0x01 -> Device Bus Speed: High > > Device Address:0x0F > > Open Pipes: 0 > > *!*ERROR: No open pipes! 
> > > > ===>Device Descriptor<=== > > bLength: 0x12 > > bDescriptorType: 0x01 > > bcdUSB: 0x0200 > > bDeviceClass: 0xEF -> This is a Multi-interface > > Function Code Device > > bDeviceSubClass: 0x02 -> This is the Common Class Sub > > Class > > bDeviceProtocol: 0x01 -> This is the Interface > > Association Descriptor protocol > > bMaxPacketSize0: 0x40 = (64) Bytes > > idVendor:0x1D6B = The Linux Foundation > > idProduct: 0x0101 > > bcdDevice: 0x0510 > > iManufacturer: 0x01 > > English (United States) "Linux 5.10.17-v7l-R3LAY_TEST+ with > > fe98.usb" > > iProduct: 0x02 > > English (United States) "Linux USB Audio Gadget" > > iSerialNumber: 0x00 > > bNumConfigurations:0x01 > > > > ===>Configuration Descriptor<=== > > bLength: 0x09 > > bDescriptorType: 0x02 > > wTotalLength:0x00E2 -> Validated > > bNumInterfaces:0x03 > >
Re: [PATCH] Documentation/submitting-patches: Extend commit message layout description
Borislav Petkov writes: > From: Borislav Petkov > Subject: [PATCH] Documentation/submitting-patches: Extend commit message > layout description > > Add more blurb about the level of detail that should be contained in a > patch's commit message. Extend and make more explicit what text should > be added under the --- line. Extend examples and split into more easily > palatable paragraphs. > > This has been partially carved out from a tip subsystem handbook > patchset by Thomas Gleixner: > > https://lkml.kernel.org/r/20181107171010.421878...@linutronix.de > > and incorporates follow-on comments. > > Signed-off-by: Borislav Petkov > --- > > /me sends the next generic topic blurb. > > Documentation/process/submitting-patches.rst | 89 > 1 file changed, 56 insertions(+), 33 deletions(-) Applied, with one tweak: > +If there are four patches in a patch series the individual patches may > +be numbered like this: 1/4, 2/4, 3/4, 4/4. This assures that developers > +understand the order in which the patches should be applied and that > +they have reviewed or applied all of the patches in the patch series. > > A couple of example Subjects:: > > Subject: [PATCH 2/5] ext2: improve scalability of bitmap searching > Subject: [PATCH v2 01/27] x86: fix eflags tracking > +Subject: [PATCH v2] sub/sys: Condensed patch summary > +Subject: [PATCH v2 M/N] sub/sys: Condensed patch summary It's no longer "a couple" so I made this "Here are some good example Subjects". Thanks, jon
Re: [PATCH v3 0/3] docs: arm: Improvements to Marvell SoC documentation
Lubomir Rintel writes: > Hi, > > please consider applying the patches chained to this message. > > The objective is to deal with a large amount of dead links to > material that often comes handy in marvel.rst; and improve some details > along the way. > > Compared to v2, the patches "[PATCH v2 2/5] docs: arm: marvell: fix 38x > functional spec link" and "[PATCH v2 5/5] docs: arm: marvell: rename > marvel.rst to marvell.rst" have been removed, because analogous patches > have already been applied. Also, more dead links have been removed, > reducing the count of links removed in [PATCH v3 1/3] to one. > Detailed changelogs in individual patches. I've applied parts 1 and 3; since there is evidently an archive link for the one killed in part 2, I left that out. Thanks, jon