Re: [PATCH 5/5] dma-direct: always allow dma mask <= physiscal memory size
On Mon, Nov 19, 2018 at 5:50 PM Robin Murphy wrote: > > On 19/11/2018 14:18, Ramon Fried wrote: > > On Tue, Oct 9, 2018 at 8:02 AM Benjamin Herrenschmidt > > wrote: > >> > >> On Wed, 2018-10-03 at 16:10 -0700, Alexander Duyck wrote: > -* Because 32-bit DMA masks are so common we expect every > architecture > -* to be able to satisfy them - either by not supporting more > physical > -* memory, or by providing a ZONE_DMA32. If neither is the > case, the > -* architecture needs to use an IOMMU instead of the direct > mapping. > -*/ > - if (mask < phys_to_dma(dev, DMA_BIT_MASK(32))) > + u64 min_mask; > + > + if (IS_ENABLED(CONFIG_ZONE_DMA)) > + min_mask = DMA_BIT_MASK(ARCH_ZONE_DMA_BITS); > + else > + min_mask = DMA_BIT_MASK(32); > + > + min_mask = min_t(u64, min_mask, (max_pfn - 1) << PAGE_SHIFT); > + > + if (mask >= phys_to_dma(dev, min_mask)) > return 0; > -#endif > return 1; > } > >>> > >>> So I believe I have run into the same issue that Guenter reported. On > >>> an x86_64 system w/ Intel IOMMU. I wasn't able to complete boot and > >>> all probe attempts for various devices were failing with -EIO errors. > >>> > >>> I believe the last mask check should be "if (mask < phys_to_dma(dev, > >>> min_mask))" not a ">=" check. > >> > >> Right, that test is backwards. I needed to change it here too (powermac > >> with the rest of the powerpc series). > >> > >> Cheers, > >> Ben. > >> > >> > > > > Hi, I'm working on a MIPS64 soc with PCIe root complex on it, and it > > appears that this series of patches are causing all PCI drivers that > > request 64bit mask to fail with -5. > > It's broken in 4.19. However, I just checked, it working on master. > > We may need to backport a couple of patches to 4.19. I'm not sure > > though which patches should be backported as there were at least 10 > > patches resolving this dma_direct area recently. > > Christoph, Robin. > > Can we ask Greg to backport all these changes ? What do you think ? > > As far as I'm aware, the only real issue in 4.19 was my subtle breakage > around setting bus_dma_mask - that's fixed by 6778be4e5209, which > according to my inbox got picked up by autosel for 4.19 stable last week. > > Robin. Yep, 6778be4e5209 fixes the issue. Thanks a lot ! Ramon.
Re: [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
On Tue, 2018-11-20 at 17:55 +1100, Alexey Kardashevskiy wrote: > > On 20/11/2018 14:51, Sam Bobroff wrote: > > On Tue, Nov 20, 2018 at 01:51:06PM +1100, Michael Ellerman wrote: > > > Alexey Kardashevskiy writes: > > > > > > > The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS > > > > call in > > > > skiboot's NPU driver does not touch the pci_error_type > > > > parameter so > > > > it might have garbage but the powernv code analyzes it > > > > nevertheless. > > > > > > > > This initializes pcierr and fstate to zero in all call sites. > > > > > > > > Signed-off-by: Alexey Kardashevskiy > > > > --- > > > > > > Can we tag this with a Fixes? And seems like it should probably > > > go to > > > stable, or can we not trigger this path on older kernels? > > > > > > cheers > > > > Hmm, it's triggered by use on an NPU PE so that would be any kernel > > that > > can run on P8 or later (AFAIK). > > > > It looks like the issue was present earlier, but the code was last > > touched when it was moved, in... > > > > 40ae5f693f6a ("powerpc/powernv: Drop PHB operation get_state()") > > > > ... which was back in v4.1. > > The original commit is > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8c41a7f3f7593fe57 > > 2013-06-20 13:21:09 +0800 > === > powerpc/eeh: I/O chip EEH state retrieval > > The patch adds I/O chip backend to retrieve the state for the > indicated PE. While the PE state is temperarily unavailable, > the upper layer (powernv platform) should return default delay > (1 second). > === > > This was 3.10-rc5 (this is what that sha1's Makefile has). > > But this was not the issue till skiboot decided not to zero these > parameters so "Fixes:" should point to what? It should still be that commit, it's perfectly reasonable (however unlikely) that someone could be running a 3.10 kernel with a modern skiboot. > > > > Sam. > > > > > > Without this, this happens: > > > > > > > > pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data > > > > EEH: PHB#6 failure detected, location: N/A > > > > CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted 4.19.0- > > > > le_f5a7bb7_aikATfstn1-p1 torvalds#106 > > > > Call Trace: > > > > [c03fea9df9c0] [c0a990ec] dump_stack+0xb0/0xf4 > > > > (unreliable) > > > > [c03fea9dfa00] [c0038480] > > > > eeh_dev_check_failure+0x1f0/0x5f0 > > > > [c03fea9dfaa0] [c00a2768] > > > > pnv_pci_read_config+0x128/0x160 > > > > [c03fea9dfae0] [c05d2b0c] > > > > pci_bus_read_config_dword+0x9c/0xf0 > > > > [c03fea9dfb40] [c05df3d4] pci_save_state+0x64/0x250 > > > > [c03fea9dfbc0] [c05e0730] > > > > pci_dev_save_and_disable+0x70/0xa0 > > > > [c03fea9dfbf0] [c05e4078] > > > > pci_try_reset_function+0x48/0xc0 > > > > [c03fea9dfc20] [c0081cbc1b1c] > > > > vfio_pci_ioctl+0x334/0xea0 [vfio_pci] > > > > [c03fea9dfcf0] [c0081ca9046c] > > > > vfio_device_fops_unl_ioctl+0x44/0x70 [vfio] > > > > [c03fea9dfd10] [c039fd84] do_vfs_ioctl+0xd4/0xa00 > > > > [c03fea9dfdb0] [c03a07b4] ksys_ioctl+0x104/0x120 > > > > [c03fea9dfe00] [c03a07f8] sys_ioctl+0x28/0x80 > > > > [c03fea9dfe20] [c000b3a4] system_call+0x5c/0x70 > > > > EEH: Detected error on PHB#6 > > > > EEH: This PCI device has failed 1 times in the last hour and > > > > will be permanently disabled after 5 fail > > > > ures. > > > > EEH: Notify device drivers to shutdown > > > > EEH: Beginning: 'error_detected(IO frozen)' > > > > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0) > > > > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0) > > > > EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci- > > > > >error_detected(IO frozen) > > > > EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can > > > > recover' > > > > EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci- > > > > >error_detected(IO frozen) > > > > EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can > > > > recover' > > > > EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci- > > > > >error_detected(IO frozen) > > > > EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can > > > > recover' > > > > EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci- > > > > >error_detected(IO frozen) > > > > EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can > > > > recover' > > > > EEH: Finished:'error_detected(IO frozen)' with aggregate > > > > recovery state:'can recover' > > > > EEH: Collect temporary log > > > > pnv_pci_dump_phb_diag_data: Unrecognized ioType 0 > > > > EEH: Reset without hotplug activity > > > > iommu: Removing device 0006:00:01.0 from group 4 > > > > iommu: Removing device 0006:00:01.1 from group 4 > > > > iommu: Removing device 0006:00:02.0 from group 4 > > > > iommu: Removing device 0006:00:02.1 from group 4 > > > > pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0 > > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7) > > > > pnv_eeh_restore_config: Can't reinit
Re: [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
On 20/11/2018 14:51, Sam Bobroff wrote: > On Tue, Nov 20, 2018 at 01:51:06PM +1100, Michael Ellerman wrote: >> Alexey Kardashevskiy writes: >> >>> The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in >>> skiboot's NPU driver does not touch the pci_error_type parameter so >>> it might have garbage but the powernv code analyzes it nevertheless. >>> >>> This initializes pcierr and fstate to zero in all call sites. >>> >>> Signed-off-by: Alexey Kardashevskiy >>> --- >> >> Can we tag this with a Fixes? And seems like it should probably go to >> stable, or can we not trigger this path on older kernels? >> >> cheers > > Hmm, it's triggered by use on an NPU PE so that would be any kernel that > can run on P8 or later (AFAIK). > > It looks like the issue was present earlier, but the code was last > touched when it was moved, in... > > 40ae5f693f6a ("powerpc/powernv: Drop PHB operation get_state()") > > ... which was back in v4.1. The original commit is https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8c41a7f3f7593fe57 2013-06-20 13:21:09 +0800 === powerpc/eeh: I/O chip EEH state retrieval The patch adds I/O chip backend to retrieve the state for the indicated PE. While the PE state is temperarily unavailable, the upper layer (powernv platform) should return default delay (1 second). === This was 3.10-rc5 (this is what that sha1's Makefile has). But this was not the issue till skiboot decided not to zero these parameters so "Fixes:" should point to what? > > Sam. > >>> Without this, this happens: >>> >>> pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data >>> EEH: PHB#6 failure detected, location: N/A >>> CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted >>> 4.19.0-le_f5a7bb7_aikATfstn1-p1 torvalds#106 >>> Call Trace: >>> [c03fea9df9c0] [c0a990ec] dump_stack+0xb0/0xf4 (unreliable) >>> [c03fea9dfa00] [c0038480] eeh_dev_check_failure+0x1f0/0x5f0 >>> [c03fea9dfaa0] [c00a2768] pnv_pci_read_config+0x128/0x160 >>> [c03fea9dfae0] [c05d2b0c] pci_bus_read_config_dword+0x9c/0xf0 >>> [c03fea9dfb40] [c05df3d4] pci_save_state+0x64/0x250 >>> [c03fea9dfbc0] [c05e0730] pci_dev_save_and_disable+0x70/0xa0 >>> [c03fea9dfbf0] [c05e4078] pci_try_reset_function+0x48/0xc0 >>> [c03fea9dfc20] [c0081cbc1b1c] vfio_pci_ioctl+0x334/0xea0 [vfio_pci] >>> [c03fea9dfcf0] [c0081ca9046c] vfio_device_fops_unl_ioctl+0x44/0x70 >>> [vfio] >>> [c03fea9dfd10] [c039fd84] do_vfs_ioctl+0xd4/0xa00 >>> [c03fea9dfdb0] [c03a07b4] ksys_ioctl+0x104/0x120 >>> [c03fea9dfe00] [c03a07f8] sys_ioctl+0x28/0x80 >>> [c03fea9dfe20] [c000b3a4] system_call+0x5c/0x70 >>> EEH: Detected error on PHB#6 >>> EEH: This PCI device has failed 1 times in the last hour and will be >>> permanently disabled after 5 fail >>> ures. >>> EEH: Notify device drivers to shutdown >>> EEH: Beginning: 'error_detected(IO frozen)' >>> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0) >>> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0) >>> EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci->error_detected(IO frozen) >>> EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can recover' >>> EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci->error_detected(IO frozen) >>> EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can recover' >>> EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci->error_detected(IO frozen) >>> EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can recover' >>> EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci->error_detected(IO frozen) >>> EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can recover' >>> EEH: Finished:'error_detected(IO frozen)' with aggregate recovery >>> state:'can recover' >>> EEH: Collect temporary log >>> pnv_pci_dump_phb_diag_data: Unrecognized ioType 0 >>> EEH: Reset without hotplug activity >>> iommu: Removing device 0006:00:01.0 from group 4 >>> iommu: Removing device 0006:00:01.1 from group 4 >>> iommu: Removing device 0006:00:02.0 from group 4 >>> iommu: Removing device 0006:00:02.1 from group 4 >>> pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0 >>> pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7) >>> pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7) >>> pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7) >>> pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7) >>> pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7) >>> pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7) >>> pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7) >>> pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7) >>> pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7) >>> pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7) >>> pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7) >>> pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7) >>> EEH: Sleep 5s ahead of partial hotplug >>> pci 0004:04 : [PE# 00] Setting
Re: [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
On Tue, Nov 20, 2018 at 01:51:06PM +1100, Michael Ellerman wrote: > Alexey Kardashevskiy writes: > > > The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in > > skiboot's NPU driver does not touch the pci_error_type parameter so > > it might have garbage but the powernv code analyzes it nevertheless. > > > > This initializes pcierr and fstate to zero in all call sites. > > > > Signed-off-by: Alexey Kardashevskiy > > --- > > Can we tag this with a Fixes? And seems like it should probably go to > stable, or can we not trigger this path on older kernels? > > cheers Hmm, it's triggered by use on an NPU PE so that would be any kernel that can run on P8 or later (AFAIK). It looks like the issue was present earlier, but the code was last touched when it was moved, in... 40ae5f693f6a ("powerpc/powernv: Drop PHB operation get_state()") ... which was back in v4.1. Sam. > > Without this, this happens: > > > > pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data > > EEH: PHB#6 failure detected, location: N/A > > CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted > > 4.19.0-le_f5a7bb7_aikATfstn1-p1 torvalds#106 > > Call Trace: > > [c03fea9df9c0] [c0a990ec] dump_stack+0xb0/0xf4 (unreliable) > > [c03fea9dfa00] [c0038480] eeh_dev_check_failure+0x1f0/0x5f0 > > [c03fea9dfaa0] [c00a2768] pnv_pci_read_config+0x128/0x160 > > [c03fea9dfae0] [c05d2b0c] pci_bus_read_config_dword+0x9c/0xf0 > > [c03fea9dfb40] [c05df3d4] pci_save_state+0x64/0x250 > > [c03fea9dfbc0] [c05e0730] pci_dev_save_and_disable+0x70/0xa0 > > [c03fea9dfbf0] [c05e4078] pci_try_reset_function+0x48/0xc0 > > [c03fea9dfc20] [c0081cbc1b1c] vfio_pci_ioctl+0x334/0xea0 [vfio_pci] > > [c03fea9dfcf0] [c0081ca9046c] vfio_device_fops_unl_ioctl+0x44/0x70 > > [vfio] > > [c03fea9dfd10] [c039fd84] do_vfs_ioctl+0xd4/0xa00 > > [c03fea9dfdb0] [c03a07b4] ksys_ioctl+0x104/0x120 > > [c03fea9dfe00] [c03a07f8] sys_ioctl+0x28/0x80 > > [c03fea9dfe20] [c000b3a4] system_call+0x5c/0x70 > > EEH: Detected error on PHB#6 > > EEH: This PCI device has failed 1 times in the last hour and will be > > permanently disabled after 5 fail > > ures. > > EEH: Notify device drivers to shutdown > > EEH: Beginning: 'error_detected(IO frozen)' > > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0) > > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0) > > EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci->error_detected(IO frozen) > > EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can recover' > > EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci->error_detected(IO frozen) > > EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can recover' > > EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci->error_detected(IO frozen) > > EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can recover' > > EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci->error_detected(IO frozen) > > EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can recover' > > EEH: Finished:'error_detected(IO frozen)' with aggregate recovery > > state:'can recover' > > EEH: Collect temporary log > > pnv_pci_dump_phb_diag_data: Unrecognized ioType 0 > > EEH: Reset without hotplug activity > > iommu: Removing device 0006:00:01.0 from group 4 > > iommu: Removing device 0006:00:01.1 from group 4 > > iommu: Removing device 0006:00:02.0 from group 4 > > iommu: Removing device 0006:00:02.1 from group 4 > > pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0 > > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7) > > pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7) > > pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7) > > pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7) > > pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7) > > pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7) > > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7) > > pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7) > > pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7) > > pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7) > > pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7) > > pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7) > > EEH: Sleep 5s ahead of partial hotplug > > pci 0004:04 : [PE# 00] Setting up window#0 0..3fff pg=1000 > > pci 0004:05 : [PE# 18] Setting up window#0 0..3fff pg=1000 > > pci 0004:06 : [PE# 30] Setting up window#0 0..3fff pg=1000 > > pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fff pg=1000 > > pci 0006:00:01.0: [PE# 0c] Setting up window 0..3fff pg=1000 > > pci 0006:00:02.0: [PE# 0b] Setting up window 0..3fff pg=1000 > > EEH: Beginning: 'slot_reset' > > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0) > > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0) > > EEH: Finished:'slot_reset' with aggregate recovery state:'none' > > EEH: Notify device driver to resume > > EEH: B
Re: [PATCH net] net/ibmnvic: Fix deadlock problem in reset
From: Juliet Kim Date: Mon, 19 Nov 2018 15:59:22 -0600 > This patch changes to use rtnl_lock only during a reset to avoid > deadlock that could occur when a thread operating close is holding > rtnl_lock and waiting for reset_lock acquired by another thread, > which is waiting for rtnl_lock in order to set the number of tx/rx > queues during a reset. > > Also, we now setting the number of tx/rx queues during a soft reset > for failover or LPM events. > > Signed-off-by: Juliet Kim Applied.
Re: [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
Alexey Kardashevskiy writes: > The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in > skiboot's NPU driver does not touch the pci_error_type parameter so > it might have garbage but the powernv code analyzes it nevertheless. > > This initializes pcierr and fstate to zero in all call sites. > > Signed-off-by: Alexey Kardashevskiy > --- Can we tag this with a Fixes? And seems like it should probably go to stable, or can we not trigger this path on older kernels? cheers > Without this, this happens: > > pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data > EEH: PHB#6 failure detected, location: N/A > CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted > 4.19.0-le_f5a7bb7_aikATfstn1-p1 torvalds#106 > Call Trace: > [c03fea9df9c0] [c0a990ec] dump_stack+0xb0/0xf4 (unreliable) > [c03fea9dfa00] [c0038480] eeh_dev_check_failure+0x1f0/0x5f0 > [c03fea9dfaa0] [c00a2768] pnv_pci_read_config+0x128/0x160 > [c03fea9dfae0] [c05d2b0c] pci_bus_read_config_dword+0x9c/0xf0 > [c03fea9dfb40] [c05df3d4] pci_save_state+0x64/0x250 > [c03fea9dfbc0] [c05e0730] pci_dev_save_and_disable+0x70/0xa0 > [c03fea9dfbf0] [c05e4078] pci_try_reset_function+0x48/0xc0 > [c03fea9dfc20] [c0081cbc1b1c] vfio_pci_ioctl+0x334/0xea0 [vfio_pci] > [c03fea9dfcf0] [c0081ca9046c] vfio_device_fops_unl_ioctl+0x44/0x70 > [vfio] > [c03fea9dfd10] [c039fd84] do_vfs_ioctl+0xd4/0xa00 > [c03fea9dfdb0] [c03a07b4] ksys_ioctl+0x104/0x120 > [c03fea9dfe00] [c03a07f8] sys_ioctl+0x28/0x80 > [c03fea9dfe20] [c000b3a4] system_call+0x5c/0x70 > EEH: Detected error on PHB#6 > EEH: This PCI device has failed 1 times in the last hour and will be > permanently disabled after 5 fail > ures. > EEH: Notify device drivers to shutdown > EEH: Beginning: 'error_detected(IO frozen)' > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0) > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0) > EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci->error_detected(IO frozen) > EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can recover' > EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci->error_detected(IO frozen) > EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can recover' > EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci->error_detected(IO frozen) > EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can recover' > EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci->error_detected(IO frozen) > EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can recover' > EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'can > recover' > EEH: Collect temporary log > pnv_pci_dump_phb_diag_data: Unrecognized ioType 0 > EEH: Reset without hotplug activity > iommu: Removing device 0006:00:01.0 from group 4 > iommu: Removing device 0006:00:01.1 from group 4 > iommu: Removing device 0006:00:02.0 from group 4 > iommu: Removing device 0006:00:02.1 from group 4 > pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0 > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7) > EEH: Sleep 5s ahead of partial hotplug > pci 0004:04 : [PE# 00] Setting up window#0 0..3fff pg=1000 > pci 0004:05 : [PE# 18] Setting up window#0 0..3fff pg=1000 > pci 0004:06 : [PE# 30] Setting up window#0 0..3fff pg=1000 > pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fff pg=1000 > pci 0006:00:01.0: [PE# 0c] Setting up window 0..3fff pg=1000 > pci 0006:00:02.0: [PE# 0b] Setting up window 0..3fff pg=1000 > EEH: Beginning: 'slot_reset' > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0) > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0) > EEH: Finished:'slot_reset' with aggregate recovery state:'none' > EEH: Notify device driver to resume > EEH: Beginning: 'resume' > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0) > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0) > EEH: Finished:'resume' > EEH: Recovery successful. > --- > arch/powerpc/platforms/powernv/eeh-powernv.c | 8 > arch/powerpc/platforms/powernv/pci-ioda.c| 4 ++-- > arch/powerpc/platforms/powernv/pci.c | 4 ++-- > 3 files changed, 8 insertions(+), 8 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c > b/arch/powerpc/platforms/powernv/eeh-powernv.c > index abc0be7..f380
Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
On 11/19/2018 6:49 PM, alex_gagn...@dellteam.com wrote: On 11/19/2018 02:33 PM, Sinan Kaya wrote: However; table assumes governance about for which entities firmware first should be enabled. There is no cross reference to _OSC or permission negotiation like _OST. Well, from an OSPM perspective, is FFS something that can be enabled or disabled? FFS seems to be static to OSPM, which would change the sort of assumptions we can reasonably make here. IMO, it can be enabled/disabled in BIOS. I have seen this implementation before. If the trigger is the presence of a statically compiled ACPI HEST table (as the current code does); presence of FFS would be static from OSPM perspective. BIOS could patch this table or hide it during boot. If FFS were to be negotiated via _OSC as indirectly implied in this series, then same BIOS could patch the ACPI table to return different values for the _OSC return. As I said in my previous email, the right place to talk about this is UEFI forum. The way I would present the problem to he spec writers is that, although the spec appears to be consistent, we've seen firmware vendors that made the wrong assumptions about HEST/_OSC. Instead of describing AER ownership with _OSC, they attempted to do it with HEST. So we should add an implementation note, or clarification about this. I agree. Cool. While the UEFI Secret Society debates, can we figure out if/how [patch 1/2] breaks those systems, or is it only [patch 2/2] of this series that we suspect? I went back and looked at both patches. Both of them are removing references to HEST table. I think both patches are impacted by this discussion. Alex
Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
On 11/19/2018 02:33 PM, Sinan Kaya wrote: > True. I was trying to get it out in a rush. I omitted words. Sounds like you'd make an top notch spec writer! :p > However; table assumes governance about for which entities firmware first > should be enabled. There is no cross reference to _OSC or permission > negotiation like _OST. Well, from an OSPM perspective, is FFS something that can be enabled or disabled? FFS seems to be static to OSPM, which would change the sort of assumptions we can reasonably make here. >>> As I said in my previous email, the right place to talk about this is UEFI >>> forum. >> >> The way I would present the problem to he spec writers is that, although >> the spec appears to be consistent, we've seen firmware vendors that made >> the wrong assumptions about HEST/_OSC. Instead of describing AER >> ownership with _OSC, they attempted to do it with HEST. So we should add >> an implementation note, or clarification about this. > > I agree. Cool. While the UEFI Secret Society debates, can we figure out if/how [patch 1/2] breaks those systems, or is it only [patch 2/2] of this series that we suspect? Alex
Re: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
On Mon, 2018-11-19 at 10:44 -0200, Breno Leitao wrote: > On a signal handler return, the user could set a context with MSR[TS] bits > set, and these bits would be copied to task regs->msr. > > At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set, > several __get_user() are called and then a recheckpoint is executed. Do we have the same problem on the entry side? There are a bunch of copy_from_user() before we do a tm_reclaim() (ie when still transactional). Is there some chance of the same problem there? > This is a problem since a page fault (in kernel space) could happen when > calling __get_user(). If it happens, the process MSR[TS] bits were > already set, but recheckpoint was not executed, and SPRs are still invalid. > > The page fault can cause the current process to be de-scheduled, with > MSR[TS] active and without tm_recheckpoint() being called. More > importantly, without TEXAR[FS] bit set also. This patch is great and should go in ASAP > Since TEXASR might not have the FS bit set, and when the process is > scheduled back, it will try to reclaim, which will be aborted because of > the CPU is not in the suspended state, and, then, recheckpoint. This > recheckpoint will restore thread->texasr into TEXASR SPR, which might be > zero, hitting a BUG_ON(). > > [ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446! Can you put more of the backtrace here? Can be useful for people searching for a similar problem. > This patch simply delays the MSR[TS] set, so, if there is any page fault in > the __get_user() section, it does not have regs->msr[TS] set, since the TM > structures are still invalid, thus avoiding doing TM operations for > in-kernel exceptions and possible process reschedule. Can you put a comment in the code what says after the MSR[TS] setting, there can be no get/put_user before the recheckpoint? Also, it looks like the 32bit signals case is safe, but please add a comment in there too. > With this patch, the MSR[TS] will only be set just before recheckpointing > and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in > invalid state. > > It is not possible to move tm_recheckpoint to happen earlier, because it is > required to get the checkpointed registers from userspace, with > __get_user(), thus, the only way to avoid this undesired behavior is > delaying the MSR[TS] set, as done in this patch. > > Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals") > Cc: sta...@vger.kernel.org (v3.9+) > Signed-off-by: Breno Leitao > --- > arch/powerpc/kernel/signal_64.c | 29 +++-- > 1 file changed, 15 insertions(+), 14 deletions(-) > > diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c > index 83d51bf586c7..15b153bdd826 100644 > --- a/arch/powerpc/kernel/signal_64.c > +++ b/arch/powerpc/kernel/signal_64.c > @@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct > *tsk, > if (MSR_TM_RESV(msr)) > return -EINVAL; > > - /* pull in MSR TS bits from user context */ > - regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK); > - > - /* > - * Ensure that TM is enabled in regs->msr before we leave the signal > - * handler. It could be the case that (a) user disabled the TM bit > - * through the manipulation of the MSR bits in uc_mcontext or (b) the > - * TM bit was disabled because a sufficient number of context switches > - * happened whilst in the signal handler and load_tm overflowed, > - * disabling the TM bit. In either case we can end up with an illegal > - * TM state leading to a TM Bad Thing when we return to userspace. > - */ > - regs->msr |= MSR_TM; > - > /* pull in MSR LE from user context */ > regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE); > > @@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct > *tsk, > tm_enable(); > /* Make sure the transaction is marked as failed */ > tsk->thread.tm_texasr |= TEXASR_FS; > + > + /* pull in MSR TS bits from user context */ > + regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK); > + > + /* > + * Ensure that TM is enabled in regs->msr before we leave the signal > + * handler. It could be the case that (a) user disabled the TM bit > + * through the manipulation of ararararthe MSR bits in uc_mcontext or > (b) the > + * TM bit was disabled because a sufficient number of context switches > + * happened whilst in the signal handler and load_tm overflowed, > + * disabling the TM bit. In either case we can end up with an illegal > + * TM state leading to a TM Bad Thing when we return to userspace. > + */ > + regs->msr |= MSR_TM; > + > /* This loads the checkpointed FP/VEC state, if used */ > tm_recheckpoint(&tsk->thread); >
Re: [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
On Mon, Nov 19, 2018 at 03:25:17PM +1100, Alexey Kardashevskiy wrote: > The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in > skiboot's NPU driver does not touch the pci_error_type parameter so > it might have garbage but the powernv code analyzes it nevertheless. > > This initializes pcierr and fstate to zero in all call sites. > > Signed-off-by: Alexey Kardashevskiy The skiboot code for npu2_freeze_status() is just: *freeze_state = OPAL_EEH_STOPPED_NOT_FROZEN; return OPAL_SUCCESS; It looks like npu_freeze_status() is affected as well. I think we could consider these skiboot bugs, but this still seems like an improvement. Reviewed-by: Sam Bobroff > --- > > Without this, this happens: > > pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data > EEH: PHB#6 failure detected, location: N/A > CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted > 4.19.0-le_f5a7bb7_aikATfstn1-p1 torvalds#106 > Call Trace: > [c03fea9df9c0] [c0a990ec] dump_stack+0xb0/0xf4 (unreliable) > [c03fea9dfa00] [c0038480] eeh_dev_check_failure+0x1f0/0x5f0 > [c03fea9dfaa0] [c00a2768] pnv_pci_read_config+0x128/0x160 > [c03fea9dfae0] [c05d2b0c] pci_bus_read_config_dword+0x9c/0xf0 > [c03fea9dfb40] [c05df3d4] pci_save_state+0x64/0x250 > [c03fea9dfbc0] [c05e0730] pci_dev_save_and_disable+0x70/0xa0 > [c03fea9dfbf0] [c05e4078] pci_try_reset_function+0x48/0xc0 > [c03fea9dfc20] [c0081cbc1b1c] vfio_pci_ioctl+0x334/0xea0 [vfio_pci] > [c03fea9dfcf0] [c0081ca9046c] vfio_device_fops_unl_ioctl+0x44/0x70 > [vfio] > [c03fea9dfd10] [c039fd84] do_vfs_ioctl+0xd4/0xa00 > [c03fea9dfdb0] [c03a07b4] ksys_ioctl+0x104/0x120 > [c03fea9dfe00] [c03a07f8] sys_ioctl+0x28/0x80 > [c03fea9dfe20] [c000b3a4] system_call+0x5c/0x70 > EEH: Detected error on PHB#6 > EEH: This PCI device has failed 1 times in the last hour and will be > permanently disabled after 5 fail > ures. > EEH: Notify device drivers to shutdown > EEH: Beginning: 'error_detected(IO frozen)' > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0) > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0) > EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci->error_detected(IO frozen) > EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can recover' > EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci->error_detected(IO frozen) > EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can recover' > EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci->error_detected(IO frozen) > EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can recover' > EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci->error_detected(IO frozen) > EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can recover' > EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'can > recover' > EEH: Collect temporary log > pnv_pci_dump_phb_diag_data: Unrecognized ioType 0 > EEH: Reset without hotplug activity > iommu: Removing device 0006:00:01.0 from group 4 > iommu: Removing device 0006:00:01.1 from group 4 > iommu: Removing device 0006:00:02.0 from group 4 > iommu: Removing device 0006:00:02.1 from group 4 > pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0 > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7) > pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7) > EEH: Sleep 5s ahead of partial hotplug > pci 0004:04 : [PE# 00] Setting up window#0 0..3fff pg=1000 > pci 0004:05 : [PE# 18] Setting up window#0 0..3fff pg=1000 > pci 0004:06 : [PE# 30] Setting up window#0 0..3fff pg=1000 > pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fff pg=1000 > pci 0006:00:01.0: [PE# 0c] Setting up window 0..3fff pg=1000 > pci 0006:00:02.0: [PE# 0b] Setting up window 0..3fff pg=1000 > EEH: Beginning: 'slot_reset' > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0) > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0) > EEH: Finished:'slot_reset' with aggregate recovery state:'none' > EEH: Notify device driver to resume > EEH: Beginning: 'resume' > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0) > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0) > EEH: Finished:'resume' > EEH: Recovery successful. > --- > arch/powerpc/platforms/powernv/eeh-powernv.c | 8 > arch/powerpc/platforms/powernv/pci-ioda.c| 4 ++-- > arch/powerpc/platforms/po
Re: [PATCH v2] powerpc/math-emu: Update macros from gmp-6.1.2
On Tue, 20 Nov 2018 at 05:24, Nick Desaulniers wrote: > > > > The only functional change I noticed was this in udiv_qrnnd. > > > > __r1 = (n1) % __d1; > > __q1 = (n1) / __d1; > > > > Becomes this: > > > > __q1 = (n1) / __d1; > > __r1 = (n1) - __q1 * __d1; > > > > This is equivalent as it instead of calculating the remainder > > using modulo, it uses the result of integer division to subtract the > > count of 'whole' d1 from r1. > > I don't understand this; why was this functional change made? I couldn't see why. It pre-dates GMP's mecurial history that . \ > > -(q) = (UWtype) __q1 * __ll_B | __q0; \ > > +(q) = __q1 * __ll_B | __q0; > > \ > > (r) = __r0; > > \ > >} while (0) > > This appears to now differ from the upstream source: > https://github.com/gcc-mirror/gcc/blob/f7289f563a5e447ac21a5901ed75aad0dbd37732/include/longlong.h#L1679 AIUI the upstream source is GMP: https://gmplib.org/repo/gmp/file/tip/longlong.h > If we're going to borrow implementations from GCC, let's borrow the > same implementation. Otherwise it's hard to have confidence in this > part of the patch. I agree we should use the upstream source. Segher, which tree contains the One True Upstream for longlong.h? Cheers, Joel
Re: [PATCH v02] powerpc/numa: Perform full re-add of CPU for PRRN/VPHN topology update
Hi Michael, Thank you for the patch! Yet something to improve: [auto build test ERROR on powerpc/next] [also build test ERROR on v4.20-rc3 next-20181119] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Michael-Bringmann/powerpc-numa-Perform-full-re-add-of-CPU-for-PRRN-VPHN-topology-update/20181119-224033 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next config: powerpc-allyesconfig (attached as .config) compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=7.2.0 make.cross ARCH=powerpc All error/warnings (new ones prefixed by >>): arch/powerpc/mm/numa.c: In function 'numa_update_cpu_topology': >> arch/powerpc/mm/numa.c:1361:4: error: implicit declaration of function >> 'dlpar_cpu_readd'; did you mean 'raw_cpu_read'? >> [-Werror=implicit-function-declaration] dlpar_cpu_readd(cpu); ^~~ raw_cpu_read arch/powerpc/mm/numa.c: In function 'topology_schedule_update': >> arch/powerpc/mm/numa.c:1451:2: warning: this 'if' clause does not guard... >> [-Wmisleading-indentation] if (!topology_update_in_progress); ^~ arch/powerpc/mm/numa.c:1452:3: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if' schedule_work(&topology_work); ^ cc1: some warnings being treated as errors vim +1361 arch/powerpc/mm/numa.c 1298 1299 /* 1300 * Update the node maps and sysfs entries for each cpu whose home node 1301 * has changed. Returns 1 when the topology has changed, and 0 otherwise. 1302 * 1303 * readd_cpus: Also readd any CPUs that have changed affinity 1304 */ 1305 static int numa_update_cpu_topology(bool readd_cpus) 1306 { 1307 unsigned int cpu, sibling, changed = 0; 1308 struct topology_update_data *updates, *ud; 1309 cpumask_t updated_cpus; 1310 struct device *dev; 1311 int weight, new_nid, i = 0; 1312 1313 if ((!prrn_enabled && !vphn_enabled && topology_inited) || 1314 topology_update_in_progress) 1315 return 0; 1316 1317 weight = cpumask_weight(&cpu_associativity_changes_mask); 1318 if (!weight) 1319 return 0; 1320 1321 updates = kcalloc(weight, sizeof(*updates), GFP_KERNEL); 1322 if (!updates) 1323 return 0; 1324 1325 topology_update_in_progress = 1; 1326 1327 cpumask_clear(&updated_cpus); 1328 1329 for_each_cpu(cpu, &cpu_associativity_changes_mask) { 1330 /* 1331 * If siblings aren't flagged for changes, updates list 1332 * will be too short. Skip on this update and set for next 1333 * update. 1334 */ 1335 if (!cpumask_subset(cpu_sibling_mask(cpu), 1336 &cpu_associativity_changes_mask)) { 1337 pr_info("Sibling bits not set for associativity " 1338 "change, cpu%d\n", cpu); 1339 cpumask_or(&cpu_associativity_changes_mask, 1340 &cpu_associativity_changes_mask, 1341 cpu_sibling_mask(cpu)); 1342 cpu = cpu_last_thread_sibling(cpu); 1343 continue; 1344 } 1345 1346 new_nid = find_and_online_cpu_nid(cpu); 1347 1348 if ((new_nid == numa_cpu_lookup_table[cpu]) || 1349 !cpu_present(cpu)) { 1350 cpumask_andnot(&cpu_associativity_changes_mask, 1351 &cpu_associativity_changes_mask, 1352 cpu_sibling_mask(cpu)); 1353 if (cpu_present(cpu)) 1354 dbg("Assoc chg gives same node %d for cpu%d\n", 1355 new_nid, cpu); 1356 cpu = cpu_last_thread_sibling(cpu); 1357 continue; 1358 } 1359 1360 if (readd_cpus) > 1361 dlpar_cpu_readd(cpu); 1362 1363 for_each_cpu(sibli
[PATCH net] net/ibmnvic: Fix deadlock problem in reset
This patch changes to use rtnl_lock only during a reset to avoid deadlock that could occur when a thread operating close is holding rtnl_lock and waiting for reset_lock acquired by another thread, which is waiting for rtnl_lock in order to set the number of tx/rx queues during a reset. Also, we now setting the number of tx/rx queues during a soft reset for failover or LPM events. Signed-off-by: Juliet Kim --- drivers/net/ethernet/ibm/ibmvnic.c | 59 +--- drivers/net/ethernet/ibm/ibmvnic.h |2 + 2 files changed, 22 insertions(+), 39 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 7893bef..4a5de59 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1103,20 +1103,15 @@ static int ibmvnic_open(struct net_device *netdev) return 0; } - mutex_lock(&adapter->reset_lock); - if (adapter->state != VNIC_CLOSED) { rc = ibmvnic_login(netdev); - if (rc) { - mutex_unlock(&adapter->reset_lock); + if (rc) return rc; - } rc = init_resources(adapter); if (rc) { netdev_err(netdev, "failed to initialize resources\n"); release_resources(adapter); - mutex_unlock(&adapter->reset_lock); return rc; } } @@ -1124,8 +1119,6 @@ static int ibmvnic_open(struct net_device *netdev) rc = __ibmvnic_open(netdev); netif_carrier_on(netdev); - mutex_unlock(&adapter->reset_lock); - return rc; } @@ -1269,10 +1262,8 @@ static int ibmvnic_close(struct net_device *netdev) return 0; } - mutex_lock(&adapter->reset_lock); rc = __ibmvnic_close(netdev); ibmvnic_cleanup(netdev); - mutex_unlock(&adapter->reset_lock); return rc; } @@ -1820,20 +1811,15 @@ static int do_reset(struct ibmvnic_adapter *adapter, return rc; } else if (adapter->req_rx_queues != old_num_rx_queues || adapter->req_tx_queues != old_num_tx_queues) { - adapter->map_id = 1; release_rx_pools(adapter); release_tx_pools(adapter); - rc = init_rx_pools(netdev); - if (rc) - return rc; - rc = init_tx_pools(netdev); - if (rc) - return rc; - release_napi(adapter); - rc = init_napi(adapter); + release_vpd_data(adapter); + + rc = init_resources(adapter); if (rc) return rc; + } else { rc = reset_tx_pools(adapter); if (rc) @@ -1917,17 +1903,8 @@ static int do_hard_reset(struct ibmvnic_adapter *adapter, adapter->state = VNIC_PROBED; return 0; } - /* netif_set_real_num_xx_queues needs to take rtnl lock here -* unless wait_for_reset is set, in which case the rtnl lock -* has already been taken before initializing the reset -*/ - if (!adapter->wait_for_reset) { - rtnl_lock(); - rc = init_resources(adapter); - rtnl_unlock(); - } else { - rc = init_resources(adapter); - } + + rc = init_resources(adapter); if (rc) return rc; @@ -1986,13 +1963,21 @@ static void __ibmvnic_reset(struct work_struct *work) struct ibmvnic_rwi *rwi; struct ibmvnic_adapter *adapter; struct net_device *netdev; + bool we_lock_rtnl = false; u32 reset_state; int rc = 0; adapter = container_of(work, struct ibmvnic_adapter, ibmvnic_reset); netdev = adapter->netdev; - mutex_lock(&adapter->reset_lock); + /* netif_set_real_num_xx_queues needs to take rtnl lock here +* unless wait_for_reset is set, in which case the rtnl lock +* has already been taken before initializing the reset +*/ + if (!adapter->wait_for_reset) { + rtnl_lock(); + we_lock_rtnl = true; + } reset_state = adapter->state; rwi = get_next_rwi(adapter); @@ -2020,12 +2005,11 @@ static void __ibmvnic_reset(struct work_struct *work) if (rc) { netdev_dbg(adapter->netdev, "Reset failed\n"); free_all_rwi(adapter); - mutex_unlock(&adapter->reset_lock); - return; } adapter->resetting = false; - mutex_unlock(&adapter->reset_lock); + if (we_lock
Re: [PATCH 4/9] PCI: consolidate PCI config entry in drivers/pci
Hi Christoph, On Thu, Nov 15, 2018 at 08:05:32PM +0100, Christoph Hellwig wrote: > There is no good reason to duplicate the PCI menu in every architecture. > Instead provide a selectable HAVE_PCI symbol that indicates availability > of PCI support, and a FORCE_PCI symbol to for PCI on and the handle the > rest in drivers/pci. > > Signed-off-by: Christoph Hellwig > Reviewed-by: Palmer Dabbelt > Acked-by: Max Filippov > Acked-by: Thomas Gleixner > Acked-by: Bjorn Helgaas > Acked-by: Geert Uytterhoeven For patches 4, 5, 7, 8 & 9: Acked-by: Paul Burton # MIPS parts Thanks, Paul
[PATCH v3] powerpc/ptrace: replace ptrace_report_syscall() with a tracehook call
From: Elvira Khabirova Arch code should use tracehook_*() helpers as documented in include/linux/tracehook.h, ptrace_report_syscall() is not expected to be used outside that file. Co-authored-by: Dmitry V. Levin Fixes: 5521eb4bca2d ("powerpc/ptrace: Add support for PTRACE_SYSEMU") Signed-off-by: Elvira Khabirova Signed-off-by: Dmitry V. Levin --- v3: add a descriptive comment v2: explicitly ignore tracehook_report_syscall_entry() return code arch/powerpc/kernel/ptrace.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c index afb819f4ca68..e84220d91bbd 100644 --- a/arch/powerpc/kernel/ptrace.c +++ b/arch/powerpc/kernel/ptrace.c @@ -3266,7 +3266,12 @@ long do_syscall_trace_enter(struct pt_regs *regs) user_exit(); if (test_thread_flag(TIF_SYSCALL_EMU)) { - ptrace_report_syscall(regs); + /* +* A nonzero return code from tracehook_report_syscall_entry() +* tells us to prevent the syscall execution, but we are not +* going to execute it anyway. +*/ + (void) tracehook_report_syscall_entry(regs); /* * Returning -1 will skip the syscall execution. We want to * avoid clobbering any register also, thus, not 'gotoing' -- ldv
Re: [PATCH 3/9] MIPS: remove the HT_PCI config option
On Mon, Nov 19, 2018 at 01:01:41PM -0800, Paul Burton wrote: > On Thu, Nov 15, 2018 at 08:05:31PM +0100, Christoph Hellwig wrote: > > This option is always selected from LOONGSON_MACH3X. Switch to just > > seleting PCI from that option and definining LOONGSON_PCIIO_BASE based > > on CONFIG_LOONGSON_MACH3X. > > > > Signed-off-by: Christoph Hellwig > > --- > > arch/mips/Kconfig| 11 --- > > arch/mips/include/asm/mach-loongson64/loongson.h | 2 +- > > arch/mips/loongson64/Kconfig | 2 +- > > 3 files changed, 2 insertions(+), 13 deletions(-) > > > > diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig > > index 8272ea4c7264..7d28c9dd75d0 100644 > > --- a/arch/mips/Kconfig > > +++ b/arch/mips/Kconfig > > @@ -3040,17 +3040,6 @@ config PCI > > your box. Other bus systems are ISA, EISA, or VESA. If you have PCI, > > say Y, otherwise N. > > > > -config HT_PCI > > - bool "Support for HT-linked PCI" > > - default y > > - depends on CPU_LOONGSON3 > > - select PCI > > - select PCI_DOMAINS > > - help > > - Loongson family machines use Hyper-Transport bus for inter-core > > - connection and device connection. The PCI bus is a subordinate > > - linked at HT. Choose Y for Loongson-3 based machines. > > - > > config PCI_DOMAINS > > bool > > > > diff --git a/arch/mips/loongson64/Kconfig b/arch/mips/loongson64/Kconfig > > index c865b4b9b775..781a5156ab21 100644 > > --- a/arch/mips/loongson64/Kconfig > > +++ b/arch/mips/loongson64/Kconfig > > @@ -76,7 +76,7 @@ config LOONGSON_MACH3X > > select CPU_HAS_WB > > select HW_HAS_PCI > > select ISA > > - select HT_PCI > > + select PCI > > select I8259 > > select IRQ_MIPS_CPU > > select NR_CPUS_DEFAULT_4 > > Should this also select PCI_DOMAINS to preserve the existing behavior? > > If not, could you explain why in the commit message? Ah, I see - PCI already selects PCI_DOMAINS. I think it would have been worth mentioning but I don't mind if you don't think it a big enough deal to respin the patch, so: Acked-by: Paul Burton Thanks, Paul
Re: [PATCH 3/9] MIPS: remove the HT_PCI config option
Hi Christoph, On Thu, Nov 15, 2018 at 08:05:31PM +0100, Christoph Hellwig wrote: > This option is always selected from LOONGSON_MACH3X. Switch to just > seleting PCI from that option and definining LOONGSON_PCIIO_BASE based > on CONFIG_LOONGSON_MACH3X. > > Signed-off-by: Christoph Hellwig > --- > arch/mips/Kconfig| 11 --- > arch/mips/include/asm/mach-loongson64/loongson.h | 2 +- > arch/mips/loongson64/Kconfig | 2 +- > 3 files changed, 2 insertions(+), 13 deletions(-) > > diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig > index 8272ea4c7264..7d28c9dd75d0 100644 > --- a/arch/mips/Kconfig > +++ b/arch/mips/Kconfig > @@ -3040,17 +3040,6 @@ config PCI > your box. Other bus systems are ISA, EISA, or VESA. If you have PCI, > say Y, otherwise N. > > -config HT_PCI > - bool "Support for HT-linked PCI" > - default y > - depends on CPU_LOONGSON3 > - select PCI > - select PCI_DOMAINS > - help > - Loongson family machines use Hyper-Transport bus for inter-core > - connection and device connection. The PCI bus is a subordinate > - linked at HT. Choose Y for Loongson-3 based machines. > - > config PCI_DOMAINS > bool > > diff --git a/arch/mips/loongson64/Kconfig b/arch/mips/loongson64/Kconfig > index c865b4b9b775..781a5156ab21 100644 > --- a/arch/mips/loongson64/Kconfig > +++ b/arch/mips/loongson64/Kconfig > @@ -76,7 +76,7 @@ config LOONGSON_MACH3X > select CPU_HAS_WB > select HW_HAS_PCI > select ISA > - select HT_PCI > + select PCI > select I8259 > select IRQ_MIPS_CPU > select NR_CPUS_DEFAULT_4 Should this also select PCI_DOMAINS to preserve the existing behavior? If not, could you explain why in the commit message? Thanks, Paul
Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
On Thu, Nov 15, 2018 at 8:49 PM Sinan Kaya wrote: > > On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote: > > I've asked around a few people at Dell and they unanimously agree that > > _OSC is the correct way to determine ownership of AER. In linux, we > > use the result of _OSC to enable AER services, but we use HEST to > > determine AER ownership. That's inconsistent. This series drops the > > use of HEST in favor of _OSC. > > > > [1]https://lkml.org/lkml/2018/11/15/62 > > This change breaks the existing systems that rely on the HEST table > telling the operating system about firmware first presence. > > Besides, HEST table has much more granularity about which PCI component > needs firmware such as global/device/switch. > > You should probably circulate these ideas for wider consumption in UEFI > forum as UEFI owns the HEST table definition. I agree with Sinan, this will break existing systems, and the granularity of the HEST definition is more useful than the single bit in _OSC.
Re: [PATCH 5/5] dma-direct: always allow dma mask <= physiscal memory size
On Tue, Oct 9, 2018 at 8:02 AM Benjamin Herrenschmidt wrote: > > On Wed, 2018-10-03 at 16:10 -0700, Alexander Duyck wrote: > > > -* Because 32-bit DMA masks are so common we expect every > > > architecture > > > -* to be able to satisfy them - either by not supporting more > > > physical > > > -* memory, or by providing a ZONE_DMA32. If neither is the case, > > > the > > > -* architecture needs to use an IOMMU instead of the direct > > > mapping. > > > -*/ > > > - if (mask < phys_to_dma(dev, DMA_BIT_MASK(32))) > > > + u64 min_mask; > > > + > > > + if (IS_ENABLED(CONFIG_ZONE_DMA)) > > > + min_mask = DMA_BIT_MASK(ARCH_ZONE_DMA_BITS); > > > + else > > > + min_mask = DMA_BIT_MASK(32); > > > + > > > + min_mask = min_t(u64, min_mask, (max_pfn - 1) << PAGE_SHIFT); > > > + > > > + if (mask >= phys_to_dma(dev, min_mask)) > > > return 0; > > > -#endif > > > return 1; > > > } > > > > So I believe I have run into the same issue that Guenter reported. On > > an x86_64 system w/ Intel IOMMU. I wasn't able to complete boot and > > all probe attempts for various devices were failing with -EIO errors. > > > > I believe the last mask check should be "if (mask < phys_to_dma(dev, > > min_mask))" not a ">=" check. > > Right, that test is backwards. I needed to change it here too (powermac > with the rest of the powerpc series). > > Cheers, > Ben. > > Hi, I'm working on a MIPS64 soc with PCIe root complex on it, and it appears that this series of patches are causing all PCI drivers that request 64bit mask to fail with -5. It's broken in 4.19. However, I just checked, it working on master. We may need to backport a couple of patches to 4.19. I'm not sure though which patches should be backported as there were at least 10 patches resolving this dma_direct area recently. Christoph, Robin. Can we ask Greg to backport all these changes ? What do you think ? Thanks, Ramon.
Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
On 11/19/2018 3:16 PM, alex_gagn...@dellteam.com wrote: On 11/19/2018 01:32 PM, Sinan Kaya wrote: ACPI 6.2: 18.3.2.4 PCI Express Root Port AER Structure Flags: Bit [0] - FIRMWARE_FIRST: If set, this bit indicates to the OSPM that system firmware will handle errors from this source first. Bit [1] - GLOBAL: If set, indicates that the settings contained in this structure apply globally to all PCI Express Devices. All other bits must be set to zero. It doesn't say shall, may or might. It says will. It says "system firmware will handle errors". It does not say "system firmware owns AER registers". In absence on any descriptor text on the meaning of these tables, this really looks to me like it should be interpreted as a descriptor of APEI error sources, not a mutex on who writes to certain bits-- AER in this case. True. I was trying to get it out in a rush. I omitted words. However; table assumes governance about for which entities firmware first should be enabled. There is no cross reference to _OSC or permission negotiation like _OST. I don't think that is contradictory or inconsistent. I also wasn't able to find any reference to HEST in UEFI 2.7, only in ACPI spec. You are right. It was a confusion on my side. The right place to look is ACPI specification. I was involved in this a couple of years ago. Some pieces were in UEFI spec. Other pieces were in ACPI. I guess they got unified now. I think It depends on your PCI topology. For other topologies with multiple PCI root complexes, I can see this being used per root complex flag to indicate which root complex needs firmware first and which one doesn't. _OSC is per root bus, so it's already granular enough, right? Why would it depend on PCI topology? I was speculating. I don't have the full background on this. Need to consult the spec developers. As I said in my previous email, the right place to talk about this is UEFI forum. The way I would present the problem to he spec writers is that, although the spec appears to be consistent, we've seen firmware vendors that made the wrong assumptions about HEST/_OSC. Instead of describing AER ownership with _OSC, they attempted to do it with HEST. So we should add an implementation note, or clarification about this. I agree. Alex
Re: [PATCH net-next 2/6] net/bpf: split VLAN_PRESENT bit handling from VLAN_TCI
On Mon, Nov 19, 2018 at 12:26:46PM +0100, Daniel Borkmann wrote: > On 11/10/2018 07:58 PM, Michał Mirosław wrote: > > Signed-off-by: Michał Mirosław > > Why you have empty commit messages for non-trivial changes like this in > 4 out of 6 of your patches ... > > How was it tested on the JITs you were changing? Did you test on both, > big and little endian machines? I have only x86 boxes currently so didn't try to test others. I hope upstreaming the series through net-next will allow us to find any fallouts, if any. The changes are very simple, though: they move code around (the "splitting" part) and eventually change a vlan_present flag's position in a skbuff structure. Dependency on CPU endianness is removed by using byte loads for the flag. Best Regards, Michał Mirosław
Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
On 11/19/2018 01:32 PM, Sinan Kaya wrote: > ACPI 6.2: > > 18.3.2.4 PCI Express Root Port AER Structure > > Flags: > > Bit [0] - FIRMWARE_FIRST: If set, this bit indicates to the OSPM that system > firmware will handle errors from this source first. > Bit [1] - GLOBAL: If set, indicates that the settings contained in this > structure apply globally to all PCI Express Devices. > All other bits must be set to zero. > > It doesn't say shall, may or might. It says will. It says "system firmware will handle errors". It does not say "system firmware owns AER registers". In absence on any descriptor text on the meaning of these tables, this really looks to me like it should be interpreted as a descriptor of APEI error sources, not a mutex on who writes to certain bits-- AER in this case. I don't think that is contradictory or inconsistent. I also wasn't able to find any reference to HEST in UEFI 2.7, only in ACPI spec. > I think It depends on your PCI topology. > > For other topologies with multiple PCI root complexes, I can see this being > used per root complex flag to indicate which root complex needs firmware first > and which one doesn't. _OSC is per root bus, so it's already granular enough, right? Why would it depend on PCI topology? >> I'd like see how exactly we break one of those elusive systems with _OSC. I >> suspect _OSC and HEST end up having the same information, and that's why we >> didn't see any real-life issue with mixing the approaches. > > I'm already aware of two systems that rely on HEST table to pass information > to > the OS that firmware first is enabled. Both of the systems do not change their > _OSC bits during this assuming HEST table has priority over _OSC for firmware > first. Are those hax86 systems? It seems like the systems have broken firmware. I see several ways to handle broken systems like those: - Parse both HEST and _OSC, and decide AER ownership with root bridge granularity. i.e. host_bridge->native_aer is authoritative, but is derived from both HEST and _OSC - Add quirks for the broken systems - Keep doing what we're doing until current code breaks a new system > If we add this patch, OS will try to claim the AER address space while > firmware > wants exclusive access. Yay! FFS wants exclusive access, but does not claim it. Oh, FFS! > As I said in my previous email, the right place to talk about this is UEFI > forum. The way I would present the problem to he spec writers is that, although the spec appears to be consistent, we've seen firmware vendors that made the wrong assumptions about HEST/_OSC. Instead of describing AER ownership with _OSC, they attempted to do it with HEST. So we should add an implementation note, or clarification about this. Alex
Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
UEFI HEST table specification also claims that it should be the ultimate table for when PCI firmware-first should be disabled/enabled. IIRC, EFI absorbed ACPI before FFS was a thing. Could you point me to the UEFI chapter that says HEST is authoritative? (not being a smartie, just that my free software PDF readers can't search within a file that large) ACPI 6.2: 18.3.2.4 PCI Express Root Port AER Structure Flags: Bit [0] - FIRMWARE_FIRST: If set, this bit indicates to the OSPM that system firmware will handle errors from this source first. Bit [1] - GLOBAL: If set, indicates that the settings contained in this structure apply globally to all PCI Express Devices. All other bits must be set to zero. It doesn't say shall, may or might. It says will. I think somebody needs to fix these. I saw an email from Harb Abdulhamid sent to aswg this morning. That's why I suggested circulating this idea in UEFI forums first. Let's see what everybody thinks. We can go from there. However you look at it, we have a glaring inconsistency in how we handle AER control in linux. I'm surprised we didn't see huge issues because of mixing HEST/_OSC. What systems rely on the HEST definition as opposed to _OSC? It doesn't make sense to me that you could have a system with mixed FFS and native AER on the same root port. The granularity of HEST shouldn't matter here because of how AER works. I think It depends on your PCI topology. For other topologies with multiple PCI root complexes, I can see this being used per root complex flag to indicate which root complex needs firmware first and which one doesn't. I'd like see how exactly we break one of those elusive systems with _OSC. I suspect _OSC and HEST end up having the same information, and that's why we didn't see any real-life issue with mixing the approaches. I'm already aware of two systems that rely on HEST table to pass information to the OS that firmware first is enabled. Both of the systems do not change their _OSC bits during this assuming HEST table has priority over _OSC for firmware first. If we add this patch, OS will try to claim the AER address space while firmware wants exclusive access. As I said in my previous email, the right place to talk about this is UEFI forum. Alex P.S. (SARCASM ALERT) Isn't UEFI is a pile of stuff that didn't stick to the wall? P.P.S I remember someone asking why we don't disable FFS in the BIOS. I think it would be next to impossible to get certain platform vendors to relinquish FFS control, unless the specs required things to be that way -- and had a "standard" way to do so. Then getting the specs to change in that direction is also a battle.
Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
On 11/19/2018 12:24 PM, Sinan Kaya wrote: On 11/19/2018 1:10 PM, Keith Busch wrote: We can't really turn off firmware first in the kernel without asking help from the firmware. The _OSC method this patch utilizes is the ACPI spec defined way for the kernel to wrest control from firmware. BIOS specific menu settings shouldn't be our only recourse when we have a spec authority defining generic OS interfaces to accomplish the same thing. Unless there is a disagreement on the _OSC interpreation, we'd have to accept that platforms breaking from this patch are non-compliant. It depends on which spec you look :) UEFI HEST table specification also claims that it should be the ultimate table for when PCI firmware-first should be disabled/enabled. IIRC, EFI absorbed ACPI before FFS was a thing. Could you point me to the UEFI chapter that says HEST is authoritative? (not being a smartie, just that my free software PDF readers can't search within a file that large) I think somebody needs to fix these. I saw an email from Harb Abdulhamid sent to aswg this morning. That's why I suggested circulating this idea in UEFI forums first. Let's see what everybody thinks. We can go from there. However you look at it, we have a glaring inconsistency in how we handle AER control in linux. I'm surprised we didn't see huge issues because of mixing HEST/_OSC. What systems rely on the HEST definition as opposed to _OSC? It doesn't make sense to me that you could have a system with mixed FFS and native AER on the same root port. The granularity of HEST shouldn't matter here because of how AER works. I'd like see how exactly we break one of those elusive systems with _OSC. I suspect _OSC and HEST end up having the same information, and that's why we didn't see any real-life issue with mixing the approaches. Alex P.S. (SARCASM ALERT) Isn't UEFI is a pile of stuff that didn't stick to the wall? P.P.S I remember someone asking why we don't disable FFS in the BIOS. I think it would be next to impossible to get certain platform vendors to relinquish FFS control, unless the specs required things to be that way -- and had a "standard" way to do so. Then getting the specs to change in that direction is also a battle.
Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
On 11/19/2018 1:10 PM, Keith Busch wrote: We can't really turn off firmware first in the kernel without asking help from the firmware. The _OSC method this patch utilizes is the ACPI spec defined way for the kernel to wrest control from firmware. BIOS specific menu settings shouldn't be our only recourse when we have a spec authority defining generic OS interfaces to accomplish the same thing. Unless there is a disagreement on the _OSC interpreation, we'd have to accept that platforms breaking from this patch are non-compliant. It depends on which spec you look :) UEFI HEST table specification also claims that it should be the ultimate table for when PCI firmware-first should be disabled/enabled. I think somebody needs to fix these. I saw an email from Harb Abdulhamid sent to aswg this morning. That's why I suggested circulating this idea in UEFI forums first. Let's see what everybody thinks. We can go from there.
Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
On Mon, Nov 19, 2018 at 12:56:56PM -0500, Sinan Kaya wrote: > On 11/19/2018 12:41 PM, Keith Busch wrote: > > > Still, breaking existing systems that rely on HEST table is not cool. > > > I'd rather have users specify "pcie_ports=native" to skip FF rather than > > > having broken systems by default to be honest. > > The pcie_ports=native work-around ignores FF to potentially unknown > > results, though. > > > > How about being able to enable/disable FF in BIOS? > > We can't really turn off firmware first in the kernel without asking help > from the firmware. The _OSC method this patch utilizes is the ACPI spec defined way for the kernel to wrest control from firmware. BIOS specific menu settings shouldn't be our only recourse when we have a spec authority defining generic OS interfaces to accomplish the same thing. Unless there is a disagreement on the _OSC interpreation, we'd have to accept that platforms breaking from this patch are non-compliant. > Like you said, it causes unpredictable results. > > There will be two competing software trying to touch the same registers.
Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
On 11/19/2018 12:41 PM, Keith Busch wrote: Still, breaking existing systems that rely on HEST table is not cool. I'd rather have users specify "pcie_ports=native" to skip FF rather than having broken systems by default to be honest. The pcie_ports=native work-around ignores FF to potentially unknown results, though. How about being able to enable/disable FF in BIOS? We can't really turn off firmware first in the kernel without asking help from the firmware. Like you said, it causes unpredictable results. There will be two competing software trying to touch the same registers.
Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
On Mon, Nov 19, 2018 at 12:42:25PM -0500, Sinan Kaya wrote: > On 11/19/2018 12:32 PM, Sinan Kaya wrote: > > > > > > But we're not using HEST as a fine grain control. We disable native AER > > > handling if *any* device has FF set in HEST, and that just forces people > > > to use pcie_ports=native to get around that. > > > > > > > I don't see *any* in the code. aer_hest_parse() does the HEST table > > parsing. > > It switches to firmware first mode if global flag in HEST is set. Otherwise > > for each BDF in device, hest_match_pci() is used to do a cross-matching > > against > > HEST table contents. > > > > Am I missing something? > > I see. I think you are talking about aer_firmware_first, right? > > aer_set_firmware_first() and pcie_aer_get_firmware_first() seem to do the > right > thing. Right, but what difference does it make if device specific AER checks do the right thing if pcie_aer_init() doesn't even register it's port driver? > aer_firmware_first is probably getting set because events are all routed to a > single root port and aer_acpi_firmware_first() is used to decide whether AER > should be initialized or not. > > I think I understand what is going on now. > > Still, breaking existing systems that rely on HEST table is not cool. > I'd rather have users specify "pcie_ports=native" to skip FF rather than > having broken systems by default to be honest. The pcie_ports=native work-around ignores FF to potentially unknown results, though.
Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
On 11/19/2018 12:32 PM, Sinan Kaya wrote: But we're not using HEST as a fine grain control. We disable native AER handling if *any* device has FF set in HEST, and that just forces people to use pcie_ports=native to get around that. I don't see *any* in the code. aer_hest_parse() does the HEST table parsing. It switches to firmware first mode if global flag in HEST is set. Otherwise for each BDF in device, hest_match_pci() is used to do a cross-matching against HEST table contents. Am I missing something? I see. I think you are talking about aer_firmware_first, right? aer_set_firmware_first() and pcie_aer_get_firmware_first() seem to do the right thing. aer_firmware_first is probably getting set because events are all routed to a single root port and aer_acpi_firmware_first() is used to decide whether AER should be initialized or not. I think I understand what is going on now. Still, breaking existing systems that rely on HEST table is not cool. I'd rather have users specify "pcie_ports=native" to skip FF rather than having broken systems by default to be honest.
Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
On Mon, Nov 19, 2018 at 12:32:42PM -0500, Sinan Kaya wrote: > On 11/19/2018 11:53 AM, Keith Busch wrote: > > On Mon, Nov 19, 2018 at 11:53:05AM -0500, Tyler Baicar wrote: > > > On Thu, Nov 15, 2018 at 8:49 PM Sinan Kaya wrote: > > > > > > > > On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote: > > > > > I've asked around a few people at Dell and they unanimously agree that > > > > > _OSC is the correct way to determine ownership of AER. In linux, we > > > > > use the result of _OSC to enable AER services, but we use HEST to > > > > > determine AER ownership. That's inconsistent. This series drops the > > > > > use of HEST in favor of _OSC. > > > > > > > > > > [1]https://lkml.org/lkml/2018/11/15/62 > > > > > > > > This change breaks the existing systems that rely on the HEST table > > > > telling the operating system about firmware first presence. > > > > > > > > Besides, HEST table has much more granularity about which PCI component > > > > needs firmware such as global/device/switch. > > > > > > > > You should probably circulate these ideas for wider consumption in UEFI > > > > forum as UEFI owns the HEST table definition. > > > > > > I agree with Sinan, this will break existing systems, and the granularity > > > of the > > > HEST definition is more useful than the single bit in _OSC. > > > > But we're not using HEST as a fine grain control. We disable native AER > > handling if *any* device has FF set in HEST, and that just forces people > > to use pcie_ports=native to get around that. > > > > I don't see *any* in the code. aer_hest_parse() does the HEST table parsing. > It switches to firmware first mode if global flag in HEST is set. Otherwise > for each BDF in device, hest_match_pci() is used to do a cross-matching > against > HEST table contents. > > Am I missing something? You might be. :) static int aer_hest_parse(struct acpi_hest_header *hest_hdr, void *data) { /* * If no specific device is supplied, determine whether * FIRMWARE_FIRST is set for *any* PCIe device. */ if (!info->pci_dev) { info->firmware_first |= ff; return 0; }
Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
On 11/19/2018 11:53 AM, Keith Busch wrote: On Mon, Nov 19, 2018 at 11:53:05AM -0500, Tyler Baicar wrote: On Thu, Nov 15, 2018 at 8:49 PM Sinan Kaya wrote: On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote: I've asked around a few people at Dell and they unanimously agree that _OSC is the correct way to determine ownership of AER. In linux, we use the result of _OSC to enable AER services, but we use HEST to determine AER ownership. That's inconsistent. This series drops the use of HEST in favor of _OSC. [1]https://lkml.org/lkml/2018/11/15/62 This change breaks the existing systems that rely on the HEST table telling the operating system about firmware first presence. Besides, HEST table has much more granularity about which PCI component needs firmware such as global/device/switch. You should probably circulate these ideas for wider consumption in UEFI forum as UEFI owns the HEST table definition. I agree with Sinan, this will break existing systems, and the granularity of the HEST definition is more useful than the single bit in _OSC. But we're not using HEST as a fine grain control. We disable native AER handling if *any* device has FF set in HEST, and that just forces people to use pcie_ports=native to get around that. I don't see *any* in the code. aer_hest_parse() does the HEST table parsing. It switches to firmware first mode if global flag in HEST is set. Otherwise for each BDF in device, hest_match_pci() is used to do a cross-matching against HEST table contents. Am I missing something?
Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
On Mon, Nov 19, 2018 at 11:53:05AM -0500, Tyler Baicar wrote: > On Thu, Nov 15, 2018 at 8:49 PM Sinan Kaya wrote: > > > > On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote: > > > I've asked around a few people at Dell and they unanimously agree that > > > _OSC is the correct way to determine ownership of AER. In linux, we > > > use the result of _OSC to enable AER services, but we use HEST to > > > determine AER ownership. That's inconsistent. This series drops the > > > use of HEST in favor of _OSC. > > > > > > [1]https://lkml.org/lkml/2018/11/15/62 > > > > This change breaks the existing systems that rely on the HEST table > > telling the operating system about firmware first presence. > > > > Besides, HEST table has much more granularity about which PCI component > > needs firmware such as global/device/switch. > > > > You should probably circulate these ideas for wider consumption in UEFI > > forum as UEFI owns the HEST table definition. > > I agree with Sinan, this will break existing systems, and the granularity of > the > HEST definition is more useful than the single bit in _OSC. But we're not using HEST as a fine grain control. We disable native AER handling if *any* device has FF set in HEST, and that just forces people to use pcie_ports=native to get around that.
Re: [PATCH v2 3/4] powerpc: add system call table generation support
On Wed, Nov 14, 2018 at 11:04 AM Firoz Khan wrote: > Adding a new table entry consisting of: > - System call number. > - ABI. > - System call name. > - Entry point name. > - Compat entry name, if required. > > syscallhdr.sh and syscalltbl.sh will generate uapi header- > unistd_32/64.h and syscall_table_32/64/c32.h files respect- > ively. File syscall_table_32/64/c32.h is included by sys- > call.S - the real system call table. Both .sh files will > parse the content syscall.tbl to generate the header and > table files. You don't mention how this handles the SPU, which seems to be the main difference from the other architectures. > +# The format is: > +# > +# > +# The can be common, 64, or 32 for this file. > +# > +0 common restart_syscall sys_restart_syscall > sys_restart_syscall > +1 common exitsys_exit > sys_exit > +2 common forkppc_fork > ppc_fork > +3 common readsys_read > sys_readsys_read > +4 common write sys_write > sys_write sys_write > +5 common opensys_open > compat_sys_open sys_open > +6 common close sys_close > sys_close sys_close > +7 common waitpid sys_waitpid > sys_waitpid sys_waitpid > +8 common creat sys_creat > sys_creat sys_creat The SPU syscall is always the same as the 64-bit syscall, so listing it explictily in the last column seems to add a lot of duplication, and makes the format different from the other architectures. Have you considered using the field (second column) to decide whether a syscall is used for SPU or not instead? I would have done it like |+0 nospu restart_syscall sys_restart_syscall sys_restart_syscall |+1 nospu exitsys_exit sys_exit |+2 nospu forkppc_fork ppc_fork |+3 common readsys_read sys_read |+4 common write sys_write sys_write |+5 common opensys_open compat_sys_open |+6 common close sys_close sys_close ... |+29132 fstatat64 sys_fstatat64 sys_fstatat64 |+29164 newfstatat sys_newfstatat |+291spu newfstatat sys_newfstatat ... with 'nospu' meaning 64+32+compat. > +9 common linksys_link > sys_linksys_link > +10 common unlink sys_unlink > sys_unlink sys_unlink > +11 common execve sys_execve > compat_sys_execve > +12 common chdir sys_chdir > sys_chdir sys_chdir > +13 common timesys_time > compat_sys_time sys_time > +14 common mknod sys_mknod > sys_mknod sys_mknod > +15 common chmod sys_chmod > sys_chmod sys_chmod > +16 common lchown sys_lchown > sys_lchown sys_lchown > +17 common break sys_ni_syscall > sys_ni_syscall > +18 32 oldstat sys_stat > sys_ni_syscall > +18 64 oldstat sys_ni_syscall 'oldstat' seems odd. Your conversion is correct, but the existing behavior of not providing support for the syscall in compat mode feels wrong. We have four calls in this category: arch/powerpc/include/asm/systbl.h:OLDSYS(stat) arch/powerpc/include/asm/systbl.h:OLDSYS(fstat) arch/powerpc/include/asm/systbl.h:OLDSYS(lstat) arch/powerpc/include/asm/systbl.h:OLDSYS(debug_setcontext) For the first three, it seems that the 64-bit kernel ought to set '__ARCH_WANT_OLD_STAT': diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h index 96ddce5c76c3..335dfcc47f20 100644 --- a/arch/powerpc/include/asm/unistd.h +++
Re: [PATCH v2 2/4] powerpc: move macro definition from asm/systbl.h
On Wed, Nov 14, 2018 at 11:04 AM Firoz Khan wrote: > diff --git a/arch/powerpc/include/asm/systbl.h > b/arch/powerpc/include/asm/systbl.h > index 01b5171..c4321b9 100644 > --- a/arch/powerpc/include/asm/systbl.h > +++ b/arch/powerpc/include/asm/systbl.h > @@ -76,7 +76,6 @@ > SYSCALL_SPU(ssetmask) > SYSCALL_SPU(setreuid) > SYSCALL_SPU(setregid) > -#define compat_sys_sigsuspend sys_sigsuspend > SYS32ONLY(sigsuspend) I think the macro here is just a workaround for the fact that SYS32ONLY() always prepends the name with 'compat_' for the compat version, and there is no other macro to do this. After the conversion, this can easily be done using the regular table, as you need separate names for the 32-bit entries anyway. > SYSX(sys_ni_syscall,compat_sys_s > diff --git a/arch/powerpc/platforms/cell/spu_callbacks.c > b/arch/powerpc/platforms/cell/spu_callbacks.c > index 8ae8620..7517a43 100644 > --- a/arch/powerpc/platforms/cell/spu_callbacks.c > +++ b/arch/powerpc/platforms/cell/spu_callbacks.c > @@ -47,6 +47,7 @@ > #define COMPAT_SPU_NEW(func) sys_##func, > #define SYSX_SPU(f, f3264, f32)f, > > +#define compat_sys_sigsuspend sys_sigsuspend > #include > }; The spu_callbacks.c and systbl_chk.c files don't need this macro, but that doesn't matter once you drop this patch. Arnd
Re: [PATCH 09/10] drivers/perf: perf/core: generalise event exclusion checking with perf macro
On Fri, Nov 16, 2018 at 10:24:12AM +, Andrew Murray wrote: > Replace checking of perf event exclusion flags with perf macro. > > This is a functional change as exclude_host and exclude_guest are added > in the following files: > > - drivers/perf/qcom_l2_pmu.c > - drivers/perf/qcom_l3_pmu.c > - drivers/perf/arm_pmu.c > > And exclude_idle and exclude_hv are added in these files: > > - drivers/perf/xgene_pmu.c FWIW, that all sounds fine to me. Assuming you fix up the 'macro' nit: Acked-by: Mark Rutland ... unless we go for Peter's core cap for this. Thanks, Mark. > Signed-off-by: Andrew Murray > --- > drivers/perf/arm-cci.c | 7 +-- > drivers/perf/arm-ccn.c | 5 + > drivers/perf/arm_dsu_pmu.c | 7 +-- > drivers/perf/arm_pmu.c | 9 + > drivers/perf/hisilicon/hisi_uncore_pmu.c | 7 +-- > drivers/perf/qcom_l2_pmu.c | 3 +-- > drivers/perf/qcom_l3_pmu.c | 3 +-- > drivers/perf/xgene_pmu.c | 3 +-- > 8 files changed, 8 insertions(+), 36 deletions(-) > > diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c > index 1bfeb16..d749f19 100644 > --- a/drivers/perf/arm-cci.c > +++ b/drivers/perf/arm-cci.c > @@ -1328,12 +1328,7 @@ static int cci_pmu_event_init(struct perf_event *event) > return -EOPNOTSUPP; > > /* We have no filtering of any kind */ > - if (event->attr.exclude_user|| > - event->attr.exclude_kernel || > - event->attr.exclude_hv || > - event->attr.exclude_idle|| > - event->attr.exclude_host|| > - event->attr.exclude_guest) > + if (event_has_exclude_flags(event)) > return -EINVAL; > > /* > diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c > index 7dd850e..9a22a95 100644 > --- a/drivers/perf/arm-ccn.c > +++ b/drivers/perf/arm-ccn.c > @@ -741,10 +741,7 @@ static int arm_ccn_pmu_event_init(struct perf_event > *event) > return -EOPNOTSUPP; > } > > - if (has_branch_stack(event) || event->attr.exclude_user || > - event->attr.exclude_kernel || event->attr.exclude_hv || > - event->attr.exclude_idle || event->attr.exclude_host || > - event->attr.exclude_guest) { > + if (has_branch_stack(event) || event_has_exclude_flags(event)) { > dev_dbg(ccn->dev, "Can't exclude execution levels!\n"); > return -EINVAL; > } > diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c > index 660cb8a..300ff3d 100644 > --- a/drivers/perf/arm_dsu_pmu.c > +++ b/drivers/perf/arm_dsu_pmu.c > @@ -563,12 +563,7 @@ static int dsu_pmu_event_init(struct perf_event *event) > } > > if (has_branch_stack(event) || > - event->attr.exclude_user || > - event->attr.exclude_kernel || > - event->attr.exclude_hv || > - event->attr.exclude_idle || > - event->attr.exclude_host || > - event->attr.exclude_guest) { > + event_has_exclude_flags(event)) { > dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n"); > return -EINVAL; > } > diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c > index 7f01f6f..a03634f 100644 > --- a/drivers/perf/arm_pmu.c > +++ b/drivers/perf/arm_pmu.c > @@ -357,13 +357,6 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void > *dev) > } > > static int > -event_requires_mode_exclusion(struct perf_event_attr *attr) > -{ > - return attr->exclude_idle || attr->exclude_user || > -attr->exclude_kernel || attr->exclude_hv; > -} > - > -static int > __hw_perf_event_init(struct perf_event *event) > { > struct arm_pmu *armpmu = to_arm_pmu(event->pmu); > @@ -395,7 +388,7 @@ __hw_perf_event_init(struct perf_event *event) >*/ > if ((!armpmu->set_event_filter || >armpmu->set_event_filter(hwc, &event->attr)) && > - event_requires_mode_exclusion(&event->attr)) { > + event_has_exclude_flags(event)) { > pr_debug("ARM performance counters do not support " >"mode exclusion\n"); > return -EOPNOTSUPP; > diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c > b/drivers/perf/hisilicon/hisi_uncore_pmu.c > index 9efd241..d3edff9 100644 > --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c > +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c > @@ -143,12 +143,7 @@ int hisi_uncore_pmu_event_init(struct perf_event *event) > return -EOPNOTSUPP; > > /* counters do not have these bits */ > - if (event->attr.exclude_user|| > - event->attr.exclude_kernel || > - event->attr.exclude_host|| > - event->attr.exclude_guest || > - event->attr.exclude_hv || > - event->attr.exclude_idle) > + if (event_has_exclude_flags(event)) > return -EIN
Re: [PATCH 01/10] perf/core: Add macro to test for event exclusion flags
On Fri, Nov 16, 2018 at 10:24:04AM +, Andrew Murray wrote: > Add a macro that tests if any of the perf event exclusion flags > are set on a given event. > > Signed-off-by: Andrew Murray Aside from the s/macro/function, or s/macro/helper/, this looks sound to me. Assuming you fix that up here and in subsequent commit messages, for this patch feel free to add: Acked-by: Mark Rutland Mark. > --- > include/linux/perf_event.h | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h > index 53c500f..89ee7fa 100644 > --- a/include/linux/perf_event.h > +++ b/include/linux/perf_event.h > @@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event *event, > extern void > perf_log_lost_samples(struct perf_event *event, u64 lost); > > +static inline bool event_has_exclude_flags(struct perf_event *event) > +{ > + struct perf_event_attr *attr = &event->attr; > + > + return attr->exclude_idle || attr->exclude_user || > +attr->exclude_kernel || attr->exclude_hv || > +attr->exclude_guest || attr->exclude_host; > +} > + > static inline bool is_sampling_event(struct perf_event *event) > { > return event->attr.sample_period != 0; > -- > 2.7.4 >
Re: [PATCH 5/5] dma-direct: always allow dma mask <= physiscal memory size
On 19/11/2018 14:18, Ramon Fried wrote: On Tue, Oct 9, 2018 at 8:02 AM Benjamin Herrenschmidt wrote: On Wed, 2018-10-03 at 16:10 -0700, Alexander Duyck wrote: -* Because 32-bit DMA masks are so common we expect every architecture -* to be able to satisfy them - either by not supporting more physical -* memory, or by providing a ZONE_DMA32. If neither is the case, the -* architecture needs to use an IOMMU instead of the direct mapping. -*/ - if (mask < phys_to_dma(dev, DMA_BIT_MASK(32))) + u64 min_mask; + + if (IS_ENABLED(CONFIG_ZONE_DMA)) + min_mask = DMA_BIT_MASK(ARCH_ZONE_DMA_BITS); + else + min_mask = DMA_BIT_MASK(32); + + min_mask = min_t(u64, min_mask, (max_pfn - 1) << PAGE_SHIFT); + + if (mask >= phys_to_dma(dev, min_mask)) return 0; -#endif return 1; } So I believe I have run into the same issue that Guenter reported. On an x86_64 system w/ Intel IOMMU. I wasn't able to complete boot and all probe attempts for various devices were failing with -EIO errors. I believe the last mask check should be "if (mask < phys_to_dma(dev, min_mask))" not a ">=" check. Right, that test is backwards. I needed to change it here too (powermac with the rest of the powerpc series). Cheers, Ben. Hi, I'm working on a MIPS64 soc with PCIe root complex on it, and it appears that this series of patches are causing all PCI drivers that request 64bit mask to fail with -5. It's broken in 4.19. However, I just checked, it working on master. We may need to backport a couple of patches to 4.19. I'm not sure though which patches should be backported as there were at least 10 patches resolving this dma_direct area recently. Christoph, Robin. Can we ask Greg to backport all these changes ? What do you think ? As far as I'm aware, the only real issue in 4.19 was my subtle breakage around setting bus_dma_mask - that's fixed by 6778be4e5209, which according to my inbox got picked up by autosel for 4.19 stable last week. Robin.
Re: [PATCH 1/4] bpf: account for freed JIT allocations in arch code
On Mon, 19 Nov 2018 at 02:37, Daniel Borkmann wrote: > > On 11/17/2018 07:57 PM, Ard Biesheuvel wrote: > > Commit ede95a63b5e84 ("bpf: add bpf_jit_limit knob to restrict unpriv > > allocations") added a call to bpf_jit_uncharge_modmem() to the routine > > bpf_jit_binary_free() which is called from the __weak bpf_jit_free(). > > This function is overridden by arches, some of which do not call > > bpf_jit_binary_free() to release the memory, and so the released > > memory is not accounted for, potentially leading to spurious allocation > > failures. > > > > So replace the direct calls to module_memfree() in the arch code with > > calls to bpf_jit_binary_free(). > > Sorry but this patch is completely buggy, and above description on the > accounting incorrect as well. Looks like this patch was not tested at all. > My apologies. I went off into the weeds a bit looking at different versions for 32-bit and 64-bit on different architectures. So indeed, this patch should be dropped. > The below cBPF JITs that use module_memfree() which you replace with > bpf_jit_binary_free() are using module_alloc() internally to get the JIT > image buffer ... > Indeed. So would you prefer for arm64 to override bpf_jit_free() in its entirety, and not call bpf_jit_binary_free() but simply call bpf_jit_uncharge_modmem() and vfree() directly? It's either that, or we'd have to untangle this a bit, to avoid having one __weak function on top of the other just so other arches can replace the module_memfree() call in bpf_jit_binary_free() with vfree() (which amount to the same thing on arm64 anyway)
Re: [LKP] dad4f140ed [ 7.709376] WARNING:suspicious_RCU_usage
On Sun, Nov 18, 2018 at 05:19:04PM -0800, Matthew Wilcox wrote: > On Mon, Nov 19, 2018 at 09:08:20AM +0800, kernel test robot wrote: > > Greetings, > > > > 0day kernel testing robot got the below dmesg and the first bad commit is > > Umm. I don't see a 'suspicious RCU usage' message in here. I see a > lot of vmalloc warnings ... ? Never mind, the RCU warnings were in one of the attached dmesg files. I found the problem; it's in the test-suite and not something that could lead to any real problems today. Thanks for the report!
Re: [PATCH 00/10] perf/core: Generalise event exclusion checking
On Fri, Nov 16, 2018 at 10:24:03AM +, Andrew Murray wrote: > Many PMU drivers do not have the capability to exclude counting events > that occur in specific contexts such as idle, kernel, guest, etc. These > drivers indicate this by returning an error in their event_init upon > testing the events attribute flags. > > However this approach requires that each time a new event modifier is > added to perf, all the perf drivers need to be modified to indicate that > they don't support the attribute. This results in additional boiler-plate > code common to many drivers that needs to be maintained. An example of > this is the addition of exclude_host and exclude_guest in 2011 yet many > PMU drivers do not support this or indicate an error on events that make > use of it. > > This patch generalises the test for exclusion and updates PMU drivers to > use it. This is a functional change as some PMU drivers will now correctly > report that they don't support certain events whereas they previously did. Right, I like that idea, and yes, there's a lot of fail around there :/ > A longer term approach may instead be for PMU's to advertise their > capabilities on registration. This I think is the better approach. We already have the PERF_PMU_CAP_flags that can be used to advertise various PMU capabilities. Something along these lines I suppose; then every PMU that actually checks the flags, needs to set the flag, otherwise it'll fail. diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 53c500f0ca79..de15723ea52a 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -244,6 +244,7 @@ struct perf_event; #define PERF_PMU_CAP_EXCLUSIVE 0x10 #define PERF_PMU_CAP_ITRACE0x20 #define PERF_PMU_CAP_HETEROGENEOUS_CPUS0x40 +#define PERF_PMU_CAP_EXCLUDE 0x80 /** * struct pmu - generic performance monitoring unit diff --git a/kernel/events/core.c b/kernel/events/core.c index 84530ab358c3..d76b724177b9 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9772,6 +9772,14 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event) if (ctx) perf_event_ctx_unlock(event->group_leader, ctx); + if (!ret) { + if ((pmu->capabilities & PERF_PMU_CAP_EXCLUDE) || + event_has_exclude_flags(event)) { + event->destroy(event); + ret = -EINVAL; + } + } + if (ret) module_put(pmu->module);
Re: [PATCH 01/10] perf/core: Add macro to test for event exclusion flags
On Fri, Nov 16, 2018 at 10:24:04AM +, Andrew Murray wrote: > Add a macro that tests if any of the perf event exclusion flags > are set on a given event. It is in fact an inline function, not a CPP macro. > Signed-off-by: Andrew Murray > --- > include/linux/perf_event.h | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h > index 53c500f..89ee7fa 100644 > --- a/include/linux/perf_event.h > +++ b/include/linux/perf_event.h > @@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event *event, > extern void > perf_log_lost_samples(struct perf_event *event, u64 lost); > > +static inline bool event_has_exclude_flags(struct perf_event *event) > +{ > + struct perf_event_attr *attr = &event->attr; > + > + return attr->exclude_idle || attr->exclude_user || > +attr->exclude_kernel || attr->exclude_hv || > +attr->exclude_guest || attr->exclude_host; > +} > + > static inline bool is_sampling_event(struct perf_event *event) > { > return event->attr.sample_period != 0; > -- > 2.7.4 >
[PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
On a signal handler return, the user could set a context with MSR[TS] bits set, and these bits would be copied to task regs->msr. At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set, several __get_user() are called and then a recheckpoint is executed. This is a problem since a page fault (in kernel space) could happen when calling __get_user(). If it happens, the process MSR[TS] bits were already set, but recheckpoint was not executed, and SPRs are still invalid. The page fault can cause the current process to be de-scheduled, with MSR[TS] active and without tm_recheckpoint() being called. More importantly, without TEXAR[FS] bit set also. Since TEXASR might not have the FS bit set, and when the process is scheduled back, it will try to reclaim, which will be aborted because of the CPU is not in the suspended state, and, then, recheckpoint. This recheckpoint will restore thread->texasr into TEXASR SPR, which might be zero, hitting a BUG_ON(). [ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446! This patch simply delays the MSR[TS] set, so, if there is any page fault in the __get_user() section, it does not have regs->msr[TS] set, since the TM structures are still invalid, thus avoiding doing TM operations for in-kernel exceptions and possible process reschedule. With this patch, the MSR[TS] will only be set just before recheckpointing and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in invalid state. It is not possible to move tm_recheckpoint to happen earlier, because it is required to get the checkpointed registers from userspace, with __get_user(), thus, the only way to avoid this undesired behavior is delaying the MSR[TS] set, as done in this patch. Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals") Cc: sta...@vger.kernel.org (v3.9+) Signed-off-by: Breno Leitao --- arch/powerpc/kernel/signal_64.c | 29 +++-- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index 83d51bf586c7..15b153bdd826 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk, if (MSR_TM_RESV(msr)) return -EINVAL; - /* pull in MSR TS bits from user context */ - regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK); - - /* -* Ensure that TM is enabled in regs->msr before we leave the signal -* handler. It could be the case that (a) user disabled the TM bit -* through the manipulation of the MSR bits in uc_mcontext or (b) the -* TM bit was disabled because a sufficient number of context switches -* happened whilst in the signal handler and load_tm overflowed, -* disabling the TM bit. In either case we can end up with an illegal -* TM state leading to a TM Bad Thing when we return to userspace. -*/ - regs->msr |= MSR_TM; - /* pull in MSR LE from user context */ regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE); @@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct *tsk, tm_enable(); /* Make sure the transaction is marked as failed */ tsk->thread.tm_texasr |= TEXASR_FS; + + /* pull in MSR TS bits from user context */ + regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK); + + /* +* Ensure that TM is enabled in regs->msr before we leave the signal +* handler. It could be the case that (a) user disabled the TM bit +* through the manipulation of the MSR bits in uc_mcontext or (b) the +* TM bit was disabled because a sufficient number of context switches +* happened whilst in the signal handler and load_tm overflowed, +* disabling the TM bit. In either case we can end up with an illegal +* TM state leading to a TM Bad Thing when we return to userspace. +*/ + regs->msr |= MSR_TM; + /* This loads the checkpointed FP/VEC state, if used */ tm_recheckpoint(&tsk->thread); -- 2.19.0
Re: [PATCH net-next 2/6] net/bpf: split VLAN_PRESENT bit handling from VLAN_TCI
On 11/10/2018 07:58 PM, Michał Mirosław wrote: > Signed-off-by: Michał Mirosław Why you have empty commit messages for non-trivial changes like this in 4 out of 6 of your patches ... How was it tested on the JITs you were changing? Did you test on both, big and little endian machines? > --- > net/core/filter.c | 40 +--- > 1 file changed, 21 insertions(+), 19 deletions(-) > > diff --git a/net/core/filter.c b/net/core/filter.c > index e521c5ebc7d1..c151b906df53 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -296,22 +296,21 @@ static u32 convert_skb_access(int skb_field, int > dst_reg, int src_reg, > break; > > case SKF_AD_VLAN_TAG: > - case SKF_AD_VLAN_TAG_PRESENT: > BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_tci) != 2); > - BUILD_BUG_ON(VLAN_TAG_PRESENT != 0x1000); > > /* dst_reg = *(u16 *) (src_reg + offsetof(vlan_tci)) */ > *insn++ = BPF_LDX_MEM(BPF_H, dst_reg, src_reg, > offsetof(struct sk_buff, vlan_tci)); > - if (skb_field == SKF_AD_VLAN_TAG) { > - *insn++ = BPF_ALU32_IMM(BPF_AND, dst_reg, > - ~VLAN_TAG_PRESENT); > - } else { > - /* dst_reg >>= 12 */ > - *insn++ = BPF_ALU32_IMM(BPF_RSH, dst_reg, 12); > - /* dst_reg &= 1 */ > +#ifdef VLAN_TAG_PRESENT > + *insn++ = BPF_ALU32_IMM(BPF_AND, dst_reg, ~VLAN_TAG_PRESENT); > +#endif > + break; > + case SKF_AD_VLAN_TAG_PRESENT: > + *insn++ = BPF_LDX_MEM(BPF_B, dst_reg, src_reg, > PKT_VLAN_PRESENT_OFFSET()); > + if (PKT_VLAN_PRESENT_BIT) > + *insn++ = BPF_ALU32_IMM(BPF_RSH, dst_reg, > PKT_VLAN_PRESENT_BIT); > + if (PKT_VLAN_PRESENT_BIT < 7) > *insn++ = BPF_ALU32_IMM(BPF_AND, dst_reg, 1); > - } > break; > } > > @@ -6140,19 +6139,22 @@ static u32 bpf_convert_ctx_access(enum > bpf_access_type type, > break; > > case offsetof(struct __sk_buff, vlan_present): > + *target_size = 1; > + *insn++ = BPF_LDX_MEM(BPF_B, si->dst_reg, si->src_reg, > + PKT_VLAN_PRESENT_OFFSET()); > + if (PKT_VLAN_PRESENT_BIT) > + *insn++ = BPF_ALU32_IMM(BPF_RSH, si->dst_reg, > PKT_VLAN_PRESENT_BIT); > + if (PKT_VLAN_PRESENT_BIT < 7) > + *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, 1); > + break; > + > case offsetof(struct __sk_buff, vlan_tci): > - BUILD_BUG_ON(VLAN_TAG_PRESENT != 0x1000); > - > *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg, > bpf_target_off(struct sk_buff, vlan_tci, > 2, >target_size)); > - if (si->off == offsetof(struct __sk_buff, vlan_tci)) { > - *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, > - ~VLAN_TAG_PRESENT); > - } else { > - *insn++ = BPF_ALU32_IMM(BPF_RSH, si->dst_reg, 12); > - *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, 1); > - } > +#ifdef VLAN_TAG_PRESENT > + *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, > ~VLAN_TAG_PRESENT); > +#endif > break; > > case offsetof(struct __sk_buff, cb[0]) ... >
Re: [PATCH] ALSA: aoa: Use device_type helpers to access the node type
On Fri, 16 Nov 2018 23:11:04 +0100, Rob Herring wrote: > > Remove directly accessing device_node.type pointer and use the accessors > instead. This will eventually allow removing the type pointer. > > Replace the open coded iterating over child nodes with > for_each_child_of_node() while we're here. > > Cc: Johannes Berg > Cc: Jaroslav Kysela > Cc: Takashi Iwai > Cc: linuxppc-dev@lists.ozlabs.org > Cc: alsa-de...@alsa-project.org > Signed-off-by: Rob Herring Applied, thanks. Takashi
Re: [PATCH net-next 0/6] Remove VLAN.CFI overload
On 11/10/2018 10:47 PM, David Miller wrote: > From: Michał Mirosław > Date: Sat, 10 Nov 2018 19:58:29 +0100 > >> Fix BPF code/JITs to allow for separate VLAN_PRESENT flag >> storage and finally move the flag to separate storage in skbuff. >> >> This is final step to make CLAN.CFI transparent to core Linux >> networking stack. >> >> An #ifdef is introduced temporarily to mark fragments masking >> VLAN_TAG_PRESENT. This is removed altogether in the final patch. > > Daniel and Alexei, please review. Sorry, was completely swamped due to plumbers, just getting to it now.
Re: [PATCH 1/4] bpf: account for freed JIT allocations in arch code
On 11/17/2018 07:57 PM, Ard Biesheuvel wrote: > Commit ede95a63b5e84 ("bpf: add bpf_jit_limit knob to restrict unpriv > allocations") added a call to bpf_jit_uncharge_modmem() to the routine > bpf_jit_binary_free() which is called from the __weak bpf_jit_free(). > This function is overridden by arches, some of which do not call > bpf_jit_binary_free() to release the memory, and so the released > memory is not accounted for, potentially leading to spurious allocation > failures. > > So replace the direct calls to module_memfree() in the arch code with > calls to bpf_jit_binary_free(). Sorry but this patch is completely buggy, and above description on the accounting incorrect as well. Looks like this patch was not tested at all. The below cBPF JITs that use module_memfree() which you replace with bpf_jit_binary_free() are using module_alloc() internally to get the JIT image buffer ... > Signed-off-by: Ard Biesheuvel > --- > arch/mips/net/bpf_jit.c | 2 +- > arch/powerpc/net/bpf_jit_comp.c | 2 +- > arch/powerpc/net/bpf_jit_comp64.c | 5 + > arch/sparc/net/bpf_jit_comp_32.c | 2 +- > 4 files changed, 4 insertions(+), 7 deletions(-) > > diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c > index 4d8cb9bb8365..1b69897274a1 100644 > --- a/arch/mips/net/bpf_jit.c > +++ b/arch/mips/net/bpf_jit.c > @@ -1264,7 +1264,7 @@ void bpf_jit_compile(struct bpf_prog *fp) > void bpf_jit_free(struct bpf_prog *fp) > { > if (fp->jited) > - module_memfree(fp->bpf_func); > + bpf_jit_binary_free(bpf_jit_binary_hdr(fp)); > > bpf_prog_unlock_free(fp); > } > diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c > index d5bfe24bb3b5..a1ea1ea6b40d 100644 > --- a/arch/powerpc/net/bpf_jit_comp.c > +++ b/arch/powerpc/net/bpf_jit_comp.c > @@ -683,7 +683,7 @@ void bpf_jit_compile(struct bpf_prog *fp) > void bpf_jit_free(struct bpf_prog *fp) > { > if (fp->jited) > - module_memfree(fp->bpf_func); > + bpf_jit_binary_free(bpf_jit_binary_hdr(fp)); > > bpf_prog_unlock_free(fp); > } > diff --git a/arch/powerpc/net/bpf_jit_comp64.c > b/arch/powerpc/net/bpf_jit_comp64.c > index 50b129785aee..84c8f013a6c6 100644 > --- a/arch/powerpc/net/bpf_jit_comp64.c > +++ b/arch/powerpc/net/bpf_jit_comp64.c > @@ -1024,11 +1024,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog > *fp) > /* Overriding bpf_jit_free() as we don't set images read-only. */ > void bpf_jit_free(struct bpf_prog *fp) > { > - unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK; > - struct bpf_binary_header *bpf_hdr = (void *)addr; > - > if (fp->jited) > - bpf_jit_binary_free(bpf_hdr); > + bpf_jit_binary_free(bpf_jit_binary_hdr(fp)); > > bpf_prog_unlock_free(fp); > } > diff --git a/arch/sparc/net/bpf_jit_comp_32.c > b/arch/sparc/net/bpf_jit_comp_32.c > index a5ff88643d5c..01bda6bc9e7f 100644 > --- a/arch/sparc/net/bpf_jit_comp_32.c > +++ b/arch/sparc/net/bpf_jit_comp_32.c > @@ -759,7 +759,7 @@ cond_branch: f_offset = addrs[i + > filter[i].jf]; > void bpf_jit_free(struct bpf_prog *fp) > { > if (fp->jited) > - module_memfree(fp->bpf_func); > + bpf_jit_binary_free(bpf_jit_binary_hdr(fp)); > > bpf_prog_unlock_free(fp); > } >