Re: [PATCH 5/5] dma-direct: always allow dma mask <= physiscal memory size

2018-11-19 Thread Ramon Fried
On Mon, Nov 19, 2018 at 5:50 PM Robin Murphy  wrote:
>
> On 19/11/2018 14:18, Ramon Fried wrote:
> > On Tue, Oct 9, 2018 at 8:02 AM Benjamin Herrenschmidt
> >  wrote:
> >>
> >> On Wed, 2018-10-03 at 16:10 -0700, Alexander Duyck wrote:
>  -* Because 32-bit DMA masks are so common we expect every 
>  architecture
>  -* to be able to satisfy them - either by not supporting more 
>  physical
>  -* memory, or by providing a ZONE_DMA32.  If neither is the 
>  case, the
>  -* architecture needs to use an IOMMU instead of the direct 
>  mapping.
>  -*/
>  -   if (mask < phys_to_dma(dev, DMA_BIT_MASK(32)))
>  +   u64 min_mask;
>  +
>  +   if (IS_ENABLED(CONFIG_ZONE_DMA))
>  +   min_mask = DMA_BIT_MASK(ARCH_ZONE_DMA_BITS);
>  +   else
>  +   min_mask = DMA_BIT_MASK(32);
>  +
>  +   min_mask = min_t(u64, min_mask, (max_pfn - 1) << PAGE_SHIFT);
>  +
>  +   if (mask >= phys_to_dma(dev, min_mask))
>    return 0;
>  -#endif
>    return 1;
> }
> >>>
> >>> So I believe I have run into the same issue that Guenter reported. On
> >>> an x86_64 system w/ Intel IOMMU. I wasn't able to complete boot and
> >>> all probe attempts for various devices were failing with -EIO errors.
> >>>
> >>> I believe the last mask check should be "if (mask < phys_to_dma(dev,
> >>> min_mask))" not a ">=" check.
> >>
> >> Right, that test is backwards. I needed to change it here too (powermac
> >> with the rest of the powerpc series).
> >>
> >> Cheers,
> >> Ben.
> >>
> >>
> >
> > Hi, I'm working on a MIPS64 soc with PCIe root complex on it, and it
> > appears that this series of patches are causing all PCI drivers that
> > request 64bit mask to fail with -5.
> > It's broken in 4.19. However, I just checked, it working on master.
> > We may need to backport a couple of patches to 4.19. I'm not sure
> > though which patches should be backported as there were at least 10
> > patches resolving this dma_direct area recently.
> > Christoph, Robin.
> > Can we ask Greg to backport all these changes ? What do you think ?
>
> As far as I'm aware, the only real issue in 4.19 was my subtle breakage
> around setting bus_dma_mask - that's fixed by 6778be4e5209, which
> according to my inbox got picked up by autosel for 4.19 stable last week.
>
> Robin.
Yep, 6778be4e5209 fixes the issue.
Thanks a lot !
Ramon.


Re: [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status

2018-11-19 Thread Russell Currey
On Tue, 2018-11-20 at 17:55 +1100, Alexey Kardashevskiy wrote:
> 
> On 20/11/2018 14:51, Sam Bobroff wrote:
> > On Tue, Nov 20, 2018 at 01:51:06PM +1100, Michael Ellerman wrote:
> > > Alexey Kardashevskiy  writes:
> > > 
> > > > The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS
> > > > call in
> > > > skiboot's NPU driver does not touch the pci_error_type
> > > > parameter so
> > > > it might have garbage but the powernv code analyzes it
> > > > nevertheless.
> > > > 
> > > > This initializes pcierr and fstate to zero in all call sites.
> > > > 
> > > > Signed-off-by: Alexey Kardashevskiy 
> > > > ---
> > > 
> > > Can we tag this with a Fixes? And seems like it should probably
> > > go to
> > > stable, or can we not trigger this path on older kernels?
> > > 
> > > cheers
> > 
> > Hmm, it's triggered by use on an NPU PE so that would be any kernel
> > that
> > can run on P8 or later (AFAIK).
> > 
> > It looks like the issue was present earlier, but the code was last
> > touched when it was moved, in...
> > 
> > 40ae5f693f6a ("powerpc/powernv: Drop PHB operation get_state()")
> > 
> > ... which was back in v4.1.
> 
> The original commit is
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8c41a7f3f7593fe57
> 
> 2013-06-20 13:21:09 +0800
> ===
> powerpc/eeh: I/O chip EEH state retrieval
> 
> The patch adds I/O chip backend to retrieve the state for the
> indicated PE. While the PE state is temperarily unavailable,
> the upper layer (powernv platform) should return default delay
> (1 second).
> ===
> 
> This was 3.10-rc5 (this is what that sha1's Makefile has).
> 
> But this was not the issue till skiboot decided not to zero these
> parameters so "Fixes:" should point to what?

It should still be that commit, it's perfectly reasonable (however
unlikely) that someone could be running a 3.10 kernel with a modern
skiboot.

> 
> 
> > Sam.
> > 
> > > > Without this, this happens:
> > > > 
> > > > pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data
> > > > EEH: PHB#6 failure detected, location: N/A
> > > > CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted 4.19.0-
> > > > le_f5a7bb7_aikATfstn1-p1 torvalds#106
> > > > Call Trace:
> > > > [c03fea9df9c0] [c0a990ec] dump_stack+0xb0/0xf4
> > > > (unreliable)
> > > > [c03fea9dfa00] [c0038480]
> > > > eeh_dev_check_failure+0x1f0/0x5f0
> > > > [c03fea9dfaa0] [c00a2768]
> > > > pnv_pci_read_config+0x128/0x160
> > > > [c03fea9dfae0] [c05d2b0c]
> > > > pci_bus_read_config_dword+0x9c/0xf0
> > > > [c03fea9dfb40] [c05df3d4] pci_save_state+0x64/0x250
> > > > [c03fea9dfbc0] [c05e0730]
> > > > pci_dev_save_and_disable+0x70/0xa0
> > > > [c03fea9dfbf0] [c05e4078]
> > > > pci_try_reset_function+0x48/0xc0
> > > > [c03fea9dfc20] [c0081cbc1b1c]
> > > > vfio_pci_ioctl+0x334/0xea0 [vfio_pci]
> > > > [c03fea9dfcf0] [c0081ca9046c]
> > > > vfio_device_fops_unl_ioctl+0x44/0x70 [vfio]
> > > > [c03fea9dfd10] [c039fd84] do_vfs_ioctl+0xd4/0xa00
> > > > [c03fea9dfdb0] [c03a07b4] ksys_ioctl+0x104/0x120
> > > > [c03fea9dfe00] [c03a07f8] sys_ioctl+0x28/0x80
> > > > [c03fea9dfe20] [c000b3a4] system_call+0x5c/0x70
> > > > EEH: Detected error on PHB#6
> > > > EEH: This PCI device has failed 1 times in the last hour and
> > > > will be permanently disabled after 5 fail
> > > > ures.
> > > > EEH: Notify device drivers to shutdown
> > > > EEH: Beginning: 'error_detected(IO frozen)'
> > > > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> > > > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> > > > EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci-
> > > > >error_detected(IO frozen)
> > > > EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can
> > > > recover'
> > > > EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci-
> > > > >error_detected(IO frozen)
> > > > EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can
> > > > recover'
> > > > EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci-
> > > > >error_detected(IO frozen)
> > > > EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can
> > > > recover'
> > > > EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci-
> > > > >error_detected(IO frozen)
> > > > EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can
> > > > recover'
> > > > EEH: Finished:'error_detected(IO frozen)' with aggregate
> > > > recovery state:'can recover'
> > > > EEH: Collect temporary log
> > > > pnv_pci_dump_phb_diag_data: Unrecognized ioType 0
> > > > EEH: Reset without hotplug activity
> > > > iommu: Removing device 0006:00:01.0 from group 4
> > > > iommu: Removing device 0006:00:01.1 from group 4
> > > > iommu: Removing device 0006:00:02.0 from group 4
> > > > iommu: Removing device 0006:00:02.1 from group 4
> > > > pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> > > > pnv_eeh_restore_config: Can't reinit 

Re: [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status

2018-11-19 Thread Alexey Kardashevskiy



On 20/11/2018 14:51, Sam Bobroff wrote:
> On Tue, Nov 20, 2018 at 01:51:06PM +1100, Michael Ellerman wrote:
>> Alexey Kardashevskiy  writes:
>>
>>> The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in
>>> skiboot's NPU driver does not touch the pci_error_type parameter so
>>> it might have garbage but the powernv code analyzes it nevertheless.
>>>
>>> This initializes pcierr and fstate to zero in all call sites.
>>>
>>> Signed-off-by: Alexey Kardashevskiy 
>>> ---
>>
>> Can we tag this with a Fixes? And seems like it should probably go to
>> stable, or can we not trigger this path on older kernels?
>>
>> cheers
> 
> Hmm, it's triggered by use on an NPU PE so that would be any kernel that
> can run on P8 or later (AFAIK).
> 
> It looks like the issue was present earlier, but the code was last
> touched when it was moved, in...
> 
> 40ae5f693f6a ("powerpc/powernv: Drop PHB operation get_state()")
> 
> ... which was back in v4.1.


The original commit is
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8c41a7f3f7593fe57

2013-06-20 13:21:09 +0800
===
powerpc/eeh: I/O chip EEH state retrieval

The patch adds I/O chip backend to retrieve the state for the
indicated PE. While the PE state is temperarily unavailable,
the upper layer (powernv platform) should return default delay
(1 second).
===

This was 3.10-rc5 (this is what that sha1's Makefile has).

But this was not the issue till skiboot decided not to zero these
parameters so "Fixes:" should point to what?


> 
> Sam.
> 
>>> Without this, this happens:
>>>
>>> pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data
>>> EEH: PHB#6 failure detected, location: N/A
>>> CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted 
>>> 4.19.0-le_f5a7bb7_aikATfstn1-p1 torvalds#106
>>> Call Trace:
>>> [c03fea9df9c0] [c0a990ec] dump_stack+0xb0/0xf4 (unreliable)
>>> [c03fea9dfa00] [c0038480] eeh_dev_check_failure+0x1f0/0x5f0
>>> [c03fea9dfaa0] [c00a2768] pnv_pci_read_config+0x128/0x160
>>> [c03fea9dfae0] [c05d2b0c] pci_bus_read_config_dword+0x9c/0xf0
>>> [c03fea9dfb40] [c05df3d4] pci_save_state+0x64/0x250
>>> [c03fea9dfbc0] [c05e0730] pci_dev_save_and_disable+0x70/0xa0
>>> [c03fea9dfbf0] [c05e4078] pci_try_reset_function+0x48/0xc0
>>> [c03fea9dfc20] [c0081cbc1b1c] vfio_pci_ioctl+0x334/0xea0 [vfio_pci]
>>> [c03fea9dfcf0] [c0081ca9046c] vfio_device_fops_unl_ioctl+0x44/0x70 
>>> [vfio]
>>> [c03fea9dfd10] [c039fd84] do_vfs_ioctl+0xd4/0xa00
>>> [c03fea9dfdb0] [c03a07b4] ksys_ioctl+0x104/0x120
>>> [c03fea9dfe00] [c03a07f8] sys_ioctl+0x28/0x80
>>> [c03fea9dfe20] [c000b3a4] system_call+0x5c/0x70
>>> EEH: Detected error on PHB#6
>>> EEH: This PCI device has failed 1 times in the last hour and will be 
>>> permanently disabled after 5 fail
>>> ures.
>>> EEH: Notify device drivers to shutdown
>>> EEH: Beginning: 'error_detected(IO frozen)'
>>> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
>>> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
>>> EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci->error_detected(IO frozen)
>>> EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can recover'
>>> EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci->error_detected(IO frozen)
>>> EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can recover'
>>> EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci->error_detected(IO frozen)
>>> EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can recover'
>>> EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci->error_detected(IO frozen)
>>> EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can recover'
>>> EEH: Finished:'error_detected(IO frozen)' with aggregate recovery 
>>> state:'can recover'
>>> EEH: Collect temporary log
>>> pnv_pci_dump_phb_diag_data: Unrecognized ioType 0
>>> EEH: Reset without hotplug activity
>>> iommu: Removing device 0006:00:01.0 from group 4
>>> iommu: Removing device 0006:00:01.1 from group 4
>>> iommu: Removing device 0006:00:02.0 from group 4
>>> iommu: Removing device 0006:00:02.1 from group 4
>>> pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
>>> EEH: Sleep 5s ahead of partial hotplug
>>> pci 0004:04 : [PE# 00] Setting 

Re: [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status

2018-11-19 Thread Sam Bobroff
On Tue, Nov 20, 2018 at 01:51:06PM +1100, Michael Ellerman wrote:
> Alexey Kardashevskiy  writes:
> 
> > The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in
> > skiboot's NPU driver does not touch the pci_error_type parameter so
> > it might have garbage but the powernv code analyzes it nevertheless.
> >
> > This initializes pcierr and fstate to zero in all call sites.
> >
> > Signed-off-by: Alexey Kardashevskiy 
> > ---
> 
> Can we tag this with a Fixes? And seems like it should probably go to
> stable, or can we not trigger this path on older kernels?
> 
> cheers

Hmm, it's triggered by use on an NPU PE so that would be any kernel that
can run on P8 or later (AFAIK).

It looks like the issue was present earlier, but the code was last
touched when it was moved, in...

40ae5f693f6a ("powerpc/powernv: Drop PHB operation get_state()")

... which was back in v4.1.

Sam.

> > Without this, this happens:
> >
> > pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data
> > EEH: PHB#6 failure detected, location: N/A
> > CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted 
> > 4.19.0-le_f5a7bb7_aikATfstn1-p1 torvalds#106
> > Call Trace:
> > [c03fea9df9c0] [c0a990ec] dump_stack+0xb0/0xf4 (unreliable)
> > [c03fea9dfa00] [c0038480] eeh_dev_check_failure+0x1f0/0x5f0
> > [c03fea9dfaa0] [c00a2768] pnv_pci_read_config+0x128/0x160
> > [c03fea9dfae0] [c05d2b0c] pci_bus_read_config_dword+0x9c/0xf0
> > [c03fea9dfb40] [c05df3d4] pci_save_state+0x64/0x250
> > [c03fea9dfbc0] [c05e0730] pci_dev_save_and_disable+0x70/0xa0
> > [c03fea9dfbf0] [c05e4078] pci_try_reset_function+0x48/0xc0
> > [c03fea9dfc20] [c0081cbc1b1c] vfio_pci_ioctl+0x334/0xea0 [vfio_pci]
> > [c03fea9dfcf0] [c0081ca9046c] vfio_device_fops_unl_ioctl+0x44/0x70 
> > [vfio]
> > [c03fea9dfd10] [c039fd84] do_vfs_ioctl+0xd4/0xa00
> > [c03fea9dfdb0] [c03a07b4] ksys_ioctl+0x104/0x120
> > [c03fea9dfe00] [c03a07f8] sys_ioctl+0x28/0x80
> > [c03fea9dfe20] [c000b3a4] system_call+0x5c/0x70
> > EEH: Detected error on PHB#6
> > EEH: This PCI device has failed 1 times in the last hour and will be 
> > permanently disabled after 5 fail
> > ures.
> > EEH: Notify device drivers to shutdown
> > EEH: Beginning: 'error_detected(IO frozen)'
> > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> > EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci->error_detected(IO frozen)
> > EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can recover'
> > EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci->error_detected(IO frozen)
> > EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can recover'
> > EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci->error_detected(IO frozen)
> > EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can recover'
> > EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci->error_detected(IO frozen)
> > EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can recover'
> > EEH: Finished:'error_detected(IO frozen)' with aggregate recovery 
> > state:'can recover'
> > EEH: Collect temporary log
> > pnv_pci_dump_phb_diag_data: Unrecognized ioType 0
> > EEH: Reset without hotplug activity
> > iommu: Removing device 0006:00:01.0 from group 4
> > iommu: Removing device 0006:00:01.1 from group 4
> > iommu: Removing device 0006:00:02.0 from group 4
> > iommu: Removing device 0006:00:02.1 from group 4
> > pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> > EEH: Sleep 5s ahead of partial hotplug
> > pci 0004:04 : [PE# 00] Setting up window#0 0..3fff pg=1000
> > pci 0004:05 : [PE# 18] Setting up window#0 0..3fff pg=1000
> > pci 0004:06 : [PE# 30] Setting up window#0 0..3fff pg=1000
> > pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fff pg=1000
> > pci 0006:00:01.0: [PE# 0c] Setting up window 0..3fff pg=1000
> > pci 0006:00:02.0: [PE# 0b] Setting up window 0..3fff pg=1000
> > EEH: Beginning: 'slot_reset'
> > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> > EEH: Finished:'slot_reset' with aggregate recovery state:'none'
> > EEH: Notify device driver to resume
> > EEH: B

Re: [PATCH net] net/ibmnvic: Fix deadlock problem in reset

2018-11-19 Thread David Miller
From: Juliet Kim 
Date: Mon, 19 Nov 2018 15:59:22 -0600

> This patch changes to use rtnl_lock only during a reset to avoid
> deadlock that could occur when a thread operating close is holding
> rtnl_lock and waiting for reset_lock acquired by another thread,
> which is waiting for rtnl_lock in order to set the number of tx/rx
> queues during a reset.
> 
> Also, we now setting the number of tx/rx queues during a soft reset
> for failover or LPM events.
> 
> Signed-off-by: Juliet Kim 

Applied.


Re: [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status

2018-11-19 Thread Michael Ellerman
Alexey Kardashevskiy  writes:

> The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in
> skiboot's NPU driver does not touch the pci_error_type parameter so
> it might have garbage but the powernv code analyzes it nevertheless.
>
> This initializes pcierr and fstate to zero in all call sites.
>
> Signed-off-by: Alexey Kardashevskiy 
> ---

Can we tag this with a Fixes? And seems like it should probably go to
stable, or can we not trigger this path on older kernels?

cheers

> Without this, this happens:
>
> pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data
> EEH: PHB#6 failure detected, location: N/A
> CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted 
> 4.19.0-le_f5a7bb7_aikATfstn1-p1 torvalds#106
> Call Trace:
> [c03fea9df9c0] [c0a990ec] dump_stack+0xb0/0xf4 (unreliable)
> [c03fea9dfa00] [c0038480] eeh_dev_check_failure+0x1f0/0x5f0
> [c03fea9dfaa0] [c00a2768] pnv_pci_read_config+0x128/0x160
> [c03fea9dfae0] [c05d2b0c] pci_bus_read_config_dword+0x9c/0xf0
> [c03fea9dfb40] [c05df3d4] pci_save_state+0x64/0x250
> [c03fea9dfbc0] [c05e0730] pci_dev_save_and_disable+0x70/0xa0
> [c03fea9dfbf0] [c05e4078] pci_try_reset_function+0x48/0xc0
> [c03fea9dfc20] [c0081cbc1b1c] vfio_pci_ioctl+0x334/0xea0 [vfio_pci]
> [c03fea9dfcf0] [c0081ca9046c] vfio_device_fops_unl_ioctl+0x44/0x70 
> [vfio]
> [c03fea9dfd10] [c039fd84] do_vfs_ioctl+0xd4/0xa00
> [c03fea9dfdb0] [c03a07b4] ksys_ioctl+0x104/0x120
> [c03fea9dfe00] [c03a07f8] sys_ioctl+0x28/0x80
> [c03fea9dfe20] [c000b3a4] system_call+0x5c/0x70
> EEH: Detected error on PHB#6
> EEH: This PCI device has failed 1 times in the last hour and will be 
> permanently disabled after 5 fail
> ures.
> EEH: Notify device drivers to shutdown
> EEH: Beginning: 'error_detected(IO frozen)'
> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci->error_detected(IO frozen)
> EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can recover'
> EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci->error_detected(IO frozen)
> EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can recover'
> EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci->error_detected(IO frozen)
> EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can recover'
> EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci->error_detected(IO frozen)
> EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can recover'
> EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'can 
> recover'
> EEH: Collect temporary log
> pnv_pci_dump_phb_diag_data: Unrecognized ioType 0
> EEH: Reset without hotplug activity
> iommu: Removing device 0006:00:01.0 from group 4
> iommu: Removing device 0006:00:01.1 from group 4
> iommu: Removing device 0006:00:02.0 from group 4
> iommu: Removing device 0006:00:02.1 from group 4
> pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0
> pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> EEH: Sleep 5s ahead of partial hotplug
> pci 0004:04 : [PE# 00] Setting up window#0 0..3fff pg=1000
> pci 0004:05 : [PE# 18] Setting up window#0 0..3fff pg=1000
> pci 0004:06 : [PE# 30] Setting up window#0 0..3fff pg=1000
> pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fff pg=1000
> pci 0006:00:01.0: [PE# 0c] Setting up window 0..3fff pg=1000
> pci 0006:00:02.0: [PE# 0b] Setting up window 0..3fff pg=1000
> EEH: Beginning: 'slot_reset'
> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> EEH: Finished:'slot_reset' with aggregate recovery state:'none'
> EEH: Notify device driver to resume
> EEH: Beginning: 'resume'
> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> EEH: Finished:'resume'
> EEH: Recovery successful.
> ---
>  arch/powerpc/platforms/powernv/eeh-powernv.c | 8 
>  arch/powerpc/platforms/powernv/pci-ioda.c| 4 ++--
>  arch/powerpc/platforms/powernv/pci.c | 4 ++--
>  3 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
> b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index abc0be7..f380

Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Sinan Kaya

On 11/19/2018 6:49 PM, alex_gagn...@dellteam.com wrote:

On 11/19/2018 02:33 PM, Sinan Kaya wrote:

However; table assumes governance about for which entities firmware first
should be enabled. There is no cross reference to _OSC or permission
negotiation like _OST.


Well, from an OSPM perspective, is FFS something that can be enabled or
disabled? FFS seems to be static to OSPM, which would change the sort of
assumptions we can reasonably make here.


IMO, it can be enabled/disabled in BIOS. I have seen this implementation before.
If the trigger is the presence of a statically compiled ACPI HEST table (as the
current code does); presence of FFS would be static from OSPM perspective.
BIOS could patch this table or hide it during boot.

If FFS were to be negotiated via _OSC as indirectly implied in this series, then
same BIOS could patch the ACPI table to return different values for the _OSC
return.





As I said in my previous email, the right place to talk about this is UEFI
forum.


The way I would present the problem to he spec writers is that, although
the spec appears to be consistent, we've seen firmware vendors that made
the wrong assumptions about HEST/_OSC. Instead of describing AER
ownership with _OSC, they attempted to do it with HEST. So we should add
an implementation note, or clarification about this.


I agree.


Cool. While the UEFI Secret Society debates, can we figure out if/how
[patch 1/2] breaks those systems, or is it only [patch 2/2] of this
series that we suspect?


I went back and looked at both patches. Both of them are removing references to
HEST table. I think both patches are impacted by this discussion.



Alex





Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Alex_Gagniuc
On 11/19/2018 02:33 PM, Sinan Kaya wrote:
> True. I was trying to get it out in a rush. I omitted words.

Sounds like you'd make an top notch spec writer! :p

> However; table assumes governance about for which entities firmware first
> should be enabled. There is no cross reference to _OSC or permission
> negotiation like _OST.

Well, from an OSPM perspective, is FFS something that can be enabled or 
disabled? FFS seems to be static to OSPM, which would change the sort of 
assumptions we can reasonably make here.


>>> As I said in my previous email, the right place to talk about this is UEFI
>>> forum.
>>
>> The way I would present the problem to he spec writers is that, although
>> the spec appears to be consistent, we've seen firmware vendors that made
>> the wrong assumptions about HEST/_OSC. Instead of describing AER
>> ownership with _OSC, they attempted to do it with HEST. So we should add
>> an implementation note, or clarification about this.
> 
> I agree.

Cool. While the UEFI Secret Society debates, can we figure out if/how 
[patch 1/2] breaks those systems, or is it only [patch 2/2] of this 
series that we suspect?

Alex


Re: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint

2018-11-19 Thread Michael Neuling
On Mon, 2018-11-19 at 10:44 -0200, Breno Leitao wrote:
> On a signal handler return, the user could set a context with MSR[TS] bits
> set, and these bits would be copied to task regs->msr.
> 
> At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
> several __get_user() are called and then a recheckpoint is executed.

Do we have the same problem on the entry side?  There are a bunch of
copy_from_user() before we do a tm_reclaim() (ie when still transactional). Is
there some chance of the same problem there?

> This is a problem since a page fault (in kernel space) could happen when
> calling __get_user(). If it happens, the process MSR[TS] bits were
> already set, but recheckpoint was not executed, and SPRs are still invalid.
> 
> The page fault can cause the current process to be de-scheduled, with
> MSR[TS] active and without tm_recheckpoint() being called.  More
> importantly, without TEXAR[FS] bit set also.

This patch is great and should go in ASAP 

> Since TEXASR might not have the FS bit set, and when the process is
> scheduled back, it will try to reclaim, which will be aborted because of
> the CPU is not in the suspended state, and, then, recheckpoint. This
> recheckpoint will restore thread->texasr into TEXASR SPR, which might be
> zero, hitting a BUG_ON().
> 
>   [ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!

Can you put more of the backtrace here?  Can be useful for people searching for
a similar problem.

> This patch simply delays the MSR[TS] set, so, if there is any page fault in
> the __get_user() section, it does not have regs->msr[TS] set, since the TM
> structures are still invalid, thus avoiding doing TM operations for
> in-kernel exceptions and possible process reschedule.

Can you put a comment in the code what says after the MSR[TS] setting, there can
be no get/put_user before the recheckpoint? 

Also, it looks like the 32bit signals case is safe, but please add a comment in
there too.

> With this patch, the MSR[TS] will only be set just before recheckpointing
> and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in
> invalid state.
> 
> It is not possible to move tm_recheckpoint to happen earlier, because it is
> required to get the checkpointed registers from userspace, with
> __get_user(), thus, the only way to avoid this undesired behavior is
> delaying the MSR[TS] set, as done in this patch.
> 
> Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
> Cc: sta...@vger.kernel.org (v3.9+)
> Signed-off-by: Breno Leitao 
> ---
>  arch/powerpc/kernel/signal_64.c | 29 +++--
>  1 file changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
> index 83d51bf586c7..15b153bdd826 100644
> --- a/arch/powerpc/kernel/signal_64.c
> +++ b/arch/powerpc/kernel/signal_64.c
> @@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct 
> *tsk,
>   if (MSR_TM_RESV(msr))
>   return -EINVAL;
>  
> - /* pull in MSR TS bits from user context */
> - regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
> -
> - /*
> -  * Ensure that TM is enabled in regs->msr before we leave the signal
> -  * handler. It could be the case that (a) user disabled the TM bit
> -  * through the manipulation of the MSR bits in uc_mcontext or (b) the
> -  * TM bit was disabled because a sufficient number of context switches
> -  * happened whilst in the signal handler and load_tm overflowed,
> -  * disabling the TM bit. In either case we can end up with an illegal
> -  * TM state leading to a TM Bad Thing when we return to userspace.
> -  */
> - regs->msr |= MSR_TM;
> -
>   /* pull in MSR LE from user context */
>   regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
>  
> @@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct 
> *tsk,
>   tm_enable();
>   /* Make sure the transaction is marked as failed */
>   tsk->thread.tm_texasr |= TEXASR_FS;
> +
> + /* pull in MSR TS bits from user context */
> + regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
> +
> + /*
> +  * Ensure that TM is enabled in regs->msr before we leave the signal
> +  * handler. It could be the case that (a) user disabled the TM bit
> +  * through the manipulation of ararararthe MSR bits in uc_mcontext or 
> (b) the
> +  * TM bit was disabled because a sufficient number of context switches
> +  * happened whilst in the signal handler and load_tm overflowed,
> +  * disabling the TM bit. In either case we can end up with an illegal
> +  * TM state leading to a TM Bad Thing when we return to userspace.
> +  */
> + regs->msr |= MSR_TM;
> +
>   /* This loads the checkpointed FP/VEC state, if used */
>   tm_recheckpoint(&tsk->thread);
>  


Re: [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status

2018-11-19 Thread Sam Bobroff
On Mon, Nov 19, 2018 at 03:25:17PM +1100, Alexey Kardashevskiy wrote:
> The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in
> skiboot's NPU driver does not touch the pci_error_type parameter so
> it might have garbage but the powernv code analyzes it nevertheless.
> 
> This initializes pcierr and fstate to zero in all call sites.
> 
> Signed-off-by: Alexey Kardashevskiy 

The skiboot code for npu2_freeze_status() is just:

*freeze_state = OPAL_EEH_STOPPED_NOT_FROZEN;
return OPAL_SUCCESS;

It looks like npu_freeze_status() is affected as well. I think we
could consider these skiboot bugs, but this still seems like an
improvement.

Reviewed-by: Sam Bobroff 

> ---
> 
> Without this, this happens:
> 
> pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data
> EEH: PHB#6 failure detected, location: N/A
> CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted 
> 4.19.0-le_f5a7bb7_aikATfstn1-p1 torvalds#106
> Call Trace:
> [c03fea9df9c0] [c0a990ec] dump_stack+0xb0/0xf4 (unreliable)
> [c03fea9dfa00] [c0038480] eeh_dev_check_failure+0x1f0/0x5f0
> [c03fea9dfaa0] [c00a2768] pnv_pci_read_config+0x128/0x160
> [c03fea9dfae0] [c05d2b0c] pci_bus_read_config_dword+0x9c/0xf0
> [c03fea9dfb40] [c05df3d4] pci_save_state+0x64/0x250
> [c03fea9dfbc0] [c05e0730] pci_dev_save_and_disable+0x70/0xa0
> [c03fea9dfbf0] [c05e4078] pci_try_reset_function+0x48/0xc0
> [c03fea9dfc20] [c0081cbc1b1c] vfio_pci_ioctl+0x334/0xea0 [vfio_pci]
> [c03fea9dfcf0] [c0081ca9046c] vfio_device_fops_unl_ioctl+0x44/0x70 
> [vfio]
> [c03fea9dfd10] [c039fd84] do_vfs_ioctl+0xd4/0xa00
> [c03fea9dfdb0] [c03a07b4] ksys_ioctl+0x104/0x120
> [c03fea9dfe00] [c03a07f8] sys_ioctl+0x28/0x80
> [c03fea9dfe20] [c000b3a4] system_call+0x5c/0x70
> EEH: Detected error on PHB#6
> EEH: This PCI device has failed 1 times in the last hour and will be 
> permanently disabled after 5 fail
> ures.
> EEH: Notify device drivers to shutdown
> EEH: Beginning: 'error_detected(IO frozen)'
> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci->error_detected(IO frozen)
> EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can recover'
> EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci->error_detected(IO frozen)
> EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can recover'
> EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci->error_detected(IO frozen)
> EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can recover'
> EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci->error_detected(IO frozen)
> EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can recover'
> EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'can 
> recover'
> EEH: Collect temporary log
> pnv_pci_dump_phb_diag_data: Unrecognized ioType 0
> EEH: Reset without hotplug activity
> iommu: Removing device 0006:00:01.0 from group 4
> iommu: Removing device 0006:00:01.1 from group 4
> iommu: Removing device 0006:00:02.0 from group 4
> iommu: Removing device 0006:00:02.1 from group 4
> pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0
> pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> EEH: Sleep 5s ahead of partial hotplug
> pci 0004:04 : [PE# 00] Setting up window#0 0..3fff pg=1000
> pci 0004:05 : [PE# 18] Setting up window#0 0..3fff pg=1000
> pci 0004:06 : [PE# 30] Setting up window#0 0..3fff pg=1000
> pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fff pg=1000
> pci 0006:00:01.0: [PE# 0c] Setting up window 0..3fff pg=1000
> pci 0006:00:02.0: [PE# 0b] Setting up window 0..3fff pg=1000
> EEH: Beginning: 'slot_reset'
> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> EEH: Finished:'slot_reset' with aggregate recovery state:'none'
> EEH: Notify device driver to resume
> EEH: Beginning: 'resume'
> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> EEH: Finished:'resume'
> EEH: Recovery successful.
> ---
>  arch/powerpc/platforms/powernv/eeh-powernv.c | 8 
>  arch/powerpc/platforms/powernv/pci-ioda.c| 4 ++--
>  arch/powerpc/platforms/po

Re: [PATCH v2] powerpc/math-emu: Update macros from gmp-6.1.2

2018-11-19 Thread Joel Stanley
On Tue, 20 Nov 2018 at 05:24, Nick Desaulniers  wrote:
> >
> > The only functional change I noticed was this in udiv_qrnnd.
> >
> > __r1 = (n1) % __d1;
> > __q1 = (n1) / __d1;
> >
> > Becomes this:
> >
> > __q1 = (n1) / __d1;
> > __r1 = (n1) - __q1 * __d1;
> >
> > This is equivalent as it instead of calculating the remainder
> > using modulo, it uses the result of integer division to subtract the
> > count of 'whole' d1 from r1.
>
> I don't understand this; why was this functional change made?

I couldn't see why. It pre-dates GMP's mecurial history that  .
  \
> > -(q) = (UWtype) __q1 * __ll_B | __q0;   \
> > +(q) = __q1 * __ll_B | __q0;
> > \
> >  (r) = __r0;
> > \
> >} while (0)
>
> This appears to now differ from the upstream source:
> https://github.com/gcc-mirror/gcc/blob/f7289f563a5e447ac21a5901ed75aad0dbd37732/include/longlong.h#L1679

AIUI the upstream source is GMP:

 https://gmplib.org/repo/gmp/file/tip/longlong.h

> If we're going to borrow implementations from GCC, let's borrow the
> same implementation. Otherwise it's hard to have confidence in this
> part of the patch.

I agree we should use the upstream source.

Segher, which tree contains the One True Upstream for longlong.h?

Cheers,

Joel


Re: [PATCH v02] powerpc/numa: Perform full re-add of CPU for PRRN/VPHN topology update

2018-11-19 Thread kbuild test robot
Hi Michael,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.20-rc3 next-20181119]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Michael-Bringmann/powerpc-numa-Perform-full-re-add-of-CPU-for-PRRN-VPHN-topology-update/20181119-224033
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All error/warnings (new ones prefixed by >>):

   arch/powerpc/mm/numa.c: In function 'numa_update_cpu_topology':
>> arch/powerpc/mm/numa.c:1361:4: error: implicit declaration of function 
>> 'dlpar_cpu_readd'; did you mean 'raw_cpu_read'? 
>> [-Werror=implicit-function-declaration]
   dlpar_cpu_readd(cpu);
   ^~~
   raw_cpu_read
   arch/powerpc/mm/numa.c: In function 'topology_schedule_update':
>> arch/powerpc/mm/numa.c:1451:2: warning: this 'if' clause does not guard... 
>> [-Wmisleading-indentation]
 if (!topology_update_in_progress);
 ^~
   arch/powerpc/mm/numa.c:1452:3: note: ...this statement, but the latter is 
misleadingly indented as if it were guarded by the 'if'
  schedule_work(&topology_work);
  ^
   cc1: some warnings being treated as errors

vim +1361 arch/powerpc/mm/numa.c

  1298  
  1299  /*
  1300   * Update the node maps and sysfs entries for each cpu whose home node
  1301   * has changed. Returns 1 when the topology has changed, and 0 
otherwise.
  1302   *
  1303   * readd_cpus: Also readd any CPUs that have changed affinity
  1304   */
  1305  static int numa_update_cpu_topology(bool readd_cpus)
  1306  {
  1307  unsigned int cpu, sibling, changed = 0;
  1308  struct topology_update_data *updates, *ud;
  1309  cpumask_t updated_cpus;
  1310  struct device *dev;
  1311  int weight, new_nid, i = 0;
  1312  
  1313  if ((!prrn_enabled && !vphn_enabled && topology_inited) ||
  1314  topology_update_in_progress)
  1315  return 0;
  1316  
  1317  weight = cpumask_weight(&cpu_associativity_changes_mask);
  1318  if (!weight)
  1319  return 0;
  1320  
  1321  updates = kcalloc(weight, sizeof(*updates), GFP_KERNEL);
  1322  if (!updates)
  1323  return 0;
  1324  
  1325  topology_update_in_progress = 1;
  1326  
  1327  cpumask_clear(&updated_cpus);
  1328  
  1329  for_each_cpu(cpu, &cpu_associativity_changes_mask) {
  1330  /*
  1331   * If siblings aren't flagged for changes, updates list
  1332   * will be too short. Skip on this update and set for 
next
  1333   * update.
  1334   */
  1335  if (!cpumask_subset(cpu_sibling_mask(cpu),
  1336  
&cpu_associativity_changes_mask)) {
  1337  pr_info("Sibling bits not set for associativity 
"
  1338  "change, cpu%d\n", cpu);
  1339  cpumask_or(&cpu_associativity_changes_mask,
  1340  &cpu_associativity_changes_mask,
  1341  cpu_sibling_mask(cpu));
  1342  cpu = cpu_last_thread_sibling(cpu);
  1343  continue;
  1344  }
  1345  
  1346  new_nid = find_and_online_cpu_nid(cpu);
  1347  
  1348  if ((new_nid == numa_cpu_lookup_table[cpu]) ||
  1349  !cpu_present(cpu)) {
  1350  cpumask_andnot(&cpu_associativity_changes_mask,
  1351  &cpu_associativity_changes_mask,
  1352  cpu_sibling_mask(cpu));
  1353  if (cpu_present(cpu))
  1354  dbg("Assoc chg gives same node %d for 
cpu%d\n",
  1355  new_nid, cpu);
  1356  cpu = cpu_last_thread_sibling(cpu);
  1357  continue;
  1358  }
  1359  
  1360  if (readd_cpus)
> 1361  dlpar_cpu_readd(cpu);
  1362  
  1363  for_each_cpu(sibli

[PATCH net] net/ibmnvic: Fix deadlock problem in reset

2018-11-19 Thread Juliet Kim
This patch changes to use rtnl_lock only during a reset to avoid
deadlock that could occur when a thread operating close is holding
rtnl_lock and waiting for reset_lock acquired by another thread,
which is waiting for rtnl_lock in order to set the number of tx/rx
queues during a reset.

Also, we now setting the number of tx/rx queues during a soft reset
for failover or LPM events.

Signed-off-by: Juliet Kim 
---
 drivers/net/ethernet/ibm/ibmvnic.c |   59 +---
 drivers/net/ethernet/ibm/ibmvnic.h |2 +
 2 files changed, 22 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 7893bef..4a5de59 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1103,20 +1103,15 @@ static int ibmvnic_open(struct net_device *netdev)
return 0;
}
 
-   mutex_lock(&adapter->reset_lock);
-
if (adapter->state != VNIC_CLOSED) {
rc = ibmvnic_login(netdev);
-   if (rc) {
-   mutex_unlock(&adapter->reset_lock);
+   if (rc)
return rc;
-   }
 
rc = init_resources(adapter);
if (rc) {
netdev_err(netdev, "failed to initialize resources\n");
release_resources(adapter);
-   mutex_unlock(&adapter->reset_lock);
return rc;
}
}
@@ -1124,8 +1119,6 @@ static int ibmvnic_open(struct net_device *netdev)
rc = __ibmvnic_open(netdev);
netif_carrier_on(netdev);
 
-   mutex_unlock(&adapter->reset_lock);
-
return rc;
 }
 
@@ -1269,10 +1262,8 @@ static int ibmvnic_close(struct net_device *netdev)
return 0;
}
 
-   mutex_lock(&adapter->reset_lock);
rc = __ibmvnic_close(netdev);
ibmvnic_cleanup(netdev);
-   mutex_unlock(&adapter->reset_lock);
 
return rc;
 }
@@ -1820,20 +1811,15 @@ static int do_reset(struct ibmvnic_adapter *adapter,
return rc;
} else if (adapter->req_rx_queues != old_num_rx_queues ||
   adapter->req_tx_queues != old_num_tx_queues) {
-   adapter->map_id = 1;
release_rx_pools(adapter);
release_tx_pools(adapter);
-   rc = init_rx_pools(netdev);
-   if (rc)
-   return rc;
-   rc = init_tx_pools(netdev);
-   if (rc)
-   return rc;
-
release_napi(adapter);
-   rc = init_napi(adapter);
+   release_vpd_data(adapter);
+
+   rc = init_resources(adapter);
if (rc)
return rc;
+
} else {
rc = reset_tx_pools(adapter);
if (rc)
@@ -1917,17 +1903,8 @@ static int do_hard_reset(struct ibmvnic_adapter *adapter,
adapter->state = VNIC_PROBED;
return 0;
}
-   /* netif_set_real_num_xx_queues needs to take rtnl lock here
-* unless wait_for_reset is set, in which case the rtnl lock
-* has already been taken before initializing the reset
-*/
-   if (!adapter->wait_for_reset) {
-   rtnl_lock();
-   rc = init_resources(adapter);
-   rtnl_unlock();
-   } else {
-   rc = init_resources(adapter);
-   }
+
+   rc = init_resources(adapter);
if (rc)
return rc;
 
@@ -1986,13 +1963,21 @@ static void __ibmvnic_reset(struct work_struct *work)
struct ibmvnic_rwi *rwi;
struct ibmvnic_adapter *adapter;
struct net_device *netdev;
+   bool we_lock_rtnl = false;
u32 reset_state;
int rc = 0;
 
adapter = container_of(work, struct ibmvnic_adapter, ibmvnic_reset);
netdev = adapter->netdev;
 
-   mutex_lock(&adapter->reset_lock);
+   /* netif_set_real_num_xx_queues needs to take rtnl lock here
+* unless wait_for_reset is set, in which case the rtnl lock
+* has already been taken before initializing the reset
+*/
+   if (!adapter->wait_for_reset) {
+   rtnl_lock();
+   we_lock_rtnl = true;
+   }
reset_state = adapter->state;
 
rwi = get_next_rwi(adapter);
@@ -2020,12 +2005,11 @@ static void __ibmvnic_reset(struct work_struct *work)
if (rc) {
netdev_dbg(adapter->netdev, "Reset failed\n");
free_all_rwi(adapter);
-   mutex_unlock(&adapter->reset_lock);
-   return;
}
 
adapter->resetting = false;
-   mutex_unlock(&adapter->reset_lock);
+   if (we_lock

Re: [PATCH 4/9] PCI: consolidate PCI config entry in drivers/pci

2018-11-19 Thread Paul Burton
Hi Christoph,

On Thu, Nov 15, 2018 at 08:05:32PM +0100, Christoph Hellwig wrote:
> There is no good reason to duplicate the PCI menu in every architecture.
> Instead provide a selectable HAVE_PCI symbol that indicates availability
> of PCI support, and a FORCE_PCI symbol to for PCI on and the handle the
> rest in drivers/pci.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Palmer Dabbelt 
> Acked-by: Max Filippov 
> Acked-by: Thomas Gleixner 
> Acked-by: Bjorn Helgaas 
> Acked-by: Geert Uytterhoeven 

For patches 4, 5, 7, 8 & 9:

Acked-by: Paul Burton  # MIPS parts

Thanks,
Paul


[PATCH v3] powerpc/ptrace: replace ptrace_report_syscall() with a tracehook call

2018-11-19 Thread Dmitry V. Levin
From: Elvira Khabirova 

Arch code should use tracehook_*() helpers as documented
in include/linux/tracehook.h,
ptrace_report_syscall() is not expected to be used outside that file.

Co-authored-by: Dmitry V. Levin 
Fixes: 5521eb4bca2d ("powerpc/ptrace: Add support for PTRACE_SYSEMU")
Signed-off-by: Elvira Khabirova 
Signed-off-by: Dmitry V. Levin 
---

v3: add a descriptive comment
v2: explicitly ignore tracehook_report_syscall_entry() return code

 arch/powerpc/kernel/ptrace.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index afb819f4ca68..e84220d91bbd 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -3266,7 +3266,12 @@ long do_syscall_trace_enter(struct pt_regs *regs)
user_exit();
 
if (test_thread_flag(TIF_SYSCALL_EMU)) {
-   ptrace_report_syscall(regs);
+   /*
+* A nonzero return code from tracehook_report_syscall_entry()
+* tells us to prevent the syscall execution, but we are not
+* going to execute it anyway.
+*/
+   (void) tracehook_report_syscall_entry(regs);
/*
 * Returning -1 will skip the syscall execution. We want to
 * avoid clobbering any register also, thus, not 'gotoing'
-- 
ldv


Re: [PATCH 3/9] MIPS: remove the HT_PCI config option

2018-11-19 Thread Paul Burton
On Mon, Nov 19, 2018 at 01:01:41PM -0800, Paul Burton wrote:
> On Thu, Nov 15, 2018 at 08:05:31PM +0100, Christoph Hellwig wrote:
> > This option is always selected from LOONGSON_MACH3X.  Switch to just
> > seleting PCI from that option and definining LOONGSON_PCIIO_BASE based
> > on CONFIG_LOONGSON_MACH3X.
> > 
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  arch/mips/Kconfig| 11 ---
> >  arch/mips/include/asm/mach-loongson64/loongson.h |  2 +-
> >  arch/mips/loongson64/Kconfig |  2 +-
> >  3 files changed, 2 insertions(+), 13 deletions(-)
> > 
> > diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> > index 8272ea4c7264..7d28c9dd75d0 100644
> > --- a/arch/mips/Kconfig
> > +++ b/arch/mips/Kconfig
> > @@ -3040,17 +3040,6 @@ config PCI
> >   your box. Other bus systems are ISA, EISA, or VESA. If you have PCI,
> >   say Y, otherwise N.
> >  
> > -config HT_PCI
> > -   bool "Support for HT-linked PCI"
> > -   default y
> > -   depends on CPU_LOONGSON3
> > -   select PCI
> > -   select PCI_DOMAINS
> > -   help
> > - Loongson family machines use Hyper-Transport bus for inter-core
> > - connection and device connection. The PCI bus is a subordinate
> > - linked at HT. Choose Y for Loongson-3 based machines.
> > -
> >  config PCI_DOMAINS
> > bool
> >  
> > diff --git a/arch/mips/loongson64/Kconfig b/arch/mips/loongson64/Kconfig
> > index c865b4b9b775..781a5156ab21 100644
> > --- a/arch/mips/loongson64/Kconfig
> > +++ b/arch/mips/loongson64/Kconfig
> > @@ -76,7 +76,7 @@ config LOONGSON_MACH3X
> > select CPU_HAS_WB
> > select HW_HAS_PCI
> > select ISA
> > -   select HT_PCI
> > +   select PCI
> > select I8259
> > select IRQ_MIPS_CPU
> > select NR_CPUS_DEFAULT_4
> 
> Should this also select PCI_DOMAINS to preserve the existing behavior?
> 
> If not, could you explain why in the commit message?

Ah, I see - PCI already selects PCI_DOMAINS. I think it would have been
worth mentioning but I don't mind if you don't think it a big enough
deal to respin the patch, so:

Acked-by: Paul Burton 

Thanks,
Paul


Re: [PATCH 3/9] MIPS: remove the HT_PCI config option

2018-11-19 Thread Paul Burton
Hi Christoph,

On Thu, Nov 15, 2018 at 08:05:31PM +0100, Christoph Hellwig wrote:
> This option is always selected from LOONGSON_MACH3X.  Switch to just
> seleting PCI from that option and definining LOONGSON_PCIIO_BASE based
> on CONFIG_LOONGSON_MACH3X.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/mips/Kconfig| 11 ---
>  arch/mips/include/asm/mach-loongson64/loongson.h |  2 +-
>  arch/mips/loongson64/Kconfig |  2 +-
>  3 files changed, 2 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index 8272ea4c7264..7d28c9dd75d0 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -3040,17 +3040,6 @@ config PCI
> your box. Other bus systems are ISA, EISA, or VESA. If you have PCI,
> say Y, otherwise N.
>  
> -config HT_PCI
> - bool "Support for HT-linked PCI"
> - default y
> - depends on CPU_LOONGSON3
> - select PCI
> - select PCI_DOMAINS
> - help
> -   Loongson family machines use Hyper-Transport bus for inter-core
> -   connection and device connection. The PCI bus is a subordinate
> -   linked at HT. Choose Y for Loongson-3 based machines.
> -
>  config PCI_DOMAINS
>   bool
>  
> diff --git a/arch/mips/loongson64/Kconfig b/arch/mips/loongson64/Kconfig
> index c865b4b9b775..781a5156ab21 100644
> --- a/arch/mips/loongson64/Kconfig
> +++ b/arch/mips/loongson64/Kconfig
> @@ -76,7 +76,7 @@ config LOONGSON_MACH3X
>   select CPU_HAS_WB
>   select HW_HAS_PCI
>   select ISA
> - select HT_PCI
> + select PCI
>   select I8259
>   select IRQ_MIPS_CPU
>   select NR_CPUS_DEFAULT_4

Should this also select PCI_DOMAINS to preserve the existing behavior?

If not, could you explain why in the commit message?

Thanks,
Paul


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Tyler Baicar
On Thu, Nov 15, 2018 at 8:49 PM Sinan Kaya  wrote:
>
> On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote:
> > I've asked around a few people at Dell and they unanimously agree that
> > _OSC is the correct way to determine ownership of AER. In linux, we
> > use the result of _OSC to enable AER services, but we use HEST to
> > determine AER ownership. That's inconsistent. This series drops the
> > use of HEST in favor of _OSC.
> >
> > [1]https://lkml.org/lkml/2018/11/15/62
>
> This change breaks the existing systems that rely on the HEST table
> telling the operating system about firmware first presence.
>
> Besides, HEST table has much more granularity about which PCI component
> needs firmware such as global/device/switch.
>
> You should probably circulate these ideas for wider consumption in UEFI
> forum as UEFI owns the HEST table definition.

I agree with Sinan, this will break existing systems, and the granularity of the
HEST definition is more useful than the single bit in _OSC.


Re: [PATCH 5/5] dma-direct: always allow dma mask <= physiscal memory size

2018-11-19 Thread Ramon Fried
On Tue, Oct 9, 2018 at 8:02 AM Benjamin Herrenschmidt
 wrote:
>
> On Wed, 2018-10-03 at 16:10 -0700, Alexander Duyck wrote:
> > > -* Because 32-bit DMA masks are so common we expect every 
> > > architecture
> > > -* to be able to satisfy them - either by not supporting more 
> > > physical
> > > -* memory, or by providing a ZONE_DMA32.  If neither is the case, 
> > > the
> > > -* architecture needs to use an IOMMU instead of the direct 
> > > mapping.
> > > -*/
> > > -   if (mask < phys_to_dma(dev, DMA_BIT_MASK(32)))
> > > +   u64 min_mask;
> > > +
> > > +   if (IS_ENABLED(CONFIG_ZONE_DMA))
> > > +   min_mask = DMA_BIT_MASK(ARCH_ZONE_DMA_BITS);
> > > +   else
> > > +   min_mask = DMA_BIT_MASK(32);
> > > +
> > > +   min_mask = min_t(u64, min_mask, (max_pfn - 1) << PAGE_SHIFT);
> > > +
> > > +   if (mask >= phys_to_dma(dev, min_mask))
> > >  return 0;
> > > -#endif
> > >  return 1;
> > >   }
> >
> > So I believe I have run into the same issue that Guenter reported. On
> > an x86_64 system w/ Intel IOMMU. I wasn't able to complete boot and
> > all probe attempts for various devices were failing with -EIO errors.
> >
> > I believe the last mask check should be "if (mask < phys_to_dma(dev,
> > min_mask))" not a ">=" check.
>
> Right, that test is backwards. I needed to change it here too (powermac
> with the rest of the powerpc series).
>
> Cheers,
> Ben.
>
>

Hi, I'm working on a MIPS64 soc with PCIe root complex on it, and it
appears that this series of patches are causing all PCI drivers that
request 64bit mask to fail with -5.
It's broken in 4.19. However, I just checked, it working on master.
We may need to backport a couple of patches to 4.19. I'm not sure
though which patches should be backported as there were at least 10
patches resolving this dma_direct area recently.
Christoph, Robin.
Can we ask Greg to backport all these changes ? What do you think ?

Thanks,
Ramon.


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Sinan Kaya

On 11/19/2018 3:16 PM, alex_gagn...@dellteam.com wrote:

On 11/19/2018 01:32 PM, Sinan Kaya wrote:

ACPI 6.2:

18.3.2.4 PCI Express Root Port AER Structure

Flags:

Bit [0] - FIRMWARE_FIRST: If set, this bit indicates to the OSPM that system
firmware will handle errors from this source first.
Bit [1] - GLOBAL: If set, indicates that the settings contained in this
structure apply globally to all PCI Express Devices.
All other bits must be set to zero.

It doesn't say shall, may or might. It says will.


It says "system firmware will handle errors". It does not say "system
firmware owns AER registers". In absence on any descriptor text on the
meaning of these tables, this really looks to me like it should be
interpreted as a descriptor of APEI error sources, not a mutex on who
writes to certain bits-- AER in this case.


True. I was trying to get it out in a rush. I omitted words.

However; table assumes governance about for which entities firmware first
should be enabled. There is no cross reference to _OSC or permission
negotiation like _OST.



I don't think that is contradictory or inconsistent.
I also wasn't able to find any reference to HEST in UEFI 2.7, only in
ACPI spec.


You are right. It was a confusion on my side. The right place to look is
ACPI specification. I was involved in this a couple of years ago. Some pieces
were in UEFI spec. Other pieces were in ACPI. I guess they got unified
now.




I think It depends on your PCI topology.

For other topologies with multiple PCI root complexes, I can see this being
used per root complex flag to indicate which root complex needs firmware first
and which one doesn't.


_OSC is per root bus, so it's already granular enough, right? Why would
it depend on PCI topology?



I was speculating. I don't have the full background on this. Need to consult
the spec developers.


As I said in my previous email, the right place to talk about this is UEFI
forum.


The way I would present the problem to he spec writers is that, although
the spec appears to be consistent, we've seen firmware vendors that made
the wrong assumptions about HEST/_OSC. Instead of describing AER
ownership with _OSC, they attempted to do it with HEST. So we should add
an implementation note, or clarification about this.


I agree.



Alex





Re: [PATCH net-next 2/6] net/bpf: split VLAN_PRESENT bit handling from VLAN_TCI

2018-11-19 Thread Michał Mirosław
On Mon, Nov 19, 2018 at 12:26:46PM +0100, Daniel Borkmann wrote:
> On 11/10/2018 07:58 PM, Michał Mirosław wrote:
> > Signed-off-by: Michał Mirosław 
> 
> Why you have empty commit messages for non-trivial changes like this in
> 4 out of 6 of your patches ...
> 
> How was it tested on the JITs you were changing? Did you test on both,
> big and little endian machines?

I have only x86 boxes currently so didn't try to test others. I hope
upstreaming the series through net-next will allow us to find any
fallouts, if any. The changes are very simple, though: they move
code around (the "splitting" part) and eventually change a vlan_present
flag's position in a skbuff structure. Dependency on CPU endianness is
removed by using byte loads for the flag.

Best Regards,
Michał Mirosław


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Alex_Gagniuc
On 11/19/2018 01:32 PM, Sinan Kaya wrote:
> ACPI 6.2:
> 
> 18.3.2.4 PCI Express Root Port AER Structure
> 
> Flags:
> 
> Bit [0] - FIRMWARE_FIRST: If set, this bit indicates to the OSPM that system
> firmware will handle errors from this source first.
> Bit [1] - GLOBAL: If set, indicates that the settings contained in this
> structure apply globally to all PCI Express Devices.
> All other bits must be set to zero.
> 
> It doesn't say shall, may or might. It says will.

It says "system firmware will handle errors". It does not say "system 
firmware owns AER registers". In absence on any descriptor text on the 
meaning of these tables, this really looks to me like it should be 
interpreted as a descriptor of APEI error sources, not a mutex on who 
writes to certain bits-- AER in this case.

I don't think that is contradictory or inconsistent.
I also wasn't able to find any reference to HEST in UEFI 2.7, only in 
ACPI spec.

> I think It depends on your PCI topology.
> 
> For other topologies with multiple PCI root complexes, I can see this being
> used per root complex flag to indicate which root complex needs firmware first
> and which one doesn't.

_OSC is per root bus, so it's already granular enough, right? Why would 
it depend on PCI topology?


>> I'd like see how exactly we break one of those elusive systems with _OSC. I
>> suspect _OSC and HEST end up having the same information, and that's why we
>> didn't see any real-life issue with mixing the approaches.
> 
> I'm already aware of two systems that rely on HEST table to pass information 
> to
> the OS that firmware first is enabled. Both of the systems do not change their
> _OSC bits during this assuming HEST table has priority over _OSC for firmware
> first.

Are those hax86 systems?
It seems like the systems have broken firmware. I see several ways to 
handle broken systems like those:
  - Parse both HEST and _OSC, and decide AER ownership with root bridge 
granularity. i.e. host_bridge->native_aer is authoritative, but is 
derived from both HEST and _OSC
  - Add quirks for the broken systems
  - Keep doing what we're doing until current code breaks a new system

> If we add this patch, OS will try to claim the AER address space while 
> firmware
> wants exclusive access.

Yay! FFS wants exclusive access, but does not claim it. Oh, FFS!


> As I said in my previous email, the right place to talk about this is UEFI
> forum.

The way I would present the problem to he spec writers is that, although 
the spec appears to be consistent, we've seen firmware vendors that made 
the wrong assumptions about HEST/_OSC. Instead of describing AER 
ownership with _OSC, they attempted to do it with HEST. So we should add 
an implementation note, or clarification about this.

Alex


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Sinan Kaya

UEFI HEST table specification also claims that it should be the ultimate
table for when PCI firmware-first should be disabled/enabled.


IIRC, EFI absorbed ACPI before FFS was a thing. Could you point me to the UEFI 
chapter that says HEST is authoritative?
(not being a smartie, just that my free software PDF readers can't search within 
a file that large)




ACPI 6.2:

18.3.2.4 PCI Express Root Port AER Structure

Flags:

Bit [0] - FIRMWARE_FIRST: If set, this bit indicates to the OSPM that system 
firmware will handle errors from this source first.
Bit [1] - GLOBAL: If set, indicates that the settings contained in this 
structure apply globally to all PCI Express Devices.

All other bits must be set to zero.

It doesn't say shall, may or might. It says will.




I think somebody needs to fix these. I saw an email from Harb Abdulhamid
sent to aswg this morning.

That's why I suggested circulating this idea in UEFI forums first.
Let's see what everybody thinks. We can go from there.


However you look at it, we have a glaring inconsistency in how we handle AER 
control in linux. I'm surprised we didn't see huge issues because of mixing 
HEST/_OSC.


What systems rely on the HEST definition as opposed to _OSC? It doesn't make 
sense to me that you could have a system with mixed FFS and native AER on the 
same root port. The granularity of HEST shouldn't matter here because of how AER 
works.


I think It depends on your PCI topology.

For other topologies with multiple PCI root complexes, I can see this being
used per root complex flag to indicate which root complex needs firmware first
and which one doesn't.



I'd like see how exactly we break one of those elusive systems with _OSC. I 
suspect _OSC and HEST end up having the same information, and that's why we 
didn't see any real-life issue with mixing the approaches.


I'm already aware of two systems that rely on HEST table to pass information to
the OS that firmware first is enabled. Both of the systems do not change their
_OSC bits during this assuming HEST table has priority over _OSC for firmware
first.

If we add this patch, OS will try to claim the AER address space while firmware
wants exclusive access.

As I said in my previous email, the right place to talk about this is UEFI
forum.



Alex


P.S.
(SARCASM ALERT) Isn't UEFI is a pile of stuff that didn't stick to the wall?

P.P.S
I remember someone asking why we don't disable FFS in the BIOS. I think it would 
be next to impossible to get certain platform vendors to relinquish FFS control, 
unless the specs required things to be that way -- and had a "standard" way to 
do so.


Then getting the specs to change in that direction is also a battle.





Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Alex G.

On 11/19/2018 12:24 PM, Sinan Kaya wrote:

On 11/19/2018 1:10 PM, Keith Busch wrote:
We can't really turn off firmware first in the kernel without asking 
help

from the firmware.

The _OSC method this patch utilizes is the ACPI spec defined way for
the kernel to wrest control from firmware. BIOS specific menu settings
shouldn't be our only recourse when we have a spec authority defining
generic OS interfaces to accomplish the same thing.

Unless there is a disagreement on the _OSC interpreation, we'd have to
accept that platforms breaking from this patch are non-compliant.



It depends on which spec you look :)

UEFI HEST table specification also claims that it should be the ultimate
table for when PCI firmware-first should be disabled/enabled.


IIRC, EFI absorbed ACPI before FFS was a thing. Could you point me to 
the UEFI chapter that says HEST is authoritative?
(not being a smartie, just that my free software PDF readers can't 
search within a file that large)




I think somebody needs to fix these. I saw an email from Harb Abdulhamid
sent to aswg this morning.

That's why I suggested circulating this idea in UEFI forums first.
Let's see what everybody thinks. We can go from there.


However you look at it, we have a glaring inconsistency in how we handle 
AER control in linux. I'm surprised we didn't see huge issues because of 
mixing HEST/_OSC.


What systems rely on the HEST definition as opposed to _OSC? It doesn't 
make sense to me that you could have a system with mixed FFS and native 
AER on the same root port. The granularity of HEST shouldn't matter here 
because of how AER works.


I'd like see how exactly we break one of those elusive systems with 
_OSC. I suspect _OSC and HEST end up having the same information, and 
that's why we didn't see any real-life issue with mixing the approaches.


Alex


P.S.
(SARCASM ALERT) Isn't UEFI is a pile of stuff that didn't stick to the wall?

P.P.S
I remember someone asking why we don't disable FFS in the BIOS. I think 
it would be next to impossible to get certain platform vendors to 
relinquish FFS control, unless the specs required things to be that way 
-- and had a "standard" way to do so.


Then getting the specs to change in that direction is also a battle.


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Sinan Kaya

On 11/19/2018 1:10 PM, Keith Busch wrote:

We can't really turn off firmware first in the kernel without asking help
from the firmware.

The _OSC method this patch utilizes is the ACPI spec defined way for
the kernel to wrest control from firmware. BIOS specific menu settings
shouldn't be our only recourse when we have a spec authority defining
generic OS interfaces to accomplish the same thing.

Unless there is a disagreement on the _OSC interpreation, we'd have to
accept that platforms breaking from this patch are non-compliant.



It depends on which spec you look :)

UEFI HEST table specification also claims that it should be the ultimate
table for when PCI firmware-first should be disabled/enabled.

I think somebody needs to fix these. I saw an email from Harb Abdulhamid
sent to aswg this morning.

That's why I suggested circulating this idea in UEFI forums first.
Let's see what everybody thinks. We can go from there.


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Keith Busch
On Mon, Nov 19, 2018 at 12:56:56PM -0500, Sinan Kaya wrote:
> On 11/19/2018 12:41 PM, Keith Busch wrote:
> > > Still, breaking existing systems that rely on HEST table is not cool.
> > > I'd rather have users specify "pcie_ports=native" to skip FF rather than
> > > having broken systems by default to be honest.
> > The pcie_ports=native work-around ignores FF to potentially unknown
> > results, though.
> > 
> 
> How about being able to enable/disable FF in BIOS?
>
> We can't really turn off firmware first in the kernel without asking help
> from the firmware.

The _OSC method this patch utilizes is the ACPI spec defined way for
the kernel to wrest control from firmware. BIOS specific menu settings
shouldn't be our only recourse when we have a spec authority defining
generic OS interfaces to accomplish the same thing.

Unless there is a disagreement on the _OSC interpreation, we'd have to
accept that platforms breaking from this patch are non-compliant.

> Like you said, it causes unpredictable results.
>
> There will be two competing software trying to touch the same registers.


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Sinan Kaya

On 11/19/2018 12:41 PM, Keith Busch wrote:

Still, breaking existing systems that rely on HEST table is not cool.
I'd rather have users specify "pcie_ports=native" to skip FF rather than
having broken systems by default to be honest.

The pcie_ports=native work-around ignores FF to potentially unknown
results, though.



How about being able to enable/disable FF in BIOS?

We can't really turn off firmware first in the kernel without asking help
from the firmware. Like you said, it causes unpredictable results.

There will be two competing software trying to touch the same registers.


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Keith Busch
On Mon, Nov 19, 2018 at 12:42:25PM -0500, Sinan Kaya wrote:
> On 11/19/2018 12:32 PM, Sinan Kaya wrote:
> > > 
> > > But we're not using HEST as a fine grain control. We disable native AER
> > > handling if *any* device has FF set in HEST, and that just forces people
> > > to use pcie_ports=native to get around that.
> > > 
> > 
> > I don't see *any* in the code.  aer_hest_parse() does the HEST table 
> > parsing.
> > It switches to firmware first mode if global flag in HEST is set. Otherwise
> > for each BDF in device, hest_match_pci() is used to do a cross-matching 
> > against
> > HEST table contents.
> > 
> > Am I missing something?
> 
> I see. I think you are talking about aer_firmware_first, right?
> 
> aer_set_firmware_first() and  pcie_aer_get_firmware_first() seem to do the 
> right
> thing.

Right, but what difference does it make if device specific AER checks do
the right thing if pcie_aer_init() doesn't even register it's port driver?
 
> aer_firmware_first is probably getting set because events are all routed to a
> single root port and aer_acpi_firmware_first() is used to decide whether AER
> should be initialized or not.
> 
> I think I understand what is going on now.
> 
> Still, breaking existing systems that rely on HEST table is not cool.
> I'd rather have users specify "pcie_ports=native" to skip FF rather than
> having broken systems by default to be honest.

The pcie_ports=native work-around ignores FF to potentially unknown
results, though.


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Sinan Kaya

On 11/19/2018 12:32 PM, Sinan Kaya wrote:


But we're not using HEST as a fine grain control. We disable native AER
handling if *any* device has FF set in HEST, and that just forces people
to use pcie_ports=native to get around that.



I don't see *any* in the code.  aer_hest_parse() does the HEST table parsing.
It switches to firmware first mode if global flag in HEST is set. Otherwise
for each BDF in device, hest_match_pci() is used to do a cross-matching against
HEST table contents.

Am I missing something?


I see. I think you are talking about aer_firmware_first, right?

aer_set_firmware_first() and  pcie_aer_get_firmware_first() seem to do the right
thing.

aer_firmware_first is probably getting set because events are all routed to a
single root port and aer_acpi_firmware_first() is used to decide whether AER
should be initialized or not.

I think I understand what is going on now.

Still, breaking existing systems that rely on HEST table is not cool.
I'd rather have users specify "pcie_ports=native" to skip FF rather than
having broken systems by default to be honest.


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Keith Busch
On Mon, Nov 19, 2018 at 12:32:42PM -0500, Sinan Kaya wrote:
> On 11/19/2018 11:53 AM, Keith Busch wrote:
> > On Mon, Nov 19, 2018 at 11:53:05AM -0500, Tyler Baicar wrote:
> > > On Thu, Nov 15, 2018 at 8:49 PM Sinan Kaya  wrote:
> > > > 
> > > > On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote:
> > > > > I've asked around a few people at Dell and they unanimously agree that
> > > > > _OSC is the correct way to determine ownership of AER. In linux, we
> > > > > use the result of _OSC to enable AER services, but we use HEST to
> > > > > determine AER ownership. That's inconsistent. This series drops the
> > > > > use of HEST in favor of _OSC.
> > > > > 
> > > > > [1]https://lkml.org/lkml/2018/11/15/62
> > > > 
> > > > This change breaks the existing systems that rely on the HEST table
> > > > telling the operating system about firmware first presence.
> > > > 
> > > > Besides, HEST table has much more granularity about which PCI component
> > > > needs firmware such as global/device/switch.
> > > > 
> > > > You should probably circulate these ideas for wider consumption in UEFI
> > > > forum as UEFI owns the HEST table definition.
> > > 
> > > I agree with Sinan, this will break existing systems, and the granularity 
> > > of the
> > > HEST definition is more useful than the single bit in _OSC.
> > 
> > But we're not using HEST as a fine grain control. We disable native AER
> > handling if *any* device has FF set in HEST, and that just forces people
> > to use pcie_ports=native to get around that.
> > 
> 
> I don't see *any* in the code.  aer_hest_parse() does the HEST table parsing.
> It switches to firmware first mode if global flag in HEST is set. Otherwise
> for each BDF in device, hest_match_pci() is used to do a cross-matching 
> against
> HEST table contents.
> 
> Am I missing something?

You might be. :)

static int aer_hest_parse(struct acpi_hest_header *hest_hdr, void *data)
{

/*
 * If no specific device is supplied, determine whether
 * FIRMWARE_FIRST is set for *any* PCIe device.
 */
if (!info->pci_dev) {
info->firmware_first |= ff;
return 0;
}



Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Sinan Kaya

On 11/19/2018 11:53 AM, Keith Busch wrote:

On Mon, Nov 19, 2018 at 11:53:05AM -0500, Tyler Baicar wrote:

On Thu, Nov 15, 2018 at 8:49 PM Sinan Kaya  wrote:


On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote:

I've asked around a few people at Dell and they unanimously agree that
_OSC is the correct way to determine ownership of AER. In linux, we
use the result of _OSC to enable AER services, but we use HEST to
determine AER ownership. That's inconsistent. This series drops the
use of HEST in favor of _OSC.

[1]https://lkml.org/lkml/2018/11/15/62


This change breaks the existing systems that rely on the HEST table
telling the operating system about firmware first presence.

Besides, HEST table has much more granularity about which PCI component
needs firmware such as global/device/switch.

You should probably circulate these ideas for wider consumption in UEFI
forum as UEFI owns the HEST table definition.


I agree with Sinan, this will break existing systems, and the granularity of the
HEST definition is more useful than the single bit in _OSC.


But we're not using HEST as a fine grain control. We disable native AER
handling if *any* device has FF set in HEST, and that just forces people
to use pcie_ports=native to get around that.



I don't see *any* in the code.  aer_hest_parse() does the HEST table parsing.
It switches to firmware first mode if global flag in HEST is set. Otherwise
for each BDF in device, hest_match_pci() is used to do a cross-matching against
HEST table contents.

Am I missing something?


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Keith Busch
On Mon, Nov 19, 2018 at 11:53:05AM -0500, Tyler Baicar wrote:
> On Thu, Nov 15, 2018 at 8:49 PM Sinan Kaya  wrote:
> >
> > On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote:
> > > I've asked around a few people at Dell and they unanimously agree that
> > > _OSC is the correct way to determine ownership of AER. In linux, we
> > > use the result of _OSC to enable AER services, but we use HEST to
> > > determine AER ownership. That's inconsistent. This series drops the
> > > use of HEST in favor of _OSC.
> > >
> > > [1]https://lkml.org/lkml/2018/11/15/62
> >
> > This change breaks the existing systems that rely on the HEST table
> > telling the operating system about firmware first presence.
> >
> > Besides, HEST table has much more granularity about which PCI component
> > needs firmware such as global/device/switch.
> >
> > You should probably circulate these ideas for wider consumption in UEFI
> > forum as UEFI owns the HEST table definition.
> 
> I agree with Sinan, this will break existing systems, and the granularity of 
> the
> HEST definition is more useful than the single bit in _OSC.

But we're not using HEST as a fine grain control. We disable native AER
handling if *any* device has FF set in HEST, and that just forces people
to use pcie_ports=native to get around that.


Re: [PATCH v2 3/4] powerpc: add system call table generation support

2018-11-19 Thread Arnd Bergmann
On Wed, Nov 14, 2018 at 11:04 AM Firoz Khan  wrote:

> Adding a new table entry consisting of:
> - System call number.
> - ABI.
> - System call name.
> - Entry point name.
> - Compat entry name, if required.
>
> syscallhdr.sh and syscalltbl.sh will generate uapi header-
> unistd_32/64.h and syscall_table_32/64/c32.h files respect-
> ively. File syscall_table_32/64/c32.h is included by sys-
> call.S - the real system call table. Both .sh files will
> parse the content syscall.tbl to generate the header and
> table files.

You don't mention how this handles the SPU, which seems to be the main
difference from the other architectures.

> +# The format is:
> +#  
> +#
> +# The  can be common, 64, or 32 for this file.
> +#
> +0  common  restart_syscall sys_restart_syscall   
>   sys_restart_syscall
> +1  common  exitsys_exit  
>   sys_exit
> +2  common  forkppc_fork  
>   ppc_fork
> +3  common  readsys_read  
>   sys_readsys_read
> +4  common  write   sys_write 
>   sys_write   sys_write
> +5  common  opensys_open  
>   compat_sys_open sys_open
> +6  common  close   sys_close 
>   sys_close   sys_close
> +7  common  waitpid sys_waitpid   
>   sys_waitpid sys_waitpid
> +8  common  creat   sys_creat 
>   sys_creat   sys_creat

The SPU syscall is always the same as the 64-bit syscall, so listing it
explictily in the last column seems to add a lot of duplication, and
makes the format different from the other architectures.

Have you considered using the  field (second column) to decide whether
a syscall is used for SPU or not instead?

I would have done it like

|+0  nospu  restart_syscall sys_restart_syscall
 sys_restart_syscall
|+1  nospu  exitsys_exit
 sys_exit
|+2  nospu  forkppc_fork
 ppc_fork
|+3  common  readsys_read
  sys_read
|+4  common  write   sys_write
  sys_write
|+5  common  opensys_open
  compat_sys_open
|+6  common  close   sys_close
  sys_close
...
|+29132  fstatat64   sys_fstatat64
  sys_fstatat64
|+29164  newfstatat  sys_newfstatat
|+291spu  newfstatat  sys_newfstatat
...

with 'nospu' meaning 64+32+compat.

> +9  common  linksys_link  
>   sys_linksys_link
> +10 common  unlink  sys_unlink
>   sys_unlink  sys_unlink
> +11 common  execve  sys_execve
>   compat_sys_execve
> +12 common  chdir   sys_chdir 
>   sys_chdir   sys_chdir
> +13 common  timesys_time  
>   compat_sys_time sys_time
> +14 common  mknod   sys_mknod 
>   sys_mknod   sys_mknod
> +15 common  chmod   sys_chmod 
>   sys_chmod   sys_chmod
> +16 common  lchown  sys_lchown
>   sys_lchown  sys_lchown
> +17 common  break   sys_ni_syscall
>   sys_ni_syscall
> +18 32  oldstat sys_stat  
>   sys_ni_syscall
> +18 64  oldstat sys_ni_syscall

'oldstat' seems odd. Your conversion is correct, but the existing
behavior of not
providing support for the syscall in compat mode feels wrong. We have four
calls in this category:

arch/powerpc/include/asm/systbl.h:OLDSYS(stat)
arch/powerpc/include/asm/systbl.h:OLDSYS(fstat)
arch/powerpc/include/asm/systbl.h:OLDSYS(lstat)
arch/powerpc/include/asm/systbl.h:OLDSYS(debug_setcontext)

For the first three, it seems that the 64-bit kernel ought to set
'__ARCH_WANT_OLD_STAT':

diff --git a/arch/powerpc/include/asm/unistd.h
b/arch/powerpc/include/asm/unistd.h
index 96ddce5c76c3..335dfcc47f20 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ 

Re: [PATCH v2 2/4] powerpc: move macro definition from asm/systbl.h

2018-11-19 Thread Arnd Bergmann
On Wed, Nov 14, 2018 at 11:04 AM Firoz Khan  wrote:

> diff --git a/arch/powerpc/include/asm/systbl.h 
> b/arch/powerpc/include/asm/systbl.h
> index 01b5171..c4321b9 100644
> --- a/arch/powerpc/include/asm/systbl.h
> +++ b/arch/powerpc/include/asm/systbl.h
> @@ -76,7 +76,6 @@
>  SYSCALL_SPU(ssetmask)
>  SYSCALL_SPU(setreuid)
>  SYSCALL_SPU(setregid)
> -#define compat_sys_sigsuspend sys_sigsuspend
>  SYS32ONLY(sigsuspend)

I think the macro here is just a workaround for the fact that SYS32ONLY()
always prepends the name with 'compat_' for the compat version, and there
is no other macro to do this. After the conversion, this can easily be
done using the regular table, as you need separate names for the
32-bit entries anyway.

>  SYSX(sys_ni_syscall,compat_sys_s
> diff --git a/arch/powerpc/platforms/cell/spu_callbacks.c 
> b/arch/powerpc/platforms/cell/spu_callbacks.c
> index 8ae8620..7517a43 100644
> --- a/arch/powerpc/platforms/cell/spu_callbacks.c
> +++ b/arch/powerpc/platforms/cell/spu_callbacks.c
> @@ -47,6 +47,7 @@
>  #define COMPAT_SPU_NEW(func)   sys_##func,
>  #define SYSX_SPU(f, f3264, f32)f,
>
> +#define compat_sys_sigsuspend  sys_sigsuspend
>  #include 
>  };

The spu_callbacks.c and systbl_chk.c files don't need this macro,
but that doesn't matter once you drop this patch.

  Arnd


Re: [PATCH 09/10] drivers/perf: perf/core: generalise event exclusion checking with perf macro

2018-11-19 Thread Mark Rutland
On Fri, Nov 16, 2018 at 10:24:12AM +, Andrew Murray wrote:
> Replace checking of perf event exclusion flags with perf macro.
> 
> This is a functional change as exclude_host and exclude_guest are added
> in the following files:
> 
>  - drivers/perf/qcom_l2_pmu.c
>  - drivers/perf/qcom_l3_pmu.c
>  - drivers/perf/arm_pmu.c
> 
> And exclude_idle and exclude_hv are added in these files:
> 
>  - drivers/perf/xgene_pmu.c

FWIW, that all sounds fine to me.

Assuming you fix up the 'macro' nit:

Acked-by: Mark Rutland 

... unless we go for Peter's core cap for this.

Thanks,
Mark.

> Signed-off-by: Andrew Murray 
> ---
>  drivers/perf/arm-cci.c   | 7 +--
>  drivers/perf/arm-ccn.c   | 5 +
>  drivers/perf/arm_dsu_pmu.c   | 7 +--
>  drivers/perf/arm_pmu.c   | 9 +
>  drivers/perf/hisilicon/hisi_uncore_pmu.c | 7 +--
>  drivers/perf/qcom_l2_pmu.c   | 3 +--
>  drivers/perf/qcom_l3_pmu.c   | 3 +--
>  drivers/perf/xgene_pmu.c | 3 +--
>  8 files changed, 8 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
> index 1bfeb16..d749f19 100644
> --- a/drivers/perf/arm-cci.c
> +++ b/drivers/perf/arm-cci.c
> @@ -1328,12 +1328,7 @@ static int cci_pmu_event_init(struct perf_event *event)
>   return -EOPNOTSUPP;
>  
>   /* We have no filtering of any kind */
> - if (event->attr.exclude_user||
> - event->attr.exclude_kernel  ||
> - event->attr.exclude_hv  ||
> - event->attr.exclude_idle||
> - event->attr.exclude_host||
> - event->attr.exclude_guest)
> + if (event_has_exclude_flags(event))
>   return -EINVAL;
>  
>   /*
> diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
> index 7dd850e..9a22a95 100644
> --- a/drivers/perf/arm-ccn.c
> +++ b/drivers/perf/arm-ccn.c
> @@ -741,10 +741,7 @@ static int arm_ccn_pmu_event_init(struct perf_event 
> *event)
>   return -EOPNOTSUPP;
>   }
>  
> - if (has_branch_stack(event) || event->attr.exclude_user ||
> - event->attr.exclude_kernel || event->attr.exclude_hv ||
> - event->attr.exclude_idle || event->attr.exclude_host ||
> - event->attr.exclude_guest) {
> + if (has_branch_stack(event) || event_has_exclude_flags(event)) {
>   dev_dbg(ccn->dev, "Can't exclude execution levels!\n");
>   return -EINVAL;
>   }
> diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
> index 660cb8a..300ff3d 100644
> --- a/drivers/perf/arm_dsu_pmu.c
> +++ b/drivers/perf/arm_dsu_pmu.c
> @@ -563,12 +563,7 @@ static int dsu_pmu_event_init(struct perf_event *event)
>   }
>  
>   if (has_branch_stack(event) ||
> - event->attr.exclude_user ||
> - event->attr.exclude_kernel ||
> - event->attr.exclude_hv ||
> - event->attr.exclude_idle ||
> - event->attr.exclude_host ||
> - event->attr.exclude_guest) {
> + event_has_exclude_flags(event)) {
>   dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n");
>   return -EINVAL;
>   }
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 7f01f6f..a03634f 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -357,13 +357,6 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void 
> *dev)
>  }
>  
>  static int
> -event_requires_mode_exclusion(struct perf_event_attr *attr)
> -{
> - return attr->exclude_idle || attr->exclude_user ||
> -attr->exclude_kernel || attr->exclude_hv;
> -}
> -
> -static int
>  __hw_perf_event_init(struct perf_event *event)
>  {
>   struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> @@ -395,7 +388,7 @@ __hw_perf_event_init(struct perf_event *event)
>*/
>   if ((!armpmu->set_event_filter ||
>armpmu->set_event_filter(hwc, &event->attr)) &&
> -  event_requires_mode_exclusion(&event->attr)) {
> +  event_has_exclude_flags(event)) {
>   pr_debug("ARM performance counters do not support "
>"mode exclusion\n");
>   return -EOPNOTSUPP;
> diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c 
> b/drivers/perf/hisilicon/hisi_uncore_pmu.c
> index 9efd241..d3edff9 100644
> --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
> +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
> @@ -143,12 +143,7 @@ int hisi_uncore_pmu_event_init(struct perf_event *event)
>   return -EOPNOTSUPP;
>  
>   /* counters do not have these bits */
> - if (event->attr.exclude_user||
> - event->attr.exclude_kernel  ||
> - event->attr.exclude_host||
> - event->attr.exclude_guest   ||
> - event->attr.exclude_hv  ||
> - event->attr.exclude_idle)
> + if (event_has_exclude_flags(event))
>   return -EIN

Re: [PATCH 01/10] perf/core: Add macro to test for event exclusion flags

2018-11-19 Thread Mark Rutland
On Fri, Nov 16, 2018 at 10:24:04AM +, Andrew Murray wrote:
> Add a macro that tests if any of the perf event exclusion flags
> are set on a given event.
> 
> Signed-off-by: Andrew Murray 

Aside from the s/macro/function, or s/macro/helper/, this looks sound to
me.

Assuming you fix that up here and in subsequent commit messages, for
this patch feel free to add:

Acked-by: Mark Rutland 

Mark.

> ---
>  include/linux/perf_event.h | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 53c500f..89ee7fa 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event *event,
>  extern void
>  perf_log_lost_samples(struct perf_event *event, u64 lost);
>  
> +static inline bool event_has_exclude_flags(struct perf_event *event)
> +{
> + struct perf_event_attr *attr = &event->attr;
> +
> + return attr->exclude_idle || attr->exclude_user ||
> +attr->exclude_kernel || attr->exclude_hv ||
> +attr->exclude_guest || attr->exclude_host;
> +}
> +
>  static inline bool is_sampling_event(struct perf_event *event)
>  {
>   return event->attr.sample_period != 0;
> -- 
> 2.7.4
> 


Re: [PATCH 5/5] dma-direct: always allow dma mask <= physiscal memory size

2018-11-19 Thread Robin Murphy

On 19/11/2018 14:18, Ramon Fried wrote:

On Tue, Oct 9, 2018 at 8:02 AM Benjamin Herrenschmidt
 wrote:


On Wed, 2018-10-03 at 16:10 -0700, Alexander Duyck wrote:

-* Because 32-bit DMA masks are so common we expect every architecture
-* to be able to satisfy them - either by not supporting more physical
-* memory, or by providing a ZONE_DMA32.  If neither is the case, the
-* architecture needs to use an IOMMU instead of the direct mapping.
-*/
-   if (mask < phys_to_dma(dev, DMA_BIT_MASK(32)))
+   u64 min_mask;
+
+   if (IS_ENABLED(CONFIG_ZONE_DMA))
+   min_mask = DMA_BIT_MASK(ARCH_ZONE_DMA_BITS);
+   else
+   min_mask = DMA_BIT_MASK(32);
+
+   min_mask = min_t(u64, min_mask, (max_pfn - 1) << PAGE_SHIFT);
+
+   if (mask >= phys_to_dma(dev, min_mask))
  return 0;
-#endif
  return 1;
   }


So I believe I have run into the same issue that Guenter reported. On
an x86_64 system w/ Intel IOMMU. I wasn't able to complete boot and
all probe attempts for various devices were failing with -EIO errors.

I believe the last mask check should be "if (mask < phys_to_dma(dev,
min_mask))" not a ">=" check.


Right, that test is backwards. I needed to change it here too (powermac
with the rest of the powerpc series).

Cheers,
Ben.




Hi, I'm working on a MIPS64 soc with PCIe root complex on it, and it
appears that this series of patches are causing all PCI drivers that
request 64bit mask to fail with -5.
It's broken in 4.19. However, I just checked, it working on master.
We may need to backport a couple of patches to 4.19. I'm not sure
though which patches should be backported as there were at least 10
patches resolving this dma_direct area recently.
Christoph, Robin.
Can we ask Greg to backport all these changes ? What do you think ?


As far as I'm aware, the only real issue in 4.19 was my subtle breakage 
around setting bus_dma_mask - that's fixed by 6778be4e5209, which 
according to my inbox got picked up by autosel for 4.19 stable last week.


Robin.


Re: [PATCH 1/4] bpf: account for freed JIT allocations in arch code

2018-11-19 Thread Ard Biesheuvel
On Mon, 19 Nov 2018 at 02:37, Daniel Borkmann  wrote:
>
> On 11/17/2018 07:57 PM, Ard Biesheuvel wrote:
> > Commit ede95a63b5e84 ("bpf: add bpf_jit_limit knob to restrict unpriv
> > allocations") added a call to bpf_jit_uncharge_modmem() to the routine
> > bpf_jit_binary_free() which is called from the __weak bpf_jit_free().
> > This function is overridden by arches, some of which do not call
> > bpf_jit_binary_free() to release the memory, and so the released
> > memory is not accounted for, potentially leading to spurious allocation
> > failures.
> >
> > So replace the direct calls to module_memfree() in the arch code with
> > calls to bpf_jit_binary_free().
>
> Sorry but this patch is completely buggy, and above description on the
> accounting incorrect as well. Looks like this patch was not tested at all.
>

My apologies. I went off into the weeds a bit looking at different
versions for 32-bit and 64-bit on different architectures. So indeed,
this patch should be dropped.

> The below cBPF JITs that use module_memfree() which you replace with
> bpf_jit_binary_free() are using module_alloc() internally to get the JIT
> image buffer ...
>

Indeed. So would you prefer for arm64 to override bpf_jit_free() in
its entirety, and not call bpf_jit_binary_free() but simply call
bpf_jit_uncharge_modmem() and vfree() directly? It's either that, or
we'd have to untangle this a bit, to avoid having one __weak function
on top of the other just so other arches can replace the
module_memfree() call in bpf_jit_binary_free() with vfree() (which
amount to the same thing on arm64 anyway)


Re: [LKP] dad4f140ed [ 7.709376] WARNING:suspicious_RCU_usage

2018-11-19 Thread Matthew Wilcox
On Sun, Nov 18, 2018 at 05:19:04PM -0800, Matthew Wilcox wrote:
> On Mon, Nov 19, 2018 at 09:08:20AM +0800, kernel test robot wrote:
> > Greetings,
> > 
> > 0day kernel testing robot got the below dmesg and the first bad commit is
> 
> Umm.  I don't see a 'suspicious RCU usage' message in here.  I see a
> lot of vmalloc warnings ... ?

Never mind, the RCU warnings were in one of the attached dmesg files.
I found the problem; it's in the test-suite and not something that could
lead to any real problems today.

Thanks for the report!


Re: [PATCH 00/10] perf/core: Generalise event exclusion checking

2018-11-19 Thread Peter Zijlstra
On Fri, Nov 16, 2018 at 10:24:03AM +, Andrew Murray wrote:
> Many PMU drivers do not have the capability to exclude counting events
> that occur in specific contexts such as idle, kernel, guest, etc. These
> drivers indicate this by returning an error in their event_init upon
> testing the events attribute flags.
> 
> However this approach requires that each time a new event modifier is
> added to perf, all the perf drivers need to be modified to indicate that
> they don't support the attribute. This results in additional boiler-plate
> code common to many drivers that needs to be maintained. An example of
> this is the addition of exclude_host and exclude_guest in 2011 yet many
> PMU drivers do not support this or indicate an error on events that make
> use of it.
> 
> This patch generalises the test for exclusion and updates PMU drivers to
> use it. This is a functional change as some PMU drivers will now correctly
> report that they don't support certain events whereas they previously did.

Right, I like that idea, and yes, there's a lot of fail around there :/

> A longer term approach may instead be for PMU's to advertise their
> capabilities on registration.

This I think is the better approach. We already have the
PERF_PMU_CAP_flags that can be used to advertise various PMU
capabilities.

Something along these lines I suppose; then every PMU that actually
checks the flags, needs to set the flag, otherwise it'll fail.

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 53c500f0ca79..de15723ea52a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -244,6 +244,7 @@ struct perf_event;
 #define PERF_PMU_CAP_EXCLUSIVE 0x10
 #define PERF_PMU_CAP_ITRACE0x20
 #define PERF_PMU_CAP_HETEROGENEOUS_CPUS0x40
+#define PERF_PMU_CAP_EXCLUDE   0x80
 
 /**
  * struct pmu - generic performance monitoring unit
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 84530ab358c3..d76b724177b9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9772,6 +9772,14 @@ static int perf_try_init_event(struct pmu *pmu, struct 
perf_event *event)
if (ctx)
perf_event_ctx_unlock(event->group_leader, ctx);
 
+   if (!ret) {
+   if ((pmu->capabilities & PERF_PMU_CAP_EXCLUDE) ||
+   event_has_exclude_flags(event)) {
+   event->destroy(event);
+   ret = -EINVAL;
+   }
+   }
+
if (ret)
module_put(pmu->module);
 



Re: [PATCH 01/10] perf/core: Add macro to test for event exclusion flags

2018-11-19 Thread Peter Zijlstra
On Fri, Nov 16, 2018 at 10:24:04AM +, Andrew Murray wrote:
> Add a macro that tests if any of the perf event exclusion flags
> are set on a given event.

It is in fact an inline function, not a CPP macro.

> Signed-off-by: Andrew Murray 
> ---
>  include/linux/perf_event.h | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 53c500f..89ee7fa 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event *event,
>  extern void
>  perf_log_lost_samples(struct perf_event *event, u64 lost);
>  
> +static inline bool event_has_exclude_flags(struct perf_event *event)
> +{
> + struct perf_event_attr *attr = &event->attr;
> +
> + return attr->exclude_idle || attr->exclude_user ||
> +attr->exclude_kernel || attr->exclude_hv ||
> +attr->exclude_guest || attr->exclude_host;
> +}
> +
>  static inline bool is_sampling_event(struct perf_event *event)
>  {
>   return event->attr.sample_period != 0;
> -- 
> 2.7.4
> 


[PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint

2018-11-19 Thread Breno Leitao
On a signal handler return, the user could set a context with MSR[TS] bits
set, and these bits would be copied to task regs->msr.

At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
several __get_user() are called and then a recheckpoint is executed.

This is a problem since a page fault (in kernel space) could happen when
calling __get_user(). If it happens, the process MSR[TS] bits were
already set, but recheckpoint was not executed, and SPRs are still invalid.

The page fault can cause the current process to be de-scheduled, with
MSR[TS] active and without tm_recheckpoint() being called.  More
importantly, without TEXAR[FS] bit set also.

Since TEXASR might not have the FS bit set, and when the process is
scheduled back, it will try to reclaim, which will be aborted because of
the CPU is not in the suspended state, and, then, recheckpoint. This
recheckpoint will restore thread->texasr into TEXASR SPR, which might be
zero, hitting a BUG_ON().

[ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!

This patch simply delays the MSR[TS] set, so, if there is any page fault in
the __get_user() section, it does not have regs->msr[TS] set, since the TM
structures are still invalid, thus avoiding doing TM operations for
in-kernel exceptions and possible process reschedule.

With this patch, the MSR[TS] will only be set just before recheckpointing
and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in
invalid state.

It is not possible to move tm_recheckpoint to happen earlier, because it is
required to get the checkpointed registers from userspace, with
__get_user(), thus, the only way to avoid this undesired behavior is
delaying the MSR[TS] set, as done in this patch.

Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
Cc: sta...@vger.kernel.org (v3.9+)
Signed-off-by: Breno Leitao 
---
 arch/powerpc/kernel/signal_64.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 83d51bf586c7..15b153bdd826 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
if (MSR_TM_RESV(msr))
return -EINVAL;
 
-   /* pull in MSR TS bits from user context */
-   regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
-
-   /*
-* Ensure that TM is enabled in regs->msr before we leave the signal
-* handler. It could be the case that (a) user disabled the TM bit
-* through the manipulation of the MSR bits in uc_mcontext or (b) the
-* TM bit was disabled because a sufficient number of context switches
-* happened whilst in the signal handler and load_tm overflowed,
-* disabling the TM bit. In either case we can end up with an illegal
-* TM state leading to a TM Bad Thing when we return to userspace.
-*/
-   regs->msr |= MSR_TM;
-
/* pull in MSR LE from user context */
regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
 
@@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
tm_enable();
/* Make sure the transaction is marked as failed */
tsk->thread.tm_texasr |= TEXASR_FS;
+
+   /* pull in MSR TS bits from user context */
+   regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
+
+   /*
+* Ensure that TM is enabled in regs->msr before we leave the signal
+* handler. It could be the case that (a) user disabled the TM bit
+* through the manipulation of the MSR bits in uc_mcontext or (b) the
+* TM bit was disabled because a sufficient number of context switches
+* happened whilst in the signal handler and load_tm overflowed,
+* disabling the TM bit. In either case we can end up with an illegal
+* TM state leading to a TM Bad Thing when we return to userspace.
+*/
+   regs->msr |= MSR_TM;
+
/* This loads the checkpointed FP/VEC state, if used */
tm_recheckpoint(&tsk->thread);
 
-- 
2.19.0



Re: [PATCH net-next 2/6] net/bpf: split VLAN_PRESENT bit handling from VLAN_TCI

2018-11-19 Thread Daniel Borkmann
On 11/10/2018 07:58 PM, Michał Mirosław wrote:
> Signed-off-by: Michał Mirosław 

Why you have empty commit messages for non-trivial changes like this in
4 out of 6 of your patches ...

How was it tested on the JITs you were changing? Did you test on both,
big and little endian machines?

> ---
>  net/core/filter.c | 40 +---
>  1 file changed, 21 insertions(+), 19 deletions(-)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index e521c5ebc7d1..c151b906df53 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -296,22 +296,21 @@ static u32 convert_skb_access(int skb_field, int 
> dst_reg, int src_reg,
>   break;
>  
>   case SKF_AD_VLAN_TAG:
> - case SKF_AD_VLAN_TAG_PRESENT:
>   BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_tci) != 2);
> - BUILD_BUG_ON(VLAN_TAG_PRESENT != 0x1000);
>  
>   /* dst_reg = *(u16 *) (src_reg + offsetof(vlan_tci)) */
>   *insn++ = BPF_LDX_MEM(BPF_H, dst_reg, src_reg,
> offsetof(struct sk_buff, vlan_tci));
> - if (skb_field == SKF_AD_VLAN_TAG) {
> - *insn++ = BPF_ALU32_IMM(BPF_AND, dst_reg,
> - ~VLAN_TAG_PRESENT);
> - } else {
> - /* dst_reg >>= 12 */
> - *insn++ = BPF_ALU32_IMM(BPF_RSH, dst_reg, 12);
> - /* dst_reg &= 1 */
> +#ifdef VLAN_TAG_PRESENT
> + *insn++ = BPF_ALU32_IMM(BPF_AND, dst_reg, ~VLAN_TAG_PRESENT);
> +#endif
> + break;
> + case SKF_AD_VLAN_TAG_PRESENT:
> + *insn++ = BPF_LDX_MEM(BPF_B, dst_reg, src_reg, 
> PKT_VLAN_PRESENT_OFFSET());
> + if (PKT_VLAN_PRESENT_BIT)
> + *insn++ = BPF_ALU32_IMM(BPF_RSH, dst_reg, 
> PKT_VLAN_PRESENT_BIT);
> + if (PKT_VLAN_PRESENT_BIT < 7)
>   *insn++ = BPF_ALU32_IMM(BPF_AND, dst_reg, 1);
> - }
>   break;
>   }
>  
> @@ -6140,19 +6139,22 @@ static u32 bpf_convert_ctx_access(enum 
> bpf_access_type type,
>   break;
>  
>   case offsetof(struct __sk_buff, vlan_present):
> + *target_size = 1;
> + *insn++ = BPF_LDX_MEM(BPF_B, si->dst_reg, si->src_reg,
> +   PKT_VLAN_PRESENT_OFFSET());
> + if (PKT_VLAN_PRESENT_BIT)
> + *insn++ = BPF_ALU32_IMM(BPF_RSH, si->dst_reg, 
> PKT_VLAN_PRESENT_BIT);
> + if (PKT_VLAN_PRESENT_BIT < 7)
> + *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, 1);
> + break;
> +
>   case offsetof(struct __sk_buff, vlan_tci):
> - BUILD_BUG_ON(VLAN_TAG_PRESENT != 0x1000);
> -
>   *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
> bpf_target_off(struct sk_buff, vlan_tci, 
> 2,
>target_size));
> - if (si->off == offsetof(struct __sk_buff, vlan_tci)) {
> - *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg,
> - ~VLAN_TAG_PRESENT);
> - } else {
> - *insn++ = BPF_ALU32_IMM(BPF_RSH, si->dst_reg, 12);
> - *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, 1);
> - }
> +#ifdef VLAN_TAG_PRESENT
> + *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, 
> ~VLAN_TAG_PRESENT);
> +#endif
>   break;
>  
>   case offsetof(struct __sk_buff, cb[0]) ...
> 



Re: [PATCH] ALSA: aoa: Use device_type helpers to access the node type

2018-11-19 Thread Takashi Iwai
On Fri, 16 Nov 2018 23:11:04 +0100,
Rob Herring wrote:
> 
> Remove directly accessing device_node.type pointer and use the accessors
> instead. This will eventually allow removing the type pointer.
> 
> Replace the open coded iterating over child nodes with
> for_each_child_of_node() while we're here.
> 
> Cc: Johannes Berg 
> Cc: Jaroslav Kysela 
> Cc: Takashi Iwai 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: alsa-de...@alsa-project.org
> Signed-off-by: Rob Herring 

Applied, thanks.


Takashi


Re: [PATCH net-next 0/6] Remove VLAN.CFI overload

2018-11-19 Thread Daniel Borkmann
On 11/10/2018 10:47 PM, David Miller wrote:
> From: Michał Mirosław 
> Date: Sat, 10 Nov 2018 19:58:29 +0100
> 
>> Fix BPF code/JITs to allow for separate VLAN_PRESENT flag
>> storage and finally move the flag to separate storage in skbuff.
>>
>> This is final step to make CLAN.CFI transparent to core Linux
>> networking stack.
>>
>> An #ifdef is introduced temporarily to mark fragments masking
>> VLAN_TAG_PRESENT. This is removed altogether in the final patch.
> 
> Daniel and Alexei, please review.

Sorry, was completely swamped due to plumbers, just getting to it now.


Re: [PATCH 1/4] bpf: account for freed JIT allocations in arch code

2018-11-19 Thread Daniel Borkmann
On 11/17/2018 07:57 PM, Ard Biesheuvel wrote:
> Commit ede95a63b5e84 ("bpf: add bpf_jit_limit knob to restrict unpriv
> allocations") added a call to bpf_jit_uncharge_modmem() to the routine
> bpf_jit_binary_free() which is called from the __weak bpf_jit_free().
> This function is overridden by arches, some of which do not call
> bpf_jit_binary_free() to release the memory, and so the released
> memory is not accounted for, potentially leading to spurious allocation
> failures.
> 
> So replace the direct calls to module_memfree() in the arch code with
> calls to bpf_jit_binary_free().

Sorry but this patch is completely buggy, and above description on the
accounting incorrect as well. Looks like this patch was not tested at all.

The below cBPF JITs that use module_memfree() which you replace with
bpf_jit_binary_free() are using module_alloc() internally to get the JIT
image buffer ...

> Signed-off-by: Ard Biesheuvel 
> ---
>  arch/mips/net/bpf_jit.c   | 2 +-
>  arch/powerpc/net/bpf_jit_comp.c   | 2 +-
>  arch/powerpc/net/bpf_jit_comp64.c | 5 +
>  arch/sparc/net/bpf_jit_comp_32.c  | 2 +-
>  4 files changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
> index 4d8cb9bb8365..1b69897274a1 100644
> --- a/arch/mips/net/bpf_jit.c
> +++ b/arch/mips/net/bpf_jit.c
> @@ -1264,7 +1264,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
>  void bpf_jit_free(struct bpf_prog *fp)
>  {
>   if (fp->jited)
> - module_memfree(fp->bpf_func);
> + bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
>  
>   bpf_prog_unlock_free(fp);
>  }
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index d5bfe24bb3b5..a1ea1ea6b40d 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -683,7 +683,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
>  void bpf_jit_free(struct bpf_prog *fp)
>  {
>   if (fp->jited)
> - module_memfree(fp->bpf_func);
> + bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
>  
>   bpf_prog_unlock_free(fp);
>  }
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> b/arch/powerpc/net/bpf_jit_comp64.c
> index 50b129785aee..84c8f013a6c6 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -1024,11 +1024,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog 
> *fp)
>  /* Overriding bpf_jit_free() as we don't set images read-only. */
>  void bpf_jit_free(struct bpf_prog *fp)
>  {
> - unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
> - struct bpf_binary_header *bpf_hdr = (void *)addr;
> -
>   if (fp->jited)
> - bpf_jit_binary_free(bpf_hdr);
> + bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
>  
>   bpf_prog_unlock_free(fp);
>  }
> diff --git a/arch/sparc/net/bpf_jit_comp_32.c 
> b/arch/sparc/net/bpf_jit_comp_32.c
> index a5ff88643d5c..01bda6bc9e7f 100644
> --- a/arch/sparc/net/bpf_jit_comp_32.c
> +++ b/arch/sparc/net/bpf_jit_comp_32.c
> @@ -759,7 +759,7 @@ cond_branch:  f_offset = addrs[i + 
> filter[i].jf];
>  void bpf_jit_free(struct bpf_prog *fp)
>  {
>   if (fp->jited)
> - module_memfree(fp->bpf_func);
> + bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
>  
>   bpf_prog_unlock_free(fp);
>  }
>