[PATCH] EDAC/al-mc-edac: Slightly simplify code

2020-11-28 Thread Christophe JAILLET
Use 'devm_add_action_or_reset()' instead of open coding it.
This makes the error handling code look more consistent.
This also saves a few LoC.

Signed-off-by: Christophe JAILLET 
---
 drivers/edac/al_mc_edac.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/edac/al_mc_edac.c b/drivers/edac/al_mc_edac.c
index 7d4f396c27b5..178b9e581a72 100644
--- a/drivers/edac/al_mc_edac.c
+++ b/drivers/edac/al_mc_edac.c
@@ -238,11 +238,9 @@ static int al_mc_edac_probe(struct platform_device *pdev)
if (!mci)
return -ENOMEM;
 
-   ret = devm_add_action(&pdev->dev, devm_al_mc_edac_free, mci);
-   if (ret) {
-   edac_mc_free(mci);
+   ret = devm_add_action_or_reset(&pdev->dev, devm_al_mc_edac_free, mci);
+   if (ret)
return ret;
-   }
 
platform_set_drvdata(pdev, mci);
al_mc = mci->pvt_info;
@@ -293,11 +291,9 @@ static int al_mc_edac_probe(struct platform_device *pdev)
return ret;
}
 
-   ret = devm_add_action(&pdev->dev, devm_al_mc_edac_del, &pdev->dev);
-   if (ret) {
-   edac_mc_del_mc(&pdev->dev);
+   ret = devm_add_action_or_reset(&pdev->dev, devm_al_mc_edac_del, &pdev->dev);
+   if (ret)
return ret;
-   }
 
if (al_mc->irq_ue > 0) {
ret = devm_request_irq(&pdev->dev,
-- 
2.27.0
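
For readers unfamiliar with the helper: devm_add_action_or_reset() behaves
like devm_add_action(), except that when registration fails it runs the
action itself before returning the error. That is why the open-coded
edac_mc_free()/edac_mc_del_mc() calls in the error paths can go. Below is a
minimal userspace sketch of that contract. The mock_* names and the
fail_next_add switch are assumptions made up for illustration, not kernel
APIs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Everything below is hypothetical userspace scaffolding, not kernel code. */
typedef void (*action_fn)(void *);

static bool fail_next_add;    /* force the next registration to fail */
static action_fn registered;  /* last action successfully registered */
static int release_calls;     /* how many times the release action ran */

/* Stand-in for devm_add_action(): may fail, never runs the action. */
static int mock_devm_add_action(action_fn action, void *data)
{
	(void)data;
	if (fail_next_add) {
		fail_next_add = false;
		return -ENOMEM;
	}
	registered = action;
	return 0;
}

/* Same contract as the kernel helper: on failure, run the action now. */
static int mock_devm_add_action_or_reset(action_fn action, void *data)
{
	int ret = mock_devm_add_action(action, data);

	if (ret)
		action(data);
	return ret;
}

/* Stand-in for devm_al_mc_edac_free(): only counts invocations. */
static void mock_release(void *data)
{
	(void)data;
	release_calls++;
}
```

With this shape the caller only needs "if (ret) return ret;", which is what
both hunks above reduce to.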



Re: scheduling while atomic in z3fold

2020-11-28 Thread Mike Galbraith
On Sun, 2020-11-29 at 07:41 +0100, Mike Galbraith wrote:
> On Sat, 2020-11-28 at 15:27 +0100, Oleksandr Natalenko wrote:
> >
> > > > Shouldn't the list manipulation be protected with
> > > > local_lock+this_cpu_ptr instead of get_cpu_ptr+spin_lock?
> >
> > Totally untested:
>
> Hrm, the thing doesn't seem to care deeply about preemption being
> disabled, so adding another lock may be overkill.  It looks like you
> could get the job done via migrate_disable()+this_cpu_ptr().

There is however an ever so tiny chance that I'm wrong about that :)

crash.rt> bt -s
PID: 6699   TASK: 913c464b5640  CPU: 0   COMMAND: "oom01"
 #0 [b6b94adff6f0] machine_kexec+366 at bd05f87e
 #1 [b6b94adff738] __crash_kexec+210 at bd14c052
 #2 [b6b94adff7f8] crash_kexec+48 at bd14d240
 #3 [b6b94adff808] oops_end+202 at bd02680a
 #4 [b6b94adff828] no_context+333 at bd06d7ed
 #5 [b6b94adff888] exc_page_fault+696 at bd8c0b68
 #6 [b6b94adff8e0] asm_exc_page_fault+30 at bda00ace
 #7 [b6b94adff968] mark_wakeup_next_waiter+81 at bd0ea1e1
 #8 [b6b94adff9c8] rt_mutex_futex_unlock+79 at bd8cc3cf
 #9 [b6b94adffa08] z3fold_zpool_free+1319 at bd2b6b17
#10 [b6b94adffa68] zswap_free_entry+67 at bd27c6f3
#11 [b6b94adffa78] zswap_frontswap_invalidate_page+138 at bd27c7fa
#12 [b6b94adffaa0] __frontswap_invalidate_page+72 at bd27bee8
#13 [b6b94adffac8] swapcache_free_entries+494 at bd276e1e
#14 [b6b94adffb10] free_swap_slot+173 at bd27b7dd
#15 [b6b94adffb30] __swap_entry_free+112 at bd2768d0
#16 [b6b94adffb58] free_swap_and_cache+57 at bd278939
#17 [b6b94adffb80] unmap_page_range+1485 at bd24c52d
#18 [b6b94adffc40] __oom_reap_task_mm+178 at bd218f02
#19 [b6b94adffd10] exit_mmap+339 at bd257da3
#20 [b6b94adffdb0] mmput+78 at bd07fe7e
#21 [b6b94adffdc0] do_exit+822 at bd089bc6
#22 [b6b94adffe28] do_group_exit+71 at bd08a547
#23 [b6b94adffe50] get_signal+319 at bd0979ff
#24 [b6b94adffe98] arch_do_signal+30 at bd022cbe
#25 [b6b94adfff28] exit_to_user_mode_prepare+293 at bd1223e5
#26 [b6b94adfff48] irqentry_exit_to_user_mode+5 at bd8c1675
#27 [b6b94adfff50] asm_exc_page_fault+30 at bda00ace
RIP: 00414300  RSP: 7f5ddf065ec0  RFLAGS: 00010206
RAX: 1000  RBX: c000  RCX: adf28000
RDX: 7f5d0bf8d000  RSI: c000  RDI: 
RBP: 7f5c5e065000   R8:    R9: 
R10: 0022  R11: 0246  R12: 1000
R13: 0001  R14: 0001  R15: 7ffc953ebcd0
ORIG_RAX:   CS: 0033  SS: 002b
crash.rt>




[PATCH] media: saa7146: switch from 'pci_' to 'dma_' API

2020-11-28 Thread Christophe JAILLET
The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.

When memory is allocated in 'saa7146_pgtable_alloc()' GFP_KERNEL can be
used because the callers are either .buf_prepare functions or functions that
already use GFP_KERNEL (hidden in a 'vmalloc_32()' call).

When memory is allocated in 'saa7146_init_one()' GFP_KERNEL can be used
because it is a probe function and no lock is taken in between.

When memory is allocated in 'saa7146_vv_init()' GFP_KERNEL can be used
because this function already uses GFP_KERNEL and no lock is taken in
between.

When memory is allocated in 'vbi_workaround()' GFP_KERNEL can be used
because it is only called from a .open function.


@@
@@
-PCI_DMA_BIDIRECTIONAL
+DMA_BIDIRECTIONAL

@@
@@
-PCI_DMA_TODEVICE
+DMA_TO_DEVICE

@@
@@
-PCI_DMA_FROMDEVICE
+DMA_FROM_DEVICE

@@
@@
-PCI_DMA_NONE
+DMA_NONE

@@
expression e1, e2, e3;
@@
-pci_alloc_consistent(e1, e2, e3)
+dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3;
@@
-pci_zalloc_consistent(e1, e2, e3)
+dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3, e4;
@@
-pci_free_consistent(e1, e2, e3, e4)
+dma_free_coherent(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-pci_map_single(e1, e2, e3, e4)
+dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-pci_unmap_single(e1, e2, e3, e4)
+dma_unmap_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4, e5;
@@
-pci_map_page(e1, e2, e3, e4, e5)
+dma_map_page(&e1->dev, e2, e3, e4, e5)

@@
expression e1, e2, e3, e4;
@@
-pci_unmap_page(e1, e2, e3, e4)
+dma_unmap_page(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-pci_map_sg(e1, e2, e3, e4)
+dma_map_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-pci_unmap_sg(e1, e2, e3, e4)
+dma_unmap_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-pci_dma_sync_single_for_device(e1, e2, e3, e4)
+dma_sync_single_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+dma_sync_sg_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2;
@@
-pci_dma_mapping_error(e1, e2)
+dma_mapping_error(&e1->dev, e2)

@@
expression e1, e2;
@@
-pci_set_dma_mask(e1, e2)
+dma_set_mask(&e1->dev, e2)

@@
expression e1, e2;
@@
-pci_set_consistent_dma_mask(e1, e2)
+dma_set_coherent_mask(&e1->dev, e2)

Signed-off-by: Christophe JAILLET 
---
If needed, see post from Christoph Hellwig on the kernel-janitors ML:
   https://marc.info/?l=kernel-janitors&m=158745678307186&w=4
---
 drivers/media/common/saa7146/saa7146_core.c | 39 +++--
 drivers/media/common/saa7146/saa7146_fops.c |  7 ++--
 drivers/media/common/saa7146/saa7146_vbi.c  |  6 ++--
 3 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/drivers/media/common/saa7146/saa7146_core.c 
b/drivers/media/common/saa7146/saa7146_core.c
index 21fb16cc5ca1..f2d13b71416c 100644
--- a/drivers/media/common/saa7146/saa7146_core.c
+++ b/drivers/media/common/saa7146/saa7146_core.c
@@ -177,7 +177,7 @@ void *saa7146_vmalloc_build_pgtable(struct pci_dev *pci, 
long length, struct saa
goto err_free_slist;
 
pt->nents = pages;
-   slen = pci_map_sg(pci,pt->slist,pt->nents,PCI_DMA_FROMDEVICE);
+   slen = dma_map_sg(&pci->dev, pt->slist, pt->nents, DMA_FROM_DEVICE);
if (0 == slen)
goto err_free_pgtable;
 
@@ -187,7 +187,7 @@ void *saa7146_vmalloc_build_pgtable(struct pci_dev *pci, 
long length, struct saa
return mem;
 
 err_unmap_sg:
-   pci_unmap_sg(pci, pt->slist, pt->nents, PCI_DMA_FROMDEVICE);
+   dma_unmap_sg(&pci->dev, pt->slist, pt->nents, DMA_FROM_DEVICE);
 err_free_pgtable:
saa7146_pgtable_free(pci, pt);
 err_free_slist:
@@ -201,7 +201,7 @@ void *saa7146_vmalloc_build_pgtable(struct pci_dev *pci, 
long length, struct saa
 
 void saa7146_vfree_destroy_pgtable(struct pci_dev *pci, void *mem, struct 
saa7146_pgtable *pt)
 {
-   pci_unmap_sg(pci, pt->slist, pt->nents, PCI_DMA_FROMDEVICE);
+   dma_unmap_sg(&pci->dev, pt->slist, pt->nents, DMA_FROM_DEVICE);
saa7146_pgtable_free(pci, pt);
kfree(pt->slist);
pt->slist = NULL;
@@ -212,7 +212,7 @@ void saa7146_pgtable_free(struct pci_dev *pci, struct 
saa7146_pgtable *pt)
 {
if (NULL == pt->cpu)
return;
-   pci_free_consistent(pci, pt->size, pt->cpu, pt->dma);
+   dma_free_coherent(&pci->dev, pt->size, pt->cpu, pt->dma);
pt->cpu = NULL;
 }
 
@@ -221,7 +221,7 @@ int saa7146_pgtable_alloc(struct pci_dev 

[PATCH] SCSI: bnx2i: requires MMU

2020-11-28 Thread Randy Dunlap
The SCSI_BNX2_ISCSI kconfig symbol selects CNIC and CNIC selects UIO,
which depends on MMU.
Since 'select' does not follow dependency chains, add the same MMU
dependency to SCSI_BNX2_ISCSI.

Quietens this kconfig warning:

WARNING: unmet direct dependencies detected for CNIC
  Depends on [n]: NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_BROADCOM [=y] 
&& PCI [=y] && (IPV6 [=m] || IPV6 [=m]=n) && MMU [=n]
  Selected by [m]:
  - SCSI_BNX2_ISCSI [=m] && SCSI_LOWLEVEL [=y] && SCSI [=y] && NET [=y] && PCI 
[=y] && (IPV6 [=m] || IPV6 [=m]=n)

Fixes: cf4e6363859d ("[SCSI] bnx2i: Add bnx2i iSCSI driver.")
Signed-off-by: Randy Dunlap 
Cc: linux-s...@vger.kernel.org
Cc: Nilesh Javali 
Cc: Manish Rangankar 
Cc: gr-qlogic-storage-upstr...@marvell.com
Cc: "James E.J. Bottomley" 
Cc: "Martin K. Petersen" 
---
 drivers/scsi/bnx2i/Kconfig|1 +
 1 file changed, 1 insertions(+)

--- linux-next-20201125.orig/drivers/scsi/bnx2i/Kconfig
+++ linux-next-20201125/drivers/scsi/bnx2i/Kconfig
@@ -4,6 +4,7 @@ config SCSI_BNX2_ISCSI
depends on NET
depends on PCI
depends on (IPV6 || IPV6=n)
+   depends on MMU
select SCSI_ISCSI_ATTRS
select NETDEVICES
select ETHERNET
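
The point that 'select' does not follow dependency chains can be shown with
a toy model. The sketch below is not how kconfig is implemented. It is a
hypothetical three-symbol model (MMU, SCSI_BNX2_ISCSI, CNIC) in which a
selected symbol is forced on without its own 'depends on' being consulted,
and all struct and function names are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical model: SCSI_BNX2_ISCSI selects CNIC, CNIC depends on MMU. */
struct config {
	bool mmu;                  /* value of MMU */
	bool bnx2i_depends_on_mmu; /* true once the patch above is applied */
	bool bnx2i_requested;      /* user asked for SCSI_BNX2_ISCSI */
};

/* A symbol can only be enabled when its own 'depends on' is satisfied. */
static bool bnx2i_enabled(const struct config *c)
{
	if (!c->bnx2i_requested)
		return false;
	if (c->bnx2i_depends_on_mmu && !c->mmu)
		return false;
	return true;
}

/* 'select' forces CNIC on without checking CNIC's own dependencies. */
static bool cnic_enabled(const struct config *c)
{
	return bnx2i_enabled(c);
}

/* True when CNIC ends up enabled although its MMU dependency is unmet. */
static bool unmet_dependency_warning(const struct config *c)
{
	return cnic_enabled(c) && !c->mmu;
}
```

In this model the warning condition holds for MMU=n before the patch and
disappears afterwards, because SCSI_BNX2_ISCSI can no longer be enabled.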


[PATCH] net: broadcom CNIC: requires MMU

2020-11-28 Thread Randy Dunlap
The CNIC kconfig symbol selects UIO and UIO depends on MMU.
Since 'select' does not follow dependency chains, add the same MMU
dependency to CNIC.

Quietens this kconfig warning:

WARNING: unmet direct dependencies detected for UIO
  Depends on [n]: MMU [=n]
  Selected by [m]:
  - CNIC [=m] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_BROADCOM [=y] 
&& PCI [=y] && (IPV6 [=m] || IPV6 [=m]=n)

Fixes: adfc5217e9db ("broadcom: Move the Broadcom drivers")
Signed-off-by: Randy Dunlap 
Cc: Jeff Kirsher 
Cc: Rasesh Mody 
Cc: gr-linux-nic-...@marvell.com
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: net...@vger.kernel.org
---
This isn't really the correct Fixes: tag, but I don't know how to go
backwards in git history to find it. :(

 drivers/net/ethernet/broadcom/Kconfig |1 +
 1 file changed, 1 insertions(+)

--- linux-next-20201125.orig/drivers/net/ethernet/broadcom/Kconfig
+++ linux-next-20201125/drivers/net/ethernet/broadcom/Kconfig
@@ -88,6 +88,7 @@ config BNX2
 config CNIC
tristate "QLogic CNIC support"
depends on PCI && (IPV6 || IPV6=n)
+   depends on MMU
select BNX2
select UIO
help


[PATCH] vdpa/mlx5: Use random MAC for the vdpa net instance

2020-11-28 Thread Eli Cohen
We should not try to use the VF MAC address as that is used by the
regular (e.g. mlx5_core) NIC implementation. Instead, use a randomly
generated MAC address.

Suggested-by: Cindy Lu 
Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 1fa6fcac8299..80d06d958b8b 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1955,10 +1955,7 @@ void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev)
if (err)
goto err_mtu;
 
-   err = mlx5_query_nic_vport_mac_address(mdev, 0, 0, config->mac);
-   if (err)
-   goto err_mtu;
-
+   eth_random_addr(config->mac);
mvdev->vdev.dma_dev = mdev->device;
err = mlx5_vdpa_alloc_resources(&ndev->mvdev);
if (err)
-- 
2.26.2
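
As background on what the replacement call provides: eth_random_addr() fills
the buffer with random bytes, then clears the multicast bit and sets the
locally administered bit in the first octet. The userspace sketch below
mirrors only that bit handling. sketch_random_addr() is a made-up name and
rand() is a placeholder entropy source, where the kernel uses
get_random_bytes().

```c
#include <stdint.h>
#include <stdlib.h>

#define ETH_ALEN 6

/*
 * Userspace sketch of the address properties eth_random_addr() guarantees.
 * rand() is only a stand-in here and is not suitable for real use.
 */
static void sketch_random_addr(uint8_t addr[ETH_ALEN])
{
	for (int i = 0; i < ETH_ALEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);

	addr[0] &= 0xfe; /* clear the multicast (group) bit */
	addr[0] |= 0x02; /* set the locally administered bit */
}
```

The result is always a unicast, locally administered address, so it cannot
collide with a vendor-assigned MAC such as the one used by the VF.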



Re: scheduling while atomic in z3fold

2020-11-28 Thread Mike Galbraith
On Sat, 2020-11-28 at 15:27 +0100, Oleksandr Natalenko wrote:
>
> > > Shouldn't the list manipulation be protected with
> > > local_lock+this_cpu_ptr instead of get_cpu_ptr+spin_lock?
>
> Totally untested:

Hrm, the thing doesn't seem to care deeply about preemption being
disabled, so adding another lock may be overkill.  It looks like you
could get the job done via migrate_disable()+this_cpu_ptr().

In the case of the list in __z3fold_alloc(), after the unlocked peek,
it double checks under the existing lock.  There's not a whole lot of
difference between another cpu a few lines down in the same function
diddling the list and a preemption doing the same, so why bother?

Ditto add_to_unbuddied().  Flow decisions are being made based on
zhdr->first/middle/last_chunks state prior to preemption being disabled
as the thing sits, so presumably their stability is not dependent
thereon, while the list is protected by the already existing lock,
making the preempt_disable() look more incidental than intentional.

Equally untested :)

---
 mm/z3fold.c |   17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -642,14 +642,16 @@ static inline void add_to_unbuddied(stru
 {
if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
zhdr->middle_chunks == 0) {
-   struct list_head *unbuddied = get_cpu_ptr(pool->unbuddied);
-
+   struct list_head *unbuddied;
int freechunks = num_free_chunks(zhdr);
+
+   migrate_disable();
+   unbuddied = this_cpu_ptr(pool->unbuddied);
spin_lock(&pool->lock);
list_add(&zhdr->buddy, &unbuddied[freechunks]);
spin_unlock(&pool->lock);
zhdr->cpu = smp_processor_id();
-   put_cpu_ptr(pool->unbuddied);
+   migrate_enable();
}
 }

@@ -886,8 +888,9 @@ static inline struct z3fold_header *__z3
int chunks = size_to_chunks(size), i;

 lookup:
+   migrate_disable();
/* First, try to find an unbuddied z3fold page. */
-   unbuddied = get_cpu_ptr(pool->unbuddied);
+   unbuddied = this_cpu_ptr(pool->unbuddied);
for_each_unbuddied_list(i, chunks) {
struct list_head *l = &unbuddied[i];

@@ -905,7 +908,7 @@ static inline struct z3fold_header *__z3
!z3fold_page_trylock(zhdr)) {
spin_unlock(&pool->lock);
zhdr = NULL;
-   put_cpu_ptr(pool->unbuddied);
+   migrate_enable();
if (can_sleep)
cond_resched();
goto lookup;
@@ -919,7 +922,7 @@ static inline struct z3fold_header *__z3
test_bit(PAGE_CLAIMED, &page->private)) {
z3fold_page_unlock(zhdr);
zhdr = NULL;
-   put_cpu_ptr(pool->unbuddied);
+   migrate_enable();
if (can_sleep)
cond_resched();
goto lookup;
@@ -934,7 +937,7 @@ static inline struct z3fold_header *__z3
kref_get(&zhdr->refcount);
break;
}
-   put_cpu_ptr(pool->unbuddied);
+   migrate_enable();

if (!zhdr) {
int cpu;
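
The "unlocked peek, then double check under the lock" argument made above
for __z3fold_alloc() can be illustrated outside the kernel. The following is
a userspace sketch under stated assumptions, not z3fold code: struct pool,
pool_push() and pool_pop() are invented names, a C11 atomic_flag stands in
for pool->lock, and a singly linked stack stands in for an unbuddied list.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
	struct node *next;
};

struct pool {
	atomic_flag lock;          /* toy spinlock standing in for pool->lock */
	struct node *_Atomic head; /* stand-in for one unbuddied list */
};

static void pool_init(struct pool *p)
{
	atomic_flag_clear(&p->lock);
	atomic_init(&p->head, NULL);
}

static void pool_lock(struct pool *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void pool_unlock(struct pool *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

static void pool_push(struct pool *p, struct node *n)
{
	pool_lock(p);
	n->next = atomic_load_explicit(&p->head, memory_order_relaxed);
	atomic_store_explicit(&p->head, n, memory_order_relaxed);
	pool_unlock(p);
}

/* Peek without the lock, then trust only what is seen under the lock. */
static struct node *pool_pop(struct pool *p)
{
	struct node *n;

	if (atomic_load_explicit(&p->head, memory_order_relaxed) == NULL)
		return NULL; /* cheap early exit, may already be stale */

	pool_lock(p);
	n = atomic_load_explicit(&p->head, memory_order_relaxed);
	if (n) /* recheck: the list may have been emptied since the peek */
		atomic_store_explicit(&p->head, n->next, memory_order_relaxed);
	pool_unlock(p);
	return n;
}
```

In this sketch it makes no difference whether the list changed because of
another CPU or because the task was preempted between the peek and the
lock, since the decision that matters is made under the lock.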



Re: [PATCH] hwmon: corsair-psu: update supported devices

2020-11-28 Thread Wilken Gottwalt
On Sat, 28 Nov 2020 17:21:40 -0300
Jonas Malaco  wrote:

> On Sat, Nov 28, 2020 at 7:35 AM Wilken Gottwalt
>  wrote:
> >
> > On Sat, 28 Nov 2020 02:37:38 -0300
> > Jonas Malaco  wrote:
> >
> > > On Thu, Nov 26, 2020 at 8:43 AM Wilken Gottwalt
> > >  wrote:
> > > >
> > > > Adds support for another Corsair PSU series: AX760i, AX860i, AX1200i,
> > > > AX1500i and AX1600i. The first 3 power supplies are supported through
> > > > the Corsair Link USB Dongle, which is some kind of USB/Serial/TTL
> > > > converter made especially for the COM ports of these power supplies.
> > > > There are 3 known revisions of these adapters. The AX1500i power supply
> > > > has revision 3 built into the case, and the AX1600i is the only one in
> > > > that series which has a unique usb hid id like the RM/RX series.
> > >
> > > Can I ask what AXi power supplies were tested?
> > >
> > > I ask because, based on the user-space implementations I am aware of,
> > > the AXi dongle protocol appears to be different from the RMi/HXi series.
> >
> > I was not able to test this against the AX power supplies, they are really
> > hard to find (and are far too expensive). But I went through all these tools
> > and stuck to the most common commands, which all 3 series support. Not every
> > series supports all commands (there also seem to be different firmwares in
> > the micro-controllers). But this is fine, some sensors will show up as N/A.
> > Even my HX850i does not support all commands covered in this driver.
> 
> I think the similarities come from all using wrappers over the PMBus
> interface to the voltage controller.  But I am not sure the wrapping
> protocols are identical.
> 
> For example, cpsumon shows significantly more things going on during a
> read than what is needed for the RMi/HXi series.[1]
> 
> [1] 
> https://github.com/ka87/cpsumon/blob/fd639684d7f9/libcpsumon/src/cpsumon.c#L213-L231
> 
> 
> >
> > > AXi dongle:
> > >  - https://github.com/ka87/cpsumon
> >
> > This tool made me consider including the AX series, because it uses some
> > of the same commands on the AX760i, AX860i, AX1200i and AX1500i. But it is
> > a usb-serial tool only. But it was nice to know that the commands are
> > mostly the same. I left out all the commands for configuring, PCIe power
> > rails, efficiency and others which do not really belong in hwmon.
> >
> > > RMi/HXi:
> > >  - https://github.com/jonasmalacofilho/liquidctl
> > >  - https://github.com/audiohacked/OpenCorsairLink
> >
> > This tool made me include the AX series, because it uses the rmi protocol
> > component for the rmi driver (RM/HX series) and the corsair dongles.
> 
> The corsairlink_driver_dongle has no implementations for reading sensor
> data (compare that with the corsairlink_driver_rmi).[2][3]  There is
> also no code that actually tries to read (write) from (to) the device
> using that dongle driver.[4]
> 
> I also looked at a few of the issues, and all of the ones I read
> mentioned AXi support being under development, and the hypothesis of the
> AXi series being compatible with the RMi/HXi code still remaining to be
> confirmed.
> 
> [2] 
> https://github.com/audiohacked/OpenCorsairLink/blob/61d336a61b85/drivers/dongle.c#L33-L39
> [3] 
> https://github.com/audiohacked/OpenCorsairLink/blob/61d336a61b85/drivers/rmi.c#L33-L57
> [4] 
> https://github.com/audiohacked/OpenCorsairLink/blob/61d336a61b85/main.c#L106
> 
> 
> >
> > >  - https://github.com/notaz/corsairmi
> >
> > This one covers only some HX/RM PSUs, but it uses the rawhid access, which
> > made me look up the actual usb chips/bridges Corsair uses.
> >
> > >
> > > One additional concern is that the non-HID AXi dongles may only have bulk
> > > USB endpoints, and this is a HID driver.[1]
> >
> > You are right, in the case of the dongles it could be different. But I did
> > some research on Corsair usb driven devices and they really like to stick to
> > the cp210x, which is a usb hid bridge. The commit
> > b9326057a3d8447f5d2e74a7b521ccf21add2ec0 actually covers two Corsair USB
> > dongles as a cp210x device. So it is very likely that all Corsair PSUs with
> > such an interface/dongle use usb hid. But I'm completely open to being proven
> > wrong. Actually I really would like to see this tested by people who have
> > access to the rarer devices.
> 
> I could be wrong (and I am sorry for the noise if that is the case), but
> as far as I can see the cp210x does not create a HID device.

No no, this is fine. It really helps if some more people are looking into this.
I wish I had access to at least one of the later models (AX1500i/AX1600i), I
make mistakes from time to time. And it really doesn't help that Corsair changes
single devices in the same product line by firmware update. The AX1600i seems to
be the only one which uses exactly the same protocol as the RM/HX series, but
is missing the actual usb hid part. But there seems to be a firmware where the
usb hid part was available for a short time. So, what to 

Re: [PATCH RFC v5 00/13] perf pmu-events: Support event aliasing for system PMUs

2020-11-28 Thread kajoljain



On 11/6/20 6:05 PM, John Garry wrote:
> Currently, event aliasing and metrics are supported only for CPU and uncore
> PMUs. In fact, uncore PMU aliasing is supported only when the uncore PMUs
> are fixed for a CPU, which may not always be the case for certain
> architectures.
> 
> This series adds support for PMU event aliasing and metrics for system and
> other uncore PMUs which are not tied to a specific CPU.
> 
> For this, we introduce system event tables in generated pmu-events.c,
> which contain a per-SoC table of events of all its system PMUs. Each
> per-PMU event is matched by a "COMPAT" property.
> 
> When creating aliased and metrics PMUs, we treat core/uncore and
> system PMUs differently:
> 
> - For CPU PMUs, we always match for the event mapfile based on the CPUID.
>   This has not changed.
> 
> - For system PMUs, we iterate through all the events in all the system
>   PMU tables.
> 
>   Matches are based on the "COMPAT" property matching the PMU sysfs
>   identifier contents, in /sys/bus/event_source/devices//identifier
> 
>   Uncore PMUs may be matched via CPUID or in the same way as system PMUs,
>   depending on whether the uncore PMU is tied to a specific CPUID.
> 
> Initial reference support is also added for the ARM SMMUv3 PMCG (Performance
> Monitor Counter Group) PMU for the HiSilicon hip09 platform, with only a
> single event so far - see the driver in drivers/perf/arm_smmuv3_pmu.c for
> reference.
> 
> Here is a sample output with this series on Huawei D06CS board:
> 
> root@ubuntu:/# ./perf list
>[...]
> 
> smmu v3 pmcg:
>smmuv3_pmcg.config_cache_miss
> [Configuration cache miss caused by transaction or(ATS or
> non-ATS)translation request. Unit: smmuv3_pmcg]
>smmuv3_pmcg.config_struct_access
> [Configuration structure access. Unit: smmuv3_pmcg]
>smmuv3_pmcg.cycles
> [Clock cycles. Unit: smmuv3_pmcg]
>smmuv3_pmcg.l1_tlb
> [SMMUv3 PMCG L1 TABLE transation. Unit: smmuv3_pmcg]
>smmuv3_pmcg.pcie_ats_trans_passed
> [PCIe ATS Translated Transaction passed through SMMU. Unit:
> smmuv3_pmcg]
>smmuv3_pmcg.pcie_ats_trans_rq
> [PCIe ATS Translation Request received. Unit: smmuv3_pmcg]
>smmuv3_pmcg.tlb_miss
> [TLB miss caused by incoming transaction or (ATS or non-ATS) 
> translation request. Unit: smmuv3_pmcg]
>smmuv3_pmcg.trans_table_walk_access
> [Translation table walk access. Unit: smmuv3_pmcg]
>smmuv3_pmcg.transaction
> [Transaction. Unit: smmuv3_pmcg]
> 
> root@ubuntu:/# ./perf stat -v -e smmuv3_pmcg.l1_tlb sleep 1
> Using CPUID 0x480fd010
> -> smmuv3_pmcg_200100020/event=0x8a/
> -> smmuv3_pmcg_200140020/event=0x8a/
> -> smmuv3_pmcg_100020/event=0x8a/
> -> smmuv3_pmcg_140020/event=0x8a/
> -> smmuv3_pmcg_200148020/event=0x8a/
> -> smmuv3_pmcg_148020/event=0x8a/
> smmuv3_pmcg.l1_tlb: 0 1001221690 1001221690
> smmuv3_pmcg.l1_tlb: 0 1001220090 1001220090
> smmuv3_pmcg.l1_tlb: 101 1001219660 1001219660
> smmuv3_pmcg.l1_tlb: 0 1001219010 1001219010
> smmuv3_pmcg.l1_tlb: 0 1001218360 1001218360
> smmuv3_pmcg.l1_tlb: 134 1001217850 1001217850
> 
> Performance counter stats for 'system wide':
> 
> 235  smmuv3_pmcg.l1_tlb 
> 
> 1.001263128 seconds time elapsed
> 
> root@ubuntu:/#
> 
> Support is also added for imx8mm DDR PMU and HiSilicon hip09 uncore events.
> Some events for hip09 may not be accurate at the moment.
> 
> Series is here:
> https://github.com/hisilicon/kernel-dev/tree/private-topic-perf-5.10-sys-pmu-events-v5
> 
> Kernel part is here:
> https://lore.kernel.org/lkml/1602149181-237415-1-git-send-email-john.ga...@huawei.com/T/#mc34f758ab72f3d4a90d854b9bda7e6bbb90835b2
> 
> Differences to v4:
> - Drop hack for fixing metrics containing aliases which match multiple
>   PMUs, and add a proper fix attempt
> - Rebase to acme perf/core from 30 Oct
> - Fix up imx8 event names according to request from Joakim
> 
> Differences to v3:
> - Rebase to v5.9-rc7
> - Includes Ian's uncore metric expressions Fix and another fix
> - Add hip09 uncore events
> - Tidy jevents.c changes a bit
> 
> Differences to v2:
> - fixups for imx8mm JSONs
> - fix for metrics being repeated per PMU
> - use sysfs__read_str()
> - fix typo in PMCG JSON
> - drop evsel fix, which someone else fixed
> 
> Differences to v1:
> - Stop using SoC id and use a per-PMU identifier instead
> - Add metric group sys events support
>- This is a bit hacky
> - Add imx8mm DDR Perf support
> - Add fix for parse events sel
>   - without it, I get this spewed for metric event:
> 
>   assertion failed at util/parse-events.c:1637
> 
> Joakim Zhang (1):
>   perf vendor events: Add JSON metrics for imx8mm DDR Perf
> 
> John Garry (12):
>   perf jevents: Add support for an extra directory level
>   perf jevents: Add support for system events tables
>   perf pmu: Add pmu_id()
>   perf pmu: Add pmu_add_sys_aliases()
>   perf vendor events arm64: Add Architected events smmuv3-pmcg.json
>   perf vendor events 

Re: [PATCH v2 6/8] venus: venc: add handling for VIDIOC_ENCODER_CMD

2020-11-28 Thread Fritz Koenig
On Wed, Nov 11, 2020 at 6:38 AM Stanimir Varbanov
 wrote:
>
> From: Dikshita Agarwal 
>
> Add handling for below commands in encoder:
> 1. V4L2_ENC_CMD_STOP
> 2. V4L2_ENC_CMD_START
>
> Signed-off-by: Dikshita Agarwal 
> Signed-off-by: Stanimir Varbanov 
> ---
>  drivers/media/platform/qcom/venus/venc.c | 77 +++-
>  1 file changed, 76 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/media/platform/qcom/venus/venc.c 
> b/drivers/media/platform/qcom/venus/venc.c
> index 99bfabf90bd2..7512e4a16270 100644
> --- a/drivers/media/platform/qcom/venus/venc.c
> +++ b/drivers/media/platform/qcom/venus/venc.c
> @@ -507,6 +507,59 @@ static int venc_enum_frameintervals(struct file *file, 
> void *fh,
> return 0;
>  }
>
> +static int venc_encoder_cmd(struct file *file, void *fh,
> +   struct v4l2_encoder_cmd *ec)
> +{
> +   struct venus_inst *inst = to_inst(file);
> +   struct v4l2_m2m_ctx *m2m_ctx = inst->m2m_ctx;
> +   struct hfi_frame_data fdata = {0};
> +   int ret = 0;
> +
> +   ret = v4l2_m2m_ioctl_try_encoder_cmd(file, fh, ec);
> +   if (ret < 0)
> +   return ret;
> +
> +   mutex_lock(&inst->lock);
> +
> +   if (!vb2_is_streaming(&m2m_ctx->cap_q_ctx.q) ||
> +   !vb2_is_streaming(&m2m_ctx->out_q_ctx.q))
> +   goto unlock;
> +
> +   if (m2m_ctx->is_draining) {
> +   ret = -EBUSY;
> +   goto unlock;
> +   }
> +
> +   if (ec->cmd == V4L2_ENC_CMD_STOP) {
> +   if (v4l2_m2m_has_stopped(m2m_ctx)) {
> +   ret = 0;
> +   goto unlock;
> +   }
> +
> +   m2m_ctx->is_draining = true;
> +
> +   fdata.buffer_type = HFI_BUFFER_INPUT;
> +   fdata.flags |= HFI_BUFFERFLAG_EOS;
> +   fdata.device_addr = 0;
> +   fdata.clnt_data = (u32)-1;
> +
> +   ret = hfi_session_process_buf(inst, &fdata);
> +   if (ret)
> +   goto unlock;
> +   }
> +
> +   if (ec->cmd == V4L2_ENC_CMD_START && v4l2_m2m_has_stopped(m2m_ctx)) {
> +   vb2_clear_last_buffer_dequeued(&m2m_ctx->cap_q_ctx.q);
> +   inst->m2m_ctx->has_stopped = false;
> +   venus_helper_process_initial_out_bufs(inst);
> +   venus_helper_process_initial_cap_bufs(inst);
> +   }
> +
> +unlock:
> +   mutex_unlock(&inst->lock);
> +   return ret;
> +}
> +
>  static const struct v4l2_ioctl_ops venc_ioctl_ops = {
> .vidioc_querycap = venc_querycap,
> .vidioc_enum_fmt_vid_cap = venc_enum_fmt,
> @@ -534,6 +587,8 @@ static const struct v4l2_ioctl_ops venc_ioctl_ops = {
> .vidioc_enum_frameintervals = venc_enum_frameintervals,
> .vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
> .vidioc_unsubscribe_event = v4l2_event_unsubscribe,
> +   .vidioc_try_encoder_cmd = v4l2_m2m_ioctl_try_encoder_cmd,
> +   .vidioc_encoder_cmd = venc_encoder_cmd,
>  };
>
>  static int venc_set_properties(struct venus_inst *inst)
> @@ -946,9 +1001,22 @@ static int venc_start_streaming(struct vb2_queue *q, 
> unsigned int count)
>  static void venc_vb2_buf_queue(struct vb2_buffer *vb)
>  {
> struct venus_inst *inst = vb2_get_drv_priv(vb->vb2_queue);
> +   struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
> +   struct v4l2_m2m_ctx *m2m_ctx = inst->m2m_ctx;
>
> mutex_lock(&inst->lock);
> -   venus_helper_vb2_buf_queue(vb);
> +
> +   v4l2_m2m_buf_queue(m2m_ctx, vbuf);
> +
> +   if (!(inst->streamon_out && inst->streamon_cap))
> +   goto unlock;
> +
> +   if (v4l2_m2m_has_stopped(m2m_ctx))
> +   goto unlock;
> +
> +   venus_helper_process_buf(vb);
> +
> +unlock:
> mutex_unlock(&inst->lock);
>  }
>
> @@ -968,6 +1036,7 @@ static void venc_buf_done(struct venus_inst *inst, 
> unsigned int buf_type,
> struct vb2_v4l2_buffer *vbuf;
> struct vb2_buffer *vb;
> unsigned int type;
> +   struct v4l2_m2m_ctx *m2m_ctx = inst->m2m_ctx;
>
> if (buf_type == HFI_BUFFER_INPUT)
> type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
> @@ -986,6 +1055,12 @@ static void venc_buf_done(struct venus_inst *inst, 
> unsigned int buf_type,
> vb->planes[0].data_offset = data_offset;
> vb->timestamp = timestamp_us * NSEC_PER_USEC;
> vbuf->sequence = inst->sequence_cap++;
> +
> +   if ((!bytesused && m2m_ctx->is_draining) ||
> +   (vbuf->flags & V4L2_BUF_FLAG_LAST)) {
> +   vbuf->flags |= V4L2_BUF_FLAG_LAST;
> +   v4l2_m2m_mark_stopped(inst->m2m_ctx);
> +   }
> } else {
> vbuf->sequence = inst->sequence_out++;
> }
> --
> 2.17.1
>

Reviewed-by: Fritz Koenig 


Re: [PATCH v2 5/8] venus: pm_helpers: Check instance state when calculate instance frequency

2020-11-28 Thread Fritz Koenig
On Wed, Nov 11, 2020 at 6:38 AM Stanimir Varbanov
 wrote:
>
> Skip calculating instance frequency if it is not in running state.
>
> Signed-off-by: Stanimir Varbanov 
> ---
>  drivers/media/platform/qcom/venus/pm_helpers.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/media/platform/qcom/venus/pm_helpers.c 
> b/drivers/media/platform/qcom/venus/pm_helpers.c
> index ca99908ca3d3..cc84dc5e371b 100644
> --- a/drivers/media/platform/qcom/venus/pm_helpers.c
> +++ b/drivers/media/platform/qcom/venus/pm_helpers.c
> @@ -940,6 +940,9 @@ static unsigned long calculate_inst_freq(struct 
> venus_inst *inst,
>
> mbs_per_sec = load_per_instance(inst);
>
> +   if (inst->state != INST_START)
> +   return 0;
> +
> vpp_freq = mbs_per_sec * inst->clk_data.codec_freq_data->vpp_freq;
> /* 21 / 20 is overhead factor */
> vpp_freq += vpp_freq / 20;
> --
> 2.17.1
>

Reviewed-by: Fritz Koenig 


Re: [PATCH v2 8/8] venus: helpers: Delete unused stop streaming helper

2020-11-28 Thread Fritz Koenig
On Wed, Nov 11, 2020 at 6:38 AM Stanimir Varbanov
 wrote:
>
> After re-design of encoder driver this helper is not needed
> anymore.
>
> Signed-off-by: Stanimir Varbanov 
> ---
>  drivers/media/platform/qcom/venus/helpers.c | 43 -
>  drivers/media/platform/qcom/venus/helpers.h |  1 -
>  2 files changed, 44 deletions(-)
>
> diff --git a/drivers/media/platform/qcom/venus/helpers.c 
> b/drivers/media/platform/qcom/venus/helpers.c
> index 490c026b58a3..51c80417f361 100644
> --- a/drivers/media/platform/qcom/venus/helpers.c
> +++ b/drivers/media/platform/qcom/venus/helpers.c
> @@ -1406,49 +1406,6 @@ void venus_helper_buffers_done(struct venus_inst 
> *inst, unsigned int type,
>  }
>  EXPORT_SYMBOL_GPL(venus_helper_buffers_done);
>
> -void venus_helper_vb2_stop_streaming(struct vb2_queue *q)
> -{
> -   struct venus_inst *inst = vb2_get_drv_priv(q);
> -   struct venus_core *core = inst->core;
> -   int ret;
> -
> -   mutex_lock(&inst->lock);
> -
> -   if (inst->streamon_out & inst->streamon_cap) {
> -   ret = hfi_session_stop(inst);
> -   ret |= hfi_session_unload_res(inst);
> -   ret |= venus_helper_unregister_bufs(inst);
> -   ret |= venus_helper_intbufs_free(inst);
> -   ret |= hfi_session_deinit(inst);
> -
> -   if (inst->session_error || core->sys_error)
> -   ret = -EIO;
> -
> -   if (ret)
> -   hfi_session_abort(inst);
> -
> -   venus_helper_free_dpb_bufs(inst);
> -
> -   venus_pm_load_scale(inst);
> -   INIT_LIST_HEAD(&inst->registeredbufs);
> -   }
> -
> -   venus_helper_buffers_done(inst, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
> - VB2_BUF_STATE_ERROR);
> -   venus_helper_buffers_done(inst, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
> - VB2_BUF_STATE_ERROR);
> -
> -   if (q->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE)
> -   inst->streamon_out = 0;
> -   else
> -   inst->streamon_cap = 0;
> -
> -   venus_pm_release_core(inst);
> -
> -   mutex_unlock(&inst->lock);
> -}
> -EXPORT_SYMBOL_GPL(venus_helper_vb2_stop_streaming);
> -
>  int venus_helper_process_initial_cap_bufs(struct venus_inst *inst)
>  {
> struct v4l2_m2m_ctx *m2m_ctx = inst->m2m_ctx;
> diff --git a/drivers/media/platform/qcom/venus/helpers.h 
> b/drivers/media/platform/qcom/venus/helpers.h
> index 231af29667e7..3eae2acbcc8e 100644
> --- a/drivers/media/platform/qcom/venus/helpers.h
> +++ b/drivers/media/platform/qcom/venus/helpers.h
> @@ -20,7 +20,6 @@ int venus_helper_vb2_buf_init(struct vb2_buffer *vb);
>  int venus_helper_vb2_buf_prepare(struct vb2_buffer *vb);
>  void venus_helper_vb2_buf_queue(struct vb2_buffer *vb);
>  void venus_helper_process_buf(struct vb2_buffer *vb);
> -void venus_helper_vb2_stop_streaming(struct vb2_queue *q);
>  int venus_helper_vb2_start_streaming(struct venus_inst *inst);
>  void venus_helper_m2m_device_run(void *priv);
>  void venus_helper_m2m_job_abort(void *priv);
> --
> 2.17.1
>

Reviewed-by: Fritz Koenig 


Re: [PATCH v2 4/8] venus: helpers: Calculate properly compressed buffer size

2020-11-28 Thread Fritz Koenig
On Wed, Nov 11, 2020 at 6:38 AM Stanimir Varbanov
 wrote:
>
> For resolutions below 720p the size of the compressed buffer must
> be bigger. Correct this by checking the resolution when calculating
> buffer size and multiply by four.

I'm confused because the commit message doesn't appear to line up with
the code.  It says multiply by four here, but the code has by eight.

>
> Signed-off-by: Stanimir Varbanov 
> ---
>  drivers/media/platform/qcom/venus/helpers.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/media/platform/qcom/venus/helpers.c 
> b/drivers/media/platform/qcom/venus/helpers.c
> index 688e3e3e8362..490c026b58a3 100644
> --- a/drivers/media/platform/qcom/venus/helpers.c
> +++ b/drivers/media/platform/qcom/venus/helpers.c
> @@ -986,6 +986,8 @@ u32 venus_helper_get_framesz(u32 v4l2_fmt, u32 width, u32 
> height)
>
> if (compressed) {
> sz = ALIGN(height, 32) * ALIGN(width, 32) * 3 / 2 / 2;
> +   if (width < 1280 || height < 720)
> +   sz *= 8;
> return ALIGN(sz, SZ_4K);
> }
>
> --
> 2.17.1
>
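
For reference, the size calculation under discussion can be reproduced outside the kernel. This is only a sketch: ALIGN_UP and compressed_framesz() are illustrative names, and the function merely mirrors the compressed branch of the quoted diff (with the factor of 8 from the code, not the 4 from the commit message). It is not the driver code itself.

```c
#include <stdint.h>

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((uint32_t)(a) - 1))
#define SZ_4K 4096u

/* Hypothetical stand-in for the compressed branch of venus_helper_get_framesz(). */
static uint32_t compressed_framesz(uint32_t width, uint32_t height)
{
	uint32_t sz = ALIGN_UP(height, 32) * ALIGN_UP(width, 32) * 3 / 2 / 2;

	/* The diff multiplies by 8 below 720p; the commit message says 4. */
	if (width < 1280 || height < 720)
		sz *= 8;

	return ALIGN_UP(sz, SZ_4K);
}
```

With these assumptions, 640x480 gives 1843200 bytes (230400 * 8, already 4K aligned), while 1280x720 skips the multiplier and gives 708608 bytes (706560 rounded up to 4K).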


Re: [PATCH v2 3/8] venus: hfi_cmds: Allow null buffer address on encoder input

2020-11-28 Thread Fritz Koenig
On Wed, Nov 11, 2020 at 6:38 AM Stanimir Varbanov
 wrote:
>
> Allow null buffer address for encoder input buffers. This will
> be used to send null input buffers to signal end-of-stream.
>
> Signed-off-by: Stanimir Varbanov 
> ---
>  drivers/media/platform/qcom/venus/hfi_cmds.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/media/platform/qcom/venus/hfi_cmds.c 
> b/drivers/media/platform/qcom/venus/hfi_cmds.c
> index 4f7565834469..2affaa2ed70f 100644
> --- a/drivers/media/platform/qcom/venus/hfi_cmds.c
> +++ b/drivers/media/platform/qcom/venus/hfi_cmds.c
> @@ -278,7 +278,7 @@ int pkt_session_etb_encoder(
> struct hfi_session_empty_buffer_uncompressed_plane0_pkt *pkt,
> void *cookie, struct hfi_frame_data *in_frame)
>  {
> -   if (!cookie || !in_frame->device_addr)
> +   if (!cookie)
> return -EINVAL;
>
> pkt->shdr.hdr.size = sizeof(*pkt);
> --
> 2.17.1
>
Reviewed-by: Fritz Koenig 


[PATCH] venus: venc: Add VIDIOC_TRY_ENCODER_CMD support

2020-11-28 Thread Fritz Koenig
V4L2_ENC_CMD_STOP and V4L2_ENC_CMD_START are already
supported.  Add a way to query for support.

---
 drivers/media/platform/qcom/venus/venc.c | 26 
 1 file changed, 26 insertions(+)

diff --git a/drivers/media/platform/qcom/venus/venc.c 
b/drivers/media/platform/qcom/venus/venc.c
index 2ddfeddf98514..e05db3c4bfb24 100644
--- a/drivers/media/platform/qcom/venus/venc.c
+++ b/drivers/media/platform/qcom/venus/venc.c
@@ -507,6 +507,27 @@ static int venc_enum_frameintervals(struct file *file, 
void *fh,
return 0;
 }
 
+static int
+venc_try_encoder_cmd(struct file *file, void *fh, struct v4l2_encoder_cmd *cmd)
+{
+   struct venus_inst *inst = to_inst(file);
+   struct device *dev = inst->core->dev_dec;
+
+   switch (cmd->cmd) {
+   case V4L2_ENC_CMD_STOP:
+   case V4L2_ENC_CMD_START:
+   if (cmd->flags != 0) {
+   dev_dbg(dev, "flags=%u are not supported", cmd->flags);
+   return -EINVAL;
+   }
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 static int
 venc_encoder_cmd(struct file *file, void *fh, struct v4l2_encoder_cmd *cmd)
 {
@@ -514,6 +535,10 @@ venc_encoder_cmd(struct file *file, void *fh, struct 
v4l2_encoder_cmd *cmd)
struct hfi_frame_data fdata = {0};
int ret = 0;
 
+   ret = venc_try_encoder_cmd(file, fh, cmd);
+   if (ret < 0)
+   return ret;
+
ret = v4l2_m2m_ioctl_try_encoder_cmd(file, fh, cmd);
if (ret)
return ret;
@@ -575,6 +600,7 @@ static const struct v4l2_ioctl_ops venc_ioctl_ops = {
.vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
.vidioc_unsubscribe_event = v4l2_event_unsubscribe,
.vidioc_encoder_cmd = venc_encoder_cmd,
+   .vidioc_try_encoder_cmd = venc_try_encoder_cmd,
 };
 
 static int venc_set_properties(struct venus_inst *inst)
-- 
2.29.2.454.gaff20da3a2-goog



Re: [PATCH] exit: fix a race in release_task when flushing the dentry

2020-11-28 Thread Greg Kroah-Hartman
On Sat, Nov 28, 2020 at 11:28:53PM +0800, Wen Yang wrote:
> 
> 
> On 2020/11/28 at 10:05 PM, Greg Kroah-Hartman wrote:
> > On Sat, Nov 28, 2020 at 09:59:09PM +0800, Wen Yang wrote:
> > > 
> > > 
> > > On 2020/11/28 at 4:06 PM, Greg Kroah-Hartman wrote:
> > > > On Sat, Nov 28, 2020 at 02:47:22PM +0800, Wen Yang wrote:
> > > > > [ Upstream commit 7bc3e6e55acf065500a24621f3b313e7e5998acf ]
> > > > 
> > > > No, that is not this commit at all.
> > > > 
> > > > What are you wanting to have happen here?
> > > > 
> > > > confused,
> > > > 
> > > > greg k-h
> > > > 
> > > 
> > > Thanks.
> > > Let's explain it briefly:
> > > 
> > > The dentries such as /proc//ns/ipc have the DCACHE_OP_DELETE flag, 
> > > they
> > > should be deleted when the process exits.
> > > Suppose the following race appears:
> > > 
> > > release_task                dput
> > > -> proc_flush_task
> > >                             -> dentry->d_op->d_delete(dentry)
> > > -> __exit_signal
> > >                             -> dentry->d_lockref.count--  and return.
> > > 
> > > 
> > > In the proc_flush_task function, because another process is using this
> > > dentry, it cannot be deleted;
> > > In the dput function, d_delete may be executed before __exit_signal (the 
> > > pid
> > > has not been unhashed), so that d_delete returns false and the dentry can
> > > not be deleted.
> > > 
> > > So this dentry is still cached (count is 0), and its parent dentries are
> > > also cached, and those dentries can only be deleted when drop_caches is
> > > manually triggered.
> > > 
> > > 
> > > In the release_task function, we should move proc_flush_task after the
> > > tasklist_lock is released(Just like the commit
> > > 7bc3e6e55acf065500a24621f3b313e7e5998acf did).
> > 
> > I do not understand, is this a patch being submitted for the main kernel
> > tree, or for a stable kernel release?
> > 
> > If stable, please read:
> >  https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> > for how to do this properly.
> > 
> > If main kernel tree, you can't have the "Upstream commit" line in the
> > changelog text as that makes no sense at all.
> 
> 
> Hi,
> This patch is submitted to the stable branches (from 4.9.y
> to 5.6.y).
> 
> This problem can also be solved if the following patch could be ported to
> the stable branch:
> 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc")
> 26dbc60f385f ("proc: Generalize proc_sys_prune_dcache into
> proc_prune_siblings_dcache")
> f90f3cafe8d5 ("proc: Use d_invalidate in proc_prune_siblings_dcache")
> 
> However, the above-mentioned patches modify too much code (more than 100
> lines), and there may also be some undiscovered bugs.
> 
> So the safer method may be to apply this small patch(also ported from the
> equivalent fix already exist in Linus’ tree).
> 
> We will reformat the patch later.

We always prefer to take the original, upstream patches, instead of
one-off changes as almost always, those one-off changes end up being
wrong and hard to work with over time.

So if we need more than one patch to solve this reported problem, that's
fine, can you test the above series of patches and provide a backported
set of them that we can use for this?

thanks,

greg k-h


Re: [PATCH v2 1/8] venus: hfi: Use correct state in unload resources

2020-11-28 Thread Fritz Koenig
On Wed, Nov 11, 2020 at 6:38 AM Stanimir Varbanov
 wrote:
>
> INST_RELEASE_RESOURCES state is set but not used, correct this
> by enter into INIT state once the unload resources is done.
>
> Signed-off-by: Stanimir Varbanov 
> ---
>  drivers/media/platform/qcom/venus/hfi.c | 2 +-
>  drivers/media/platform/qcom/venus/hfi.h | 1 -
>  2 files changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/media/platform/qcom/venus/hfi.c 
> b/drivers/media/platform/qcom/venus/hfi.c
> index 638ed5cfe05e..4c87228e8e1d 100644
> --- a/drivers/media/platform/qcom/venus/hfi.c
> +++ b/drivers/media/platform/qcom/venus/hfi.c
> @@ -388,7 +388,7 @@ int hfi_session_unload_res(struct venus_inst *inst)
> if (ret)
> return ret;
>
> -   inst->state = INST_RELEASE_RESOURCES;
> +   inst->state = INST_INIT;
>
> return 0;
>  }
> diff --git a/drivers/media/platform/qcom/venus/hfi.h 
> b/drivers/media/platform/qcom/venus/hfi.h
> index f25d412d6553..e9c944271cc1 100644
> --- a/drivers/media/platform/qcom/venus/hfi.h
> +++ b/drivers/media/platform/qcom/venus/hfi.h
> @@ -87,7 +87,6 @@ struct hfi_event_data {
>  #define INST_LOAD_RESOURCES4
>  #define INST_START 5
>  #define INST_STOP  6
> -#define INST_RELEASE_RESOURCES 7
>
>  struct venus_core;
>  struct venus_inst;
> --
> 2.17.1
>

Reviewed-by: Fritz Koenig 


Re: [PATCH v2 2/8] venus: helpers: Add a new helper for buffer processing

2020-11-28 Thread Fritz Koenig
On Wed, Nov 11, 2020 at 6:38 AM Stanimir Varbanov
 wrote:
>
> The new helper will be used from encoder and decoder drivers
> to enqueue buffers for processing by firmware.
>
> Signed-off-by: Stanimir Varbanov 
> ---
>  drivers/media/platform/qcom/venus/helpers.c | 20 
>  drivers/media/platform/qcom/venus/helpers.h |  1 +
>  2 files changed, 21 insertions(+)
>
> diff --git a/drivers/media/platform/qcom/venus/helpers.c 
> b/drivers/media/platform/qcom/venus/helpers.c
> index efa2781d6f55..688e3e3e8362 100644
> --- a/drivers/media/platform/qcom/venus/helpers.c
> +++ b/drivers/media/platform/qcom/venus/helpers.c
> @@ -1369,6 +1369,26 @@ void venus_helper_vb2_buf_queue(struct vb2_buffer *vb)
>  }
>  EXPORT_SYMBOL_GPL(venus_helper_vb2_buf_queue);
>
> +void venus_helper_process_buf(struct vb2_buffer *vb)
> +{
> +   struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
> +   struct venus_inst *inst = vb2_get_drv_priv(vb->vb2_queue);
> +   int ret;
> +
> +   cache_payload(inst, vb);
> +
> +   if (vb2_start_streaming_called(vb->vb2_queue)) {
> +   ret = is_buf_refed(inst, vbuf);
> +   if (ret)
> +   return;
> +
> +   ret = session_process_buf(inst, vbuf);
> +   if (ret)
> +   return_buf_error(inst, vbuf);
> +   }
> +}
> +EXPORT_SYMBOL_GPL(venus_helper_process_buf);
> +
>  void venus_helper_buffers_done(struct venus_inst *inst, unsigned int type,
>enum vb2_buffer_state state)
>  {
> diff --git a/drivers/media/platform/qcom/venus/helpers.h 
> b/drivers/media/platform/qcom/venus/helpers.h
> index f36c9f717798..231af29667e7 100644
> --- a/drivers/media/platform/qcom/venus/helpers.h
> +++ b/drivers/media/platform/qcom/venus/helpers.h
> @@ -19,6 +19,7 @@ void venus_helper_buffers_done(struct venus_inst *inst, 
> unsigned int type,
>  int venus_helper_vb2_buf_init(struct vb2_buffer *vb);
>  int venus_helper_vb2_buf_prepare(struct vb2_buffer *vb);
>  void venus_helper_vb2_buf_queue(struct vb2_buffer *vb);
> +void venus_helper_process_buf(struct vb2_buffer *vb);
>  void venus_helper_vb2_stop_streaming(struct vb2_queue *q);
>  int venus_helper_vb2_start_streaming(struct venus_inst *inst);
>  void venus_helper_m2m_device_run(void *priv);
> --
> 2.17.1
>

Reviewed-by: Fritz Koenig 


[tip:master] BUILD SUCCESS c1c38fd953ac77525dc0f302c9f69749ce4832d7

2020-11-28 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git  master
branch HEAD: c1c38fd953ac77525dc0f302c9f69749ce4832d7  Merge branch 'core/entry'

elapsed time: 724m

configs tested: 93
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm       defconfig
arm64     allyesconfig
arm64     defconfig
arm       allyesconfig
arm       allmodconfig
m68k      m5307c3_defconfig
arm       palmz72_defconfig
arm       cerfcube_defconfig
sh        ecovec24_defconfig
sh        sh7757lcr_defconfig
powerpc   chrp32_defconfig
openrisc  alldefconfig
um        x86_64_defconfig
arm       tango4_defconfig
um        kunit_defconfig
arm       ebsa110_defconfig
mips      rm200_defconfig
arm       magician_defconfig
arm       exynos_defconfig
c6x       evmc6457_defconfig
powerpc   xes_mpc85xx_defconfig
riscv     alldefconfig
um        i386_defconfig
powerpc   ppc6xx_defconfig
ia64      allmodconfig
ia64      defconfig
ia64      allyesconfig
m68k      allmodconfig
m68k      defconfig
m68k      allyesconfig
nios2     defconfig
arc       allyesconfig
nds32     allnoconfig
c6x       allyesconfig
nds32     defconfig
nios2     allyesconfig
csky      defconfig
alpha     defconfig
alpha     allyesconfig
xtensa    allyesconfig
h8300     allyesconfig
arc       defconfig
sh        allmodconfig
parisc    defconfig
s390      allyesconfig
parisc    allyesconfig
s390      defconfig
i386      allyesconfig
sparc     allyesconfig
sparc     defconfig
i386      defconfig
mips      allyesconfig
mips      allmodconfig
powerpc   allyesconfig
powerpc   allmodconfig
powerpc   allnoconfig
i386      randconfig-a004-20201129
i386      randconfig-a003-20201129
i386      randconfig-a002-20201129
i386      randconfig-a005-20201129
i386      randconfig-a001-20201129
i386      randconfig-a006-20201129
x86_64    randconfig-a015-20201129
x86_64    randconfig-a011-20201129
x86_64    randconfig-a016-20201129
x86_64    randconfig-a014-20201129
x86_64    randconfig-a012-20201129
x86_64    randconfig-a013-20201129
i386      randconfig-a012-20201129
i386      randconfig-a013-20201129
i386      randconfig-a011-20201129
i386      randconfig-a016-20201129
i386      randconfig-a014-20201129
i386      randconfig-a015-20201129
riscv     nommu_k210_defconfig
riscv     allyesconfig
riscv     nommu_virt_defconfig
riscv     allnoconfig
riscv     defconfig
riscv     rv32_defconfig
riscv     allmodconfig
x86_64    rhel
x86_64    allyesconfig
x86_64    rhel-7.6-kselftests
x86_64    defconfig
x86_64    rhel-8.3
x86_64    kexec

clang tested configs:
x86_64   randconfig-a003-20201129
x86_64   randconfig-a006-20201129
x86_64   randconfig-a004-20201129
x86_64   randconfig-a005-20201129
x86_64   randconfig-a002-20201129
x86_64   randconfig-a001-20201129

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


[PATCH v2] phy: rockchip: set pulldown for strobe line in dts

2020-11-28 Thread Chris Ruehl
This patchset adds support for setting the strobe line pulldown via a dt property

2 files modified:
drivers/phy/rockchip/phy-rockchip-emmc.c
Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt

Signed-off-by: Chris Ruehl 
---
v2:
- Fix issues shown by checkpatch --strict
- Add patch to update the Documentation


[PATCH v2 1/2] phy: rockchip: set pulldown for strobe line in dts

2020-11-28 Thread Chris Ruehl
This patch adds support for setting the internal pull-down via a dt property
and allows simplifying the board design for the trace from the emmc-phy to
the eMMC chipset.
The default is to not set the pull-down.

This patch was inspired by the 4.4 tree of the
Rockchip SDK, where it is enabled unconditionally.
The patch has been tested on our customized rk3399 board.

Signed-off-by: Chris Ruehl 
---
 drivers/phy/rockchip/phy-rockchip-emmc.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/phy/rockchip/phy-rockchip-emmc.c 
b/drivers/phy/rockchip/phy-rockchip-emmc.c
index 2dc19ddd120f..48e2d75b1004 100644
--- a/drivers/phy/rockchip/phy-rockchip-emmc.c
+++ b/drivers/phy/rockchip/phy-rockchip-emmc.c
@@ -67,6 +67,10 @@
 #define PHYCTRL_OTAPDLYENA_SHIFT   0xb
 #define PHYCTRL_OTAPDLYSEL_MASK0xf
 #define PHYCTRL_OTAPDLYSEL_SHIFT   0x7
+#define PHYCTRL_REN_STRB_DISABLE   0x0
+#define PHYCTRL_REN_STRB_ENABLE0x1
+#define PHYCTRL_REN_STRB_MASK  0x1
+#define PHYCTRL_REN_STRB_SHIFT 0x9
 
 #define PHYCTRL_IS_CALDONE(x) \
((((x) >> PHYCTRL_CALDONE_SHIFT) & \
@@ -80,6 +84,7 @@ struct rockchip_emmc_phy {
struct regmap   *reg_base;
struct clk  *emmcclk;
unsigned int drive_impedance;
+   unsigned int enable_strobe_pulldown;
 };
 
 static int rockchip_emmc_phy_power(struct phy *phy, bool on_off)
@@ -295,6 +300,13 @@ static int rockchip_emmc_phy_power_on(struct phy *phy)
   PHYCTRL_OTAPDLYSEL_MASK,
   PHYCTRL_OTAPDLYSEL_SHIFT));
 
+   /* Internal pull-down for strobe line */
+   regmap_write(rk_phy->reg_base,
+rk_phy->reg_offset + GRF_EMMCPHY_CON2,
+HIWORD_UPDATE(rk_phy->enable_strobe_pulldown,
+  PHYCTRL_REN_STRB_MASK,
+  PHYCTRL_REN_STRB_SHIFT));
+
/* Power up emmc phy analog blocks */
return rockchip_emmc_phy_power(phy, PHYCTRL_PDB_PWR_ON);
 }
@@ -359,10 +371,14 @@ static int rockchip_emmc_phy_probe(struct platform_device 
*pdev)
rk_phy->reg_offset = reg_offset;
rk_phy->reg_base = grf;
rk_phy->drive_impedance = PHYCTRL_DR_50OHM;
+   rk_phy->enable_strobe_pulldown = PHYCTRL_REN_STRB_DISABLE;
 
if (!of_property_read_u32(dev->of_node, "drive-impedance-ohm", &val))
rk_phy->drive_impedance = convert_drive_impedance_ohm(pdev, 
val);
 
+   if (of_property_read_bool(dev->of_node, "enable-strobe-pulldown"))
+   rk_phy->enable_strobe_pulldown = PHYCTRL_REN_STRB_ENABLE;
+
generic_phy = devm_phy_create(dev, dev->of_node, &ops);
if (IS_ERR(generic_phy)) {
dev_err(dev, "failed to create PHY\n");
-- 
2.20.1
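
As a side note, the register word written for the strobe pull-down can be worked out by hand. The sketch below assumes HIWORD_UPDATE follows the usual Rockchip GRF convention (value in the low half-word, write-enable mask 16 bits above it); the macro body is not part of the quoted diff, and strobe_pulldown_word() is an illustrative helper, not a driver function.

```c
#include <stdint.h>

/* Assumed convention: value bits low, write-enable mask in the high half-word. */
#define HIWORD_UPDATE(val, mask, shift) \
	((uint32_t)(val) << (shift) | (uint32_t)(mask) << ((shift) + 16))

/* Values taken from the patch above. */
#define PHYCTRL_REN_STRB_DISABLE	0x0
#define PHYCTRL_REN_STRB_ENABLE		0x1
#define PHYCTRL_REN_STRB_MASK		0x1
#define PHYCTRL_REN_STRB_SHIFT		0x9

/* Hypothetical helper: the word that would be written to GRF_EMMCPHY_CON2. */
static uint32_t strobe_pulldown_word(unsigned int enable)
{
	return HIWORD_UPDATE(enable, PHYCTRL_REN_STRB_MASK,
			     PHYCTRL_REN_STRB_SHIFT);
}
```

Under that assumption, enabling the pull-down writes 0x02000200 (bit 9 set, plus its write-enable bit 25), and the default writes 0x02000000, which clears bit 9 and leaves every other field untouched.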



[PATCH v2 2/2] devicetree: phy: rockchip-emmc: pulldown property

2020-11-28 Thread Chris Ruehl
Update the documentation and add the bool property
enable-strobe-pulldown used to enable the internal pull-down for the
strobe line.

Signed-off-by: Chris Ruehl 
---
 Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt 
b/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt
index e728786f21e0..3e4d2d79a65d 100644
--- a/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt
+++ b/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt
@@ -16,6 +16,8 @@ Optional properties:
  - drive-impedance-ohm: Specifies the drive impedance in Ohm.
 Possible values are 33, 40, 50, 66 and 100.
 If not set, the default value of 50 will be applied.
+ - enable-strobe-pulldown: Enable internal pull-down for the strobe line.
+   If not set, pull-down is not used.
 
 Example:
 
-- 
2.20.1



Re: [PATCH] keys: remove trailing semicolon in macro definition

2020-11-28 Thread Joe Perches
On Sun, 2020-11-29 at 06:45 +0200, Jarkko Sakkinen wrote:
> On Fri, Nov 27, 2020 at 11:15:43AM -0800, t...@redhat.com wrote:
> > From: Tom Rix 
> > 
> > The macro use will already have a semicolon.
> > 
> > Signed-off-by: Tom Rix 
> 
> I'm in-between whether this is worth of merging. The commit message
> does not help with that decision too much.

It seems worthy of merging to me modulo whatever improvement is desired in
the commit message.

There are 3 existing uses of request_key_net.  All have a trailing semicolon.
There is 1 existing use of request_key_net_rcu.  It has a trailing semicolon.

No object change should occur.
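
To illustrate why the definition should not carry its own semicolon, here is a small standalone sketch. lookup(), LOOKUP_NEXT() and pick() are made-up names for illustration only; they are not the request_key_net macros.

```c
/* Hypothetical helper standing in for the function the macro wraps. */
static int lookup(int id)
{
	return id * 2;
}

/* Defined without a trailing semicolon, so it expands to a plain expression. */
#define LOOKUP_NEXT(id) lookup((id) + 1)

static int pick(int use_next, int id)
{
	int v;

	/*
	 * If the definition ended in ';', the first branch would expand to
	 * "v = lookup((id) + 1);;" and the extra empty statement would
	 * detach the else below, which fails to compile.
	 */
	if (use_next)
		v = LOOKUP_NEXT(id);
	else
		v = lookup(id);

	/* An expression macro can also sit inside a larger expression. */
	return v + LOOKUP_NEXT(0);
}
```

For callers that already write the semicolon at the end of a plain statement, dropping it from the definition only turns ";;" into ";", which matches the expectation that the object code stays the same.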




Re: [PATCH] scsi: ses: Fix crash caused by kfree an invalid pointer

2020-11-28 Thread Douglas Gilbert

On 2020-11-28 6:27 p.m., James Bottomley wrote:

On Sat, 2020-11-28 at 20:23 +0800, Ding Hui wrote:

We can get a crash when disconnecting the iSCSI session,
the call trace like this:

   [2a00fb70] kfree at 0830e224
   [2a00fba0] ses_intf_remove at 01f200e4
   [2a00fbd0] device_del at 086b6a98
   [2a00fc50] device_unregister at 086b6d58
   [2a00fc70] __scsi_remove_device at 0870608c
   [2a00fca0] scsi_remove_device at 08706134
   [2a00fcc0] __scsi_remove_target at 087062e4
   [2a00fd10] scsi_remove_target at 087064c0
   [2a00fd70] __iscsi_unbind_session at 01c872c4
   [2a00fdb0] process_one_work at 0810f35c
   [2a00fe00] worker_thread at 0810f648
   [2a00fe70] kthread at 08116e98

In ses_intf_add, components count could be 0, and kcalloc 0 size scomp,
but not saved in edev->component[i].scratch

In this situation, edev->component[0].scratch is an invalid pointer,
when kfree it in ses_intf_remove_enclosure, a crash like above would happen
The call trace also could be other random cases when kfree cannot catch
the invalid pointer

We should not use edev->component[] array when the components count is 0
We also need check index when use edev->component[] array in
ses_enclosure_data_process

Tested-by: Zeng Zhicong 
Cc: stable  # 2.6.25+
Signed-off-by: Ding Hui 


This doesn't really look to be the right thing to do: an enclosure
which has no component can't usefully be controlled by the driver since
there's nothing for it to do, so what we should do in this situation is
refuse to attach like the proposed patch below.

It does seem a bit odd that someone would build an enclosure that
doesn't enclose anything, so would you mind running

sg_ses -e


'-e' is the short form of '--enumerate'. That will report the names
and abbreviations of the diagnostic pages that the utility itself
knows about (and supports). It won't show anything specific about
the environment that sg_ses is executed in.

You probably meant:
  sg_ses 

Examples of the likely forms are:
  sg_ses /dev/bsg/1:0:0:0
  sg_ses /dev/sg2
  sg_ses /dev/ses0

This from a nearby machine:

$ lsscsi -gs
[3:0:0:0]  disk     ATA       Samsung SSD 850    1B6Q  /dev/sda      /dev/sg0   120GB
[4:0:0:0]  disk     IBM-207x  HUSMM8020ASS20     J4B6  /dev/sdc      /dev/sg2   200GB
[4:0:1:0]  disk     ATA       INTEL SSDSC2KW25   003C  /dev/sdd      /dev/sg3   256GB
[4:0:2:0]  disk     SEAGATE   ST1NM0096E005            /dev/sde      /dev/sg4  10.0TB
[4:0:3:0]  enclosu  Areca Te  ARC-802801.37.69   0137  -             /dev/sg5       -
[4:0:4:0]  enclosu  IntelRES2SV2400d00                 -             /dev/sg6       -
[7:0:0:0]  disk     Kingston  DataTravelerMini   PMAP  /dev/sdb      /dev/sg1  1.03GB
[N:0:0:1]  disk     WDC WDS256G1X0C-00ENX0__1          /dev/nvme0n1  -          256GB

# sg_ses /dev/sg5
  Areca Te  ARC-802801.37.69  0137
Supported diagnostic pages:
  Supported Diagnostic Pages [sdp] [0x0]
  Configuration (SES) [cf] [0x1]
  Enclosure Status/Control (SES) [ec,es] [0x2]
  String In/Out (SES) [str] [0x4]
  Threshold In/Out (SES) [th] [0x5]
  Element Descriptor (SES) [ed] [0x7]
  Additional Element Status (SES-2) [aes] [0xa]
  Supported SES Diagnostic Pages (SES-2) [ssp] [0xd]
  Download Microcode (SES-2) [dm] [0xe]
  Subenclosure Nickname (SES-2) [snic] [0xf]
  Protocol Specific (SAS transport) [] [0x3f]

# sg_ses -p cf /dev/sg5
  Areca Te  ARC-802801.37.69  0137
Configuration diagnostic page:
  number of secondary subenclosures: 0
  generation code: 0x0
  enclosure descriptor list
Subenclosure identifier: 0 [primary]
  relative ES process id: 1, number of ES processes: 1
  number of type descriptor headers: 9
  enclosure logical identifier (hex): d5b401503fc0ec16
  enclosure vendor: Areca Te  product: ARC-802801.37.69  rev: 0137
  vendor-specific data:
11 22 33 44 55 00 00 00 ."3DU...

  type descriptor header and text list
Element type: Array device slot, subenclosure id: 0
  number of possible elements: 24
  text: ArrayDevicesInSubEnclsr0
Element type: Enclosure, subenclosure id: 0
  number of possible elements: 1
  text: EnclosureElementInSubEnclsr0
Element type: SAS expander, subenclosure id: 0
  number of possible elements: 1
  text: SAS Expander
Element type: Cooling, subenclosure id: 0
  number of possible elements: 5
  text: CoolingElementInSubEnclsr0
Element type: Temperature sensor, subenclosure id: 0
  number of possible elements: 2
  text: TempSensorsInSubEnclsr0
Element type: Voltage sensor, subenclosure id: 0
  number of possible elements: 2
  text: VoltageSensorsInSubEnclsr0
Element type: SAS connector, subenclosure id: 0
  number of possible elements: 3
  text: ConnectorsInSubEnclsr0
Element type: Power supply, subenclosure id: 0
  number of possible 

Re: [PATCH] nvmet: Kconfig: Fix spelling mistake "incuding" -> "including"

2020-11-28 Thread Chaitanya Kulkarni
On 11/26/20 14:40, Colin King wrote:
> From: Colin Ian King 
>
> There is a spelling mistake in the Kconfig help text. Fix it.
>
> Signed-off-by: Colin Ian King 
Looks good.

Reviewed-by: Chaitanya Kulkarni 




[rcu:rcu/next] BUILD SUCCESS 5ca88db79d8d7d8fa645caa17173592ca22003b2

2020-11-28 Thread kernel test robot
tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git  rcu/next
branch HEAD: 5ca88db79d8d7d8fa645caa17173592ca22003b2  torture: Add 
--kcsan-kmake-arg to torture.sh for KCSAN

elapsed time: 722m

configs tested: 94
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm       defconfig
arm64     allyesconfig
arm64     defconfig
arm       allyesconfig
arm       allmodconfig
nds32     alldefconfig
arm       aspeed_g4_defconfig
powerpc   motionpro_defconfig
powerpc   powernv_defconfig
sh        r7780mp_defconfig
arm       prima2_defconfig
sh        landisk_defconfig
nios2     10m50_defconfig
ia64      generic_defconfig
m68k      m5275evb_defconfig
mips      loongson3_defconfig
mips      maltasmvp_defconfig
arm       omap2plus_defconfig
sh        r7785rp_defconfig
mips      capcella_defconfig
mips      nlm_xlp_defconfig
powerpc   pq2fads_defconfig
arm       shmobile_defconfig
arm       exynos_defconfig
powerpc   cell_defconfig
ia64      allmodconfig
ia64      defconfig
ia64      allyesconfig
m68k      allmodconfig
m68k      defconfig
m68k      allyesconfig
nds32     defconfig
nios2     allyesconfig
csky      defconfig
alpha     defconfig
alpha     allyesconfig
xtensa    allyesconfig
h8300     allyesconfig
arc       defconfig
sh        allmodconfig
parisc    defconfig
s390      allyesconfig
parisc    allyesconfig
s390      defconfig
i386      allyesconfig
sparc     allyesconfig
sparc     defconfig
i386      defconfig
nios2     defconfig
arc       allyesconfig
nds32     allnoconfig
c6x       allyesconfig
mips      allyesconfig
mips      allmodconfig
powerpc   allyesconfig
powerpc   allmodconfig
powerpc   allnoconfig
i386      randconfig-a004-20201129
i386      randconfig-a003-20201129
i386      randconfig-a002-20201129
i386      randconfig-a005-20201129
i386      randconfig-a001-20201129
i386      randconfig-a006-20201129
x86_64    randconfig-a015-20201129
x86_64    randconfig-a011-20201129
x86_64    randconfig-a016-20201129
x86_64    randconfig-a014-20201129
x86_64    randconfig-a012-20201129
x86_64    randconfig-a013-20201129
i386      randconfig-a012-20201129
i386      randconfig-a013-20201129
i386      randconfig-a011-20201129
i386      randconfig-a016-20201129
i386      randconfig-a014-20201129
i386      randconfig-a015-20201129
riscv     nommu_k210_defconfig
riscv     allyesconfig
riscv     nommu_virt_defconfig
riscv     allnoconfig
riscv     defconfig
riscv     rv32_defconfig
riscv     allmodconfig
x86_64    rhel
x86_64    allyesconfig
x86_64    rhel-7.6-kselftests
x86_64    defconfig
x86_64    rhel-8.3
x86_64    kexec

clang tested configs:
x86_64   randconfig-a003-20201129
x86_64   randconfig-a006-20201129
x86_64   randconfig-a004-20201129
x86_64   randconfig-a005-20201129
x86_64   randconfig-a002-20201129
x86_64   randconfig-a001-20201129

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH] keys: remove trailing semicolon in macro definition

2020-11-28 Thread Jarkko Sakkinen
On Fri, Nov 27, 2020 at 11:15:43AM -0800, t...@redhat.com wrote:
> From: Tom Rix 
> 
> The macro use will already have a semicolon.
> 
> Signed-off-by: Tom Rix 

I'm in-between whether this is worth of merging. The commit message
does not help with that decision too much.

/Jarkko


Re: [PATCH] mm/memcg: bail out early when !memcg in mem_cgroup_lruvec

2020-11-28 Thread Alex Shi



On 2020/11/28 at 12:02 PM, Andrew Morton wrote:
> On Fri, 27 Nov 2020 11:08:35 +0800 Alex Shi  
> wrote:
> 
>> Sometime, we use NULL memcg in mem_cgroup_lruvec(memcg, pgdat)
>> so we could get out early in the situation to avoid useless checking.
>>
>> Also warning if both parameter are NULL.
> 
> Why do you think a warning is needed here?

Uh, considering there has been no problem for a long time, the warning can be dropped.

> 
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -613,14 +613,13 @@ static inline struct lruvec *mem_cgroup_lruvec(struct 
>> mem_cgroup *memcg,
>>  struct mem_cgroup_per_node *mz;
>>  struct lruvec *lruvec;
>>  
>> -if (mem_cgroup_disabled()) {
>> +VM_WARN_ON_ONCE(!memcg && !pgdat);
>> +
>> +if (mem_cgroup_disabled() || !memcg) {
>>  lruvec = &pgdat->__lruvec;
>>  goto out;
>>  }
>>  
>> -if (!memcg)
>> -memcg = root_mem_cgroup;
>> -
> 
> This change isn't obviously equivalent, is it?

If !memcg, the root_mem_cgroup will still lead the lruvec to a pgdat
same as parameter.

> 
>>  mz = mem_cgroup_nodeinfo(memcg, pgdat->node_id);
>>  lruvec = &mz->lruvec;
>>  out:
> 
> And the resulting code is awkward:
> 
>   if (mem_cgroup_disabled() || !memcg) {
>   lruvec = &pgdat->__lruvec;
>   goto out;
>   }
> 
>   mz = mem_cgroup_nodeinfo(memcg, pgdat->node_id);
>   lruvec = &mz->lruvec;
> out:
> 
> 
> could be
> 
>   if (mem_cgroup_disabled() || !memcg) {
>   lruvec = >__lruvec;
>   lruvec = &pgdat->__lruvec;
>   mem_cgroup_per_node mz;
> 
>   mz = mem_cgroup_nodeinfo(memcg, pgdat->node_id);
>   lruvec = &mz->lruvec;
>   }
> 

Right, removing the 'goto' makes the code easier to understand.

So, is the following patch ok?

From 225f29e03b40a7cbaeb4e3bb76f8efbcd7d648a2 Mon Sep 17 00:00:00 2001
From: Alex Shi 
Date: Wed, 25 Nov 2020 14:06:33 +0800
Subject: [PATCH v2] mm/memcg: bail out early when !memcg in mem_cgroup_lruvec

Sometimes we pass a NULL memcg to mem_cgroup_lruvec(memcg, pgdat),
so we can bail out early in that situation to avoid useless checking.

Polished per Andrew Morton's suggestion.

Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: Johannes Weiner 
Cc: Shakeel Butt 
Cc: Roman Gushchin 
Cc: Lorenzo Stoakes 
Cc: Stephen Rothwell 
Cc: Alexander Duyck 
Cc: Yafang Shao 
Cc: Wei Yang 
Cc: linux-kernel@vger.kernel.org
---
 include/linux/memcontrol.h | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 3e6a1df3bdb9..4ff2ffe2b73d 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -610,20 +610,17 @@ mem_cgroup_nodeinfo(struct mem_cgroup *memcg, int nid)
 static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg,
   struct pglist_data *pgdat)
 {
-   struct mem_cgroup_per_node *mz;
struct lruvec *lruvec;
 
-   if (mem_cgroup_disabled()) {
+   if (mem_cgroup_disabled() || !memcg) {
lruvec = &pgdat->__lruvec;
-   goto out;
-   }
+   } else {
+   struct mem_cgroup_per_node *mz;
 
-   if (!memcg)
-   memcg = root_mem_cgroup;
+   mz = mem_cgroup_nodeinfo(memcg, pgdat->node_id);
+   lruvec = &mz->lruvec;
+   }
 
-   mz = mem_cgroup_nodeinfo(memcg, pgdat->node_id);
-   lruvec = >lruvec;
-out:
/*
 * Since a node can be onlined after the mem_cgroup was created,
 * we have to be prepared to initialize lruvec->pgdat here;
-- 
2.29.GIT



Re: [PATCH 1/5] PCI/DPC: Ignore devices with no AER Capability

2020-11-28 Thread Kuppuswamy, Sathyanarayanan




On 11/28/20 3:25 PM, Bjorn Helgaas wrote:

On Sat, Nov 28, 2020 at 01:56:23PM -0800, Kuppuswamy, Sathyanarayanan wrote:

On 11/28/20 1:53 PM, Bjorn Helgaas wrote:

On Sat, Nov 28, 2020 at 01:49:46PM -0800, Kuppuswamy, Sathyanarayanan wrote:

On 11/28/20 12:24 PM, Bjorn Helgaas wrote:

On Wed, Nov 25, 2020 at 06:01:57PM -0800, Kuppuswamy, Sathyanarayanan wrote:

On 11/25/20 5:18 PM, Bjorn Helgaas wrote:

From: Bjorn Helgaas 

Downstream Ports may support DPC regardless of whether they support AER
(see PCIe r5.0, sec 6.2.10.2).  Previously, if the user booted with
"pcie_ports=dpc-native", it was possible for dpc_probe() to succeed even if
the device had no AER Capability, but dpc_get_aer_uncorrect_severity()
depends on the AER Capability.

dpc_probe() previously failed if:

  !pcie_aer_is_native(pdev) && !pcie_ports_dpc_native
  !(pcie_aer_is_native() || pcie_ports_dpc_native)# by De Morgan's law

so it succeeded if:

  pcie_aer_is_native() || pcie_ports_dpc_native

Fail dpc_probe() if the device has no AER Capability.

Signed-off-by: Bjorn Helgaas 
Cc: Olof Johansson 
---
 drivers/pci/pcie/dpc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index e05aba86a317..ed0dbc43d018 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -287,6 +287,9 @@ static int dpc_probe(struct pcie_device *dev)
int status;
u16 ctl, cap;
+   if (!pdev->aer_cap)
+   return -ENOTSUPP;

Don't we check aer_cap support in drivers/pci/pcie/portdrv_core.c ?

We don't enable DPC service, if AER service is not enabled. And AER
service is only enabled if AER capability is supported.

So dpc_probe() should not happen if AER capability is not supported?


I don't think that's always true.  If I'm reading this right, we have
this:

 get_port_device_capability(...)
 {
 #ifdef CONFIG_PCIEAER
   if (dev->aer_cap && ...)
 services |= PCIE_PORT_SERVICE_AER;
 #endif

   if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
   pci_aer_available() &&
   (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
 services |= PCIE_PORT_SERVICE_DPC;
 }

and in the case where:

 - CONFIG_PCIEAER=y
 - booted with "pcie_ports=dpc-native" (pcie_ports_dpc_native is true)
 - "dev" has no AER capability
 - "dev" has DPC capability

I think we do enable PCIE_PORT_SERVICE_DPC.
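
The case above can be checked with a small user-space model. This is only a sketch under stated assumptions: struct port_model, the SVC_* bits and both helper functions are hypothetical stand-ins, not kernel code, and the "..." in the quoted AER condition is modeled by a single aer_native flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the port service bits; values are illustrative. */
#define SVC_AER 0x1u
#define SVC_DPC 0x2u

struct port_model {
	bool has_aer_cap;	/* models dev->aer_cap != 0 */
	bool has_dpc_cap;	/* models finding the DPC extended capability */
	bool aer_native;	/* assumption: stands in for the elided "..." */
};

/* Follows the shape of the quoted get_port_device_capability() logic. */
unsigned int model_services(const struct port_model *p,
			    bool aer_available, bool dpc_native)
{
	unsigned int services = 0;

	assert(p != NULL);

	if (p->has_aer_cap && p->aer_native)
		services |= SVC_AER;

	if (p->has_dpc_cap && aer_available &&
	    (dpc_native || (services & SVC_AER)))
		services |= SVC_DPC;

	return services;
}

/* Models dpc_probe() with the added check: no AER Capability, no DPC. */
bool model_dpc_probe_ok(const struct port_model *p, unsigned int services)
{
	assert(p != NULL);

	if (!(services & SVC_DPC))
		return false;

	return p->has_aer_cap;
}
```

With has_aer_cap false, has_dpc_cap true and dpc_native true, model_services() still sets the DPC bit, and only the check in model_dpc_probe_ok() rejects the device.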

Got it. But looking into it further, I am wondering whether
we should keep this dependency. Currently we use it only to
dump the error information. Do we need to create a dependency
between DPC and AER (which are functionally independent) just
to see more details about the error?


That's a good question, but I don't really want to get into the actual
operation of the AER and DPC drivers in this series, so maybe
something we should explore later.



In that case, can you move this check to
drivers/pci/pcie/portdrv_core.c?  I don't see the point of
distributed checks in both get_port_device_capability() and
dpc_probe().


I totally agree that these distributed checks are terrible, but my
long-term hope is to get rid of portdrv and handle these "services"
more like we handle other capabilities.  For example, maybe we can
squash dpc_probe() into pci_dpc_init(), so I'd actually like to move
things from get_port_device_capability() into dpc_probe().

Removing the service driver model will be a major overhaul. It would
affect even the error recovery drivers. You can find motivation
for service drivers in Documentation/PCI/pciebus-howto.rst.

But until we fix this part, I recommend grouping all dependency checks
in one place (either dpc_probe() or the portdrv service driver).




--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


Re: [PATCH AUTOSEL 5.9 22/33] vhost scsi: add lun parser helper

2020-11-28 Thread Sasha Levin

On Wed, Nov 25, 2020 at 07:08:54PM +0100, Paolo Bonzini wrote:

On 25/11/20 19:01, Sasha Levin wrote:

On Wed, Nov 25, 2020 at 06:48:21PM +0100, Paolo Bonzini wrote:

On 25/11/20 16:35, Sasha Levin wrote:

From: Mike Christie 

[ Upstream commit 18f1becb6948cd411fd01968a0a54af63732e73c ]

Move code to parse lun from req's lun_buf to helper, so tmf code
can use it in the next patch.

Signed-off-by: Mike Christie 
Reviewed-by: Paolo Bonzini 
Acked-by: Jason Wang 
Link: https://lore.kernel.org/r/1604986403-4931-5-git-send-email-michael.chris...@oracle.com

Signed-off-by: Michael S. Tsirkin 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Sasha Levin 


This doesn't seem like stable material, does it?


It went in as a dependency for efd838fec17b ("vhost scsi: Add support
for LUN resets."), which is the next patch.


Which doesn't seem to be suitable for stable either...  Patch 3/5 in 


Why not? It was sent as a fix to Linus.

the series might be (vhost scsi: fix cmd completion race), so I can 
understand including 1/5 and 2/5 just in case, but not the rest.  Does 
the bot not understand diffstats?


Not on their own, no. What's wrong with the diffstats?

--
Thanks,
Sasha


Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-11-28 Thread Andy Lutomirski
On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin  wrote:
>
> On big systems, the mm refcount can become highly contended when doing
> a lot of context switching with threaded applications (particularly
> switching between the idle thread and an application thread).
>
> Abandoning lazy tlb slows switching down quite a bit in the important
> user->idle->user cases, so instead implement a non-refcounted scheme
> that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
> any remaining lazy ones.
>
> Shootdown IPIs are some concern, but they have not been observed to be
> a big problem with this scheme (the powerpc implementation generated
> 314 additional interrupts on a 144 CPU system during a kernel compile).
> There are a number of strategies that could be employed to reduce IPIs
> if they turn out to be a problem for some workload.

I'm still wondering whether we can do even better.

The IPIs you're doing aren't really necessary -- we don't
fundamentally need to free the pagetables immediately when all
non-lazy users are done with them (and current kernels don't) -- what
we need to do is to synchronize all the bookkeeping.  So, with
adequate locking (famous last words), a couple of alternative schemes
ought to be possible.

a) Instead of sending an IPI, increment mm_count on behalf of the
remote CPU and do something to make sure that the remote CPU knows we
did this on its behalf.  Then free the mm when mm_count hits zero.

b) Treat mm_cpumask as part of the refcount.  Add one to mm_count when
an mm is created.  Once mm_users hits zero, whoever clears the last
bit in mm_cpumask is responsible for decrementing a single reference
from mm_count, and whoever sets it to zero frees the mm.

Version (b) seems fairly straightforward to implement -- add RCU
protection and an atomic_t special_ref_cleared (initially 0) to struct
mm_struct itself.  After anyone clears a bit in mm_cpumask (which is
already a barrier), they read mm_users.  If it's zero, then they scan
mm_cpumask and see if it's empty.  If it is, they atomically swap
special_ref_cleared to 1.  If it was zero before the swap, they do
mmdrop().  I can imagine some tweaks that could make this a bit
faster, at least in the limit of a huge number of CPUs.
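
A rough user-space model of version (b), using C11 atomics, might look like the following. Everything here is an assumption made for illustration: struct mm_model and its helpers are hypothetical, the cpumask is a single 64-bit word, RCU is left out, and the model also runs the check on the path that drops the last mm_users reference so that an already-empty mask is handled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the fields discussed above; not struct mm_struct. */
struct mm_model {
	atomic_int mm_users;		/* user references */
	atomic_int mm_count;		/* includes the one "special" reference */
	atomic_uint_fast64_t cpumask;	/* one bit per CPU, at most 64 CPUs here */
	atomic_int special_ref_cleared;	/* 0 until the special reference is dropped */
	bool freed;			/* stands in for actually freeing the mm */
};

void mm_model_init(struct mm_model *mm)
{
	atomic_init(&mm->mm_users, 1);
	atomic_init(&mm->mm_count, 1);	/* the special reference */
	atomic_init(&mm->cpumask, 0);
	atomic_init(&mm->special_ref_cleared, 0);
	mm->freed = false;
}

/* Models mmdrop(): whoever takes mm_count to zero frees the mm. */
static void mm_model_drop(struct mm_model *mm)
{
	if (atomic_fetch_sub(&mm->mm_count, 1) == 1)
		mm->freed = true;
}

/* Drop the special reference once, and only when nothing uses the mm. */
static void maybe_drop_special_ref(struct mm_model *mm)
{
	if (atomic_load(&mm->mm_users) != 0)
		return;
	if (atomic_load(&mm->cpumask) != 0)
		return;
	if (atomic_exchange(&mm->special_ref_cleared, 1) == 0)
		mm_model_drop(mm);
}

void mm_model_cpu_set(struct mm_model *mm, unsigned int cpu)
{
	assert(cpu < 64);
	atomic_fetch_or(&mm->cpumask, UINT64_C(1) << cpu);
}

void mm_model_cpu_clear(struct mm_model *mm, unsigned int cpu)
{
	assert(cpu < 64);
	atomic_fetch_and(&mm->cpumask, ~(UINT64_C(1) << cpu));
	maybe_drop_special_ref(mm);
}

void mm_model_put_user(struct mm_model *mm)
{
	if (atomic_fetch_sub(&mm->mm_users, 1) == 1)
		maybe_drop_special_ref(mm);
}
```

The atomic_exchange() on special_ref_cleared is what guarantees that two CPUs which both observe an empty mask drop the special reference only once.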

Version (a) seems a bit harder to reason about.  Maybe it could be
done like this.  Add a percpu variable mm_with_extra_count.  This
variable can be NULL, but it can also be an mm that has an extra
reference on behalf of the cpu in question.

__mmput scans mm_cpumask and, for each cpu in the mask, mmgrabs the mm
and cmpxchgs that cpu's mm_with_extra_count from NULL to mm.  If it
succeeds, then we win.  If it fails, further thought is required, and
maybe we have to send an IPI, although maybe some other cleverness is
possible.  Any time a CPU switches mms, it atomically swaps
mm_with_extra_count to NULL and mmdrops whatever the mm was.  (Maybe
it needs to check the mm isn't equal to the new mm, although it would
be quite bizarre for this to happen.)  Other than these mmgrab and
mmdrop calls, the mm switching code doesn't mmgrab or mmdrop at all.
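
The hand-off in version (a) can be sketched the same way. Again this is a hypothetical user-space model: struct mm_a, the fixed-size slot array and the helper names are made up for illustration, and the "further thought is required" case is handled here by simply undoing the grab and reporting failure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4

/* Hypothetical model of an mm with only its mm_count. */
struct mm_a {
	atomic_int mm_count;
};

/* Models the percpu mm_with_extra_count variable: one slot per CPU. */
static _Atomic(struct mm_a *) mm_with_extra_count[NR_CPUS_MODEL];

static void mmgrab_model(struct mm_a *mm)
{
	atomic_fetch_add(&mm->mm_count, 1);
}

static void mmdrop_model(struct mm_a *mm)
{
	atomic_fetch_sub(&mm->mm_count, 1);
}

/*
 * __mmput side: take a reference on behalf of @cpu and try to publish it.
 * Returns false when the slot is already occupied; the email leaves that
 * case open (IPI or something cleverer), so the model just undoes the grab.
 */
bool hand_ref_to_cpu(struct mm_a *mm, unsigned int cpu)
{
	struct mm_a *expected = NULL;

	assert(cpu < NR_CPUS_MODEL);

	mmgrab_model(mm);
	if (atomic_compare_exchange_strong(&mm_with_extra_count[cpu],
					   &expected, mm))
		return true;

	mmdrop_model(mm);
	return false;
}

/* Context-switch side: take whatever was parked in the slot and drop it. */
void cpu_switch_mm_model(unsigned int cpu)
{
	struct mm_a *old;

	assert(cpu < NR_CPUS_MODEL);

	old = atomic_exchange(&mm_with_extra_count[cpu], (struct mm_a *)NULL);
	if (old)
		mmdrop_model(old);
}
```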


Version (a) seems like it could have excellent performance.


*However*, I think we should consider whether we want to do something
even bigger first.  Even with any of these changes, we still need to
maintain mm_cpumask(), and that itself can be a scalability problem.
I wonder if we can solve this problem too.  Perhaps the switch_mm()
paths could only ever set mm_cpumask bits, and anyone who would send
an IPI because a bit is set in mm_cpumask would first check some
percpu variable (cpu_rq(cpu)->something?  an entirely new variable) to
see if the bit in mm_cpumask is spurious.  Or perhaps mm_cpumask could
be split up across multiple cachelines, one per node.

We should keep the recent lessons from Apple in mind, though: x86 is a
dinosaur.  The future of atomics is going to look a lot more like
ARM's LSE than x86's rather anemic set.  This means that mm_cpumask
operations won't need to be full barriers forever, and we might not
want to take the implied full barriers in set_bit() and clear_bit()
for granted.

--Andy


mapping.c:undefined reference to `phys_to_dma'

2020-11-28 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   aae5ab854e38151e69f261dbf0e3b7e396403178
commit: 5ceda74093a5c1c3f42a02b894df031f3bbc9af1 dma-direct: rename and cleanup __phys_to_dma
date:   3 months ago
config: mips-randconfig-r031-20201129 (attached as .config)
compiler: mips64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5ceda74093a5c1c3f42a02b894df031f3bbc9af1
        git remote add linus https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
        git fetch --no-tags linus master
        git checkout 5ceda74093a5c1c3f42a02b894df031f3bbc9af1
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=mips

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   mips64-linux-ld: kernel/dma/mapping.o: in function `dma_map_page_attrs':
>> mapping.c:(.text+0x10c): undefined reference to `phys_to_dma'
   mips64-linux-ld: kernel/dma/mapping.o: in function `dma_unmap_page_attrs':
   mapping.c:(.text+0x23c): undefined reference to `dma_to_phys'
   mips64-linux-ld: mapping.c:(.text+0x274): undefined reference to `dma_to_phys'
   mips64-linux-ld: kernel/dma/mapping.o: in function `dma_sync_single_for_cpu':
   mapping.c:(.text+0x3d4): undefined reference to `dma_to_phys'
   mips64-linux-ld: kernel/dma/direct.o: in function `dma_direct_get_required_mask':
>> direct.c:(.text+0xe4): undefined reference to `phys_to_dma'
   mips64-linux-ld: kernel/dma/direct.o: in function `dma_direct_alloc':
   direct.c:(.text+0x210): undefined reference to `dma_to_phys'
>> mips64-linux-ld: direct.c:(.text+0x2d8): undefined reference to `phys_to_dma'
   mips64-linux-ld: direct.c:(.text+0x354): undefined reference to `phys_to_dma'
   mips64-linux-ld: direct.c:(.text+0x42c): undefined reference to `phys_to_dma'
   mips64-linux-ld: direct.c:(.text+0x4d8): undefined reference to `dma_to_phys'
   mips64-linux-ld: kernel/dma/direct.o: in function `dma_direct_free':
   direct.c:(.text+0x5d8): undefined reference to `dma_to_phys'
   mips64-linux-ld: kernel/dma/direct.o: in function `dma_direct_map_sg':
   direct.c:(.text+0x764): undefined reference to `phys_to_dma'
   mips64-linux-ld: kernel/dma/direct.o: in function `dma_direct_get_sgtable':
   direct.c:(.text+0xa00): undefined reference to `dma_to_phys'
   mips64-linux-ld: kernel/dma/direct.o: in function `dma_direct_mmap':
   direct.c:(.text+0xb10): undefined reference to `dma_to_phys'
   mips64-linux-ld: kernel/dma/direct.o: in function `dma_direct_supported':
   direct.c:(.text+0xbf8): undefined reference to `phys_to_dma'
   mips64-linux-ld: kernel/dma/direct.o: in function `dma_direct_need_sync':
   direct.c:(.text+0xc28): undefined reference to `dma_to_phys'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: [PATCH 0/9] keys: Miscellaneous fixes

2020-11-28 Thread Jarkko Sakkinen
On Fri, Nov 27, 2020 at 04:45:24PM +, David Howells wrote:
> 
> Hi Jarkko,
> 
> I've collected together a bunch of minor keyrings fixes, but I'm not sure
> there's anything that can't wait for the next merge window.
> 
> The patches can be found on the following branch:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=keys-fixes
> 
> David


I think that looks good, thank you. I'm sending a PR next week. Should I
bundle those into it?

/Jarkko


> ---
> Alexander A. Klimov (1):
>   encrypted-keys: Replace HTTP links with HTTPS ones
> 
> Denis Efremov (1):
>   security/keys: use kvfree_sensitive()
> 
> Gabriel Krisman Bertazi (1):
>   watch_queue: Drop references to /dev/watch_queue
> 
> Gustavo A. R. Silva (1):
>   security: keys: Fix fall-through warnings for Clang
> 
> Jann Horn (1):
>   keys: Remove outdated __user annotations
> 
> Krzysztof Kozlowski (1):
>   KEYS: asymmetric: Fix kerneldoc
> 
> Randy Dunlap (2):
>   security: keys: delete repeated words in comments
>   crypto: asymmetric_keys: fix some comments in pkcs7_parser.h
> 
> Tom Rix (1):
>   KEYS: remove redundant memset
> 
> 
>  Documentation/security/keys/core.rst |  4 ++--
>  crypto/asymmetric_keys/asymmetric_type.c |  6 --
>  crypto/asymmetric_keys/pkcs7_parser.h|  5 ++---
>  include/keys/encrypted-type.h|  2 +-
>  samples/Kconfig  |  2 +-
>  samples/watch_queue/watch_test.c |  2 +-
>  security/keys/Kconfig|  8 
>  security/keys/big_key.c  |  9 +++--
>  security/keys/keyctl.c   |  2 +-
>  security/keys/keyctl_pkey.c  |  2 --
>  security/keys/keyring.c  | 10 +-
>  11 files changed, 24 insertions(+), 28 deletions(-)
> 
> 
> 


Re: [PATCH v3] char: tpm: add i2c driver for cr50

2020-11-28 Thread Jarkko Sakkinen
On Fri, Nov 27, 2020 at 01:01:09PM +0200, Adrian Ratiu wrote:
> From: "dlau...@chromium.org" 
> 
> Add TPM 2.0 compatible I2C interface for chips with cr50 firmware.
> 
> The firmware running on the currently supported H1 MCU requires a
> special driver to handle its specific protocol, and this makes it
> unsuitable to use tpm_tis_core_* and instead it must implement the
> underlying TPM protocol similar to the other I2C TPM drivers.
> 
> - All 4 bytes of status register must be read/written at once.
> - FIFO and burst count is limited to 63 and must be drained by AP.
> - Provides an interrupt to indicate when read response data is ready
> and when the TPM is finished processing write data.
> 
> This driver is based on the existing infineon I2C TPM driver, which
> most closely matches the cr50 i2c protocol behavior.
> 
> Cc: Helen Koike 
> Cc: Jarkko Sakkinen 
> Cc: Ezequiel Garcia 
> Signed-off-by: Duncan Laurie 
> [swb...@chromium.org: Depend on i2c even if it's a module, replace
> boilerplate with SPDX tag, drop asm/byteorder.h include, simplify
> return from probe]
> Signed-off-by: Stephen Boyd 
> Signed-off-by: Fabien Lahoudere 
> Signed-off-by: Adrian Ratiu 
> ---
> Changes in v3:
>   - Misc small fixes (typos/renamings, comments, default values)
>   - Moved i2c_write memcpy before lock to minimize critical section (Helen)
>   - Dropped priv->locality because it stored a constant value (Helen)
>   - Many kdoc, function name and style fixes in general (Jarkko)
>   - Kept the force release enum instead of defines or bool (Ezequiel)
> 
> Changes in v2:
>   - Various small fixes all over (reorder includes, MAX_BUFSIZE, comments, etc)
>   - Reworked return values of i2c_wait_tpm_ready() to fix timeout mis-handling
> so ret == 0 now means success, the wait period jiffies is ignored because that
> number is meaningless and return a proper timeout error in case jiffies == 0.
>   - Make i2c default to 1 message per transfer (requested by Helen)
>   - Move -EIO error reporting to transfer function to cleanup transfer() itself
> and its R/W callers
>   - Remove magic value hardcodings and introduce enum force_release.
> 
> Applies on next-20201127, tested on Chromebook EVE.
> ---
>  drivers/char/tpm/Kconfig|  10 +
>  drivers/char/tpm/Makefile   |   2 +
>  drivers/char/tpm/tpm_tis_i2c_cr50.c | 770 
>  3 files changed, 782 insertions(+)
>  create mode 100644 drivers/char/tpm/tpm_tis_i2c_cr50.c
> 
> diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig
> index a18c314da211..4308f9ca7a43 100644
> --- a/drivers/char/tpm/Kconfig
> +++ b/drivers/char/tpm/Kconfig
> @@ -86,6 +86,16 @@ config TCG_TIS_SYNQUACER
> To compile this driver as a module, choose  M here;
> the module will be called tpm_tis_synquacer.
>  
> +config TCG_TIS_I2C_CR50
> + tristate "TPM Interface Specification 2.0 Interface (I2C - CR50)"
> + depends on I2C
> + select TCG_CR50
> + help
> +   This is a driver for the Google cr50 I2C TPM interface which is a
> +   custom microcontroller and requires a custom i2c protocol interface
> +   to handle the limitations of the hardware.  To compile this driver
> +   as a module, choose M here; the module will be called tcg_tis_i2c_cr50.
> +
>  config TCG_TIS_I2C_ATMEL
>   tristate "TPM Interface Specification 1.2 Interface (I2C - Atmel)"
>   depends on I2C
> diff --git a/drivers/char/tpm/Makefile b/drivers/char/tpm/Makefile
> index 84db4fb3a9c9..66d39ea6bd10 100644
> --- a/drivers/char/tpm/Makefile
> +++ b/drivers/char/tpm/Makefile
> @@ -27,6 +27,8 @@ obj-$(CONFIG_TCG_TIS_SPI) += tpm_tis_spi.o
>  tpm_tis_spi-y := tpm_tis_spi_main.o
>  tpm_tis_spi-$(CONFIG_TCG_TIS_SPI_CR50) += tpm_tis_spi_cr50.o
>  
> +obj-$(CONFIG_TCG_TIS_I2C_CR50) += tpm_tis_i2c_cr50.o
> +
>  obj-$(CONFIG_TCG_TIS_I2C_ATMEL) += tpm_i2c_atmel.o
>  obj-$(CONFIG_TCG_TIS_I2C_INFINEON) += tpm_i2c_infineon.o
>  obj-$(CONFIG_TCG_TIS_I2C_NUVOTON) += tpm_i2c_nuvoton.o
> diff --git a/drivers/char/tpm/tpm_tis_i2c_cr50.c b/drivers/char/tpm/tpm_tis_i2c_cr50.c
> new file mode 100644
> index ..896bf0163150
> --- /dev/null
> +++ b/drivers/char/tpm/tpm_tis_i2c_cr50.c
> @@ -0,0 +1,770 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2016 Google Inc.
> + *
> + * Based on Linux Kernel TPM driver by
> + * Peter Huewe 
> + * Copyright (C) 2011 Infineon Technologies
> + *
> + * cr50 is a firmware for H1 secure modules that requires special
> + * handling for the I2C interface.
> + *
> + * - Use an interrupt for transaction status instead of hardcoded delays.
> + * - Must use write+wait+read protocol.
> + * - All 4 bytes of status register must be read/written at once.
> + * - Burst count max is 63 bytes, and burst count behaves slightly differently
> + *   than other I2C TPMs.
> + * - When reading from FIFO the full burstcnt must be read instead of just
> + *   reading header and determining the 

Re: [PATCH 2/2] tools/memory-model: Fix typo in klitmus7 compatibility table

2020-11-28 Thread Paul E. McKenney
On Sat, Nov 28, 2020 at 03:01:49PM +0900, Akira Yokosawa wrote:
> >From 4f577823fa60e14ae58caa2d3c0b2ced64e6eb43 Mon Sep 17 00:00:00 2001
> From: Akira Yokosawa 
> Date: Sat, 28 Nov 2020 14:32:15 +0900
> Subject: [PATCH 2/2] tools/memory-model: Fix typo in klitmus7 compatibility table
> 
> klitmus7 of herdtools7 7.48 or earlier depends on ACCESS_ONCE(),
> which was removed in Linux v4.15.
> Fix the obvious typo in the table.
> 
> Fixes: d075a78a5ab1 ("tools/memory-model/README: Expand dependency of klitmus7")
> Signed-off-by: Akira Yokosawa 

Both queued for review and further testing, thank you!

Thanx, Paul

> ---
>  tools/memory-model/README | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/memory-model/README b/tools/memory-model/README
> index 39d08d1f0443..9a84c45504ab 100644
> --- a/tools/memory-model/README
> +++ b/tools/memory-model/README
> @@ -51,7 +51,7 @@ klitmus7 Compatibility Table
>       ============  ==========
>       target Linux  herdtools7
>       ------------  ----------
> -          -- 4.18  7.48 --
> +          -- 4.14  7.48 --
>       4.15 -- 4.19  7.49 --
>       4.20 -- 5.5   7.54 --
>       5.6  --       7.56 --
> -- 
> 2.17.1
> 
> 
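
For reference, the corrected table can be read as a lookup from target kernel version to the oldest herdtools7 release listed for it. The sketch below is illustrative only: klitmus_row, klitmus_table and min_herdtools7_for() are hypothetical names, and it assumes "7.48 --" means "7.48 or later".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the compatibility table, keyed by the first kernel it covers. */
struct klitmus_row {
	int first_major;
	int first_minor;
	const char *min_herdtools7;
};

/* Rows after the fix: -- 4.14, 4.15 -- 4.19, 4.20 -- 5.5, 5.6 -- */
static const struct klitmus_row klitmus_table[] = {
	{ 0, 0,  "7.48" },
	{ 4, 15, "7.49" },
	{ 4, 20, "7.54" },
	{ 5, 6,  "7.56" },
};

/* Return the entry of the last row whose starting version is <= the target. */
const char *min_herdtools7_for(int major, int minor)
{
	const char *best = NULL;
	size_t i;

	assert(major >= 0 && minor >= 0);

	for (i = 0; i < sizeof(klitmus_table) / sizeof(klitmus_table[0]); i++) {
		const struct klitmus_row *r = &klitmus_table[i];

		if (major > r->first_major ||
		    (major == r->first_major && minor >= r->first_minor))
			best = r->min_herdtools7;
	}

	return best;
}
```

With the typo fixed, a v4.15 target maps to 7.49 rather than 7.48, which matches the commit message: ACCESS_ONCE() is gone from v4.15 onwards.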


Re: [PATCH v2] char: tpm: add i2c driver for cr50

2020-11-28 Thread Jarkko Sakkinen
On Thu, Nov 26, 2020 at 03:19:24AM -0300, Ezequiel Garcia wrote:
> On Thu, 2020-11-26 at 05:30 +0200, Jarkko Sakkinen wrote:
> > On Tue, 2020-11-24 at 10:14 -0300, Ezequiel Garcia wrote:
> > > Hi Jarkko,
> > > 
> > > Thanks for your review.
> > > 
> > > On Tue, 2020-11-24 at 00:06 +0200, Jarkko Sakkinen wrote:
> > > > On Fri, Nov 20, 2020 at 07:23:45PM +0200, Adrian Ratiu wrote:
> > > > > From: "dlau...@chromium.org" 
> > > > > 
> > > > > Add TPM 2.0 compatible I2C interface for chips with cr50
> > > > > firmware.
> > > > > 
> > > > > The firmware running on the currently supported H1 MCU requires a
> > > > > special driver to handle its specific protocol, and this makes it
> > > > > unsuitable to use tpm_tis_core_* and instead it must implement
> > > > > the
> > > > > underlying TPM protocol similar to the other I2C TPM drivers.
> > > > > 
> > > > > > - All 4 bytes of status register must be read/written at once.
> > > > > - FIFO and burst count is limited to 63 and must be drained by
> > > > > AP.
> > > > > - Provides an interrupt to indicate when read response data is
> > > > > ready
> > > > > and when the TPM is finished processing write data.
> > > > > 
> > > > > This driver is based on the existing infineon I2C TPM driver,
> > > > > which
> > > > > most closely matches the cr50 i2c protocol behavior.
> > > > > 
> > > > > Cc: Helen Koike 
> > > > > Signed-off-by: Duncan Laurie 
> > > > > > [swb...@chromium.org: Depend on i2c even if it's a module,
> > > > > > replace boilerplate with SPDX tag, drop asm/byteorder.h
> > > > > > include, simplify return from probe]
> > > > > Signed-off-by: Stephen Boyd 
> > > > > Signed-off-by: Fabien Lahoudere 
> > > > > Signed-off-by: Adrian Ratiu 
> > > > > ---
> > > > > Changes in v2:
> > > > >   - Various small fixes all over (reorder includes, MAX_BUFSIZE,
> > > > > comments, etc)
> > > > >   - Reworked return values of i2c_wait_tpm_ready() to fix timeout
> > > > > mis-handling
> > > > > so ret == 0 now means success, the wait period jiffies is ignored
> > > > > because that
> > > > > number is meaningless and return a proper timeout error in case
> > > > > jiffies == 0.
> > > > >   - Make i2c default to 1 message per transfer (requested by
> > > > > Helen)
> > > > >   - Move -EIO error reporting to transfer function to cleanup
> > > > > transfer() itself
> > > > > and its R/W callers
> > > > >   - Remove magic value hardcodings and introduce enum
> > > > > force_release.
> > > > > 
> > > > > v1 posted at https://lkml.org/lkml/2020/2/25/349
> > > > > 
> > > > > Applies on next-20201120, tested on Chromebook EVE.
> > > > > ---
> > > > >  drivers/char/tpm/Kconfig|  10 +
> > > > >  drivers/char/tpm/Makefile   |   2 +
> > > > >  drivers/char/tpm/tpm_tis_i2c_cr50.c | 768
> > > > > 
> > > > >  3 files changed, 780 insertions(+)
> > > > >  create mode 100644 drivers/char/tpm/tpm_tis_i2c_cr50.c
> > > > > 
> > > > > diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig
> > > > > index a18c314da211..4308f9ca7a43 100644
> > > > > --- a/drivers/char/tpm/Kconfig
> > > > > +++ b/drivers/char/tpm/Kconfig
> > > > > @@ -86,6 +86,16 @@ config TCG_TIS_SYNQUACER
> > > > >   To compile this driver as a module, choose  M here;
> > > > >   the module will be called tpm_tis_synquacer.
> > > > >  
> > > > > +config TCG_TIS_I2C_CR50
> > > > > +   tristate "TPM Interface Specification 2.0 Interface (I2C
> > > > > - CR50)"
> > > > > +   depends on I2C
> > > > > +   select TCG_CR50
> > > > > +   help
> > > > > + This is a driver for the Google cr50 I2C TPM interface
> > > > > which is a
> > > > > + custom microcontroller and requires a custom i2c
> > > > > protocol interface
> > > > > + to handle the limitations of the hardware.  To compile
> > > > > this driver
> > > > > + as a module, choose M here; the module will be called
> > > > > tcg_tis_i2c_cr50.
> > > > > +
> > > > >  config TCG_TIS_I2C_ATMEL
> > > > > tristate "TPM Interface Specification 1.2 Interface (I2C
> > > > > - Atmel)"
> > > > > depends on I2C
> > > > > diff --git a/drivers/char/tpm/Makefile
> > > > > b/drivers/char/tpm/Makefile
> > > > > index 84db4fb3a9c9..66d39ea6bd10 100644
> > > > > --- a/drivers/char/tpm/Makefile
> > > > > +++ b/drivers/char/tpm/Makefile
> > > > > @@ -27,6 +27,8 @@ obj-$(CONFIG_TCG_TIS_SPI) += tpm_tis_spi.o
> > > > >  tpm_tis_spi-y := tpm_tis_spi_main.o
> > > > >  tpm_tis_spi-$(CONFIG_TCG_TIS_SPI_CR50) += tpm_tis_spi_cr50.o
> > > > >  
> > > > > +obj-$(CONFIG_TCG_TIS_I2C_CR50) += tpm_tis_i2c_cr50.o
> > > > > +
> > > > >  obj-$(CONFIG_TCG_TIS_I2C_ATMEL) += tpm_i2c_atmel.o
> > > > >  obj-$(CONFIG_TCG_TIS_I2C_INFINEON) += tpm_i2c_infineon.o
> > > > >  obj-$(CONFIG_TCG_TIS_I2C_NUVOTON) += tpm_i2c_nuvoton.o
> > > > > diff --git a/drivers/char/tpm/tpm_tis_i2c_cr50.c
> > > > > b/drivers/char/tpm/tpm_tis_i2c_cr50.c
> > > > > new file mode 100644
> > > > > index 

Re: [PATCH] x86/signals: Fix save/restore signal stack to correctly support sigset_t

2020-11-28 Thread Al Viro
On Sat, Nov 28, 2020 at 06:19:31PM -0800, Walt Drummond wrote:
> Thanks Al.  I want to understand the nuance, so please bear with me as I
> reason this out.   The cast in stone nature of this is due to both the need
> to keep userspace and kernel space in sync (ie, you'd have to coordinate
> libc and kernel changes super tightly to pull this off), and any change in
> the size of struct rt_sigframe would break backwards compatibility with
> older binaries, is that correct?

Pretty much so.  I would expect gdb and friends to be very unhappy about
that, for starters, along with a bunch of fun stuff like JVM, etc.

Ask the userland folks (libc, gdb, etc.) how they would feel about such
changes.  I'm fairly sure that it's _not_ going to be a matter of
changing _NSIG, rebuilding the kernel and living happily ever after.
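
To illustrate why the layout is effectively ABI, here is a toy model. It is not the real struct rt_sigframe: the model_* types and the field order are hypothetical, chosen only to show that widening the signal mask changes both the frame size and the offset of every field placed after the mask.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of unsigned long words needed to hold nsig signal bits. */
#define MODEL_NSIG_WORDS(nsig) ((nsig) / (sizeof(unsigned long) * CHAR_BIT))

struct model_sigset_64  { unsigned long sig[MODEL_NSIG_WORDS(64)]; };
struct model_sigset_128 { unsigned long sig[MODEL_NSIG_WORDS(128)]; };

static_assert(sizeof(struct model_sigset_128) ==
	      2 * sizeof(struct model_sigset_64),
	      "doubling the signal count doubles the mask size");

/* Hypothetical frame: some header, the mask, then data that follows it. */
struct model_frame_64 {
	void *pretcode;
	struct model_sigset_64 mask;
	unsigned char info[16];
};

struct model_frame_128 {
	void *pretcode;
	struct model_sigset_128 mask;
	unsigned char info[16];
};

/* How much the frame grows when the mask is widened. */
size_t model_frame_growth(void)
{
	return sizeof(struct model_frame_128) - sizeof(struct model_frame_64);
}

/* How far the field after the mask moves. */
size_t model_info_offset_shift(void)
{
	return offsetof(struct model_frame_128, info) -
	       offsetof(struct model_frame_64, info);
}
```

Anything compiled against the narrower layout (a debugger unwinding through a signal frame, for example) would look for info at the old offset.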


Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

2020-11-28 Thread Jarkko Sakkinen
On Tue, Nov 24, 2020 at 10:45:01PM +0100, Hans de Goede wrote:
> Hi,
> 
> On 11/24/20 6:52 PM, Jerry Snitselaar wrote:
> > 
> > Jarkko Sakkinen @ 2020-11-23 20:26 MST:
> > 
> >> On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote:
> >>>
> >>> Matthew Garrett @ 2020-10-15 15:39 MST:
> >>>
>  On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar  
>  wrote:
> >
> > There is a misconfiguration in the bios of the gpio pin used for the
> > interrupt in the T490s. When interrupts are enabled in the tpm_tis
> > driver code this results in an interrupt storm. This was initially
> > reported when we attempted to enable the interrupt code in the tpm_tis
> > driver, which previously wasn't setting a flag to enable it. Due to
> > the reports of the interrupt storm that code was reverted and we went 
> > back
> > to polling instead of using interrupts. Now that we know the T490s 
> > problem
> > is a firmware issue, add code to check if the system is a T490s and
> > disable interrupts if that is the case. This will allow us to enable
> > interrupts for everyone else. If the user has a fixed bios they can
> > force the enabling of interrupts with tpm_tis.interrupts=1 on the
> > kernel command line.
> 
>  I think an implication of this is that systems haven't been
>  well-tested with interrupts enabled. In general when we've found a
>  firmware issue in one place it ends up happening elsewhere as well, so
>  it wouldn't surprise me if there are other machines that will also be
>  unhappy with interrupts enabled. Would it be possible to automatically
>  detect this case (eg, if we get more than a certain number of
>  interrupts in a certain timeframe immediately after enabling the
>  interrupt) and automatically fall back to polling in that case? It
>  would also mean that users with fixed firmware wouldn't need to pass a
>  parameter.
> >>>
> >>> I believe Matthew is correct here. I found another system today
> >>> with completely different vendor for both the system and the tpm chip.
> >>> In addition another Lenovo model, the L490, has the issue.
> >>>
> >>> This initial attempt at a solution like Matthew suggested works on
> >>> the system I found today, but I imagine it is all sorts of wrong.
> >>> In the 2 systems where I've seen it, there are about 10 interrupts
> >>> in around 1.5 seconds, and then the irq code shuts down the interrupt
> >>> because they aren't being handled.
> >>>
> >>>
> >>> diff --git a/drivers/char/tpm/tpm_tis_core.c 
> >>> b/drivers/char/tpm/tpm_tis_core.c
> >>> index 49ae09ac604f..478e9d02a3fa 100644
> >>> --- a/drivers/char/tpm/tpm_tis_core.c
> >>> +++ b/drivers/char/tpm/tpm_tis_core.c
> >>> @@ -27,6 +27,11 @@
> >>>  #include "tpm.h"
> >>>  #include "tpm_tis_core.h"
> >>>
> >>> +static unsigned int time_start = 0;
> >>> +static bool storm_check = true;
> >>> +static bool storm_killed = false;
> >>> +static u32 irqs_fired = 0;
> >>
> >> Maybe kstat_irqs() would be a better idea than ad hoc stats.
> >>
> > 
> > Thanks, yes that would be better.
> > 
> >>> +
> >>>  static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
> >>>
> >>>  static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
> >>> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, 
> >>> const u8 *buf, size_t len)
> >>> return rc;
> >>>  }
> >>>
> >>> -static void disable_interrupts(struct tpm_chip *chip)
> >>> +static void __disable_interrupts(struct tpm_chip *chip)
> >>>  {
> >>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >>> u32 intmask;
> >>> int rc;
> >>>
> >>> -   if (priv->irq == 0)
> >>> -   return;
> >>> -
> >>> rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
> >>> if (rc < 0)
> >>> intmask = 0;
> >>>
> >>> intmask &= ~TPM_GLOBAL_INT_ENABLE;
> >>> rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), 
> >>> intmask);
> >>> +   chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> >>> +}
> >>> +
> >>> +static void disable_interrupts(struct tpm_chip *chip)
> >>> +{
> >>> +   struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >>>
> >>> +   if (priv->irq == 0)
> >>> +   return;
> >>> +
> >>> +   __disable_interrupts(chip);
> >>> devm_free_irq(chip->dev.parent, priv->irq, chip);
> >>> priv->irq = 0;
> >>> -   chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> >>>  }
> >>>
> >>>  /*
> >>> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 
> >>> *buf, size_t len)
> >>> int rc, irq;
> >>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >>>
> >>> +   if (unlikely(storm_killed)) {
> >>> +   devm_free_irq(chip->dev.parent, priv->irq, chip);
> >>> +   priv->irq = 0;
> >>> +   storm_killed = false;
> >>> +   }
> >>
> >> OK this 

Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

2020-11-28 Thread Jarkko Sakkinen
On Tue, Nov 24, 2020 at 10:10:21AM -0800, James Bottomley wrote:
> On Tue, 2020-11-24 at 10:52 -0700, Jerry Snitselaar wrote:
> > Before diving further into that though, does anyone else have an
> > opinion on ripping out the irq code, and just using polling? We've
> > been only polling since 2015 anyways.
> 
> Well only a biased one, obviously: polling causes large amounts of busy
> waiting, which is a waste of CPU resources and does increase the time
> it takes us to do TPM operations ... not a concern if you're doing long
> computation ones, like signatures, but it is a problem for short
> operations like bulk updates of PCRs.  The other potential issue, as we
> saw with atmel is that if you prod the chip too often (which you have
> to do with polling) you risk upsetting it.  We've spent ages trying to
> tune the polling parameters to balance reduction of busy wait with chip
> upset and still, apparently, not quite got it right.  If the TPM has a
> functioning IRQ then it gets us out of the whole polling mess entirely.
> The big question is how many chips that report an IRQ actually have a
> malfunctioning one?
> 
> James

Do we have a way to know whether the Windows TPM code is using IRQs?

/Jarkko
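
The fallback idea raised earlier in this thread (count interrupts in a short window right after enabling them, and switch to polling if there are too many) could be sketched roughly as follows. This is a user-space model, not driver code: struct storm_detector, its fields and the thresholds are all hypothetical, and a real implementation would more likely build on the IRQ core's statistics than on ad hoc counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-chip state for detecting an interrupt storm. */
struct storm_detector {
	uint64_t window_start_ms;	/* start of the current counting window */
	uint32_t window_ms;		/* length of the window */
	uint32_t max_irqs;		/* interrupts tolerated per window */
	uint32_t irqs_in_window;	/* interrupts seen in the current window */
	bool polling;			/* set once we have fallen back to polling */
};

void storm_detector_init(struct storm_detector *d, uint64_t now_ms,
			 uint32_t window_ms, uint32_t max_irqs)
{
	assert(window_ms > 0);

	d->window_start_ms = now_ms;
	d->window_ms = window_ms;
	d->max_irqs = max_irqs;
	d->irqs_in_window = 0;
	d->polling = false;
}

/*
 * Call once per interrupt.  Returns true when the caller should disable
 * the interrupt and use polling from now on.
 */
bool storm_detector_irq(struct storm_detector *d, uint64_t now_ms)
{
	if (d->polling)
		return true;

	/* Start a fresh window once the current one has expired. */
	if (now_ms - d->window_start_ms >= d->window_ms) {
		d->window_start_ms = now_ms;
		d->irqs_in_window = 0;
	}

	if (++d->irqs_in_window > d->max_irqs)
		d->polling = true;

	return d->polling;
}
```

Because the decision is made from observed behaviour, a machine with fixed firmware would keep its interrupt without needing a command-line parameter.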


Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

2020-11-28 Thread Jarkko Sakkinen
On Tue, Nov 24, 2020 at 10:52:56AM -0700, Jerry Snitselaar wrote:
> 
> Jarkko Sakkinen @ 2020-11-23 20:26 MST:
> 
> > On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote:
> >> 
> >> Matthew Garrett @ 2020-10-15 15:39 MST:
> >> 
> >> > On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar  
> >> > wrote:
> >> >>
> >> >> There is a misconfiguration in the bios of the gpio pin used for the
> >> >> interrupt in the T490s. When interrupts are enabled in the tpm_tis
> >> >> driver code this results in an interrupt storm. This was initially
> >> >> reported when we attempted to enable the interrupt code in the tpm_tis
> >> >> driver, which previously wasn't setting a flag to enable it. Due to
> >> >> the reports of the interrupt storm that code was reverted and we went 
> >> >> back
> >> >> to polling instead of using interrupts. Now that we know the T490s 
> >> >> problem
> >> >> is a firmware issue, add code to check if the system is a T490s and
> >> >> disable interrupts if that is the case. This will allow us to enable
> >> >> interrupts for everyone else. If the user has a fixed bios they can
> >> >> force the enabling of interrupts with tpm_tis.interrupts=1 on the
> >> >> kernel command line.
> >> >
> >> > I think an implication of this is that systems haven't been
> >> > well-tested with interrupts enabled. In general when we've found a
> >> > firmware issue in one place it ends up happening elsewhere as well, so
> >> > it wouldn't surprise me if there are other machines that will also be
> >> > unhappy with interrupts enabled. Would it be possible to automatically
> >> > detect this case (eg, if we get more than a certain number of
> >> > interrupts in a certain timeframe immediately after enabling the
> >> > interrupt) and automatically fall back to polling in that case? It
> >> > would also mean that users with fixed firmware wouldn't need to pass a
> >> > parameter.
> >> 
> >> I believe Matthew is correct here. I found another system today
> >> with completely different vendor for both the system and the tpm chip.
> >> In addition another Lenovo model, the L490, has the issue.
> >> 
> >> This initial attempt at a solution like Matthew suggested works on
> >> the system I found today, but I imagine it is all sorts of wrong.
> >> In the 2 systems where I've seen it, there are about 10 interrupts
> >> in around 1.5 seconds, and then the irq code shuts down the interrupt
> >> because they aren't being handled.
> >> 
> >> 
> >> diff --git a/drivers/char/tpm/tpm_tis_core.c 
> >> b/drivers/char/tpm/tpm_tis_core.c
> >> index 49ae09ac604f..478e9d02a3fa 100644
> >> --- a/drivers/char/tpm/tpm_tis_core.c
> >> +++ b/drivers/char/tpm/tpm_tis_core.c
> >> @@ -27,6 +27,11 @@
> >>  #include "tpm.h"
> >>  #include "tpm_tis_core.h"
> >> 
> >> +static unsigned int time_start = 0;
> >> +static bool storm_check = true;
> >> +static bool storm_killed = false;
> >> +static u32 irqs_fired = 0;
> >
> > Maybe kstat_irqs() would be a better idea than ad hoc stats.
> >
> 
> Thanks, yes that would be better.
> 
> >> +
> >>  static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value);
> >> 
> >>  static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask)
> >> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, 
> >> const u8 *buf, size_t len)
> >> return rc;
> >>  }
> >> 
> >> -static void disable_interrupts(struct tpm_chip *chip)
> >> +static void __disable_interrupts(struct tpm_chip *chip)
> >>  {
> >> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >> u32 intmask;
> >> int rc;
> >> 
> >> -   if (priv->irq == 0)
> >> -   return;
> >> -
> >> rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), 
> >> &intmask);
> >> if (rc < 0)
> >> intmask = 0;
> >> 
> >> intmask &= ~TPM_GLOBAL_INT_ENABLE;
> >> rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), 
> >> intmask);
> >> +   chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> >> +}
> >> +
> >> +static void disable_interrupts(struct tpm_chip *chip)
> >> +{
> >> +   struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >> 
> >> +   if (priv->irq == 0)
> >> +   return;
> >> +
> >> +   __disable_interrupts(chip);
> >> devm_free_irq(chip->dev.parent, priv->irq, chip);
> >> priv->irq = 0;
> >> -   chip->flags &= ~TPM_CHIP_FLAG_IRQ;
> >>  }
> >> 
> >>  /*
> >> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 
> >> *buf, size_t len)
> >> int rc, irq;
> >> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> >> 
> >> +   if (unlikely(storm_killed)) {
> >> +   devm_free_irq(chip->dev.parent, priv->irq, chip);
> >> +   priv->irq = 0;
> >> +   storm_killed = false;
> >> +   }
> >
> > OK, this is kind of a bad solution, because if tpm_tis_send() is not called,
> > then the IRQ is never freed. AFAIK, devres_* do not sleep but use a spin
> > lock, 
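
To make the storm-detection idea discussed in this thread concrete, here is a minimal userspace sketch of a windowed interrupt counter. It is not the driver code and not the patch quoted above: the struct, the function names and the thresholds (10 interrupts in 1.5 seconds, borrowed from the numbers Jerry reports) are all assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical thresholds; real values would need tuning per platform. */
#define STORM_WINDOW_MS 1500u
#define STORM_MAX_IRQS  10u

struct storm_detector {
	uint64_t window_start_ms; /* time the current window opened */
	uint32_t irqs_in_window;  /* interrupts counted in this window */
	bool     use_polling;     /* latched once a storm is declared */
};

static void storm_detector_init(struct storm_detector *sd, uint64_t now_ms)
{
	sd->window_start_ms = now_ms;
	sd->irqs_in_window = 0;
	sd->use_polling = false;
}

/*
 * Record one interrupt observed at now_ms.
 * Returns true when the caller should give up on the IRQ and poll instead.
 */
static bool storm_detector_irq(struct storm_detector *sd, uint64_t now_ms)
{
	if (sd->use_polling)
		return true;

	/* Open a fresh window once the previous one has expired. */
	if (now_ms - sd->window_start_ms > STORM_WINDOW_MS) {
		sd->window_start_ms = now_ms;
		sd->irqs_in_window = 0;
	}

	if (++sd->irqs_in_window >= STORM_MAX_IRQS)
		sd->use_polling = true;

	return sd->use_polling;
}
```

In a real driver the decision would only be recorded in the interrupt handler, and the actual freeing of the IRQ would have to happen from a context where that is allowed, which is the concern raised in the review comment above.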

[PATCHv3] media: vb2: always set buffer cache sync hints

2020-11-28 Thread Sergey Senozhatsky
We need to always set ->need_cache_sync_on_prepare and
->need_cache_sync_on_finish when we initialize vb2 buffer.

Currently these flags are set/adjusted only in V4L2's
vb2_queue_or_prepare_buf(), which means that for the code
paths that don't use V4L2, vb2 will always tell the videobuf2
core to skip ->prepare() and ->finish() cache syncs/flushes.

This is a quick solution that should do the trick. The
proper fix, however, is much more complicated and requires
a rather big videobuf2 refactoring - we need to move cache
sync/flush decision making out of core videobuf2 to the
allocators.

Fixes: f5f5fa73fbfb ("media: videobuf2: handle V4L2 buffer cache flags")
Reported-by: Tomasz Figa 
Signed-off-by: Sergey Senozhatsky 
---
 drivers/media/common/videobuf2/videobuf2-core.c | 14 ++
 1 file changed, 14 insertions(+)

v3: Improved code comment and dropped queue allow_cache_hints check (Tomasz)
v2: Added a comment and set cache sync flags only for specific buffers (Hans)

diff --git a/drivers/media/common/videobuf2/videobuf2-core.c 
b/drivers/media/common/videobuf2/videobuf2-core.c
index 5499013cf82e..3f11fc5b5d9a 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -414,6 +414,20 @@ static int __vb2_queue_alloc(struct vb2_queue *q, enum 
vb2_memory memory,
vb->index = q->num_buffers + buffer;
vb->type = q->type;
vb->memory = memory;
+   /*
+* A workaround fix. We need to set these flags here so that
+* videobuf2 core will call ->prepare()/->finish() cache
+* sync/flush on vb2 buffers when appropriate. Otherwise, for
+* backends that don't rely on V4L2 (perhaps dvb) these flags
+* will always be false and, hence, videobuf2 core will skip
+* cache sync/flush operations. However, we can avoid explicit
+* ->prepare() and ->finish() cache sync for DMABUF buffers,
+* because DMA exporter takes care of it.
+*/
+   if (q->memory != VB2_MEMORY_DMABUF) {
+   vb->need_cache_sync_on_prepare = 1;
+   vb->need_cache_sync_on_finish = 1;
+   }
for (plane = 0; plane < num_planes; ++plane) {
vb->planes[plane].length = plane_sizes[plane];
vb->planes[plane].min_length = plane_sizes[plane];
-- 
2.29.2
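
The rule the patch adds can be restated as a small standalone model. This is only a sketch: the enum and struct below are hypothetical stand-ins, not the real vb2 types, and the only behaviour modelled is "request cache syncs by default, except for DMABUF memory".

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the vb2 memory types (not the kernel enum). */
enum mem_type { MEM_MMAP, MEM_USERPTR, MEM_DMABUF };

/* Hypothetical stand-in for the two per-buffer hint flags. */
struct buf_hints {
	bool need_cache_sync_on_prepare;
	bool need_cache_sync_on_finish;
};

/*
 * Default hints at buffer allocation time: sync on both prepare and
 * finish, unless the memory is DMABUF, where the exporter is expected
 * to take care of cache maintenance.
 */
static void buf_hints_init(struct buf_hints *h, enum mem_type memory)
{
	bool sync = (memory != MEM_DMABUF);

	h->need_cache_sync_on_prepare = sync;
	h->need_cache_sync_on_finish = sync;
}
```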



Re: [PATCH v2 3/4] input: touchscreen: usbtouchscreen: Remove unused variable 'ret'

2020-11-28 Thread Dmitry Torokhov
Hi Lee,

On Thu, Nov 26, 2020 at 01:36:06PM +, Lee Jones wrote:
> Fixes the following W=1 kernel build warning(s):
> 
>  drivers/input/touchscreen/usbtouchscreen.c: In function ‘nexio_read_data’:
>  drivers/input/touchscreen/usbtouchscreen.c:1052:50: warning: variable ‘ret’ 
> set but not used [-Wunused-but-set-variable]
> 
> Cc: Dmitry Torokhov 
> Cc: Henrik Rydberg 
> Cc: James Hilliard 
> Cc: Daniel Ritz 
> Cc: linux-in...@vger.kernel.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/input/touchscreen/usbtouchscreen.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/input/touchscreen/usbtouchscreen.c 
> b/drivers/input/touchscreen/usbtouchscreen.c
> index 397cb1d3f481b..c3b7130cd9033 100644
> --- a/drivers/input/touchscreen/usbtouchscreen.c
> +++ b/drivers/input/touchscreen/usbtouchscreen.c
> @@ -1049,7 +1049,7 @@ static int nexio_read_data(struct usbtouch_usb 
> *usbtouch, unsigned char *pkt)
>   unsigned int data_len = be16_to_cpu(packet->data_len);
>   unsigned int x_len = be16_to_cpu(packet->x_len);
>   unsigned int y_len = be16_to_cpu(packet->y_len);
> - int x, y, begin_x, begin_y, end_x, end_y, w, h, ret;
> + int x, y, begin_x, begin_y, end_x, end_y, w, h;
>  
>   /* got touch data? */
>   if ((pkt[0] & 0xe0) != 0xe0)
> @@ -1061,7 +1061,7 @@ static int nexio_read_data(struct usbtouch_usb 
> *usbtouch, unsigned char *pkt)
>   x_len -= 0x80;
>  
>   /* send ACK */
> - ret = usb_submit_urb(priv->ack, GFP_ATOMIC);
> + usb_submit_urb(priv->ack, GFP_ATOMIC);

I thought you were going to add error handling here?

>  
>   if (!usbtouch->type->max_xc) {
>   usbtouch->type->max_xc = 2 * x_len;
> -- 
> 2.25.1
> 

Thanks.

-- 
Dmitry


Re: [PATCH v2 2/4] input: touchscreen: melfas_mip4: Remove a bunch of unused variables

2020-11-28 Thread Dmitry Torokhov
Hi Lee,

On Thu, Nov 26, 2020 at 01:36:05PM +, Lee Jones wrote:
> Fixes the following W=1 kernel build warning(s):
> 
>  drivers/input/touchscreen/melfas_mip4.c: In function ‘mip4_report_touch’:
>  drivers/input/touchscreen/melfas_mip4.c:474:5: warning: variable ‘size’ set 
> but not used [-Wunused-but-set-variable]
>  drivers/input/touchscreen/melfas_mip4.c:472:5: warning: variable 
> ‘pressure_stage’ set but not used [-Wunused-but-set-variable]
>  drivers/input/touchscreen/melfas_mip4.c:469:7: warning: variable ‘palm’ set 
> but not used [-Wunused-but-set-variable]
>  drivers/input/touchscreen/melfas_mip4.c:468:7: warning: variable ‘hover’ set 
> but not used [-Wunused-but-set-variable]
> 
> Cc: Sangwon Jee 
> Cc: Dmitry Torokhov 
> Cc: Henrik Rydberg 
> Cc: linux-in...@vger.kernel.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/input/touchscreen/melfas_mip4.c | 11 ---
>  1 file changed, 11 deletions(-)
> 
> diff --git a/drivers/input/touchscreen/melfas_mip4.c 
> b/drivers/input/touchscreen/melfas_mip4.c
> index f67efdd040b24..9c98759098c7a 100644
> --- a/drivers/input/touchscreen/melfas_mip4.c
> +++ b/drivers/input/touchscreen/melfas_mip4.c
> @@ -465,13 +465,9 @@ static void mip4_report_keys(struct mip4_ts *ts, u8 
> *packet)
>  static void mip4_report_touch(struct mip4_ts *ts, u8 *packet)
>  {
>   int id;
> - bool hover;
> - bool palm;

So __always_unused did not work?

-- 
Dmitry


Re: [PATCH] x86/signals: Fix save/restore signal stack to correctly support sigset_t

2020-11-28 Thread Walt Drummond
(Sorry, resending as Gmail decided to ignore "Plaintext mode")

Thanks Al.  I want to understand the nuance, so please bear with me as
I reason this out.   The cast in stone nature of this is due to both
the need to keep userspace and kernel space in sync (ie, you'd have to
coordinate libc and kernel changes super tightly to pull this off),
and any change in the size of struct rt_sigframe would break backwards
compatibility with older binaries, is that correct?

Thanks, appreciate the help here.
--Walt


On Sat, Nov 28, 2020 at 6:19 PM Walt Drummond  wrote:
>
> Thanks Al.  I want to understand the nuance, so please bear with me as I 
> reason this out.   The cast in stone nature of this is due to both the need 
> to keep userspace and kernel space in sync (ie, you'd have to coordinate libc 
> and kernel changes super tightly to pull this off), and any change in the 
> size of struct rt_sigframe would break backwards compatibility with older 
> binaries, is that correct?
>
> Thanks, appreciate the help here.
> --Walt
>
>
> On Fri, Nov 27, 2020 at 9:23 PM Al Viro  wrote:
>>
>> On Thu, Nov 19, 2020 at 02:11:33PM -0800, Walt Drummond wrote:
>> > The macro unsafe_put_sigmask() only handles the first 64 bits of the
>> > sigmask_t, which works today.  However, if the definition of the
>> > sigset_t structure ever changed,
>>
>> ... existing userland would get fucked over, since sigset_t is
>> present in user-visible data structures.  Including the ones
>> we are using that thing for - struct rt_sigframe, for starters.
>>
>> Layout of those suckers is very much cast in stone.  We *can't*
>> change it, no matter what we do kernel-side.
>>
>> NAKed-by: Al Viro 
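
One way to picture why a user-visible layout is "cast in stone" is to look at field offsets. The sketch below uses two made-up structs (they are not the real struct rt_sigframe, and the field names are assumptions) to show that widening an embedded signal mask moves every field behind it, so a binary built against the old layout would read the wrong bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit and 128-bit signal masks. */
struct mask64  { uint64_t sig[1]; };
struct mask128 { uint64_t sig[2]; };

/* Hypothetical frame as old binaries expect it. */
struct frame_old {
	uint64_t      flags;
	struct mask64 mask;
	uint64_t      trailer; /* anything stored after the mask */
};

/* The same frame with a wider mask. */
struct frame_new {
	uint64_t       flags;
	struct mask128 mask;
	uint64_t       trailer;
};

/* Offset at which old userland looks for 'trailer'. */
static size_t old_trailer_offset(void)
{
	return offsetof(struct frame_old, trailer);
}

/* Offset at which a kernel using the wider mask would place it. */
static size_t new_trailer_offset(void)
{
	return offsetof(struct frame_new, trailer);
}
```

With these definitions the two offsets differ by eight bytes, and the overall frame size changes as well.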


[RFC PATCH] blk-iocost: Optimize the ioc_refreash_vrate() function

2020-11-28 Thread Baolin Wang
The ioc_refresh_vrate() function is only called from ioc_timer_fn(), after
starting a new period or stopping the period.

So when starting a new period, the variable 'pleft' in ioc_refresh_vrate()
is always the period's time, which means that if abs(ioc->vtime_err)
is less than the period's time, vcomp is 0 and we do not need to
compensate the vtime_rate in this case; just set it to the base vrate
and return.

When stopping the period, the ioc->vtime_err will be cleared to 0,
and we also do not need to compensate the vtime_rate, just set it as
the base vrate and return.

Signed-off-by: Baolin Wang 
---
 block/blk-iocost.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 8348db4..58c9533 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -943,30 +943,29 @@ static bool ioc_refresh_params(struct ioc *ioc, bool 
force)
  */
 static void ioc_refresh_vrate(struct ioc *ioc, struct ioc_now *now)
 {
-   s64 pleft = ioc->period_at + ioc->period_us - now->now;
s64 vperiod = ioc->period_us * ioc->vtime_base_rate;
s64 vcomp, vcomp_min, vcomp_max;
 
lockdep_assert_held(&ioc->lock);
 
-   /* we need some time left in this period */
-   if (pleft <= 0)
-   goto done;
+   if (abs(ioc->vtime_err) < ioc->period_us) {
+   atomic64_set(&ioc->vtime_rate, ioc->vtime_base_rate);
+   return;
+   }
 
/*
 * Calculate how much vrate should be adjusted to offset the error.
 * Limit the amount of adjustment and deduct the adjusted amount from
 * the error.
 */
-   vcomp = -div64_s64(ioc->vtime_err, pleft);
+   vcomp = -div64_s64(ioc->vtime_err, ioc->period_us);
vcomp_min = -(ioc->vtime_base_rate >> 1);
vcomp_max = ioc->vtime_base_rate;
vcomp = clamp(vcomp, vcomp_min, vcomp_max);
 
-   ioc->vtime_err += vcomp * pleft;
+   ioc->vtime_err += vcomp * ioc->period_us;
 
atomic64_set(&ioc->vtime_rate, ioc->vtime_base_rate + vcomp);
-done:
/* bound how much error can accumulate */
ioc->vtime_err = clamp(ioc->vtime_err, -vperiod, vperiod);
 }
-- 
1.8.3.1
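
The arithmetic the commit message relies on can be checked with a small userspace model of the pre-patch computation. This is a simplified sketch, not the kernel function: struct ioc_model and the numbers used with it are assumptions, and plain int64_t division stands in for div64_s64(). It shows that when |vtime_err| is smaller than the time left in the period, integer division makes vcomp zero and the rate stays at the base rate.

```c
#include <stdint.h>

/* Hypothetical, reduced model of the fields the function touches. */
struct ioc_model {
	int64_t period_us;
	int64_t vtime_base_rate;
	int64_t vtime_err;
	int64_t vtime_rate;
};

static int64_t clamp64(int64_t v, int64_t lo, int64_t hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/* Model of the original logic; pleft is the time left in the period. */
static void refresh_vrate(struct ioc_model *ioc, int64_t pleft)
{
	int64_t vperiod = ioc->period_us * ioc->vtime_base_rate;
	int64_t vcomp, vcomp_min, vcomp_max;

	if (pleft > 0) {
		/* Spread the error over the remaining time, within limits. */
		vcomp = -(ioc->vtime_err / pleft);
		vcomp_min = -(ioc->vtime_base_rate >> 1);
		vcomp_max = ioc->vtime_base_rate;
		vcomp = clamp64(vcomp, vcomp_min, vcomp_max);

		/* Deduct the part of the error this adjustment will absorb. */
		ioc->vtime_err += vcomp * pleft;
		ioc->vtime_rate = ioc->vtime_base_rate + vcomp;
	}

	/* Bound how much error can accumulate. */
	ioc->vtime_err = clamp64(ioc->vtime_err, -vperiod, vperiod);
}
```

For example, with period_us = 1000, a base rate of 100 and pleft = 1000, an error of 999 leaves the rate at 100, while an error of 5000 lowers it to 95 and clears the error.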



Re: [PATCH v12 12/17] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

2020-11-28 Thread Halil Pasic
On Tue, 24 Nov 2020 16:40:11 -0500
Tony Krowiak  wrote:

> Let's hot plug/unplug adapters, domains and control domains assigned to or
> unassigned from an AP matrix mdev device while it is in use by a guest per
> the following rules:
> 
> * Assign an adapter to mdev's matrix:
> 
>   The adapter will be hot plugged into the guest under the following
>   conditions:
>   1. The adapter is not yet assigned to the guest's matrix
>   2. At least one domain is assigned to the guest's matrix
>   3. Each APQN derived from the APID of the newly assigned adapter and
>  the APQIs of the domains already assigned to the guest's
>  matrix references a queue device bound to the vfio_ap device driver.
> 
>   The adapter and each domain assigned to the mdev's matrix will be hot
>   plugged into the guest under the following conditions:
>   1. The adapter is not yet assigned to the guest's matrix
>   2. No domains are assigned to the guest's matrix
>   3. At least one domain is assigned to the mdev's matrix
>   4. Each APQN derived from the APID of the newly assigned adapter and
>  the APQIs of the domains assigned to the mdev's matrix references a
>  queue device bound to the vfio_ap device driver.
> 
> * Unassign an adapter from mdev's matrix:
> 
>   The adapter will be hot unplugged from the KVM guest if it is
>   assigned to the guest's matrix.
> 
> * Assign a domain to mdev's matrix:
> 
>   The domain will be hot plugged into the guest under the following
>   conditions:
>   1. The domain is not yet assigned to the guest's matrix
>   2. At least one adapter is assigned to the guest's matrix
>   3. Each APQN derived from the APQI of the newly assigned domain and
>  the APIDs of the adapters already assigned to the guest's
>  matrix references a queue device bound to the vfio_ap device driver.
> 
>   The domain and each adapter assigned to the mdev's matrix will be hot
>   plugged into the guest under the following conditions:
>   1. The domain is not yet assigned to the guest's matrix
>   2. No adapters are assigned to the guest's matrix
>   3. At least one adapter is assigned to the mdev's matrix
>   4. Each APQN derived from the APQI of the newly assigned domain and
>  the APIDs of the adapters assigned to the mdev's matrix references a
>  queue device bound to the vfio_ap device driver.
> 
> * Unassign a domain from mdev's matrix:
> 
>   The domain will be hot unplugged from the KVM guest if it is
>   assigned to the guest's matrix.
> 
> * Assign a control domain:
> 
>   The control domain will be hot plugged into the KVM guest if it is not
>   assigned to the guest's APCB. The AP architecture ensures a guest will
>   only get access to the control domain if it is in the host's AP
>   configuration, so there is no risk in hot plugging it; however, it will
>   become automatically available to the guest when it is added to the host
>   configuration.
> 
> * Unassign a control domain:
> 
>   The control domain will be hot unplugged from the KVM guest if it is
>   assigned to the guest's APCB.

This is where things start getting tricky. E.g. do we need to revise
filtering after an unassign? (For example an assign_adapter X didn't
change the shadow, because queue XY was missing, but now we unplug domain
Y. Should the adapter X pop up? I guess it should.)
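
The first adapter rule quoted above, and the "queue XY was missing" case, can be sketched as a plain check over bitmaps. This is an illustration only: the struct, the function names and the bound[] table are hypothetical, and the APQN layout (APID in the high byte, APQI in the low byte) is an assumption made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define AP_IDS 256

/* Hypothetical model of a guest matrix: assigned adapters and domains. */
struct ap_matrix {
	bool apm[AP_IDS]; /* adapters (APIDs) assigned */
	bool aqm[AP_IDS]; /* domains (APQIs) assigned */
};

/* Assumed APQN layout: APID in the high byte, APQI in the low byte. */
static uint16_t apqn_make(uint8_t apid, uint8_t apqi)
{
	return (uint16_t)(((unsigned int)apid << 8) | apqi);
}

/*
 * bound[apqn] is true when that queue is bound to the driver.
 * The adapter may be hot plugged only if it is not yet in the guest's
 * matrix, the guest has at least one domain, and every APQN formed
 * with those domains is bound.
 */
static bool adapter_can_hot_plug(const struct ap_matrix *guest,
				 const bool *bound, uint8_t apid)
{
	bool have_domain = false;
	unsigned int apqi;

	if (guest->apm[apid])
		return false;

	for (apqi = 0; apqi < AP_IDS; apqi++) {
		if (!guest->aqm[apqi])
			continue;
		have_domain = true;
		if (!bound[apqn_make(apid, (uint8_t)apqi)])
			return false;
	}

	return have_domain;
}
```

Under this model a single unbound queue keeps the whole adapter out of the guest, which is why removing the offending domain later raises the question of whether the filtering needs to be rerun.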


> 
> Note: Now that hot plug/unplug is implemented, there is the possibility
>   that an assignment/unassignment of an adapter, domain or control
>   domain could be initiated while the guest is starting, so the
>   matrix device lock will be taken for the group notification callback
>   that initializes the guest's APCB when the KVM pointer is made
>   available to the vfio_ap device driver.
> 
> Signed-off-by: Tony Krowiak 
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 190 +-
>  1 file changed, 159 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
> b/drivers/s390/crypto/vfio_ap_ops.c
> index 586ec5776693..4f96b7861607 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -631,6 +631,60 @@ static void vfio_ap_mdev_manage_qlinks(struct 
> ap_matrix_mdev *matrix_mdev,
>   }
>  }
>  
> +static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long apid)
> +{
> + unsigned long apqi, apqn;
> + unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
> +
> + /*
> +  * If the APID is already assigned to the guest's shadow APCB, there is
> +  * no need to assign it.
> +  */
> + if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
> + return false;
> +
> + /*
> +  * If no domains have yet been assigned to the shadow APCB and one or
> +  * more domains have been assigned to the matrix mdev, then use
> +  * the domains assigned to the matrix mdev; otherwise, there is nothing
> +  * to assign to the 

Re: [PATCH v2 bpf-next 00/13] Atomics for eBPF

2020-11-28 Thread Alexei Starovoitov
On Fri, Nov 27, 2020 at 09:53:05PM -0800, Yonghong Song wrote:
> 
> 
> On 11/27/20 9:57 AM, Brendan Jackman wrote:
> > Status of the patches
> > =====================
> > 
> > Thanks for the reviews! Differences from v1->v2 [1]:
> > 
> > * Fixed mistakes in the netronome driver
> > 
> > * Added sub, add, or, xor operations
> > 
> > * The above led to some refactors to keep things readable. (Maybe I
> >should have just waited until I'd implemented these before starting
> >the review...)
> > 
> > * Replaced BPF_[CMP]SET | BPF_FETCH with just BPF_[CMP]XCHG, which
> >include the BPF_FETCH flag
> > 
> > * Added a bit of documentation. Suggestions welcome for more places
> >to dump this info...
> > 
> > The prog_test that's added depends on Clang/LLVM features added by
> > Yonghong in https://reviews.llvm.org/D72184
> > 
> > This only includes a JIT implementation for x86_64 - I don't plan to
> > implement JIT support myself for other architectures.
> > 
> > Operations
> > ==========
> > 
> > This patchset adds atomic operations to the eBPF instruction set. The
> > use-case that motivated this work was a trivial and efficient way to
> > generate globally-unique cookies in BPF progs, but I think it's
> > obvious that these features are pretty widely applicable.  The
> > instructions that are added here can be summarised with this list of
> > kernel operations:
> > 
> > * atomic[64]_[fetch_]add
> > * atomic[64]_[fetch_]sub
> > * atomic[64]_[fetch_]and
> > * atomic[64]_[fetch_]or
> 
> * atomic[64]_[fetch_]xor
> 
> > * atomic[64]_xchg
> > * atomic[64]_cmpxchg
> 
> Thanks. Overall looks good to me but I did not check carefully
> on jit part as I am not an expert in x64 assembly...
> 
> This patch also introduced atomic[64]_{sub,and,or,xor}, similar to
> xadd. I am not sure whether it is necessary. For one thing,
> users can just use atomic[64]_fetch_{sub,and,or,xor} to ignore
> return value and they will achieve the same result, right?
> From llvm side, there is no ready-to-use gcc builtin matching
> atomic[64]_{sub,and,or,xor} which does not have return values.
> If we go this route, we will need to invent additional bpf
> specific builtins.

I think bpf specific builtins are overkill.
As you said the users can use atomic_fetch_xor() and ignore
return value. I think llvm backend should be smart enough to use
BPF_ATOMIC | BPF_XOR insn without BPF_FETCH bit in such case.
But if it's too cumbersome to do at the moment, we can skip this
optimization for now.


Re: [RESEND PATCH 17/19] mmc: sunxi: add support for A100 mmc controller

2020-11-28 Thread André Przywara
On 28/11/2020 19:56, André Przywara wrote:
> On 10/11/2020 06:46, Frank Lee wrote:

Hi,

one more thing below ...

>> From: Yangtao Li 
>>
>> This patch adds support for the A100 MMC controller, which uses word addresses
>> for internal DMA.
>>
>> Signed-off-by: Yangtao Li 
>> ---
>>  drivers/mmc/host/sunxi-mmc.c | 28 +---
>>  1 file changed, 25 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/mmc/host/sunxi-mmc.c b/drivers/mmc/host/sunxi-mmc.c
>> index fc62773602ec..1518b64112b7 100644
>> --- a/drivers/mmc/host/sunxi-mmc.c
>> +++ b/drivers/mmc/host/sunxi-mmc.c
>> @@ -244,6 +244,7 @@ struct sunxi_idma_des {
>>  
>>  struct sunxi_mmc_cfg {
>>  u32 idma_des_size_bits;
>> +u32 idma_des_shift;
>>  const struct sunxi_mmc_clk_delay *clk_delays;
>>  
>>  /* does the IP block support autocalibration? */
>> @@ -343,7 +344,7 @@ static int sunxi_mmc_init_host(struct sunxi_mmc_host 
>> *host)
>>  /* Enable CEATA support */
>>  mmc_writel(host, REG_FUNS, SDXC_CEATA_ON);
>>  /* Set DMA descriptor list base address */
>> -mmc_writel(host, REG_DLBA, host->sg_dma);
>> +mmc_writel(host, REG_DLBA, host->sg_dma >> host->cfg->idma_des_shift);
>>  
>>  rval = mmc_readl(host, REG_GCTRL);
>>  rval |= SDXC_INTERRUPT_ENABLE_BIT;
>> @@ -373,8 +374,10 @@ static void sunxi_mmc_init_idma_des(struct 
>> sunxi_mmc_host *host,
>>  
>>  next_desc += sizeof(struct sunxi_idma_des);
>>  pdes[i].buf_addr_ptr1 =
>> -cpu_to_le32(sg_dma_address(&data->sg[i]));
>> -pdes[i].buf_addr_ptr2 = cpu_to_le32((u32)next_desc);
>> +cpu_to_le32(sg_dma_address(&data->sg[i]) >>
>> +host->cfg->idma_des_shift);
>> +pdes[i].buf_addr_ptr2 = cpu_to_le32((u32)next_desc >>
>> +host->cfg->idma_des_shift);
> 
> I think you should cast after the shift, otherwise you lose the ability
> to run above 4 GB. This won't be a problem at the moment, since we still
> use the default 32-bit DMA mask, but might bite us later.
> 
> Otherwise this patch looks fine, and works on the H616 as well.
> 
> Cheers,
> Andre
> 
>>  }
>>  
>>  pdes[0].config |= cpu_to_le32(SDXC_IDMAC_DES0_FD);
>> @@ -1178,6 +1181,23 @@ static const struct sunxi_mmc_cfg sun50i_a64_emmc_cfg 
>> = {
>>  .needs_new_timings = true,
>>  };
>>  
>> +static const struct sunxi_mmc_cfg sun50i_a100_cfg = {
>> +.idma_des_size_bits = 16,
>> +.idma_des_shift = 2,
>> +.clk_delays = NULL,
>> +.can_calibrate = true,
>> +.mask_data0 = true,
>> +.needs_new_timings = true,
>> +};
>> +
>> +static const struct sunxi_mmc_cfg sun50i_a100_emmc_cfg = {
>> +.idma_des_size_bits = 13,
>> +.idma_des_shift = 2,

Is that actually true? Don't know about the A100, but the H616 manual
mentions that "SMHC2" deals with byte addresses, in contrast to the
other two ones. So MMC2 would be compatible with the a64_emmc_cfg?

Cheers,
Andre

>> +.clk_delays = NULL,
>> +.can_calibrate = true,
>> +.needs_new_timings = true,
>> +};
>> +
>>  static const struct of_device_id sunxi_mmc_of_match[] = {
>>  { .compatible = "allwinner,sun4i-a10-mmc", .data = &sun4i_a10_cfg },
>>  { .compatible = "allwinner,sun5i-a13-mmc", .data = &sun5i_a13_cfg },
>> @@ -1186,6 +1206,8 @@ static const struct of_device_id sunxi_mmc_of_match[] 
>> = {
>>  { .compatible = "allwinner,sun9i-a80-mmc", .data = &sun9i_a80_cfg },
>>  { .compatible = "allwinner,sun50i-a64-mmc", .data = &sun50i_a64_cfg },
>>  { .compatible = "allwinner,sun50i-a64-emmc", .data = 
>> &sun50i_a64_emmc_cfg },
>> +{ .compatible = "allwinner,sun50i-a100-mmc", .data = &sun50i_a100_cfg },
>> +{ .compatible = "allwinner,sun50i-a100-emmc", .data = 
>> &sun50i_a100_emmc_cfg },
>>  { /* sentinel */ }
>>  };
>>  MODULE_DEVICE_TABLE(of, sunxi_mmc_of_match);
>>
> 
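
The remark about casting after the shift can be illustrated with two small helpers. This is a standalone sketch with hypothetical function names, not the driver code: it only shows what happens to a descriptor address at or above 4 GB when the 32-bit truncation is done before rather than after the shift.

```c
#include <stdint.h>

/* Truncates to 32 bits first, so address bits above bit 31 are lost. */
static uint32_t shift_after_cast(uint64_t dma_addr, unsigned int shift)
{
	return (uint32_t)dma_addr >> shift;
}

/* Shifts the full 64-bit address first, then truncates the result. */
static uint32_t cast_after_shift(uint64_t dma_addr, unsigned int shift)
{
	return (uint32_t)(dma_addr >> shift);
}
```

With a shift of 2, an address of exactly 4 GB (0x100000000) becomes 0 in the first variant but 0x40000000 in the second; below 4 GB both agree.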



Re: [PATCH v2 bpf-next 11/13] bpf: Add bitwise atomic instructions

2020-11-28 Thread Alexei Starovoitov
On Fri, Nov 27, 2020 at 09:39:10PM -0800, Yonghong Song wrote:
> 
> 
> On 11/27/20 9:57 AM, Brendan Jackman wrote:
> > This adds instructions for
> > 
> > atomic[64]_[fetch_]and
> > atomic[64]_[fetch_]or
> > atomic[64]_[fetch_]xor
> > 
> > All these operations are isomorphic enough to implement with the same
> > verifier, interpreter, and x86 JIT code, hence being a single commit.
> > 
> > The main interesting thing here is that x86 doesn't directly support
> > the fetch_ versions of these operations, so we need to generate a CMPXCHG
> > loop in the JIT. This requires the use of two temporary registers,
> > IIUC it's safe to use BPF_REG_AX and x86's AUX_REG for this purpose.
> 
> similar to previous xsub (atomic[64]_sub), should we implement
> xand, xor, xxor in llvm?

Yes, please. Unlike atomic_fetch_sub, which can be handled by llvm,
atomic_fetch_or/xor/and have to be seen as separate instructions
because JITs will translate them as a loop.
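
For readers unfamiliar with the "CMPXCHG loop" mentioned in this thread, here is the general shape of that technique in portable C11. It is a userspace sketch with a hypothetical function name, not the JIT output: a fetch-and-xor built from nothing but an atomic compare-and-exchange.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Atomically XOR val into *ptr and return the previous value, using
 * only compare-and-exchange. If another thread changes *ptr between
 * the load and the exchange, the exchange fails, 'old' is refreshed
 * with the current value, and the loop retries.
 */
static uint64_t fetch_xor_via_cmpxchg(_Atomic uint64_t *ptr, uint64_t val)
{
	uint64_t old = atomic_load(ptr);

	while (!atomic_compare_exchange_weak(ptr, &old, old ^ val))
		; /* 'old' now holds the latest value; try again */

	return old;
}
```

The same pattern works for AND and OR by changing the operator used to compute the new value.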


Re: [PATCH v2 bpf-next 10/13] bpf: Add instructions for atomic[64]_[fetch_]sub

2020-11-28 Thread Alexei Starovoitov
On Fri, Nov 27, 2020 at 09:35:07PM -0800, Yonghong Song wrote:
> 
> 
> On 11/27/20 9:57 AM, Brendan Jackman wrote:
> > Including only interpreter and x86 JIT support.
> > 
> > x86 doesn't provide an atomic exchange-and-subtract instruction that
> > could be used for BPF_SUB | BPF_FETCH, however we can just emit a NEG
> > followed by an XADD to get the same effect.
> > 
> > Signed-off-by: Brendan Jackman 
> > ---
> >   arch/x86/net/bpf_jit_comp.c  | 16 ++--
> >   include/linux/filter.h   | 20 
> >   kernel/bpf/core.c|  1 +
> >   kernel/bpf/disasm.c  | 16 
> >   kernel/bpf/verifier.c|  2 ++
> >   tools/include/linux/filter.h | 20 
> >   6 files changed, 69 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 7431b2937157..a8a9fab13fcf 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -823,6 +823,7 @@ static int emit_atomic(u8 **pprog, u8 atomic_op,
> > /* emit opcode */
> > switch (atomic_op) {
> > +   case BPF_SUB:
> > case BPF_ADD:
> > /* lock *(u32/u64*)(dst_reg + off) = src_reg */
> > EMIT1(simple_alu_opcodes[atomic_op]);
> > @@ -1306,8 +1307,19 @@ st:  if (is_imm8(insn->off))
> > case BPF_STX | BPF_ATOMIC | BPF_W:
> > case BPF_STX | BPF_ATOMIC | BPF_DW:
> > -   err = emit_atomic(&prog, insn->imm, dst_reg, src_reg,
> > - insn->off, BPF_SIZE(insn->code));
> > +   if (insn->imm == (BPF_SUB | BPF_FETCH)) {
> > +   /*
> > +* x86 doesn't have an XSUB insn, so we negate
> > +* and XADD instead.
> > +*/
> > +   emit_neg(&prog, src_reg, BPF_SIZE(insn->code) 
> > == BPF_DW);
> > +   err = emit_atomic(&prog, BPF_ADD | BPF_FETCH,
> > + dst_reg, src_reg, insn->off,
> > + BPF_SIZE(insn->code));
> > +   } else {
> > +   err = emit_atomic(&prog, insn->imm, dst_reg, 
> > src_reg,
> > + insn->off, 
> > BPF_SIZE(insn->code));
> > +   }
> > if (err)
> > return err;
> > break;
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index 6186280715ed..a20a3a536bf5 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -280,6 +280,26 @@ static inline bool insn_is_zext(const struct bpf_insn 
> > *insn)
> > .off   = OFF,   \
> > .imm   = BPF_ADD | BPF_FETCH })
> > +/* Atomic memory sub, *(uint *)(dst_reg + off16) -= src_reg */
> > +
> > +#define BPF_ATOMIC_SUB(SIZE, DST, SRC, OFF)\
> > +   ((struct bpf_insn) {\
> > +   .code  = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC, \
> > +   .dst_reg = DST, \
> > +   .src_reg = SRC, \
> > +   .off   = OFF,   \
> > +   .imm   = BPF_SUB })
> 
> Currently, llvm does not support XSUB, should we support it in llvm?
> At source code, as implemented in JIT, user can just do a negate
> followed by xadd.

I forgot we have BPF_NEG insn :)
Indeed it's probably easier to handle atomic_fetch_sub() builtin
completely on llvm side. It can generate bpf_neg followed by atomic_fetch_add.
No need to burden verifier, interpreter and JITs with it.
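
The "negate, then fetch-and-add" lowering discussed here has a direct counterpart in C11 atomics. The helper below is a userspace sketch with a hypothetical name; it uses unsigned arithmetic so that the negation is well defined for every input.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Atomically subtract val from *ptr and return the previous value,
 * expressed as a fetch-and-add of the negated operand. Unsigned
 * arithmetic wraps modulo 2^64, so (0 - val) is always defined.
 */
static uint64_t fetch_sub_via_neg_add(_Atomic uint64_t *ptr, uint64_t val)
{
	uint64_t neg = 0 - val;

	return atomic_fetch_add(ptr, neg);
}
```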


Re: [PATCH v2 bpf-next 08/13] bpf: Add instructions for atomic_[cmp]xchg

2020-11-28 Thread Alexei Starovoitov
On Fri, Nov 27, 2020 at 05:57:33PM +, Brendan Jackman wrote:
>  
>  /* atomic op type fields (stored in immediate) */
> -#define BPF_FETCH0x01/* fetch previous value into src reg */
> +#define BPF_XCHG (0xe0 | BPF_FETCH)  /* atomic exchange */
> +#define BPF_CMPXCHG  (0xf0 | BPF_FETCH)  /* atomic compare-and-write */
> +#define BPF_FETCH0x01/* fetch previous value into src reg or r0*/

I think such a comment is more confusing than helpful.
I'd just say that the fetch bit is not valid on its own.
It's used to build other instructions like cmpxchg and atomic_fetch_add.
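
As a small illustration of how the fetch bit combines with an operation code, the sketch below repeats the three defines from the quoted hunk, adds the existing BPF_ADD value (0x00), and introduces two helpers. The helper names and the 0xf0 operation mask are assumptions made for this example. Note that because BPF_ADD is 0x00, the bare value 0x01 is numerically the same as BPF_ADD | BPF_FETCH.

```c
#include <stdbool.h>
#include <stdint.h>

#define BPF_ADD     0x00
#define BPF_FETCH   0x01
#define BPF_XCHG    (0xe0 | BPF_FETCH)
#define BPF_CMPXCHG (0xf0 | BPF_FETCH)

/* Hypothetical helper: does this atomic op return the old value? */
static bool atomic_op_fetches(int32_t imm)
{
	return (imm & BPF_FETCH) != 0;
}

/* Hypothetical helper: the operation part of the immediate. */
static int32_t atomic_op_base(int32_t imm)
{
	return imm & 0xf0;
}
```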

> + } else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
> +insn->imm == (BPF_CMPXCHG)) {

redundant ().

> + verbose(cbs->private_data, "(%02x) r0 = 
> atomic%s_cmpxchg(*(%s *)(r%d %+d), r0, r%d)\n",
> + insn->code,
> + BPF_SIZE(insn->code) == BPF_DW ? "64" : "",
> + bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
> + insn->dst_reg, insn->off,
> + insn->src_reg);
> + } else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
> +insn->imm == (BPF_XCHG)) {

redundant ().


Re: [PATCH] media: solo6x10: switch from 'pci_' to 'dma_' API

2020-11-28 Thread Ismael Luceno
On 27/Nov/2020 21:34, Christophe JAILLET wrote:
> The wrappers in include/linux/pci-dma-compat.h should go away.
> 
> The patch has been generated with the coccinelle script below and has been
> hand modified to replace GFP_ with a correct flag.
> It has been compile tested.
> 
> When memory is allocated in 'snd_solo_pcm_open()' (solo6x10-g723.c)
> GFP_KERNEL can be used because this flag is already used just a few lines
> above.
> 
> When memory is allocated in 'solo_enc_alloc()' (solo6x10-v4l2-enc.c)
> GFP_KERNEL can be used because this flag is already used just a few lines
> above.
> 
> When memory is allocated in 'solo_enc_v4l2_init()' (solo6x10-v4l2-enc.c)
> GFP_KERNEL can be used because it calls 'solo_enc_alloc()' which already uses
> this flag.
> 
> @@
> @@
> -PCI_DMA_BIDIRECTIONAL
> +DMA_BIDIRECTIONAL
> 
> @@
> @@
> -PCI_DMA_TODEVICE
> +DMA_TO_DEVICE
> 
> @@
> @@
> -PCI_DMA_FROMDEVICE
> +DMA_FROM_DEVICE
> 
> @@
> @@
> -PCI_DMA_NONE
> +DMA_NONE
> 
> @@
> expression e1, e2, e3;
> @@
> -pci_alloc_consistent(e1, e2, e3)
> +dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
> 
> @@
> expression e1, e2, e3;
> @@
> -pci_zalloc_consistent(e1, e2, e3)
> +dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
> 
> @@
> expression e1, e2, e3, e4;
> @@
> -pci_free_consistent(e1, e2, e3, e4)
> +dma_free_coherent(&e1->dev, e2, e3, e4)
> 
> @@
> expression e1, e2, e3, e4;
> @@
> -pci_map_single(e1, e2, e3, e4)
> +dma_map_single(&e1->dev, e2, e3, e4)
> 
> @@
> expression e1, e2, e3, e4;
> @@
> -pci_unmap_single(e1, e2, e3, e4)
> +dma_unmap_single(&e1->dev, e2, e3, e4)
> 
> @@
> expression e1, e2, e3, e4, e5;
> @@
> -pci_map_page(e1, e2, e3, e4, e5)
> +dma_map_page(&e1->dev, e2, e3, e4, e5)
> 
> @@
> expression e1, e2, e3, e4;
> @@
> -pci_unmap_page(e1, e2, e3, e4)
> +dma_unmap_page(&e1->dev, e2, e3, e4)
> 
> @@
> expression e1, e2, e3, e4;
> @@
> -pci_map_sg(e1, e2, e3, e4)
> +dma_map_sg(&e1->dev, e2, e3, e4)
> 
> @@
> expression e1, e2, e3, e4;
> @@
> -pci_unmap_sg(e1, e2, e3, e4)
> +dma_unmap_sg(&e1->dev, e2, e3, e4)
> 
> @@
> expression e1, e2, e3, e4;
> @@
> -pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
> +dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
> 
> @@
> expression e1, e2, e3, e4;
> @@
> -pci_dma_sync_single_for_device(e1, e2, e3, e4)
> +dma_sync_single_for_device(&e1->dev, e2, e3, e4)
> 
> @@
> expression e1, e2, e3, e4;
> @@
> -pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
> +dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
> 
> @@
> expression e1, e2, e3, e4;
> @@
> -pci_dma_sync_sg_for_device(e1, e2, e3, e4)
> +dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
> 
> @@
> expression e1, e2;
> @@
> -pci_dma_mapping_error(e1, e2)
> +dma_mapping_error(&e1->dev, e2)
> 
> @@
> expression e1, e2;
> @@
> -pci_set_dma_mask(e1, e2)
> +dma_set_mask(&e1->dev, e2)
> 
> @@
> expression e1, e2;
> @@
> -pci_set_consistent_dma_mask(e1, e2)
> +dma_set_coherent_mask(&e1->dev, e2)
> 
> Signed-off-by: Christophe JAILLET 
> ---
> If needed, see post from Christoph Hellwig on the kernel-janitors ML:
>https://marc.info/?l=kernel-janitors=158745678307186=4
> ---
>  drivers/media/pci/solo6x10/solo6x10-g723.c| 11 +++---
>  drivers/media/pci/solo6x10/solo6x10-p2m.c | 10 +++---
>  .../media/pci/solo6x10/solo6x10-v4l2-enc.c| 35 ++-
>  3 files changed, 29 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/media/pci/solo6x10/solo6x10-g723.c 
> b/drivers/media/pci/solo6x10/solo6x10-g723.c
> index d137b94869d8..6cebad665565 100644
> --- a/drivers/media/pci/solo6x10/solo6x10-g723.c
> +++ b/drivers/media/pci/solo6x10/solo6x10-g723.c
> @@ -124,9 +124,10 @@ static int snd_solo_pcm_open(struct snd_pcm_substream 
> *ss)
>   if (solo_pcm == NULL)
>   goto oom;
>  
> - solo_pcm->g723_buf = pci_alloc_consistent(solo_dev->pdev,
> -   G723_PERIOD_BYTES,
> -   &solo_pcm->g723_dma);
> + solo_pcm->g723_buf = dma_alloc_coherent(&solo_dev->pdev->dev,
> + G723_PERIOD_BYTES,
> + &solo_pcm->g723_dma,
> + GFP_KERNEL);
>   if (solo_pcm->g723_buf == NULL)
>   goto oom;
>  
> @@ -148,8 +149,8 @@ static int snd_solo_pcm_close(struct snd_pcm_substream 
> *ss)
>   struct solo_snd_pcm *solo_pcm = snd_pcm_substream_chip(ss);
>  
>   snd_pcm_substream_chip(ss) = solo_pcm->solo_dev;
> - pci_free_consistent(solo_pcm->solo_dev->pdev, G723_PERIOD_BYTES,
> - solo_pcm->g723_buf, solo_pcm->g723_dma);
> + dma_free_coherent(&solo_pcm->solo_dev->pdev->dev, G723_PERIOD_BYTES,
> +   solo_pcm->g723_buf, solo_pcm->g723_dma);
>   kfree(solo_pcm);
>  
>   return 0;
> diff --git a/drivers/media/pci/solo6x10/solo6x10-p2m.c 
> b/drivers/media/pci/solo6x10/solo6x10-p2m.c
> index 

Re: [PATCH v12 11/17] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device

2020-11-28 Thread Halil Pasic
On Tue, 24 Nov 2020 16:40:10 -0500
Tony Krowiak  wrote:

> The current implementation does not allow assignment of an AP adapter or
> domain to an mdev device if each APQN resulting from the assignment
> does not reference an AP queue device that is bound to the vfio_ap device
> driver. This patch allows assignment of AP resources to the matrix mdev as
> long as the APQNs resulting from the assignment:
>1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
>2. Are not assigned to another matrix mdev.
> 
> The rationale behind this is twofold:
>1. The AP architecture does not preclude assignment of APQNs to an AP
>   configuration that are not available to the system.
>2. APQNs that do not reference a queue device bound to the vfio_ap
>   device driver will not be assigned to the guest's CRYCB, so the
>   guest will not get access to queues not bound to the vfio_ap driver.
> 
> Signed-off-by: Tony Krowiak 

Again code looks good. I'm still worried about all the incremental
changes (good for review) and their testability.


Re: [PATCH v2 bpf-next 01/13] bpf: x86: Factor out emission of ModR/M for *(reg + off)

2020-11-28 Thread Alexei Starovoitov
On Fri, Nov 27, 2020 at 05:57:26PM +, Brendan Jackman wrote:
> +/* Emit the ModR/M byte for addressing *(r1 + off) and r2 */
> +static void emit_modrm_dstoff(u8 **pprog, u32 r1, u32 r2, int off)

Same concern as in the other patch. If you could avoid Intel's puzzling names
like the above, it would make reviewing the patch easier.


drivers/net/dsa/ocelot/seville_vsc9953.c:1107:34: warning: unused variable 'seville_of_match'

2020-11-28 Thread kernel test robot
Hi Vladimir,

First bad commit (maybe != root cause):

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   45e885c439e825c19f3a51e46ef8210984bc0a9c
commit: d60bc62de4ae068ed4b215c24cdfdd5035aa986e net: dsa: seville: build as separate module
date:   2 months ago
config: x86_64-randconfig-r006-20201129 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project e9e45b3887ca343e90fe91fe77b98d47e66ca312)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d60bc62de4ae068ed4b215c24cdfdd5035aa986e
        git remote add linus https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
        git fetch --no-tags linus master
        git checkout d60bc62de4ae068ed4b215c24cdfdd5035aa986e
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> drivers/net/dsa/ocelot/seville_vsc9953.c:1107:34: warning: unused variable 'seville_of_match' [-Wunused-const-variable]
   static const struct of_device_id seville_of_match[] = {
                                    ^
   1 warning generated.

vim +/seville_of_match +1107 drivers/net/dsa/ocelot/seville_vsc9953.c

84705fc165526e8 Maxim Kochetkov 2020-07-13  1106  
84705fc165526e8 Maxim Kochetkov 2020-07-13 @1107  static const struct of_device_id seville_of_match[] = {
84705fc165526e8 Maxim Kochetkov 2020-07-13  1108  	{ .compatible = "mscc,vsc9953-switch" },
84705fc165526e8 Maxim Kochetkov 2020-07-13  1109  	{ },
84705fc165526e8 Maxim Kochetkov 2020-07-13  1110  };
84705fc165526e8 Maxim Kochetkov 2020-07-13  1111  MODULE_DEVICE_TABLE(of, seville_of_match);
84705fc165526e8 Maxim Kochetkov 2020-07-13  1112  

:: The code at line 1107 was first introduced by commit
:: 84705fc165526e8e55d208b2b10a48cc720a106a net: dsa: felix: introduce support for Seville VSC9953 switch

:: TO: Maxim Kochetkov 
:: CC: David S. Miller 

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: [PATCH v2 bpf-next 02/13] bpf: x86: Factor out emission of REX byte

2020-11-28 Thread Alexei Starovoitov
On Fri, Nov 27, 2020 at 05:57:27PM +, Brendan Jackman wrote:
> The JIT case for encoding atomic ops is about to get more
> complicated. In order to make the review & resulting code easier,
> let's factor out some shared helpers.
> 
> Signed-off-by: Brendan Jackman 
> ---
>  arch/x86/net/bpf_jit_comp.c | 39 ++---
>  1 file changed, 23 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 94b17bd30e00..a839c1a54276 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -702,6 +702,21 @@ static void emit_modrm_dstoff(u8 **pprog, u32 r1, u32 r2, int off)
>   *pprog = prog;
>  }
>  
> +/*
> + * Emit a REX byte if it will be necessary to address these registers

What is a "REX byte"?
Maybe rename it to maybe_emit_mod()?

> + */
> +static void maybe_emit_rex(u8 **pprog, u32 reg_rm, u32 reg_reg, bool wide)

Could you please keep the original names dst_reg/src_reg instead of
reg_rm/reg_reg?
reg_reg reads really odd, and reg_rm is equally puzzling unless the reader has
studied Intel's manual. I didn't. All these new abbreviations are challenging
for me.
> +{
> + u8 *prog = *pprog;
> + int cnt = 0;
> +
> + if (wide)

What is 'wide'? Why not call it 'bool is_alu64'?

> + EMIT1(add_2mod(0x48, reg_rm, reg_reg));
> + else if (is_ereg(reg_rm) || is_ereg(reg_reg))
> + EMIT1(add_2mod(0x40, reg_rm, reg_reg));
> + *pprog = prog;
> +}
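
For readers unfamiliar with the term questioned above: on x86-64 the REX
prefix is an optional byte of the form 0100WRXB, where W selects a 64-bit
operand size, R extends the ModRM.reg field and B extends the ModRM.rm field.
A minimal userspace sketch of the decision the quoted helper makes follows.
The function rex_byte() and its "return 0 when no prefix is needed" convention
are hypothetical, not the JIT's API, and the sketch ignores the byte-register
cases where a bare 0x40 prefix is required:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: compute the REX prefix for a reg/rm register pair.
 * Registers are numbered 0..15; numbers 8..15 need an extension bit.
 * Returns 0 when no REX prefix is needed.
 */
static uint8_t rex_byte(unsigned int rm, unsigned int reg, bool wide)
{
	uint8_t rex = 0x40;		/* fixed 0100 high nibble */

	assert(rm < 16 && reg < 16);
	if (wide)
		rex |= 0x08;		/* REX.W: 64-bit operand size */
	if (reg >= 8)
		rex |= 0x04;		/* REX.R: extends ModRM.reg */
	if (rm >= 8)
		rex |= 0x01;		/* REX.B: extends ModRM.rm */

	return rex == 0x40 ? 0 : rex;
}
```

With these assumptions, a 64-bit operation on two low registers yields 0x48,
which matches the 0x48 base passed to add_2mod() in the quoted patch.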


Re: [PATCH v12 10/17] s390/vfio-ap: initialize the guest apcb

2020-11-28 Thread Halil Pasic
On Tue, 24 Nov 2020 16:40:09 -0500
Tony Krowiak  wrote:

> The APCB is a control block containing the masks that specify the adapters,
> domains and control domains to which a KVM guest is granted access. When
> the vfio_ap device driver is notified that the KVM pointer has been set,
> the guest's APCB is initialized from the AP configuration of adapters,
> domains and control domains assigned to the matrix mdev. The linux device
> model, however, precludes passing through to a guest any devices that
> are not bound to the device driver facilitating the pass-through.
> Consequently, APQNs assigned to the matrix mdev that do not reference
> AP queue devices must be filtered before assigning them to the KVM guest's
> APCB; however, the AP architecture precludes filtering individual APQNs, so
> the APQNs will be filtered by APID. That is, if a given APQN does not
> reference a queue device bound to the vfio_ap driver, its APID will not
> get assigned to the guest's APCB. For example:
> 
> Queues bound to vfio_ap:
> 04.0004
> 04.0022
> 04.0035
> 05.0004
> 05.0022
> 
> Adapters/domains assigned to the matrix mdev:
> 04 0004
>0022
>0035
> 05 0004
>0022
>0035
> 
> APQNs assigned to APCB:
> 04.0004
> 04.0022
> 04.0035
> 
> The APID 05 was filtered from the matrix mdev's matrix because
> queue device 05.0035 is not bound to the vfio_ap device driver.
> 
> Signed-off-by: Tony Krowiak 
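
The filter-by-APID rule described in the quoted commit message can be sketched
in userspace C as follows. This is only an illustration of the rule, not the
vfio_ap implementation; struct apqn, queue_is_bound() and filter_apids() are
hypothetical names, and plain arrays stand in for the driver's bitmaps:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical representation of one APQN (adapter ID + domain index). */
struct apqn {
	unsigned int apid;
	unsigned int apqi;
};

/* Return true if the queue apid.apqi is in the list of bound queues. */
static bool queue_is_bound(const struct apqn *bound, size_t n_bound,
			   unsigned int apid, unsigned int apqi)
{
	for (size_t i = 0; i < n_bound; i++)
		if (bound[i].apid == apid && bound[i].apqi == apqi)
			return true;
	return false;
}

/*
 * Keep an APID only if every APQN formed with the assigned domains
 * references a bound queue. Writes surviving APIDs to kept[] (which must
 * have room for n_apids entries) and returns how many were kept.
 */
static size_t filter_apids(const unsigned int *apids, size_t n_apids,
			   const unsigned int *apqis, size_t n_apqis,
			   const struct apqn *bound, size_t n_bound,
			   unsigned int *kept)
{
	size_t n_kept = 0;

	assert(kept != NULL);
	for (size_t i = 0; i < n_apids; i++) {
		bool all_bound = true;

		for (size_t j = 0; j < n_apqis && all_bound; j++)
			all_bound = queue_is_bound(bound, n_bound,
						   apids[i], apqis[j]);
		if (all_bound)
			kept[n_kept++] = apids[i];
	}
	return n_kept;
}
```

Fed with the example from the commit message (adapters 04 and 05, domains
0004, 0022 and 0035, with queue 05.0035 unbound), only adapter 04 should
survive.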

This adds filtering. So from here guest_matrix may be different
than matrix also for an mdev that is associated with a guest. I'm still
grappling with the big picture. Have you thought about testability?
How is a testcase supposed to figure out which behavior is
to be deemed correct?

I don't like the title line. It implies that guest apcb was
uninitialized before. Which is not the case.






Re: [PATCH bpf-next 1/2] bpf: Add a bpf_kallsyms_lookup helper

2020-11-28 Thread Alexei Starovoitov
On Thu, Nov 26, 2020 at 05:57:47PM +0100, Florent Revest wrote:
> This helper exposes the kallsyms_lookup function to eBPF tracing
> programs. This can be used to retrieve the name of the symbol at an
> address. For example, when hooking into nf_register_net_hook, one can
> audit the name of the registered netfilter hook and potentially also
> the name of the module in which the symbol is located.
> 
> Signed-off-by: Florent Revest 
> ---
>  include/uapi/linux/bpf.h   | 16 +
>  kernel/trace/bpf_trace.c   | 41 ++
>  tools/include/uapi/linux/bpf.h | 16 +
>  3 files changed, 73 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c3458ec1f30a..670998635eac 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3817,6 +3817,21 @@ union bpf_attr {
>   *   The **hash_algo** is returned on success,
>   *   **-EOPNOTSUP** if IMA is disabled or **-EINVAL** if
>   *   invalid arguments are passed.
> + *
> + * long bpf_kallsyms_lookup(u64 address, char *symbol, u32 symbol_size, char 
> *module, u32 module_size)
> + *   Description
> + *   Uses kallsyms to write the name of the symbol at *address*
> + *   into *symbol* of size *symbol_size*. This is guaranteed to be
> + *   zero terminated.
> + *   If the symbol is in a module, up to *module_size* bytes of
> + *   the module name is written in *module*. This is also
> + *   guaranteed to be zero-terminated. Note: a module name
> + *   is always shorter than 64 bytes.
> + *   Return
> + *   On success, the strictly positive length of the full symbol
> + *   name. If this is greater than *symbol_size*, the written
> + *   symbol is truncated.
> + *   On error, a negative value.
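
The return convention documented above (report the full length, truncate what
is written) can be illustrated with a small userspace analogue. This is not
the helper's implementation; copy_name() is a hypothetical function, and it
assumes that one byte of the buffer is reserved for the terminating NUL:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: copy name into dst (dst_size bytes), always
 * NUL-terminating, and return the full length of name so that the caller
 * can detect truncation. Returns -1 if dst_size is 0.
 */
static long copy_name(char *dst, size_t dst_size, const char *name)
{
	size_t len, n;

	assert(dst != NULL && name != NULL);
	if (dst_size == 0)
		return -1;

	len = strlen(name);
	n = len < dst_size - 1 ? len : dst_size - 1;
	memcpy(dst, name, n);
	dst[n] = '\0';

	return (long)len;
}
```

A caller would treat a return value greater than or equal to the buffer size
as a sign that the written name was cut short.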

Looks like a debug-only helper.
I cannot think of a way to use it in production code.
What is the program supposed to do with that string?
Do a string compare? The BPF side doesn't have a good way to do string
manipulation.
If you really need to print a symbolic name for a given address,
I'd rather extend bpf_trace_printk() to support %pS.


[RFC PATCH 11/13] fs/userfaultfd: complete write asynchronously

2020-11-28 Thread Nadav Amit
From: Nadav Amit 

Userfaultfd writes can now be used for copy/zeroing. When using iouring
with userfaultfd, performing the copying/zeroing on the faulting thread
instead of the handler/iouring thread has several advantages:

(1) The data of the faulting thread will be available on the local
caches, which would make subsequent memory accesses faster.

(2) find_vma() would be able to use the vma-cache, which cannot be done
from a different process or io-uring kernel thread.

(3) The page is more likely to be allocated on the correct NUMA node.

To do so, userfaultfd work queue structs are extended to hold the
information that is required for the faulting thread to copy/zero. The
handler wakes one of the faulting threads to perform the copy/zero and
that thread wakes the other threads after the zero/copy is done.
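
The hand-off described above relies on the waker filling in the per-waiter
information before the "waken" flag becomes visible. A minimal userspace
analogue using C11 release/acquire atomics is sketched below; the kernel patch
uses smp_wmb() and WRITE_ONCE() instead, and struct waiter, publish_wake() and
consume_wake() are hypothetical names introduced only for this sketch:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the information passed to the faulting thread. */
struct wake_info {
	unsigned long start;
	unsigned long len;
};

struct waiter {
	struct wake_info info;
	atomic_bool waken;
};

/* Waker side: publish the info first, then set the flag with release order. */
static void publish_wake(struct waiter *w, const struct wake_info *info)
{
	assert(w != NULL && info != NULL);
	w->info = *info;
	atomic_store_explicit(&w->waken, true, memory_order_release);
}

/*
 * Waiter side: an acquire load of the flag guarantees that the info written
 * before the release store is visible. Returns false if not woken yet.
 */
static bool consume_wake(struct waiter *w, struct wake_info *out)
{
	if (!atomic_load_explicit(&w->waken, memory_order_acquire))
		return false;
	*out = w->info;
	return true;
}
```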

Cc: Jens Axboe 
Cc: Andrea Arcangeli 
Cc: Peter Xu 
Cc: Alexander Viro 
Cc: io-ur...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Nadav Amit 
---
 fs/userfaultfd.c | 241 ++-
 1 file changed, 178 insertions(+), 63 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index eae6ac303951..5c22170544e3 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -105,58 +105,71 @@ struct userfaultfd_unmap_ctx {
struct list_head list;
 };
 
+struct userfaultfd_wake_info {
+   __u64 mode;
+   struct kiocb *iocb_callback;
+   struct iov_iter from;
+   unsigned long start;
+   unsigned long len;
+   bool copied;
+};
+
 struct userfaultfd_wait_queue {
struct uffd_msg msg;
wait_queue_entry_t wq;
struct userfaultfd_ctx *ctx;
+   struct userfaultfd_wake_info wake_info;
bool waken;
 };
 
-struct userfaultfd_wake_range {
-   unsigned long start;
-   unsigned long len;
-};
+
 
 static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode,
 int wake_flags, void *key)
 {
-   struct userfaultfd_wake_range *range = key;
-   int ret;
+   struct userfaultfd_wake_info *wake_info = key;
struct userfaultfd_wait_queue *uwq;
unsigned long start, len;
+   int ret = 0;
 
uwq = container_of(wq, struct userfaultfd_wait_queue, wq);
-   ret = 0;
/* len == 0 means wake all */
-   start = range->start;
-   len = range->len;
+   start = wake_info->start;
+   len = wake_info->len;
if (len && (start > uwq->msg.arg.pagefault.address ||
start + len <= uwq->msg.arg.pagefault.address))
goto out;
 
-   smp_store_mb(uwq->waken, true);
+   uwq->wake_info = *wake_info;
+
+   if (wake_info->iocb_callback)
+   wake_info->copied = true;
+
+   /* Ensure uwq->wake_info is visible to handle_userfault() before waken */
+   smp_wmb();
+
+   WRITE_ONCE(uwq->waken, true);
 
/*
 * The Program-Order guarantees provided by the scheduler
 * ensure uwq->waken is visible before the task is woken.
 */
ret = wake_up_state(wq->private, mode);
-   if (ret) {
-   /*
-* Wake only once, autoremove behavior.
-*
-* After the effect of list_del_init is visible to the other
-* CPUs, the waitqueue may disappear from under us, see the
-* !list_empty_careful() in handle_userfault().
-*
-* try_to_wake_up() has an implicit smp_mb(), and the
-* wq->private is read before calling the extern function
-* "wake_up_state" (which in turns calls try_to_wake_up).
-*/
-   list_del_init(&wq->entry);
-   }
+
+   /*
+* Wake only once, autoremove behavior.
+*
+* After the effect of list_del_init is visible to the other
+* CPUs, the waitqueue may disappear from under us, see the
+* !list_empty_careful() in handle_userfault().
+*
+* try_to_wake_up() has an implicit smp_mb(), and the
+* wq->private is read before calling the extern function
+* "wake_up_state" (which in turns calls try_to_wake_up).
+*/
+   list_del_init(&wq->entry);
 out:
-   return ret;
+   return ret || wake_info->copied;
 }
 
 /**
@@ -384,6 +397,9 @@ static bool userfaultfd_get_async_complete_locked(struct 
userfaultfd_ctx *ctx,
return true;
 }
 
+static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
+  struct userfaultfd_wake_info 
*wake_info);
+
 static bool userfaultfd_get_async_complete(struct userfaultfd_ctx *ctx,
struct kiocb **iocb, struct iov_iter *iter)
 {
@@ -414,6 +430,43 @@ static void userfaultfd_copy_async_msg(struct kiocb *iocb,
iter->kvec = NULL;
 }
 
+static void 

[RFC PATCH 10/13] fs/userfaultfd: add write_iter() interface

2020-11-28 Thread Nadav Amit
From: Nadav Amit 

In order to use userfaultfd with io-uring, there are two options for
extensions: support userfaultfd ioctls or provide similar functionality
through the "write" interface. The latter approach seems more compelling
as it does not require io-uring changes, and keeps all the logic of
userfaultfd where it should be. In addition, it allows asynchronous
completions to be provided by performing the copying/zeroing in the
faulting thread (which will be done in a later patch).

This patch enhances the userfaultfd API to provide a write interface that
performs operations similar to the copy/zero ioctls. The lower bits of the
position (those below PAGE_SHIFT) encode the required operation:
zero/copy/wake/write-protect. In the case of zeroing, the
source data is ignored and only the length is being used to determine
the size of the data that needs to be zeroed.
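
A minimal sketch of the position encoding described above, in userspace C. The
page size of 4096 bytes and the SKETCH_MODE_* bit values are assumptions made
for illustration only; the real UFFDIO_WRITE_MODE_* values are defined by the
patch's uapi header, which is not reproduced here:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page geometry for the sketch (4 KiB pages). */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(UINT64_C(1) << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK	(~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical mode bits, all below PAGE_SHIFT. */
#define SKETCH_MODE_DONTWAKE	(UINT64_C(1) << 0)
#define SKETCH_MODE_WP		(UINT64_C(1) << 1)
#define SKETCH_MODE_ZEROPAGE	(UINT64_C(1) << 2)

/* Combine a page-aligned destination and mode bits into a file position. */
static uint64_t encode_pos(uint64_t dst, uint64_t mode)
{
	assert((dst & ~SKETCH_PAGE_MASK) == 0);	/* dst must be page aligned */
	assert((mode & SKETCH_PAGE_MASK) == 0);	/* mode must fit below a page */
	return dst | mode;
}

/* Split a position back into its parts, as the write_iter handler does. */
static uint64_t pos_to_dst(uint64_t pos)
{
	return pos & SKETCH_PAGE_MASK;
}

static uint64_t pos_to_mode(uint64_t pos)
{
	return pos & ~SKETCH_PAGE_MASK;
}
```

Under these assumptions, a caller would pass encode_pos(dst, mode) as the
offset of a pwrite()-style call on the userfaultfd file descriptor.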

Cc: Jens Axboe 
Cc: Andrea Arcangeli 
Cc: Peter Xu 
Cc: Alexander Viro 
Cc: io-ur...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Nadav Amit 
---
 fs/userfaultfd.c | 96 +++-
 include/uapi/linux/userfaultfd.h | 14 -
 2 files changed, 107 insertions(+), 3 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 7bbee2a00d37..eae6ac303951 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1140,6 +1140,34 @@ static __poll_t userfaultfd_poll(struct file *file, 
poll_table *wait)
 
 static const struct file_operations userfaultfd_fops;
 
+/* Open-coded version of anon_inode_getfd() to setup FMODE_PWRITE */
+static int userfaultfd_getfd(const char *name, const struct file_operations 
*fops,
+void *priv, int flags)
+{
+   int error, fd;
+   struct file *file;
+
+   error = get_unused_fd_flags(flags);
+   if (error < 0)
+   return error;
+   fd = error;
+
+   file = anon_inode_getfile(name, fops, priv, flags);
+
+   if (IS_ERR(file)) {
+   error = PTR_ERR(file);
+   goto err_put_unused_fd;
+   }
+   file->f_mode |= FMODE_PWRITE;
+   fd_install(fd, file);
+
+   return fd;
+
+err_put_unused_fd:
+   put_unused_fd(fd);
+   return error;
+}
+
 static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
  struct userfaultfd_ctx *new,
  struct uffd_msg *msg)
@@ -1161,7 +1189,7 @@ static int resolve_userfault_fork(struct userfaultfd_ctx 
*ctx,
task_unlock(current);
}
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
+   fd = userfaultfd_getfd("[userfaultfd]", &userfaultfd_fops, new,
  O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
 
if (files != NULL) {
@@ -1496,6 +1524,69 @@ static __always_inline int validate_range(struct 
mm_struct *mm,
return 0;
 }
 
+ssize_t userfaultfd_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+   struct file *file = iocb->ki_filp;
+   struct userfaultfd_wake_range range;
+   struct userfaultfd_ctx *ctx = file->private_data;
+   size_t len = iov_iter_count(from);
+   __u64 dst = iocb->ki_pos & PAGE_MASK;
+   unsigned long mode = iocb->ki_pos & ~PAGE_MASK;
+   bool zeropage;
+   __s64 ret;
+
+   BUG_ON(len == 0);
+
+   zeropage = mode & UFFDIO_WRITE_MODE_ZEROPAGE;
+
+   ret = -EINVAL;
+   if (mode & ~(UFFDIO_WRITE_MODE_DONTWAKE | UFFDIO_WRITE_MODE_WP |
+UFFDIO_WRITE_MODE_ZEROPAGE))
+   goto out;
+
+   mode = mode & (UFFDIO_WRITE_MODE_DONTWAKE | UFFDIO_WRITE_MODE_WP);
+
+   /*
+* Keep compatibility with zeropage ioctl, which does not allow
+* write-protect and dontwake.
+*/
+   if (zeropage &&
+   (mode & (UFFDIO_WRITE_MODE_DONTWAKE | UFFDIO_WRITE_MODE_WP)) ==
+(UFFDIO_WRITE_MODE_DONTWAKE | UFFDIO_WRITE_MODE_WP))
+   goto out;
+
+   ret = -EAGAIN;
+   if (READ_ONCE(ctx->mmap_changing))
+   goto out;
+
+   ret = validate_range(ctx->mm, &dst, len);
+   if (ret)
+   goto out;
+
+   if (mmget_not_zero(ctx->mm)) {
+   if (zeropage)
+   ret = mfill_zeropage(ctx->mm, dst, from,
+&ctx->mmap_changing);
+   else
+   ret = mcopy_atomic(ctx->mm, dst, from,
+  &ctx->mmap_changing, mode);
+   mmput(ctx->mm);
+   } else {
+   return -ESRCH;
+   }
+   if (ret < 0)
+   goto out;
+
+   /* len == 0 would wake all */
+   range.len = ret;
+   if (!(mode & UFFDIO_COPY_MODE_DONTWAKE)) {
+   range.start = dst;
+   wake_userfault(ctx, &range);
+   }
+out:
+   return ret;
+}
+
 static inline bool vma_can_userfault(struct vm_area_struct *vma,

[RFC PATCH 09/13] fs/userfaultfd: use iov_iter for copy/zero

2020-11-28 Thread Nadav Amit
From: Nadav Amit 

Use iov_iter for copy and zero ioctls. This is done in preparation to
support a write_iter() interface that would provide similar services as
UFFDIO_COPY/ZERO.

In the case of UFFDIO_ZERO, the iov_iter is not really used for any
purpose other than providing the length of the range that is zeroed.

Cc: Mike Kravetz 
Cc: Andrea Arcangeli 
Cc: Peter Xu 
Cc: Jens Axboe 
Cc: Alexander Viro 
Cc: io-ur...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Nadav Amit 
---
 fs/userfaultfd.c  | 21 ++--
 include/linux/hugetlb.h   |  4 +-
 include/linux/mm.h|  6 +--
 include/linux/shmem_fs.h  |  2 +-
 include/linux/userfaultfd_k.h | 10 ++--
 mm/hugetlb.c  | 12 +++--
 mm/memory.c   | 36 ++---
 mm/shmem.c| 17 +++
 mm/userfaultfd.c  | 96 +--
 9 files changed, 102 insertions(+), 102 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index db1a963f6ae2..7bbee2a00d37 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1914,6 +1914,8 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
struct uffdio_copy uffdio_copy;
struct uffdio_copy __user *user_uffdio_copy;
struct userfaultfd_wake_range range;
+   struct iov_iter iter;
+   struct iovec iov;
 
user_uffdio_copy = (struct uffdio_copy __user *) arg;
 
@@ -1940,10 +1942,15 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
goto out;
if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP))
goto out;
+
+   ret = import_single_range(READ, (__force void __user *)uffdio_copy.src,
+ uffdio_copy.len, &iov, &iter);
+   if (unlikely(ret))
+   return ret;
+
if (mmget_not_zero(ctx->mm)) {
-   ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
-  uffdio_copy.len, &ctx->mmap_changing,
-  uffdio_copy.mode);
+   ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, &iter,
+  &ctx->mmap_changing, uffdio_copy.mode);
mmput(ctx->mm);
} else {
return -ESRCH;
@@ -1971,6 +1978,8 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx 
*ctx,
struct uffdio_zeropage uffdio_zeropage;
struct uffdio_zeropage __user *user_uffdio_zeropage;
struct userfaultfd_wake_range range;
+   struct iov_iter iter;
+   struct iovec iov;
 
user_uffdio_zeropage = (struct uffdio_zeropage __user *) arg;
 
@@ -1992,10 +2001,12 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx 
*ctx,
if (uffdio_zeropage.mode & ~UFFDIO_ZEROPAGE_MODE_DONTWAKE)
goto out;
 
+   ret = import_single_range(READ, (__force void __user *)0,
+ uffdio_zeropage.range.len, &iov, &iter);
+
if (mmget_not_zero(ctx->mm)) {
ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
-uffdio_zeropage.range.len,
-&ctx->mmap_changing);
+&iter, &ctx->mmap_changing);
mmput(ctx->mm);
} else {
return -ESRCH;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ebca2ef02212..2f3452e0bb84 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -137,7 +137,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct 
vm_area_struct *vma,
 int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
-   unsigned long src_addr,
+   struct iov_iter *iter,
struct page **pagep);
 int hugetlb_reserve_pages(struct inode *inode, long from, long to,
struct vm_area_struct *vma,
@@ -312,7 +312,7 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct 
*dst_mm,
pte_t *dst_pte,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
-   unsigned long src_addr,
+   struct iov_iter *iter,
struct page **pagep)
 {
BUG();
diff --git a/include/linux/mm.h b/include/linux/mm.h
index db6ae4d3fb4e..1f183c441d89 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3085,10 +3085,10 @@ extern void copy_user_huge_page(struct page *dst, 
struct page *src,
unsigned long 

[RFC PATCH 08/13] fs/userfaultfd: complete reads asynchronously

2020-11-28 Thread Nadav Amit
From: Nadav Amit 

Add support for asynchronous completion of reads, so that io_uring can
complete userfaultfd reads asynchronously.

Reads, which report page-faults and events, can only be performed
asynchronously if the read is performed into a kernel buffer, which
guarantees that no page-fault occurs during the completion of the read.
Otherwise, we would need to handle nested page-faults or do expensive
pinning/unpinning of the pages into which the read is performed.

Userfaultfd holds in its context the kiocb and iov_iter that would be
used for the next asynchronous read (can be extended later into a list
to hold more than a single enqueued read).  If such a buffer is
available and a fault occurs, the fault is reported to the user and the
fault is added to the fault workqueue instead of the pending-fault
workqueue.

There is a need to prevent a race between synchronous and asynchronous
reads, so reads will first use buffers that were previously enqueued and
only later pending-faults and events. For this purpose a new
"notification" lock is introduced that is held while enqueuing new
events and pending faults and during event reads. It may be possible to
use the fd_wqh.lock instead, but having a separate lock for this purpose
seems cleaner.

Cc: Jens Axboe 
Cc: Andrea Arcangeli 
Cc: Peter Xu 
Cc: Alexander Viro 
Cc: io-ur...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Nadav Amit 
---
 fs/userfaultfd.c | 265 +--
 1 file changed, 235 insertions(+), 30 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 6333b4632742..db1a963f6ae2 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -44,9 +44,10 @@ enum userfaultfd_state {
  *
  * Locking order:
  * fd_wqh.lock
- * fault_pending_wqh.lock
- * fault_wqh.lock
- * event_wqh.lock
+ * notification_lock
+ * fault_pending_wqh.lock
+ * fault_wqh.lock
+ * event_wqh.lock
  *
  * To avoid deadlocks, IRQs must be disabled when taking any of the above 
locks,
  * since fd_wqh.lock is taken by aio_poll() while it's holding a lock that's
@@ -79,6 +80,16 @@ struct userfaultfd_ctx {
struct mm_struct *mm;
/* controlling process files as they might be different than current */
struct files_struct *files;
+   /*
+* lock for sync and async userfaultfd reads, which must be held when
+* enqueueing into fault_pending_wqh or event_wqh, upon userfaultfd
+* reads and on accesses of iocb_callback and to.
+*/
+   spinlock_t notification_lock;
+   /* kiocb struct that is used for the next asynchronous read */
+   struct kiocb *iocb_callback;
+   /* the iterator that is used for the next asynchronous read */
+   struct iov_iter to;
 };
 
 struct userfaultfd_fork_ctx {
@@ -356,6 +367,53 @@ static inline long userfaultfd_get_blocking_state(unsigned 
int flags)
return TASK_UNINTERRUPTIBLE;
 }
 
+static bool userfaultfd_get_async_complete_locked(struct userfaultfd_ctx *ctx,
+   struct kiocb **iocb, struct iov_iter *iter)
+{
+   if (!ctx->released)
+   lockdep_assert_held(&ctx->notification_lock);
+
+   if (ctx->iocb_callback == NULL)
+   return false;
+
+   *iocb = ctx->iocb_callback;
+   *iter = ctx->to;
+
+   ctx->iocb_callback = NULL;
+   ctx->to.kvec = NULL;
+   return true;
+}
+
+static bool userfaultfd_get_async_complete(struct userfaultfd_ctx *ctx,
+   struct kiocb **iocb, struct iov_iter *iter)
+{
+   bool r;
+
+   spin_lock_irq(&ctx->notification_lock);
+   r = userfaultfd_get_async_complete_locked(ctx, iocb, iter);
+   spin_unlock_irq(&ctx->notification_lock);
+   return r;
+}
+
+static void userfaultfd_copy_async_msg(struct kiocb *iocb,
+  struct iov_iter *iter,
+  struct uffd_msg *msg,
+  int ret)
+{
+
+   const struct kvec *kvec = iter->kvec;
+
+   if (ret == 0)
+   ret = copy_to_iter(msg, sizeof(*msg), iter);
+
+   /* Should never fail as we guarantee that we use a kernel buffer */
+   WARN_ON_ONCE(ret != sizeof(*msg));
+   iocb->ki_complete(iocb, ret, 0);
+
+   kfree(kvec);
+   iter->kvec = NULL;
+}
+
 /*
  * The locking rules involved in returning VM_FAULT_RETRY depending on
  * FAULT_FLAG_ALLOW_RETRY, FAULT_FLAG_RETRY_NOWAIT and
@@ -380,6 +438,10 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned 
long reason)
bool must_wait;
long blocking_state;
bool poll;
+   bool async = false;
+   struct kiocb *iocb;
+   struct iov_iter iter;
+   wait_queue_head_t *wqh;
 
/*
 * We don't do userfault handling for the final child pid 

[RFC PATCH 13/13] selftests/vm/userfaultfd: iouring and polling tests

2020-11-28 Thread Nadav Amit
From: Nadav Amit 

Add tests to check the use of userfaultfd with iouring, with the "write"
interface of userfaultfd, and with the "poll" feature of userfaultfd.

The tests are enabled through new test "modifiers" ("poll", "write" and
"iouring") that are appended to the test name after a colon. The "shmem" test
does not work with the "iouring" modifier. The signal test does not appear to
be suitable for iouring as it might leave the ring in a dubious state.

Introduce a uffd_base_ops struct that holds functions for
read/copy/zero/etc operations using ioctls or alternatively writes or
iouring. Adapting the tests for iouring is slightly complicated, as
operations on iouring must be synchronized. Access to the iouring is
therefore protected by a mutex. Reads are performed into several
preallocated buffers, which are protected by another mutex. Whenever the
iouring completion queue is polled, the caller must handle any completed
read or write that it finds, even if it is waiting for a different event.

Each thread holds a local request ID which it uses to issue its own
non-read requests, under the assumption that only one such request will
be in flight at any given moment and that the issuing thread will wait
for its completion.

This change creates a dependency of the userfaultfd tests on iouring.

Cc: Jens Axboe 
Cc: Andrea Arcangeli 
Cc: Peter Xu 
Cc: Alexander Viro 
Cc: io-ur...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Nadav Amit 
---
 tools/testing/selftests/vm/Makefile  |   2 +-
 tools/testing/selftests/vm/userfaultfd.c | 824 +--
 2 files changed, 757 insertions(+), 69 deletions(-)

diff --git a/tools/testing/selftests/vm/Makefile 
b/tools/testing/selftests/vm/Makefile
index 30873b19d04b..4f88123530c5 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -127,6 +127,6 @@ warn_32bit_failure:
 endif
 endif
 
-$(OUTPUT)/userfaultfd: LDLIBS += -lpthread
+$(OUTPUT)/userfaultfd: LDLIBS += -lpthread -luring
 
 $(OUTPUT)/mlock-random-test: LDLIBS += -lcap
diff --git a/tools/testing/selftests/vm/userfaultfd.c 
b/tools/testing/selftests/vm/userfaultfd.c
index f7e6cf43db71..9077167b3e77 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -55,6 +55,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../kselftest.h"
 
@@ -73,6 +74,13 @@ static int bounces;
 #define TEST_SHMEM 3
 static int test_type;
 
+#define MOD_IOURING(0)
+#define MOD_WRITE  (1)
+#define MOD_POLL   (2)
+#define N_MODIFIERS(MOD_POLL+1)
+static bool test_mods[N_MODIFIERS];
+const char *mod_strs[N_MODIFIERS] = {"iouring", "write", "poll"};
+
 /* exercise the test_uffdio_*_eexist every ALARM_INTERVAL_SECS */
 #define ALARM_INTERVAL_SECS 10
 static volatile bool test_uffdio_copy_eexist = true;
@@ -111,6 +119,12 @@ struct uffd_stats {
 ~(unsigned long)(sizeof(unsigned long long) \
  -  1)))
 
+/*
+ * async indication that no result was provided. Must be different than any
+ * existing error code.
+ */
+#define RES_NOT_DONE   (-)
+
 const char *examples =
 "# Run anonymous memory test on 100MiB region with 9 bounces:\n"
 "./userfaultfd anon 100 9\n\n"
@@ -122,7 +136,10 @@ const char *examples =
 "./userfaultfd hugetlb_shared 256 50 /dev/hugepages/hugefile\n\n"
 "# 10MiB-~6GiB 999 bounces anonymous test, "
 "continue forever unless an error triggers\n"
-"while ./userfaultfd anon $[RANDOM % 6000 + 10] 999; do true; done\n\n";
+"while ./userfaultfd anon $[RANDOM % 6000 + 10] 999; do true; done\n"
+"# Run anonymous memory test on 100MiB region with 99 bounces, "
+"polling on faults with iouring interface\n"
+"./userfaultfd anon:iouring:poll 100 99\n\n";
 
 static void usage(void)
 {
@@ -288,6 +305,13 @@ struct uffd_test_ops {
void (*alias_mapping)(__u64 *start, size_t len, unsigned long offset);
 };
 
+struct uffd_base_ops {
+   bool (*poll_msg)(int ufd, unsigned long cpu);
+   int (*read_msg)(int ufd, struct uffd_msg *msg);
+   int (*copy)(int ufd, struct uffdio_copy *uffdio_copy);
+   int (*zero)(int ufd, struct uffdio_zeropage *zeropage);
+};
+
 #define SHMEM_EXPECTED_IOCTLS  ((1 << _UFFDIO_WAKE) | \
 (1 << _UFFDIO_COPY) | \
 (1 << _UFFDIO_ZEROPAGE))
@@ -465,13 +489,417 @@ static void *locking_thread(void *arg)
return NULL;
 }
 
+__thread int local_req_id;
+
+#define READ_QUEUE_DEPTH   (16)
+
+struct uffd_msg *iouring_read_msgs;
+
+static struct io_uring ring;
+
+/* ring_mutex - protects the iouring */
+pthread_mutex_t ring_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+/* async_mutex - protects iouring_read_msgs */
+pthread_mutex_t async_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+static volatile ssize_t *ring_results;
+
+enum 

[RFC PATCH 07/13] fs/userfaultfd: support read_iter to use io_uring

2020-11-28 Thread Nadav Amit
From: Nadav Amit 

iouring cannot currently be used with fixed buffers on userfaultfd, since
userfaultfd does not provide read_iter(). read_iter() is also required to
allow asynchronous (queued) reads from userfaultfd.

To support async reads of userfaultfd, provide read_iter() instead of
read().
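
The read loop that this conversion keeps (return as many whole messages as
fit, and report an error only if nothing was copied) can be modelled in
userspace as follows. This is a sketch under assumptions: struct msg and
struct msg_src are hypothetical stand-ins for uffd_msg and the pending
fault queue, and memcpy() stands in for copy_to_iter().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct uffd_msg. */
struct msg {
	unsigned char event;
	unsigned long arg;
};

/* Hypothetical stand-in for the queue of pending messages. */
struct msg_src {
	const struct msg *msgs;
	size_t n;
	size_t next;
};

/*
 * Copy as many whole messages as fit into buf. An error is returned only
 * when no message at all was copied, mirroring "return ret ? ret : err".
 */
static ssize_t read_msgs(struct msg_src *src, void *buf, size_t count)
{
	char *out = buf;
	ssize_t ret = 0;

	assert(src && buf);

	for (;;) {
		if (count < sizeof(struct msg))
			return ret ? ret : -EINVAL;
		if (src->next == src->n)
			return ret ? ret : -EAGAIN;

		memcpy(out, &src->msgs[src->next++], sizeof(struct msg));
		out += sizeof(struct msg);
		count -= sizeof(struct msg);
		ret += sizeof(struct msg);
	}
}
```

With an iov_iter the explicit "out" and "count" bookkeeping disappears,
because the iterator tracks the remaining space itself; that is why the
patch can drop the "buf +=" and "count -=" lines.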

Cc: Jens Axboe 
Cc: Andrea Arcangeli 
Cc: Peter Xu 
Cc: Alexander Viro 
Cc: io-ur...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Nadav Amit 
---
 fs/userfaultfd.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index b6a04e526025..6333b4632742 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1195,9 +1195,9 @@ static ssize_t userfaultfd_ctx_read(struct 
userfaultfd_ctx *ctx, int no_wait,
return ret;
 }
 
-static ssize_t userfaultfd_read(struct file *file, char __user *buf,
-   size_t count, loff_t *ppos)
+static ssize_t userfaultfd_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
+   struct file *file = iocb->ki_filp;
struct userfaultfd_ctx *ctx = file->private_data;
ssize_t _ret, ret = 0;
struct uffd_msg msg;
@@ -1207,16 +1207,18 @@ static ssize_t userfaultfd_read(struct file *file, char 
__user *buf,
return -EINVAL;
 
for (;;) {
-   if (count < sizeof(msg))
+   if (iov_iter_count(to) < sizeof(msg))
return ret ? ret : -EINVAL;
_ret = userfaultfd_ctx_read(ctx, no_wait, &msg);
if (_ret < 0)
return ret ? ret : _ret;
-   if (copy_to_user((__u64 __user *) buf, &msg, sizeof(msg)))
-   return ret ? ret : -EFAULT;
+
+   _ret = copy_to_iter(&msg, sizeof(msg), to);
+   if (_ret != sizeof(msg))
+   return ret ? ret : -EINVAL;
+
ret += sizeof(msg);
-   buf += sizeof(msg);
-   count -= sizeof(msg);
+
/*
 * Allow to read more than one fault at time but only
 * block if waiting for the very first one.
@@ -1980,7 +1982,7 @@ static const struct file_operations userfaultfd_fops = {
 #endif
.release= userfaultfd_release,
.poll   = userfaultfd_poll,
-   .read   = userfaultfd_read,
+   .read_iter  = userfaultfd_read_iter,
.unlocked_ioctl = userfaultfd_ioctl,
.compat_ioctl   = compat_ptr_ioctl,
.llseek = noop_llseek,
-- 
2.25.1



[RFC PATCH 06/13] iov_iter: support atomic copy_page_from_iter_iovec()

2020-11-28 Thread Nadav Amit
From: Nadav Amit 

copy_page_from_iter_iovec() cannot be used in atomic context (for
instance, when preemption is disabled), since it may fault.

Add an "atomic" parameter to copy_page_from_iter_iovec() and introduce
__copy_page_from_iter(), which takes the same parameter to say whether the
caller runs in atomic context. When __copy_page_from_iter() is used in an
atomic context it fails gracefully instead of risking a deadlock. The
caller is expected to recover from such a failure.

Cc: Jens Axboe 
Cc: Andrea Arcangeli 
Cc: Peter Xu 
Cc: Alexander Viro 
Cc: io-ur...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Nadav Amit 
---
 include/linux/uio.h |  3 +++
 lib/iov_iter.c  | 23 +--
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 72d88566694e..7c90f7371a6f 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -121,6 +121,9 @@ size_t copy_page_to_iter(struct page *page, size_t offset, 
size_t bytes,
 struct iov_iter *i);
 size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
 struct iov_iter *i);
+size_t __copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
+struct iov_iter *i, bool atomic);
+
 
 size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
 size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 1635111c5bd2..e597df6a46a7 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -246,7 +246,7 @@ static size_t copy_page_to_iter_iovec(struct page *page, 
size_t offset, size_t b
 }
 
 static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, 
size_t bytes,
-struct iov_iter *i)
+struct iov_iter *i, bool atomic)
 {
size_t skip, copy, left, wanted;
const struct iovec *iov;
@@ -259,14 +259,15 @@ static size_t copy_page_from_iter_iovec(struct page 
*page, size_t offset, size_t
if (unlikely(!bytes))
return 0;
 
-   might_fault();
+   if (!atomic)
+   might_fault();
wanted = bytes;
iov = i->iov;
skip = i->iov_offset;
buf = iov->iov_base + skip;
copy = min(bytes, iov->iov_len - skip);
 
-   if (IS_ENABLED(CONFIG_HIGHMEM) && !fault_in_pages_readable(buf, copy)) {
+   if (atomic || (IS_ENABLED(CONFIG_HIGHMEM) && 
!fault_in_pages_readable(buf, copy))) {
kaddr = kmap_atomic(page);
to = kaddr + offset;
 
@@ -295,6 +296,9 @@ static size_t copy_page_from_iter_iovec(struct page *page, 
size_t offset, size_t
buf += copy;
kunmap_atomic(kaddr);
copy = min(bytes, iov->iov_len - skip);
+   if (atomic)
+   goto done;
+
}
/* Too bad - revert to non-atomic kmap */
 
@@ -929,8 +933,8 @@ size_t copy_page_to_iter(struct page *page, size_t offset, 
size_t bytes,
 }
 EXPORT_SYMBOL(copy_page_to_iter);
 
-size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
-struct iov_iter *i)
+size_t __copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
+struct iov_iter *i, bool atomic)
 {
if (unlikely(!page_copy_sane(page, offset, bytes)))
return 0;
@@ -944,7 +948,14 @@ size_t copy_page_from_iter(struct page *page, size_t 
offset, size_t bytes,
kunmap_atomic(kaddr);
return wanted;
} else
-   return copy_page_from_iter_iovec(page, offset, bytes, i);
+   return copy_page_from_iter_iovec(page, offset, bytes, i, 
atomic);
+}
+EXPORT_SYMBOL(__copy_page_from_iter);
+
+size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
+struct iov_iter *i)
+{
+   return __copy_page_from_iter(page, offset, bytes, i, false);
 }
 EXPORT_SYMBOL(copy_page_from_iter);
 
-- 
2.25.1



[RFC PATCH 05/13] fs/userfaultfd: introduce UFFD_FEATURE_POLL

2020-11-28 Thread Nadav Amit
From: Nadav Amit 

Add a feature UFFD_FEATURE_POLL that makes the faulting thread spin
while waiting for the page-fault to be handled.

Users of this feature should place the page-fault handling thread on
another physical CPU and, if possible, ensure that there are available
cores to run the handler, as otherwise they will see performance
degradation.

We can later enhance it by setting one or two timeouts: one timeout
until the page-fault is handled and another until the handler is
woken.
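
The polling wait, together with the bounded variant suggested as a later
enhancement, can be sketched in userspace with C11 atomics. This is an
illustration under assumptions: struct fault_waiter, poll_for_wake() and
the max_spins bound are hypothetical names, sched_yield() stands in for
cond_resched(), and the kernel code uses READ_ONCE()/smp_store_mb() rather
than these atomics.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of the state the faulting thread polls. */
struct fault_waiter {
	atomic_bool waken;	/* set by the handler when the fault is resolved */
	atomic_bool released;	/* set when the context goes away */
};

static void fault_waiter_init(struct fault_waiter *w)
{
	assert(w);
	atomic_init(&w->waken, false);
	atomic_init(&w->released, false);
}

/* Handler side: publish the wake-up. */
static void wake_waiter(struct fault_waiter *w)
{
	atomic_store_explicit(&w->waken, true, memory_order_release);
}

/*
 * Faulting side: spin instead of sleeping. Returns true if woken, false
 * if the context was released or the (hypothetical) spin budget ran out.
 */
static bool poll_for_wake(struct fault_waiter *w, unsigned long max_spins)
{
	unsigned long i;

	for (i = 0; i < max_spins; i++) {
		if (atomic_load_explicit(&w->waken, memory_order_acquire))
			return true;
		if (atomic_load_explicit(&w->released, memory_order_acquire))
			return false;
		sched_yield();	/* let the handler run if it shares this CPU */
	}
	return atomic_load_explicit(&w->waken, memory_order_acquire);
}
```

A caller that sees poll_for_wake() return false because the budget ran out
could fall back to a blocking wait, which is one way to read the timeout
idea above.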

Cc: Jens Axboe 
Cc: Andrea Arcangeli 
Cc: Peter Xu 
Cc: Alexander Viro 
Cc: io-ur...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Nadav Amit 
---
 fs/userfaultfd.c | 24 
 include/uapi/linux/userfaultfd.h |  9 -
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index fedf7c1615d5..b6a04e526025 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -122,7 +122,9 @@ static int userfaultfd_wake_function(wait_queue_entry_t 
*wq, unsigned mode,
if (len && (start > uwq->msg.arg.pagefault.address ||
start + len <= uwq->msg.arg.pagefault.address))
goto out;
-   WRITE_ONCE(uwq->waken, true);
+
+   smp_store_mb(uwq->waken, true);
+
/*
 * The Program-Order guarantees provided by the scheduler
 * ensure uwq->waken is visible before the task is woken.
@@ -377,6 +379,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned 
long reason)
vm_fault_t ret = VM_FAULT_SIGBUS;
bool must_wait;
long blocking_state;
+   bool poll;
 
/*
 * We don't do userfault handling for the final child pid update.
@@ -410,6 +413,8 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned 
long reason)
if (ctx->features & UFFD_FEATURE_SIGBUS)
goto out;
 
+   poll = ctx->features & UFFD_FEATURE_POLL;
+
/*
 * If it's already released don't get it. This avoids to loop
 * in __get_user_pages if userfaultfd_release waits on the
@@ -495,7 +500,10 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned 
long reason)
 * following the spin_unlock to happen before the list_add in
 * __add_wait_queue.
 */
-   set_current_state(blocking_state);
+
+   if (!poll)
+   set_current_state(blocking_state);
+
spin_unlock_irq(&ctx->fault_pending_wqh.lock);
 
if (!is_vm_hugetlb_page(vmf->vma))
@@ -509,10 +517,18 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, 
unsigned long reason)
 
if (likely(must_wait && !READ_ONCE(ctx->released))) {
wake_up_poll(&ctx->fd_wqh, EPOLLIN);
-   schedule();
+   if (poll) {
+   while (!READ_ONCE(uwq.waken) && 
!READ_ONCE(ctx->released) &&
+  !signal_pending(current)) {
+   cpu_relax();
+   cond_resched();
+   }
+   } else
+   schedule();
}
 
-   __set_current_state(TASK_RUNNING);
+   if (!poll)
+   __set_current_state(TASK_RUNNING);
 
/*
 * Here we race with the list_del; list_add in
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index e7e98bde221f..4eeba4235afe 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -27,7 +27,9 @@
   UFFD_FEATURE_MISSING_HUGETLBFS | \
   UFFD_FEATURE_MISSING_SHMEM | \
   UFFD_FEATURE_SIGBUS |\
-  UFFD_FEATURE_THREAD_ID)
+  UFFD_FEATURE_THREAD_ID | \
+  UFFD_FEATURE_POLL)
+
 #define UFFD_API_IOCTLS\
((__u64)1 << _UFFDIO_REGISTER | \
 (__u64)1 << _UFFDIO_UNREGISTER |   \
@@ -171,6 +173,10 @@ struct uffdio_api {
 *
 * UFFD_FEATURE_THREAD_ID pid of the page faulted task_struct will
 * be returned, if feature is not requested 0 will be returned.
+*
+* UFFD_FEATURE_POLL polls upon page-fault if the feature is requested
+* instead of descheduling. This feature should only be enabled for
+* low-latency handlers and when CPUs are not overcommitted.
 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0)
 #define UFFD_FEATURE_EVENT_FORK(1<<1)
@@ -181,6 +187,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_EVENT_UNMAP   (1<<6)
 #define UFFD_FEATURE_SIGBUS(1<<7)
 #define UFFD_FEATURE_THREAD_ID (1<<8)
+#define UFFD_FEATURE_POLL  (1<<9)
__u64 features;
 
__u64 ioctls;
-- 
2.25.1



[RFC PATCH 12/13] fs/userfaultfd: kmem-cache for wait-queue objects

2020-11-28 Thread Nadav Amit
From: Nadav Amit 

Allocating wait-queue objects on the stack usually has negative
performance side-effects. First, it is hard to ensure alignment to
cache-lines without increasing the stack size. Second, it might cause
false sharing. Third, it is more likely to encounter TLB misses as
objects are more likely to reside on different pages.

Allocate userfaultfd wait-queue objects on the heap using kmem-cache for
better performance.
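
The alignment argument can be illustrated in userspace with a
cache-line-aligned heap allocation. This is a sketch under assumptions:
struct wait_obj, CACHE_LINE and the helper names are hypothetical, and
aligned_alloc() only stands in for what a kmem-cache created with a
cache-line alignment flag would provide.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64	/* assumption: 64-byte cache lines */

/* Hypothetical wait object; alignas pads and aligns it to a cache line. */
struct wait_obj {
	alignas(CACHE_LINE) int waken;
	unsigned long address;
};

static_assert(sizeof(struct wait_obj) % CACHE_LINE == 0,
	      "wait_obj must occupy whole cache lines");

/* Allocate a zeroed, cache-line-aligned object; NULL on failure. */
static struct wait_obj *wait_obj_alloc(void)
{
	struct wait_obj *w = aligned_alloc(CACHE_LINE, sizeof(*w));

	if (w)
		memset(w, 0, sizeof(*w));
	return w;
}

static void wait_obj_free(struct wait_obj *w)
{
	free(w);
}
```

Because each object starts on its own cache line, a handler writing the
"waken" field of one object does not invalidate the line holding a
neighbouring object. Note that the sketch checks the allocation result
before using it.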

Cc: Jens Axboe 
Cc: Andrea Arcangeli 
Cc: Peter Xu 
Cc: Alexander Viro 
Cc: io-ur...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Nadav Amit 
---
 fs/userfaultfd.c | 60 +---
 1 file changed, 36 insertions(+), 24 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 5c22170544e3..224b595ec758 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -32,6 +32,7 @@
 int sysctl_unprivileged_userfaultfd __read_mostly = 1;
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
+static struct kmem_cache *userfaultfd_wait_queue_cachep __read_mostly;
 
 enum userfaultfd_state {
UFFD_STATE_WAIT_API,
@@ -904,14 +905,15 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct 
list_head *fcs)
 static void dup_fctx(struct userfaultfd_fork_ctx *fctx)
 {
struct userfaultfd_ctx *ctx = fctx->orig;
-   struct userfaultfd_wait_queue ewq;
+   struct userfaultfd_wait_queue *ewq = 
kmem_cache_zalloc(userfaultfd_wait_queue_cachep, GFP_KERNEL);
 
-   msg_init(&ewq.msg);
+   msg_init(&ewq->msg);
 
-   ewq.msg.event = UFFD_EVENT_FORK;
-   ewq.msg.arg.reserved.reserved1 = (unsigned long)fctx->new;
+   ewq->msg.event = UFFD_EVENT_FORK;
+   ewq->msg.arg.reserved.reserved1 = (unsigned long)fctx->new;
 
-   userfaultfd_event_wait_completion(ctx, &ewq);
+   userfaultfd_event_wait_completion(ctx, ewq);
+   kmem_cache_free(userfaultfd_wait_queue_cachep, ewq);
 }
 
 void dup_userfaultfd_complete(struct list_head *fcs)
@@ -951,7 +953,7 @@ void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx 
*vm_ctx,
 unsigned long len)
 {
struct userfaultfd_ctx *ctx = vm_ctx->ctx;
-   struct userfaultfd_wait_queue ewq;
+   struct userfaultfd_wait_queue *ewq = 
kmem_cache_zalloc(userfaultfd_wait_queue_cachep, GFP_KERNEL);
 
if (!ctx)
return;
@@ -961,14 +963,15 @@ void mremap_userfaultfd_complete(struct 
vm_userfaultfd_ctx *vm_ctx,
return;
}
 
-   msg_init(&ewq.msg);
+   msg_init(&ewq->msg);
 
-   ewq.msg.event = UFFD_EVENT_REMAP;
-   ewq.msg.arg.remap.from = from;
-   ewq.msg.arg.remap.to = to;
-   ewq.msg.arg.remap.len = len;
+   ewq->msg.event = UFFD_EVENT_REMAP;
+   ewq->msg.arg.remap.from = from;
+   ewq->msg.arg.remap.to = to;
+   ewq->msg.arg.remap.len = len;
 
-   userfaultfd_event_wait_completion(ctx, &ewq);
+   userfaultfd_event_wait_completion(ctx, ewq);
+   kmem_cache_free(userfaultfd_wait_queue_cachep, ewq);
 }
 
 bool userfaultfd_remove(struct vm_area_struct *vma,
@@ -976,23 +979,25 @@ bool userfaultfd_remove(struct vm_area_struct *vma,
 {
struct mm_struct *mm = vma->vm_mm;
struct userfaultfd_ctx *ctx;
-   struct userfaultfd_wait_queue ewq;
+   struct userfaultfd_wait_queue *ewq;
 
ctx = vma->vm_userfaultfd_ctx.ctx;
if (!ctx || !(ctx->features & UFFD_FEATURE_EVENT_REMOVE))
return true;
 
+   ewq = kmem_cache_zalloc(userfaultfd_wait_queue_cachep, GFP_KERNEL);
userfaultfd_ctx_get(ctx);
WRITE_ONCE(ctx->mmap_changing, true);
mmap_read_unlock(mm);
 
-   msg_init(&ewq.msg);
+   msg_init(&ewq->msg);
 
-   ewq.msg.event = UFFD_EVENT_REMOVE;
-   ewq.msg.arg.remove.start = start;
-   ewq.msg.arg.remove.end = end;
+   ewq->msg.event = UFFD_EVENT_REMOVE;
+   ewq->msg.arg.remove.start = start;
+   ewq->msg.arg.remove.end = end;
 
-   userfaultfd_event_wait_completion(ctx, &ewq);
+   userfaultfd_event_wait_completion(ctx, ewq);
+   kmem_cache_free(userfaultfd_wait_queue_cachep, ewq);
 
return false;
 }
@@ -1040,20 +1045,21 @@ int userfaultfd_unmap_prep(struct vm_area_struct *vma,
 void userfaultfd_unmap_complete(struct mm_struct *mm, struct list_head *uf)
 {
struct userfaultfd_unmap_ctx *ctx, *n;
-   struct userfaultfd_wait_queue ewq;
+   struct userfaultfd_wait_queue *ewq = 
kmem_cache_zalloc(userfaultfd_wait_queue_cachep, GFP_KERNEL);
 
list_for_each_entry_safe(ctx, n, uf, list) {
-   msg_init(&ewq.msg);
+   msg_init(&ewq->msg);
 
-   ewq.msg.event = UFFD_EVENT_UNMAP;
-   ewq.msg.arg.remove.start = ctx->start;
-   ewq.msg.arg.remove.end = ctx->end;
+   ewq->msg.event = UFFD_EVENT_UNMAP;
+   ewq->msg.arg.remove.start = ctx->start;
+   ewq->msg.arg.remove.end = ctx->end;
 
-   

[RFC PATCH 01/13] fs/userfaultfd: fix wrong error code on WP & !VM_MAYWRITE

2020-11-28 Thread Nadav Amit
From: Nadav Amit 

It is possible to get an EINVAL error instead of EPERM when vm_flags
has VM_UFFD_WP set but the VMA does not have VM_MAYWRITE, as "ret" is
overwritten since commit cab350afcbc9 ("userfaultfd: hugetlbfs: allow
registration of ranges containing huge pages").

Fix it.
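
The bug pattern is a shared "ret" that an earlier check leaves at the
wrong value for a later "goto out". A minimal userspace model follows.
It is a sketch under assumptions: struct vma_model, the F_* flags and
validate() are hypothetical stand-ins, not the kernel's types.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define F_WP		0x1u	/* hypothetical stand-in for VM_UFFD_WP */
#define F_MAYWRITE	0x2u	/* hypothetical stand-in for VM_MAYWRITE */

struct vma_model {
	unsigned int flags;
	bool hugetlb;
	unsigned long end;
	unsigned long hpagesize;
};

/* Returns 0, -EINVAL (misaligned huge page range) or -EPERM (WP denied). */
static int validate(unsigned int requested, const struct vma_model *v)
{
	int ret;

	assert(v);

	if (v->hugetlb) {
		/* This assignment is what clobbered the earlier -EPERM. */
		ret = -EINVAL;
		if (v->end & (v->hpagesize - 1))
			goto out;
	}

	/* The fix: set the error code again right before the check. */
	ret = -EPERM;
	if ((requested & F_WP) && !(v->flags & F_MAYWRITE))
		goto out;

	ret = 0;
out:
	return ret;
}
```

Without the "ret = -EPERM" line, a huge-page VMA that passes the alignment
check but lacks write permission would leave the function with -EINVAL.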

Cc: Mike Kravetz 
Cc: Jens Axboe 
Cc: Andrea Arcangeli 
Cc: Peter Xu 
Cc: Alexander Viro 
Cc: io-ur...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Fixes: cab350afcbc9 ("userfaultfd: hugetlbfs: allow registration of ranges 
containing huge pages")
Signed-off-by: Nadav Amit 
---
 fs/userfaultfd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 000b457ad087..c8ed4320370e 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1364,6 +1364,7 @@ static int userfaultfd_register(struct userfaultfd_ctx 
*ctx,
if (end & (vma_hpagesize - 1))
goto out_unlock;
}
+   ret = -EPERM;
if ((vm_flags & VM_UFFD_WP) && !(cur->vm_flags & VM_MAYWRITE))
goto out_unlock;
 
-- 
2.25.1



[RFC PATCH 03/13] selftests/vm/userfaultfd: wake after copy failure

2020-11-28 Thread Nadav Amit
From: Nadav Amit 

When the userfaultfd copy ioctl fails because the PTE already exists, an
-EEXIST error is returned and the faulting thread is not woken. The
current userfaultfd test does not wake the faulting thread in such a case.
The assumption is presumably that another thread set the PTE through
copy/wp ioctl and would wake the faulting thread or that alternatively
the fault handler would realize there is no need to "must_wait" and
continue. This is not necessarily true.

There is an assumption that the "must_wait" tests in handle_userfault()
are sufficient to provide a definitive answer whether the offending PTE is
populated or not. However, the userfaultfd_must_wait() test is lockless.
Consequently, concurrent calls to ptep_modify_prot_start(), for
instance, can clear the PTE and can cause userfaultfd_must_wait()
to wrongly assume it is not populated and a wait is needed.

There are therefore 3 options:
(1) Change the tests to wake on copy failure.
(2) Wake faulting thread unconditionally on zero/copy ioctls before
returning -EEXIST.
(3) Change the userfaultfd_must_wait() to hold locks.

This patch took the first approach, but the others are valid solutions
with different tradeoffs.
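
Option (1) can be sketched from the handler's point of view as below. This
is an illustration under assumptions: copy_page_or_wake() is a
hypothetical helper, not the selftest's __copy_page(), and it reports
errors as negative errno values instead of exiting.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Resolve a fault with UFFDIO_COPY. If the page is already populated
 * (EEXIST), nobody is guaranteed to wake the faulting thread, so wake it
 * explicitly with UFFDIO_WAKE. Returns 0 or a negative errno value.
 */
static int copy_page_or_wake(int ufd, __u64 dst, __u64 src, __u64 len)
{
	struct uffdio_copy copy = {
		.dst = dst,
		.src = src,
		.len = len,
		.mode = 0,
		.copy = 0,
	};
	struct uffdio_range range = {
		.start = dst,
		.len = len,
	};

	assert(len > 0);

	/* A successful copy wakes the waiter itself (no DONTWAKE mode). */
	if (ioctl(ufd, UFFDIO_COPY, &copy) == 0)
		return 0;
	if (errno != EEXIST)
		return -errno;

	if (ioctl(ufd, UFFDIO_WAKE, &range))
		return -errno;
	return 0;
}
```

The extra wake is harmless when no thread is waiting on the range, which
is what makes this the least invasive of the three options.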

Cc: Jens Axboe 
Cc: Andrea Arcangeli 
Cc: Peter Xu 
Cc: Alexander Viro 
Cc: io-ur...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Nadav Amit 
---
 tools/testing/selftests/vm/userfaultfd.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/tools/testing/selftests/vm/userfaultfd.c 
b/tools/testing/selftests/vm/userfaultfd.c
index 9b0912a01777..f7e6cf43db71 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -484,6 +484,18 @@ static void retry_copy_page(int ufd, struct uffdio_copy 
*uffdio_copy,
}
 }
 
+static void wake_range(int ufd, unsigned long addr, unsigned long len)
+{
+   struct uffdio_range uffdio_wake;
+
+   uffdio_wake.start = addr;
+   uffdio_wake.len = len;
+
+   if (ioctl(ufd, UFFDIO_WAKE, &uffdio_wake))
+   fprintf(stderr, "error waking %lu\n",
+   addr), exit(1);
+}
+
 static int __copy_page(int ufd, unsigned long offset, bool retry)
 {
struct uffdio_copy uffdio_copy;
@@ -507,6 +519,7 @@ static int __copy_page(int ufd, unsigned long offset, bool 
retry)
uffdio_copy.copy);
exit(1);
}
+   wake_range(ufd, uffdio_copy.dst, page_size);
} else if (uffdio_copy.copy != page_size) {
fprintf(stderr, "UFFDIO_COPY unexpected copy %Ld\n",
uffdio_copy.copy); exit(1);
-- 
2.25.1



[RFC PATCH 04/13] fs/userfaultfd: simplify locks in userfaultfd_ctx_read

2020-11-28 Thread Nadav Amit
From: Nadav Amit 

Small refactoring to reduce the number of locations in which locks are
released in userfaultfd_ctx_read(), as having many release sites makes
the code and later changes to it harder to understand.

No functional change intended.
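
The shape of the refactoring (preset the error code, unlock at a single
site, then test the result) can be shown with a userspace mutex. This is a
sketch under assumptions: struct msg_queue and queue_pop() are
hypothetical and only model the control flow, not the kernel wait queues.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical queue of pending messages protected by one lock. */
struct msg_queue {
	pthread_mutex_t lock;
	int pending[4];
	int count;
};

/* Pop one message. Returns 0 on success or -EAGAIN if the queue is empty. */
static int queue_pop(struct msg_queue *q, int *out)
{
	int ret = -EAGAIN;	/* preset, so the empty path needs no branch */

	assert(q && out);

	pthread_mutex_lock(&q->lock);
	if (q->count > 0) {
		*out = q->pending[--q->count];
		ret = 0;
	}
	pthread_mutex_unlock(&q->lock);	/* the only unlock site */

	return ret;
}
```

A loop built on this helper can write "if (!ret) break;" after the call,
which corresponds to the check the patch adds after each spin_unlock().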

Cc: Jens Axboe 
Cc: Andrea Arcangeli 
Cc: Peter Xu 
Cc: Alexander Viro 
Cc: io-ur...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Nadav Amit 
---
 fs/userfaultfd.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 4fe07c1a44c6..fedf7c1615d5 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1039,6 +1039,7 @@ static ssize_t userfaultfd_ctx_read(struct 
userfaultfd_ctx *ctx, int no_wait,
set_current_state(TASK_INTERRUPTIBLE);
spin_lock(&ctx->fault_pending_wqh.lock);
uwq = find_userfault(ctx);
+   ret = -EAGAIN;
if (uwq) {
/*
 * Use a seqcount to repeat the lockless check
@@ -1077,11 +1078,11 @@ static ssize_t userfaultfd_ctx_read(struct 
userfaultfd_ctx *ctx, int no_wait,
 
/* careful to always initialize msg if ret == 0 */
*msg = uwq->msg;
-   spin_unlock(&ctx->fault_pending_wqh.lock);
ret = 0;
-   break;
}
spin_unlock(&ctx->fault_pending_wqh.lock);
+   if (!ret)
+   break;
 
spin_lock(&ctx->event_wqh.lock);
uwq = find_userfault_evt(ctx);
@@ -1099,17 +1100,14 @@ static ssize_t userfaultfd_ctx_read(struct 
userfaultfd_ctx *ctx, int no_wait,
 * reference on it.
 */
userfaultfd_ctx_get(fork_nctx);
-   spin_unlock(&ctx->event_wqh.lock);
-   ret = 0;
-   break;
+   } else {
+   userfaultfd_event_complete(ctx, uwq);
}
-
-   userfaultfd_event_complete(ctx, uwq);
-   spin_unlock(&ctx->event_wqh.lock);
ret = 0;
-   break;
}
spin_unlock(&ctx->event_wqh.lock);
+   if (!ret)
+   break;
 
if (signal_pending(current)) {
ret = -ERESTARTSYS;
-- 
2.25.1



[RFC PATCH 00/13] fs/userfaultfd: support iouring and polling

2020-11-28 Thread Nadav Amit
From: Nadav Amit 

While the overhead of userfaultfd is usually reasonable, this overhead
can still be prohibitive for low-latency backing storage, such as RDMA,
persistent memory or in-memory compression. In such cases the overhead
of scheduling and entering/exiting the kernel becomes dominant.

The natural solution for this problem is to use iouring with
userfaultfd. However, besides hitting one bug, this does not provide a
sufficient performance improvement, and the use of ioctls for zero/copy
limits the use of iouring to synchronous "reads" (reporting of
faults/events). This patch-set provides four solutions for this overhead:

1. Userfaultfd "polling" mode, in which the faulting thread polls after
reporting the fault instead of being de-scheduled. This fits cases in
which the handler is expected to poll for page-faults on a different
thread.

2. Asynchronous-reads, in which the faulting thread reports page-faults
(and other events) directly to the userspace handler thread. For this
purpose, asynchronous read completions are introduced.

3. Write interface, which provides similar services to the zero/copy
ioctls. This allows the use of iouring for zero/copy without changing
the iouring code or making it userfaultfd-aware. The low bits of
the "position" are used to encode the requested operation
(zero/copy/wp/etc).

4. Async-writes, in which the zero/copy is performed by the faulting
thread instead of the iouring thread. This reduces caching effects as
the data is likely to be used by the faulting thread and find_vma()
cannot use its cache on the iouring worker.
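
The position encoding mentioned in item 3 can be sketched as follows. The
cover letter does not give the actual layout, so everything here is an
assumption for illustration: the number of operation bits, the operation
values and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the low 4 bits of a page-aligned position carry the op. */
#define UFFD_POS_OP_BITS 4u
#define UFFD_POS_OP_MASK ((UINT64_C(1) << UFFD_POS_OP_BITS) - 1)

/* Hypothetical operation codes; the real values are not in this thread. */
enum uffd_pos_op {
	UFFD_POS_COPY = 0,
	UFFD_POS_ZERO = 1,
	UFFD_POS_WP = 2,
};

/* Combine a destination address with the requested operation. */
static uint64_t uffd_pos_encode(uint64_t addr, enum uffd_pos_op op)
{
	assert((addr & UFFD_POS_OP_MASK) == 0);	/* the low bits must be free */
	assert((uint64_t)op <= UFFD_POS_OP_MASK);
	return addr | (uint64_t)op;
}

/* Recover the destination address from an encoded position. */
static uint64_t uffd_pos_addr(uint64_t pos)
{
	return pos & ~UFFD_POS_OP_MASK;
}

/* Recover the requested operation from an encoded position. */
static enum uffd_pos_op uffd_pos_get_op(uint64_t pos)
{
	return (enum uffd_pos_op)(pos & UFFD_POS_OP_MASK);
}
```

Since destination addresses are page-aligned, their low bits are always
zero, so a generic write-at-offset request can carry both the address and
the operation without any userfaultfd-specific field in the request.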

I will provide some benchmark results later, but some initial results
show that these patches reduce the overhead of handling a user
page-fault by over 50%.

The patches require a bit more cleanup but seem to pass the tests.

Note that the first three patches are bug fixes. I did not Cc them to
stable yet.

Cc: Mike Kravetz 
Cc: Jens Axboe 
Cc: Andrea Arcangeli 
Cc: Peter Xu 
Cc: Alexander Viro 
Cc: io-ur...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org

Nadav Amit (13):
  fs/userfaultfd: fix wrong error code on WP & !VM_MAYWRITE
  fs/userfaultfd: fix wrong file usage with iouring
  selftests/vm/userfaultfd: wake after copy failure
  fs/userfaultfd: simplify locks in userfaultfd_ctx_read
  fs/userfaultfd: introduce UFFD_FEATURE_POLL
  iov_iter: support atomic copy_page_from_iter_iovec()
  fs/userfaultfd: support read_iter to use io_uring
  fs/userfaultfd: complete reads asynchronously
  fs/userfaultfd: use iov_iter for copy/zero
  fs/userfaultfd: add write_iter() interface
  fs/userfaultfd: complete write asynchronously
  fs/userfaultfd: kmem-cache for wait-queue objects
  selftests/vm/userfaultfd: iouring and polling tests

 fs/userfaultfd.c | 740 
 include/linux/hugetlb.h  |   4 +-
 include/linux/mm.h   |   6 +-
 include/linux/shmem_fs.h |   2 +-
 include/linux/uio.h  |   3 +
 include/linux/userfaultfd_k.h|  10 +-
 include/uapi/linux/userfaultfd.h |  21 +-
 lib/iov_iter.c   |  23 +-
 mm/hugetlb.c |  12 +-
 mm/memory.c  |  36 +-
 mm/shmem.c   |  17 +-
 mm/userfaultfd.c |  96 ++-
 tools/testing/selftests/vm/Makefile  |   2 +-
 tools/testing/selftests/vm/userfaultfd.c | 835 +--
 14 files changed, 1506 insertions(+), 301 deletions(-)

-- 
2.25.1



Re: [PATCH v12 09/17] s390/vfio-ap: sysfs attribute to display the guest's matrix

2020-11-28 Thread Halil Pasic
On Tue, 24 Nov 2020 16:40:08 -0500
Tony Krowiak  wrote:

> The matrix of adapters and domains configured in a guest's APCB may
> differ from the matrix of adapters and domains assigned to the matrix mdev,
> so this patch introduces a sysfs attribute to display the matrix of
> adapters and domains that are or will be assigned to the APCB of a guest
> that is or will be using the matrix mdev. For a matrix mdev denoted by
> $uuid, the guest matrix can be displayed as follows:
> 
>cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
> 
> Signed-off-by: Tony Krowiak 

Code looks good, but it may be a little early, since the treatment of
guest_matrix is changed by the following patches.


[RFC PATCH 02/13] fs/userfaultfd: fix wrong file usage with iouring

2020-11-28 Thread Nadav Amit
From: Nadav Amit 

Using io-uring with userfaultfd for reads can, upon a fork event, lead to
the installation of the userfaultfd file descriptor in the worker kernel
thread instead of the process that initiated the read. io-uring assumes
that no new file descriptors are installed during a read.

As a result, the controlling process would not be able to access the
userfaultfd file descriptor of the newly forked process.

To solve this problem, save the files_struct of the process that
initiated the userfaultfd syscall in the context and reload it when needed.

Cc: Jens Axboe 
Cc: Andrea Arcangeli 
Cc: Peter Xu 
Cc: Alexander Viro 
Cc: io-ur...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Fixes: 2b188cc1bb85 ("Add io_uring IO interface")
Signed-off-by: Nadav Amit 
---
 fs/userfaultfd.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index c8ed4320370e..4fe07c1a44c6 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 int sysctl_unprivileged_userfaultfd __read_mostly = 1;
 
@@ -76,6 +77,8 @@ struct userfaultfd_ctx {
bool mmap_changing;
/* mm with one ore more vmas attached to this userfaultfd_ctx */
struct mm_struct *mm;
+   /* controlling process files as they might be different than current */
+   struct files_struct *files;
 };
 
 struct userfaultfd_fork_ctx {
@@ -173,6 +176,7 @@ static void userfaultfd_ctx_put(struct userfaultfd_ctx *ctx)
VM_BUG_ON(spin_is_locked(&ctx->fd_wqh.lock));
VM_BUG_ON(waitqueue_active(&ctx->fd_wqh));
mmdrop(ctx->mm);
+   put_files_struct(ctx->files);
kmem_cache_free(userfaultfd_ctx_cachep, ctx);
}
 }
@@ -666,6 +670,8 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct 
list_head *fcs)
ctx->mm = vma->vm_mm;
mmgrab(ctx->mm);
 
+   ctx->files = octx->files;
+   atomic_inc(>files->count);
userfaultfd_ctx_get(octx);
WRITE_ONCE(octx->mmap_changing, true);
fctx->orig = octx;
@@ -976,10 +982,32 @@ static int resolve_userfault_fork(struct userfaultfd_ctx 
*ctx,
  struct userfaultfd_ctx *new,
  struct uffd_msg *msg)
 {
+   struct files_struct *files = NULL;
int fd;
 
+   BUG_ON(new->files == NULL);
+
+   /*
+* This function can be called from another context than the controlling
+* process, for instance, for an io-uring submission kernel thread. If
+* that is the case we must ensure the correct files are being used.
+*/
+   if (current->files != new->files) {
+   task_lock(current);
+   files = current->files;
+   current->files = new->files;
+   task_unlock(current);
+   }
+
fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
  O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
+
+   if (files != NULL) {
+   task_lock(current);
+   current->files = files;
+   task_unlock(current);
+   }
+
if (fd < 0)
return fd;
 
@@ -1986,6 +2014,8 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
/* prevent the mm struct to be freed */
mmgrab(ctx->mm);
 
+   ctx->files = get_files_struct(current);
+
fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, ctx,
  O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS));
if (fd < 0) {
-- 
2.25.1



Re: [PATCH v12 08/17] s390/vfio-ap: introduce shadow APCB

2020-11-28 Thread Halil Pasic
On Tue, 24 Nov 2020 16:40:07 -0500
Tony Krowiak  wrote:

> The APCB is a field within the CRYCB that provides the AP configuration
> to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
> maintain it for the lifespan of the guest.
> 
> Signed-off-by: Tony Krowiak 
> Reviewed-by: Halil Pasic 

Still LGTM


Re: [PATCH 1/8] lazy tlb: introduce exit_lazy_tlb

2020-11-28 Thread Andy Lutomirski
On Sat, Nov 28, 2020 at 8:01 AM Nicholas Piggin  wrote:
>
> This is called at points where a lazy mm is switched away or made not
> lazy (by its owner switching back).
>
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/arm/mach-rpc/ecard.c|  1 +
>  arch/powerpc/mm/book3s64/radix_tlb.c |  1 +
>  fs/exec.c|  6 --
>  include/asm-generic/mmu_context.h| 21 +
>  kernel/kthread.c |  1 +
>  kernel/sched/core.c  |  2 ++
>  6 files changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/mach-rpc/ecard.c b/arch/arm/mach-rpc/ecard.c
> index 827b50f1c73e..43eb1bfba466 100644
> --- a/arch/arm/mach-rpc/ecard.c
> +++ b/arch/arm/mach-rpc/ecard.c
> @@ -253,6 +253,7 @@ static int ecard_init_mm(void)
> current->mm = mm;
> current->active_mm = mm;
> activate_mm(active_mm, mm);
> +   exit_lazy_tlb(active_mm, current);
> mmdrop(active_mm);
> ecard_init_pgtables(mm);
> return 0;
> diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c 
> b/arch/powerpc/mm/book3s64/radix_tlb.c
> index b487b489d4b6..ac3fec03926a 100644
> --- a/arch/powerpc/mm/book3s64/radix_tlb.c
> +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
> @@ -661,6 +661,7 @@ static void do_exit_flush_lazy_tlb(void *arg)
> mmgrab(&init_mm);
> current->active_mm = &init_mm;
> switch_mm_irqs_off(mm, &init_mm, current);
> +   exit_lazy_tlb(mm, current);
> mmdrop(mm);
> }
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 547a2390baf5..4b4dea1bb7ba 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1017,6 +1017,8 @@ static int exec_mmap(struct mm_struct *mm)
> if (!IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM))
> local_irq_enable();
> activate_mm(active_mm, mm);
> +   if (!old_mm)
> +   exit_lazy_tlb(active_mm, tsk);
> if (IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM))
> local_irq_enable();
> tsk->mm->vmacache_seqnum = 0;
> @@ -1028,9 +1030,9 @@ static int exec_mmap(struct mm_struct *mm)
> setmax_mm_hiwater_rss(&tsk->signal->maxrss, old_mm);
> mm_update_next_owner(old_mm);
> mmput(old_mm);
> -   return 0;
> +   } else {
> +   mmdrop(active_mm);
> }
> -   mmdrop(active_mm);

This looks like an unrelated change.

> return 0;
>  }
>
> diff --git a/include/asm-generic/mmu_context.h 
> b/include/asm-generic/mmu_context.h
> index 91727065bacb..4626d0020e65 100644
> --- a/include/asm-generic/mmu_context.h
> +++ b/include/asm-generic/mmu_context.h
> @@ -24,6 +24,27 @@ static inline void enter_lazy_tlb(struct mm_struct *mm,
>  }
>  #endif
>
> +/*
> + * exit_lazy_tlb - Called after switching away from a lazy TLB mode mm.
> + *
> + * mm:  the lazy mm context that was switched
> + * tsk: the task that was switched to (with a non-lazy mm)
> + *
> + * mm may equal tsk->mm.
> + * mm and tsk->mm will not be NULL.
> + *
> + * Note this is not symmetrical to enter_lazy_tlb, this is not
> + * called when tasks switch into the lazy mm, it's called after the
> + * lazy mm becomes non-lazy (either switched to a different mm or the
> + * owner of the mm returns).
> + */
> +#ifndef exit_lazy_tlb
> +static inline void exit_lazy_tlb(struct mm_struct *mm,

Maybe name this parameter prev_lazy_mm?
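
For readers unfamiliar with the "#ifndef exit_lazy_tlb" idiom quoted above, here is a minimal user-space sketch of how a generic header can supply a no-op default that an architecture may override. It is not the kernel code: the struct definitions are stand-ins, DEMO_ARCH_HAS_EXIT_LAZY_TLB, arch_hook_calls and demo_switch() are hypothetical names made up for illustration, and the parameter is called prev_lazy_mm only because that is the name suggested above.

```c
/* Stand-in types; the real ones live in the kernel headers. */
struct mm_struct { int id; };
struct task_struct { struct mm_struct *mm; };

/* Hypothetical counter so the sketch has observable behaviour. */
static int arch_hook_calls;

/*
 * An "architecture" that wants the hook defines the function and a macro
 * of the same name. DEMO_ARCH_HAS_EXIT_LAZY_TLB is an assumption of this
 * sketch, not a real config symbol.
 */
#ifdef DEMO_ARCH_HAS_EXIT_LAZY_TLB
static inline void exit_lazy_tlb(struct mm_struct *prev_lazy_mm,
                                 struct task_struct *tsk)
{
    (void)prev_lazy_mm;
    (void)tsk;
    arch_hook_calls++;
}
#define exit_lazy_tlb exit_lazy_tlb
#endif

/* Generic fallback: only compiled if nothing claimed the name above. */
#ifndef exit_lazy_tlb
static inline void exit_lazy_tlb(struct mm_struct *prev_lazy_mm,
                                 struct task_struct *tsk)
{
    (void)prev_lazy_mm;
    (void)tsk;
}
#endif

/* Switch away from a lazy mm and report how often the arch hook ran. */
static int demo_switch(void)
{
    struct mm_struct lazy = { 1 }, user = { 2 };
    struct task_struct next = { &user };

    exit_lazy_tlb(&lazy, &next);
    return arch_hook_calls;
}
```

Built without DEMO_ARCH_HAS_EXIT_LAZY_TLB, the call compiles down to the empty generic version; building with -DDEMO_ARCH_HAS_EXIT_LAZY_TLB selects the counting variant without touching any call site.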


Re: [PATCH 5/8] lazy tlb: allow lazy tlb mm switching to be configurable

2020-11-28 Thread Andy Lutomirski
On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin  wrote:
>
> NOMMU systems could easily go without this and save a bit of code
> and the refcount atomics, because their mm switch is a no-op. I
> haven't flipped them over because I haven't audited all arch code to
> convert over to using the _lazy_tlb refcounting.
>
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/Kconfig | 11 +++
>  include/linux/sched/mm.h | 13 ++--
>  kernel/sched/core.c  | 68 +---
>  kernel/sched/sched.h |  4 ++-
>  4 files changed, 75 insertions(+), 21 deletions(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 56b6ccc0e32d..596bf589d74b 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -430,6 +430,17 @@ config ARCH_WANT_IRQS_OFF_ACTIVATE_MM
>   irqs disabled over activate_mm. Architectures that do IPI based TLB
>   shootdowns should enable this.
>
> +# Should make this depend on MMU, because there is little use for lazy mm 
> switching
> +# with NOMMU. Must audit NOMMU architecture code for lazy mm refcounting 
> first.
> +config MMU_LAZY_TLB
> +   def_bool y
> +   help
> + Enable "lazy TLB" mmu context switching for kernel threads.
> +
> +config MMU_LAZY_TLB_REFCOUNT
> +   def_bool y
> +   depends on MMU_LAZY_TLB
> +

This could use some documentation as to what "no" means.

>  config ARCH_HAVE_NMI_SAFE_CMPXCHG
> bool
>
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 7157c0f6fef8..bd0f27402d4b 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -51,12 +51,21 @@ static inline void mmdrop(struct mm_struct *mm)
>  /* Helpers for lazy TLB mm refcounting */
>  static inline void mmgrab_lazy_tlb(struct mm_struct *mm)
>  {
> -   mmgrab(mm);
> +   if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT))
> +   mmgrab(mm);
>  }
>
>  static inline void mmdrop_lazy_tlb(struct mm_struct *mm)
>  {
> -   mmdrop(mm);
> +   if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT)) {
> +   mmdrop(mm);
> +   } else {
> +   /*
> +* mmdrop_lazy_tlb must provide a full memory barrier, see the
> +* membarrier comment finish_task_switch.

"membarrier comment in finish_task_switch()", perhaps?

> +*/
> +   smp_mb();
> +   }
>  }
>
>  /**
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e372b613d514..3b79c6cc3a37 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3579,7 +3579,7 @@ static struct rq *finish_task_switch(struct task_struct 
> *prev)
> __releases(rq->lock)
>  {
> struct rq *rq = this_rq();
> -   struct mm_struct *mm = rq->prev_mm;
> +   struct mm_struct *mm = NULL;
> long prev_state;
>
> /*
> @@ -3598,7 +3598,10 @@ static struct rq *finish_task_switch(struct 
> task_struct *prev)
>   current->comm, current->pid, preempt_count()))
> preempt_count_set(FORK_PREEMPT_COUNT);
>
> -   rq->prev_mm = NULL;
> +#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT
> +   mm = rq->prev_lazy_mm;
> +   rq->prev_lazy_mm = NULL;
> +#endif
>
> /*
>  * A task struct has one reference for the use as "current".
> @@ -3630,6 +3633,8 @@ static struct rq *finish_task_switch(struct task_struct 
> *prev)
>  * rq->curr, before returning to userspace, for
>  * {PRIVATE,GLOBAL}_EXPEDITED. This is implicitly provided by
>  * mmdrop_lazy_tlb().
> +*
> +* This same issue applies to other places that mmdrop_lazy_tlb().
>  */
> if (mm)
> mmdrop_lazy_tlb(mm);
> @@ -3719,22 +3724,10 @@ asmlinkage __visible void schedule_tail(struct 
> task_struct *prev)
> calculate_sigpending();
>  }
>
> -/*
> - * context_switch - switch to the new MM and the new thread's register state.
> - */
> -static __always_inline struct rq *
> -context_switch(struct rq *rq, struct task_struct *prev,
> -  struct task_struct *next, struct rq_flags *rf)
> +static __always_inline void
> +context_switch_mm(struct rq *rq, struct task_struct *prev,
> +  struct task_struct *next)
>  {
> -   prepare_task_switch(rq, prev, next);
> -
> -   /*
> -* For paravirt, this is coupled with an exit in switch_to to
> -* combine the page table reload and the switch backend into
> -* one hypercall.
> -*/
> -   arch_start_context_switch(prev);
> -
> /*
>  * kernel -> kernel   lazy + transfer active
>  *   user -> kernel   lazy + mmgrab_lazy_tlb() active
> @@ -3765,11 +3758,50 @@ context_switch(struct rq *rq, struct task_struct 
> *prev,
> if (!prev->mm) {// from kernel
> exit_lazy_tlb(prev->active_mm, next);
>
> +#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT
> /* will mmdrop_lazy_tlb() in 
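
As a rough user-space illustration of the mmgrab_lazy_tlb()/mmdrop_lazy_tlb() hunk quoted earlier in this message: when refcounting is compiled out, the drop helper still has to supply the ordering callers rely on. The sketch below uses C11 atomics; struct mm_demo, DEMO_LAZY_TLB_REFCOUNT and the *_demo helpers are hypothetical names, and a sequentially consistent read-modify-write is used only as a stand-in for the kernel's fully ordered refcount decrement, which is an assumption of this sketch.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for CONFIG_MMU_LAZY_TLB_REFCOUNT. */
#ifndef DEMO_LAZY_TLB_REFCOUNT
#define DEMO_LAZY_TLB_REFCOUNT 1
#endif

struct mm_demo {
    atomic_int count;   /* plays the role of the mm reference count */
};

static void mmgrab_lazy_demo(struct mm_demo *mm)
{
    /* Constant condition: the unused branch is dropped by the compiler. */
    if (DEMO_LAZY_TLB_REFCOUNT)
        atomic_fetch_add(&mm->count, 1);
}

static void mmdrop_lazy_demo(struct mm_demo *mm)
{
    if (DEMO_LAZY_TLB_REFCOUNT) {
        /* seq_cst read-modify-write; ordering comes with the decrement. */
        atomic_fetch_sub(&mm->count, 1);
    } else {
        /* No refcount to touch, so issue an explicit full fence instead. */
        atomic_thread_fence(memory_order_seq_cst);
    }
}
```

With DEMO_LAZY_TLB_REFCOUNT set to 0 the counter is never modified, yet every drop still acts as a barrier, which mirrors the smp_mb() in the else branch of the quoted patch.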

Re: [GIT PULL] RISC-V Fixes for 5.10-rc6

2020-11-28 Thread pr-tracker-bot
The pull request you sent on Sat, 28 Nov 2020 15:00:47 -0800 (PST):

> git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git 
> tags/riscv-for-linus-5.10-rc6

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/aae5ab854e38151e69f261dbf0e3b7e396403178

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


mapping.c:undefined reference to `dma_to_phys'

2020-11-28 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   45e885c439e825c19f3a51e46ef8210984bc0a9c
commit: 7bc5c428a660d4d1bc95ba54bf4cb6bccf8c3029 dma-direct: remove 
__dma_to_phys
date:   3 months ago
config: mips-randconfig-r031-20201129 (attached as .config)
compiler: mips64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7bc5c428a660d4d1bc95ba54bf4cb6bccf8c3029
git remote add linus 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git fetch --no-tags linus master
git checkout 7bc5c428a660d4d1bc95ba54bf4cb6bccf8c3029
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=mips 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   mips64-linux-ld: kernel/dma/mapping.o: in function `dma_map_page_attrs':
   mapping.c:(.text+0x10c): undefined reference to `__phys_to_dma'
   mips64-linux-ld: kernel/dma/mapping.o: in function `dma_unmap_page_attrs':
>> mapping.c:(.text+0x23c): undefined reference to `dma_to_phys'
>> mips64-linux-ld: mapping.c:(.text+0x274): undefined reference to 
>> `dma_to_phys'
   mips64-linux-ld: kernel/dma/mapping.o: in function `dma_sync_single_for_cpu':
   mapping.c:(.text+0x3d4): undefined reference to `dma_to_phys'
   mips64-linux-ld: kernel/dma/direct.o: in function 
`dma_direct_get_required_mask':
   direct.c:(.text+0xe4): undefined reference to `__phys_to_dma'
   mips64-linux-ld: kernel/dma/direct.o: in function `dma_direct_alloc':
>> direct.c:(.text+0x210): undefined reference to `dma_to_phys'
   mips64-linux-ld: direct.c:(.text+0x2d8): undefined reference to 
`__phys_to_dma'
   mips64-linux-ld: direct.c:(.text+0x354): undefined reference to 
`__phys_to_dma'
   mips64-linux-ld: direct.c:(.text+0x42c): undefined reference to 
`__phys_to_dma'
>> mips64-linux-ld: direct.c:(.text+0x4d8): undefined reference to `dma_to_phys'
   mips64-linux-ld: kernel/dma/direct.o: in function `dma_direct_free':
   direct.c:(.text+0x5d8): undefined reference to `dma_to_phys'
   mips64-linux-ld: kernel/dma/direct.o: in function `dma_direct_map_sg':
   direct.c:(.text+0x764): undefined reference to `__phys_to_dma'
   mips64-linux-ld: kernel/dma/direct.o: in function `dma_direct_get_sgtable':
   direct.c:(.text+0xa00): undefined reference to `dma_to_phys'
   mips64-linux-ld: kernel/dma/direct.o: in function `dma_direct_mmap':
   direct.c:(.text+0xb10): undefined reference to `dma_to_phys'
   mips64-linux-ld: kernel/dma/direct.o: in function `dma_direct_supported':
   direct.c:(.text+0xbf8): undefined reference to `__phys_to_dma'
   mips64-linux-ld: kernel/dma/direct.o: in function `dma_direct_need_sync':
   direct.c:(.text+0xc28): undefined reference to `dma_to_phys'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: Lockdep warning on io_file_data_ref_zero() with 5.10-rc5

2020-11-28 Thread Nadav Amit
> On Nov 28, 2020, at 4:13 PM, Pavel Begunkov  wrote:
> 
> On 28/11/2020 23:59, Nadav Amit wrote:
>> Hello Pavel,
>> 
>> I got the following lockdep splat while rebasing my work on 5.10-rc5 on the
>> kernel (based on 5.10-rc5+).
>> 
>> I did not actually confirm that the problem is triggered without my changes,
>> as my iouring workload requires some kernel changes (not iouring changes),
>> yet IMHO it seems pretty clear that this is a result of your commit
>> e297822b20e7f ("io_uring: order refnode recycling”), that acquires a lock in
>> io_file_data_ref_zero() inside a softirq context.
> 
> Yeah, that's true. It was already reported by syzkaller and fixed by Jens, but
> queued for 5.11. Thanks for letting me know anyway!
> 
> https://lore.kernel.org/io-uring/948d2d3b-5f36-034d-28e6-7490343a5...@kernel.dk/T/#t

Thanks for the quick response and sorry for the noise. I should improve my
Googling abilities and check the iouring repository the next time.

Regards,
Nadav

Re: Lockdep warning on io_file_data_ref_zero() with 5.10-rc5

2020-11-28 Thread Pavel Begunkov
On 28/11/2020 23:59, Nadav Amit wrote:
> Hello Pavel,
> 
> I got the following lockdep splat while rebasing my work on 5.10-rc5 on the
> kernel (based on 5.10-rc5+).
> 
> I did not actually confirm that the problem is triggered without my changes,
> as my iouring workload requires some kernel changes (not iouring changes),
> yet IMHO it seems pretty clear that this is a result of your commit
> e297822b20e7f ("io_uring: order refnode recycling”), that acquires a lock in
> io_file_data_ref_zero() inside a softirq context.

Yeah, that's true. It was already reported by syzkaller and fixed by Jens, but
queued for 5.11. Thanks for letting me know anyway!

https://lore.kernel.org/io-uring/948d2d3b-5f36-034d-28e6-7490343a5...@kernel.dk/T/#t


Jens, I think it's for the best to add it for 5.10, at least so that lockdep
doesn't complain.

> 
> Let me know if my analysis is wrong.
> 
> Regards,
> Nadav
> 
> [  136.349353] 
> [  136.350212] WARNING: inconsistent lock state
> [  136.351093] 5.10.0-rc5+ #1435 Not tainted
> [  136.352003] 
> [  136.352891] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> [  136.354057] swapper/5/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
> [  136.355078] 88810417d6a8 (_data->lock){+.?.}-{2:2}, at: 
> io_file_data_ref_zero+0x4d/0x220
> [  136.356717] {SOFTIRQ-ON-W} state was registered at:
> [  136.357539]   lock_acquire+0x172/0x520
> [  136.358209]   _raw_spin_lock+0x30/0x40
> [  136.358880]   __io_uring_register+0x1c99/0x1fe0
> [  136.359656]   __x64_sys_io_uring_register+0xe2/0x270
> [  136.360489]   do_syscall_64+0x39/0x90
> [  136.361144]   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  136.361991] irq event stamp: 835836
> [  136.362627] hardirqs last  enabled at (835836): [] 
> _raw_spin_unlock_irqrestore+0x41/0x50
> [  136.364112] hardirqs last disabled at (835835): [] 
> _raw_spin_lock_irqsave+0x5a/0x60
> [  136.365553] softirqs last  enabled at (835824): [] 
> _local_bh_enable+0x21/0x40
> [  136.366920] softirqs last disabled at (835825): [] 
> asm_call_irq_on_stack+0x12/0x20
> [  136.368335] 
> [  136.368335] other info that might help us debug this:
> [  136.369414]  Possible unsafe locking scenario:
> [  136.369414] 
> [  136.370414]CPU0
> [  136.370907]
> [  136.371403]   lock(_data->lock);
> [  136.372064]   
> [  136.372585] lock(_data->lock);
> [  136.373269] 
> [  136.373269]  *** DEADLOCK ***
> [  136.373269] 
> [  136.374319] 2 locks held by swapper/5/0:
> [  136.375005]  #0: 83c45380 (rcu_callback){}-{0:0}, at: 
> rcu_core+0x451/0xb70
> [  136.376284]  #1: 83c454a0 (rcu_read_lock){}-{1:2}, at: 
> percpu_ref_switch_to_atomic_rcu+0x139/0x320
> [  136.377849] 
> [  136.377849] stack backtrace:
> [  136.378650] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.10.0-rc5irina+ 
> #1435
> [  136.379746] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
> [  136.381550] Call Trace:
> [  136.382053]  
> [  136.382502]  dump_stack+0xa4/0xd9
> [  136.383116]  print_usage_bug.cold+0x217/0x220
> [  136.383871]  mark_lock+0xb90/0xe80
> [  136.384506]  ? print_usage_bug+0x180/0x180
> [  136.385223]  ? __kasan_check_read+0x11/0x20
> [  136.385946]  ? mark_lock+0x116/0xe80
> [  136.386599]  ? print_usage_bug+0x180/0x180
> [  136.387324]  ? __lock_acquire+0x8f5/0x2a80
> [  136.388039]  ? __kasan_check_read+0x11/0x20
> [  136.388776]  ? __lock_acquire+0x8f5/0x2a80
> [  136.389493]  __lock_acquire+0xdc9/0x2a80
> [  136.390190]  ? lockdep_hardirqs_on_prepare+0x210/0x210
> [  136.391039]  ? rcu_read_lock_sched_held+0xa1/0xd0
> [  136.391835]  ? rcu_read_lock_bh_held+0xb0/0xb0
> [  136.392603]  lock_acquire+0x172/0x520
> [  136.393258]  ? io_file_data_ref_zero+0x4d/0x220
> [  136.394025]  ? lock_release+0x410/0x410
> [  136.394705]  ? lock_acquire+0x172/0x520
> [  136.395386]  ? percpu_ref_switch_to_atomic_rcu+0x139/0x320
> [  136.396277]  ? lock_release+0x410/0x410
> [  136.396961]  _raw_spin_lock+0x30/0x40
> [  136.397620]  ? io_file_data_ref_zero+0x4d/0x220
> [  136.398392]  io_file_data_ref_zero+0x4d/0x220
> [  136.399138]  percpu_ref_switch_to_atomic_rcu+0x310/0x320
> [  136.47]  ? percpu_ref_init+0x180/0x180
> [  136.400730]  rcu_core+0x49c/0xb70
> [  136.401344]  ? rcu_core+0x451/0xb70
> [  136.401978]  ? strict_work_handler+0x150/0x150
> [  136.402740]  ? rcu_read_lock_sched_held+0xa1/0xd0
> [  136.403535]  ? rcu_read_lock_bh_held+0xb0/0xb0
> [  136.404298]  rcu_core_si+0xe/0x10
> [  136.404914]  __do_softirq+0x104/0x59d
> [  136.405572]  asm_call_irq_on_stack+0x12/0x20
> [  136.406306]  
> [  136.406760]  do_softirq_own_stack+0x6f/0x80
> [  136.407484]  irq_exit_rcu+0xf3/0x100
> [  136.408134]  sysvec_apic_timer_interrupt+0x4b/0xb0
> [  136.408946]  asm_sysvec_apic_timer_interrupt+0x12/0x20
> [  136.409798] RIP: 0010:default_idle+0x1c/0x20
> [  136.410536] Code: eb cd 66 66 2e 0f 1f 84 00 00 00 00 00 

[BISECTED REGRESSION] Broken USB/GPIO on OMAP1 OSK

2020-11-28 Thread Aaro Koskinen
Hi,

I tried to upgrade my OMAP1 OSK board to v5.9, but the rootfs cannot
be accessed anymore due to broken USB. It fails to probe with the
following logs:

[9.219940] ohci ohci: cannot find GPIO chip i2c-tps65010, deferring
[9.250366] ohci ohci: cannot find GPIO chip i2c-tps65010, deferring
[9.731445] ohci ohci: cannot find GPIO chip i2c-tps65010, deferring
[   10.342102] ohci ohci: cannot find GPIO chip i2c-tps65010, deferring
[   10.966430] ohci ohci: cannot find GPIO chip i2c-tps65010, deferring

Bisected to:

commit 15d157e874437e381643c37a10922388d6e55b29
Author: Linus Walleij 
Date:   Mon Jul 20 15:55:24 2020 +0200

usb: ohci-omap: Convert to use GPIO descriptors

I suspect one of the issues is the name "i2c-tps65010" vs "tps65010":

# cat /sys/devices/platform/omap_i2c.1/i2c-1/i2c-tps65010/gpio/gpiochip208/label
tps65010

However changing that in the lookup table still doesn't help much; I got rid
of the "deferring" message but the USB still doesn't work. So far the only
workaround I have is to revert the whole commit.

A.


[PATCH Xilinx Alveo 4/8] fpga: xrt: core infrastructure for xrt-lib module

2020-11-28 Thread Sonal Santan
From: Sonal Santan 

Add xrt-lib kernel module infrastructure code which defines APIs
for working with device nodes, iteration and lookup of platform
devices, common interfaces for platform devices, plumbing of
function call and ioctls between platform devices and parent
partitions.

Signed-off-by: Sonal Santan 
---
 drivers/fpga/alveo/lib/xrt-cdev.c   |  234 +++
 drivers/fpga/alveo/lib/xrt-main.c   |  275 
 drivers/fpga/alveo/lib/xrt-main.h   |   46 ++
 drivers/fpga/alveo/lib/xrt-subdev.c | 1007 +++
 4 files changed, 1562 insertions(+)
 create mode 100644 drivers/fpga/alveo/lib/xrt-cdev.c
 create mode 100644 drivers/fpga/alveo/lib/xrt-main.c
 create mode 100644 drivers/fpga/alveo/lib/xrt-main.h
 create mode 100644 drivers/fpga/alveo/lib/xrt-subdev.c

diff --git a/drivers/fpga/alveo/lib/xrt-cdev.c 
b/drivers/fpga/alveo/lib/xrt-cdev.c
new file mode 100644
index ..b7bef9c8e9ce
--- /dev/null
+++ b/drivers/fpga/alveo/lib/xrt-cdev.c
@@ -0,0 +1,234 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Xilinx Alveo FPGA device node helper functions.
+ *
+ * Copyright (C) 2020 Xilinx, Inc.
+ *
+ * Authors:
+ * Cheng Zhen 
+ */
+
+#include "xrt-subdev.h"
+
+extern struct class *xrt_class;
+
+#define XRT_CDEV_DIR   "xfpga"
+#define INODE2PDATA(inode) \
+   container_of((inode)->i_cdev, struct xrt_subdev_platdata, xsp_cdev)
+#define INODE2PDEV(inode)  \
+   to_platform_device(kobj_to_dev((inode)->i_cdev->kobj.parent))
+#define CDEV_NAME(sysdev)  (strchr((sysdev)->kobj.name, '!') + 1)
+
+/* Allow it to be accessed from cdev. */
+static void xrt_devnode_allowed(struct platform_device *pdev)
+{
+   struct xrt_subdev_platdata *pdata = DEV_PDATA(pdev);
+
+   /* Allow new opens. */
+   mutex_lock(&pdata->xsp_devnode_lock);
+   pdata->xsp_devnode_online = true;
+   mutex_unlock(&pdata->xsp_devnode_lock);
+}
+
+/* Turn off access from cdev and wait for all existing user to go away. */
+static int xrt_devnode_disallowed(struct platform_device *pdev)
+{
+   int ret = 0;
+   struct xrt_subdev_platdata *pdata = DEV_PDATA(pdev);
+
+   mutex_lock(&pdata->xsp_devnode_lock);
+
+   /* Prevent new opens. */
+   pdata->xsp_devnode_online = false;
+   /* Wait for existing user to close. */
+   while (!ret && pdata->xsp_devnode_ref) {
+   int rc;
+
+   mutex_unlock(&pdata->xsp_devnode_lock);
+   rc = wait_for_completion_killable(&pdata->xsp_devnode_comp);
+   mutex_lock(&pdata->xsp_devnode_lock);
+
+   if (rc == -ERESTARTSYS) {
+   /* Restore online state. */
+   pdata->xsp_devnode_online = true;
+   xrt_err(pdev, "%s is in use, ref=%d",
+   CDEV_NAME(pdata->xsp_sysdev),
+   pdata->xsp_devnode_ref);
+   ret = -EBUSY;
+   }
+   }
+
+   mutex_unlock(&pdata->xsp_devnode_lock);
+
+   return ret;
+}
+
+static struct platform_device *
+__xrt_devnode_open(struct inode *inode, bool excl)
+{
+   struct xrt_subdev_platdata *pdata = INODE2PDATA(inode);
+   struct platform_device *pdev = INODE2PDEV(inode);
+   bool opened = false;
+
+   mutex_lock(&pdata->xsp_devnode_lock);
+
+   if (pdata->xsp_devnode_online) {
+   if (excl && pdata->xsp_devnode_ref) {
+   xrt_err(pdev, "%s has already been opened exclusively",
+   CDEV_NAME(pdata->xsp_sysdev));
+   } else if (!excl && pdata->xsp_devnode_excl) {
+   xrt_err(pdev, "%s has been opened exclusively",
+   CDEV_NAME(pdata->xsp_sysdev));
+   } else {
+   pdata->xsp_devnode_ref++;
+   pdata->xsp_devnode_excl = excl;
+   opened = true;
+   xrt_info(pdev, "opened %s, ref=%d",
+   CDEV_NAME(pdata->xsp_sysdev),
+   pdata->xsp_devnode_ref);
+   }
+   } else {
+   xrt_err(pdev, "%s is offline", CDEV_NAME(pdata->xsp_sysdev));
+   }
+
+   mutex_unlock(&pdata->xsp_devnode_lock);
+
+   return opened ? pdev : NULL;
+}
+
+struct platform_device *
+xrt_devnode_open_excl(struct inode *inode)
+{
+   return __xrt_devnode_open(inode, true);
+}
+
+struct platform_device *
+xrt_devnode_open(struct inode *inode)
+{
+   return __xrt_devnode_open(inode, false);
+}
+EXPORT_SYMBOL_GPL(xrt_devnode_open);
+
+void xrt_devnode_close(struct inode *inode)
+{
+   struct xrt_subdev_platdata *pdata = INODE2PDATA(inode);
+   struct platform_device *pdev = INODE2PDEV(inode);
+   bool notify = false;
+
+   mutex_lock(&pdata->xsp_devnode_lock);
+
+   pdata->xsp_devnode_ref--;
+   if (pdata->xsp_devnode_ref == 0) {
+   pdata->xsp_devnode_excl = false;
+   notify = 
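
The shared/exclusive open accounting in __xrt_devnode_open() and xrt_devnode_close() above can be sketched outside the kernel as follows. This is a single-threaded illustration only: struct devnode_state and both helper names are hypothetical, the mutex the patch takes around these checks is omitted, and because the close path is cut off in this archive, the "last closer" return value is an assumption of the sketch rather than something taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical mirror of the xsp_devnode_* fields used by the patch. */
struct devnode_state {
    bool online;   /* new opens allowed */
    bool excl;     /* current holder opened exclusively */
    int  ref;      /* number of open handles */
};

/* Returns true if the open is granted, false if it must be refused. */
static bool devnode_try_open(struct devnode_state *s, bool excl)
{
    if (!s->online)
        return false;           /* node is offline */
    if (excl && s->ref)
        return false;           /* exclusive open needs zero users */
    if (!excl && s->excl)
        return false;           /* someone holds it exclusively */

    s->ref++;
    s->excl = excl;
    return true;
}

/*
 * Drops one reference. Returns true when this was the last user; what the
 * real driver does at that point is not visible in the truncated patch.
 */
static bool devnode_close(struct devnode_state *s)
{
    s->ref--;
    if (s->ref == 0) {
        s->excl = false;
        return true;
    }
    return false;
}
```

Two shared opens can coexist, an exclusive open is refused while any handle is outstanding, and the exclusive flag is cleared once the reference count returns to zero.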

[PATCH Xilinx Alveo 8/8] fpga: xrt: Kconfig and Makefile updates for XRT drivers

2020-11-28 Thread Sonal Santan
From: Sonal Santan 

Update fpga Kconfig/Makefile and add Kconfig/Makefile for
new drivers.

Signed-off-by: Sonal Santan 
---
 drivers/fpga/Kconfig |  2 ++
 drivers/fpga/Makefile|  3 +++
 drivers/fpga/alveo/Kconfig   |  7 ++
 drivers/fpga/alveo/lib/Kconfig   | 11 +
 drivers/fpga/alveo/lib/Makefile  | 42 
 drivers/fpga/alveo/mgmt/Kconfig  | 11 +
 drivers/fpga/alveo/mgmt/Makefile | 28 +
 7 files changed, 104 insertions(+)
 create mode 100644 drivers/fpga/alveo/Kconfig
 create mode 100644 drivers/fpga/alveo/lib/Kconfig
 create mode 100644 drivers/fpga/alveo/lib/Makefile
 create mode 100644 drivers/fpga/alveo/mgmt/Kconfig
 create mode 100644 drivers/fpga/alveo/mgmt/Makefile

diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig
index 7cd5a29fc437..8687ef231308 100644
--- a/drivers/fpga/Kconfig
+++ b/drivers/fpga/Kconfig
@@ -215,4 +215,6 @@ config FPGA_MGR_ZYNQMP_FPGA
  to configure the programmable logic(PL) through PS
  on ZynqMP SoC.
 
+source "drivers/fpga/alveo/Kconfig"
+
 endif # FPGA
diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
index d8e21dfc6778..59943dccf405 100644
--- a/drivers/fpga/Makefile
+++ b/drivers/fpga/Makefile
@@ -46,3 +46,6 @@ dfl-afu-objs += dfl-afu-error.o
 
 # Drivers for FPGAs which implement DFL
 obj-$(CONFIG_FPGA_DFL_PCI) += dfl-pci.o
+
+obj-$(CONFIG_FPGA_ALVEO_LIB)   += alveo/lib/
+obj-$(CONFIG_FPGA_ALVEO_XMGMT) += alveo/mgmt/
diff --git a/drivers/fpga/alveo/Kconfig b/drivers/fpga/alveo/Kconfig
new file mode 100644
index ..a583c3543945
--- /dev/null
+++ b/drivers/fpga/alveo/Kconfig
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Xilinx Alveo FPGA device configuration
+#
+
+source "drivers/fpga/alveo/lib/Kconfig"
+source "drivers/fpga/alveo/mgmt/Kconfig"
diff --git a/drivers/fpga/alveo/lib/Kconfig b/drivers/fpga/alveo/lib/Kconfig
new file mode 100644
index ..62175af2108e
--- /dev/null
+++ b/drivers/fpga/alveo/lib/Kconfig
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Xilinx Alveo FPGA device configuration
+#
+
+config FPGA_ALVEO_LIB
+   tristate "Xilinx Alveo Driver Library"
+   depends on HWMON && PCI 
+   select LIBFDT
+   help
+ Xilinx Alveo FPGA PCIe device driver common library.
diff --git a/drivers/fpga/alveo/lib/Makefile b/drivers/fpga/alveo/lib/Makefile
new file mode 100644
index ..a14204dc489d
--- /dev/null
+++ b/drivers/fpga/alveo/lib/Makefile
@@ -0,0 +1,42 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2020 Xilinx, Inc. All rights reserved.
+#
+# Authors: sonal.san...@xilinx.com
+#
+
+FULL_ALVEO_PATH=$(srctree)/$(src)/..
+FULL_DTC_PATH=$(srctree)/scripts/dtc/libfdt
+
+obj-$(CONFIG_FPGA_ALVEO_LIB) := xrt-lib.o
+
+xrt-lib-objs :=\
+   xrt-main.o  \
+   xrt-subdev.o\
+   xrt-cdev.o  \
+   ../common/xrt-metadata.o\
+   subdevs/xrt-partition.o \
+   subdevs/xrt-test.o  \
+   subdevs/xrt-vsec.o  \
+   subdevs/xrt-vsec-golden.o   \
+   subdevs/xrt-axigate.o   \
+   subdevs/xrt-qspi.o  \
+   subdevs/xrt-gpio.o  \
+   subdevs/xrt-mailbox.o   \
+   subdevs/xrt-icap.o  \
+   subdevs/xrt-cmc.o   \
+   subdevs/xrt-cmc-ctrl.o  \
+   subdevs/xrt-cmc-sensors.o   \
+   subdevs/xrt-cmc-mailbox.o   \
+   subdevs/xrt-cmc-bdinfo.o\
+   subdevs/xrt-cmc-sc.o\
+   subdevs/xrt-srsr.o  \
+   subdevs/xrt-clock.o \
+   subdevs/xrt-clkfreq.o   \
+   subdevs/xrt-ucs.o   \
+   subdevs/xrt-calib.o
+
+
+ccflags-y := -I$(FULL_ALVEO_PATH)/include \
+   -I$(FULL_ALVEO_PATH)/common \
+   -I$(FULL_DTC_PATH)
diff --git a/drivers/fpga/alveo/mgmt/Kconfig b/drivers/fpga/alveo/mgmt/Kconfig
new file mode 100644
index ..8a5590842dad
--- /dev/null
+++ b/drivers/fpga/alveo/mgmt/Kconfig
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Xilinx Alveo FPGA device configuration
+#
+
+config FPGA_ALVEO_XMGMT
+   tristate "Xilinx Alveo Management Driver"
+   depends on HWMON && PCI && FPGA_ALVEO_LIB
+   select LIBFDT
+   help
+ Xilinx Alveo FPGA PCIe device driver for Management Physical Function.
diff --git a/drivers/fpga/alveo/mgmt/Makefile b/drivers/fpga/alveo/mgmt/Makefile
new file mode 100644
index ..08be7952a832
--- /dev/null
+++ b/drivers/fpga/alveo/mgmt/Makefile
@@ -0,0 +1,28 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2019-2020 Xilinx, Inc. All rights reserved.
+#
+# Authors: sonal.san...@xilinx.com
+#
+
+FULL_ALVEO_PATH=$(srctree)/$(src)/..
+FULL_DTC_PATH=$(srctree)/scripts/dtc/libfdt
+
+obj-$(CONFIG_FPGA_ALVEO_XMGMT) += 

[PATCH Xilinx Alveo 7/8] fpga: xrt: Alveo management physical function driver

2020-11-28 Thread Sonal Santan
From: Sonal Santan 

Add management physical function driver core. The driver attaches
to management physical function of Alveo devices. It instantiates
the root driver and one or more partition drivers which in turn
instantiate platform drivers. The instantiation of partition and
platform drivers is completely data driven. The driver integrates
with FPGA manager and provides xclbin download service.

Signed-off-by: Sonal Santan 
---
 drivers/fpga/alveo/mgmt/xmgmt-fmgr-drv.c | 194 
 drivers/fpga/alveo/mgmt/xmgmt-fmgr.h |  29 +
 drivers/fpga/alveo/mgmt/xmgmt-main-impl.h|  36 +
 drivers/fpga/alveo/mgmt/xmgmt-main-mailbox.c | 930 +++
 drivers/fpga/alveo/mgmt/xmgmt-main-ulp.c | 190 
 drivers/fpga/alveo/mgmt/xmgmt-main.c | 843 +
 drivers/fpga/alveo/mgmt/xmgmt-root.c | 375 
 7 files changed, 2597 insertions(+)
 create mode 100644 drivers/fpga/alveo/mgmt/xmgmt-fmgr-drv.c
 create mode 100644 drivers/fpga/alveo/mgmt/xmgmt-fmgr.h
 create mode 100644 drivers/fpga/alveo/mgmt/xmgmt-main-impl.h
 create mode 100644 drivers/fpga/alveo/mgmt/xmgmt-main-mailbox.c
 create mode 100644 drivers/fpga/alveo/mgmt/xmgmt-main-ulp.c
 create mode 100644 drivers/fpga/alveo/mgmt/xmgmt-main.c
 create mode 100644 drivers/fpga/alveo/mgmt/xmgmt-root.c

diff --git a/drivers/fpga/alveo/mgmt/xmgmt-fmgr-drv.c 
b/drivers/fpga/alveo/mgmt/xmgmt-fmgr-drv.c
new file mode 100644
index ..d451b5a2c291
--- /dev/null
+++ b/drivers/fpga/alveo/mgmt/xmgmt-fmgr-drv.c
@@ -0,0 +1,194 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Xilinx Alveo Management Function Driver
+ *
+ * Copyright (C) 2019-2020 Xilinx, Inc.
+ * Bulk of the code borrowed from XRT mgmt driver file, fmgr.c
+ *
+ * Authors: sonal.san...@xilinx.com
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "xrt-subdev.h"
+#include "xmgmt-fmgr.h"
+#include "xrt-axigate.h"
+#include "xmgmt-main-impl.h"
+
+/*
+ * Container to capture and cache full xclbin as it is passed in blocks by FPGA
+ * Manager. Driver needs access to full xclbin to walk through xclbin sections.
+ * FPGA Manager's .write() backend sends incremental blocks without any
+ * knowledge of xclbin format forcing us to collect the blocks and stitch them
+ * together here.
+ */
+
+struct xfpga_klass {
+   const struct platform_device *pdev;
+   struct axlf *blob;
+   char name[64];
+   size_t   count;
+   size_t   total_count;
+   struct mutex axlf_lock;
+   int  reader_ref;
+   enum fpga_mgr_states state;
+   enum xfpga_sec_level sec_level;
+};
+
+struct key *xfpga_keys;
+
+static int xmgmt_pr_write_init(struct fpga_manager *mgr,
+   struct fpga_image_info *info, const char *buf, size_t count)
+{
+   struct xfpga_klass *obj = mgr->priv;
+   const struct axlf *bin = (const struct axlf *)buf;
+
+   if (count < sizeof(struct axlf)) {
+   obj->state = FPGA_MGR_STATE_WRITE_INIT_ERR;
+   return -EINVAL;
+   }
+
+   if (count > bin->m_header.m_length) {
+   obj->state = FPGA_MGR_STATE_WRITE_INIT_ERR;
+   return -EINVAL;
+   }
+
+   /* Free up the previous blob */
+   vfree(obj->blob);
+   obj->blob = vmalloc(bin->m_header.m_length);
+   if (!obj->blob) {
+   obj->state = FPGA_MGR_STATE_WRITE_INIT_ERR;
+   return -ENOMEM;
+   }
+
+   xrt_info(obj->pdev, "Begin download of xclbin %pUb of length %lld B",
+   &bin->m_header.uuid, bin->m_header.m_length);
+
+   obj->count = 0;
+   obj->total_count = bin->m_header.m_length;
+   obj->state = FPGA_MGR_STATE_WRITE_INIT;
+   return 0;
+}
+
+static int xmgmt_pr_write(struct fpga_manager *mgr,
+   const char *buf, size_t count)
+{
+   struct xfpga_klass *obj = mgr->priv;
+   char *curr = (char *)obj->blob;
+
+   if ((obj->state != FPGA_MGR_STATE_WRITE_INIT) &&
+   (obj->state != FPGA_MGR_STATE_WRITE)) {
+   obj->state = FPGA_MGR_STATE_WRITE_ERR;
+   return -EINVAL;
+   }
+
+   curr += obj->count;
+   obj->count += count;
+
+   /*
+* The xclbin buffer should not be longer than advertised in the header
+*/
+   if (obj->total_count < obj->count) {
+   obj->state = FPGA_MGR_STATE_WRITE_ERR;
+   return -EINVAL;
+   }
+
+   xrt_info(obj->pdev, "Copying block of %zu B of xclbin", count);
+   memcpy(curr, buf, count);
+   obj->state = FPGA_MGR_STATE_WRITE;
+   return 0;
+}
+
+
+static int xmgmt_pr_write_complete(struct fpga_manager *mgr,
+  struct fpga_image_info *info)
+{
+   int result = 0;
+   struct xfpga_klass *obj = mgr->priv;
+
+   if (obj->state != FPGA_MGR_STATE_WRITE) {
+   obj->state = FPGA_MGR_STATE_WRITE_COMPLETE_ERR;
+   
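
The collect-and-stitch scheme described in the xfpga_klass comment above (allocate once from the length in the header, then append each incoming block) might look like this in plain user-space C. It is a sketch under stated assumptions: struct blob_buf and the blob_* helpers are hypothetical names, malloc()/free() stand in for vmalloc()/vfree(), and blob_complete() is a guess at a completeness check since the write_complete callback is truncated in this archive.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator; the driver keeps similar fields in xfpga_klass. */
struct blob_buf {
    char  *data;    /* backing storage for the whole image */
    size_t count;   /* bytes received so far */
    size_t total;   /* length advertised by the image header */
};

/* Start a new download of 'total' bytes, dropping any previous image. */
static int blob_init(struct blob_buf *b, size_t total)
{
    free(b->data);              /* free(NULL) is a no-op */
    b->data = malloc(total);
    if (!b->data)
        return -1;
    b->count = 0;
    b->total = total;
    return 0;
}

/* Append one block; refuse anything that would exceed the advertised size. */
static int blob_write(struct blob_buf *b, const char *buf, size_t n)
{
    if (n > b->total - b->count)
        return -1;              /* checked before any state changes */
    memcpy(b->data + b->count, buf, n);
    b->count += n;
    return 0;
}

/* Assumed completeness check: all advertised bytes have arrived. */
static int blob_complete(const struct blob_buf *b)
{
    return b->count == b->total ? 0 : -1;
}
```

The sketch performs the bounds check before updating the counter, so a rejected block leaves the accumulator unchanged and a later, correctly sized block can still complete the image.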

[PATCH Xilinx Alveo 6/8] fpga: xrt: header file for platform and parent drivers

2020-11-28 Thread Sonal Santan
From: Sonal Santan 

Add private header files for platform and parent drivers.
Each header file defines ioctls supported by the platform
or parent driver. The header files also define core data
structures for sending and receiving events by platform
and parent drivers.

Signed-off-by: Sonal Santan 
---
 drivers/fpga/alveo/include/xmgmt-main.h|  34 +++
 drivers/fpga/alveo/include/xrt-axigate.h   |  31 ++
 drivers/fpga/alveo/include/xrt-calib.h |  28 ++
 drivers/fpga/alveo/include/xrt-clkfreq.h   |  21 ++
 drivers/fpga/alveo/include/xrt-clock.h |  29 ++
 drivers/fpga/alveo/include/xrt-cmc.h   |  23 ++
 drivers/fpga/alveo/include/xrt-ddr-srsr.h  |  29 ++
 drivers/fpga/alveo/include/xrt-flash.h |  28 ++
 drivers/fpga/alveo/include/xrt-gpio.h  |  41 +++
 drivers/fpga/alveo/include/xrt-icap.h  |  27 ++
 drivers/fpga/alveo/include/xrt-mailbox.h   |  44 +++
 drivers/fpga/alveo/include/xrt-metadata.h  | 184 
 drivers/fpga/alveo/include/xrt-parent.h| 103 +++
 drivers/fpga/alveo/include/xrt-partition.h |  33 ++
 drivers/fpga/alveo/include/xrt-subdev.h| 333 +
 drivers/fpga/alveo/include/xrt-ucs.h   |  22 ++
 16 files changed, 1010 insertions(+)
 create mode 100644 drivers/fpga/alveo/include/xmgmt-main.h
 create mode 100644 drivers/fpga/alveo/include/xrt-axigate.h
 create mode 100644 drivers/fpga/alveo/include/xrt-calib.h
 create mode 100644 drivers/fpga/alveo/include/xrt-clkfreq.h
 create mode 100644 drivers/fpga/alveo/include/xrt-clock.h
 create mode 100644 drivers/fpga/alveo/include/xrt-cmc.h
 create mode 100644 drivers/fpga/alveo/include/xrt-ddr-srsr.h
 create mode 100644 drivers/fpga/alveo/include/xrt-flash.h
 create mode 100644 drivers/fpga/alveo/include/xrt-gpio.h
 create mode 100644 drivers/fpga/alveo/include/xrt-icap.h
 create mode 100644 drivers/fpga/alveo/include/xrt-mailbox.h
 create mode 100644 drivers/fpga/alveo/include/xrt-metadata.h
 create mode 100644 drivers/fpga/alveo/include/xrt-parent.h
 create mode 100644 drivers/fpga/alveo/include/xrt-partition.h
 create mode 100644 drivers/fpga/alveo/include/xrt-subdev.h
 create mode 100644 drivers/fpga/alveo/include/xrt-ucs.h

diff --git a/drivers/fpga/alveo/include/xmgmt-main.h 
b/drivers/fpga/alveo/include/xmgmt-main.h
new file mode 100644
index ..3f26c480ce27
--- /dev/null
+++ b/drivers/fpga/alveo/include/xmgmt-main.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020 Xilinx, Inc.
+ *
+ * Authors:
+ * Cheng Zhen 
+ */
+
+#ifndef _XMGMT_MAIN_H_
+#define _XMGMT_MAIN_H_
+
+#include 
+
+enum xrt_mgmt_main_ioctl_cmd {
+   // section needs to be vfree'd by caller
+   XRT_MGMT_MAIN_GET_AXLF_SECTION = 0,
+   // vbnv needs to be kfree'd by caller
+   XRT_MGMT_MAIN_GET_VBNV,
+};
+
+enum provider_kind {
+   XMGMT_BLP,
+   XMGMT_PLP,
+   XMGMT_ULP,
+};
+
+struct xrt_mgmt_main_ioctl_get_axlf_section {
+   enum provider_kind xmmigas_axlf_kind;
+   enum axlf_section_kind xmmigas_section_kind;
+   void *xmmigas_section;
+   u64 xmmigas_section_size;
+};
+
+#endif /* _XMGMT_MAIN_H_ */
diff --git a/drivers/fpga/alveo/include/xrt-axigate.h 
b/drivers/fpga/alveo/include/xrt-axigate.h
new file mode 100644
index ..b1dd70546040
--- /dev/null
+++ b/drivers/fpga/alveo/include/xrt-axigate.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020 Xilinx, Inc.
+ *
+ * Authors:
+ * Lizhi Hou 
+ */
+
+#ifndef _XRT_AXIGATE_H_
+#define _XRT_AXIGATE_H_
+
+
+#include "xrt-subdev.h"
+#include "xrt-metadata.h"
+
+/*
+ * AXIGATE driver IOCTL calls.
+ */
+enum xrt_axigate_ioctl_cmd {
+   XRT_AXIGATE_FREEZE = 0,
+   XRT_AXIGATE_FREE,
+};
+
+/* the ep names are in the order of hardware layers */
+static const char * const xrt_axigate_epnames[] = {
+   NODE_GATE_PLP,
+   NODE_GATE_ULP,
+   NULL
+};
+
+#endif /* _XRT_AXIGATE_H_ */
diff --git a/drivers/fpga/alveo/include/xrt-calib.h 
b/drivers/fpga/alveo/include/xrt-calib.h
new file mode 100644
index ..5e5bb5cec285
--- /dev/null
+++ b/drivers/fpga/alveo/include/xrt-calib.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020 Xilinx, Inc.
+ *
+ * Authors:
+ * Cheng Zhen 
+ */
+
+#ifndef _XRT_CALIB_H_
+#define _XRT_CALIB_H_
+
+#include "xrt-subdev.h"
+#include 
+
+/*
+ * Memory calibration driver IOCTL calls.
+ */
+enum xrt_calib_results {
+   XRT_CALIB_UNKNOWN,
+   XRT_CALIB_SUCCEEDED,
+   XRT_CALIB_FAILED,
+};
+
+enum xrt_calib_ioctl_cmd {
+   XRT_CALIB_RESULT = 0,
+};
+
+#endif /* _XRT_CALIB_H_ */
diff --git a/drivers/fpga/alveo/include/xrt-clkfreq.h 
b/drivers/fpga/alveo/include/xrt-clkfreq.h
new file mode 100644
index ..60e4109cc05a
--- /dev/null
+++ b/drivers/fpga/alveo/include/xrt-clkfreq.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020 Xilinx, Inc.
+ *
+ * 

[PATCH Xilinx Alveo 3/8] fpga: xrt: infrastructure support for xmgmt driver

2020-11-28 Thread Sonal Santan
From: Sonal Santan 

Add infrastructure code for XRT management physical function
driver. This provides support for enumerating and extracting
sections from xclbin files, interacting with device tree nodes
found in xclbin and working with Alveo partitions.

Signed-off-by: Sonal Santan 
---
 drivers/fpga/alveo/common/xrt-metadata.c | 590 ++
 drivers/fpga/alveo/common/xrt-root.c | 744 +++
 drivers/fpga/alveo/common/xrt-root.h |  24 +
 drivers/fpga/alveo/common/xrt-xclbin.c   | 387 
 drivers/fpga/alveo/common/xrt-xclbin.h   |  46 ++
 5 files changed, 1791 insertions(+)
 create mode 100644 drivers/fpga/alveo/common/xrt-metadata.c
 create mode 100644 drivers/fpga/alveo/common/xrt-root.c
 create mode 100644 drivers/fpga/alveo/common/xrt-root.h
 create mode 100644 drivers/fpga/alveo/common/xrt-xclbin.c
 create mode 100644 drivers/fpga/alveo/common/xrt-xclbin.h

diff --git a/drivers/fpga/alveo/common/xrt-metadata.c 
b/drivers/fpga/alveo/common/xrt-metadata.c
new file mode 100644
index ..5596619ed82d
--- /dev/null
+++ b/drivers/fpga/alveo/common/xrt-metadata.c
@@ -0,0 +1,590 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Xilinx Alveo FPGA Metadata parse APIs
+ *
+ * Copyright (C) 2020 Xilinx, Inc.
+ *
+ * Authors:
+ *  Lizhi Hou 
+ */
+
+#include 
+#include "libfdt.h"
+#include "xrt-metadata.h"
+
+#define MAX_BLOB_SIZE  (4096 * 25)
+
+#define md_err(dev, fmt, args...)  \
+   dev_err(dev, "%s: "fmt, __func__, ##args)
+#define md_warn(dev, fmt, args...) \
+   dev_warn(dev, "%s: "fmt, __func__, ##args)
+#define md_info(dev, fmt, args...) \
+   dev_info(dev, "%s: "fmt, __func__, ##args)
+#define md_dbg(dev, fmt, args...)  \
+   dev_dbg(dev, "%s: "fmt, __func__, ##args)
+
+static int xrt_md_setprop(struct device *dev, char *blob, int offset,
+   const char *prop, const void *val, int size);
+static int xrt_md_overlay(struct device *dev, char *blob, int target,
+   const char *overlay_blob, int overlay_offset);
+static int xrt_md_get_endpoint(struct device *dev, const char *blob,
+   const char *ep_name, const char *regmap_name, int *ep_offset);
+
+long xrt_md_size(struct device *dev, const char *blob)
+{
+   long len = (long) fdt_totalsize(blob);
+
+   return (len > MAX_BLOB_SIZE) ? -EINVAL : len;
+}
+
+int xrt_md_create(struct device *dev, char **blob)
+{
+   int ret = 0;
+
+   WARN_ON(!blob);
+
+   *blob = vmalloc(MAX_BLOB_SIZE);
+   if (!*blob)
+   return -ENOMEM;
+
+   ret = fdt_create_empty_tree(*blob, MAX_BLOB_SIZE);
+   if (ret) {
+   md_err(dev, "format blob failed, ret = %d", ret);
+   goto failed;
+   }
+
+   ret = fdt_next_node(*blob, -1, NULL);
+   if (ret < 0) {
+   md_err(dev, "No Node, ret = %d", ret);
+   goto failed;
+   }
+
+   ret = fdt_add_subnode(*blob, ret, NODE_ENDPOINTS);
+   if (ret < 0)
+   md_err(dev, "add node failed, ret = %d", ret);
+
+failed:
+   if (ret < 0) {
+   vfree(*blob);
+   *blob = NULL;
+   } else
+   ret = 0;
+
+   return ret;
+}
+
+int xrt_md_add_node(struct device *dev, char *blob, int parent_offset,
+   const char *ep_name)
+{
+   int ret;
+
+   ret = fdt_add_subnode(blob, parent_offset, ep_name);
+   if (ret < 0 && ret != -FDT_ERR_EXISTS)
+   md_err(dev, "failed to add node %s. %d", ep_name, ret);
+
+   return ret;
+}
+
+int xrt_md_del_endpoint(struct device *dev, char *blob, const char *ep_name,
+   char *regmap_name)
+{
+   int ret;
+   int ep_offset;
+
+   ret = xrt_md_get_endpoint(dev, blob, ep_name, regmap_name, &ep_offset);
+   if (ret) {
+   md_err(dev, "can not find ep %s", ep_name);
+   return -EINVAL;
+   }
+
+   ret = fdt_del_node(blob, ep_offset);
+   if (ret)
+   md_err(dev, "delete node %s failed, ret %d", ep_name, ret);
+
+   return ret;
+}
+
+static int __xrt_md_add_endpoint(struct device *dev, char *blob,
+   struct xrt_md_endpoint *ep, int *offset, bool root)
+{
+   int ret = 0;
+   int ep_offset;
+   u32 val, count = 0;
+   u64 io_range[2];
+   char comp[128];
+
+   if (!ep->ep_name) {
+   md_err(dev, "empty name");
+   return -EINVAL;
+   }
+
+   if (!root) {
+   ret = xrt_md_get_endpoint(dev, blob, NODE_ENDPOINTS, NULL,
+   &ep_offset);
+   if (ret) {
+   md_err(dev, "invalid blob, ret = %d", ret);
+   return -EINVAL;
+   }
+   } else {
+   ep_offset = 0;
+   }
+
+   ep_offset = xrt_md_add_node(dev, blob, ep_offset, ep->ep_name);
+   if (ep_offset < 0) {
+   md_err(dev, "add endpoint failed, ret = %d", ep_offset);
+  

[PATCH Xilinx Alveo 1/8] Documentation: fpga: Add a document describing Alveo XRT drivers

2020-11-28 Thread Sonal Santan
From: Sonal Santan 

Describe Alveo XRT driver architecture and provide basic overview
of Xilinx Alveo platform.

Signed-off-by: Sonal Santan 
---
 Documentation/fpga/index.rst |   1 +
 Documentation/fpga/xrt.rst   | 588 +++
 2 files changed, 589 insertions(+)
 create mode 100644 Documentation/fpga/xrt.rst

diff --git a/Documentation/fpga/index.rst b/Documentation/fpga/index.rst
index f80f95667ca2..30134357b70d 100644
--- a/Documentation/fpga/index.rst
+++ b/Documentation/fpga/index.rst
@@ -8,6 +8,7 @@ fpga
 :maxdepth: 1

 dfl
+xrt

 .. only::  subproject and html

diff --git a/Documentation/fpga/xrt.rst b/Documentation/fpga/xrt.rst
new file mode 100644
index ..9f37d46459b0
--- /dev/null
+++ b/Documentation/fpga/xrt.rst
@@ -0,1 +1,588 @@
+==================================
+XRTV2 Linux Kernel Driver Overview
+==================================
+
+XRTV2 drivers are second generation `XRT `_ drivers which
+support `Alveo `_ PCIe platforms
+from Xilinx.
+
+XRTV2 drivers support *subsystem* style data driven platforms where driver's configuration
+and behavior is determined by meta data provided by platform (in *device tree* format).
+Primary management physical function (MPF) driver is called **xmgmt**. Primary user physical
+function (UPF) driver is called **xuser** and HW subsystem drivers are packaged into a library
+module called **xrt-lib**, which is shared by **xmgmt** and **xuser** (WIP).
+
+Alveo Platform Overview
+=======================
+
+Alveo platforms are architected as two physical FPGA partitions: *Shell* and *User*. Shell
+provides basic infrastructure for the Alveo platform like PCIe connectivity, board management,
+Dynamic Function Exchange (DFX), sensors, clocking, reset, and security. User partition contains
+user compiled binary which is loaded by a process called DFX also known as partial reconfiguration.
+
+Physical partitions require strict HW compatibility with each other for DFX to work properly.
+Every physical partition has two interface UUIDs: *parent* UUID and *child* UUID. For simple
+single stage platforms Shell → User forms parent child relationship. For complex two stage
+platforms Base → Shell → User forms the parent child relationship chain.
+
+.. note::
+   Partition compatibility matching is key design component of Alveo platforms and XRT. Partitions
+   have child and parent relationship. A loaded partition exposes child partition UUID to advertise
+   its compatibility requirement for child partition. When loading a child partition the xmgmt
+   management driver matches parent UUID of the child partition against child UUID exported by the
+   parent. Parent and child partition UUIDs are stored in the *xclbin* (for user) or *xsabin* (for
+   base and shell). Except for root UUID, VSEC, hardware itself does not know about UUIDs. UUIDs are
+   stored in xsabin and xclbin.
+
+
+The physical partitions and their loading is illustrated below::
+
+SHELL   USER
++---+  +---+
+|   |  |   |
+| VSEC UUID | CHILD PARENT |LOGIC UUID |
+|   o--->|

[PATCH Xilinx Alveo 2/8] fpga: xrt: Add UAPI header files

2020-11-28 Thread Sonal Santan
From: Sonal Santan 

Add XRT UAPI header files which describe flash layout, XRT
mailbox protocol, xclBin/axlf FPGA image container format and
XRT management physical function driver ioctl interfaces.

flash_xrt_data.h:
Layout used by XRT to store private data on flash.

mailbox_proto.h:
Mailbox opcodes and high level data structures representing
various kinds of information like sensors, clock, etc.

mailbox_transport.h:
Transport protocol used by mailbox.

xclbin.h:
Container format used to store compiled FPGA image which includes
bitstream and metadata.

xmgmt-ioctl.h:
Ioctls defined by management physical function driver:
* XMGMT_IOCICAPDOWNLOAD_AXLF
  xclbin download which programs the user partition
* XMGMT_IOCFREQSCALE
  Program the clocks driving user partition
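
The parity helper in flash_xrt_data.h XORs the data one 32-bit word at a time,
zero-padding a short final word. A minimal userspace sketch of that idea is
below. The name parity32() and the use of memcpy() are illustrative
assumptions, not the code from the patch, and the result depends on host byte
order just as a cast-based version would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * XOR 'n' bytes of 'buf' together as 32-bit words.
 * parity32() is a hypothetical name used only for this sketch.
 */
static uint32_t parity32(const unsigned char *buf, size_t n)
{
	uint32_t parity = 0;
	size_t off;

	assert(buf || n == 0);

	for (off = 0; off < n; off += 4) {
		uint32_t word = 0;	/* stays zero-padded for a short tail */
		size_t chunk = (n - off < 4) ? (n - off) : 4;

		/* Copy bytes into the word without alignment assumptions. */
		memcpy(&word, buf + off, chunk);
		parity ^= word;
	}
	return parity;
}
```

Since XOR is its own inverse, a buffer made of the same word repeated an even
number of times should yield a parity of zero, which makes a quick sanity
check.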

Signed-off-by: Sonal Santan 
---
 include/uapi/linux/xrt/flash_xrt_data.h|  67 
 include/uapi/linux/xrt/mailbox_proto.h | 394 +++
 include/uapi/linux/xrt/mailbox_transport.h |  74 
 include/uapi/linux/xrt/xclbin.h| 418 +
 include/uapi/linux/xrt/xmgmt-ioctl.h   |  72 
 5 files changed, 1025 insertions(+)
 create mode 100644 include/uapi/linux/xrt/flash_xrt_data.h
 create mode 100644 include/uapi/linux/xrt/mailbox_proto.h
 create mode 100644 include/uapi/linux/xrt/mailbox_transport.h
 create mode 100644 include/uapi/linux/xrt/xclbin.h
 create mode 100644 include/uapi/linux/xrt/xmgmt-ioctl.h

diff --git a/include/uapi/linux/xrt/flash_xrt_data.h 
b/include/uapi/linux/xrt/flash_xrt_data.h
new file mode 100644
index ..0cafc2f38fbe
--- /dev/null
+++ b/include/uapi/linux/xrt/flash_xrt_data.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020 Xilinx, Inc.
+ *
+ * Authors:
+ * Cheng Zhen 
+ */
+
+#ifndef _FLASH_XRT_DATA_H_
+#define _FLASH_XRT_DATA_H_
+
+#define XRT_DATA_MAGIC "XRTDATA"
+
+/*
+ * This header file contains data structure for xrt meta data on flash. This
+ * file is included in user space utilities and kernel drivers. The data
+ * structure is used to describe on-flash xrt data which is written by utility
+ * and read by driver. Any change of the data structure should either be
+ * backward compatible or cause version to be bumped up.
+ */
+
+struct flash_data_ident {
+   char fdi_magic[7];
+   char fdi_version;
+};
+
+/*
+ * On-flash meta data describing XRT data on flash. Either fdh_id_begin or
+ * fdh_id_end should be at well-known location on flash so that the reader
+ * can easily pick up fdi_version from flash before it tries to interpret
+ * the whole data structure.
+ * E.g., you align header in the end of the flash so that fdh_id_end is at well
+ * known location or align header at the beginning of the flash so that
+ * fdh_id_begin is at well known location.
+ */
+struct flash_data_header {
+   struct flash_data_ident fdh_id_begin;
+   uint32_t fdh_data_offset;
+   uint32_t fdh_data_len;
+   uint32_t fdh_data_parity;
+   uint8_t fdh_reserved[16];
+   struct flash_data_ident fdh_id_end;
+};
+
+static inline uint32_t flash_xrt_data_get_parity32(unsigned char *buf, size_t n)
+{
+   char *p;
+   size_t i;
+   size_t len;
+   uint32_t parity = 0;
+
+   for (len = 0; len < n; len += 4) {
+   uint32_t tmp = 0;
+   size_t thislen = n - len;
+
+   /* One word at a time. */
+   if (thislen > 4)
+   thislen = 4;
+
+   for (i = 0, p = (char *)&tmp; i < thislen; i++)
+   p[i] = buf[len + i];
+   parity ^= tmp;
+   }
+   return parity;
+}
+
+#endif
diff --git a/include/uapi/linux/xrt/mailbox_proto.h 
b/include/uapi/linux/xrt/mailbox_proto.h
new file mode 100644
index ..2aa782d86792
--- /dev/null
+++ b/include/uapi/linux/xrt/mailbox_proto.h
@@ -0,0 +1,394 @@
+/* SPDX-License-Identifier: Apache-2.0 OR GPL-2.0 */
+/*
+ *  Copyright (C) 2019-2020, Xilinx Inc
+ */
+
+#ifndef _XCL_MB_PROTOCOL_H_
+#define _XCL_MB_PROTOCOL_H_
+
+#ifndef __KERNEL__
+#include 
+#else
+#include 
+#endif
+
+/*
+ * This header file contains mailbox protocol b/w mgmt and user pfs.
+ * - Any changes made here should maintain backward compatibility.
+ * - If it's not possible, new OP code should be added and version number should
+ *   be bumped up.
+ * - Support for old OP code should never be removed.
+ */
+#define XCL_MB_PROTOCOL_VER 0U
+
+/*
+ * UUID_SZ should ALWAYS have the same number
+ * as the MACRO UUID_SIZE defined in linux/uuid.h
+ */
+#define XCL_UUID_SZ 16
+
+/**
+ * enum mailbox_request - List of all mailbox request OPCODE. Some OP code
+ *requires arguments, which is defined as corresponding
+ *data structures below. Response to the request usually
+ *is a int32_t containing the error code. Some responses
+ *are more complicated and require a data 

[PATCH Xilinx Alveo 0/8] Xilinx Alveo/XRT patch overview

2020-11-28 Thread Sonal Santan
Hello,

This patch series adds management physical function driver for Xilinx Alveo PCIe
accelerator cards, https://www.xilinx.com/products/boards-and-kits/alveo.html
This driver is part of Xilinx Runtime (XRT) open source stack.

The patch depends on the "PATCH Xilinx Alveo libfdt prep" which was posted
before.

ALVEO PLATFORM ARCHITECTURE

Alveo PCIe FPGA based platforms have a static *shell* partition and a partially
reconfigurable *user* partition. The shell partition is automatically loaded
from flash when the host is booted and PCIe is enumerated by the BIOS. The shell
cannot be changed until the next cold reboot. The shell exposes two PCIe
physical functions:

1. management physical function
2. user physical function

The patch series includes Documentation/fpga/xrt.rst which describes the Alveo
platform, xmgmt driver architecture and deployment model in more detail.

Users compile their high level design in C/C++/OpenCL or RTL into an FPGA image
using the Vitis tools:
https://www.xilinx.com/products/design-tools/vitis/vitis-platform.html
The image is packaged as an xclbin and contains the partial bitstream for the
user partition and the necessary metadata. Users can dynamically swap the image
running on the user partition in order to switch between different workloads.

ALVEO DRIVERS

Alveo Linux kernel driver *xmgmt* binds to management physical function of
Alveo platform. The modular driver framework is organized into several
platform drivers which primarily handle the following functionality:

1.  Loading firmware container also called xsabin at driver attach time
2.  Loading of user compiled xclbin with FPGA Manager integration
3.  Clock scaling of image running on user partition
4.  In-band sensors: temp, voltage, power, etc.
5.  Device reset and rescan
6.  Flashing static *shell* partition
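
The xclbin load in item 2 arrives in chunks through the FPGA Manager
write_init/write/write_complete callbacks. A rough userspace sketch of that
bookkeeping follows. All names here (dl_ctx, dl_init, dl_write, dl_complete)
are hypothetical, malloc() stands in for vmalloc(), and the final
"received == advertised" check is an assumption of this sketch rather than
something taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the driver's per-manager download state. */
enum dl_state { DL_IDLE, DL_INIT, DL_WRITING, DL_ERR };

struct dl_ctx {
	enum dl_state state;
	unsigned char *blob;	/* staging buffer for the whole image */
	size_t total;		/* length advertised by the image header */
	size_t count;		/* bytes received so far */
};

/* Begin a download; 'advertised' is the length read from the header. */
static int dl_init(struct dl_ctx *c, size_t first_chunk, size_t advertised)
{
	assert(c);
	free(c->blob);		/* drop any previous image */
	c->blob = NULL;
	c->count = 0;
	c->total = 0;

	if (advertised == 0 || first_chunk > advertised) {
		c->state = DL_ERR;
		return -EINVAL;
	}
	c->blob = malloc(advertised);
	if (!c->blob) {
		c->state = DL_ERR;
		return -ENOMEM;
	}
	c->total = advertised;
	c->state = DL_INIT;
	return 0;
}

/* Append one chunk, refusing anything beyond the advertised length. */
static int dl_write(struct dl_ctx *c, const void *buf, size_t n)
{
	assert(c);
	if (c->state != DL_INIT && c->state != DL_WRITING) {
		c->state = DL_ERR;
		return -EINVAL;
	}
	/* Check the remaining room before touching the buffer. */
	if (n > c->total - c->count) {
		c->state = DL_ERR;
		return -EINVAL;
	}
	memcpy(c->blob + c->count, buf, n);
	c->count += n;
	c->state = DL_WRITING;
	return 0;
}

/* Finish the download; assumes the full image must have arrived. */
static int dl_complete(struct dl_ctx *c)
{
	assert(c);
	if (c->state != DL_WRITING || c->count != c->total) {
		c->state = DL_ERR;
		return -EINVAL;
	}
	c->state = DL_IDLE;
	return 0;
}
```

Keeping the state in one small structure means a write issued before init, or
after an error, is rejected without touching the staging buffer.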

The platform drivers are packaged into the *xrt-lib* helper module with
well-defined interfaces, the details of which can be found in
Documentation/fpga/xrt.rst.

The xmgmt driver is the second generation Alveo management driver and an
evolution of the first generation (out of tree) Alveo management driver,
xclmgmt. The sources of the first generation drivers were posted on LKML last
year:
https://lore.kernel.org/lkml/20190319215401.6562-1-sonal.san...@xilinx.com/

Changes since the first generation driver include the following: the driver
has been re-architected as data driven modular driver; the driver has been
split into xmgmt and xrt-lib; user physical function driver has been removed
from the patch series.

Alveo/XRT security and platform architecture is documented on the following
GitHub pages:
https://xilinx.github.io/XRT/master/html/security.html
https://xilinx.github.io/XRT/master/html/platforms_partitions.html

User physical function driver is not included in this patch series.

TESTING AND VALIDATION

xmgmt driver can be tested with full XRT open source stack which includes
user space libraries, board utilities and (out of tree) first generation
user physical function driver xocl. XRT open source runtime stack is
available at https://github.com/Xilinx/XRT. This patch series has been
validated on Alveo U50 platform.

Complete documentation for XRT open source stack can be found here--
https://xilinx.github.io/XRT/master/html/index.html

Thanks,
-Sonal

Sonal Santan (8):
  Documentation: fpga: Add a document describing Alveo XRT drivers
  fpga: xrt: Add UAPI header files
  fpga: xrt: infrastructure support for xmgmt driver
  fpga: xrt: core infrastructure for xrt-lib module
  fpga: xrt: platform drivers for subsystems in shell partition
  fpga: xrt: header file for platform and parent drivers
  fpga: xrt: Alveo management physical function driver
  fpga: xrt: Kconfig and Makefile updates for XRT drivers

 Documentation/fpga/index.rst  |1 +
 Documentation/fpga/xrt.rst|  588 +
 drivers/fpga/Kconfig  |2 +
 drivers/fpga/Makefile |3 +
 drivers/fpga/alveo/Kconfig|7 +
 drivers/fpga/alveo/common/xrt-metadata.c  |  590 +
 drivers/fpga/alveo/common/xrt-root.c  |  744 +++
 drivers/fpga/alveo/common/xrt-root.h  |   24 +
 drivers/fpga/alveo/common/xrt-xclbin.c|  387 
 drivers/fpga/alveo/common/xrt-xclbin.h|   46 +
 drivers/fpga/alveo/include/xmgmt-main.h   |   34 +
 drivers/fpga/alveo/include/xrt-axigate.h  |   31 +
 drivers/fpga/alveo/include/xrt-calib.h|   28 +
 drivers/fpga/alveo/include/xrt-clkfreq.h  |   21 +
 drivers/fpga/alveo/include/xrt-clock.h|   29 +
 drivers/fpga/alveo/include/xrt-cmc.h  |   23 +
 drivers/fpga/alveo/include/xrt-ddr-srsr.h |   29 +
 drivers/fpga/alveo/include/xrt-flash.h|   28 +
 drivers/fpga/alveo/include/xrt-gpio.h |   41 +
 drivers/fpga/alveo/include/xrt-icap.h |   27 +
 drivers/fpga/alveo/include/xrt-mailbox.h  |   44 +
 drivers/fpga/alveo/include/xrt-metadata.h |  184 ++
 

Lockdep warning on io_file_data_ref_zero() with 5.10-rc5

2020-11-28 Thread Nadav Amit
Hello Pavel,

I got the following lockdep splat while rebasing my work onto 5.10-rc5 (the
kernel is based on 5.10-rc5+).

I did not actually confirm that the problem is triggered without my changes,
as my io_uring workload requires some kernel changes (not io_uring changes),
yet IMHO it seems pretty clear that this is a result of your commit
e297822b20e7f ("io_uring: order refnode recycling"), which acquires a lock in
io_file_data_ref_zero() inside a softirq context.

Let me know if my analysis is wrong.
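
The rule behind this kind of report can be shown with a toy userspace model:
a lock class that is ever taken in softirq context must never be taken with
softirqs enabled, or the softirq may interrupt the holder on the same CPU and
spin forever. This is only a sketch of the consistency check, not lockdep's
implementation; lock_class, note_acquire and the flag names are made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Usage bits remembered per lock class (names are illustrative). */
enum {
	USED_IN_SOFTIRQ = 1 << 0,	/* acquired from softirq context */
	SOFTIRQ_ON      = 1 << 1,	/* acquired while softirqs were enabled */
};

struct lock_class {
	unsigned int usage;
};

/*
 * Record one acquisition of the lock.
 *   in_softirq: the acquirer is running in softirq context
 *   bh_enabled: softirqs are enabled while the lock is held
 * Returns false once both conflicting usages have been observed.
 */
static bool note_acquire(struct lock_class *lc, bool in_softirq, bool bh_enabled)
{
	const unsigned int both = USED_IN_SOFTIRQ | SOFTIRQ_ON;

	assert(lc);
	if (in_softirq)
		lc->usage |= USED_IN_SOFTIRQ;
	else if (bh_enabled)
		lc->usage |= SOFTIRQ_ON;

	return (lc->usage & both) != both;
}
```

In this model a plain spin_lock() in the register path followed by an
acquisition from the RCU softirq callback trips the check, while taking the
lock with softirqs disabled in process context (for example spin_lock_bh() or
spin_lock_irqsave()) does not. Which remedy fits io_uring here is a separate
question.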

Regards,
Nadav

[  136.349353] 
[  136.350212] WARNING: inconsistent lock state
[  136.351093] 5.10.0-rc5+ #1435 Not tainted
[  136.352003] 
[  136.352891] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[  136.354057] swapper/5/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[  136.355078] 88810417d6a8 (_data->lock){+.?.}-{2:2}, at: 
io_file_data_ref_zero+0x4d/0x220
[  136.356717] {SOFTIRQ-ON-W} state was registered at:
[  136.357539]   lock_acquire+0x172/0x520
[  136.358209]   _raw_spin_lock+0x30/0x40
[  136.358880]   __io_uring_register+0x1c99/0x1fe0
[  136.359656]   __x64_sys_io_uring_register+0xe2/0x270
[  136.360489]   do_syscall_64+0x39/0x90
[  136.361144]   entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  136.361991] irq event stamp: 835836
[  136.362627] hardirqs last  enabled at (835836): [] 
_raw_spin_unlock_irqrestore+0x41/0x50
[  136.364112] hardirqs last disabled at (835835): [] 
_raw_spin_lock_irqsave+0x5a/0x60
[  136.365553] softirqs last  enabled at (835824): [] 
_local_bh_enable+0x21/0x40
[  136.366920] softirqs last disabled at (835825): [] 
asm_call_irq_on_stack+0x12/0x20
[  136.368335] 
[  136.368335] other info that might help us debug this:
[  136.369414]  Possible unsafe locking scenario:
[  136.369414] 
[  136.370414]CPU0
[  136.370907]
[  136.371403]   lock(_data->lock);
[  136.372064]   
[  136.372585] lock(_data->lock);
[  136.373269] 
[  136.373269]  *** DEADLOCK ***
[  136.373269] 
[  136.374319] 2 locks held by swapper/5/0:
[  136.375005]  #0: 83c45380 (rcu_callback){}-{0:0}, at: 
rcu_core+0x451/0xb70
[  136.376284]  #1: 83c454a0 (rcu_read_lock){}-{1:2}, at: 
percpu_ref_switch_to_atomic_rcu+0x139/0x320
[  136.377849] 
[  136.377849] stack backtrace:
[  136.378650] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.10.0-rc5irina+ #1435
[  136.379746] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[  136.381550] Call Trace:
[  136.382053]  
[  136.382502]  dump_stack+0xa4/0xd9
[  136.383116]  print_usage_bug.cold+0x217/0x220
[  136.383871]  mark_lock+0xb90/0xe80
[  136.384506]  ? print_usage_bug+0x180/0x180
[  136.385223]  ? __kasan_check_read+0x11/0x20
[  136.385946]  ? mark_lock+0x116/0xe80
[  136.386599]  ? print_usage_bug+0x180/0x180
[  136.387324]  ? __lock_acquire+0x8f5/0x2a80
[  136.388039]  ? __kasan_check_read+0x11/0x20
[  136.388776]  ? __lock_acquire+0x8f5/0x2a80
[  136.389493]  __lock_acquire+0xdc9/0x2a80
[  136.390190]  ? lockdep_hardirqs_on_prepare+0x210/0x210
[  136.391039]  ? rcu_read_lock_sched_held+0xa1/0xd0
[  136.391835]  ? rcu_read_lock_bh_held+0xb0/0xb0
[  136.392603]  lock_acquire+0x172/0x520
[  136.393258]  ? io_file_data_ref_zero+0x4d/0x220
[  136.394025]  ? lock_release+0x410/0x410
[  136.394705]  ? lock_acquire+0x172/0x520
[  136.395386]  ? percpu_ref_switch_to_atomic_rcu+0x139/0x320
[  136.396277]  ? lock_release+0x410/0x410
[  136.396961]  _raw_spin_lock+0x30/0x40
[  136.397620]  ? io_file_data_ref_zero+0x4d/0x220
[  136.398392]  io_file_data_ref_zero+0x4d/0x220
[  136.399138]  percpu_ref_switch_to_atomic_rcu+0x310/0x320
[  136.47]  ? percpu_ref_init+0x180/0x180
[  136.400730]  rcu_core+0x49c/0xb70
[  136.401344]  ? rcu_core+0x451/0xb70
[  136.401978]  ? strict_work_handler+0x150/0x150
[  136.402740]  ? rcu_read_lock_sched_held+0xa1/0xd0
[  136.403535]  ? rcu_read_lock_bh_held+0xb0/0xb0
[  136.404298]  rcu_core_si+0xe/0x10
[  136.404914]  __do_softirq+0x104/0x59d
[  136.405572]  asm_call_irq_on_stack+0x12/0x20
[  136.406306]  
[  136.406760]  do_softirq_own_stack+0x6f/0x80
[  136.407484]  irq_exit_rcu+0xf3/0x100
[  136.408134]  sysvec_apic_timer_interrupt+0x4b/0xb0
[  136.408946]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[  136.409798] RIP: 0010:default_idle+0x1c/0x20
[  136.410536] Code: eb cd 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 
55 48 89 e5 e8 b2 b1 a6 fe e9 07 00 00 00 0f 00 2d 26 f1 5c 00 fb f4 <5d> c3 cc 
cc 0f 1f 44 00 00 55 48 89 e5 41 55 4c 8b 2d 8e c2 00 02
[  136.413291] RSP: 0018:c911fda8 EFLAGS: 0206
[  136.414150] RAX: 000cc0ed RBX: 0005 RCX: dc00
[  136.415256] RDX:  RSI:  RDI: 8285578e
[  136.416364] RBP: c911fda8 R08: 0001 R09: 0001
[  136.417474] R10: 8881e877546b R11: ed103d0eea8d R12: 0005
[  136.418579] R13: 

[PATCH Xilinx Alveo libfdt prep 0/1] Expose libfdt for use by Alveo/XRT

2020-11-28 Thread Sonal Santan
Hello,

This patch series adds support for exporting a limited set of libfdt symbols
from the Linux kernel. It enables drivers and other kernel modules to use libfdt
for working with device trees. This may be used by platform vendors to describe
HW features inside a PCIe device to its driver in a data driven manner.

The "Xilinx Alveo" PCIe accelerator card driver patch series which follows this
patch makes use of device tree to advertise HW subsystems sitting behind PCIe
BARs.
The use of device trees makes the driver data driven and overall solution more
scalable.
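
A driver that receives such a device tree blob from a device would normally
sanity-check it before walking it with libfdt. The sketch below is a
self-contained userspace illustration that reads the big-endian magic and
totalsize fields of the flattened device tree header by hand. The function
names are hypothetical, and the MAX_BLOB_SIZE cap is borrowed from the
xrt-metadata.c patch in the Alveo series; in the kernel one would use
fdt_check_header() and fdt_totalsize() instead.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC_VALUE	0xd00dfeedu	/* header magic, stored big-endian */
#define MAX_BLOB_SIZE	(4096 * 25)	/* cap used by the xrt-metadata patch */

/* Read a big-endian 32-bit value independent of host byte order. */
static uint32_t be32_at(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Return the size advertised by the blob header, or -EINVAL when the
 * buffer is too short, the magic is wrong, or the size is out of bounds.
 * blob_size_checked() is a hypothetical helper for this sketch.
 */
static long blob_size_checked(const unsigned char *blob, size_t avail)
{
	uint32_t total;

	assert(blob);
	if (avail < 8 || be32_at(blob) != FDT_MAGIC_VALUE)
		return -EINVAL;

	total = be32_at(blob + 4);	/* totalsize field at offset 4 */
	if (total > MAX_BLOB_SIZE || total > avail)
		return -EINVAL;

	return (long)total;
}
```

Bounding the advertised size against both a fixed cap and the bytes actually
available keeps a malformed blob from steering later parsing out of bounds.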

Thanks,
-Sonal

Sonal Santan (1):
  Export subset of libfdt symbols for use by other drivers.

 lib/fdt.c|  6 ++
 lib/fdt_empty_tree.c |  3 +++
 lib/fdt_ro.c | 11 +++
 lib/fdt_rw.c |  6 ++
 4 files changed, 26 insertions(+)

-- 
2.17.1



[PATCH Xilinx Alveo libfdt prep 1/1] Export subset of libfdt symbols for use by other drivers.

2020-11-28 Thread Sonal Santan
From: Sonal Santan 

Some drivers may want to use device tree as a metadata format to discover HW
subsystems behind a PCIe BAR. This is particularly useful for PCIe FPGA
devices.

Signed-off-by: Sonal Santan 
---
 lib/fdt.c|  6 ++
 lib/fdt_empty_tree.c |  3 +++
 lib/fdt_ro.c | 11 +++
 lib/fdt_rw.c |  6 ++
 4 files changed, 26 insertions(+)

diff --git a/lib/fdt.c b/lib/fdt.c
index 97f20069fc37..9747513b50e7 100644
--- a/lib/fdt.c
+++ b/lib/fdt.c
@@ -1,2 +1,8 @@
 #include 
+#include 
 #include "../scripts/dtc/libfdt/fdt.c"
+
+EXPORT_SYMBOL_GPL(fdt_next_node);
+EXPORT_SYMBOL_GPL(fdt_first_subnode);
+EXPORT_SYMBOL_GPL(fdt_next_subnode);
+EXPORT_SYMBOL_GPL(fdt_subnode_offset);
diff --git a/lib/fdt_empty_tree.c b/lib/fdt_empty_tree.c
index 5d30c58150ad..3dab578c9d22 100644
--- a/lib/fdt_empty_tree.c
+++ b/lib/fdt_empty_tree.c
@@ -1,2 +1,5 @@
 #include 
+#include 
 #include "../scripts/dtc/libfdt/fdt_empty_tree.c"
+
+EXPORT_SYMBOL_GPL(fdt_create_empty_tree);
diff --git a/lib/fdt_ro.c b/lib/fdt_ro.c
index f73c04ea7be4..ec96cf3d0d7a 100644
--- a/lib/fdt_ro.c
+++ b/lib/fdt_ro.c
@@ -1,2 +1,13 @@
 #include 
+#include 
 #include "../scripts/dtc/libfdt/fdt_ro.c"
+
+EXPORT_SYMBOL_GPL(fdt_getprop_by_offset);
+EXPORT_SYMBOL_GPL(fdt_node_check_compatible);
+EXPORT_SYMBOL_GPL(fdt_get_name);
+EXPORT_SYMBOL_GPL(fdt_next_property_offset);
+EXPORT_SYMBOL_GPL(fdt_getprop);
+EXPORT_SYMBOL_GPL(fdt_node_offset_by_compatible);
+EXPORT_SYMBOL_GPL(fdt_parent_offset);
+EXPORT_SYMBOL_GPL(fdt_stringlist_get);
+EXPORT_SYMBOL_GPL(fdt_first_property_offset);
diff --git a/lib/fdt_rw.c b/lib/fdt_rw.c
index 0c1f0f4a4b13..ec3939ed3d7d 100644
--- a/lib/fdt_rw.c
+++ b/lib/fdt_rw.c
@@ -1,2 +1,8 @@
 #include 
+#include 
 #include "../scripts/dtc/libfdt/fdt_rw.c"
+
+EXPORT_SYMBOL_GPL(fdt_del_node);
+EXPORT_SYMBOL_GPL(fdt_add_subnode);
+EXPORT_SYMBOL_GPL(fdt_pack);
+EXPORT_SYMBOL_GPL(fdt_setprop);
-- 
2.17.1



Re: [GIT PULL] USB/PHY driver fixes for 5.10-rc6

2020-11-28 Thread Linus Torvalds
On Sat, Nov 28, 2020 at 2:07 PM Randy Dunlap  wrote:
>
> Could it just be a vger issue?  vger has been acting ill today...

Possible. pr-tracker-bot obviously is back, it just had a very long delay.

And yes, the delay might have been due to it not seeing the original
pull requests due to vger slowness. I didn't think to check if the
pull requests were visible on lore.kernel.org when I was wondering
where the pr-tracker-bot was.

   Linus


Re: [PATCH] Staging: android: ashmem: Fixed a coding style issue.

2020-11-28 Thread kernel test robot
Hi Vishawanath,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on staging/staging-testing]

url:    https://github.com/0day-ci/linux/commits/Vishawanath-Jadhav/Staging-android-ashmem-Fixed-a-coding-style-issue/20201129-060817
base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git 1de16e38f1fdbfd9d842a06919098813ed93abf7
config: powerpc64-randconfig-r001-20201129 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project f502b14d40e751fe00afc493ef0d08f196524886)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc64 cross compiling tool for clang build
        # apt-get install binutils-powerpc64-linux-gnu
        # https://github.com/0day-ci/linux/commit/b135b8b40f7a0f0a8ac6a6d5e083af1e2535ff10
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Vishawanath-Jadhav/Staging-android-ashmem-Fixed-a-coding-style-issue/20201129-060817
        git checkout b135b8b40f7a0f0a8ac6a6d5e083af1e2535ff10
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> drivers/staging/android/ashmem.c:430:16: error: cannot assign to variable 'vmfile_fops' with const-qualified type 'const struct file_operations'
   vmfile_fops = *vmfile->f_op;
   ~~~ ^
   drivers/staging/android/ashmem.c:379:38: note: variable 'vmfile_fops' declared const here
   static const struct file_operations vmfile_fops;
   ^~~
   drivers/staging/android/ashmem.c:431:21: error: cannot assign to variable 'vmfile_fops' with const-qualified type 'const struct file_operations'
   vmfile_fops.mmap = ashmem_vmfile_mmap;
    ^
   drivers/staging/android/ashmem.c:379:38: note: variable 'vmfile_fops' declared const here
   static const struct file_operations vmfile_fops;
   ^~~
   drivers/staging/android/ashmem.c:432:34: error: cannot assign to variable 'vmfile_fops' with const-qualified type 'const struct file_operations'
   vmfile_fops.get_unmapped_area =
   ~ ^
   drivers/staging/android/ashmem.c:379:38: note: variable 'vmfile_fops' declared const here
   static const struct file_operations vmfile_fops;
   ^~~
   3 errors generated.
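
The three errors above share one cause: the object is declared const but is filled in at run time. A minimal userspace sketch of the pattern follows, in which the static copy stays writable and only a pointer-to-const is handed out. The `struct ops` type and all function names here are hypothetical stand-ins, not the ashmem code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file_operations. */
struct ops {
	int (*mmap)(int);
	int (*get_unmapped_area)(int);
};

static int base_mmap(int x) { return x; }
static int base_area(int x) { return x + 1; }
static int wrapped_mmap(int x) { return x * 2; }

/* Fully known at compile time, so this one can be const. */
static const struct ops base_ops = {
	.mmap = base_mmap,
	.get_unmapped_area = base_area,
};

/*
 * Return a copy of *base with .mmap overridden. The static copy is
 * written on first use, so it must not be const-qualified; callers
 * still only receive a pointer-to-const.
 */
static const struct ops *get_wrapped_ops(const struct ops *base)
{
	static struct ops wrapped;	/* writable: filled in at run time */

	assert(base != NULL);

	if (!wrapped.mmap) {
		wrapped = *base;	/* rejected by the compiler if 'wrapped' were const */
		wrapped.mmap = wrapped_mmap;
	}
	return &wrapped;
}
```

Adding const to `wrapped` in this sketch should reproduce the same class of diagnostic that clang reports for `vmfile_fops`.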

vim +430 drivers/staging/android/ashmem.c

6d67b0290b4b84c Suren Baghdasaryan 2020-01-27  376  
11980c2ac4ccfad Robert Love        2011-12-20  377  static int ashmem_mmap(struct file *file, struct vm_area_struct *vma)
11980c2ac4ccfad Robert Love        2011-12-20  378  {
b135b8b40f7a0f0 Vishawanath Jadhav 2020-11-28  379  	static const struct file_operations vmfile_fops;
11980c2ac4ccfad Robert Love        2011-12-20  380  	struct ashmem_area *asma = file->private_data;
11980c2ac4ccfad Robert Love        2011-12-20  381  	int ret = 0;
11980c2ac4ccfad Robert Love        2011-12-20  382  
11980c2ac4ccfad Robert Love        2011-12-20  383  	mutex_lock(&ashmem_mutex);
11980c2ac4ccfad Robert Love        2011-12-20  384  
11980c2ac4ccfad Robert Love        2011-12-20  385  	/* user needs to SET_SIZE before mapping */
59848d6aded59a6 Alistair Strachan  2018-06-19  386  	if (!asma->size) {
11980c2ac4ccfad Robert Love        2011-12-20  387  		ret = -EINVAL;
11980c2ac4ccfad Robert Love        2011-12-20  388  		goto out;
11980c2ac4ccfad Robert Love        2011-12-20  389  	}
11980c2ac4ccfad Robert Love        2011-12-20  390  
8632c614565d0c5 Alistair Strachan  2018-06-19  391  	/* requested mapping size larger than object size */
8632c614565d0c5 Alistair Strachan  2018-06-19  392  	if (vma->vm_end - vma->vm_start > PAGE_ALIGN(asma->size)) {
11980c2ac4ccfad Robert Love        2011-12-20  393  		ret = -EINVAL;
11980c2ac4ccfad Robert Love        2011-12-20  394  		goto out;
11980c2ac4ccfad Robert Love        2011-12-20  395  	}
11980c2ac4ccfad Robert Love        2011-12-20  396  
11980c2ac4ccfad Robert Love        2011-12-20  397  	/* requested protection bits must match our allowed protection mask */
59848d6aded59a6 Alistair Strachan  2018-06-19  398  	if ((vma->vm_flags & ~calc_vm_prot_bits(asma->prot_mask, 0)) &
59848d6aded59a6 Alistair Strachan  2018-06-19  399  	    calc_vm_prot_bits(PROT_MASK, 0)) {
11980c2ac4ccfad Robert Love        2011-12-20  400  		ret = -EPERM;

[PATCH] thermal: Constify static attribute_group structs

2020-11-28 Thread Rikard Falkeborn
The only usage of these structs is to assign their address to the
thermal_zone_attribute_groups array, which consists of pointers to
const, so make them const to allow the compiler to put them in read-only
memory.

Signed-off-by: Rikard Falkeborn 
---
 drivers/thermal/thermal_sysfs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index a6f371fc9af2..0866e949339b 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -425,7 +425,7 @@ static struct attribute *thermal_zone_dev_attrs[] = {
 	NULL,
 };
 
-static struct attribute_group thermal_zone_attribute_group = {
+static const struct attribute_group thermal_zone_attribute_group = {
 	.attrs = thermal_zone_dev_attrs,
 };
 
@@ -434,7 +434,7 @@ static struct attribute *thermal_zone_mode_attrs[] = {
 	NULL,
 };
 
-static struct attribute_group thermal_zone_mode_attribute_group = {
+static const struct attribute_group thermal_zone_mode_attribute_group = {
 	.attrs = thermal_zone_mode_attrs,
 };
 
@@ -468,7 +468,7 @@ static umode_t thermal_zone_passive_is_visible(struct kobject *kobj,
 	return 0;
 }
 
-static struct attribute_group thermal_zone_passive_attribute_group = {
+static const struct attribute_group thermal_zone_passive_attribute_group = {
 	.attrs = thermal_zone_passive_attrs,
 	.is_visible = thermal_zone_passive_is_visible,
 };
-- 
2.29.2
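
The reasoning in the commit message can be illustrated with a small userspace sketch: when the only consumer is an array of pointers to const, the pointed-to structs can themselves be const. The types below are simplified, hypothetical stand-ins for the sysfs structures, and `count_attrs()` is an invented helper for the demonstration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the sysfs types. */
struct attribute { const char *name; };
struct attribute_group {
	const char *name;
	struct attribute **attrs;
};

static struct attribute temp_attr = { .name = "temp" };
static struct attribute mode_attr = { .name = "mode" };

static struct attribute *zone_attrs[] = { &temp_attr, NULL };
static struct attribute *mode_attrs[] = { &mode_attr, NULL };

/* Only their addresses are taken, so the groups themselves can be const. */
static const struct attribute_group zone_group = { .attrs = zone_attrs };
static const struct attribute_group mode_group = { .attrs = mode_attrs };

/* An array of pointers to const accepts those addresses without a cast. */
static const struct attribute_group *zone_groups[] = {
	&zone_group,
	&mode_group,
	NULL,
};

/* Count the attributes reachable through a NULL-terminated group list. */
static size_t count_attrs(const struct attribute_group **groups)
{
	size_t n = 0;

	assert(groups != NULL);

	for (; *groups; groups++)
		for (struct attribute **a = (*groups)->attrs; *a; a++)
			n++;
	return n;
}
```

Nothing in the sketch writes through the group pointers, which is the property that lets a compiler place the groups in read-only memory.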



Re: [PATCH] PCI: Add pci reset quirk for Huawei Intelligent NIC virtual function

2020-11-28 Thread Bjorn Helgaas
[+cc Alex]

On Sat, Nov 28, 2020 at 02:18:25PM +0800, Chiqijun wrote:
> When multiple VFs do FLR at the same time, the firmware is
> processed serially, resulting in some VF FLRs being delayed more
> than 100ms, when the virtual machine restarts and the device
> driver is loaded, the firmware is doing the corresponding VF
> FLR, causing the driver to fail to load.
> 
> To solve this problem, add host and firmware status synchronization
> during FLR.

Is this because the Huawei Intelligent NIC isn't following the spec,
or is it because Linux isn't correctly waiting for the FLR to
complete?

If this is a Huawei Intelligent NIC defect, is there documentation
somewhere (errata) that you can reference?  Will it be fixed in future
designs, so we don't have to add future Device IDs to the quirk?

> Signed-off-by: Chiqijun 
> ---
>  drivers/pci/quirks.c | 67 
>  1 file changed, 67 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f70692ac79c5..bd6236ea9064 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3912,6 +3912,71 @@ static int delay_250ms_after_flr(struct pci_dev *dev, int probe)
>  	return 0;
>  }
>  
> +#define PCI_DEVICE_ID_HINIC_VF  0x375E
> +#define HINIC_VF_FLR_TYPE   0x1000
> +#define HINIC_VF_OP 0xE80
> +#define HINIC_OPERATION_TIMEOUT 15000
> +
> +/* Device-specific reset method for Huawei Intelligent NIC virtual functions */
> +static int reset_hinic_vf_dev(struct pci_dev *pdev, int probe)
> +{
> +	unsigned long timeout;
> +	void __iomem *bar;
> +	u16 old_command;
> +	u32 val;
> +
> +	if (probe)
> +		return 0;
> +
> +	bar = pci_iomap(pdev, 0, 0);
> +	if (!bar)
> +		return -ENOTTY;
> +
> +	pci_read_config_word(pdev, PCI_COMMAND, &old_command);
> +
> +	/*
> +	 * FLR cap bit bit30, FLR ACK bit: bit18, to avoid big-endian conversion
> +	 * the big-endian bit6, bit10 is directly operated here
> +	 */
> +	val = readl(bar + HINIC_VF_FLR_TYPE);
> +	if (!(val & (1UL << 6))) {
> +		pci_iounmap(pdev, bar);
> +		return -ENOTTY;
> +	}
> +
> +	val = readl(bar + HINIC_VF_OP);
> +	val = val | (1UL << 10);
> +	writel(val, bar + HINIC_VF_OP);
> +
> +	/* Perform the actual device function reset */
> +	pcie_flr(pdev);
> +
> +	pci_write_config_word(pdev, PCI_COMMAND,
> +			      old_command | PCI_COMMAND_MEMORY);
> +
> +	/* Waiting for device reset complete */
> +	timeout = jiffies + msecs_to_jiffies(HINIC_OPERATION_TIMEOUT);
> +	do {
> +		val = readl(bar + HINIC_VF_OP);
> +		if (!(val & (1UL << 10)))
> +			goto reset_complete;
> +		msleep(20);
> +	} while (time_before(jiffies, timeout));
> +
> +	val = readl(bar + HINIC_VF_OP);
> +	if (!(val & (1UL << 10)))
> +		goto reset_complete;
> +
> +	pci_warn(pdev, "Reset dev timeout, flr ack reg: %x\n",
> +		 be32_to_cpu(val));
> +
> +reset_complete:
> +	pci_write_config_word(pdev, PCI_COMMAND, old_command);
> +	pci_iounmap(pdev, bar);
> +
> +	return 0;
> +}
> +
>  static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
>  	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
>  		 reset_intel_82599_sfp_virtfn },
> @@ -3923,6 +3988,8 @@ static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
>  	{ PCI_VENDOR_ID_INTEL, 0x0953, delay_250ms_after_flr },
>  	{ PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
>  		reset_chelsio_generic_dev },
> +	{ PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HINIC_VF,
> +		reset_hinic_vf_dev },
>  	{ 0 }
>  };
>  
> -- 
> 2.17.1
> 
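
For readers following the wait loop in the quoted patch, here is a rough userspace sketch of the same poll-until-clear-or-give-up shape. It is only an illustration under stated assumptions: the device is simulated (`struct fake_dev` and `fake_readl()` are hypothetical), and the bound is a poll count rather than the jiffies deadline the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ACK_BIT (UINT32_C(1) << 10)	/* mirrors the bit the quoted patch polls */

/* Hypothetical simulated device: ACK_BIT clears after 'busy_polls' reads. */
struct fake_dev {
	uint32_t reg;
	unsigned int busy_polls;
};

/* Simulated register read; each call advances the fake firmware by one step. */
static uint32_t fake_readl(struct fake_dev *dev)
{
	if (dev->busy_polls > 0)
		dev->busy_polls--;
	else
		dev->reg &= ~ACK_BIT;
	return dev->reg;
}

/*
 * Poll until ACK_BIT clears or 'max_polls' reads have been made.
 * Returns true if the bit cleared, false if the budget ran out.
 */
static bool wait_for_ack_clear(struct fake_dev *dev, unsigned int max_polls)
{
	assert(max_polls > 0);

	for (unsigned int i = 0; i < max_polls; i++) {
		if (!(fake_readl(dev) & ACK_BIT))
			return true;
		/* a driver would sleep here, as the patch does with msleep(20) */
	}
	return false;
}
```

Note that the quoted patch returns 0 even when the wait times out, after printing a warning, whereas this sketch reports the timeout to the caller.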


Re: [PATCH] scsi: ses: Fix crash caused by kfree an invalid pointer

2020-11-28 Thread James Bottomley
On Sat, 2020-11-28 at 20:23 +0800, Ding Hui wrote:
> We can get a crash when disconnecting the iSCSI session,
> the call trace like this:
> 
>   [2a00fb70] kfree at 0830e224
>   [2a00fba0] ses_intf_remove at 01f200e4
>   [2a00fbd0] device_del at 086b6a98
>   [2a00fc50] device_unregister at 086b6d58
>   [2a00fc70] __scsi_remove_device at 0870608c
>   [2a00fca0] scsi_remove_device at 08706134
>   [2a00fcc0] __scsi_remove_target at 087062e4
>   [2a00fd10] scsi_remove_target at 087064c0
>   [2a00fd70] __iscsi_unbind_session at 01c872c4
>   [2a00fdb0] process_one_work at 0810f35c
>   [2a00fe00] worker_thread at 0810f648
>   [2a00fe70] kthread at 08116e98
> 
> In ses_intf_add, components count could be 0, and kcalloc 0 size scomp,
> but not saved in edev->component[i].scratch
> 
> In this situation, edev->component[0].scratch is an invalid pointer,
> when kfree it in ses_intf_remove_enclosure, a crash like above would happen
> The call trace also could be other random cases when kfree cannot catch
> the invalid pointer
> 
> We should not use edev->component[] array when the components count is 0
> We also need check index when use edev->component[] array in
> ses_enclosure_data_process
> 
> Tested-by: Zeng Zhicong 
> Cc: stable  # 2.6.25+
> Signed-off-by: Ding Hui 

This doesn't really look to be the right thing to do: an enclosure
which has no component can't usefully be controlled by the driver since
there's nothing for it to do, so what we should do in this situation is
refuse to attach like the proposed patch below.

It does seem a bit odd that someone would build an enclosure that
doesn't enclose anything, so would you mind running

sg_ses -e 

on it and reporting back what it shows?  It's possible there's another
type that the enclosure device should be tracking.

Regards,

James

---8>8>8><8<8<8
From: James Bottomley 
Subject: [PATCH] scsi: ses: don't attach if enclosure has no components

An enclosure with no components can't usefully be operated by the
driver (since effectively it has nothing to manage), so report the
problem and don't attach.  Not attaching also fixes an oops which
could occur if the driver tries to manage a zero component enclosure.

Reported-by: Ding Hui 
Cc: sta...@vger.kernel.org
Signed-off-by: James Bottomley 
---
 drivers/scsi/ses.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
index c2afba2a5414..9624298b9c89 100644
--- a/drivers/scsi/ses.c
+++ b/drivers/scsi/ses.c
@@ -690,6 +690,11 @@ static int ses_intf_add(struct device *cdev,
 		    type_ptr[0] == ENCLOSURE_COMPONENT_ARRAY_DEVICE)
 			components += type_ptr[1];
 	}
+	if (components == 0) {
+		sdev_printk(KERN_ERR, sdev, "enclosure has no enumerated components\n");
+		goto err_free;
+	}
+
 	ses_dev->page1 = buf;
 	ses_dev->page1_len = len;
 	buf = NULL;
-- 
2.26.2
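
As a rough illustration of the approach in the patch above (refuse to attach when there is nothing to manage, so that later code may index the first component safely), here is a userspace sketch. The structures, function names, and the choice of -ENODEV are assumptions made for the example and are not taken from ses.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the enclosure structures. */
struct component { void *scratch; };
struct enclosure {
	size_t ncomponents;
	struct component *component;
};

/*
 * Attach only if there is at least one component. Returns 0 on success,
 * -ENODEV if the enclosure is empty, -ENOMEM on allocation failure.
 */
static int enclosure_attach(struct enclosure *e, size_t ncomponents)
{
	assert(e != NULL);

	if (ncomponents == 0)
		return -ENODEV;	/* nothing to manage: refuse to attach */

	e->component = calloc(ncomponents, sizeof(*e->component));
	if (!e->component)
		return -ENOMEM;
	e->ncomponents = ncomponents;
	return 0;
}

/* Detach may touch component[0] because attach guarantees ncomponents >= 1. */
static void enclosure_detach(struct enclosure *e)
{
	free(e->component[0].scratch);
	free(e->component);
	e->component = NULL;
	e->ncomponents = 0;
}
```

The design point is that the zero-count check lives in one place, at attach time, instead of being repeated at every site that indexes the component array.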





Re: [PATCH 1/5] PCI/DPC: Ignore devices with no AER Capability

2020-11-28 Thread Bjorn Helgaas
On Sat, Nov 28, 2020 at 01:56:23PM -0800, Kuppuswamy, Sathyanarayanan wrote:
> On 11/28/20 1:53 PM, Bjorn Helgaas wrote:
> > On Sat, Nov 28, 2020 at 01:49:46PM -0800, Kuppuswamy, Sathyanarayanan wrote:
> > > On 11/28/20 12:24 PM, Bjorn Helgaas wrote:
> > > > On Wed, Nov 25, 2020 at 06:01:57PM -0800, Kuppuswamy, Sathyanarayanan wrote:
> > > > > On 11/25/20 5:18 PM, Bjorn Helgaas wrote:
> > > > > > From: Bjorn Helgaas 
> > > > > > 
> > > > > > Downstream Ports may support DPC regardless of whether they support AER
> > > > > > (see PCIe r5.0, sec 6.2.10.2).  Previously, if the user booted with
> > > > > > "pcie_ports=dpc-native", it was possible for dpc_probe() to succeed even if
> > > > > > the device had no AER Capability, but dpc_get_aer_uncorrect_severity()
> > > > > > depends on the AER Capability.
> > > > > > 
> > > > > > dpc_probe() previously failed if:
> > > > > > 
> > > > > >  !pcie_aer_is_native(pdev) && !pcie_ports_dpc_native
> > > > > >  !(pcie_aer_is_native() || pcie_ports_dpc_native)    # by De Morgan's law
> > > > > > 
> > > > > > so it succeeded if:
> > > > > > 
> > > > > >  pcie_aer_is_native() || pcie_ports_dpc_native
> > > > > > 
> > > > > > Fail dpc_probe() if the device has no AER Capability.
> > > > > > 
> > > > > > Signed-off-by: Bjorn Helgaas 
> > > > > > Cc: Olof Johansson 
> > > > > > ---
> > > > > > drivers/pci/pcie/dpc.c | 3 +++
> > > > > > 1 file changed, 3 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> > > > > > index e05aba86a317..ed0dbc43d018 100644
> > > > > > --- a/drivers/pci/pcie/dpc.c
> > > > > > +++ b/drivers/pci/pcie/dpc.c
> > > > > > @@ -287,6 +287,9 @@ static int dpc_probe(struct pcie_device *dev)
> > > > > >  	int status;
> > > > > >  	u16 ctl, cap;
> > > > > > +	if (!pdev->aer_cap)
> > > > > > +		return -ENOTSUPP;
> > > > > Don't we check aer_cap support in drivers/pci/pcie/portdrv_core.c ?
> > > > > 
> > > > > We don't enable DPC service, if AER service is not enabled. And AER
> > > > > service is only enabled if AER capability is supported.
> > > > > 
> > > > > So dpc_probe() should not happen if AER capability is not supported?
> > > > 
> > > > I don't think that's always true.  If I'm reading this right, we have
> > > > this:
> > > > 
> > > > get_port_device_capability(...)
> > > > {
> > > > #ifdef CONFIG_PCIEAER
> > > >   if (dev->aer_cap && ...)
> > > >     services |= PCIE_PORT_SERVICE_AER;
> > > > #endif
> > > > 
> > > >   if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
> > > >       pci_aer_available() &&
> > > >       (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
> > > >     services |= PCIE_PORT_SERVICE_DPC;
> > > > }
> > > > 
> > > > and in the case where:
> > > > 
> > > > - CONFIG_PCIEAER=y
> > > > - booted with "pcie_ports=dpc-native" (pcie_ports_dpc_native is true)
> > > > - "dev" has no AER capability
> > > > - "dev" has DPC capability
> > > > 
> > > > I think we do enable PCIE_PORT_SERVICE_DPC.
> > > Got it. But further looking into it, I am wondering whether
> > > we should keep this dependency? Currently we just use it to
> > > dump the error information. Do we need to create dependency
> > > between DPC and AER (which is functionality not dependent) just
> > > to see more details about the error?
> > 
> > That's a good question, but I don't really want to get into the actual
> > operation of the AER and DPC drivers in this series, so maybe
> > something we should explore later.

> In that case, can you move this check to
> drivers/pci/pcie/portdrv_core.c?  I don't see the point of
> distributed checks in both get_port_device_capability() and
> dpc_probe().

I totally agree that these distributed checks are terrible, but my
long-term hope is to get rid of portdrv and handle these "services"
more like we handle other capabilities.  For example, maybe we can
squash dpc_probe() into pci_dpc_init(), so I'd actually like to move
things from get_port_device_capability() into dpc_probe().



