Re: [RFC PATCH] dma-iommu: allow devices to set IOVA range dynamically

2020-08-05 Thread Christoph Hellwig
I'm not entirely sure this is the right mechanism.  Can you please
send it along with your intended user so that we can get a better
picture?  Also the export needs to be a _GPL one for any kind of
functionality like this.


Re: [PATCHv2 2/2] hwrng: optee: fix wait use case

2020-08-05 Thread Sumit Garg
On Thu, 6 Aug 2020 at 12:00, Jorge Ramirez-Ortiz, Foundries
 wrote:
>
> On 06/08/20, Sumit Garg wrote:
> > On Thu, 6 Aug 2020 at 02:08, Jorge Ramirez-Ortiz, Foundries
> >  wrote:
> > >
> > > On 05/08/20, Sumit Garg wrote:
> > > > Apologies for my delayed response as I was busy with some other tasks
> > > > along with holidays.
> > >
> > > no pb! I was just making sure this wasn't falling through the cracks.
> > >
> > > >
> > > > On Fri, 24 Jul 2020 at 19:53, Jorge Ramirez-Ortiz, Foundries
> > > >  wrote:
> > > > >
> > > > > On 24/07/20, Sumit Garg wrote:
> > > > > > On Thu, 23 Jul 2020 at 14:16, Jorge Ramirez-Ortiz 
> > > > > >  wrote:
> > > > > > >
> > > > > > > The current code waits for data to be available before attempting
> > > > > > > a second read. However, the second read would not be executed as
> > > > > > > the while loop exits.
> > > > > > >
> > > > > > > This fix does not wait if all data has been read, and reads a
> > > > > > > second time if only partial data was retrieved on the first read.
> > > > > > >
> > > > > > > This fix also does not attempt to read if no data is requested.
> > > > > >
> > > > > > I am not sure how this is possible, can you elaborate?
> > > > >
> > > > > currently, if the user sets max to 0, get_optee_rng_data will
> > > > > regardless issue a call to the secure world requesting 0 bytes
> > > > > from the RNG
> > > > >
> > > >
> > > > This case is already handled by core API: rng_dev_read().
> > >
> > > ah ok good point, you are right
> > > but yeah, there is no consequence to the actual patch.
> > >
> >
> > So, at least you could get rid of the corresponding text from commit 
> > message.
> >
> > > >
> > > > > with this patch, this request is avoided.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Jorge Ramirez-Ortiz 
> > > > > > > ---
> > > > > > >  v2: tidy up the while loop to avoid reading when no data is requested
> > > > > > >
> > > > > > >  drivers/char/hw_random/optee-rng.c | 4 ++--
> > > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/char/hw_random/optee-rng.c b/drivers/char/hw_random/optee-rng.c
> > > > > > > index 5bc4700c4dae..a99d82949981 100644
> > > > > > > --- a/drivers/char/hw_random/optee-rng.c
> > > > > > > +++ b/drivers/char/hw_random/optee-rng.c
> > > > > > > @@ -122,14 +122,14 @@ static int optee_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
> > > > > > > if (max > MAX_ENTROPY_REQ_SZ)
> > > > > > > max = MAX_ENTROPY_REQ_SZ;
> > > > > > >
> > > > > > > -   while (read == 0) {
> > > > > > > +   while (read < max) {
> > > > > > > rng_size = get_optee_rng_data(pvt_data, data, (max - read));
> > > > > > >
> > > > > > > data += rng_size;
> > > > > > > read += rng_size;
> > > > > > >
> > > > > > > if (wait && pvt_data->data_rate) {
> > > > > > > -   if (timeout-- == 0)
> > > > > > > +   if ((timeout-- == 0) || (read == max))
> > > > > >
> > > > > > If read == max, would there be any sleep?
> > > > >
> > > > > no but I see no reason why there should be a wait since we already
> > > > > have all the data that we need; the msleep is only required when we
> > > > > need to wait for the RNG to generate entropy for the number of bytes
> > > > > we are requesting. if we are requesting 0 bytes, the entropy is
> > > > > already available. at least this is what makes sense to me.
> > > > >
> > > >
> > > > Wouldn't it lead to a call as msleep(0); that means no wait as well?
> > >
> > > I don't understand: there is no reason to wait if read == max and this
> > > patch will not wait: if read == max it calls 'return read'
> > >
> > > am I misunderstanding your point?
> >
> > What I mean is that we shouldn't require this extra check here as
> > there wasn't any wait if read == max with existing implementation too.
>
> um, I am getting confused Sumit
>
> with the existing implementation (the one we aim to replace), if
> get_optee_rng_data reads all the values requested on the first call (ie,
> read = 0) with wait set to true, the call will wait with msleep(0). Which
> is unnecessary and waits for a jiffy (ie, the call to msleep(0) will
> schedule a one-jiffy interruptible timeout)
>
> with this alternative implementation, msleep(0) does not get called.
>
> are we in sync?

Ah, I see msleep(0) also by default schedules a timeout of 1 jiffy. So
we are in sync now. Probably you can clarify this in the commit message
as well to avoid confusion.

-Sumit

>
> >
> > -Sumit
> >
> > >
> > > >
> > > > -Sumit
> > > >
> > > > >
> > > > > >
> > > > > > -Sumit
> > > > > >
> > > > > > > return read;
> > > > > > > msleep((1000 * (max - read)) / pvt_data->data_rate);
> > > > > > > } else 
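As an aside for readers following along, the loop behaviour being debated can be modeled in plain userspace C. This is an illustrative sketch only: `fake_msleep()` and `fake_get_rng_data()` are stand-ins for the real `msleep()` and `get_optee_rng_data()`, and the timeout value is arbitrary. It shows why the v2 loop never reaches an `msleep(0)` once all requested data has been read.

```c
#include <assert.h>
#include <stddef.h>

static int sleep_calls; /* lets a test observe whether msleep() would run */

static void fake_msleep(unsigned int ms)
{
    (void)ms;
    sleep_calls++;
}

/* Pretend the secure world returns at most 'avail' bytes per request. */
static size_t fake_get_rng_data(size_t want, size_t avail)
{
    return want < avail ? want : avail;
}

/* Simplified model of the v2 optee_rng_read() loop: read until 'max'
 * bytes are gathered, sleeping only when a partial read leaves work
 * outstanding. */
static size_t model_read(size_t max, size_t avail, int wait, unsigned int data_rate)
{
    size_t read = 0;
    int timeout = 4;

    while (read < max) {
        size_t rng_size = fake_get_rng_data(max - read, avail);
        read += rng_size;

        if (wait && data_rate) {
            if (timeout-- == 0 || read == max)
                return read;    /* done or timed out: msleep(0) never runs */
            fake_msleep((1000 * (unsigned int)(max - read)) / data_rate);
        } else if (rng_size == 0) {
            return read;        /* non-blocking and nothing available */
        }
    }
    return read;
}
```

With a full first read the model returns immediately without sleeping; with partial reads it sleeps only between attempts, matching the behaviour Jorge describes.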

Re: [PATCH] scsi: ufs: ti-j721e-ufs: Fix error return in ti_j721e_ufs_probe()

2020-08-05 Thread Jing Xiangfeng

Please ignore this patch.
Thanks

On 2020/8/6 14:44, Jing Xiangfeng wrote:

Fix to return error code IS_ERR() from the error handling case instead
of 0.

Fixes: 22617e216331 ("scsi: ufs: ti-j721e-ufs: Fix unwinding of pm_runtime changes")
Signed-off-by: Jing Xiangfeng 
---
  drivers/scsi/ufs/ti-j721e-ufs.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/ufs/ti-j721e-ufs.c b/drivers/scsi/ufs/ti-j721e-ufs.c
index 46bb905b4d6a..eafe7c08b0c8 100644
--- a/drivers/scsi/ufs/ti-j721e-ufs.c
+++ b/drivers/scsi/ufs/ti-j721e-ufs.c
@@ -38,6 +38,7 @@ static int ti_j721e_ufs_probe(struct platform_device *pdev)
/* Select MPHY refclk frequency */
clk = devm_clk_get(dev, NULL);
if (IS_ERR(clk)) {
+   ret = IS_ERR(clk);
dev_err(dev, "Cannot claim MPHY clock.\n");
goto clk_err;
}



Re: Re: Re: Re: Re: Re: [PATCH] iommu/vt-d: Add support for ACPI device in RMRR

2020-08-05 Thread FelixCui-oc
Hi  baolu,
>Sure. Before that, let me sync my understanding with you. You 
have an acpi namespace device in ANDD table, it also shows up in the device 
scope of a RMRR. 
>Current code doesn't enumerate that device for the RMRR, hence 
iommu_create_device_direct_mappings() doesn't work for this device.

>At the same time, probe_acpi_namespace_devices() doesn't work 
for this device, hence you want to add a home-made
>acpi_device_create_direct_mappings() helper.

Your understanding is right.
But there is a problem: even if the namespace device in the
RMRR is enumerated in the code, probe_acpi_namespace_devices() still
doesn't work for this device.
This is because the dev parameter of
iommu_create_device_direct_mappings() is not the namespace device in the
RMRR. The actual parameter passed in is the namespace device's
physical node device.
In iommu_create_device_direct_mappings(), the physical node
device passed in cannot match the namespace device in rmrr->device[],
right?
So we need the acpi_device_create_direct_mappings() helper, right?

In addition, adev->physical_node_list is related to the _HID
of the namespace device reported by the BIOS.
For example, if the _HID reported by the BIOS belongs to
acpi_pnp_device_ids[], adev->physical_node_list has no devices.
So in acpi_device_create_direct_mappings(), I added the case
where adev->physical_node_list is empty.


Best regards
Felix cui


 

-----Original Message-----
From: Lu Baolu
Sent: August 6, 2020 10:36
To: FelixCui-oc; Joerg Roedel; io...@lists.linux-foundation.org;
linux-kernel@vger.kernel.org; David Woodhouse
Cc: baolu...@linux.intel.com; RaymondPang-oc; CobeChen-oc
Subject: Re: Re: Re: Re: Re: Re: [PATCH] iommu/vt-d: Add support for ACPI
device in RMRR

Hi Felix,

On 8/5/20 3:37 PM, FelixCui-oc wrote:
> Hi baolu,
>   Let me talk about why acpi_device_create_direct_mappings() is 
> needed and please tell me if there is an error.

Sure. Before that, let me sync my understanding with you. You have an acpi 
namespace device in ANDD table, it also shows up in the device scope of a RMRR. 
Current code doesn't enumerate that device for the RMRR, hence 
iommu_create_device_direct_mappings() doesn't work for this device.

At the same time, probe_acpi_namespace_devices() doesn't work for this device, 
hence you want to add a home-made
acpi_device_create_direct_mappings() helper.

Did I get it right?

>   In the probe_acpi_namespace_devices() function, only the devices
> in adev->physical_node_list are probed,
>   but we need to establish identity mapping for the namespace
> device in the RMRR. These are two different devices.

The namespace device has been probed and put in one drhd's device list.
Hence, it should be processed by probe_acpi_namespace_devices(). So the
question is why there are no devices in adev->physical_node_list?

>   Therefore, the namespace device in RMRR is not mapped in 
> probe_acpi_namespace_devices().
>   acpi_device_create_direct_mappings() is to create direct 
> mappings for namespace devices in RMRR.

Best regards,
baolu


[PATCH v2 net-next] net/tls: allow MSG_CMSG_COMPAT in sendmsg

2020-08-05 Thread Rouven Czerwinski
Trying to use ktls on a system with 32-bit userspace and 64-bit kernel
results in a EOPNOTSUPP message during sendmsg:

  setsockopt(3, SOL_TLS, TLS_TX, …, 40) = 0
  sendmsg(3, …, msg_flags=0}, 0) = -1 EOPNOTSUPP (Operation not supported)

The tls_sw implementation does strict flag checking and does not allow
the MSG_CMSG_COMPAT flag, which is set if the message comes in through
the compat syscall.

This patch adds MSG_CMSG_COMPAT to the flag check to allow the usage of
the TLS SW implementation on systems using the compat syscall path.

Note that the same check is present in the sendmsg path for the TLS
device implementation, however the flag hasn't been added there for lack
of testing hardware.

Signed-off-by: Rouven Czerwinski 
---
 net/tls/tls_sw.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 24f64bc0de18..a332ae123bda 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -935,7 +935,8 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
int ret = 0;
int pending;
 
-   if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL))
+   if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
+  MSG_CMSG_COMPAT))
return -EOPNOTSUPP;
 
mutex_lock(&tls_ctx->tx_lock);

base-commit: c1055b76ad00aed0e8b79417080f212d736246b6
-- 
2.27.0
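The check being relaxed here is a plain bit-mask test: any flag bit outside the allowed set yields `-EOPNOTSUPP`. A minimal userspace model (the flag and errno values below are illustrative stand-ins, not the kernel's definitions):

```c
#include <assert.h>

/* Illustrative stand-in flag values; the real MSG_* constants live in the
 * kernel's socket headers. */
#define T_MSG_MORE        0x01u
#define T_MSG_DONTWAIT    0x02u
#define T_MSG_NOSIGNAL    0x04u
#define T_MSG_CMSG_COMPAT 0x08u

#define T_EOPNOTSUPP 95

/* Model of the strict flag check in tls_sw_sendmsg(): reject any flag
 * bit outside the allowed mask. */
static int check_flags(unsigned int flags, unsigned int allowed)
{
    return (flags & ~allowed) ? -T_EOPNOTSUPP : 0;
}
```

With the old mask, a compat sender carrying the CMSG_COMPAT bit is rejected; adding the bit to the allowed mask makes the same call succeed.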



Re: Minor RST rant

2020-08-05 Thread Christoph Hellwig
On Wed, Aug 05, 2020 at 05:12:30PM +0200, pet...@infradead.org wrote:
> On Wed, Aug 05, 2020 at 04:49:50PM +0200, Vegard Nossum wrote:
> > FWIW, I *really* like how the extra markup renders in a browser, and I
> > don't think I'm the only one.
> 
> The thing is, I write code in a text editor, not a browser. When a
> header file says: read Documentation/foo I do 'gf' and that file gets
> opened in a buffer.
> 
> Needing a browser is a fail.

And that is my main problem with all the RST craze.  It optimizes for
shiny display in a browser, but completely messes up the typical
developer flow.


linux-next: Signed-off-by missing for commit in the cifs tree

2020-08-05 Thread Stephen Rothwell
Hi all,

Commit

  2676d210d2f4 ("cifs: Fix an error pointer dereference in cifs_mount()")

is missing a Signed-off-by from its author.

This is only picked up by my script because of the mangling of the email
sender address by the mailing list it passed through.  I guess a little
more care is required to make sure the author attribution is correct in
this case.

-- 
Cheers,
Stephen Rothwell




Re: [PATCH] seg6: using DSCP of inner IPv4 packets

2020-08-05 Thread Ahmed Abdelsalam

Hi David,

SRv6 as defined in [1][2] does not mandate that the hop_limit of the 
outer IPv6 header has to be copied from the inner packet.


The only thing that is mandatory is that the hop_limit of the inner 
packet has to be decremented [3]. This complies with the specification 
defined in the Generic Packet Tunneling in IPv6 [4]. This part is 
actually missing in the kernel.


For the hop_limit of the outer IPv6 header, the other SRv6
implementations [5][6] by default use the default IPv6 hop_limit, but
they also allow a configurable hop_limit for the outer header.


In conclusion, the hop limit behavior in this patch is intentional and
in my opinion correct.


If you agree I can send two patches to:
- decrement hop_limit of inner packet
- allow a configurable hop limit of outer IPv6 header
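The combined behaviour proposed above can be sketched with a toy model (illustrative only, not kernel code; the struct layout and the default outer hop limit of 64 are assumptions for the example): the inner hop limit is decremented per RFC 2473, the outer hop limit is configurable, and the DSCP bits of the inner traffic class are copied to the outer header.

```c
#include <assert.h>

/* Toy header: hop limit plus a traffic-class byte with DSCP in the top
 * 6 bits and ECN in the low 2 bits. */
struct toy_hdr {
    unsigned char hop_limit;
    unsigned char tclass;
};

static struct toy_hdr toy_srv6_encap(struct toy_hdr *inner, int cfg_hop_limit)
{
    struct toy_hdr outer;

    inner->hop_limit--;                       /* mandatory inner decrement */
    outer.hop_limit = cfg_hop_limit > 0 ? (unsigned char)cfg_hop_limit : 64;
    outer.tclass = inner->tclass & 0xFC;      /* copy DSCP, clear ECN */
    return outer;
}
```

The two proposed patches would correspond to the decrement and the configurable outer hop limit in this sketch; the DSCP copy is the behaviour of the patch under review.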


[1] https://tools.ietf.org/html/rfc8754
[2] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming-16
[3] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming-16#section-5

[4] https://tools.ietf.org/html/rfc2473#section-3.1
[5] https://github.com/FDio/vpp/blob/8bf80a3ddae7733925a757cb1710e25776eea01c/src/vnet/srv6/sr_policy_rewrite.c#L110
[6] https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/asr9k-r6-6/segment-routing/configuration/guide/b-segment-routing-cg-asr9000-66x/b-segment-routing-cg-asr9000-66x_chapter_011.html#id_94209



On 06/08/2020 02:40, David Miller wrote:

From: Ahmed Abdelsalam 
Date: Tue,  4 Aug 2020 07:40:30 +


This patch allows copying the DSCP from inner IPv4 header to the
outer IPv6 header, when doing SRv6 Encapsulation.

This allows forwarding packet across the SRv6 fabric based on their
original traffic class.

Signed-off-by: Ahmed Abdelsalam 


You have changed the hop limit behavior here and that neither seems
intentional nor correct.

When encapsulating ipv6 inside of ipv6 the inner hop limit should be
inherited.  You should only use the DST hop limit when encapsulating
ipv4.

And that's what the existing code did.





Re: [PATCH 1/2] net: tls: add compat for get/setsockopt

2020-08-05 Thread Rouven Czerwinski
On Wed, 2020-08-05 at 17:45 -0700, David Miller wrote:
> Neither of these patches apply cleanly to net-next.  The compat handling
> and TLS code has been changed quite a bit lately.

Indeed, Patch 1 is no longer required on net-next. I'll drop the patch.

> Also, you must provide a proper header "[PATCH 0/N] ..." posting for your
> patch series which explains at a high level what your patch series is doing,
> how it is doing it, and why it is doing it that way.

Since I'm now down to one patch I'll forgo the cover letter and expand
the commit message.

Thanks for the explanation,
Rouven



[PATCH v5 1/4] bus: mhi: core: Add helper API to return number of free TREs

2020-08-05 Thread Hemant Kumar
Introduce the mhi_get_no_free_descriptors() API to return the number
of TREs available to queue buffers. MHI clients can use this
API to know beforehand if the ring is full, without calling the queue
API.

Signed-off-by: Hemant Kumar 
---
 drivers/bus/mhi/core/main.c | 12 
 include/linux/mhi.h |  9 +
 2 files changed, 21 insertions(+)

diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c
index 2cff5dd..0599e7d 100644
--- a/drivers/bus/mhi/core/main.c
+++ b/drivers/bus/mhi/core/main.c
@@ -258,6 +258,18 @@ int mhi_destroy_device(struct device *dev, void *data)
return 0;
 }
 
+int mhi_get_no_free_descriptors(struct mhi_device *mhi_dev,
+   enum dma_data_direction dir)
+{
+   struct mhi_controller *mhi_cntrl = mhi_dev->mhi_cntrl;
+   struct mhi_chan *mhi_chan = (dir == DMA_TO_DEVICE) ?
+   mhi_dev->ul_chan : mhi_dev->dl_chan;
+   struct mhi_ring *tre_ring = &mhi_chan->tre_ring;
+
+   return get_nr_avail_ring_elements(mhi_cntrl, tre_ring);
+}
+EXPORT_SYMBOL_GPL(mhi_get_no_free_descriptors);
+
 void mhi_notify(struct mhi_device *mhi_dev, enum mhi_callback cb_reason)
 {
struct mhi_driver *mhi_drv;
diff --git a/include/linux/mhi.h b/include/linux/mhi.h
index a35d876..6565528 100644
--- a/include/linux/mhi.h
+++ b/include/linux/mhi.h
@@ -600,6 +600,15 @@ void mhi_set_mhi_state(struct mhi_controller *mhi_cntrl,
 void mhi_notify(struct mhi_device *mhi_dev, enum mhi_callback cb_reason);
 
 /**
+ * mhi_get_no_free_descriptors - Get number of free descriptors
+ * Get the number of TREs available to queue buffers
+ * @mhi_dev: Device associated with the channels
+ * @dir: Direction of the channel
+ */
+int mhi_get_no_free_descriptors(struct mhi_device *mhi_dev,
+   enum dma_data_direction dir);
+
+/**
  * mhi_prepare_for_power_up - Do pre-initialization before power up.
  *This is optional, call this before power up if
  *the controller does not want bus framework to
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
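The arithmetic behind such a "free ring elements" helper can be modeled in userspace. This is an illustrative sketch of one common ring-buffer convention (one slot kept empty to distinguish a full ring from an empty one), not the actual `get_nr_avail_ring_elements()` implementation:

```c
#include <assert.h>

/* Free slots in a ring of 'size' elements with read index 'rd' and write
 * index 'wr'. Keeping one slot empty means wr can never wrap onto rd, so
 * "full" and "empty" remain distinguishable. */
static unsigned int ring_free_elements(unsigned int rd, unsigned int wr,
                                       unsigned int size)
{
    return (rd + size - wr - 1) % size;
}
```

A client following the commit message above would call the real helper before queueing, and skip the queue attempt when the returned count is zero.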



[PATCH v5 0/4] user space client interface driver

2020-08-05 Thread Hemant Kumar
V5:
- Removed mhi_uci_drv structure.
- Used idr instead of creating global list of uci devices.
- Used kref instead of local ref counting for uci device and
  open count.
- Removed unlikely macro.

V4:
- Fix locking to protect proper struct members.
- Updated documentation describing uci client driver use cases.
- Fixed uci ref counting in mhi_uci_open for error case.
- Addressed style related review comments.

V3: Added documentation for MHI UCI driver.

V2: Added mutex lock to prevent multiple readers from accessing the same
mhi buffer, which can result in use-after-free.

Hemant Kumar (4):
  bus: mhi: core: Add helper API to return number of free TREs
  bus: mhi: core: Move MHI_MAX_MTU to external header file
  docs: Add documentation for userspace client interface
  bus: mhi: clients: Add userspace client interface driver

 Documentation/mhi/index.rst  |   1 +
 Documentation/mhi/uci.rst|  39 +++
 drivers/bus/mhi/Kconfig  |   6 +
 drivers/bus/mhi/Makefile |   1 +
 drivers/bus/mhi/clients/Kconfig  |  15 +
 drivers/bus/mhi/clients/Makefile |   3 +
 drivers/bus/mhi/clients/uci.c| 662 +++
 drivers/bus/mhi/core/internal.h  |   1 -
 drivers/bus/mhi/core/main.c  |  12 +
 include/linux/mhi.h  |  12 +
 10 files changed, 751 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/mhi/uci.rst
 create mode 100644 drivers/bus/mhi/clients/Kconfig
 create mode 100644 drivers/bus/mhi/clients/Makefile
 create mode 100644 drivers/bus/mhi/clients/uci.c

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH net 2/2] net: initialize fastreuse on inet_inherit_port

2020-08-05 Thread Tim Froidcoeur
In the case of TPROXY, bind_conflict optimizations for SO_REUSEADDR or
SO_REUSEPORT are broken, possibly resulting in O(n) instead of O(1) bind
behaviour or in the incorrect reuse of a bind.

The kernel keeps track, for each bind_bucket, of whether all sockets in
the bind_bucket support SO_REUSEADDR or SO_REUSEPORT, in two fastreuse
flags. These flags allow skipping the costly bind_conflict check when
possible (meaning when all sockets have the proper SO_REUSE option).

For every socket added to a bind_bucket, these flags need to be updated.
As soon as a socket that does not support reuse is added, the flag is
set to false and will never go back to true, unless the bind_bucket is
deleted.
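The one-way behaviour of these flags can be sketched with a toy model (illustrative only; the real logic lives in inet_csk_get_port() and the inet_csk_update_fastreuse() helper):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy bind bucket: once a non-reuse socket joins, fastreuse stays false
 * for the bucket's lifetime. */
struct toy_bucket {
    int owners;
    bool fastreuse;
};

static void toy_add_socket(struct toy_bucket *tb, bool sk_reuse)
{
    if (tb->owners == 0)
        tb->fastreuse = sk_reuse;   /* first socket initializes the flag */
    else if (!sk_reuse)
        tb->fastreuse = false;      /* one non-reuse socket clears it for good */
    tb->owners++;
}
```

The bug being fixed corresponds to skipping the equivalent of this update entirely when a bucket is created or reused in __inet_inherit_port, leaving the flag stale.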

Note that there is no mechanism to re-evaluate these flags when a socket
is removed (this might make sense when removing a socket that would not
allow reuse; this leaves room for a future patch).

For this optimization to work, it is mandatory that these flags are
properly initialized and updated.

When a child socket is created from a listen socket in
__inet_inherit_port, the TPROXY case could create a new bind bucket
without properly initializing these flags, thus preventing the
optimization to work. Alternatively, a socket not allowing reuse could
be added to an existing bind bucket without updating the flags, causing
bind_conflict to never be called as it should.

Call inet_csk_update_fastreuse when __inet_inherit_port decides to create
a new bind_bucket or use a different bind_bucket than the one of the
listen socket.

Fixes: 093d282321da ("tproxy: fix hash locking issue when using port redirection in __inet_inherit_port()")
Acked-by: Matthieu Baerts 
Signed-off-by: Tim Froidcoeur 
---
 net/ipv4/inet_hashtables.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 2bbaaf0c7176..006a34b18537 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -163,6 +163,7 @@ int __inet_inherit_port(const struct sock *sk, struct sock *child)
return -ENOMEM;
}
}
+   inet_csk_update_fastreuse(tb, child);
}
inet_bind_hash(child, tb, port);
spin_unlock(&head->lock);
-- 
2.25.1


-- 


Disclaimer: https://www.tessares.net/mail-disclaimer/ 





[PATCH] scsi: ufs: ti-j721e-ufs: Fix error return in ti_j721e_ufs_probe()

2020-08-05 Thread Jing Xiangfeng
Fix to return error code IS_ERR() from the error handling case instead
of 0.

Fixes: 22617e216331 ("scsi: ufs: ti-j721e-ufs: Fix unwinding of pm_runtime changes")
Signed-off-by: Jing Xiangfeng 
---
 drivers/scsi/ufs/ti-j721e-ufs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/ufs/ti-j721e-ufs.c b/drivers/scsi/ufs/ti-j721e-ufs.c
index 46bb905b4d6a..eafe7c08b0c8 100644
--- a/drivers/scsi/ufs/ti-j721e-ufs.c
+++ b/drivers/scsi/ufs/ti-j721e-ufs.c
@@ -38,6 +38,7 @@ static int ti_j721e_ufs_probe(struct platform_device *pdev)
/* Select MPHY refclk frequency */
clk = devm_clk_get(dev, NULL);
if (IS_ERR(clk)) {
+   ret = IS_ERR(clk);
dev_err(dev, "Cannot claim MPHY clock.\n");
goto clk_err;
}
-- 
2.17.1



[PATCH v5 4/4] bus: mhi: clients: Add userspace client interface driver

2020-08-05 Thread Hemant Kumar
This MHI client driver allows userspace clients to transfer
raw data between an MHI device and the host using standard file
operations. The device file node is created with the format

/dev/mhi_<controller_name>_<mhi_device_name>

Currently it supports the LOOPBACK channel.

Signed-off-by: Hemant Kumar 
---
 drivers/bus/mhi/Kconfig  |   6 +
 drivers/bus/mhi/Makefile |   1 +
 drivers/bus/mhi/clients/Kconfig  |  15 +
 drivers/bus/mhi/clients/Makefile |   3 +
 drivers/bus/mhi/clients/uci.c| 662 +++
 5 files changed, 687 insertions(+)
 create mode 100644 drivers/bus/mhi/clients/Kconfig
 create mode 100644 drivers/bus/mhi/clients/Makefile
 create mode 100644 drivers/bus/mhi/clients/uci.c

diff --git a/drivers/bus/mhi/Kconfig b/drivers/bus/mhi/Kconfig
index 6a217ff..927c392 100644
--- a/drivers/bus/mhi/Kconfig
+++ b/drivers/bus/mhi/Kconfig
@@ -20,3 +20,9 @@ config MHI_BUS_DEBUG
 Enable debugfs support for use with the MHI transport. Allows
 reading and/or modifying some values within the MHI controller
 for debug and test purposes.
+
+if MHI_BUS
+
+source "drivers/bus/mhi/clients/Kconfig"
+
+endif
diff --git a/drivers/bus/mhi/Makefile b/drivers/bus/mhi/Makefile
index 19e6443..48f6028 100644
--- a/drivers/bus/mhi/Makefile
+++ b/drivers/bus/mhi/Makefile
@@ -1,2 +1,3 @@
 # core layer
 obj-y += core/
+obj-y += clients/
diff --git a/drivers/bus/mhi/clients/Kconfig b/drivers/bus/mhi/clients/Kconfig
new file mode 100644
index 000..37aaf51
--- /dev/null
+++ b/drivers/bus/mhi/clients/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+menu "MHI clients support"
+
+config MHI_UCI
+   tristate "MHI UCI"
+   depends on MHI_BUS
+   help
+MHI based userspace client interface driver is used for transferring
+raw data between host and device using standard file operations from
+userspace. Open, read, write, and close operations are supported
+by this driver. Please check mhi_uci_match_table for all supported
+channels that are exposed to userspace.
+
+endmenu
diff --git a/drivers/bus/mhi/clients/Makefile b/drivers/bus/mhi/clients/Makefile
new file mode 100644
index 000..cd34282
--- /dev/null
+++ b/drivers/bus/mhi/clients/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_MHI_UCI) += uci.o
diff --git a/drivers/bus/mhi/clients/uci.c b/drivers/bus/mhi/clients/uci.c
new file mode 100644
index 000..a25d5d0
--- /dev/null
+++ b/drivers/bus/mhi/clients/uci.c
@@ -0,0 +1,662 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2018-2020, The Linux Foundation. All rights reserved.*/
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DEVICE_NAME "mhi"
+#define MHI_UCI_DRIVER_NAME "mhi_uci"
+#define MAX_UCI_MINORS (128)
+
+static DEFINE_IDR(uci_idr);
+static DEFINE_MUTEX(uci_idr_mutex);
+static struct class *uci_dev_class;
+static int uci_dev_major = -1;
+
+/**
+ * struct uci_chan - MHI channel for a uci device
+ * @wq: wait queue for reader/writer
+ * @lock: spin lock
+ * @pending: list of rx buffers userspace is waiting to read
+ * @cur_buf: current buffer userspace is reading
+ * @rx_size: size of the current rx buffer userspace is reading
+ */
+struct uci_chan {
+   wait_queue_head_t wq;
+
+   /* protects pending and cur_buf members */
+   spinlock_t lock;
+
+   struct list_head pending;
+   struct uci_buf *cur_buf;
+   size_t rx_size;
+};
+
+/**
+ * struct uci_buf - uci buffer
+ * @data: data buffer
+ * @len: length of data buffer
+ * @node: list node of the uci buffer
+ */
+struct uci_buf {
+   void *data;
+   size_t len;
+   struct list_head node;
+};
+
+/**
+ * struct uci_dev - MHI uci device
+ * @minor: uci device node minor number
+ * @mhi_dev: associated mhi device object
+ * @chan: MHI channel name
+ * @lock: mutex lock
+ * @ul_chan: uplink uci channel object
+ * @dl_chan: downlink uci channel object
+ * @mtu: max tx buffer length
+ * @actual_mtu: maximum size of incoming buffer
+ * @ref_count: uci_dev reference count
+ * @open_count: device node open count
+ * @enabled: uci device probed
+ */
+struct uci_dev {
+   unsigned int minor;
+   struct mhi_device *mhi_dev;
+   const char *chan;
+
+   /* protects uci_dev struct members */
+   struct mutex lock;
+
+   struct uci_chan ul_chan;
+   struct uci_chan dl_chan;
+   size_t mtu;
+   size_t actual_mtu;
+   struct kref ref_count;
+   struct kref open_count;
+   bool enabled;
+};
+
+static int mhi_queue_inbound(struct uci_dev *udev)
+{
+   struct mhi_device *mhi_dev = udev->mhi_dev;
+   struct device *dev = &mhi_dev->dev;
+   size_t mtu = udev->mtu;
+   size_t actual_mtu = udev->actual_mtu;
+   int nr_trbs, i, ret = -EIO;
+   void *buf;
+   struct uci_buf *uci_buf;
+
+   nr_trbs = mhi_get_no_free_descriptors(mhi_dev, DMA_FROM_DEVICE);
+
+   for (i = 0; i < nr_trbs; i++) {
+   buf = kmalloc(

[PATCH net 0/2] net: initialize fastreuse on inet_inherit_port

2020-08-05 Thread Tim Froidcoeur
In the case of TPROXY, bind_conflict optimizations for SO_REUSEADDR or
SO_REUSEPORT are broken, possibly resulting in O(n) instead of O(1) bind
behaviour or in the incorrect reuse of a bind.

The kernel keeps track, for each bind_bucket, of whether all sockets in
the bind_bucket support SO_REUSEADDR or SO_REUSEPORT, in two fastreuse
flags. These flags allow skipping the costly bind_conflict check when
possible (meaning when all sockets have the proper SO_REUSE option).

For every socket added to a bind_bucket, these flags need to be updated.
As soon as a socket that does not support reuse is added, the flag is
set to false and will never go back to true, unless the bind_bucket is
deleted.

Note that there is no mechanism to re-evaluate these flags when a socket
is removed (this might make sense when removing a socket that would not
allow reuse; this leaves room for a future patch).

For this optimization to work, it is mandatory that these flags are
properly initialized and updated.

When a child socket is created from a listen socket in
__inet_inherit_port, the TPROXY case could create a new bind bucket
without properly initializing these flags, thus preventing the
optimization to work. Alternatively, a socket not allowing reuse could
be added to an existing bind bucket without updating the flags, causing
bind_conflict to never be called as it should.

Patch 1/2 refactors the fastreuse update code in inet_csk_get_port into a
small helper function, making the actual fix tiny and easier to understand. 

Patch 2/2 calls this new helper when __inet_inherit_port decides to create
a new bind_bucket or use a different bind_bucket than the one of the listen
socket.

Tim Froidcoeur (2):
  net: refactor bind_bucket fastreuse into helper
  net: initialize fastreuse on inet_inherit_port

 include/net/inet_connection_sock.h |  4 ++
 net/ipv4/inet_connection_sock.c| 99 --
 net/ipv4/inet_hashtables.c |  1 +
 3 files changed, 59 insertions(+), 45 deletions(-)

-- 
2.25.1







[PATCH net 1/2] net: refactor bind_bucket fastreuse into helper

2020-08-05 Thread Tim Froidcoeur
Refactor the fastreuse update code in inet_csk_get_port into a small
helper function that can be called from other places.

Acked-by: Matthieu Baerts 
Signed-off-by: Tim Froidcoeur 
---
 include/net/inet_connection_sock.h |  4 ++
 net/ipv4/inet_connection_sock.c| 99 --
 2 files changed, 58 insertions(+), 45 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index e5b388f5fa20..1d59bf55bb4d 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -316,6 +316,10 @@ int inet_csk_compat_getsockopt(struct sock *sk, int level, int optname,
 int inet_csk_compat_setsockopt(struct sock *sk, int level, int optname,
   char __user *optval, unsigned int optlen);
 
+/* update the fast reuse flag when adding a socket */
+void inet_csk_update_fastreuse(struct inet_bind_bucket *tb,
+  struct sock *sk);
+
 struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu);
 
 #define TCP_PINGPONG_THRESH3
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index afaf582a5aa9..3b46b1f6086e 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -266,7 +266,7 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *
 static inline int sk_reuseport_match(struct inet_bind_bucket *tb,
 struct sock *sk)
 {
-   kuid_t uid = sock_i_uid(sk);
+   kuid_t uid = sock_i_uid((struct sock *)sk);
 
if (tb->fastreuseport <= 0)
return 0;
@@ -296,6 +296,57 @@ static inline int sk_reuseport_match(struct inet_bind_bucket *tb,
ipv6_only_sock(sk), true, false);
 }
 
+void inet_csk_update_fastreuse(struct inet_bind_bucket *tb,
+  struct sock *sk)
+{
+   kuid_t uid = sock_i_uid((struct sock *)sk);
+   bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
+
+   if (hlist_empty(&tb->owners)) {
+   tb->fastreuse = reuse;
+   if (sk->sk_reuseport) {
+   tb->fastreuseport = FASTREUSEPORT_ANY;
+   tb->fastuid = uid;
+   tb->fast_rcv_saddr = sk->sk_rcv_saddr;
+   tb->fast_ipv6_only = ipv6_only_sock(sk);
+   tb->fast_sk_family = sk->sk_family;
+#if IS_ENABLED(CONFIG_IPV6)
+   tb->fast_v6_rcv_saddr = sk->sk_v6_rcv_saddr;
+#endif
+   } else {
+   tb->fastreuseport = 0;
+   }
+   } else {
+   if (!reuse)
+   tb->fastreuse = 0;
+   if (sk->sk_reuseport) {
+   /* We didn't match or we don't have fastreuseport set on
+* the tb, but we have sk_reuseport set on this socket
+* and we know that there are no bind conflicts with
+* this socket in this tb, so reset our tb's reuseport
+* settings so that any subsequent sockets that match
+* our current socket will be put on the fast path.
+*
+* If we reset we need to set FASTREUSEPORT_STRICT so we
+* do extra checking for all subsequent sk_reuseport
+* socks.
+*/
+   if (!sk_reuseport_match(tb, sk)) {
+   tb->fastreuseport = FASTREUSEPORT_STRICT;
+   tb->fastuid = uid;
+   tb->fast_rcv_saddr = sk->sk_rcv_saddr;
+   tb->fast_ipv6_only = ipv6_only_sock(sk);
+   tb->fast_sk_family = sk->sk_family;
+#if IS_ENABLED(CONFIG_IPV6)
+   tb->fast_v6_rcv_saddr = sk->sk_v6_rcv_saddr;
+#endif
+   }
+   } else {
+   tb->fastreuseport = 0;
+   }
+   }
+}
+
 /* Obtain a reference to a local port for the given sock,
  * if snum is zero it means select any available local port.
  * We try to allocate an odd port (and leave even ports for connect())
@@ -308,7 +359,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
struct inet_bind_hashbucket *head;
struct net *net = sock_net(sk);
struct inet_bind_bucket *tb = NULL;
-   kuid_t uid = sock_i_uid(sk);
int l3mdev;
 
l3mdev = inet_sk_bound_l3mdev(sk);
@@ -345,49 +395,8 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
goto fail_unlock;
}
 success:
-   if (hlist_empty(&tb->owners)) {
-   tb->fastreuse = reuse;
-   if (sk->sk_reuseport) {
-   tb->fastreuseport = FASTREUSEPORT_ANY;
-   tb->fastu

[PATCH v5 3/4] docs: Add documentation for userspace client interface

2020-08-05 Thread Hemant Kumar
MHI userspace client driver is creating device file node
for user application to perform file operations. File
operations are handled by MHI core driver. Currently
Loopback MHI channel is supported by this driver.

Signed-off-by: Hemant Kumar 
---
 Documentation/mhi/index.rst |  1 +
 Documentation/mhi/uci.rst   | 39 +++
 2 files changed, 40 insertions(+)
 create mode 100644 Documentation/mhi/uci.rst

diff --git a/Documentation/mhi/index.rst b/Documentation/mhi/index.rst
index 1d8dec3..c75a371 100644
--- a/Documentation/mhi/index.rst
+++ b/Documentation/mhi/index.rst
@@ -9,6 +9,7 @@ MHI
 
mhi
topology
+   uci
 
 .. only::  subproject and html
 
diff --git a/Documentation/mhi/uci.rst b/Documentation/mhi/uci.rst
new file mode 100644
index 000..5d92939
--- /dev/null
+++ b/Documentation/mhi/uci.rst
@@ -0,0 +1,39 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================================
+Userspace Client Interface (UCI)
+================================
+
+UCI driver enables userspace clients to communicate to external MHI devices
+like modem and WLAN. It creates standard character device file nodes for user
+space clients to perform open, read, write, poll and close file operations.
+
+Device file node is created with the format:
+
+/dev/mhi_<controller_name>_<mhi_device_name>
+
+controller_name is the name of underlying bus used to transfer data.
+mhi_device_name is the name of the MHI channel being used by MHI client in
+userspace to send or receive data using MHI protocol.
+
+There is a separate character device file node created for each channel specified
+in mhi device id table. MHI channels are statically defined by MHI specification.
+Driver currently supports LOOPBACK channel 0 (Host to device) and 1 (Device to Host).
+
+LOOPBACK Channel
+----------------
+
+Userspace MHI client using LOOPBACK channel opens device file node. As part of
+open operation TREs to transfer ring of LOOPBACK channel 1 gets queued and channel
+doorbell is rung. When userspace MHI client performs write operation on device node,
+data buffer gets queued as a TRE to transfer ring of LOOPBACK channel 0. MHI Core
+driver rings the channel doorbell for MHI device to move data over underlying bus.
+When userspace MHI client driver performs read operation, same data gets looped back
+to MHI host using LOOPBACK channel 1. LOOPBACK channel is used to verify data path
+and data integrity between MHI Host and MHI device.
+
+Other Use Cases
+---------------
+
+Getting MHI device specific diagnostics information to userspace MHI diag client
+using DIAG channel 4 (Host to device) and 5 (Device to Host).
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
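
A userspace client of the interface described above would first build the
device node path before driving it with plain file I/O. This is a minimal
sketch, not driver code; the controller and channel names ("mhi0",
"LOOPBACK") are illustrative, since the real names depend on the bus
instance and the MHI device id table:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build /dev/mhi_<controller_name>_<mhi_device_name>; the resulting node
 * would then be opened and exercised with write()/read() for LOOPBACK. */
static int uci_node_path(char *buf, size_t len,
			 const char *controller_name,
			 const char *mhi_device_name)
{
	return snprintf(buf, len, "/dev/mhi_%s_%s",
			controller_name, mhi_device_name);
}
```

The loopback sequence itself is then ordinary file I/O: open(path, O_RDWR),
write() the payload (queued on LOOPBACK channel 0), then poll()/read() the
same bytes back (LOOPBACK channel 1).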



[PATCH v5 2/4] bus: mhi: core: Move MHI_MAX_MTU to external header file

2020-08-05 Thread Hemant Kumar
Currently this macro is defined in internal MHI header as
a TRE length mask. Moving it to external header allows MHI
client drivers to set this upper bound for the transmit
buffer size.

Signed-off-by: Hemant Kumar 
---
 drivers/bus/mhi/core/internal.h | 1 -
 include/linux/mhi.h | 3 +++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/bus/mhi/core/internal.h b/drivers/bus/mhi/core/internal.h
index 7989269..4abf0cf 100644
--- a/drivers/bus/mhi/core/internal.h
+++ b/drivers/bus/mhi/core/internal.h
@@ -453,7 +453,6 @@ enum mhi_pm_state {
 #define CMD_EL_PER_RING        128
 #define PRIMARY_CMD_RING       0
 #define MHI_DEV_WAKE_DB        127
-#define MHI_MAX_MTU            0xffff
 #define MHI_RANDOM_U32_NONZERO(bmsk)   (prandom_u32_max(bmsk) + 1)
 
 enum mhi_er_type {
diff --git a/include/linux/mhi.h b/include/linux/mhi.h
index 6565528..610f3b0 100644
--- a/include/linux/mhi.h
+++ b/include/linux/mhi.h
@@ -16,6 +16,9 @@
 #include 
 #include 
 
+/* MHI client drivers to set this upper bound for tx buffer */
+#define MHI_MAX_MTU 0xffff
+
 #define MHI_MAX_OEM_PK_HASH_SEGMENTS 16
 
 struct mhi_chan;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
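
With the macro moved to the external header, a client driver's use of it
reduces to clamping its transmit size. A hedged sketch (the helper name is
hypothetical, and the macro is redefined locally here only so the snippet is
self-contained; real clients would get it from <linux/mhi.h>):

```c
#include <assert.h>

/* Upper bound for a client's tx buffer, mirroring the TRE length mask
 * that MHI_MAX_MTU encodes. Redefined here for illustration only. */
#define MHI_MAX_MTU 0xffff

static unsigned int mhi_clamp_tx_len(unsigned int len)
{
	return len > MHI_MAX_MTU ? MHI_MAX_MTU : len;
}
```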



[PATCH 1/2] leds: is31fl319x: Add sdb pin and generate a 5ms low pulse when startup

2020-08-05 Thread Grant Feng
Generate a 5ms low pulse on the SDB pin at startup; the chip then
becomes more stable in a complex EM environment.

Signed-off-by: Grant Feng 
---
 drivers/leds/leds-is31fl319x.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/leds/leds-is31fl319x.c b/drivers/leds/leds-is31fl319x.c
index ca6634b8683c..b4f70002cec9 100644
--- a/drivers/leds/leds-is31fl319x.c
+++ b/drivers/leds/leds-is31fl319x.c
@@ -16,6 +16,8 @@
 #include 
 #include 
 #include 
+#include <linux/gpio/consumer.h>
+#include <linux/delay.h>
 
 /* register numbers */
 #define IS31FL319X_SHUTDOWN0x00
@@ -61,6 +63,7 @@
 struct is31fl319x_chip {
const struct is31fl319x_chipdef *cdef;
struct i2c_client   *client;
+   struct gpio_desc*sdb_pin;
struct regmap   *regmap;
struct mutexlock;
u32 audio_gain_db;
@@ -265,6 +268,15 @@ static int is31fl319x_parse_dt(struct device *dev,
is31->audio_gain_db = min(is31->audio_gain_db,
  IS31FL319X_AUDIO_GAIN_DB_MAX);
 
+   is31->sdb_pin = gpiod_get(dev, "sdb", GPIOD_ASIS);
+   if (IS_ERR(is31->sdb_pin)) {
   dev_warn(dev, "failed to get gpio_sdb, try default\n");
+   } else {
+   gpiod_direction_output(is31->sdb_pin, 0);
+   mdelay(5);
+   gpiod_direction_output(is31->sdb_pin, 1);
+   }
+
return 0;
 
 put_child_node:
-- 
2.17.1




[PATCH 2/2] DT: leds: Add an optional property named 'sdb-gpios'

2020-08-05 Thread Grant Feng
The chip enters hardware shutdown when the SDB pin is pulled low.
The chip releases hardware shutdown when the SDB pin is pulled high.

Signed-off-by: Grant Feng 
---
 Documentation/devicetree/bindings/leds/leds-is31fl319x.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/leds/leds-is31fl319x.txt 
b/Documentation/devicetree/bindings/leds/leds-is31fl319x.txt
index fc2603484544..e8bef4be57dc 100644
--- a/Documentation/devicetree/bindings/leds/leds-is31fl319x.txt
+++ b/Documentation/devicetree/bindings/leds/leds-is31fl319x.txt
@@ -16,6 +16,7 @@ Optional properties:
 - audio-gain-db : audio gain selection for external analog modulation input.
Valid values: 0 - 21, step by 3 (rounded down)
Default: 0
+- sdb-gpios : Specifier of the GPIO connected to SDB pin.
 
 Each led is represented as a sub-node of the issi,is31fl319x device.
 There can be less leds subnodes than the chip can support but not more.
@@ -44,6 +45,7 @@ fancy_leds: leds@65 {
#address-cells = <1>;
#size-cells = <0>;
reg = <0x65>;
+   sdb-gpios = <&gpio0 11 GPIO_ACTIVE_HIGH>;
 
red_aux: led@1 {
label = "red:aux";
-- 
2.17.1




Re: [PATCHv2 2/2] hwrng: optee: fix wait use case

2020-08-05 Thread Jorge Ramirez-Ortiz, Foundries
On 06/08/20, Sumit Garg wrote:
> On Thu, 6 Aug 2020 at 02:08, Jorge Ramirez-Ortiz, Foundries
>  wrote:
> >
> > On 05/08/20, Sumit Garg wrote:
> > > Apologies for my delayed response as I was busy with some other tasks
> > > along with holidays.
> >
> > no pb! was just making sure this wasnt falling through some cracks.
> >
> > >
> > > On Fri, 24 Jul 2020 at 19:53, Jorge Ramirez-Ortiz, Foundries
> > >  wrote:
> > > >
> > > > On 24/07/20, Sumit Garg wrote:
> > > > > On Thu, 23 Jul 2020 at 14:16, Jorge Ramirez-Ortiz 
> > > > >  wrote:
> > > > > >
> > > > > > The current code waits for data to be available before attempting a
> > > > > > second read. However the second read would not be executed as the
> > > > > > while loop exits.
> > > > > >
> > > > > > This fix does not wait if all data has been read and reads a second
> > > > > > time if only partial data was retrieved on the first read.
> > > > > >
> > > > > > This fix also does not attempt to read if no data is requested.
> > > > >
> > > > > I am not sure how this is possible, can you elaborate?
> > > >
> > > > currently, if the user sets max 0, get_optee_rng_data will regardless
> > > > issue a call to the secure world requesting 0 bytes from the RNG
> > > >
> > >
> > > This case is already handled by core API: rng_dev_read().
> >
> > ah ok good point, you are right
> > but yeah, there is no consequence to the actual patch.
> >
> 
> So, at least you could get rid of the corresponding text from commit message.
> 
> > >
> > > > with this patch, this request is avoided.
> > > >
> > > > >
> > > > > >
> > > > > > Signed-off-by: Jorge Ramirez-Ortiz 
> > > > > > ---
> > > > > >  v2: tidy up the while loop to avoid reading when no data is 
> > > > > > requested
> > > > > >
> > > > > >  drivers/char/hw_random/optee-rng.c | 4 ++--
> > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/char/hw_random/optee-rng.c 
> > > > > > b/drivers/char/hw_random/optee-rng.c
> > > > > > index 5bc4700c4dae..a99d82949981 100644
> > > > > > --- a/drivers/char/hw_random/optee-rng.c
> > > > > > +++ b/drivers/char/hw_random/optee-rng.c
> > > > > > @@ -122,14 +122,14 @@ static int optee_rng_read(struct hwrng *rng, 
> > > > > > void *buf, size_t max, bool wait)
> > > > > > if (max > MAX_ENTROPY_REQ_SZ)
> > > > > > max = MAX_ENTROPY_REQ_SZ;
> > > > > >
> > > > > > -   while (read == 0) {
> > > > > > +   while (read < max) {
> > > > > > rng_size = get_optee_rng_data(pvt_data, data, (max 
> > > > > > - read));
> > > > > >
> > > > > > data += rng_size;
> > > > > > read += rng_size;
> > > > > >
> > > > > > if (wait && pvt_data->data_rate) {
> > > > > > -   if (timeout-- == 0)
> > > > > > +   if ((timeout-- == 0) || (read == max))
> > > > >
> > > > > If read == max, would there be any sleep?
> > > >
> > > > no but I see no reason why there should be a wait since we already have
> > > > all the data that we need; the msleep is only required when we need to
> > > > wait for the RNG to generate entropy for the number of bytes we are
> > > > requesting. if we are requesting 0 bytes, the entropy is already
> > > > available. at least this is what makes sense to me.
> > > >
> > >
> > > Wouldn't it lead to a call as msleep(0); that means no wait as well?
> >
> > I dont understand: there is no reason to wait if read == max and this
> > patch will not wait: if read == max it calls 'return read'
> >
> > am I misunderstanding your point?
> 
> What I mean is that we shouldn't require this extra check here as
> there wasn't any wait if read == max with existing implementation too.

um, I am getting confused, Sumit.

with the existing implementation (the one we aim to replace), if
get_optee_rng_data reads all the values requested on the first call (i.e.,
read = 0) with wait set to true, the call will wait with msleep(0), which is
unnecessary: msleep(0) still schedules a one-jiffy interruptible timeout.

with this alternative implementation, msleep(0) does not get called.

are we in sync?

> 
> -Sumit
> 
> >
> > >
> > > -Sumit
> > >
> > > >
> > > > >
> > > > > -Sumit
> > > > >
> > > > > > return read;
> > > > > > msleep((1000 * (max - read)) / 
> > > > > > pvt_data->data_rate);
> > > > > > } else {
> > > > > > --
> > > > > > 2.17.1
> > > > > >
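
The loop shape being discussed can be modelled outside the kernel. This is a
minimal sketch with a fake entropy source (fake_rng and fake_read are
stand-ins, not driver code): a full first read returns without any sleep,
a partial first read triggers exactly one more read, and a request for zero
bytes never touches the source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for get_optee_rng_data(): returns at most
 * 'chunk' bytes per call, so a large request needs multiple reads. */
struct fake_rng {
	size_t chunk;
	int calls;
};

static size_t fake_read(struct fake_rng *r, size_t want)
{
	r->calls++;
	return want < r->chunk ? want : r->chunk;
}

/* Sketch of the v2 loop shape: keep reading until 'max' bytes arrive,
 * and bail out before sleeping once read == max, so msleep(0) is never
 * issued. 'slept' counts the sleeps the real driver would take. */
static size_t read_loop(struct fake_rng *r, size_t max, int *slept)
{
	size_t read = 0;
	int timeout = 10;

	while (read < max) {
		read += fake_read(r, max - read);
		if (timeout-- == 0 || read == max)
			return read;	/* done or timed out: no sleep */
		(*slept)++;		/* the driver would msleep() here */
	}
	return read;
}
```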


Re: [PATCH v3 0/4] add i2c support for mt8192

2020-08-05 Thread Matthias Brugger




On 05/08/2020 23:42, w...@the-dreams.de wrote:

On Wed, Aug 05, 2020 at 06:52:18PM +0800, Qii Wang wrote:

This series are based on 5.8-rc1 and we provide four i2c patches
to support mt8192 SoC.

Main changes compared to v2:
--delete unused I2C_DMA_4G_MODE

Main changes compared to v1:
--modify the commit with access more than 8GB dram
--add Reviewed-by and Acked-by from Yingjoe, Matthias and Rob

Qii Wang (4):
   i2c: mediatek: Add apdma sync in i2c driver
   i2c: mediatek: Add access to more than 8GB dram in i2c driver
   dt-bindings: i2c: update bindings for MT8192 SoC
   i2c: mediatek: Add i2c compatible for MediaTek MT8192

  .../devicetree/bindings/i2c/i2c-mt65xx.txt |  1 +


Applied to for-next, thanks!

Sidenote: I get these warnings when compiling the driver:

drivers/i2c/busses/i2c-mt65xx.c:267: warning: Function parameter or member 'min_low_ns' not described in 'i2c_spec_values'
drivers/i2c/busses/i2c-mt65xx.c:267: warning: Function parameter or member 'min_high_ns' not described in 'i2c_spec_values'
drivers/i2c/busses/i2c-mt65xx.c:267: warning: Function parameter or member 'min_su_sta_ns' not described in 'i2c_spec_values'
drivers/i2c/busses/i2c-mt65xx.c:267: warning: Function parameter or member 'max_hd_dat_ns' not described in 'i2c_spec_values'
drivers/i2c/busses/i2c-mt65xx.c:267: warning: Function parameter or member 'min_su_dat_ns' not described in 'i2c_spec_values'

Is someone interested to fix these?



I just sent a fix for that.

Regards,
Matthias
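
The warnings above are kernel-doc complaints: the struct has a /** comment
but no @member lines. A hedged sketch of the shape of such a fix follows;
the member names come from the warning text, while the types and
descriptions here are assumptions, not the actual i2c-mt65xx definitions:

```c
#include <assert.h>

/**
 * struct i2c_spec_values - I2C bus timing values from the specification
 * @min_low_ns: minimum LOW period of the SCL clock, in ns
 * @min_high_ns: minimum HIGH period of the SCL clock, in ns
 * @min_su_sta_ns: minimum set-up time for a (repeated) START, in ns
 * @max_hd_dat_ns: maximum data hold time, in ns
 * @min_su_dat_ns: minimum data set-up time, in ns
 */
struct i2c_spec_values {
	unsigned int min_low_ns;
	unsigned int min_high_ns;
	unsigned int min_su_sta_ns;
	unsigned int max_hd_dat_ns;
	unsigned int min_su_dat_ns;
};
```

Once every member carries an @name: line, scripts/kernel-doc stops warning.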


Re: [PATCH v6 2/4] dmaengine: mediatek-cqdma: remove redundant queue structure

2020-08-05 Thread EastL
On Mon, 2020-07-27 at 15:14 +0530, Vinod Koul wrote:
> On 23-07-20, 10:34, EastL wrote:
> > On Wed, 2020-07-15 at 11:49 +0530, Vinod Koul wrote:
> > > On 02-07-20, 15:06, EastL Lee wrote:
> > > 
> > > >  static enum dma_status mtk_cqdma_tx_status(struct dma_chan *c,
> > > >dma_cookie_t cookie,
> > > >struct dma_tx_state *txstate)
> > > >  {
> > > > -   struct mtk_cqdma_vchan *cvc = to_cqdma_vchan(c);
> > > > -   struct mtk_cqdma_vdesc *cvd;
> > > > -   struct virt_dma_desc *vd;
> > > > -   enum dma_status ret;
> > > > -   unsigned long flags;
> > > > -   size_t bytes = 0;
> > > > -
> > > > -   ret = dma_cookie_status(c, cookie, txstate);
> > > > -   if (ret == DMA_COMPLETE || !txstate)
> > > > -   return ret;
> > > > -
> > > > -   spin_lock_irqsave(&cvc->vc.lock, flags);
> > > > -   vd = mtk_cqdma_find_active_desc(c, cookie);
> > > > -   spin_unlock_irqrestore(&cvc->vc.lock, flags);
> > > > -
> > > > -   if (vd) {
> > > > -   cvd = to_cqdma_vdesc(vd);
> > > > -   bytes = cvd->residue;
> > > > -   }
> > > > -
> > > > -   dma_set_residue(txstate, bytes);
> > > 
> > > any reason why you want to remove setting residue?
> > Because Mediatek CQDMA HW can't support residue.
> 
> And previously it did?
No, it was calculated by SW before.
We found that the residue was not necessary, so we removed it.



[PATCH V3] venus: core: add shutdown callback for venus

2020-08-05 Thread Mansur Alisha Shaik
After the SMMU translation is disabled in the
arm-smmu shutdown callback during reboot, if
any subsystems are still alive, the IOVAs they
are using will become PAs on the bus, which may
lead to a crash.

Below are the consumers of smmu from venus
arm-smmu: consumer: aa0.video-codec supplier=1500.iommu
arm-smmu: consumer: video-firmware.0 supplier=1500.iommu

So implement a shutdown callback, which detaches the iommu maps.

Signed-off-by: Mansur Alisha Shaik 
---
Changes in V3:
- Fix build errors

 drivers/media/platform/qcom/venus/core.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/media/platform/qcom/venus/core.c 
b/drivers/media/platform/qcom/venus/core.c
index 203c653..cfe211a 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -341,6 +341,16 @@ static int venus_remove(struct platform_device *pdev)
return ret;
 }
 
+static void venus_core_shutdown(struct platform_device *pdev)
+{
+   struct venus_core *core = platform_get_drvdata(pdev);
+   int ret;
+
+   ret = venus_remove(pdev);
+   if (ret)
+   dev_warn(core->dev, "shutdown failed %d\n", ret);
+}
+
 static __maybe_unused int venus_runtime_suspend(struct device *dev)
 {
struct venus_core *core = dev_get_drvdata(dev);
@@ -592,6 +602,7 @@ static struct platform_driver qcom_venus_driver = {
.of_match_table = venus_dt_match,
.pm = &venus_pm_ops,
},
+   .shutdown = venus_core_shutdown,
 };
 module_platform_driver(qcom_venus_driver);
 
-- 
2.7.4



linux-next: Tree for Aug 6

2020-08-05 Thread Stephen Rothwell
Hi all,

News: The merge window has opened, so please do not add any v5.10
related material to your linux-next included branches until after the
merge window closes again.

Changes since 20200805:

My fixes tree contains:

  dbf24e30ce2e ("device_cgroup: Fix RCU list debugging warning")

Non-merge commits (relative to Linus' tree): 8844
 9336 files changed, 508184 insertions(+), 169064 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig and htmldocs. And finally, a simple boot test
of the powerpc pseries_le_defconfig kernel in qemu (with and without
kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 328 trees (counting Linus' and 85 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (fffe3ae0ee84 Merge tag 'for-linus-hmm' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma)
Merging fixes/master (dbf24e30ce2e device_cgroup: Fix RCU list debugging 
warning)
Merging kbuild-current/fixes (bcf876870b95 Linux 5.8)
Merging arc-current/for-curr (11ba468877bb Linux 5.8-rc5)
Merging arm-current/fixes (5c6360ee4a0e ARM: 8988/1: mmu: fix crash in EFI 
calls due to p4d typo in create_mapping_late())
Merging arm64-fixes/for-next/fixes (6a7389f0312f MAINTAINERS: Include drivers 
subdirs for ARM PMU PROFILING AND DEBUGGING entry)
Merging arm-soc-fixes/arm/fixes (fe1d899f4212 ARM: dts: keystone-k2g-evm: fix 
rgmii phy-mode for ksz9031 phy)
Merging uniphier-fixes/fixes (48778464bb7d Linux 5.8-rc2)
Merging drivers-memory-fixes/fixes (b3a9e3b9622a Linux 5.8-rc1)
Merging m68k-current/for-linus (382f429bb559 m68k: defconfig: Update defconfigs 
for v5.8-rc3)
Merging powerpc-fixes/fixes (bcf876870b95 Linux 5.8)
Merging s390-fixes/fixes (8e911bd8afe0 s390/test_unwind: fix possible memleak 
in test_unwind())
Merging sparc/master (17ec0a17e90f sparc: Use fallthrough pseudo-keyword)
Merging fscrypt-current/for-stable (2b4eae95c736 fscrypt: don't evict dirty 
inodes after removing key)
Merging net/master (ac3a0c847296 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net)
Merging bpf/master (ac3a0c847296 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net)
Merging ipsec/master (ac3a0c847296 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net)
Merging netfilter/master (4203b19c2796 netfilter: flowtable: Set offload 
timeout when adding flow)
Merging ipvs/master (eadede5f9362 Merge branch 'hns3-fixes')
Merging wireless-drivers/master (1cfd3426ef98 ath10k: Fix NULL pointer 
dereference in AHB device probe)
Merging mac80211/master (ac3a0c847296 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net)
Merging rdma-fixes/for-rc (bcf876870b95 Linux 5.8)
Merging sound-current/for-linus (ed4d0a4aaf56 ALSA: hda/tegra: Add 100us dma 
stop delay)
Merging sound-asoc-fixes/for-linus (68122177749a Merge remote-tracking branch 
'asoc/for-5.9' into asoc-linus)
Merging regmap-fixes/for-linus (2b0f61e27f75 Merge remote-tracking branch 
'regmap/for-5.8' into regmap-linus)
Merging regulator-fixes/for-linus (e30c06f230a9 Merge remote-tracking branch 
'regulator/for-5.9' into regulator-linus)
Merging spi-fixes/for-linus (cdce7131f268 Merge remote-tracking branch 
'spi/for-5.9' into spi-linus)
Merging pci-current/for-linus (b361663c5a40 PCI/ASPM: Disable ASPM on ASMedia 
ASM1083/1085 PCIe-to-PCI bridge)
Merging driver-core.current/driver-core-linus (92ed30

Re: [PATCH 0/2] perf: Allow closing siblings' file descriptors

2020-08-05 Thread Adrian Hunter
On 8/07/20 6:16 pm, Alexander Shishkin wrote:
> Hi guys,
> 
> I've been looking at reducing the number of open file descriptors per perf
> session. If we retain one descriptor per event, in a large group they add
> up. At the same time, we're not actually using them for anything after the
> SET_OUTPUT and maybe SET_FILTER ioctls. So, this series is a stab at that.

I am wondering if instead we should be looking at creating a kernel API that
allows associating a multitude of tracepoints with a single event.  Thoughts
anyone?


Re: [PATCHv2 2/2] hwrng: optee: fix wait use case

2020-08-05 Thread Sumit Garg
On Thu, 6 Aug 2020 at 02:08, Jorge Ramirez-Ortiz, Foundries
 wrote:
>
> On 05/08/20, Sumit Garg wrote:
> > Apologies for my delayed response as I was busy with some other tasks
> > along with holidays.
>
> no pb! was just making sure this wasnt falling through some cracks.
>
> >
> > On Fri, 24 Jul 2020 at 19:53, Jorge Ramirez-Ortiz, Foundries
> >  wrote:
> > >
> > > On 24/07/20, Sumit Garg wrote:
> > > > On Thu, 23 Jul 2020 at 14:16, Jorge Ramirez-Ortiz  
> > > > wrote:
> > > > >
> > > > > The current code waits for data to be available before attempting a
> > > > > second read. However the second read would not be executed as the
> > > > > while loop exits.
> > > > >
> > > > > This fix does not wait if all data has been read and reads a second
> > > > > time if only partial data was retrieved on the first read.
> > > > >
> > > > > This fix also does not attempt to read if no data is requested.
> > > >
> > > > I am not sure how this is possible, can you elaborate?
> > >
> > > currently, if the user sets max 0, get_optee_rng_data will regardless
> > > issue a call to the secure world requesting 0 bytes from the RNG
> > >
> >
> > This case is already handled by core API: rng_dev_read().
>
> ah ok good point, you are right
> but yeah, there is no consequence to the actual patch.
>

So, at least you could get rid of the corresponding text from commit message.

> >
> > > with this patch, this request is avoided.
> > >
> > > >
> > > > >
> > > > > Signed-off-by: Jorge Ramirez-Ortiz 
> > > > > ---
> > > > >  v2: tidy up the while loop to avoid reading when no data is requested
> > > > >
> > > > >  drivers/char/hw_random/optee-rng.c | 4 ++--
> > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/char/hw_random/optee-rng.c 
> > > > > b/drivers/char/hw_random/optee-rng.c
> > > > > index 5bc4700c4dae..a99d82949981 100644
> > > > > --- a/drivers/char/hw_random/optee-rng.c
> > > > > +++ b/drivers/char/hw_random/optee-rng.c
> > > > > @@ -122,14 +122,14 @@ static int optee_rng_read(struct hwrng *rng, 
> > > > > void *buf, size_t max, bool wait)
> > > > > if (max > MAX_ENTROPY_REQ_SZ)
> > > > > max = MAX_ENTROPY_REQ_SZ;
> > > > >
> > > > > -   while (read == 0) {
> > > > > +   while (read < max) {
> > > > > rng_size = get_optee_rng_data(pvt_data, data, (max - 
> > > > > read));
> > > > >
> > > > > data += rng_size;
> > > > > read += rng_size;
> > > > >
> > > > > if (wait && pvt_data->data_rate) {
> > > > > -   if (timeout-- == 0)
> > > > > +   if ((timeout-- == 0) || (read == max))
> > > >
> > > > If read == max, would there be any sleep?
> > >
> > > no but I see no reason why there should be a wait since we already have
> > > all the data that we need; the msleep is only required when we need to
> > > wait for the RNG to generate entropy for the number of bytes we are
> > > requesting. if we are requesting 0 bytes, the entropy is already
> > > available. at least this is what makes sense to me.
> > >
> >
> > Wouldn't it lead to a call as msleep(0); that means no wait as well?
>
> I dont understand: there is no reason to wait if read == max and this
> patch will not wait: if read == max it calls 'return read'
>
> am I misunderstanding your point?

What I mean is that we shouldn't require this extra check here as
there wasn't any wait if read == max with existing implementation too.

-Sumit

>
> >
> > -Sumit
> >
> > >
> > > >
> > > > -Sumit
> > > >
> > > > > return read;
> > > > > msleep((1000 * (max - read)) / 
> > > > > pvt_data->data_rate);
> > > > > } else {
> > > > > --
> > > > > 2.17.1
> > > > >


Re: [PATCH v3 3/4] fpga: dfl: create a dfl bus type to support DFL devices

2020-08-05 Thread Xu Yilun
On Wed, Aug 05, 2020 at 06:29:11PM +0800, Wu, Hao wrote:
> > Subject: [PATCH v3 3/4] fpga: dfl: create a dfl bus type to support DFL 
> > devices
> >
> > A new bus type "dfl" is introduced for private features which are not
> > initialized by DFL feature drivers (dfl-fme & dfl-afu drivers). So these
> > private features could be handled by separate driver modules.
> >
> > DFL feature drivers (dfl-fme, dfl-port) will create DFL devices on
> > enumeration. DFL drivers could be registered on this bus to match these
> > DFL devices. They are matched by dfl type & feature_id.
> >
> > Signed-off-by: Xu Yilun 
> > Signed-off-by: Wu Hao 
> > Signed-off-by: Matthew Gerlach 
> > Signed-off-by: Russ Weight 
> > Reviewed-by: Tom Rix 
> > ---
> > v2: change the bus uevent format.
> > change the dfl device's sysfs name format.
> > refactor dfl_dev_add().
> > minor fixes for comments from Hao and Tom.
> > v3: no change.
> > ---
> >  Documentation/ABI/testing/sysfs-bus-dfl |  15 ++
> >  drivers/fpga/dfl.c  | 254 
> > +++-
> >  drivers/fpga/dfl.h  |  84 +++
> >  3 files changed, 345 insertions(+), 8 deletions(-)
> >  create mode 100644 Documentation/ABI/testing/sysfs-bus-dfl
> >
> > diff --git a/Documentation/ABI/testing/sysfs-bus-dfl
> > b/Documentation/ABI/testing/sysfs-bus-dfl
> > new file mode 100644
> > index 000..b1eea30
> > --- /dev/null
> > +++ b/Documentation/ABI/testing/sysfs-bus-dfl
> > @@ -0,0 +1,15 @@
> > +What:/sys/bus/dfl/devices/.../type
> 
> So it will be clear, that it's always dfl_dev.X/type here?
> 
> > +Date:July 2020
> > +KernelVersion:5.9
> 
> Same as the other patches.
> 
> > +Contact:Xu Yilun 
> > +Description:Read-only. It returns type of DFL FIU of the device. Now DFL
> > +supports 2 FIU types, 0 for FME, 1 for PORT.
> > +Format: 0x%x
> > +
> > +What:/sys/bus/dfl/devices/.../feature_id
> 
> Same?
> 
> > +Date:July 2020
> > +KernelVersion:5.9
> 
> Ditto
> 
> > +Contact:Xu Yilun 
> > +Description:Read-only. It returns feature identifier local to its DFL FIU
> > +type.
> > +Format: 0x%x
> > diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c
> > index c649239..978d182 100644
> > --- a/drivers/fpga/dfl.c
> > +++ b/drivers/fpga/dfl.c
> > @@ -30,12 +30,6 @@ static DEFINE_MUTEX(dfl_id_mutex);
> >   * index to dfl_chardevs table. If no chardev support just set devt_type
> >   * as one invalid index (DFL_FPGA_DEVT_MAX).
> >   */
> > -enum dfl_id_type {
> > -FME_ID,/* fme id allocation and mapping */
> > -PORT_ID,/* port id allocation and mapping */
> > -DFL_ID_MAX,
> > -};
> > -
> >  enum dfl_fpga_devt_type {
> >  DFL_FPGA_DEVT_FME,
> >  DFL_FPGA_DEVT_PORT,
> > @@ -250,6 +244,236 @@ int dfl_fpga_check_port_id(struct platform_device
> > *pdev, void *pport_id)
> >  }
> >  EXPORT_SYMBOL_GPL(dfl_fpga_check_port_id);
> >
> > +static DEFINE_IDA(dfl_device_ida);
> > +
> > +static const struct dfl_device_id *
> > +dfl_match_one_device(const struct dfl_device_id *id, struct dfl_device
> > *ddev)
> > +{
> > +if (id->type == ddev->type && id->feature_id == ddev->feature_id)
> > +return id;
> > +
> > +return NULL;
> > +}
> > +
> > +static int dfl_bus_match(struct device *dev, struct device_driver *drv)
> > +{
> > +struct dfl_device *ddev = to_dfl_dev(dev);
> > +struct dfl_driver *ddrv = to_dfl_drv(drv);
> > +const struct dfl_device_id *id_entry = ddrv->id_table;
> > +
> > +if (id_entry) {
> > +while (id_entry->feature_id) {
> > +if (dfl_match_one_device(id_entry, ddev)) {
> > +ddev->id_entry = id_entry;
> > +return 1;
> > +}
> > +id_entry++;
> > +}
> > +}
> > +
> > +return 0;
> > +}
> > +
> > +static int dfl_bus_probe(struct device *dev)
> > +{
> > +struct dfl_device *ddev = to_dfl_dev(dev);
> > +struct dfl_driver *ddrv = to_dfl_drv(dev->driver);
> > +
> > +return ddrv->probe(ddev);
> > +}
> > +
> > +static int dfl_bus_remove(struct device *dev)
> > +{
> > +struct dfl_device *ddev = to_dfl_dev(dev);
> > +struct dfl_driver *ddrv = to_dfl_drv(dev->driver);
> > +
> > +if (ddrv->remove)
> > +ddrv->remove(ddev);
> > +
> > +return 0;
> > +}
> > +
> > +static int dfl_bus_uevent(struct device *dev, struct kobj_uevent_env *env)
> > +{
> > +struct dfl_device *ddev = to_dfl_dev(dev);
> > +
> > +return add_uevent_var(env, "MODALIAS=dfl:t%08Xf%04X",
> > +  ddev->type, ddev->feature_id);
> 
> Then we only print 12bit of feature_id will be enough?
> should we make type shorter as well as feature id?

I could envision that we need a struct

 struct dfl_feature_id {
u16 id: 12;
 }

for it.

But it seems more complex and I didn't see the benefit. We don't have to
worry about invalid values because we parse all the ddev->feature_id in the
dfl driver, which ensures it will not be larger than a 12-bit value.

> And do you think if we should add a new field for dfl version?

I think it may not be necessary now. If we support dfl v1 in the future, we
could still try to check uuid first, then fall back to type &
feature_id.

Do you have any idea fo

fs/fuse/virtio_fs.c:1009:6: warning: Variable 'err' is reassigned a value before the old one has been used.

2020-08-05 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   fffe3ae0ee84e25d2befe2ae59bc32aa2b6bc77b
commit: a62a8ef9d97da23762a588592c8b8eb50a8deb6a virtio-fs: add virtiofs 
filesystem
date:   11 months ago
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


cppcheck warnings: (new ones prefixed by >>)

>> fs/fuse/virtio_fs.c:1009:6: warning: Variable 'err' is reassigned a value before the old one has been used. [redundantAssignment]
    err = -ENOMEM;
    ^
   fs/fuse/virtio_fs.c:1003:6: note: Variable 'err' is reassigned a value before the old one has been used.
    err = -EINVAL;
    ^
   fs/fuse/virtio_fs.c:1009:6: note: Variable 'err' is reassigned a value before the old one has been used.
    err = -ENOMEM;
    ^
   fs/fuse/virtio_fs.c:1020:6: warning: Variable 'err' is reassigned a value before the old one has been used. [redundantAssignment]
    err = fuse_fill_super_common(sb, &ctx);
    ^
   fs/fuse/virtio_fs.c:1009:6: note: Variable 'err' is reassigned a value before the old one has been used.
    err = -ENOMEM;
    ^
   fs/fuse/virtio_fs.c:1020:6: note: Variable 'err' is reassigned a value before the old one has been used.
    err = fuse_fill_super_common(sb, &ctx);
    ^

vim +/err +1009 fs/fuse/virtio_fs.c

   979  
   980  static int virtio_fs_fill_super(struct super_block *sb)
   981  {
   982  struct fuse_conn *fc = get_fuse_conn_super(sb);
   983  struct virtio_fs *fs = fc->iq.priv;
   984  unsigned int i;
   985  int err;
   986  struct fuse_fs_context ctx = {
   987  .rootmode = S_IFDIR,
   988  .default_permissions = 1,
   989  .allow_other = 1,
   990  .max_read = UINT_MAX,
   991  .blksize = 512,
   992  .destroy = true,
   993  .no_control = true,
   994  .no_force_umount = true,
   995  };
   996  
   997  mutex_lock(&virtio_fs_mutex);
   998  
   999  /* After holding mutex, make sure virtiofs device is still 
there.
  1000   * Though we are holding a reference to it, drive ->remove might
  1001   * still have cleaned up virtual queues. In that case bail out.
  1002   */
  1003  err = -EINVAL;
  1004  if (list_empty(&fs->list)) {
  1005  pr_info("virtio-fs: tag <%s> not found\n", fs->tag);
  1006  goto err;
  1007  }
  1008  
> 1009  err = -ENOMEM;
  1010  /* Allocate fuse_dev for hiprio and notification queues */
  1011  for (i = 0; i < VQ_REQUEST; i++) {
  1012  struct virtio_fs_vq *fsvq = &fs->vqs[i];
  1013  
  1014  fsvq->fud = fuse_dev_alloc();
  1015  if (!fsvq->fud)
  1016  goto err_free_fuse_devs;
  1017  }
  1018  
  1019  ctx.fudptr = (void **)&fs->vqs[VQ_REQUEST].fud;
  1020  err = fuse_fill_super_common(sb, &ctx);
  1021  if (err < 0)
  1022  goto err_free_fuse_devs;
  1023  
  1024  fc = fs->vqs[VQ_REQUEST].fud->fc;
  1025  
  1026  for (i = 0; i < fs->nvqs; i++) {
  1027  struct virtio_fs_vq *fsvq = &fs->vqs[i];
  1028  
  1029  if (i == VQ_REQUEST)
  1030  continue; /* already initialized */
  1031  fuse_dev_install(fsvq->fud, fc);
  1032  }
  1033  
  1034  /* Previous unmount will stop all queues. Start these again */
  1035  virtio_fs_start_all_queues(fs);
  1036  fuse_send_init(fc);
  1037  mutex_unlock(&virtio_fs_mutex);
  1038  return 0;
  1039  
  1040  err_free_fuse_devs:
  1041  virtio_fs_free_devs(fs);
  1042  err:
  1043  mutex_unlock(&virtio_fs_mutex);
  1044  return err;
  1045  }
  1046  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


[PATCH] powerpc/32s: Fix assembler warning about r0

2020-08-05 Thread Christophe Leroy
The assembler says:
  arch/powerpc/kernel/head_32.S:1095: Warning: invalid register expression

It's objecting to the use of r0 as the RA argument. That's because
when RA = 0 the literal value 0 is used, rather than the content of
r0, making the use of r0 in the source potentially confusing.

Fix it to use a literal 0; the generated code is identical.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index f3ab94d73936..5624db0e09a1 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -1092,7 +1092,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
 */
lis r5, abatron_pteptrs@h
ori r5, r5, abatron_pteptrs@l
-   stw r5, 0xf0(r0)/* This much match your Abatron config */
+   stw r5, 0xf0(0) /* This much match your Abatron config */
lis r6, swapper_pg_dir@h
ori r6, r6, swapper_pg_dir@l
tophys(r5, r5)
-- 
2.25.0



Re: [PATCH v1 01/12] fbdev: gxfb: use generic power management

2020-08-05 Thread Vaibhav Gupta
On Wed, Aug 05, 2020 at 03:19:01PM -0500, Bjorn Helgaas wrote:
> On Wed, Aug 05, 2020 at 11:37:11PM +0530, Vaibhav Gupta wrote:
> > Drivers using legacy power management .suspen()/.resume() callbacks
> > have to manage PCI states and device's PM states themselves. They also
> > need to take care of standard configuration registers.
> 
> s/using legacy/using legacy PCI/
> s/.suspen/.suspend/ (in all these patches)
> 
Oh, that's a blunder. Since most of the drivers in my project need similar
changes, I made a template for the commit message, and by mistake the typo
must have gone into the template itself.
> I wouldn't necessarily repost the whole series just for that (unless
> the maintainer wants it), but maybe update your branch so if you have
> occasion to repost for other reasons, this will be fixed.
> 
> This particular driver actually doesn't *do* any of the PCI state or
> device PM state management you mention.  And I don't see the "single
> 'struct dev_pm_ops'" you mention below -- I thought that meant you
> would have a single struct shared between drivers (I think you did
> that for IDE?), but that's not what you're doing.  This driver has
> gxfb_pm_ops, the next has lxfb_pm_ops, etc.
> 
Yeah, the sentence sounds misleading. What I meant was that earlier there
were two pointers for PM, .suspend and .resume, whereas now there is a single
"struct dev_pm_ops" variable inside pci_driver.
> AFAICT the patches are fine, but the commit logs don't seem exactly
> accurate.
> 
I am fixing it.

Thanks
Vaibhav Gupta
> > Switch to generic power management framework using a single
> > "struct dev_pm_ops" variable to take the unnecessary load from the driver.
> > This also avoids the need for the driver to directly call most of the PCI
> > helper functions and device power state control functions, as through
> > the generic framework PCI Core takes care of the necessary operations,
> > and drivers are required to do only device-specific jobs.
> >
> > Signed-off-by: Vaibhav Gupta 
> > ---
> >  drivers/video/fbdev/geode/gxfb.h   |  5 
> >  drivers/video/fbdev/geode/gxfb_core.c  | 36 ++
> >  drivers/video/fbdev/geode/suspend_gx.c |  4 ---
> >  3 files changed, 20 insertions(+), 25 deletions(-)
> > 
> > diff --git a/drivers/video/fbdev/geode/gxfb.h 
> > b/drivers/video/fbdev/geode/gxfb.h
> > index d2e9c5c8e294..792c111c21e4 100644
> > --- a/drivers/video/fbdev/geode/gxfb.h
> > +++ b/drivers/video/fbdev/geode/gxfb.h
> > @@ -21,7 +21,6 @@ struct gxfb_par {
> > void __iomem *dc_regs;
> > void __iomem *vid_regs;
> > void __iomem *gp_regs;
> > -#ifdef CONFIG_PM
> > int powered_down;
> >  
> > /* register state, for power management functionality */
> > @@ -36,7 +35,6 @@ struct gxfb_par {
> > uint64_t fp[FP_REG_COUNT];
> >  
> > uint32_t pal[DC_PAL_COUNT];
> > -#endif
> >  };
> >  
> >  unsigned int gx_frame_buffer_size(void);
> > @@ -49,11 +47,8 @@ void gx_set_dclk_frequency(struct fb_info *info);
> >  void gx_configure_display(struct fb_info *info);
> >  int gx_blank_display(struct fb_info *info, int blank_mode);
> >  
> > -#ifdef CONFIG_PM
> >  int gx_powerdown(struct fb_info *info);
> >  int gx_powerup(struct fb_info *info);
> > -#endif
> > -
> >  
> >  /* Graphics Processor registers (table 6-23 from the data book) */
> >  enum gp_registers {
> > diff --git a/drivers/video/fbdev/geode/gxfb_core.c 
> > b/drivers/video/fbdev/geode/gxfb_core.c
> > index d38a148d4746..44089b331f91 100644
> > --- a/drivers/video/fbdev/geode/gxfb_core.c
> > +++ b/drivers/video/fbdev/geode/gxfb_core.c
> > @@ -322,17 +322,14 @@ static struct fb_info *gxfb_init_fbinfo(struct device 
> > *dev)
> > return info;
> >  }
> >  
> > -#ifdef CONFIG_PM
> > -static int gxfb_suspend(struct pci_dev *pdev, pm_message_t state)
> > +static int __maybe_unused gxfb_suspend(struct device *dev)
> >  {
> > -   struct fb_info *info = pci_get_drvdata(pdev);
> > +   struct fb_info *info = dev_get_drvdata(dev);
> >  
> > -   if (state.event == PM_EVENT_SUSPEND) {
> > -   console_lock();
> > -   gx_powerdown(info);
> > -   fb_set_suspend(info, 1);
> > -   console_unlock();
> > -   }
> > +   console_lock();
> > +   gx_powerdown(info);
> > +   fb_set_suspend(info, 1);
> > +   console_unlock();
> >  
> > /* there's no point in setting PCI states; we emulate PCI, so
> >  * we don't end up getting power savings anyways */
> > @@ -340,9 +337,9 @@ static int gxfb_suspend(struct pci_dev *pdev, 
> > pm_message_t state)
> > return 0;
> >  }
> >  
> > -static int gxfb_resume(struct pci_dev *pdev)
> > +static int __maybe_unused gxfb_resume(struct device *dev)
> >  {
> > -   struct fb_info *info = pci_get_drvdata(pdev);
> > +   struct fb_info *info = dev_get_drvdata(dev);
> > int ret;
> >  
> > console_lock();
> > @@ -356,7 +353,6 @@ static int gxfb_resume(struct pci_dev *pdev)
> > console_unlock();
> > return 0;
> >  }
> > -#endif
> >  
> >  static int gxfb_probe(struct 

Re: [PATCH v2 03/24] virtio: allow __virtioXX, __leXX in config space

2020-08-05 Thread Michael S. Tsirkin
On Thu, Aug 06, 2020 at 11:37:38AM +0800, Jason Wang wrote:
> 
> On 2020/8/5 下午7:45, Michael S. Tsirkin wrote:
> > > >#define virtio_cread(vdev, structname, member, ptr) \
> > > > do { \
> > > > might_sleep(); \
> > > > /* Must match the member's type, and be integer */ \
> > > > -   if (!typecheck(typeof((((structname*)0)->member)), *(ptr))) \
> > > > +   if (!__virtio_typecheck(structname, member, *(ptr))) \
> > > > (*ptr) = 1; \
> > > A silly question,  compare to using set()/get() directly, what's the value
> > > of the accessors macro here?
> > > 
> > > Thanks
> > get/set don't convert to the native endian, I guess that's why
> > drivers use cread/cwrite. It is also nice that there's type
> > safety, checking the correct integer width is used.
> 
> 
> Yes, but this is simply because a macro is used here, how about just doing
> things similar like virtio_cread_bytes():
> 
> static inline void virtio_cread(struct virtio_device *vdev,
>                   unsigned int offset,
>                   void *buf, size_t len)
> 
> 
> And do the endian conversion inside?
> 
> Thanks
> 

Then you lose type safety. It's very easy to have an le32 field
and try to read it into a u16 by mistake.

These macros are all about preventing bugs: and the whole patchset
is about several bugs sparse found - that is what prompted me to make
type checks more strict.


> > 



[PATCH 1/2] exfat: add NameLength check when extracting name

2020-08-05 Thread Tetsuhiro Kohada
The current implementation doesn't check NameLength when extracting
the name from Name dir-entries, so the extracted name may be incorrect
(no null-termination, insufficient Name dir-entries, etc.).
Add a NameLength check when extracting the name from Name dir-entries
so that the correct name is extracted.
And, change to get the information of file/stream-ext dir-entries
via the member variable of exfat_entry_set_cache.

** This patch depends on:
  '[PATCH v3] exfat: integrates dir-entry getting and validation'.

Signed-off-by: Tetsuhiro Kohada 
---
 fs/exfat/dir.c | 81 --
 1 file changed, 39 insertions(+), 42 deletions(-)

diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
index 91cdbede0fd1..545bb73b95e9 100644
--- a/fs/exfat/dir.c
+++ b/fs/exfat/dir.c
@@ -28,16 +28,15 @@ static int exfat_extract_uni_name(struct exfat_dentry *ep,
 
 }
 
-static void exfat_get_uniname_from_ext_entry(struct super_block *sb,
-   struct exfat_chain *p_dir, int entry, unsigned short *uniname)
+static int exfat_get_uniname_from_name_entries(struct exfat_entry_set_cache 
*es,
+   struct exfat_uni_name *uniname)
 {
-   int i;
-   struct exfat_entry_set_cache *es;
+   int n, l, i;
struct exfat_dentry *ep;
 
-   es = exfat_get_dentry_set(sb, p_dir, entry, ES_ALL_ENTRIES);
-   if (!es)
-   return;
+   uniname->name_len = es->de_stream->name_len;
+   if (uniname->name_len == 0)
+   return -EIO;
 
/*
 * First entry  : file entry
@@ -45,14 +44,15 @@ static void exfat_get_uniname_from_ext_entry(struct 
super_block *sb,
 * Third entry  : first file-name entry
 * So, the index of first file-name dentry should start from 2.
 */
-
-   i = 2;
-   while ((ep = exfat_get_validated_dentry(es, i++, TYPE_NAME))) {
-   exfat_extract_uni_name(ep, uniname);
-   uniname += EXFAT_FILE_NAME_LEN;
+   for (l = 0, n = 2; l < uniname->name_len; n++) {
+   ep = exfat_get_validated_dentry(es, n, TYPE_NAME);
+   if (!ep)
+   return -EIO;
+   for (i = 0; l < uniname->name_len && i < EXFAT_FILE_NAME_LEN; 
i++, l++)
+   uniname->name[l] = 
le16_to_cpu(ep->dentry.name.unicode_0_14[i]);
}
-
-   exfat_free_dentry_set(es, false);
+   uniname->name[l] = 0;
+   return 0;
 }
 
 /* read a directory entry from the opened directory */
@@ -63,6 +63,7 @@ static int exfat_readdir(struct inode *inode, struct 
exfat_dir_entry *dir_entry)
sector_t sector;
struct exfat_chain dir, clu;
struct exfat_uni_name uni_name;
+   struct exfat_entry_set_cache *es;
struct exfat_dentry *ep;
struct super_block *sb = inode->i_sb;
struct exfat_sb_info *sbi = EXFAT_SB(sb);
@@ -114,47 +115,43 @@ static int exfat_readdir(struct inode *inode, struct 
exfat_dir_entry *dir_entry)
return -EIO;
 
type = exfat_get_entry_type(ep);
-   if (type == TYPE_UNUSED) {
-   brelse(bh);
+   brelse(bh);
+
+   if (type == TYPE_UNUSED)
break;
-   }
 
-   if (type != TYPE_FILE && type != TYPE_DIR) {
-   brelse(bh);
+   if (type != TYPE_FILE && type != TYPE_DIR)
continue;
-   }
 
-   dir_entry->attr = le16_to_cpu(ep->dentry.file.attr);
+   es = exfat_get_dentry_set(sb, &dir, dentry, 
ES_ALL_ENTRIES);
+   if (!es)
+   return -EIO;
+
+   dir_entry->attr = le16_to_cpu(es->de_file->attr);
exfat_get_entry_time(sbi, &dir_entry->crtime,
-   ep->dentry.file.create_tz,
-   ep->dentry.file.create_time,
-   ep->dentry.file.create_date,
-   ep->dentry.file.create_time_cs);
+   es->de_file->create_tz,
+   es->de_file->create_time,
+   es->de_file->create_date,
+   es->de_file->create_time_cs);
exfat_get_entry_time(sbi, &dir_entry->mtime,
-   ep->dentry.file.modify_tz,
-   ep->dentry.file.modify_time,
-   ep->dentry.file.modify_date,
-   ep->dentry.file.modify_time_cs);
+   es->de_file->modify_tz,
+   es->de_file->modify_time,
+  

[PATCH 2/2] exfat: unify name extraction

2020-08-05 Thread Tetsuhiro Kohada
Name extraction in exfat_find_dir_entry() also doesn't check NameLength,
so the extracted name may be incorrect.
Replace the name extraction in exfat_find_dir_entry() with
exfat_entry_set_cache and exfat_get_uniname_from_name_entries(),
as in exfat_readdir().
And, remove unused functions/parameters.

** This patch depends on:
  '[PATCH v3] exfat: integrates dir-entry getting and validation'.

Signed-off-by: Tetsuhiro Kohada 
---
 fs/exfat/dir.c  | 161 ++--
 fs/exfat/exfat_fs.h |   2 +-
 fs/exfat/namei.c|   4 +-
 3 files changed, 38 insertions(+), 129 deletions(-)

diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
index 545bb73b95e9..c9715c7a55a1 100644
--- a/fs/exfat/dir.c
+++ b/fs/exfat/dir.c
@@ -10,24 +10,6 @@
 #include "exfat_raw.h"
 #include "exfat_fs.h"
 
-static int exfat_extract_uni_name(struct exfat_dentry *ep,
-   unsigned short *uniname)
-{
-   int i, len = 0;
-
-   for (i = 0; i < EXFAT_FILE_NAME_LEN; i++) {
-   *uniname = le16_to_cpu(ep->dentry.name.unicode_0_14[i]);
-   if (*uniname == 0x0)
-   return len;
-   uniname++;
-   len++;
-   }
-
-   *uniname = 0x0;
-   return len;
-
-}
-
 static int exfat_get_uniname_from_name_entries(struct exfat_entry_set_cache 
*es,
struct exfat_uni_name *uniname)
 {
@@ -869,13 +851,6 @@ struct exfat_entry_set_cache *exfat_get_dentry_set(struct 
super_block *sb,
return NULL;
 }
 
-enum {
-   DIRENT_STEP_FILE,
-   DIRENT_STEP_STRM,
-   DIRENT_STEP_NAME,
-   DIRENT_STEP_SECD,
-};
-
 /*
  * return values:
  *   >= 0  : return dir entiry position with the name in dir
@@ -885,13 +860,12 @@ enum {
  */
 int exfat_find_dir_entry(struct super_block *sb, struct exfat_inode_info *ei,
struct exfat_chain *p_dir, struct exfat_uni_name *p_uniname,
-   int num_entries, unsigned int type)
+   int num_entries)
 {
-   int i, rewind = 0, dentry = 0, end_eidx = 0, num_ext = 0, len;
-   int order, step, name_len = 0;
+   int i, rewind = 0, dentry = 0, end_eidx = 0, num_ext = 0;
+   int name_len = 0;
int dentries_per_clu, num_empty = 0;
unsigned int entry_type;
-   unsigned short *uniname = NULL;
struct exfat_chain clu;
struct exfat_hint *hint_stat = &ei->hint_stat;
struct exfat_hint_femp candi_empty;
@@ -909,27 +883,33 @@ int exfat_find_dir_entry(struct super_block *sb, struct 
exfat_inode_info *ei,
 
candi_empty.eidx = EXFAT_HINT_NONE;
 rewind:
-   order = 0;
-   step = DIRENT_STEP_FILE;
while (clu.dir != EXFAT_EOF_CLUSTER) {
i = dentry & (dentries_per_clu - 1);
for (; i < dentries_per_clu; i++, dentry++) {
struct exfat_dentry *ep;
struct buffer_head *bh;
+   struct exfat_entry_set_cache *es;
+   struct exfat_uni_name uni_name;
+   u16 name_hash;
 
if (rewind && dentry == end_eidx)
goto not_found;
 
+   /* skip secondary dir-entries in previous dir-entry set 
*/
+   if (num_ext) {
+   num_ext--;
+   continue;
+   }
+
ep = exfat_get_dentry(sb, &clu, i, &bh, NULL);
if (!ep)
return -EIO;
 
entry_type = exfat_get_entry_type(ep);
+   brelse(bh);
 
if (entry_type == TYPE_UNUSED ||
entry_type == TYPE_DELETED) {
-   step = DIRENT_STEP_FILE;
-
num_empty++;
if (candi_empty.eidx == EXFAT_HINT_NONE &&
num_empty == 1) {
@@ -954,7 +934,6 @@ int exfat_find_dir_entry(struct super_block *sb, struct 
exfat_inode_info *ei,
}
}
 
-   brelse(bh);
if (entry_type == TYPE_UNUSED)
goto not_found;
continue;
@@ -963,80 +942,38 @@ int exfat_find_dir_entry(struct super_block *sb, struct 
exfat_inode_info *ei,
num_empty = 0;
candi_empty.eidx = EXFAT_HINT_NONE;
 
-   if (entry_type == TYPE_FILE || entry_type == TYPE_DIR) {
-   step = DIRENT_STEP_FILE;
-   if (type == TYPE_ALL || type == entry_type) {
-   num_ext = ep-

RE: [PATCH v7] cpufreq: intel_pstate: Implement passive mode with HWP enabled

2020-08-05 Thread Doug Smythies
On 2020.08.05 09:56 Rafael J. Wysocki wrote:

> v6 -> v7:
>* Cosmetic changes in store_energy_performance_prefernce() to reduce the
>  LoC number and make it a bit easier to read.  No intentional functional
>  impact.

??
V7 is identical to V6.

Diff:

$ diff hwppassive-v6-2-2.patch hwppassive-v7-2-2.patch
2c2
< Sent: August 4, 2020 8:11 AM
---
> Sent: August 5, 2020 9:56 AM
5c5
< Subject: [PATCH v6] cpufreq: intel_pstate: Implement passive mode with HWP 
enabled
---
> Subject: [PATCH v7] cpufreq: intel_pstate: Implement passive mode with HWP 
> enabled
76a77,81
>
> v6 -> v7:
>* Cosmetic changes in store_energy_performance_prefernce() to reduce the
>  LoC number and make it a bit easier to read.  No intentional functional
>  impact.

... Doug




Re: [PATCH 4/4] vhost: vdpa: report iova range

2020-08-05 Thread Michael S. Tsirkin
On Thu, Aug 06, 2020 at 11:29:16AM +0800, Jason Wang wrote:
> 
> On 2020/8/5 下午8:58, Michael S. Tsirkin wrote:
> > On Wed, Jun 17, 2020 at 11:29:47AM +0800, Jason Wang wrote:
> > > This patch introduces a new ioctl for the vhost-vdpa device that can
> > > report the iova range supported by the device. For devices that depend
> > > on the platform IOMMU, we fetch the iova range via DOMAIN_ATTR_GEOMETRY.
> > > For devices that have their own DMA translation unit, we fetch it
> > > directly from the vDPA bus operation.
> > > 
> > > Signed-off-by: Jason Wang 
> > > ---
> > >   drivers/vhost/vdpa.c | 27 +++
> > >   include/uapi/linux/vhost.h   |  4 
> > >   include/uapi/linux/vhost_types.h |  5 +
> > >   3 files changed, 36 insertions(+)
> > > 
> > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > index 77a0c9fb6cc3..ad23e66cbf57 100644
> > > --- a/drivers/vhost/vdpa.c
> > > +++ b/drivers/vhost/vdpa.c
> > > @@ -332,6 +332,30 @@ static long vhost_vdpa_set_config_call(struct 
> > > vhost_vdpa *v, u32 __user *argp)
> > >   return 0;
> > >   }
> > > +
> > > +static long vhost_vdpa_get_iova_range(struct vhost_vdpa *v, u32 __user 
> > > *argp)
> > > +{
> > > + struct iommu_domain_geometry geo;
> > > + struct vdpa_device *vdpa = v->vdpa;
> > > + const struct vdpa_config_ops *ops = vdpa->config;
> > > + struct vhost_vdpa_iova_range range;
> > > + struct vdpa_iova_range vdpa_range;
> > > +
> > > + if (!ops->set_map && !ops->dma_map) {
> > Why not just check if (ops->get_iova_range) directly?
> 
> 
> Because set_map || dma_map is a hint that the device has its own DMA
> translation logic.
> 
> A device without get_iova_range does not necessarily mean it uses the IOMMU
> driver.
> 
> Thanks

OK let's add some code comments please, and check get_iova_range
is actually there before calling.

> 
> > 
> > 
> > 
> > 
> > > + iommu_domain_get_attr(v->domain,
> > > +   DOMAIN_ATTR_GEOMETRY, &geo);
> > > + range.start = geo.aperture_start;
> > > + range.end = geo.aperture_end;
> > > + } else {
> > > + vdpa_range = ops->get_iova_range(vdpa);
> > > + range.start = vdpa_range.start;
> > > + range.end = vdpa_range.end;
> > > + }
> > > +
> > > + return copy_to_user(argp, &range, sizeof(range));
> > > +
> > > +}
> > > +
> > >   static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int 
> > > cmd,
> > >  void __user *argp)
> > >   {
> > > @@ -442,6 +466,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file 
> > > *filep,
> > >   case VHOST_VDPA_SET_CONFIG_CALL:
> > >   r = vhost_vdpa_set_config_call(v, argp);
> > >   break;
> > > + case VHOST_VDPA_GET_IOVA_RANGE:
> > > + r = vhost_vdpa_get_iova_range(v, argp);
> > > + break;
> > >   default:
> > >   r = vhost_dev_ioctl(&v->vdev, cmd, argp);
> > >   if (r == -ENOIOCTLCMD)
> > > diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> > > index 0c2349612e77..850956980e27 100644
> > > --- a/include/uapi/linux/vhost.h
> > > +++ b/include/uapi/linux/vhost.h
> > > @@ -144,4 +144,8 @@
> > >   /* Set event fd for config interrupt*/
> > >   #define VHOST_VDPA_SET_CONFIG_CALL  _IOW(VHOST_VIRTIO, 0x77, int)
> > > +
> > > +/* Get the valid iova range */
> > > > +#define VHOST_VDPA_GET_IOVA_RANGE _IOW(VHOST_VIRTIO, 0x78, \
> > > +  struct vhost_vdpa_iova_range)
> > >   #endif
> > > diff --git a/include/uapi/linux/vhost_types.h 
> > > b/include/uapi/linux/vhost_types.h
> > > index 669457ce5c48..4025b5a36177 100644
> > > --- a/include/uapi/linux/vhost_types.h
> > > +++ b/include/uapi/linux/vhost_types.h
> > > @@ -127,6 +127,11 @@ struct vhost_vdpa_config {
> > >   __u8 buf[0];
> > >   };
> > > +struct vhost_vdpa_iova_range {
> > > + __u64 start;
> > > + __u64 end;
> > > +};
> > > +
> > 
> > Pls document fields. And I think first/last is a better API ...
> > 
> > >   /* Feature bits */
> > >   /* Log all write descriptors. Can be changed while device is active. */
> > >   #define VHOST_F_LOG_ALL 26
> > > -- 
> > > 2.20.1



RE: [PATCH] cpufreq: intel_pstate: Implement passive mode with HWP enabled

2020-08-05 Thread Doug Smythies
On 2020.08.03 10:09 Rafael J. Wysocki wrote:
> On Sunday, August 2, 2020 5:17:39 PM CEST Doug Smythies wrote:
> > On 2020.07.19 04:43 Rafael J. Wysocki wrote:
> > > On Fri, Jul 17, 2020 at 3:37 PM Doug Smythies  wrote:
> > > > On 2020.07.16 05:08 Rafael J. Wysocki wrote:
> > > > > On Wed, Jul 15, 2020 at 10:39 PM Doug Smythies  
> > > > > wrote:
> > > > >> On 2020.07.14 11:16 Rafael J. Wysocki wrote:
> > > > >> >
> > > > >> > From: Rafael J. Wysocki 
> > > > >> ...
> > > > >> > Since the passive mode hasn't worked with HWP at all, and it is 
> > > > >> > not going to
> > > > >> > the default for HWP systems anyway, I don't see any drawbacks 
> > > > >> > related to making
> > > > >> > this change, so I would consider this as 5.9 material unless there 
> > > > >> > are any
> > > > >> > serious objections.
> > > > >>
> > > > >> Good point.
> > > >
> > > > Actually, for those users that default to passive mode upon boot,
> > > > this would mean they would find themselves using this.
> > > > Also, it isn't obvious, from the typical "what driver and what governor"
> > > > inquiry.
> > >
> > > So the change in behavior is that after this patch
> > > intel_pstate=passive doesn't imply no_hwp any more.
> > >
> > > That's a very minor difference though and I'm not aware of any adverse
> > > effects it can cause on HWP systems anyway.
> >
> > My point was that it will now default to something for which
> > testing has not been completed.
> >
> > > The "what governor" is straightforward in the passive mode: that's
> > > whatever cpufreq governor has been selected.
> >
> > I think you might have missed my point.
> > From the normal methods of inquiry one does not know
> > if HWP is being used or not. Why? Because with
> > or without HWP one gets the same answers under:
> >
> > /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver
> > /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> 
> Yes, but this is also the case in the active mode, isn't it?

Yes, fair enough.
But we aren't changing what it means by default
between kernel 5.8 and 5.9-rc1.

... Doug




Re: [PATCH 1/4] vdpa: introduce config op to get valid iova range

2020-08-05 Thread Michael S. Tsirkin
On Thu, Aug 06, 2020 at 11:25:11AM +0800, Jason Wang wrote:
> 
> On 2020/8/5 下午8:51, Michael S. Tsirkin wrote:
> > On Wed, Jun 17, 2020 at 11:29:44AM +0800, Jason Wang wrote:
> > > This patch introduces a config op to get the valid iova range from the
> > > vDPA device.
> > > 
> > > Signed-off-by: Jason Wang
> > > ---
> > >   include/linux/vdpa.h | 14 ++
> > >   1 file changed, 14 insertions(+)
> > > 
> > > diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> > > index 239db794357c..b7633ed2500c 100644
> > > --- a/include/linux/vdpa.h
> > > +++ b/include/linux/vdpa.h
> > > @@ -41,6 +41,16 @@ struct vdpa_device {
> > >   unsigned int index;
> > >   };
> > > +/**
> > > + * vDPA IOVA range - the IOVA range support by the device
> > > + * @start: start of the IOVA range
> > > + * @end: end of the IOVA range
> > > + */
> > > +struct vdpa_iova_range {
> > > + u64 start;
> > > + u64 end;
> > > +};
> > > +
> > This is ambiguous. Is end in the range or just behind it?
> 
> 
> In the range.

OK I guess we can treat it as a bugfix and merge after rc1,
but pls add a bit more in the commit log about what's
currently broken.

> 
> > How about first/last?
> 
> 
> Sure.
> 
> Thanks
> 
> 
> > 
> > 
> > 



Re: [PATCH v2 19/24] vdpa: make sure set_features is invoked for legacy

2020-08-05 Thread Michael S. Tsirkin
On Thu, Aug 06, 2020 at 11:23:05AM +0800, Jason Wang wrote:
> 
> On 2020/8/5 下午7:40, Michael S. Tsirkin wrote:
> > On Wed, Aug 05, 2020 at 02:14:07PM +0800, Jason Wang wrote:
> > > On 2020/8/4 上午5:00, Michael S. Tsirkin wrote:
> > > > Some legacy guests just assume features are 0 after reset.
> > > > We detect that config space is accessed before features are
> > > > set and set features to 0 automatically.
> > > > Note: some legacy guests might not even access config space, if this is
> > > > reported in the field we might need to catch a kick to handle these.
> > > I wonder whether it's easier to just support modern device?
> > > 
> > > Thanks
> > Well hardware vendors are I think interested in supporting legacy
> > guests. Limiting vdpa to modern only would make it uncompetitive.
> 
> 
> My understanding is that IOMMU_PLATFORM is mandatory for hardware vDPA to
> work.

Hmm I don't really see why. Assume host maps guest memory properly,
VM does not have an IOMMU, legacy guest can just work.

Care explaining what's wrong with this picture?


> So it can only work for modern device ...
> 
> Thanks
> 
> 
> > 
> > 
> > 



Re: [PATCH] jbd2: fix incorrect code style

2020-08-05 Thread tytso
On Sat, Jul 18, 2020 at 08:57:37AM -0400, Xianting Tian wrote:
> Remove unnecessary blank.
> 
> Signed-off-by: Xianting Tian 

Thanks, applied.

- Ted


Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-08-05 Thread Hugh Dickins
On Mon, 27 Jul 2020, Greg KH wrote:
> 
> Linus just pointed me at this thread.
> 
> If you could run:
>   echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
> and run the same workload to see if anything shows up in the log when
> xhci crashes, that would be great.

Thanks, I tried that, and indeed it did have a story to tell:

ep 0x81 - asked for 16 bytes, 10 bytes untransferred
ep 0x81 - asked for 16 bytes, 10 bytes untransferred
ep 0x81 - asked for 16 bytes, 10 bytes untransferred
   a very large number of lines like the above, then
Cancel URB d81602f7, dev 4, ep 0x0, starting at offset 0xfffd42c0
// Ding dong!
ep 0x81 - asked for 16 bytes, 10 bytes untransferred
Stopped on No-op or Link TRB for slot 1 ep 0
xhci_drop_endpoint called for udev 5bc07fa6
drop ep 0x81, slot id 1, new drop flags = 0x8, new add flags = 0x0
add ep 0x81, slot id 1, new drop flags = 0x8, new add flags = 0x8
xhci_check_bandwidth called for udev 5bc07fa6
// Ding dong!
Successful Endpoint Configure command
Cancel URB 6b77d490, dev 4, ep 0x81, starting at offset 0x0
// Ding dong!
Stopped on No-op or Link TRB for slot 1 ep 2
Removing canceled TD starting at 0x0 (dma).
list_del corruption: prev(8fdb4de7a130)->next should be 8fdb41697f88,
   but is 6b6b6b6b6b6b6b6b; next(8fdb4de7a130)->prev is 6b6b6b6b6b6b6b6b.
[ cut here ]
kernel BUG at lib/list_debug.c:53!
RIP: 0010:__list_del_entry_valid+0x8e/0xb0
Call Trace:
 
 handle_cmd_completion+0x7d4/0x14f0 [xhci_hcd]
 xhci_irq+0x242/0x1ea0 [xhci_hcd]
 xhci_msi_irq+0x11/0x20 [xhci_hcd]
 __handle_irq_event_percpu+0x48/0x2c0
 handle_irq_event_percpu+0x32/0x80
 handle_irq_event+0x4a/0x80
 handle_edge_irq+0xd8/0x1b0
 handle_irq+0x2b/0x50
 do_IRQ+0xb6/0x1c0
 common_interrupt+0x90/0x90
 

Info provided for your interest, not expecting any response.
The list_del info in there is non-standard, from a patch of mine:
I find hashed addresses in debug output less than helpful.

> 
> Although if you are using an "older version" of the driver, there's not
> much I can suggest except update to a newer one :)

Yes, I was reluctant to post any info, since really the ball is at our
end of the court, not yours. I did have a go at bringing in the latest
xhci driver instead, but quickly saw that was not a sensible task for
me. And I did scan the git log of xhci changes (especially xhci-ring.c
changes): thought I saw a likely relevant and easily applied fix commit,
but in fact it made no difference here.

I suspect it's in part a hardware problem, but driver not recovering
correctly. I've replaced the machine (but also noticed that the same
crash has occasionally been seen on other machines). I'm sure it has
no relevance to this unlock_page() thread, though it's quite possible
that it's triggered under stress, and Linus's changes allowed greater
stress.

Hugh


Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-08-05 Thread Christophe Leroy

Hi,

On 08/05/2020 06:40 PM, Segher Boessenkool wrote:

Hi!

On Wed, Aug 05, 2020 at 04:40:16PM +, Christophe Leroy wrote:

It cannot optimise it because it does not know shift < 32.  The code
below is incorrect for shift equal to 32, fwiw.


Is there a way to tell it?


Sure, for example the &31 should work (but it doesn't, with the GCC
version you used -- which version is that?)


GCC 10.1




What does the compiler do for just

static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
{
	return ns >> (shift & 31);
}



Worse:


I cannot make heads or tails of all that branch spaghetti, sorry.


  73c:  55 8c 06 fe clrlwi  r12,r12,27
  740:  7f c8 f0 14 addcr30,r8,r30
  744:  7c c6 4a 14 add r6,r6,r9
  748:  7c c6 e1 14 adder6,r6,r28
  74c:  34 6c ff e0 addic.  r3,r12,-32
  750:  41 80 00 70 blt 7c0 <__c_kernel_clock_gettime+0x114>


This branch is always true.  Hrm.


As a standalone function:

With your suggestion:

06ac :
 6ac:   54 a5 06 fe clrlwi  r5,r5,27
 6b0:   35 25 ff e0 addic.  r9,r5,-32
 6b4:   41 80 00 10 blt 6c4 
 6b8:   7c 64 4c 30 srw r4,r3,r9
 6bc:   38 60 00 00 li  r3,0
 6c0:   4e 80 00 20 blr
 6c4:   54 69 08 3c rlwinm  r9,r3,1,0,30
 6c8:   21 45 00 1f subfic  r10,r5,31
 6cc:   7c 84 2c 30 srw r4,r4,r5
 6d0:   7d 29 50 30 slw r9,r9,r10
 6d4:   7c 63 2c 30 srw r3,r3,r5
 6d8:   7d 24 23 78 or  r4,r9,r4
 6dc:   4e 80 00 20 blr


With the version as is in my series:

06ac :
 6ac:   21 25 00 20 subfic  r9,r5,32
 6b0:   7c 69 48 30 slw r9,r3,r9
 6b4:   7c 84 2c 30 srw r4,r4,r5
 6b8:   7d 24 23 78 or  r4,r9,r4
 6bc:   7c 63 2c 30 srw r3,r3,r5
 6c0:   4e 80 00 20 blr


Christophe


Re: [PATCH 2/2] Add a new sysctl knob: unprivileged_userfaultfd_user_mode_only

2020-08-05 Thread Michael S. Tsirkin
On Wed, Aug 05, 2020 at 05:43:02PM -0700, Nick Kralevich wrote:
> On Fri, Jul 24, 2020 at 6:40 AM Michael S. Tsirkin  wrote:
> >
> > On Thu, Jul 23, 2020 at 05:13:28PM -0700, Nick Kralevich wrote:
> > > On Thu, Jul 23, 2020 at 10:30 AM Lokesh Gidra  
> > > wrote:
> > > > From the discussion so far it seems that there is a consensus that
> > > > patch 1/2 in this series should be upstreamed in any case. Is there
> > > > anything that is pending on that patch?
> > >
> > > That's my reading of this thread too.
> > >
> > > > > > Unless I'm mistaken that you can already enforce bit 1 of the second
> > > > > > parameter of the userfaultfd syscall to be set with seccomp-bpf, 
> > > > > > this
> > > > > > would be more a question to the Android userland team.
> > > > > >
> > > > > > The question would be: does it ever happen that a seccomp filter 
> > > > > > isn't
> > > > > > already applied to unprivileged software running without
> > > > > > SYS_CAP_PTRACE capability?
> > > > >
> > > > > Yes.
> > > > >
> > > > > Android uses selinux as our primary sandboxing mechanism. We do use
> > > > > seccomp on a few processes, but we have found that it has a
> > > > > surprisingly high performance cost [1] on arm64 devices so turning it
> > > > > on system wide is not a good option.
> > > > >
> > > > > [1] 
> > > > > https://lore.kernel.org/linux-security-module/20200606.3F7109A@keescook/T/#m82ace19539ac595682affabdf652c0ffa5d27dad
> > >
> > > As Jeff mentioned, seccomp is used strategically on Android, but is
> > > not applied to all processes. It's too expensive and impractical when
> > > simpler implementations (such as this sysctl) can exist. It's also
> > > significantly simpler to test a sysctl value for correctness as
> > > opposed to a seccomp filter.
> >
> > Given that selinux is already used system-wide on Android, what is wrong
> > with using selinux to control userfaultfd as opposed to seccomp?
> 
> Userfaultfd file descriptors will be generally controlled by SELinux.
> You can see the patchset at
> https://lore.kernel.org/lkml/20200401213903.182112-3-dan...@google.com/
> (which is also referenced in the original commit message for this
> patchset). However, the SELinux patchset doesn't include the ability
> to control FAULT_FLAG_USER / UFFD_USER_MODE_ONLY directly.
> 
> SELinux already has the ability to control who gets CAP_SYS_PTRACE,
> which combined with this patch, is largely equivalent to direct
> UFFD_USER_MODE_ONLY checks. Additionally, with the SELinux patch
> above, movement of userfaultfd file descriptors can be mediated by
> SELinux, preventing one process from acquiring userfaultfd descriptors
> of other processes unless allowed by security policy.
> 
> It's an interesting question whether finer-grain SELinux support for
> controlling UFFD_USER_MODE_ONLY should be added. I can see some
> advantages to implementing this. However, we don't need to decide that
> now.
>
> Kernel security checks generally break down into DAC (discretionary
> access control) and MAC (mandatory access control) controls. Most
> kernel security features check via both of these mechanisms. Security
> attributes of the system should be settable without necessarily
> relying on an LSM such as SELinux. This patch follows the same basic
> model -- system wide control of a hardening feature is provided by the
> unprivileged_userfaultfd_user_mode_only sysctl (DAC), and if needed,
> SELinux support for this can also be implemented on top of the DAC
> controls.
> 
> This DAC/MAC split has been successful in several other security
> features. For example, the ability to map at page zero is controlled
> in DAC via the mmap_min_addr sysctl [1], and via SELinux via the
> mmap_zero access vector [2]. Similarly, access to the kernel ring
> buffer is controlled both via DAC as the dmesg_restrict sysctl [3], as
> well as the SELinux syslog_read [2] check. Indeed, the dmesg_restrict
> sysctl is very similar to this patch -- it introduces a capability
> (CAP_SYSLOG, CAP_SYS_PTRACE) check on access to a sensitive resource.
> 
> If we want to ensure that a security feature will be well tested and
> vetted, it's important to not limit its use to LSMs only. This ensures
> that kernel and application developers will always be able to test the
> effects of a security feature, without relying on LSMs like SELinux.
> It also ensures that all distributions can enable this security
> mitigation should it be necessary for their unique environments,
> without introducing an SELinux dependency. And this patch does not
> preclude an SELinux implementation should it be necessary.
> 
> Even if we decide to implement fine-grain SELinux controls on
> UFFD_USER_MODE_ONLY, we still need this patch. We shouldn't make this
> an either/or choice between SELinux and this patch. Both are
> necessary.
> 
> -- Nick
> 
> [1] https://wiki.debian.org/mmap_min_addr
> [2] https://selinuxproject.org/page/NB_ObjectClassesPermissions
> [3] https://www.kernel.org/doc/Docu

[PATCH] tty: synclink_gt: switch from 'pci_' to 'dma_' API

2020-08-05 Thread Christophe JAILLET
The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.

When memory is allocated in 'alloc_desc()' and 'alloc_bufs()', GFP_KERNEL
can be used because it is only called from a probe function and no lock is
acquired.
The call chain is:
   init_one(the probe function)
  --> device_init
 --> alloc_dma_bufs
--> alloc_desc
--> alloc_bufs

@@
@@
-PCI_DMA_BIDIRECTIONAL
+DMA_BIDIRECTIONAL

@@
@@
-PCI_DMA_TODEVICE
+DMA_TO_DEVICE

@@
@@
-PCI_DMA_FROMDEVICE
+DMA_FROM_DEVICE

@@
@@
-PCI_DMA_NONE
+DMA_NONE

@@
expression e1, e2, e3;
@@
-pci_alloc_consistent(e1, e2, e3)
+dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3;
@@
-pci_zalloc_consistent(e1, e2, e3)
+dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3, e4;
@@
-pci_free_consistent(e1, e2, e3, e4)
+dma_free_coherent(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-pci_map_single(e1, e2, e3, e4)
+dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-pci_unmap_single(e1, e2, e3, e4)
+dma_unmap_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4, e5;
@@
-pci_map_page(e1, e2, e3, e4, e5)
+dma_map_page(&e1->dev, e2, e3, e4, e5)

@@
expression e1, e2, e3, e4;
@@
-pci_unmap_page(e1, e2, e3, e4)
+dma_unmap_page(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-pci_map_sg(e1, e2, e3, e4)
+dma_map_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-pci_unmap_sg(e1, e2, e3, e4)
+dma_unmap_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-pci_dma_sync_single_for_device(e1, e2, e3, e4)
+dma_sync_single_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+dma_sync_sg_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2;
@@
-pci_dma_mapping_error(e1, e2)
+dma_mapping_error(&e1->dev, e2)

@@
expression e1, e2;
@@
-pci_set_dma_mask(e1, e2)
+dma_set_mask(&e1->dev, e2)

@@
expression e1, e2;
@@
-pci_set_consistent_dma_mask(e1, e2)
+dma_set_coherent_mask(&e1->dev, e2)

Signed-off-by: Christophe JAILLET 
---
If needed, see post from Christoph Hellwig on the kernel-janitors ML:
   https://marc.info/?l=kernel-janitors&m=158745678307186&w=4
---
 drivers/tty/synclink_gt.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/tty/synclink_gt.c b/drivers/tty/synclink_gt.c
index b794177ccfb9..1edf06653148 100644
--- a/drivers/tty/synclink_gt.c
+++ b/drivers/tty/synclink_gt.c
@@ -3341,8 +3341,8 @@ static int alloc_desc(struct slgt_info *info)
unsigned int pbufs;
 
/* allocate memory to hold descriptor lists */
-   info->bufs = pci_zalloc_consistent(info->pdev, DESC_LIST_SIZE,
-  &info->bufs_dma_addr);
+   info->bufs = dma_alloc_coherent(&info->pdev->dev, DESC_LIST_SIZE,
+   &info->bufs_dma_addr, GFP_KERNEL);
if (info->bufs == NULL)
return -ENOMEM;
 
@@ -3384,7 +3384,8 @@ static int alloc_desc(struct slgt_info *info)
 static void free_desc(struct slgt_info *info)
 {
if (info->bufs != NULL) {
-   pci_free_consistent(info->pdev, DESC_LIST_SIZE, info->bufs, info->bufs_dma_addr);
+   dma_free_coherent(&info->pdev->dev, DESC_LIST_SIZE,
+ info->bufs, info->bufs_dma_addr);
info->bufs  = NULL;
info->rbufs = NULL;
info->tbufs = NULL;
@@ -3395,7 +3396,9 @@ static int alloc_bufs(struct slgt_info *info, struct slgt_desc *bufs, int count)
 {
int i;
for (i=0; i < count; i++) {
-   if ((bufs[i].buf = pci_alloc_consistent(info->pdev, DMABUFSIZE, &bufs[i].buf_dma_addr)) == NULL)
+   bufs[i].buf = dma_alloc_coherent(&info->pdev->dev, DMABUFSIZE,
+&bufs[i].buf_dma_addr, GFP_KERNEL);
+   if (!bufs[i].buf)
return -ENOMEM;
bufs[i].pbuf  = cpu_to_le32((unsigned int)bufs[i].buf_dma_addr);
}
@@ -3408,7 +3411,8 @@ static void free_bufs(struct slgt_info *info, struct slgt_desc *bufs, int count)
for (i=0; i < count; i++) {
if (bufs[i].buf == NULL)
continue;
-   pci_free_consistent(info->pdev, DMABUFSIZE, bufs[i].buf, bufs[i].buf_dma_addr);
+   dma_free_coherent(&info->

Re: [PATCH 06/18] fsinfo: Add a uniquifier ID to struct mount [ver #21]

2020-08-05 Thread Ian Kent
On Wed, 2020-08-05 at 20:33 +0100, Matthew Wilcox wrote:
> On Wed, Aug 05, 2020 at 04:30:10PM +0100, David Howells wrote:
> > Miklos Szeredi  wrote:
> > 
> > > idr_alloc_cyclic() seems to be a good template for doing the
> > > lower
> > > 32bit allocation, and we can add code to increment the high 32bit
> > > on
> > > wraparound.
> > > 
> > > Lots of code uses idr_alloc_cyclic() so I guess it shouldn't be
> > > too
> > > bad in terms of memory use or performance.
> > 
> > It's optimised for shortness of path and trades memory for
> > performance.  It's
> > currently implemented using an xarray, so memory usage is dependent
> > on the
> > sparseness of the tree.  Each node in the tree is 576 bytes and in
> > the worst
> > case, each one node will contain one mount - and then you have to
> > backfill the
> > ancestry, though for lower memory costs.
> > 
> > Systemd makes life more interesting since it sets up a whole load
> > of
> > propagations.  Each mount you make may cause several others to be
> > created, but
> > that would likely make the tree more efficient.
> 
> I would recommend using xa_alloc and ignoring the ID assigned from
> xa_alloc.  Looking up by unique ID is then a matter of iterating
> every
> mount (xa_for_each()) looking for a matching unique ID in the mount
> struct.  That's O(n) search, but it's faster than a linked list, and
> we
> don't have that many mounts in a system.

How many is not many? 5000, 10000? I agree that 30000 plus is fairly
rare, even for the autofs direct mount case I hope the implementation
here will help to fix.

Ian



Re: [PATCH V2] venus: core: add shutdown callback for venus

2020-08-05 Thread kernel test robot
Hi Mansur,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linuxtv-media/master]
[also build test WARNING on v5.8 next-20200805]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Mansur-Alisha-Shaik/venus-core-add-shutdown-callback-for-venus/20200806-114716
base:   git://linuxtv.org/media_tree.git master
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=sh 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   In file included from include/linux/device.h:15,
from include/linux/node.h:18,
from include/linux/cpu.h:17,
from include/linux/of_device.h:5,
from drivers/media/platform/qcom/venus/core.c:11:
   drivers/media/platform/qcom/venus/core.c: In function 'venus_core_shutdown':
>> drivers/media/platform/qcom/venus/core.c:351:23: warning: too many arguments for format [-Wformat-extra-args]
 351 |   dev_warn(core->dev, "shutdown failed \n", ret);
 |   ^~~~
   include/linux/dev_printk.h:19:22: note: in definition of macro 'dev_fmt'
  19 | #define dev_fmt(fmt) fmt
 |  ^~~
   drivers/media/platform/qcom/venus/core.c:351:3: note: in expansion of macro 'dev_warn'
 351 |   dev_warn(core->dev, "shutdown failed \n", ret);
 |   ^~~~

vim +351 drivers/media/platform/qcom/venus/core.c

   343  
   344  static void venus_core_shutdown(struct platform_device *pdev)
   345  {
   346  struct venus_core *core = platform_get_drvdata(pdev);
   347  int ret;
   348  
   349  ret = venus_remove(pdev);
   350  if(ret)
 > 351  dev_warn(core->dev, "shutdown failed \n", ret);
   352  }
   353  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




RE: [PATCH] kprobes: fix compiler warning for !CONFIG_KPROBES_ON_FTRACE

2020-08-05 Thread John Fastabend
Muchun Song wrote:
> Fix compiler warning(as show below) for !CONFIG_KPROBES_ON_FTRACE.
> 
> kernel/kprobes.c: In function 'kill_kprobe':
> kernel/kprobes.c:1116:33: warning: statement with no effect
> [-Wunused-value]
>  1116 | #define disarm_kprobe_ftrace(p) (-ENODEV)
>   | ^
> kernel/kprobes.c:2154:3: note: in expansion of macro
> 'disarm_kprobe_ftrace'
>  2154 |   disarm_kprobe_ftrace(p);
> 
> Link: https://lore.kernel.org/r/20200805142136.0331f...@canb.auug.org.au
> 
> Reported-by: Stephen Rothwell 
> Fixes: 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at 
> kprobe_ftrace_handler")
> Signed-off-by: Muchun Song 
> ---

Acked-by: John Fastabend 


Re: ext4: fix spelling typos in ext4_mb_initialize_context

2020-08-05 Thread tytso
On Wed, Jul 15, 2020 at 11:00:44AM +0800, brookxu wrote:
> Fix spelling typos in ext4_mb_initialize_context.
> 
> Signed-off-by: Chunguang Xu 

Thanks, applied.

- Ted


Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask

2020-08-05 Thread Michael Ellerman
pet...@infradead.org writes:
> On Tue, Aug 04, 2020 at 05:40:07PM +0530, Srikar Dronamraju wrote:
>> * pet...@infradead.org  [2020-08-04 12:45:20]:
>> 
>> > On Tue, Aug 04, 2020 at 09:03:06AM +0530, Srikar Dronamraju wrote:
>> > > cpu_smt_mask tracks topology_sibling_cpumask. This would be good for
>> > > most architectures. One of the users of cpu_smt_mask(), would be to
>> > > identify idle-cores. On Power9, a pair of cores can be presented by the
>> > > firmware as a big-core for backward compatibility reasons.
>> > > 
>> > > In order to maintain userspace backward compatibility with previous
>> > > versions of processor, (since Power8 had SMT8 cores), Power9 onwards 
>> > > there
>> > > is option to the firmware to advertise a pair of SMT4 cores as a fused
>> > > cores (referred to as the big_core mode in the Linux Kernel). On Power9
>> > > this pair shares the L2 cache as well. However, from the scheduler's 
>> > > point
>> > > of view, a core should be determined by SMT4. The load-balancer already
>> > > does this. Hence allow PowerPc architecture to override the default
>> > > cpu_smt_mask() to point to the SMT4 cores in a big_core mode.
>> > 
>> > I'm utterly confused.
>> > 
>> > Why can't you set your regular siblings mask to the smt4 thing? Who
>> > cares about the compat stuff, I thought that was an LPAR/AIX thing.

To be clear this stuff is all for when we're running on the PowerVM machines,
ie. as LPARs.

That brings with it a bunch of problems, such as existing software that
has been developed/configured for Power8 and expects to see SMT8.

We also allow LPARs to be live migrated from Power8 to Power9 (and back), so
maintaining the illusion of SMT8 is considered a requirement to make that work.

>> There are no technical challenges to set the sibling mask to SMT4.
>> This is for Linux running on PowerVM. When these Power9 boxes are sold /
>> marketed as X core boxes (where X stand for SMT8 cores).  Since on PowerVM
>> world, everything is in SMT8 mode, the device tree properties still mark
>> the system to be running on 8 thread core. There are a number of utilities
>> like ppc64_cpu that directly read from device-tree. They would get core
>> count and thread count which is SMT8 based.
>> 
>> If the sibling_mask is set to small core, then same user when looking at
>
> FWIW, I find the small/big core naming utterly confusing vs the
> big/little naming from ARM. When you say small, I'm thinking of
> asymmetric cores, not SMT4/SMT8.

Yeah I agree the naming is confusing.

Let's call them "SMT4 cores" and "SMT8 cores"?

>> output from lscpu and other utilities that look at sysfs will start seeing
>> 2x number of cores to what he had provisioned and what the utilities from
>> the device-tree show. This can gets users confused.
>
> One will report SMT8 and the other SMT4, right? So only users that
> cannot read will be confused, but if you can't read, confusion is
> guaranteed anyway.

It's partly users, but also software that would see different values depending
on where it looks.

> Also, by exposing the true (SMT4) topology to userspace, userspace
> applications could behave better -- for those few that actually parse
> the topology information.

Agreed, though as you say there aren't that many that actually use the low-level
topology information.

>> So to keep the device-tree properties, utilities depending on device-tree,
>> sysfs and utilities depending on sysfs on the same page, userspace are only
>> exposed as SMT8.
>
> I'm not convinced it makes sense to lie to userspace just to accomodate
> a few users that cannot read.

The problem is we are already lying to userspace, because firmware lies to us.

ie. the firmware on these systems shows us an SMT8 core, and so current kernels
show SMT8 to userspace. I don't think we can realistically change that fact now,
as these systems are already out in the field.

What this patch tries to do is undo some of the mess, and at least give the
scheduler the right information.

cheers


Re: [PATCH v5 4/4] clk: qcom: lpass: Add support for LPASS clock controller for SC7180

2020-08-05 Thread Taniya Das

Hi Stephen,

On 8/6/2020 1:54 AM, Stephen Boyd wrote:

Quoting Taniya Das (2020-07-24 09:07:58)

+
+static struct clk_rcg2 core_clk_src = {
+   .cmd_rcgr = 0x1d000,
+   .mnd_width = 8,
+   .hid_width = 5,
+   .parent_map = lpass_core_cc_parent_map_2,
+   .clkr.hw.init = &(struct clk_init_data){
+   .name = "core_clk_src",


Any chance this can get a better name? Something with LPASS prefix?



These are the exact clock names from the hardware plan.


+   .parent_data = &(const struct clk_parent_data){
+   .fw_name = "bi_tcxo",
+   },
+   .num_parents = 1,
+   .ops = &clk_rcg2_ops,
+   },
+};
+

[...]

+
+static struct clk_branch lpass_audio_core_sysnoc_mport_core_clk = {
+   .halt_reg = 0x23000,
+   .halt_check = BRANCH_HALT,
+   .hwcg_reg = 0x23000,
+   .hwcg_bit = 1,
+   .clkr = {
+   .enable_reg = 0x23000,
+   .enable_mask = BIT(0),
+   .hw.init = &(struct clk_init_data){
+   .name = "lpass_audio_core_sysnoc_mport_core_clk",
+   .parent_data = &(const struct clk_parent_data){
+   .hw = &core_clk_src.clkr.hw,
+   },
+   .num_parents = 1,
+   .flags = CLK_SET_RATE_PARENT,
+   .ops = &clk_branch2_ops,
+   },
+   },
+};
+
+static struct clk_regmap *lpass_core_cc_sc7180_clocks[] = {
+   [EXT_MCLK0_CLK_SRC] = &ext_mclk0_clk_src.clkr,
+   [LPAIF_PRI_CLK_SRC] = &lpaif_pri_clk_src.clkr,
+   [LPAIF_SEC_CLK_SRC] = &lpaif_sec_clk_src.clkr,
+   [CORE_CLK_SRC] = &core_clk_src.clkr,


And all of these, can they have LPASS_ prefix on the defines? Seems
like we're missing a namespace otherwise.



These are generated as they are in the HW plan. Do you still think I 
should update them?



+   [LPASS_AUDIO_CORE_EXT_MCLK0_CLK] = &lpass_audio_core_ext_mclk0_clk.clkr,
+   [LPASS_AUDIO_CORE_LPAIF_PRI_IBIT_CLK] =
+   &lpass_audio_core_lpaif_pri_ibit_clk.clkr,
+   [LPASS_AUDIO_CORE_LPAIF_SEC_IBIT_CLK] =
+   &lpass_audio_core_lpaif_sec_ibit_clk.clkr,
+   [LPASS_AUDIO_CORE_SYSNOC_MPORT_CORE_CLK] =
+   &lpass_audio_core_sysnoc_mport_core_clk.clkr,
+   [LPASS_LPAAUDIO_DIG_PLL] = &lpass_lpaaudio_dig_pll.clkr,
+   [LPASS_LPAAUDIO_DIG_PLL_OUT_ODD] = &lpass_lpaaudio_dig_pll_out_odd.clkr,
+};
+


--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation.

--


Re: [PATCH v6 15/18] nitro_enclaves: Add Makefile for the Nitro Enclaves driver

2020-08-05 Thread Paraschiv, Andra-Irina




On 05/08/2020 17:23, kernel test robot wrote:


Hi Andra,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linux/master]
[also build test ERROR on linus/master v5.8 next-20200805]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Andra-Paraschiv/Add-support-for-Nitro-Enclaves/20200805-171942
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
bcf876870b95592b52519ed4aafcf9d95999bc9c
config: arm64-allyesconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
 wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
 chmod +x ~/bin/make.cross
 # save the attached .config to linux build tree
 COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=arm64


Removed, for now, the dependency on ARM64 arch. x86 is currently 
supported, with Arm to come afterwards.


Thanks,
Andra



If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

drivers/virt/nitro_enclaves/ne_misc_dev.c: In function 'ne_setup_cpu_pool':

drivers/virt/nitro_enclaves/ne_misc_dev.c:245:46: error: 'smp_num_siblings' undeclared (first use in this function); did you mean 'cpu_sibling'?

  245 |  ne_cpu_pool.avail_cores_size = nr_cpu_ids / smp_num_siblings;
  |  ^~~~
  |  cpu_sibling
drivers/virt/nitro_enclaves/ne_misc_dev.c:245:46: note: each undeclared identifier is reported only once for each function it appears in
drivers/virt/nitro_enclaves/ne_misc_dev.c: In function 'ne_enclave_ioctl':
drivers/virt/nitro_enclaves/ne_misc_dev.c:928:54: error: 'smp_num_siblings' undeclared (first use in this function)
  928 |   if (vcpu_id >= (ne_enclave->avail_cpu_cores_size * smp_num_siblings)) {
  |   ^~~~

vim +245 drivers/virt/nitro_enclaves/ne_misc_dev.c

7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  130
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  131  /**
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  132   * ne_setup_cpu_pool() - Set the NE CPU pool after handling sanity checks such
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  133   *  as not sharing CPU cores with the primary / parent VM
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  134   *  or not using CPU 0, which should remain available for
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  135   *  the primary / parent VM. Offline the CPUs from the
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  136   *  pool after the checks passed.
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  137   * @ne_cpu_list:   The CPU list used for setting NE CPU pool.
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  138   *
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  139   * Context: Process context.
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  140   * Return:
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  141   * * 0 on success.
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  142   * * Negative return value on failure.
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  143   */
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  144  static int ne_setup_cpu_pool(const char *ne_cpu_list)
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  145  {
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  146 	int core_id = -1;
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  147 	unsigned int cpu = 0;
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  148 	cpumask_var_t cpu_pool = NULL;
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  149 	unsigned int cpu_sibling = 0;
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  150 	unsigned int i = 0;
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  151 	int numa_node = -1;
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  152 	int rc = -EINVAL;
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  153
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  154 	if (!ne_cpu_list)
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  155 		return 0;
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  156
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  157 	if (!zalloc_cpumask_var(&cpu_pool, GFP_KERNEL))
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  158 		return -ENOMEM;
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  159
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  160 	mutex_lock(&ne_cpu_pool.mutex);
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  161
7d5c9a7dfa51e60 Andra Paraschiv 2020-08-05  162

[PATCH] dt-bindings: sound: Convert NXP spdif to json-schema

2020-08-05 Thread Anson Huang
Convert the NXP SPDIF binding to DT schema format using json-schema.

Signed-off-by: Anson Huang 
---
 .../devicetree/bindings/sound/fsl,spdif.txt|  68 -
 .../devicetree/bindings/sound/fsl,spdif.yaml   | 108 +
 2 files changed, 108 insertions(+), 68 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/sound/fsl,spdif.txt
 create mode 100644 Documentation/devicetree/bindings/sound/fsl,spdif.yaml

diff --git a/Documentation/devicetree/bindings/sound/fsl,spdif.txt b/Documentation/devicetree/bindings/sound/fsl,spdif.txt
deleted file mode 100644
index e1365b0..000
--- a/Documentation/devicetree/bindings/sound/fsl,spdif.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-Freescale Sony/Philips Digital Interface Format (S/PDIF) Controller
-
-The Freescale S/PDIF audio block is a stereo transceiver that allows the
-processor to receive and transmit digital audio via an coaxial cable or
-a fibre cable.
-
-Required properties:
-
-  - compatible : Compatible list, should contain one of the following
- compatibles:
- "fsl,imx35-spdif",
- "fsl,vf610-spdif",
- "fsl,imx6sx-spdif",
-
-  - reg: Offset and length of the register set for the 
device.
-
-  - interrupts : Contains the spdif interrupt.
-
-  - dmas   : Generic dma devicetree binding as described in
- Documentation/devicetree/bindings/dma/dma.txt.
-
-  - dma-names  : Two dmas have to be defined, "tx" and "rx".
-
-  - clocks : Contains an entry for each entry in clock-names.
-
-  - clock-names: Includes the following entries:
-   "core"The core clock of spdif controller.
-   "rxtx<0-7>"   Clock source list for tx and rx clock.
- This clock list should be identical to the source
- list connecting to the spdif clock mux in "SPDIF
- Transceiver Clock Diagram" of SoC reference manual.
- It can also be referred to TxClk_Source bit of
- register SPDIF_STC.
-   "spba"The spba clock is required when SPDIF is placed as a
- bus slave of the Shared Peripheral Bus and when two
- or more bus masters (CPU, DMA or DSP) try to access
- it. This property is optional depending on the SoC
- design.
-
-Optional properties:
-
-   - big-endian: If this property is absent, the native endian 
mode
- will be in use as default, or the big endian mode
- will be in use for all the device registers.
-
-Example:
-
-spdif: spdif@2004000 {
-   compatible = "fsl,imx35-spdif";
-   reg = <0x02004000 0x4000>;
-   interrupts = <0 52 0x04>;
-   dmas = <&sdma 14 18 0>,
-  <&sdma 15 18 0>;
-   dma-names = "rx", "tx";
-
-   clocks = <&clks 197>, <&clks 3>,
-  <&clks 197>, <&clks 107>,
-  <&clks 0>, <&clks 118>,
-  <&clks 62>, <&clks 139>,
-  <&clks 0>;
-   clock-names = "core", "rxtx0",
-   "rxtx1", "rxtx2",
-   "rxtx3", "rxtx4",
-   "rxtx5", "rxtx6",
-   "rxtx7";
-
-   big-endian;
-};
diff --git a/Documentation/devicetree/bindings/sound/fsl,spdif.yaml b/Documentation/devicetree/bindings/sound/fsl,spdif.yaml
new file mode 100644
index 000..819f37f
--- /dev/null
+++ b/Documentation/devicetree/bindings/sound/fsl,spdif.yaml
@@ -0,0 +1,108 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/sound/fsl,spdif.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Freescale Sony/Philips Digital Interface Format (S/PDIF) Controller
+
+maintainers:
+  - Shengjiu Wang 
+
+description: |
+  The Freescale S/PDIF audio block is a stereo transceiver that allows the
+  processor to receive and transmit digital audio via a coaxial cable or
+  a fibre cable.
+
+properties:
+  compatible:
+enum:
+  - fsl,imx35-spdif
+  - fsl,vf610-spdif
+  - fsl,imx6sx-spdif
+
+  reg:
+maxItems: 1
+
+  interrupts:
+maxItems: 1
+
+  dmas:
+items:
+  - description: DMA controller phandle and request line for RX
+  - description: DMA controller phandle and request line for TX
+
+  dma-names:
+items:
+  - const: rx
+  - const: tx
+
+  clocks:
+items:
+  - description: The core clock of spdif controller.
+  - description: Clock for tx0 and rx0.
+  - description: Clock for tx1 and rx1.
+  - description: Clock for tx2 and rx2.
+  - description: Clock for tx3 and rx3.
+  - description: Clock for tx4 and rx4.
+  - description: Clock for tx5 and rx5.
+  - description: Clock for tx6 

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-08-05 Thread Hugh Dickins
Nice to see the +130.0% this morning.

I got back on to this on Monday, here's some follow-up.

On Sun, 26 Jul 2020, Hugh Dickins wrote:
> 
> The comparison runs have not yet completed (except for the one started
> early), but they have all got past the most interesting tests, and it's
> clear that they do not have the "failure"s seen with your patches.
> 
> From that I can only conclude that your patches make a difference.
> 
> I've deduced nothing useful from the logs, will have to leave that
> to others here with more experience of them.  But my assumption now
> is that you have successfully removed one bottleneck, so the tests
> get somewhat further and now stick in the next bottleneck, whatever
> that may be.  Which shows up as "failure", where the unlock_page()
> wake_up_page_bit() bottleneck had allowed the tests to proceed in
> a more serially sedate way.

Yes, that's still how it appears to me. The test failures, all
of them, came from fork() returning ENOSPC, which originated from
alloc_pid()'s idr_alloc_cyclic(). I did try doubling our already
large pid_max, that did not work out well, there are probably good
reasons for it to be where it is and I was wrong to dabble with it.
I also tried an rcu_barrier() and retry when getting -ENOSPC there,
thinking maybe RCU was not freeing them up fast enough, but that
didn't help either.

I think (but didn't quite make the effort to double-check with
an independent count) it was simply running out of pids: that your
change speeds up the forking enough, that exiting could not quite keep
up (SIGCHLD is SIG_IGNed); whereas before your change, the unlock_page()
in do_wp_page(), on a PageAnon stack page, slowed the forking down enough
when heavily contended.

(I think we could improve the checks there, to avoid taking page lock in
more cases; but I don't know if that would help any real-life workload -
I see now that Michal's case is do_read_fault() not do_wp_page().)

And FWIW a further speedup there is the opposite of what these tests
are wanting: for the moment I've enabled a delay to get them passing
as before.

Something I was interested to realize in looking at this: trylock_page()
on a contended lock is now much less likely to jump the queue and
succeed than before, since your lock holder hands off the page lock to
the next holder: much smaller window than waiting for the next to wake
to take it. Nothing wrong with that, but effect might be seen somewhere.

> 
> The xhci handle_cmd_completion list_del bugs (on an older version
> of the driver): weird, nothing to do with page wakeups, I'll just
> have to assume that it's some driver bug exposed by the greater
> stress allowed down, and let driver people investigate (if it
> still manifests) when we take in your improvements.

Complete red herring. I'll give Greg more info in response to his
mail, and there may be an xhci bug in there; but when I looked back,
found I'd come across the same bug back in October, and find that
occasionally it's been seen in our fleet. Yes, it's odd that your
change coincided with it becoming more common on that machine
(which I've since replaced by another), yes it's funny that it's
in __list_del_entry_valid(), which is exactly where I got crashes
on pages with your initial patch; but it's just a distraction.

> 
> One nice thing from the comparison runs without your patches:
> watchdog panic did crash one of those with exactly the unlock_page()
> wake_up_page_bit() softlockup symptom we've been fighting, that did
> not appear with your patches.  So although the sample size is much
> too small to justify a conclusion, it does tend towards confirming
> your changes.
> 
> Thank you for your work on this! And I'm sure you'd have preferred
> some hard data back, rather than a diary of my mood swings, but...
> we do what we can.
> 
> Hugh


Re: [PATCH v2 2/2] dma-pool: Only allocate from CMA when in same memory zone

2020-08-05 Thread Christoph Hellwig
On Tue, Aug 04, 2020 at 11:43:15AM +0200, Nicolas Saenz Julienne wrote:
> > Second I don't see the need (and actually some harm) in preventing
> > GFP_KERNEL allocations from dipping into lower CMA areas - something
> > that we did support before 5.8 with the single pool.
> 
> My thinking is the less we pressure CMA the better; it's generally scarce,
> and it'll not grow as the atomic pools grow. As far as harm is concerned,
> we now check addresses for correctness, so we shouldn't run into problems.
> 
> There is a potential case for architectures defining a default CMA but not
> defining DMA zones where this could be problematic. But isn't that just plain
> abusing CMA? If you need low memory allocations, you should be defining DMA
> zones.

The latter is pretty much what I expect, as we only support the default and
per-device DMA CMAs.


[PATCH 9/9] scsi: ufs: Properly release resources if a task is aborted successfully

2020-08-05 Thread Can Guo
In the current UFS task abort hook, namely ufshcd_abort(), if a task is
aborted successfully, the clock scaling busy time statistics are not
updated and, more importantly, clk_gating.active_reqs is not decreased,
which leaves clk_gating.active_reqs above zero forever, so clock gating
would never happen. To fix it, instead of releasing resources "manually",
use the existing function __ufshcd_transfer_req_compl(). This also
eliminates racing of scsi_dma_unmap() against the real completion in the
IRQ handler path.

Signed-off-by: Can Guo 
CC: Stanley Chu 
Reviewed-by: Stanley Chu 

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index b2947ab..9541fc7 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -6636,11 +6636,8 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
goto out;
}
 
-   scsi_dma_unmap(cmd);
-
spin_lock_irqsave(host->host_lock, flags);
-   ufshcd_outstanding_req_clear(hba, tag);
-   hba->lrb[tag].cmd = NULL;
+   __ufshcd_transfer_req_compl(hba, (1UL << tag));
spin_unlock_irqrestore(host->host_lock, flags);
 
 out:
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project.



[PATCH 1/9] scsi: ufs: Add checks before setting clk-gating states

2020-08-05 Thread Can Guo
Clock gating can be turned on/off selectively, which means its state
information only matters if it is enabled. This change makes sure that
we only look at the clk-gating state if it is enabled.

Signed-off-by: Can Guo 
Reviewed-by: Avri Altman 
Reviewed-by: Hongwu Su 
Reviewed-by: Stanley Chu 
Reviewed-by: Bean Huo 

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 3076222..5acb38c 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -1839,6 +1839,8 @@ static void ufshcd_init_clk_gating(struct ufs_hba *hba)
if (!ufshcd_is_clkgating_allowed(hba))
return;
 
+   hba->clk_gating.state = CLKS_ON;
+
hba->clk_gating.delay_ms = 150;
INIT_DELAYED_WORK(&hba->clk_gating.gate_work, ufshcd_gate_work);
INIT_WORK(&hba->clk_gating.ungate_work, ufshcd_ungate_work);
@@ -2541,7 +2543,8 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
err = SCSI_MLQUEUE_HOST_BUSY;
goto out;
}
-   WARN_ON(hba->clk_gating.state != CLKS_ON);
+   WARN_ON(ufshcd_is_clkgating_allowed(hba) &&
+   (hba->clk_gating.state != CLKS_ON));
 
lrbp = &hba->lrb[tag];
 
@@ -8326,8 +8329,11 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
/* If link is active, device ref_clk can't be switched off */
__ufshcd_setup_clocks(hba, false, true);
 
-   hba->clk_gating.state = CLKS_OFF;
-   trace_ufshcd_clk_gating(dev_name(hba->dev), hba->clk_gating.state);
+   if (ufshcd_is_clkgating_allowed(hba)) {
+   hba->clk_gating.state = CLKS_OFF;
+   trace_ufshcd_clk_gating(dev_name(hba->dev),
+   hba->clk_gating.state);
+   }
 
/* Put the host controller in low power mode if possible */
ufshcd_hba_vreg_set_lpm(hba);
@@ -8467,6 +8473,11 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
if (hba->clk_scaling.is_allowed)
ufshcd_suspend_clkscaling(hba);
ufshcd_setup_clocks(hba, false);
+   if (ufshcd_is_clkgating_allowed(hba)) {
+   hba->clk_gating.state = CLKS_OFF;
+   trace_ufshcd_clk_gating(dev_name(hba->dev),
+   hba->clk_gating.state);
+   }
 out:
hba->pm_op_in_progress = 0;
if (ret)
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project.



[PATCH 7/9] scsi: ufs: Move dumps in IRQ handler to error handler

2020-08-05 Thread Can Guo
Sometimes the dumps in the IRQ handler are heavy enough to cause system
stability issues; move them to the error handler and only print basic
host registers here.

Signed-off-by: Can Guo 
Reviewed-by: Bean Huo 

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 6a10003..a79fbbd 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -5696,6 +5696,19 @@ static void ufshcd_err_handler(struct work_struct *work)
UFSHCD_UIC_DL_TCx_REPLAY_ERROR
needs_reset = true;
 
+   if (hba->saved_err & (INT_FATAL_ERRORS | UIC_ERROR |
+ UFSHCD_UIC_HIBERN8_MASK)) {
+   bool pr_prdt = !!(hba->saved_err & SYSTEM_BUS_FATAL_ERROR);
+
+   spin_unlock_irqrestore(hba->host->host_lock, flags);
+   ufshcd_print_host_state(hba);
+   ufshcd_print_pwr_info(hba);
+   ufshcd_print_host_regs(hba);
+   ufshcd_print_tmrs(hba, hba->outstanding_tasks);
+   ufshcd_print_trs(hba, hba->outstanding_reqs, pr_prdt);
+   spin_lock_irqsave(hba->host->host_lock, flags);
+   }
+
/*
 * if host reset is required then skip clearing the pending
 * transfers forcefully because they will get cleared during
@@ -5915,18 +5928,12 @@ static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba)
 
/* dump controller state before resetting */
if (hba->saved_err & (INT_FATAL_ERRORS | UIC_ERROR)) {
-   bool pr_prdt = !!(hba->saved_err &
-   SYSTEM_BUS_FATAL_ERROR);
-
dev_err(hba->dev, "%s: saved_err 0x%x saved_uic_err 0x%x\n",
__func__, hba->saved_err,
hba->saved_uic_err);
-
-   ufshcd_print_host_regs(hba);
+   ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE,
+"host_regs: ");
ufshcd_print_pwr_info(hba);
-   ufshcd_print_tmrs(hba, hba->outstanding_tasks);
-   ufshcd_print_trs(hba, hba->outstanding_reqs,
-   pr_prdt);
}
ufshcd_schedule_eh_work(hba);
retval |= IRQ_HANDLED;
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project.



[PATCH 6/9] scsi: ufs: Recover hba runtime PM error in error handler

2020-08-05 Thread Can Guo
The current error handler cannot work well or recover hba runtime PM
error if ufshcd_suspend/resume has failed due to UFS errors, e.g. a
hibern8 enter/exit error or an SSU cmd error. When this happens, the
error handler may fail doing a full reset and restore because it always
assumes that powers, IRQs and clocks are ready after pm_runtime_get_sync
returns, but actually they are not if ufshcd_resume fails [1]. Besides,
if ufshcd_suspend/resume fails due to a UFS error, the runtime PM
framework saves the error value to dev.power.runtime_error. After that,
hba dev runtime suspend/resume will not be invoked anymore unless
runtime_error is cleared [2].

In case ufshcd_suspend/resume fails due to UFS errors, for scenario [1],
the error handler cannot assume anything about pm_runtime_get_sync,
meaning it should explicitly turn ON powers, IRQs and clocks again. To
get hba runtime PM working again for scenario [2], the error handler can
clear runtime_error by calling pm_runtime_set_active() if the full reset
and restore succeeds. And, more importantly, if pm_runtime_set_active()
returns no error, which means runtime_error has been cleared, we also
need to resume those scsi devices under the hba in case any of them
failed to be resumed due to an hba runtime resume failure. This is to
unblock blk_queue_enter in case there are bios waiting inside it.

Signed-off-by: Can Guo 
Reviewed-by: Bean Huo 

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 2604016..6a10003 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ufshcd.h"
 #include "ufs_quirks.h"
 #include "unipro.h"
@@ -229,6 +230,10 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba);
 static int ufshcd_change_power_mode(struct ufs_hba *hba,
 struct ufs_pa_layer_attr *pwr_mode);
 static void ufshcd_schedule_eh_work(struct ufs_hba *hba);
+static int ufshcd_setup_hba_vreg(struct ufs_hba *hba, bool on);
+static int ufshcd_setup_vreg(struct ufs_hba *hba, bool on);
+static inline int ufshcd_config_vreg_hpm(struct ufs_hba *hba,
+struct ufs_vreg *vreg);
 static int ufshcd_wb_buf_flush_enable(struct ufs_hba *hba);
 static int ufshcd_wb_buf_flush_disable(struct ufs_hba *hba);
 static int ufshcd_wb_ctrl(struct ufs_hba *hba, bool enable);
@@ -5553,6 +5558,84 @@ static inline void ufshcd_schedule_eh_work(struct ufs_hba *hba)
}
 }
 
+static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
+{
+   pm_runtime_get_sync(hba->dev);
+   if (pm_runtime_suspended(hba->dev)) {
+   /*
+* Don't assume anything of pm_runtime_get_sync(), if
+* resume fails, irq and clocks can be OFF, and powers
+* can be OFF or in LPM.
+*/
+   ufshcd_setup_hba_vreg(hba, true);
+   ufshcd_enable_irq(hba);
+   ufshcd_setup_vreg(hba, true);
+   ufshcd_config_vreg_hpm(hba, hba->vreg_info.vccq);
+   ufshcd_config_vreg_hpm(hba, hba->vreg_info.vccq2);
+   ufshcd_hold(hba, false);
+   if (!ufshcd_is_clkgating_allowed(hba))
+   ufshcd_setup_clocks(hba, true);
+   ufshcd_release(hba);
+   ufshcd_vops_resume(hba, UFS_RUNTIME_PM);
+   } else {
+   ufshcd_hold(hba, false);
+   if (hba->clk_scaling.is_allowed) {
+   cancel_work_sync(&hba->clk_scaling.suspend_work);
+   cancel_work_sync(&hba->clk_scaling.resume_work);
+   ufshcd_suspend_clkscaling(hba);
+   }
+   }
+}
+
+static void ufshcd_err_handling_unprepare(struct ufs_hba *hba)
+{
+   ufshcd_release(hba);
+   if (hba->clk_scaling.is_allowed)
+   ufshcd_resume_clkscaling(hba);
+   pm_runtime_put(hba->dev);
+}
+
+static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
+{
+   return (hba->ufshcd_state == UFSHCD_STATE_ERROR ||
+   (!(hba->saved_err || hba->saved_uic_err || hba->force_reset ||
+   ufshcd_is_link_broken(hba))));
+}
+
+#ifdef CONFIG_PM
+static void ufshcd_recover_pm_error(struct ufs_hba *hba)
+{
+   struct Scsi_Host *shost = hba->host;
+   struct scsi_device *sdev;
+   struct request_queue *q;
+   int ret;
+
+   /*
+* Set RPM status of hba device to RPM_ACTIVE,
+* this also clears its runtime error.
+*/
+   ret = pm_runtime_set_active(hba->dev);
+   /*
+* If hba device had runtime error, we also need to resume those
+* scsi devices under hba in case any of them has failed to be
+* resumed due to hba runtime resume failure. This is to unblock
+* blk_queue_enter in case there are bios waiting inside it.
+*/
+   if (!ret) {
+   list_for_each_entry(sdev, &shost->__devices, sibl

[PATCH 5/9] scsi: ufs: Fix concurrency of error handler and other error recovery paths

2020-08-05 Thread Can Guo
Error recovery can be invoked from multiple paths, including hibern8
enter/exit (from ufshcd_link_recovery), ufshcd_eh_host_reset_handler and
eh_work scheduled from IRQ context. Ultimately, these paths are trying to
invoke ufshcd_reset_and_restore, in either sync or async manner.

Having both sync and async manners at the same time causes some problems:

- If link recovery happens during ungate work, ufshcd_hold() would be
  called recursively. Although commit 53c12d0ef6fcb
  ("scsi: ufs: fix error recovery after the hibern8 exit failure") [1]
  fixed a deadlock due to recursive calls of ufshcd_hold() by adding a
  check of eh_in_progress into ufshcd_hold, this check allows eh_work to
  run in parallel while link recovery is running.

- Similar concurrency can also happen when error recovery is invoked from
  ufshcd_eh_host_reset_handler and ufshcd_link_recovery.

- Concurrency can even happen between eh_works. eh_work, currently queued
  on system_wq, is allowed to have multiple instances running in parallel,
  but we don't have proper protection for that.

If any of the above concurrency happens, error recovery would fail and
leave the UFS device and host in bad states. To fix the concurrency
problems, this change queues eh_work on a single-threaded workqueue and
removes the link recovery calls from the hibern8 enter/exit path.
Meanwhile, it makes use of eh_work in eh_host_reset_handler instead of
calling ufshcd_reset_and_restore. This unifies the UFS error recovery
mechanism.

In addition, according to the UFSHCI JEDEC spec, a hibern8 enter/exit
error occurs when the link is broken. This essentially applies to any
power mode change operation (since they all use PACP_PWR cmds in the
UniPro layer). So, in this change, if a power mode change operation
(including AH8 enter/exit) fails, mark the link state as
UIC_LINK_BROKEN_STATE and schedule eh_work. In this case, the error
handler needs to do a full reset and restore to recover the link back
to active. Before the link state is recovered to active,
ufshcd_uic_pwr_ctrl simply returns -ENOLINK to avoid more errors.

Signed-off-by: Can Guo 
Reviewed-by: Bean Huo 

diff --git a/drivers/scsi/ufs/ufs-sysfs.c b/drivers/scsi/ufs/ufs-sysfs.c
index 2d71d23..02d379f00 100644
--- a/drivers/scsi/ufs/ufs-sysfs.c
+++ b/drivers/scsi/ufs/ufs-sysfs.c
@@ -16,6 +16,7 @@ static const char *ufschd_uic_link_state_to_string(
case UIC_LINK_OFF_STATE:return "OFF";
case UIC_LINK_ACTIVE_STATE: return "ACTIVE";
case UIC_LINK_HIBERN8_STATE:return "HIBERN8";
+   case UIC_LINK_BROKEN_STATE: return "BROKEN";
default:return "UNKNOWN";
}
 }
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 71c650f..2604016 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -228,6 +228,7 @@ static int ufshcd_scale_clks(struct ufs_hba *hba, bool scale_up);
 static irqreturn_t ufshcd_intr(int irq, void *__hba);
 static int ufshcd_change_power_mode(struct ufs_hba *hba,
 struct ufs_pa_layer_attr *pwr_mode);
+static void ufshcd_schedule_eh_work(struct ufs_hba *hba);
 static int ufshcd_wb_buf_flush_enable(struct ufs_hba *hba);
 static int ufshcd_wb_buf_flush_disable(struct ufs_hba *hba);
 static int ufshcd_wb_ctrl(struct ufs_hba *hba, bool enable);
@@ -1571,11 +1572,6 @@ int ufshcd_hold(struct ufs_hba *hba, bool async)
spin_lock_irqsave(hba->host->host_lock, flags);
hba->clk_gating.active_reqs++;
 
-   if (ufshcd_eh_in_progress(hba)) {
-   spin_unlock_irqrestore(hba->host->host_lock, flags);
-   return 0;
-   }
-
 start:
switch (hba->clk_gating.state) {
case CLKS_ON:
@@ -1653,6 +1649,7 @@ static void ufshcd_gate_work(struct work_struct *work)
struct ufs_hba *hba = container_of(work, struct ufs_hba,
clk_gating.gate_work.work);
unsigned long flags;
+   int ret;
 
spin_lock_irqsave(hba->host->host_lock, flags);
/*
@@ -1679,8 +1676,11 @@ static void ufshcd_gate_work(struct work_struct *work)
 
/* put the link into hibern8 mode before turning off clocks */
if (ufshcd_can_hibern8_during_gating(hba)) {
-   if (ufshcd_uic_hibern8_enter(hba)) {
+   ret = ufshcd_uic_hibern8_enter(hba);
+   if (ret) {
hba->clk_gating.state = CLKS_ON;
+   dev_err(hba->dev, "%s: hibern8 enter failed %d\n",
+   __func__, ret);
trace_ufshcd_clk_gating(dev_name(hba->dev),
hba->clk_gating.state);
goto out;
@@ -1725,11 +1725,10 @@ static void __ufshcd_release(struct ufs_hba *hba)
 
hba->clk_gating.active_reqs--;
 
-   if (hba->clk_gating.active_reqs || hba->clk_gating.is_suspended
-   || hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL
-   || ufshc

[PATCH 8/9] scsi: ufs: Fix a racing problem btw error handler and runtime PM ops

2020-08-05 Thread Can Guo
The current IRQ handler blocks scsi requests before scheduling eh_work;
when the error handler calls pm_runtime_get_sync, if ufshcd_suspend/resume
sends a scsi cmd, most likely the SSU cmd, then, since scsi requests are
blocked, pm_runtime_get_sync() will never return because
ufshcd_suspend/resume is blocked by the scsi cmd. Some changes and code
re-arrangement can be made to resolve it.

o In the queuecommand path, the hba->ufshcd_state check and
  ufshcd_send_command should stay under the same spin lock. This is to
  make sure that no more commands leak into the doorbell after
  hba->ufshcd_state is changed.
o Don't block scsi requests before error handler starts to run, let error
  handler block scsi requests when it is ready to start error recovery.
o Don't let the scsi layer keep requeuing the scsi cmds sent from hba
  runtime PM ops; either let them pass or fail them. Let them pass if
  eh_work is scheduled due to non-fatal errors. Fail them if eh_work is
  scheduled due to fatal errors, otherwise the cmds may eventually time
  out since UFS is in a bad state, which would block the error handler
  for too long. If we fail the scsi cmds sent from hba runtime PM ops,
  the hba runtime PM ops fail too, but that does not hurt since the
  error handler can recover hba runtime PM errors.

Signed-off-by: Can Guo 
Reviewed-by: Bean Huo 

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index a79fbbd..b2947ab 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -126,7 +126,8 @@ enum {
UFSHCD_STATE_RESET,
UFSHCD_STATE_ERROR,
UFSHCD_STATE_OPERATIONAL,
-   UFSHCD_STATE_EH_SCHEDULED,
+   UFSHCD_STATE_EH_SCHEDULED_FATAL,
+   UFSHCD_STATE_EH_SCHEDULED_NON_FATAL,
 };
 
 /* UFSHCD error handling flags */
@@ -2515,34 +2516,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
if (!down_read_trylock(&hba->clk_scaling_lock))
return SCSI_MLQUEUE_HOST_BUSY;
 
-   spin_lock_irqsave(hba->host->host_lock, flags);
-   switch (hba->ufshcd_state) {
-   case UFSHCD_STATE_OPERATIONAL:
-   break;
-   case UFSHCD_STATE_EH_SCHEDULED:
-   case UFSHCD_STATE_RESET:
-   err = SCSI_MLQUEUE_HOST_BUSY;
-   goto out_unlock;
-   case UFSHCD_STATE_ERROR:
-   set_host_byte(cmd, DID_ERROR);
-   cmd->scsi_done(cmd);
-   goto out_unlock;
-   default:
-   dev_WARN_ONCE(hba->dev, 1, "%s: invalid state %d\n",
-   __func__, hba->ufshcd_state);
-   set_host_byte(cmd, DID_BAD_TARGET);
-   cmd->scsi_done(cmd);
-   goto out_unlock;
-   }
-
-   /* if error handling is in progress, don't issue commands */
-   if (ufshcd_eh_in_progress(hba)) {
-   set_host_byte(cmd, DID_ERROR);
-   cmd->scsi_done(cmd);
-   goto out_unlock;
-   }
-   spin_unlock_irqrestore(hba->host->host_lock, flags);
-
hba->req_abort_count = 0;
 
err = ufshcd_hold(hba, true);
@@ -2578,11 +2551,51 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
/* Make sure descriptors are ready before ringing the doorbell */
wmb();
 
-   /* issue command to the controller */
spin_lock_irqsave(hba->host->host_lock, flags);
+   switch (hba->ufshcd_state) {
+   case UFSHCD_STATE_OPERATIONAL:
+   case UFSHCD_STATE_EH_SCHEDULED_NON_FATAL:
+   break;
+   case UFSHCD_STATE_EH_SCHEDULED_FATAL:
+   /*
+* pm_runtime_get_sync() is used at error handling preparation
+* stage. If a scsi cmd, e.g. the SSU cmd, is sent from hba's
+* PM ops, it can never be finished if we let SCSI layer keep
+* retrying it, which gets err handler stuck forever. Neither
+* can we let the scsi cmd pass through, because UFS is in bad
+* state, the scsi cmd may eventually time out, which will get
+* err handler blocked for too long. So, just fail the scsi cmd
+* sent from PM ops, err handler can recover PM error anyways.
+*/
+   if (hba->pm_op_in_progress) {
+   hba->force_reset = true;
+   set_host_byte(cmd, DID_BAD_TARGET);
+   goto out_compl_cmd;
+   }
+   fallthrough;
+   case UFSHCD_STATE_RESET:
+   err = SCSI_MLQUEUE_HOST_BUSY;
+   goto out_compl_cmd;
+   case UFSHCD_STATE_ERROR:
+   set_host_byte(cmd, DID_ERROR);
+   goto out_compl_cmd;
+   default:
+   dev_WARN_ONCE(hba->dev, 1, "%s: invalid state %d\n",
+   __func__, hba->ufshcd_state);
+   set_host_byte(cmd, DID_BAD_TARGET);
+   goto out_compl_cmd;
+   }
ufshcd_send_command(hba, tag);
-out_unlock:
  

[PATCH 4/9] scsi: ufs: Add some debug infos to ufshcd_print_host_state

2020-08-05 Thread Can Guo
The last interrupt status and its timestamp are very helpful when
debugging system stability issues, e.g. IRQ starvation, so add them to
ufshcd_print_host_state. Meanwhile, UFS device info like the model name
and FW version also comes in handy during debugging. In addition, this
change cleans up some prints in ufshcd_print_host_regs as similar prints
are already available in ufshcd_print_host_state.

Signed-off-by: Can Guo 
Reviewed-by: Avri Altman 
Reviewed-by: Hongwu Su 
Reviewed-by: Asutosh Das 
Reviewed-by: Stanley Chu 
Reviewed-by: Bean Huo 

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 5acb38c..71c650f 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -411,15 +411,6 @@ static void ufshcd_print_err_hist(struct ufs_hba *hba,
 static void ufshcd_print_host_regs(struct ufs_hba *hba)
 {
ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
-   dev_err(hba->dev, "hba->ufs_version = 0x%x, hba->capabilities = 0x%x\n",
-   hba->ufs_version, hba->capabilities);
-   dev_err(hba->dev,
-   "hba->outstanding_reqs = 0x%x, hba->outstanding_tasks = 0x%x\n",
-   (u32)hba->outstanding_reqs, (u32)hba->outstanding_tasks);
-   dev_err(hba->dev,
-   "last_hibern8_exit_tstamp at %lld us, hibern8_exit_cnt = %d\n",
-   ktime_to_us(hba->ufs_stats.last_hibern8_exit_tstamp),
-   hba->ufs_stats.hibern8_exit_cnt);
 
ufshcd_print_err_hist(hba, &hba->ufs_stats.pa_err, "pa_err");
ufshcd_print_err_hist(hba, &hba->ufs_stats.dl_err, "dl_err");
@@ -438,8 +429,6 @@ static void ufshcd_print_host_regs(struct ufs_hba *hba)
ufshcd_print_err_hist(hba, &hba->ufs_stats.host_reset, "host_reset");
ufshcd_print_err_hist(hba, &hba->ufs_stats.task_abort, "task_abort");
 
-   ufshcd_print_clk_freqs(hba);
-
ufshcd_vops_dbg_register_dump(hba);
 }
 
@@ -499,6 +488,8 @@ static void ufshcd_print_tmrs(struct ufs_hba *hba, unsigned long bitmap)
 
 static void ufshcd_print_host_state(struct ufs_hba *hba)
 {
+   struct scsi_device *sdev_ufs = hba->sdev_ufs_device;
+
dev_err(hba->dev, "UFS Host state=%d\n", hba->ufshcd_state);
dev_err(hba->dev, "outstanding reqs=0x%lx tasks=0x%lx\n",
hba->outstanding_reqs, hba->outstanding_tasks);
@@ -511,12 +502,24 @@ static void ufshcd_print_host_state(struct ufs_hba *hba)
dev_err(hba->dev, "Auto BKOPS=%d, Host self-block=%d\n",
hba->auto_bkops_enabled, hba->host->host_self_blocked);
dev_err(hba->dev, "Clk gate=%d\n", hba->clk_gating.state);
+   dev_err(hba->dev,
+   "last_hibern8_exit_tstamp at %lld us, hibern8_exit_cnt=%d\n",
+   ktime_to_us(hba->ufs_stats.last_hibern8_exit_tstamp),
+   hba->ufs_stats.hibern8_exit_cnt);
+   dev_err(hba->dev, "last intr at %lld us, last intr status=0x%x\n",
+   ktime_to_us(hba->ufs_stats.last_intr_ts),
+   hba->ufs_stats.last_intr_status);
dev_err(hba->dev, "error handling flags=0x%x, req. abort count=%d\n",
hba->eh_flags, hba->req_abort_count);
-   dev_err(hba->dev, "Host capabilities=0x%x, caps=0x%x\n",
-   hba->capabilities, hba->caps);
+   dev_err(hba->dev, "hba->ufs_version=0x%x, Host capabilities=0x%x, caps=0x%x\n",
+   hba->ufs_version, hba->capabilities, hba->caps);
dev_err(hba->dev, "quirks=0x%x, dev. quirks=0x%x\n", hba->quirks,
hba->dev_quirks);
+   if (sdev_ufs)
+   dev_err(hba->dev, "UFS dev info: %.8s %.16s rev %.4s\n",
+   sdev_ufs->vendor, sdev_ufs->model, sdev_ufs->rev);
+
+   ufshcd_print_clk_freqs(hba);
 }
 
 /**
@@ -5951,6 +5954,8 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
 
spin_lock(hba->host->host_lock);
intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
+   hba->ufs_stats.last_intr_status = intr_status;
+   hba->ufs_stats.last_intr_ts = ktime_get();
 
/*
 * There could be max of hba->nutrs reqs in flight and in worst case
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index b2ef18f..b7f54af 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -409,6 +409,8 @@ struct ufs_err_reg_hist {
 
 /**
  * struct ufs_stats - keeps usage/err statistics
+ * @last_intr_status: record the last interrupt status.
+ * @last_intr_ts: record the last interrupt timestamp.
  * @hibern8_exit_cnt: Counter to keep track of number of exits,
  * reset this after link-startup.
  * @last_hibern8_exit_tstamp: Set time after the hibern8 exit.
@@ -428,6 +430,9 @@ struct ufs_err_reg_hist {
  * @tsk_abort: tracks task abort events
  */
 struct ufs_stats {
+   u32 last_intr_status;
+   ktime_t last_intr_ts;
+
u32 hibern8_exit_cnt;
ktime_t last_hibern8_exit_tstamp;
 
-- 
Qualcomm Innovation Center, Inc. is a memb

[PATCH 3/9] scsi: ufs-qcom: Remove testbus dump in ufs_qcom_dump_dbg_regs

2020-08-05 Thread Can Guo
Dumping the testbus registers is heavy enough to cause stability issues
sometimes, so just remove them for now.

Signed-off-by: Can Guo 
Reviewed-by: Hongwu Su 
Reviewed-by: Avri Altman 
Reviewed-by: Bean Huo 

diff --git a/drivers/scsi/ufs/ufs-qcom.c b/drivers/scsi/ufs/ufs-qcom.c
index 823eccf..6b75338 100644
--- a/drivers/scsi/ufs/ufs-qcom.c
+++ b/drivers/scsi/ufs/ufs-qcom.c
@@ -1630,44 +1630,12 @@ int ufs_qcom_testbus_config(struct ufs_qcom_host *host)
return 0;
 }
 
-static void ufs_qcom_testbus_read(struct ufs_hba *hba)
-{
-   ufshcd_dump_regs(hba, UFS_TEST_BUS, 4, "UFS_TEST_BUS ");
-}
-
-static void ufs_qcom_print_unipro_testbus(struct ufs_hba *hba)
-{
-   struct ufs_qcom_host *host = ufshcd_get_variant(hba);
-   u32 *testbus = NULL;
-   int i, nminor = 256, testbus_len = nminor * sizeof(u32);
-
-   testbus = kmalloc(testbus_len, GFP_KERNEL);
-   if (!testbus)
-   return;
-
-   host->testbus.select_major = TSTBUS_UNIPRO;
-   for (i = 0; i < nminor; i++) {
-   host->testbus.select_minor = i;
-   ufs_qcom_testbus_config(host);
-   testbus[i] = ufshcd_readl(hba, UFS_TEST_BUS);
-   }
-   print_hex_dump(KERN_ERR, "UNIPRO_TEST_BUS ", DUMP_PREFIX_OFFSET,
-   16, 4, testbus, testbus_len, false);
-   kfree(testbus);
-}
-
 static void ufs_qcom_dump_dbg_regs(struct ufs_hba *hba)
 {
ufshcd_dump_regs(hba, REG_UFS_SYS1CLK_1US, 16 * 4,
 "HCI Vendor Specific Registers ");
 
-   /* sleep a bit intermittently as we are dumping too much data */
ufs_qcom_print_hw_debug_reg_all(hba, NULL, ufs_qcom_dump_regs_wrapper);
-   udelay(1000);
-   ufs_qcom_testbus_read(hba);
-   udelay(1000);
-   ufs_qcom_print_unipro_testbus(hba);
-   udelay(1000);
 }
 
 /**
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project.



[PATCH 2/9] ufs: ufs-qcom: Fix race conditions caused by func ufs_qcom_testbus_config

2020-08-05 Thread Can Guo
If ufs_qcom_dump_dbg_regs() calls ufs_qcom_testbus_config() from
ufshcd_suspend/resume and/or clk gate/ungate context, pm_runtime_get_sync()
and ufshcd_hold() will cause racing problems. Fix this by removing the
unnecessary calls of pm_runtime_get_sync() and ufshcd_hold().

Signed-off-by: Can Guo 
Reviewed-by: Hongwu Su 
Reviewed-by: Avri Altman 
Reviewed-by: Bean Huo 

diff --git a/drivers/scsi/ufs/ufs-qcom.c b/drivers/scsi/ufs/ufs-qcom.c
index d0d7552..823eccf 100644
--- a/drivers/scsi/ufs/ufs-qcom.c
+++ b/drivers/scsi/ufs/ufs-qcom.c
@@ -1614,9 +1614,6 @@ int ufs_qcom_testbus_config(struct ufs_qcom_host *host)
 */
}
mask <<= offset;
-
-   pm_runtime_get_sync(host->hba->dev);
-   ufshcd_hold(host->hba, false);
ufshcd_rmwl(host->hba, TEST_BUS_SEL,
(u32)host->testbus.select_major << 19,
REG_UFS_CFG1);
@@ -1629,8 +1626,6 @@ int ufs_qcom_testbus_config(struct ufs_qcom_host *host)
 * committed before returning.
 */
mb();
-   ufshcd_release(host->hba);
-   pm_runtime_put_sync(host->hba->dev);
 
return 0;
 }
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project.



Re: [PATCH] Replace HTTP links with HTTPS ones: Ext4

2020-08-05 Thread tytso
On Mon, Jul 06, 2020 at 09:03:39PM +0200, Alexander A. Klimov wrote:
> Rationale:
> Reduces attack surface on kernel devs opening the links for MITM
> as HTTPS traffic is much harder to manipulate.

Thanks, applied.

- Ted


Re: [PATCH v9 8/9] scsi: ufs: Fix a racing problem btw error handler and runtime PM ops

2020-08-05 Thread Can Guo

On 2020-08-05 09:31, Martin K. Petersen wrote:

Can,


Current IRQ handler blocks scsi requests before scheduling eh_work,
when error handler calls pm_runtime_get_sync, if ufshcd_suspend/resume
sends a scsi cmd, most likely the SSU cmd, since scsi requests are
blocked, pm_runtime_get_sync() will never return because
ufshcd_suspend/reusme is blocked by the scsi cmd. Some changes and
code re-arrangement can be made to resolve it.


  CC [M]  drivers/scsi/ufs/ufshcd.o
drivers/scsi/ufs/ufshcd.c: In function ‘ufshcd_queuecommand’:
drivers/scsi/ufs/ufshcd.c:2570:6: error: this statement may fall through [-Werror=implicit-fallthrough=]
 2570 |   if (hba->pm_op_in_progress) {
  |  ^
drivers/scsi/ufs/ufshcd.c:2575:2: note: here
 2575 |  case UFSHCD_STATE_RESET:
  |  ^~~~
cc1: all warnings being treated as errors
make[3]: *** [scripts/Makefile.build:280: drivers/scsi/ufs/ufshcd.o] Error 1

make[2]: *** [scripts/Makefile.build:497: drivers/scsi/ufs] Error 2
make[1]: *** [scripts/Makefile.build:497: drivers/scsi] Error 2
make: *** [Makefile:1764: drivers] Error 2


Thanks Martin, will fix it in next version.

Can Guo.


drivers/usb/host/ehci.h:743:17: sparse: sparse: incorrect type in argument 1 (different address spaces)

2020-08-05 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   fffe3ae0ee84e25d2befe2ae59bc32aa2b6bc77b
commit: 670d0a4b10704667765f7d18f7592993d02783aa sparse: use identifiers to define address spaces
date:   7 weeks ago
config: mips-randconfig-s031-20200806 (attached as .config)
compiler: mips-linux-gcc (GCC) 9.3.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.2-117-g8c7aee71-dirty
git checkout 670d0a4b10704667765f7d18f7592993d02783aa
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=mips

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


sparse warnings: (new ones prefixed by >>)

   drivers/usb/host/ehci-hcd.c: note: in included file:
   drivers/usb/host/ehci-q.c:1389:27: sparse: sparse: incorrect type in 
assignment (different base types) @@ expected restricted __hc32 [usertype] 
old_current @@ got int @@
   drivers/usb/host/ehci-q.c:1389:27: sparse: expected restricted __hc32 
[usertype] old_current
   drivers/usb/host/ehci-q.c:1389:27: sparse: got int
   drivers/usb/host/ehci-hcd.c: note: in included file:
   drivers/usb/host/ehci-mem.c:188:24: sparse: sparse: incorrect type in 
assignment (different base types) @@ expected restricted __hc32 [usertype] 
*periodic @@ got restricted __le32 [usertype] * @@
   drivers/usb/host/ehci-mem.c:188:24: sparse: expected restricted __hc32 
[usertype] *periodic
   drivers/usb/host/ehci-mem.c:188:24: sparse: got restricted __le32 
[usertype] *
   drivers/usb/host/ehci-hcd.c:566:27: sparse: sparse: incorrect type in 
assignment (different base types) @@ expected restricted __hc32 [usertype] 
old_current @@ got int @@
   drivers/usb/host/ehci-hcd.c:566:27: sparse: expected restricted __hc32 
[usertype] old_current
   drivers/usb/host/ehci-hcd.c:566:27: sparse: got int
   drivers/usb/host/ehci-hcd.c: note: in included file:
>> drivers/usb/host/ehci.h:743:17: sparse: sparse: incorrect type in argument 1 
>> (different address spaces) @@ expected void const volatile [noderef] 
>> __iomem *mem @@ got unsigned int * @@
>> drivers/usb/host/ehci.h:743:17: sparse: expected void const volatile 
>> [noderef] __iomem *mem
   drivers/usb/host/ehci.h:743:17: sparse: got unsigned int *
   drivers/usb/host/ehci.h:743:17: sparse: sparse: cast to restricted __be32
   drivers/usb/host/ehci-hcd.c: note: in included file (through 
arch/mips/include/asm/mmiowb.h, include/linux/spinlock.h, 
include/linux/seqlock.h, ...):
   arch/mips/include/asm/io.h:354:1: sparse: sparse: cast to restricted __le32
   arch/mips/include/asm/io.h:354:1: sparse: sparse: cast to restricted __le32
   arch/mips/include/asm/io.h:354:1: sparse: sparse: cast to restricted __le32
   arch/mips/include/asm/io.h:354:1: sparse: sparse: cast to restricted __le32
   arch/mips/include/asm/io.h:354:1: sparse: sparse: cast to restricted __le32
   arch/mips/include/asm/io.h:354:1: sparse: sparse: cast to restricted __le32
   drivers/usb/host/ehci-hcd.c: note: in included file:
>> drivers/usb/host/ehci.h:743:17: sparse: sparse: incorrect type in argument 1 
>> (different address spaces) @@ expected void const volatile [noderef] 
>> __iomem *mem @@ got unsigned int * @@
>> drivers/usb/host/ehci.h:743:17: sparse: expected void const volatile 
>> [noderef] __iomem *mem
   drivers/usb/host/ehci.h:743:17: sparse: got unsigned int *
   drivers/usb/host/ehci.h:743:17: sparse: sparse: cast to restricted __be32
   drivers/usb/host/ehci-hcd.c: note: in included file (through 
arch/mips/include/asm/mmiowb.h, include/linux/spinlock.h, 
include/linux/seqlock.h, ...):
   arch/mips/include/asm/io.h:354:1: sparse: sparse: cast to restricted __le32
   arch/mips/include/asm/io.h:354:1: sparse: sparse: cast to restricted __le32
   arch/mips/include/asm/io.h:354:1: sparse: sparse: cast to restricted __le32
   arch/mips/include/asm/io.h:354:1: sparse: sparse: cast to restricted __le32
   arch/mips/include/asm/io.h:354:1: sparse: sparse: cast to restricted __le32
   arch/mips/include/asm/io.h:354:1: sparse: sparse: cast to restricted __le32
   drivers/usb/host/ehci-hcd.c: note: in included file:
>> drivers/usb/host/ehci.h:743:17: sparse: sparse: incorrect type in argument 1 
>> (different address spaces) @@ expected void const volatile [noderef] 
>> __iomem *mem @@ got unsigned int * @@
>> drivers/usb/host/ehci.h:743:17: sparse: expected void const volatile 
>> [noderef] __iomem *mem
   drivers/usb/host/ehci.h:743:17: sparse: got unsigned int *
   drivers/usb/host/ehci.h:743:17: sparse: sparse: cast to restricted __be32
   drivers/usb/host/ehci-hcd.c: note: in included file

[PATCH v4] PCI: Reduce warnings on possible RW1C corruption

2020-08-05 Thread Mark Tomlinson
For hardware that only supports 32-bit writes to PCI there is the
possibility of clearing RW1C (write-one-to-clear) bits. A rate-limited
message was introduced by fb2659230120, but rate-limiting is not the
best choice here. Some devices may not show the warnings they should if
another device has just produced a bunch of warnings. Also, the number
of messages can be a nuisance on devices which are otherwise working
fine.

This patch changes the ratelimit to a single warning per bus. This
ensures no bus is 'starved' of emitting a warning and also that there
isn't a continuous stream of warnings. It would be preferable to have a
warning per device, but the pci_dev structure is not available here, and
a lookup from devfn would be far too slow.

Suggested-by: Bjorn Helgaas 
Fixes: fb2659230120 ("PCI: Warn on possible RW1C corruption for sub-32 bit 
config writes")
Signed-off-by: Mark Tomlinson 
---
changes in v4:
 - Use bitfield rather than bool to save memory (was meant to be in v3).

 drivers/pci/access.c | 9 ++---
 include/linux/pci.h  | 1 +
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 79c4a2ef269a..b452467fd133 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -160,9 +160,12 @@ int pci_generic_config_write32(struct pci_bus *bus, 
unsigned int devfn,
 * write happen to have any RW1C (write-one-to-clear) bits set, we
 * just inadvertently cleared something we shouldn't have.
 */
-   dev_warn_ratelimited(&bus->dev, "%d-byte config write to 
%04x:%02x:%02x.%d offset %#x may corrupt adjacent RW1C bits\n",
-size, pci_domain_nr(bus), bus->number,
-PCI_SLOT(devfn), PCI_FUNC(devfn), where);
+   if (!bus->unsafe_warn) {
+   dev_warn(&bus->dev, "%d-byte config write to %04x:%02x:%02x.%d 
offset %#x may corrupt adjacent RW1C bits\n",
+size, pci_domain_nr(bus), bus->number,
+PCI_SLOT(devfn), PCI_FUNC(devfn), where);
+   bus->unsafe_warn = 1;
+   }
 
mask = ~(((1 << (size * 8)) - 1) << ((where & 0x3) * 8));
tmp = readl(addr) & mask;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 34c1c4f45288..85211a787f8b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -626,6 +626,7 @@ struct pci_bus {
struct bin_attribute*legacy_io; /* Legacy I/O for this bus */
struct bin_attribute*legacy_mem;/* Legacy mem */
unsigned intis_added:1;
+   unsigned intunsafe_warn:1;  /* warned about RW1C config 
write */
 };
 
 #define to_pci_bus(n)  container_of(n, struct pci_bus, dev)
-- 
2.28.0



[PATCH] softirq: add irq off checking for __raise_softirq_irqoff

2020-08-05 Thread Jiafei Pan
__raise_softirq_irqoff() updates the per-CPU mask of pending softirqs.
It needs to be called with interrupts disabled to keep the update
atomic; otherwise it can be interrupted by a hardware interrupt, the
per-CPU softirq pending mask can be corrupted, and unexpected issues
result: for example, the hrtimer softirq can be lost, so a soft
hrtimer never expires and is never handled.

Add a check that interrupts are disabled, to warn when this is called
from an interrupts-enabled context.

Signed-off-by: Jiafei Pan 
---
 kernel/softirq.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index bf88d7f62433..11f61e54a3ae 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -481,6 +481,11 @@ void raise_softirq(unsigned int nr)
 
 void __raise_softirq_irqoff(unsigned int nr)
 {
+   /* This function can only be called in irq disabled context,
+* otherwise or_softirq_pending will be interrupted by hardware
+* interrupt, so that there will be unexpected issue.
+*/
+   WARN_ON_ONCE(!irqs_disabled());
trace_softirq_raise(nr);
or_softirq_pending(1UL << nr);
 }
-- 
2.17.1



[PATCH] leds: Add an optional property named 'sdb-gpios'

2020-08-05 Thread Grant Feng
The chip enters hardware shutdown when the SDB pin is pulled low.
The chip releases hardware shutdown when the SDB pin is pulled high.

Signed-off-by: Grant Feng 
---
 Documentation/devicetree/bindings/leds/leds-is31fl32xx.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/leds/leds-is31fl32xx.txt 
b/Documentation/devicetree/bindings/leds/leds-is31fl32xx.txt
index 926c2117942c..94f02827fd83 100644
--- a/Documentation/devicetree/bindings/leds/leds-is31fl32xx.txt
+++ b/Documentation/devicetree/bindings/leds/leds-is31fl32xx.txt
@@ -15,6 +15,8 @@ Required properties:
 - reg: I2C slave address
 - address-cells : must be 1
 - size-cells : must be 0
+- sdb-gpios : (optional)
+  Specifier of the GPIO connected to SDB pin.
 
 LED sub-node properties:
 - reg : LED channel number (1..N)
@@ -31,6 +33,7 @@ is31fl3236: led-controller@3c {
reg = <0x3c>;
#address-cells = <1>;
#size-cells = <0>;
+   sdb-gpios = <&gpio0 11 GPIO_ACTIVE_HIGH>;
 
led@1 {
reg = <1>;
-- 
2.17.1




RE: [EXT] Re: [PATCH v4 2/2] net: dsa: ocelot: Add support for QinQ Operation

2020-08-05 Thread Hongbo Wang
> On 8/3/2020 11:36 PM, Hongbo Wang wrote:
> >>> + if (vlan->proto == ETH_P_8021AD) {
> >>> + ocelot->enable_qinq = true;
> >>> + ocelot_port->qinq_mode = true;
> >>> + }
> >>  ...
> >>> + if (vlan->proto == ETH_P_8021AD) {
> >>> + ocelot->enable_qinq = false;
> >>> + ocelot_port->qinq_mode = false;
> >>> + }
> >>> +
> >>
> >> I don't understand how this can work just by using a boolean to track
> >> the state.
> >>
> >> This won't work properly if you are handling multiple QinQ VLAN entries.
> >>
> >> Also, I need Andrew and Florian to review and ACK the DSA layer
> >> changes that add the protocol value to the device notifier block.
> >
> > Hi David,
> > Thanks for reply.
> >
> > When setting bridge's VLAN protocol to 802.1AD by the command "ip link
> > set br0 type bridge vlan_protocol 802.1ad", it will call
> > dsa_slave_vlan_rx_add(dev, proto, vid) for every port in the bridge,
> > the parameter vid is port's pvid 1, if pvid's proto is 802.1AD, I will
> > enable switch's enable_qinq, and the related port's qinq_mode,
> >
> > When there are multiple QinQ VLAN entries, If one VLAN's proto is 802.1AD,
> I will enable switch and the related port into QinQ mode.
> 
> The enabling appears fine, the problem is the disabling, the first 802.1AD 
> VLAN
> entry that gets deleted will lead to the port and switch no longer being in 
> QinQ
> mode, and this does not look intended.
> --
> Florian

When I try to add reference counter, I found that:
1.
the command "ip link set br0 type bridge vlan_protocol 802.1ad" call path is:
br_changelink -> __br_vlan_set_proto -> vlan_vid_add -> ... -> 
ndo_vlan_rx_add_vid -> dsa_slave_vlan_rx_add_vid(dev, proto, vid) -> 
felix_vlan_add

dsa_slave_vlan_rx_add_vid() can pass the correct protocol and VID (1) to the ocelot driver.

vlan_vid_add() is in net/8021q/vlan_core.c; it maintains a vid_list that stores
the mapping of VID to protocol,
and the function vlan_vid_info_get() can read that map.

but when deleting bridge using "ip link del dev br0 type bridge", the call path 
is:
br_dev_delete -> ... -> br_switchdev_port_vlan_del -> ... -> 
dsa_slave_port_obj_del -> dsa_slave_vlan_del -> ... -> felix_vlan_del

br_switchdev_port_vlan_del() is in net/bridge/br_switchdev.c; it doesn't have the
list mapping VID to protocol,
so it can't pass the correct protocol for the VID to the ocelot driver.

2.
For the ocelot QinQ case, the switch port linked to the customer behaves
differently from the port facing the ISP,

uplink: Customer LAN(CTAG) -> swp0(vlan_aware:0 pop_cnt:0) -> swp1(add STAG) -> 
ISP MAN(STAG + CTAG)
downlink: ISP MAN(STAG + CTAG) -> swp1(vlan_aware:1 pop_cnt:1, pop STAG) -> 
swp0(only CTAG) -> Customer LAN

The different behaviour is described in "4.3.3 Provider Bridges and Q-in-Q
Operation" in VSC99599_1_00_TS.pdf.

So I need a standard command to set swp0 and swp1 to different modes,
but "ip link set br0 type bridge vlan_protocol 802.1ad" sets all ports to
the same mode, which is not my intent.

3.
I have thought of some ways to resolve the above issue:
a. br_switchdev_port_vlan_del will pass the default value ETH_P_8021Q, and
felix_vlan_del will simply not care about it.
b. In felix_vlan_add and felix_vlan_del, only enable or disable the switch's
enable_qinq when the VID is the ocelot_port's pvid.
c. Maybe I can use devlink to set swp0 and swp1 into different modes.
d. Let br_switchdev_port_vlan_del call vlan_vid_info_get to get the protocol
for the VID, but vlan_vid_info_get is static in vlan_core.c, so this needs
related functions added in br_switchdev.c.

Any comments are welcome!

Thanks
Hongbo



Re: [PATCH v4 01/12] ASoC: qcom: Add common array to initialize soc based core clocks

2020-08-05 Thread Rohit Kumar

Thanks Stephen for reviewing.

On 8/6/2020 6:01 AM, Stephen Boyd wrote:

Quoting Rohit kumar (2020-07-22 03:31:44)

From: Ajit Pandey 

LPASS variants have their own SoC-specific clocks that need to be
enabled for MI2S audio support. Added a common variable in drvdata to
initialize such clocks using the bulk clk API. The clock names are
defined in variant-specific data and need to be fetched during init.

Why not just get all the clks and not even care about the names of them?
Use devm_clk_bulk_get_all() for that, unless some clks need to change
rates?


There is the ahbix clk, which needs its clock rate to be set. Please
check the patch below in the series for reference:

[PATCH v5 02/12] ASoC: qcom: lpass-cpu: Move ahbix clk to platform
specific function


Thanks,

Rohit

--
Qualcomm INDIA, on behalf of Qualcomm Innovation Center, Inc.is a member
of the Code Aurora Forum, hosted by the Linux Foundation.



Re: [PATCH v2] mm: vmstat: fix /proc/sys/vm/stat_refresh generating false warnings

2020-08-05 Thread Roman Gushchin
On Wed, Aug 05, 2020 at 08:01:33PM -0700, Hugh Dickins wrote:
> On Mon, 3 Aug 2020, Roman Gushchin wrote:
> > On Fri, Jul 31, 2020 at 07:17:05PM -0700, Hugh Dickins wrote:
> > > On Fri, 31 Jul 2020, Roman Gushchin wrote:
> > > > On Thu, Jul 30, 2020 at 09:06:55PM -0700, Hugh Dickins wrote:
> > > > > 
> > > > > Though another alternative did occur to me overnight: we could
> > > > > scrap the logged warning, and show "nr_whatever -53" as output
> > > > > from /proc/sys/vm/stat_refresh: that too would be acceptable
> > > > > to me, and you redirect to /dev/null.
> > > > 
> > > > It sounds like a good idea to me. Do you want me to prepare a patch?
> > > 
> > > Yes, if you like that one best, please do prepare a patch - thanks!
> > 
> > Hi Hugh,
> > 
> > I mastered a patch (attached below), but honestly I can't say I like it.
> > The resulting interface is confusing: we don't generally use sysctls to
> > print debug data and/or warnings.
> 
> Since you confessed to not liking it yourself, I paid it very little
> attention.  Yes, when I made that suggestion, I wasn't really thinking
> of how stat_refresh is a /proc/sys/vm sysctl thing; and I'm not at all
> sure how issuing output from a /proc file intended for input works out
> (perhaps there are plenty of good examples, and you followed one, but
> it smells fishy to me now).
> 
> > 
> > I thought about treating a write to this sysctls as setting the threshold,
> > so that "echo 0 > /proc/sys/vm/stat_refresh" would warn on all negative
> > entries, and "cat /proc/sys/vm/stat_refresh" would use the default threshold
> > as in my patch. But this breaks  to some extent the current ABI, as passing
> > an incorrect value will result in -EINVAL instead of passing (as now).
> 
> I expect we could handle that well enough, by more lenient validation
> of the input; though my comment above on output versus input sheds doubt.
> 
> > 
> > Overall I still think we shouldn't warn on any values inside the possible
> > range, as it's not an indication of any kind of error. The only reason
> > why we see some values going negative and some not, is that some of them
> > are updated more frequently than others, and some are bouncing around
> > zero, while other can't reach zero too easily (like the number of free 
> > pages).
> 
> We continue to disagree on that (and it amuses me that you who are so
> sure they can be ignored, cannot ignore them; whereas I who am so curious
> to investigate them, have not actually found the time to do so in years).
> It was looking as if nothing could satisfy us both, but...

I can only repeat my understanding here: with the current implementation
the measured number can vary in range of
  (true_value - zone_threshold * NR_CPUS,
   true_value + zone_threshold * NR_CPUS).
zone_threshold depends on the size of a zone and the number of CPUs,
but cannot exceed 125.

Of course, most likely measured numbers are mostly distributed somewhere
close to the real number, and reaching distant ends of this range is
unlikely. But it's a question of probability.

So if the true value is close to 0, there is a high chance of getting
negative measured numbers. The bigger the value, the lower the chance.
And if it's bigger than the maximal drift, the chance is 0.

So we can be sure that a measured value can't go negative only if we know
for sure that the true number is bigger than zone_threshold * NR_CPUS.

You could argue that if the chance of getting a negative value
is really low, it's better to emit a warning rather than miss
a potential error. I'd happily agree if we had a nice formula
to calculate the tolerance for a given probability. But if we treat
all negative numbers as warnings, we'll just end up with a lot of false
warnings.

> 
> > 
> > Actually, if someone wants to ensure that numbers are accurate,
> > we have to temporarily set the threshold to 0, then flush the percpu data
> > and only then check atomics. In the current design flushing percpu data
> > matters for only slowly updated counters, as all others will run away while
> > we're waiting for the flush. So if we're targeting some slowly updating
> > counters, maybe we should warn only on them being negative, Idk.
> 
> I was going to look into that angle, though it would probably add a little
> unjustifiable overhead to fast paths, and be rejected on that basis.

I'd expect it. What I think can be acceptable is to have different tolerance
for different counters, if there is a good reason to have more precise values
for some counters.
I did a similar thing in the "new slab controller" patchset for memcg
slab statistics, which required a different threshold because they are measured
in bytes (all other metrics were historically in pages).

> 
> But in going to do so, came up against an earlier comment of yours, of
> which I had misunderstood the significance. I had said and you replied:
> 
> > > nr_zone_write_pending: yes, I've looked at our machines, and see 

Re: [PATCH v17 01/21] mm/vmscan: remove unnecessary lruvec adding

2020-08-05 Thread Alex Shi



在 2020/7/25 下午8:59, Alex Shi 写道:
> We don't have to add a freeable page into lru and then remove from it.
> This change saves a couple of actions and makes the moving more clear.
> 
> The SetPageLRU needs to be kept here for list intergrity.
> Otherwise:
>  #0 mave_pages_to_lru  #1 release_pages
>if (put_page_testzero())
>  if !put_page_testzero
>  !PageLRU //skip lru_lock
>list_add(&page->lru,)
>list_add(&page->lru,) //corrupt

The race comments should be corrected to this:
/*
 * The SetPageLRU needs to be kept here for list integrity.
 * Otherwise:
 *   #0 move_pages_to_lru #1 release_pages
 *   if !put_page_testzero
 *if (put_page_testzero())
 *  !PageLRU //skip lru_lock
 * SetPageLRU()
 * list_add(&page->lru,)
 *list_add(&page->lru,)
 */

> 
> [a...@linux-foundation.org: coding style fixes]
> Signed-off-by: Alex Shi 
> Cc: Andrew Morton 
> Cc: Johannes Weiner 
> Cc: Tejun Heo 
> Cc: Matthew Wilcox 
> Cc: Hugh Dickins 
> Cc: linux...@kvack.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  mm/vmscan.c | 37 -
>  1 file changed, 24 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 749d239c62b2..ddb29d813d77 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1856,26 +1856,29 @@ static unsigned noinline_for_stack 
> move_pages_to_lru(struct lruvec *lruvec,
>   while (!list_empty(list)) {
>   page = lru_to_page(list);
>   VM_BUG_ON_PAGE(PageLRU(page), page);
> + list_del(&page->lru);
>   if (unlikely(!page_evictable(page))) {
> - list_del(&page->lru);
>   spin_unlock_irq(&pgdat->lru_lock);
>   putback_lru_page(page);
>   spin_lock_irq(&pgdat->lru_lock);
>   continue;
>   }
> - lruvec = mem_cgroup_page_lruvec(page, pgdat);
>  
> + /*
> +  * The SetPageLRU needs to be kept here for list intergrity.
> +  * Otherwise:
> +  *   #0 mave_pages_to_lru #1 release_pages
> +  *if (put_page_testzero())
> +  *   if !put_page_testzero
> +  *  !PageLRU //skip lru_lock
> +  *list_add(&page->lru,)
> +  * list_add(&page->lru,) //corrupt
> +  */

/*
 * The SetPageLRU needs to be kept here for list integrity.
 * Otherwise:
 *   #0 move_pages_to_lru #1 release_pages
 *   if !put_page_testzero
 *if (put_page_testzero())
 *  !PageLRU //skip lru_lock
 * SetPageLRU()
 * list_add(&page->lru,)
 *list_add(&page->lru,)
 */

>   SetPageLRU(page);
> - lru = page_lru(page);
>  
> - nr_pages = hpage_nr_pages(page);
> - update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
> - list_move(&page->lru, &lruvec->lists[lru]);
> -
> - if (put_page_testzero(page)) {
> + if (unlikely(put_page_testzero(page))) {
>   __ClearPageLRU(page);
>   __ClearPageActive(page);
> - del_page_from_lru_list(page, lruvec, lru);
>  
>   if (unlikely(PageCompound(page))) {
>   spin_unlock_irq(&pgdat->lru_lock);
> @@ -1883,11 +1886,19 @@ static unsigned noinline_for_stack 
> move_pages_to_lru(struct lruvec *lruvec,
>   spin_lock_irq(&pgdat->lru_lock);
>   } else
>   list_add(&page->lru, &pages_to_free);
> - } else {
> - nr_moved += nr_pages;
> - if (PageActive(page))
> - workingset_age_nonresident(lruvec, nr_pages);
> +
> + continue;
>   }
> +
> + lruvec = mem_cgroup_page_lruvec(page, pgdat);
> + lru = page_lru(page);
> + nr_pages = hpage_nr_pages(page);
> +
> + update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
> + list_add(&page->lru, &lruvec->lists[lru]);
> + nr_moved += nr_pages;
> +

Re: [PATCH] venus: core: add shutdown callback for venus

2020-08-05 Thread mansur

Hi Sai,


On 2020-06-24 12:17, Sai Prakash Ranjan wrote:

Hi Mansur,

On 2020-06-13 16:03, Mansur Alisha Shaik wrote:

After SMMU translation is disabled in the
arm-smmu shutdown callback during reboot, if
any subsystems are still alive, the IOVAs they
are using become physical addresses on the bus,
which may lead to a crash.

Below are the consumers of smmu from venus
arm-smmu: consumer: aa0.video-codec supplier=1500.iommu
arm-smmu: consumer: video-firmware.0 supplier=1500.iommu

So implement a shutdown callback, which detaches the IOMMU maps.

Change-Id: I0f0f331056e0b84b92f1d86f66618d4b1caaa24a
Signed-off-by: Mansur Alisha Shaik 
---
 drivers/media/platform/qcom/venus/core.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/media/platform/qcom/venus/core.c
b/drivers/media/platform/qcom/venus/core.c
index 30d4b9e..acf798c 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -371,6 +371,14 @@ static int venus_remove(struct platform_device 
*pdev)

return ret;
 }

+static void venus_core_shutdown(struct platform_device *pdev)
+{
+   int ret;
+
+   ret = venus_remove(pdev);
+   WARN_ON(ret < 0);


I don't think you should warn here; it's the shutdown path and you can't
do anything with this WARN, unlike the remove callback where you have
to be sure to clean up properly so that you are able to reload the module.
But if you still want a hint about this failure, then just add a
dev_err()
to indicate the failure instead of a big stack trace spamming the kernel
log.




Posted a V2 version using dev_warn() on shutdown failure instead of
WARN_ON().

V2 version : https://lore.kernel.org/patchwork/patch/1284693/


Thanks,
Sai


---
Thanks,
Mansur


[PATCH V2] venus: core: add shutdown callback for venus

2020-08-05 Thread Mansur Alisha Shaik
After SMMU translation is disabled in the
arm-smmu shutdown callback during reboot, if
any subsystems are still alive, the IOVAs they
are using become physical addresses on the bus,
which may lead to a crash.

Below are the consumers of smmu from venus
arm-smmu: consumer: aa0.video-codec supplier=1500.iommu
arm-smmu: consumer: video-firmware.0 supplier=1500.iommu

So implement a shutdown callback, which detaches the IOMMU maps.

Change-Id: I0f0f331056e0b84b92f1d86f66618d4b1caaa24a
Signed-off-by: Mansur Alisha Shaik 
---
 drivers/media/platform/qcom/venus/core.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/media/platform/qcom/venus/core.c 
b/drivers/media/platform/qcom/venus/core.c
index 203c653..92aac06 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -341,6 +341,16 @@ static int venus_remove(struct platform_device *pdev)
return ret;
 }
 
+static void venus_core_shutdown(struct platform_device *pdev)
+{
+   struct venus_core *core = platform_get_drvdata(pdev);
+   int ret;
+
+   ret = venus_remove(pdev);
+   if (ret)
+   dev_warn(core->dev, "shutdown failed: %d\n", ret);
+}
+
 static __maybe_unused int venus_runtime_suspend(struct device *dev)
 {
struct venus_core *core = dev_get_drvdata(dev);
@@ -592,6 +602,7 @@ static struct platform_driver qcom_venus_driver = {
.of_match_table = venus_dt_match,
.pm = &venus_pm_ops,
},
+   .shutdown = venus_core_shutdown,
 };
 module_platform_driver(qcom_venus_driver);
 
-- 
2.7.4



Re: [PATCH 0/2] locking/qspinlock: Break qspinlock_types.h header loop

2020-08-05 Thread Vineet Gupta
On 7/30/20 12:50 AM, Herbert Xu wrote:
> On Thu, Jul 30, 2020 at 10:47:16AM +0300, Andy Shevchenko wrote:
>> We may ask Synopsys folks to look at this as well.
>> Vineet, any ideas if we may unify ATOMIC64_INIT() across the architectures?
> I don't think there is any technical difficulty.  The custom
> atomic64_t simply adds an alignment requirement so the initialisor
> remains the same.

Exactly so.

FWIW, the alignment requirement exists because the ARC ABI allows 64-bit data
to be 32-bit aligned, since the hardware copes fine with 4-byte alignment for
the non-atomic double load/store LDD/STD instructions. 64-bit alignment,
however, is required for the atomic double load/store LLOCKD/SCONDD
instructions, hence the ARC-specific definition of atomic64_t.

-Vineet


Re: [PATCH v2 03/24] virtio: allow __virtioXX, __leXX in config space

2020-08-05 Thread Jason Wang



On 2020/8/5 下午7:45, Michael S. Tsirkin wrote:

   #define virtio_cread(vdev, structname, member, ptr)  \
do {\
might_sleep();  \
/* Must match the member's type, and be integer */  \
-   if (!typecheck(typeofstructname*)0)->member)), *(ptr))) \
+   if (!__virtio_typecheck(structname, member, *(ptr)))\
(*ptr) = 1; \

A silly question: compared to using set()/get() directly, what's the value
of the accessors macro here?

Thanks

get/set don't convert to the native endian; I guess that's why
drivers use cread/cwrite. It is also nice that there's type
safety, checking that the correct integer width is used.



Yes, but this is simply because a macro is used here. How about just
doing something similar to virtio_cread_bytes():


static inline void virtio_cread(struct virtio_device *vdev,
                  unsigned int offset,
                  void *buf, size_t len)


And do the endian conversion inside?

Thanks








Re: linux-next: manual merge of the hmm tree with the kvm-ppc tree

2020-08-05 Thread Stephen Rothwell
Hi all,

On Thu, 30 Jul 2020 19:16:10 +1000 Stephen Rothwell  
wrote:
>
> Today's linux-next merge of the hmm tree got a conflict in:
> 
>   arch/powerpc/kvm/book3s_hv_uvmem.c
> 
> between commit:
> 
>   f1b87ea8784b ("KVM: PPC: Book3S HV: Move kvmppc_svm_page_out up")
> 
> from the kvm-ppc tree and commit:
> 
>   5143192cd410 ("mm/migrate: add a flags parameter to migrate_vma")
> 
> from the hmm tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> diff --cc arch/powerpc/kvm/book3s_hv_uvmem.c
> index 0d49e3425a12,6850bd04bcb9..
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@@ -496,94 -253,14 +496,95 @@@ unsigned long kvmppc_h_svm_init_start(s
>   return ret;
>   }
>   
>  -unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
>  +/*
>  + * Provision a new page on HV side and copy over the contents
>  + * from secure memory using UV_PAGE_OUT uvcall.
>  + * Caller must held kvm->arch.uvmem_lock.
>  + */
>  +static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
>  +unsigned long start,
>  +unsigned long end, unsigned long page_shift,
>  +struct kvm *kvm, unsigned long gpa)
>   {
>  -if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
>  -return H_UNSUPPORTED;
>  +unsigned long src_pfn, dst_pfn = 0;
>  +struct migrate_vma mig;
>  +struct page *dpage, *spage;
>  +struct kvmppc_uvmem_page_pvt *pvt;
>  +unsigned long pfn;
>  +int ret = U_SUCCESS;
>   
>  -kvm->arch.secure_guest |= KVMPPC_SECURE_INIT_DONE;
>  -pr_info("LPID %d went secure\n", kvm->arch.lpid);
>  -return H_SUCCESS;
>  +memset(&mig, 0, sizeof(mig));
>  +mig.vma = vma;
>  +mig.start = start;
>  +mig.end = end;
>  +mig.src = &src_pfn;
>  +mig.dst = &dst_pfn;
> - mig.src_owner = &kvmppc_uvmem_pgmap;
> ++mig.pgmap_owner = &kvmppc_uvmem_pgmap;
> ++mig.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
>  +
>  +/* The requested page is already paged-out, nothing to do */
>  +if (!kvmppc_gfn_is_uvmem_pfn(gpa >> page_shift, kvm, NULL))
>  +return ret;
>  +
>  +ret = migrate_vma_setup(&mig);
>  +if (ret)
>  +return -1;
>  +
>  +spage = migrate_pfn_to_page(*mig.src);
>  +if (!spage || !(*mig.src & MIGRATE_PFN_MIGRATE))
>  +goto out_finalize;
>  +
>  +if (!is_zone_device_page(spage))
>  +goto out_finalize;
>  +
>  +dpage = alloc_page_vma(GFP_HIGHUSER, vma, start);
>  +if (!dpage) {
>  +ret = -1;
>  +goto out_finalize;
>  +}
>  +
>  +lock_page(dpage);
>  +pvt = spage->zone_device_data;
>  +pfn = page_to_pfn(dpage);
>  +
>  +/*
>  + * This function is used in two cases:
>  + * - When HV touches a secure page, for which we do UV_PAGE_OUT
>  + * - When a secure page is converted to shared page, we *get*
>  + *   the page to essentially unmap the device page. In this
>  + *   case we skip page-out.
>  + */
>  +if (!pvt->skip_page_out)
>  +ret = uv_page_out(kvm->arch.lpid, pfn << page_shift,
>  +  gpa, 0, page_shift);
>  +
>  +if (ret == U_SUCCESS)
>  +*mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
>  +else {
>  +unlock_page(dpage);
>  +__free_page(dpage);
>  +goto out_finalize;
>  +}
>  +
>  +migrate_vma_pages(&mig);
>  +
>  +out_finalize:
>  +migrate_vma_finalize(&mig);
>  +return ret;
>  +}
>  +
>  +static inline int kvmppc_svm_page_out(struct vm_area_struct *vma,
>  +  unsigned long start, unsigned long end,
>  +  unsigned long page_shift,
>  +  struct kvm *kvm, unsigned long gpa)
>  +{
>  +int ret;
>  +
>  +mutex_lock(&kvm->arch.uvmem_lock);
>  +ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa);
>  +mutex_unlock(&kvm->arch.uvmem_lock);
>  +
>  +return ret;
>   }
>   
>   /*
> @@@ -744,7 -400,20 +745,8 @@@ static int kvmppc_svm_page_in(struct vm
>   mig.end = end;
>   mig.src = &src_pfn;
>   mig.dst = &dst_pfn;
> + mig.flags = MIGRATE_VMA_SELECT_SYSTEM;
>   
>  -/*
>  - * We come here with mmap_lock write lock held just for
>  - * ksm_madvise(), otherwise we only need read mmap_lock.
>  - * Hence downgrade to read lock once ksm_madvise() is done.
>  - */
>  -ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
>  -  MADV_UNMERGEABLE, &vma->vm_flags);
>  -mmap_write_downgrade(kvm->mm);

Re: [PATCH 4/4] vhost: vdpa: report iova range

2020-08-05 Thread Jason Wang



On 2020/8/5 下午8:58, Michael S. Tsirkin wrote:

On Wed, Jun 17, 2020 at 11:29:47AM +0800, Jason Wang wrote:

This patch introduces a new ioctl for the vhost-vdpa device that
reports the iova range supported by the device. For devices that
depend on the platform IOMMU, we fetch the iova range via
DOMAIN_ATTR_GEOMETRY. For devices that have their own DMA translation
unit, we fetch it directly from the vDPA bus operation.

Signed-off-by: Jason Wang 
---
  drivers/vhost/vdpa.c | 27 +++
  include/uapi/linux/vhost.h   |  4 
  include/uapi/linux/vhost_types.h |  5 +
  3 files changed, 36 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 77a0c9fb6cc3..ad23e66cbf57 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -332,6 +332,30 @@ static long vhost_vdpa_set_config_call(struct vhost_vdpa *v, u32 __user *argp)
  
  	return 0;

  }
+
+static long vhost_vdpa_get_iova_range(struct vhost_vdpa *v, u32 __user *argp)
+{
+   struct iommu_domain_geometry geo;
+   struct vdpa_device *vdpa = v->vdpa;
+   const struct vdpa_config_ops *ops = vdpa->config;
+   struct vhost_vdpa_iova_range range;
+   struct vdpa_iova_range vdpa_range;
+
+   if (!ops->set_map && !ops->dma_map) {

Why not just check if (ops->get_iova_range) directly?



Because set_map || dma_map is a hint that the device has its own DMA
translation logic.

A device without get_iova_range does not necessarily mean it uses the
IOMMU driver.


Thanks








+   iommu_domain_get_attr(v->domain,
+ DOMAIN_ATTR_GEOMETRY, &geo);
+   range.start = geo.aperture_start;
+   range.end = geo.aperture_end;
+   } else {
+   vdpa_range = ops->get_iova_range(vdpa);
+   range.start = vdpa_range.start;
+   range.end = vdpa_range.end;
+   }
+
+   return copy_to_user(argp, &range, sizeof(range));
+
+}
+
  static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
   void __user *argp)
  {
@@ -442,6 +466,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
case VHOST_VDPA_SET_CONFIG_CALL:
r = vhost_vdpa_set_config_call(v, argp);
break;
+   case VHOST_VDPA_GET_IOVA_RANGE:
+   r = vhost_vdpa_get_iova_range(v, argp);
+   break;
default:
r = vhost_dev_ioctl(&v->vdev, cmd, argp);
if (r == -ENOIOCTLCMD)
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index 0c2349612e77..850956980e27 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -144,4 +144,8 @@
  
  /* Set event fd for config interrupt*/

  #define VHOST_VDPA_SET_CONFIG_CALL _IOW(VHOST_VIRTIO, 0x77, int)
+
+/* Get the valid iova range */
+#define VHOST_VDPA_GET_IOVA_RANGE  _IOW(VHOST_VIRTIO, 0x78, \
+struct vhost_vdpa_iova_range)
  #endif
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index 669457ce5c48..4025b5a36177 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -127,6 +127,11 @@ struct vhost_vdpa_config {
__u8 buf[0];
  };
  
+struct vhost_vdpa_iova_range {

+   __u64 start;
+   __u64 end;
+};
+


Pls document fields. And I think first/last is a better API ...


  /* Feature bits */
  /* Log all write descriptors. Can be changed while device is active. */
  #define VHOST_F_LOG_ALL 26
--
2.20.1




[GIT PULL] erofs fixes for 5.9-rc1

2020-08-05 Thread Gao Xiang
Hi Linus,

Could you consider this pull request for 5.9-rc1?

This cycle mainly addresses an issue with extended inodes at a
designated location, which can hardly be generated by the current mkfs
but needs to be handled at runtime anyway. The other changes are quite
trivial.

All commits have been tested and have been in linux-next as well.
This merges cleanly with master.

Thanks,
Gao Xiang

The following changes since commit 92ed301919932f13b9172e525674157e983d:

  Linux 5.8-rc7 (2020-07-26 14:14:06 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git tags/erofs-for-5.9-rc1

for you to fetch changes up to 0e62ea33ac12ebde876b67eca113630805191a66:

  erofs: remove WQ_CPU_INTENSIVE flag from unbound wq's (2020-08-03 21:04:46 +0800)


Changes since last update:

 - use HTTPS links instead of insecure HTTP ones;

 - fix crossing page boundary on specific extended inodes;

 - remove useless WQ_CPU_INTENSIVE flag for unbound wq;

 - minor cleanup.


Alexander A. Klimov (1):
  erofs: Replace HTTP links with HTTPS ones

Gao Xiang (3):
  erofs: fix extended inode could cross boundary
  erofs: fold in used-once helper erofs_workgroup_unfreeze_final()
  erofs: remove WQ_CPU_INTENSIVE flag from unbound wq's

 fs/erofs/compress.h |   2 +-
 fs/erofs/data.c |   2 +-
 fs/erofs/decompressor.c |   2 +-
 fs/erofs/dir.c  |   2 +-
 fs/erofs/erofs_fs.h |   2 +-
 fs/erofs/inode.c| 123 +++-
 fs/erofs/internal.h |   2 +-
 fs/erofs/namei.c|   2 +-
 fs/erofs/super.c|   2 +-
 fs/erofs/utils.c|  16 ++-
 fs/erofs/xattr.c|   2 +-
 fs/erofs/xattr.h|   2 +-
 fs/erofs/zdata.c|   6 +--
 fs/erofs/zdata.h|   2 +-
 fs/erofs/zmap.c |   2 +-
 fs/erofs/zpvec.h|   2 +-
 16 files changed, 100 insertions(+), 71 deletions(-)



Re: [PATCH 3/4] vdpa: get_iova_range() is mandatory for device specific DMA translation

2020-08-05 Thread Jason Wang



On 2020/8/5 8:55 PM, Michael S. Tsirkin wrote:

On Wed, Jun 17, 2020 at 11:29:46AM +0800, Jason Wang wrote:

In order to let userspace work correctly, get_iova_range() is a must
for the device that has its own DMA translation logic.

I guess you mean for a device.

However, in the absence of this op, I don't see what is wrong with just
assuming the device can access any address.



It's just to be safe; if you want, we can assume any address is valid without this op.





Signed-off-by: Jason Wang 
---
  drivers/vdpa/vdpa.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index de211ef3738c..ab7af978ef70 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -82,6 +82,10 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent,
if (!!config->dma_map != !!config->dma_unmap)
goto err;
  
+	if ((config->dma_map || config->set_map) &&

+   !config->get_iova_range)
+   goto err;
+
err = -ENOMEM;
vdev = kzalloc(size, GFP_KERNEL);
if (!vdev)

What about devices using an IOMMU for translation?
IOMMUs generally have a limited IOVA range too, right?



See patch 4 which query the IOMMU geometry in this case:

+        iommu_domain_get_attr(v->domain,
+                  DOMAIN_ATTR_GEOMETRY, &geo);
+        range.start = geo.aperture_start;
+        range.end = geo.aperture_end;

Thanks







--
2.20.1




Re: WARNING in rxrpc_recvmsg

2020-08-05 Thread syzbot
syzbot suspects this issue was fixed by commit:

commit 65550098c1c4db528400c73acf3e46bfa78d9264
Author: David Howells 
Date:   Tue Jul 28 23:03:56 2020 +

rxrpc: Fix race between recvmsg and sendmsg on immediate call failure

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=10bd3bcc90
start commit:   7cc2a8ea Merge tag 'block-5.8-2020-07-01' of git://git.ker..
git tree:   upstream
kernel config:  https://syzkaller.appspot.com/x/.config?x=7be693511b29b338
dashboard link: https://syzkaller.appspot.com/bug?extid=1a68d5c4e74edea44294
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=17a5022f10
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=150932a710

If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: rxrpc: Fix race between recvmsg and sendmsg on immediate call failure

For information about bisection process see: https://goo.gl/tpsmEJ#bisection


Re: [PATCH 1/4] vdpa: introduce config op to get valid iova range

2020-08-05 Thread Jason Wang



On 2020/8/5 8:51 PM, Michael S. Tsirkin wrote:

On Wed, Jun 17, 2020 at 11:29:44AM +0800, Jason Wang wrote:

This patch introduce a config op to get valid iova range from the vDPA
device.

Signed-off-by: Jason Wang
---
  include/linux/vdpa.h | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 239db794357c..b7633ed2500c 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -41,6 +41,16 @@ struct vdpa_device {
unsigned int index;
  };
  
+/**

+ * vDPA IOVA range - the IOVA range supported by the device
+ * @start: start of the IOVA range
+ * @end: end of the IOVA range
+ */
+struct vdpa_iova_range {
+   u64 start;
+   u64 end;
+};
+

This is ambiguous. Is end in the range or just behind it?



In the range.



How about first/last?



Sure.

Thanks










Re: [PATCH v2 22/24] vdpa_sim: fix endian-ness of config space

2020-08-05 Thread Jason Wang



On 2020/8/5 8:06 PM, Michael S. Tsirkin wrote:

On Wed, Aug 05, 2020 at 02:21:07PM +0800, Jason Wang wrote:

On 2020/8/4 5:00 AM, Michael S. Tsirkin wrote:

VDPA sim accesses config space as native endian - this is
wrong since it's a modern device and actually uses LE.

It only supports modern guests so we could punt and
just force LE, but let's use the full virtio APIs since people
tend to copy/paste code, and this is not data path anyway.

Signed-off-by: Michael S. Tsirkin
---
   drivers/vdpa/vdpa_sim/vdpa_sim.c | 31 ++-
   1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index a9bc5e0fb353..fa05e065ff69 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -24,6 +24,7 @@
   #include 
   #include 
   #include 
+#include 
   #include 
   #include 
   #include 
@@ -72,6 +73,23 @@ struct vdpasim {
u64 features;
   };
+/* TODO: cross-endian support */
+static inline bool vdpasim_is_little_endian(struct vdpasim *vdpasim)
+{
+   return virtio_legacy_is_little_endian() ||
+   (vdpasim->features & (1ULL << VIRTIO_F_VERSION_1));
+}
+
+static inline u16 vdpasim16_to_cpu(struct vdpasim *vdpasim, __virtio16 val)
+{
+   return __virtio16_to_cpu(vdpasim_is_little_endian(vdpasim), val);
+}
+
+static inline __virtio16 cpu_to_vdpasim16(struct vdpasim *vdpasim, u16 val)
+{
+   return __cpu_to_virtio16(vdpasim_is_little_endian(vdpasim), val);
+}
+
   static struct vdpasim *vdpasim_dev;
   static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
@@ -306,7 +324,6 @@ static const struct vdpa_config_ops vdpasim_net_config_ops;
   static struct vdpasim *vdpasim_create(void)
   {
-   struct virtio_net_config *config;
struct vdpasim *vdpasim;
struct device *dev;
int ret = -ENOMEM;
@@ -331,10 +348,7 @@ static struct vdpasim *vdpasim_create(void)
if (!vdpasim->buffer)
goto err_iommu;
-   config = &vdpasim->config;
-   config->mtu = 1500;
-   config->status = VIRTIO_NET_S_LINK_UP;
-   eth_random_addr(config->mac);
+   eth_random_addr(vdpasim->config.mac);
vringh_set_iotlb(&vdpasim->vqs[0].vring, vdpasim->iommu);
vringh_set_iotlb(&vdpasim->vqs[1].vring, vdpasim->iommu);
@@ -448,6 +462,7 @@ static u64 vdpasim_get_features(struct vdpa_device *vdpa)
   static int vdpasim_set_features(struct vdpa_device *vdpa, u64 features)
   {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+   struct virtio_net_config *config = &vdpasim->config;
/* DMA mapping must be done by driver */
if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM)))
@@ -455,6 +470,12 @@ static int vdpasim_set_features(struct vdpa_device *vdpa, u64 features)
vdpasim->features = features & vdpasim_features;
+   /* We only know whether guest is using the legacy interface here, so
+* that's the earliest we can set config fields.
+*/

We check whether or not ACCESS_PLATFORM is set beforehand, which is
probably a hint that only modern devices are supported. So I wonder
whether it would be better to just force LE and fail if VERSION_1 is
not set?

Thanks

So how about I add a comment along the lines of

/*
  * vdpasim ATM requires VIRTIO_F_ACCESS_PLATFORM, so we don't need to
  * support legacy guests. Keep transitional device code around for
  * the benefit of people who might copy-and-paste this into transitional
  * device code.
  */



That's fine.

Thanks









Re: [PATCH v2 19/24] vdpa: make sure set_features in invoked for legacy

2020-08-05 Thread Jason Wang



On 2020/8/5 7:40 PM, Michael S. Tsirkin wrote:

On Wed, Aug 05, 2020 at 02:14:07PM +0800, Jason Wang wrote:

On 2020/8/4 5:00 AM, Michael S. Tsirkin wrote:

Some legacy guests just assume features are 0 after reset.
We detect that config space is accessed before features are
set and set features to 0 automatically.
Note: some legacy guests might not even access config space, if this is
reported in the field we might need to catch a kick to handle these.

I wonder whether it's easier to just support modern devices?

Thanks

Well hardware vendors are I think interested in supporting legacy
guests. Limiting vdpa to modern only would make it uncompetitive.



My understanding is that IOMMU_PLATFORM is mandatory for hardware vDPA 
to work, so it can only work for modern devices ...


Thanks










Re: Is anyone else getting a bad signature from kernel.org's 5.8 sources+Greg's sign?

2020-08-05 Thread David Niklas
On Wed, 5 Aug 2020 18:36:08 -0700
Randy Dunlap  wrote:

> On 8/5/20 5:59 PM, David Niklas wrote:
> > Hello,
> > I downloaded the kernel sources from kernel.org using curl, then
> > opera, and finally lynx (to rule out an html parsing bug). I did the
> > same with the sign and I keep getting:
> > 
> > %  gpg2 --verify linux-5.8.tar.sign linux-5.8.tar.xz
> > gpg: Signature made Mon Aug  3 00:19:13 2020 EDT
> > gpg:using RSA key
> > 647F28654894E3BD457199BE38DBBDC86092693E gpg: BAD signature from
> > "Greg Kroah-Hartman " [unknown]
> > 
> > I did refresh all the keys just in case.
> > I believe this is important so I'm addressing this to the signer and
> > only CC'ing the list.
> > 
> > If I've made some simple mistake, feel free to send SIG666 to my
> > terminal. I did re-read the man page just in case.  
> 
> It works successfully for me.
> 
> 
> from https://www.kernel.org/category/signatures.html::
> 
> 
> If you get "BAD signature"
> 
> If at any time you see "BAD signature" output from "gpg2 --verify",
> please first check the following:
> 
> Make sure that you are verifying the signature against the .tar
> version of the archive, not the compressed (.tar.xz) version. Make sure
> that the downloaded file is correct and not truncated or otherwise
> corrupted.
> 
> If you repeatedly get the same "BAD signature" output, please email
> helpd...@kernel.org, so we can investigate the problem.
> 
> 
> 

Many thanks. I've never seen a signature done that way before, but I
understand why you would do it that way.

David


Re: [GIT PULL] LEDs changes for v5.9-rc1

2020-08-05 Thread pr-tracker-bot
The pull request you sent on Wed, 5 Aug 2020 23:33:29 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds.git/ 
> tags/leds-5.9-rc1

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/e4a7b2dc35d9582c253cf5e6d6c3605aabc7284d

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


[PATCH RESEND v1 06/11] perf mem: Support Arm SPE events

2020-08-05 Thread Leo Yan
This patch adds Arm SPE events for perf memory profiling.  It
supports three Arm SPE events:

  - spe-load: memory event for only recording memory load ops;
  - spe-store: memory event for only recording memory store ops;
  - spe-ldst: memory event for recording memory load and store ops.

Signed-off-by: Leo Yan 
---
 tools/perf/arch/arm64/util/Build|  2 +-
 tools/perf/arch/arm64/util/mem-events.c | 46 +
 2 files changed, 47 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/arch/arm64/util/mem-events.c

diff --git a/tools/perf/arch/arm64/util/Build b/tools/perf/arch/arm64/util/Build
index 5c13438c7bd4..cb18442e840f 100644
--- a/tools/perf/arch/arm64/util/Build
+++ b/tools/perf/arch/arm64/util/Build
@@ -8,4 +8,4 @@ perf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
 perf-$(CONFIG_AUXTRACE) += ../../arm/util/pmu.o \
  ../../arm/util/auxtrace.o \
  ../../arm/util/cs-etm.o \
- arm-spe.o
+ arm-spe.o mem-events.o
diff --git a/tools/perf/arch/arm64/util/mem-events.c b/tools/perf/arch/arm64/util/mem-events.c
new file mode 100644
index ..f23128db54fb
--- /dev/null
+++ b/tools/perf/arch/arm64/util/mem-events.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "map_symbol.h"
+#include "mem-events.h"
+
+#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
+
+static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
+   E("spe-load",   "arm_spe_0/ts_enable=1,load_filter=1,store_filter=0,min_latency=%u/",   "arm_spe_0"),
+   E("spe-store",  "arm_spe_0/ts_enable=1,load_filter=0,store_filter=1/",  "arm_spe_0"),
+   E("spe-ldst",   "arm_spe_0/ts_enable=1,load_filter=1,store_filter=1,min_latency=%u/",   "arm_spe_0"),
+};
+
+static char mem_ld_name[100];
+static char mem_st_name[100];
+static char mem_ldst_name[100];
+
+struct perf_mem_event *perf_mem_events__ptr(int i)
+{
+   if (i >= PERF_MEM_EVENTS__MAX)
+   return NULL;
+
+   return &perf_mem_events[i];
+}
+
+char *perf_mem_events__name(int i)
+{
+   struct perf_mem_event *e = perf_mem_events__ptr(i);
+
+   if (i >= PERF_MEM_EVENTS__MAX)
+   return NULL;
+
+   if (i == PERF_MEM_EVENTS__LOAD) {
+   scnprintf(mem_ld_name, sizeof(mem_ld_name),
+ e->name, perf_mem_events__loads_ldlat);
+   return mem_ld_name;
+   }
+
+   if (i == PERF_MEM_EVENTS__STORE) {
+   scnprintf(mem_st_name, sizeof(mem_st_name), e->name);
+   return mem_st_name;
+   }
+
+   scnprintf(mem_ldst_name, sizeof(mem_ldst_name),
+ e->name, perf_mem_events__loads_ldlat);
+   return mem_ldst_name;
+}
-- 
2.17.1



[PATCH RESEND v1 05/11] perf mem: Support AUX trace

2020-08-05 Thread Leo Yan
Perf memory profiling doesn't support AUX trace data, so the tool
cannot receive samples synthesized from hardware tracing data.  Although
the Arm64 platform doesn't support PMU events for memory load and store,
Armv8's SPE is a good candidate for memory profiling: the hardware
tracer can record memory access operations with physical and virtual
addresses for different cache levels, and it also records remote access
and TLB operations.

To allow the perf memory tool to support AUX traces, this patch adds
the aux callbacks to the session structure.  It passes the predefined
synth options (like llc, flc, remote_access, tlb, etc.) to notify the
tracing decoder to generate the corresponding samples.  This patch also
invokes the standard API perf_event__process_attr() to register sample
IDs into the evlist.

Signed-off-by: Leo Yan 
---
 tools/perf/builtin-mem.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index a7204634893c..6c8b5e956a4a 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -7,6 +7,7 @@
 #include "perf.h"
 
 #include 
+#include "util/auxtrace.h"
 #include "util/trace-event.h"
 #include "util/tool.h"
 #include "util/session.h"
@@ -249,6 +250,15 @@ static int process_sample_event(struct perf_tool *tool,
 
 static int report_raw_events(struct perf_mem *mem)
 {
+   struct itrace_synth_opts itrace_synth_opts = {
+   .set = true,
+   .flc = true,/* First level cache samples */
+   .llc = true,/* Last level cache samples */
+   .tlb = true,/* TLB samples */
+   .remote_access = true,  /* Remote access samples */
+   .default_no_sample = true,
+   };
+
struct perf_data data = {
.path  = input_name,
.mode  = PERF_DATA_MODE_READ,
@@ -261,6 +271,8 @@ static int report_raw_events(struct perf_mem *mem)
if (IS_ERR(session))
return PTR_ERR(session);
 
+   session->itrace_synth_opts = &itrace_synth_opts;
+
if (mem->cpu_list) {
ret = perf_session__cpu_bitmap(session, mem->cpu_list,
   mem->cpu_bitmap);
@@ -394,6 +406,19 @@ parse_mem_ops(const struct option *opt, const char *str, int unset)
return ret;
 }
 
+static int process_attr(struct perf_tool *tool __maybe_unused,
+   union perf_event *event,
+   struct evlist **pevlist)
+{
+   int err;
+
+   err = perf_event__process_attr(tool, event, pevlist);
+   if (err)
+   return err;
+
+   return 0;
+}
+
 int cmd_mem(int argc, const char **argv)
 {
struct stat st;
@@ -405,8 +430,12 @@ int cmd_mem(int argc, const char **argv)
.comm   = perf_event__process_comm,
.lost   = perf_event__process_lost,
.fork   = perf_event__process_fork,
+   .attr   = process_attr,
.build_id   = perf_event__process_build_id,
.namespaces = perf_event__process_namespaces,
+   .auxtrace_info  = perf_event__process_auxtrace_info,
+   .auxtrace   = perf_event__process_auxtrace,
+   .auxtrace_error = perf_event__process_auxtrace_error,
.ordered_events = true,
},
.input_name  = "perf.data",
-- 
2.17.1



[PATCH RESEND v1 04/11] perf mem: Only initialize memory event for recording

2020-08-05 Thread Leo Yan
There is no need to initialize memory events for perf reporting, so
only initialize memory events for perf recording.  This change allows
perf data to be parsed across platforms, e.g. the perf tool can output
reports even if the machine doesn't enable any memory events.

Signed-off-by: Leo Yan 
---
 tools/perf/builtin-mem.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index bd4229ca3685..a7204634893c 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -78,6 +78,11 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
OPT_END()
};
 
+   if (perf_mem_events__init()) {
+   pr_err("failed: memory events not supported\n");
+   return -1;
+   }
+
argc = parse_options(argc, argv, options, record_mem_usage,
 PARSE_OPT_KEEP_UNKNOWN);
 
@@ -436,11 +441,6 @@ int cmd_mem(int argc, const char **argv)
NULL
};
 
-   if (perf_mem_events__init()) {
-   pr_err("failed: memory events not supported\n");
-   return -1;
-   }
-
	argc = parse_options_subcommand(argc, argv, mem_options, mem_subcommands,
mem_usage, PARSE_OPT_KEEP_UNKNOWN);
 
-- 
2.17.1



[PATCH RESEND v1 11/11] perf arm-spe: Set sample's data source field

2020-08-05 Thread Leo Yan
The sample structure contains the field 'data_src', which carries
detailed info about data operations, e.g. whether the operation is a
load or a store, which cache level it hit, and whether it was snooping
or a remote access.  In the end, 'data_src' is parsed by the perf
memory tool to display human-readable strings.

This patch fills the 'data_src' field in the synthesized samples based
on the record type.  The supported types are: Level 1 dcache miss,
Level 1 dcache hit, last level cache miss, last level cache access,
TLB miss, TLB hit, and remote access from another socket.

Note: the current perf tool can display statistics for L1/L2/L3 caches
but doesn't support a 'last level cache'.  To fit into the current
implementation, the 'data_src' field uses the L3 cache for the last
level cache.

Signed-off-by: Leo Yan 
---
 tools/perf/util/arm-spe.c | 87 +++
 1 file changed, 79 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 74308a72b000..3114f059fc2f 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -259,7 +259,7 @@ arm_spe_deliver_synth_event(struct arm_spe *spe,
 }
 
 static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
-u64 spe_events_id)
+u64 spe_events_id, u64 data_src)
 {
struct arm_spe *spe = speq->spe;
struct arm_spe_record *record = &speq->decoder->record;
@@ -272,6 +272,7 @@ static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
sample.stream_id = spe_events_id;
sample.addr = record->addr;
sample.phys_addr = record->phys_addr;
+   sample.data_src = data_src;
 
return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }
@@ -293,21 +294,74 @@ static int arm_spe__synth_branch_sample(struct arm_spe_queue *speq,
return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }
 
+static u64 arm_spe__synth_data_source(const struct arm_spe_record *record,
+ int type)
+{
+   union perf_mem_data_src data_src = { 0 };
+
+   if (record->op == ARM_SPE_LD)
+   data_src.mem_op = PERF_MEM_OP_LOAD;
+   else
+   data_src.mem_op = PERF_MEM_OP_STORE;
+
+   switch (type) {
+   case ARM_SPE_L1D_MISS:
+   data_src.mem_lvl_num = PERF_MEM_LVLNUM_L1;
+   data_src.mem_lvl = PERF_MEM_LVL_MISS | PERF_MEM_LVL_L1;
+   break;
+   case ARM_SPE_L1D_ACCESS:
+   data_src.mem_lvl_num = PERF_MEM_LVLNUM_L1;
+   data_src.mem_lvl = PERF_MEM_LVL_HIT | PERF_MEM_LVL_L1;
+   break;
+   case ARM_SPE_LLC_MISS:
+   data_src.mem_lvl_num = PERF_MEM_LVLNUM_L3;
+   data_src.mem_lvl = PERF_MEM_LVL_MISS | PERF_MEM_LVL_L3;
+   break;
+   case ARM_SPE_LLC_ACCESS:
+   data_src.mem_lvl_num = PERF_MEM_LVLNUM_L3;
+   data_src.mem_lvl = PERF_MEM_LVL_HIT | PERF_MEM_LVL_L3;
+   break;
+   case ARM_SPE_TLB_MISS:
+   data_src.mem_dtlb = PERF_MEM_TLB_WK | PERF_MEM_TLB_MISS;
+   break;
+   case ARM_SPE_TLB_ACCESS:
+   data_src.mem_dtlb = PERF_MEM_TLB_WK | PERF_MEM_TLB_HIT;
+   break;
+   case ARM_SPE_REMOTE_ACCESS:
+   data_src.mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
+   data_src.mem_lvl = PERF_MEM_LVL_HIT | PERF_MEM_LVL_REM_CCE1;
+   break;
+   default:
+   break;
+   }
+
+   return data_src.val;
+}
+
 static int arm_spe_sample(struct arm_spe_queue *speq)
 {
const struct arm_spe_record *record = &speq->decoder->record;
struct arm_spe *spe = speq->spe;
+   u64 data_src;
int err;
 
if (spe->sample_flc) {
if (record->type & ARM_SPE_L1D_MISS) {
-   err = arm_spe__synth_mem_sample(speq, spe->l1d_miss_id);
+   data_src = arm_spe__synth_data_source(record,
+ ARM_SPE_L1D_MISS);
+
+   err = arm_spe__synth_mem_sample(speq, spe->l1d_miss_id,
+   data_src);
if (err)
return err;
}
 
if (record->type & ARM_SPE_L1D_ACCESS) {
-   err = arm_spe__synth_mem_sample(speq, spe->l1d_access_id);
+   data_src = arm_spe__synth_data_source(record,
+ ARM_SPE_L1D_ACCESS);
+
+   err = arm_spe__synth_mem_sample(speq, spe->l1d_access_id,
+   data_src);
if (err)
return err;
}
@@ -315,13 +369,21 @@ static int arm_spe_sample

[PATCH RESEND v1 03/11] perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE

2020-08-05 Thread Leo Yan
The existing architectures that support perf memory profiling usually
provide two types of hardware events: load and store, so to profile
memory for both load and store operations, the tool uses these two
events at the same time.  But this does not hold for an AUX tracing
event: the same event can be used with different configurations for
memory operation filtering, e.g. the event can be used to trace only
memory loads, only memory stores, or both memory loads and stores.

This patch introduces a new event, PERF_MEM_EVENTS__LOAD_STORE, to
support an event that can record both memory load and store
operations.

Signed-off-by: Leo Yan 
---
 tools/perf/builtin-mem.c | 11 +--
 tools/perf/util/mem-events.h |  1 +
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 9a7df8d01296..bd4229ca3685 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -19,8 +19,9 @@
 #include "util/symbol.h"
 #include 
 
-#define MEM_OPERATION_LOAD 0x1
-#define MEM_OPERATION_STORE0x2
+#define MEM_OPERATION_LOAD 0x1
+#define MEM_OPERATION_STORE0x2
+#define MEM_OPERATION_LOAD_STORE   0x4
 
 struct perf_mem {
struct perf_tooltool;
@@ -97,6 +98,11 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
e->record = true;
}
 
+   if (mem->operation & MEM_OPERATION_LOAD_STORE) {
+   e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD_STORE);
+   e->record = true;
+   }
+
e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
if (e->record)
rec_argv[i++] = "-W";
@@ -326,6 +332,7 @@ struct mem_mode {
 static const struct mem_mode mem_modes[]={
MEM_OPT("load", MEM_OPERATION_LOAD),
MEM_OPT("store", MEM_OPERATION_STORE),
+   MEM_OPT("ldst", MEM_OPERATION_LOAD_STORE),
MEM_END
 };
 
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 726a9c8103e4..5ef178278909 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -28,6 +28,7 @@ struct mem_info {
 enum {
PERF_MEM_EVENTS__LOAD,
PERF_MEM_EVENTS__STORE,
+   PERF_MEM_EVENTS__LOAD_STORE,
PERF_MEM_EVENTS__MAX,
 };
 
-- 
2.17.1



[PATCH RESEND v1 08/11] perf arm-spe: Save memory addresses in packet

2020-08-05 Thread Leo Yan
This patch saves the virtual and physical memory addresses in the
packet; the address info can then be used for generating memory
samples.

Signed-off-by: Leo Yan 
---
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 4 
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
index 93e063f22be5..373dc2d1cf06 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -162,6 +162,10 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder)
decoder->record.from_ip = ip;
else if (idx == SPE_ADDR_PKT_HDR_INDEX_BRANCH)
decoder->record.to_ip = ip;
+   else if (idx == SPE_ADDR_PKT_HDR_INDEX_DATA_VIRT)
+   decoder->record.addr = ip;
+   else if (idx == SPE_ADDR_PKT_HDR_INDEX_DATA_PHYS)
+   decoder->record.phys_addr = ip;
break;
case ARM_SPE_COUNTER:
break;
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index a5111a8d4360..5acddfcffbd1 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -47,6 +47,8 @@ struct arm_spe_record {
u64 from_ip;
u64 to_ip;
u64 timestamp;
+   u64 addr;
+   u64 phys_addr;
 };
 
 struct arm_spe_insn;
-- 
2.17.1



[PATCH RESEND v1 10/11] perf arm-spe: Fill address info for memory samples

2020-08-05 Thread Leo Yan
Since the Arm SPE backend decoder passes virtual and physical address
info through the packet, this address info can be filled into the
synthesized samples, and the addresses can then be used for memory
profiling.

To support memory-related samples, this patch splits sample generation
into two functions:
  - arm_spe__synth_mem_sample() synthesizes memory access and TLB
related samples;
  - arm_spe__synth_branch_sample() synthesizes branch samples, which
are mainly for branch misprediction.

Signed-off-by: Leo Yan 
---
 tools/perf/util/arm-spe.c | 52 +++
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index c2cf5058648f..74308a72b000 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -235,7 +235,6 @@ static void arm_spe_prep_sample(struct arm_spe *spe,
sample->cpumode = arm_spe_cpumode(spe, sample->ip);
sample->pid = speq->pid;
sample->tid = speq->tid;
-   sample->addr = record->to_ip;
sample->period = 1;
sample->cpu = speq->cpu;
 
@@ -259,18 +258,37 @@ arm_spe_deliver_synth_event(struct arm_spe *spe,
return ret;
 }
 
-static int
-arm_spe_synth_spe_events_sample(struct arm_spe_queue *speq,
-   u64 spe_events_id)
+static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
+u64 spe_events_id)
 {
struct arm_spe *spe = speq->spe;
+   struct arm_spe_record *record = &speq->decoder->record;
+   union perf_event *event = speq->event_buf;
+   struct perf_sample sample = { 0 };
+
+   arm_spe_prep_sample(spe, speq, event, &sample);
+
+   sample.id = spe_events_id;
+   sample.stream_id = spe_events_id;
+   sample.addr = record->addr;
+   sample.phys_addr = record->phys_addr;
+
+   return arm_spe_deliver_synth_event(spe, speq, event, &sample);
+}
+
+static int arm_spe__synth_branch_sample(struct arm_spe_queue *speq,
+   u64 spe_events_id)
+{
+   struct arm_spe *spe = speq->spe;
+   struct arm_spe_record *record = &speq->decoder->record;
union perf_event *event = speq->event_buf;
-   struct perf_sample sample = { .ip = 0, };
+   struct perf_sample sample = { 0 };
 
arm_spe_prep_sample(spe, speq, event, &sample);
 
sample.id = spe_events_id;
sample.stream_id = spe_events_id;
+   sample.addr = record->to_ip;
 
return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }
@@ -283,15 +301,13 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 
if (spe->sample_flc) {
if (record->type & ARM_SPE_L1D_MISS) {
-   err = arm_spe_synth_spe_events_sample(
-   speq, spe->l1d_miss_id);
+   err = arm_spe__synth_mem_sample(speq, spe->l1d_miss_id);
if (err)
return err;
}
 
if (record->type & ARM_SPE_L1D_ACCESS) {
-   err = arm_spe_synth_spe_events_sample(
-   speq, spe->l1d_access_id);
+   err = arm_spe__synth_mem_sample(speq, spe->l1d_access_id);
if (err)
return err;
}
@@ -299,15 +315,13 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 
if (spe->sample_llc) {
if (record->type & ARM_SPE_LLC_MISS) {
-   err = arm_spe_synth_spe_events_sample(
-   speq, spe->llc_miss_id);
+   err = arm_spe__synth_mem_sample(speq, spe->llc_miss_id);
if (err)
return err;
}
 
if (record->type & ARM_SPE_LLC_ACCESS) {
-   err = arm_spe_synth_spe_events_sample(
-   speq, spe->llc_access_id);
+   err = arm_spe__synth_mem_sample(speq, spe->llc_access_id);
if (err)
return err;
}
@@ -315,31 +329,27 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 
if (spe->sample_tlb) {
if (record->type & ARM_SPE_TLB_MISS) {
-   err = arm_spe_synth_spe_events_sample(
-   speq, spe->tlb_miss_id);
+   err = arm_spe__synth_mem_sample(speq, spe->tlb_miss_id);
if (err)
return err;
}
 
if (record->type & ARM_SPE_TLB_ACCESS) {
-   err = arm_spe_synth_spe_events_sample(
-   speq, spe->tlb_access_id);
+   err = arm_spe__synth_mem_sample(speq, spe->tlb_access_id);

Re: [git pull] drm next for 5.9-rc1

2020-08-05 Thread pr-tracker-bot
The pull request you sent on Thu, 6 Aug 2020 11:07:02 +1000:

> git://anongit.freedesktop.org/drm/drm tags/drm-next-2020-08-06

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/8186749621ed6b8fc42644c399e8c755a2b6f630

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


[PATCH RESEND v1 09/11] perf arm-spe: Store operation types in packet

2020-08-05 Thread Leo Yan
This patch stores the operation types in the packet structure; the
frontend can use them to generate memory access info for samples.

Signed-off-by: Leo Yan 
---
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 11 +++
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h |  6 ++
 2 files changed, 17 insertions(+)

diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
index 373dc2d1cf06..cba394784b0d 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -172,6 +172,17 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder)
case ARM_SPE_CONTEXT:
break;
case ARM_SPE_OP_TYPE:
+   /*
+* When operation type packet header's class equals 1,
+* the payload's least significant bit (LSB) indicates
+* the operation type: load/swap or store.
+*/
+   if (idx == 1) {
+   if (payload & 0x1)
+   decoder->record.op = ARM_SPE_ST;
+   else
+   decoder->record.op = ARM_SPE_LD;
+   }
break;
case ARM_SPE_EVENTS:
if (payload & BIT(EV_L1D_REFILL))
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index 5acddfcffbd1..f23188282ef0 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -41,9 +41,15 @@ enum arm_spe_sample_type {
ARM_SPE_REMOTE_ACCESS   = 1 << 7,
 };
 
+enum arm_spe_op_type {
+   ARM_SPE_LD  = 1 << 0,
+   ARM_SPE_ST  = 1 << 1,
+};
+
 struct arm_spe_record {
enum arm_spe_sample_type type;
int err;
+   u32 op;
u64 from_ip;
u64 to_ip;
u64 timestamp;
-- 
2.17.1



[PATCH RESEND v1 07/11] perf arm-spe: Enable attribution PERF_SAMPLE_DATA_SRC

2020-08-05 Thread Leo Yan
This patch enables the attribute PERF_SAMPLE_DATA_SRC for the perf
data; when the trace data is decoded, it tells the tool that the data
contains memory samples.

Signed-off-by: Leo Yan 
---
 tools/perf/util/arm-spe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 3882a5360ada..c2cf5058648f 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -803,7 +803,7 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
attr.type = PERF_TYPE_HARDWARE;
attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK;
attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
-   PERF_SAMPLE_PERIOD;
+   PERF_SAMPLE_PERIOD | PERF_SAMPLE_DATA_SRC;
if (spe->timeless_decoding)
attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
else
-- 
2.17.1



[PATCH RESEND v1 01/11] perf mem: Search event name with more flexible path

2020-08-05 Thread Leo Yan
The perf tool searches for memory event names under the folder
'/sys/devices/cpu/events/', which restricts memory profiling to events
located there.  Thus it is impossible to use any other event for
memory profiling, e.g. the Arm SPE hardware event is not located in
'/sys/devices/cpu/events/', so it cannot be enabled for memory
profiling.

This patch changes the search folder from '/sys/devices/cpu/events/'
to '/sys/devices', giving the flexibility to find events that can be
used for memory profiling.

Signed-off-by: Leo Yan 
---
 tools/perf/util/mem-events.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index ea0af0bc4314..35c8d175a9d2 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -18,8 +18,8 @@ unsigned int perf_mem_events__loads_ldlat = 30;
 #define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
 
 struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
-   E("ldlat-loads","cpu/mem-loads,ldlat=%u/P", "mem-loads"),
-   E("ldlat-stores",   "cpu/mem-stores/P", "mem-stores"),
+   E("ldlat-loads","cpu/mem-loads,ldlat=%u/P", "cpu/events/mem-loads"),
+   E("ldlat-stores",   "cpu/mem-stores/P", "cpu/events/mem-stores"),
 };
 #undef E
 
@@ -93,7 +93,7 @@ int perf_mem_events__init(void)
struct perf_mem_event *e = &perf_mem_events[j];
struct stat st;
 
-   scnprintf(path, PATH_MAX, "%s/devices/cpu/events/%s",
+   scnprintf(path, PATH_MAX, "%s/devices/%s",
  mnt, e->sysfs_name);
 
if (!stat(path, &st))
-- 
2.17.1



[PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE

2020-08-05 Thread Leo Yan
This patch set supports AUX trace and Arm SPE as the first enabled
hardware tracing for the perf memory tool.

Patches 01 ~ 04 are preparation patches which mainly resolve the issue
that the existing code hard-codes the memory events based on the x86
and PowerPC architectures; they extend the code to support more
flexible memory event names and introduce weak functions so that every
architecture can define its own memory event structure and return the
event pointer and name respectively.

Patch 05 extends the perf memory tool to support AUX trace.

Patches 06 ~ 11 support Arm SPE in the perf memory tool.  First they
register SPE events as memory events, then they extend the SPE packet
to pass address info and operation types, and also set the 'data_src'
field so the tool can display a readable string in the result.

This patch set has been tested on the ARMv8 Hisilicon D06 platform.  I
noted that the 'data object' cannot be displayed properly; this should
be a separate issue that needs to be checked separately.  Below is the
testing result:

# Samples: 73  of event 'l1d-miss'
# Total weight : 73
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead   Samples  Local Weight  Memory access  Symbol                              Shared Object      Data Symbol             Data Object  Snoop  TLB access  Locked
# ........  ........  ............  .............  ..................................  .................  ......................  ...........  .....  ..........  ......
#
 2.74%  2  0  L1 or L1 miss  [k] perf_iterate_ctx.constprop.151  [kernel.kallsyms]  [k] 0x2027aacb08a8  [unknown]  N/A  N/A  No
 2.74%  2  0  L1 or L1 miss  [k] perf_iterate_ctx.constprop.151  [kernel.kallsyms]  [k] 0x2027be6488a8  [unknown]  N/A  N/A  No
 2.74%  2  0  L1 or L1 miss  [k] perf_iterate_ctx.constprop.151  [kernel.kallsyms]  [k] 0x2027c432f8a8  [unknown]  N/A  N/A  No
 1.37%  1  0  L1 or L1 miss  [k] __arch_copy_to_user             [kernel.kallsyms]  [k] 0x0027a65352a0  [unknown]  N/A  N/A  No
 1.37%  1  0  L1 or L1 miss  [k] __d_lookup_rcu                  [kernel.kallsyms]  [k] 0x0027d3cbf468  [unknown]  N/A  N/A  No
 1.37%  1  0  L1 or L1 miss  [k] __d_lookup_rcu                  [kernel.kallsyms]  [k] 0x0027d8f44490  [unknown]  N/A  N/A  No
 [...]


# Samples: 101  of event 'l1d-access'
# Total weight : 101
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead   Samples  Local Weight  Memory access  Symbol                              Shared Object      Data Symbol               Data Object             Snoop  TLB access  Locked
# ........  ........  ............  .............  ..................................  .................  ........................  ......................  .....  ..........  ......
#
 2.97%  3  0  L1 or L1 hit  [k] perf_event_mmap                 [kernel.kallsyms]  [k] perf_swevent+0x5c  [kernel.kallsyms].data  N/A  N/A  No
 1.98%  2  0  L1 or L1 hit  [k] kmem_cache_alloc                [kernel.kallsyms]  [k] 0x2027af40e3d0     [unknown]               N/A  N/A  No
 1.98%  2  0  L1 or L1 hit  [k] perf_iterate_ctx.constprop.151  [kernel.kallsyms]  [k] 0x2027aacb08a8     [unknown]               N/A  N/A  No
 1.98%  2  0  L1 or L1 hit  [k] perf_iterate_ctx.constprop.151  [kernel.kallsyms]  [k] 0x2027be6488a8     [unknown]               N/A  N/A  No
 1.98%  2  0  L1 or L1 hit  [k] perf_iterate_ctx.constprop.151  [kernel.k

[PATCH RESEND v1 02/11] perf mem: Introduce weak function perf_mem_events__ptr()

2020-08-05 Thread Leo Yan
Different architectures might use different events or different event
parameters for memory profiling; this patch introduces the weak
function perf_mem_events__ptr(), which allows returning an
architecture specific memory event.

After the function perf_mem_events__ptr() is introduced, the variable
'perf_mem_events' can be accessed through this new function; so mark
the variable as 'static', which allows each architecture to define its
own memory event array.

Signed-off-by: Leo Yan 
---
 tools/perf/builtin-c2c.c | 18 --
 tools/perf/builtin-mem.c | 21 ++---
 tools/perf/util/mem-events.c | 26 +++---
 tools/perf/util/mem-events.h |  2 +-
 4 files changed, 46 insertions(+), 21 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 5938b100eaf4..88e68f36aa62 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2914,6 +2914,7 @@ static int perf_c2c__record(int argc, const char **argv)
int ret;
bool all_user = false, all_kernel = false;
bool event_set = false;
+   struct perf_mem_event *e;
struct option options[] = {
OPT_CALLBACK('e', "event", &event_set, "event",
 "event selector. Use 'perf mem record -e list' to list available events",
@@ -2941,11 +2942,15 @@ static int perf_c2c__record(int argc, const char **argv)
rec_argv[i++] = "record";
 
if (!event_set) {
-   perf_mem_events[PERF_MEM_EVENTS__LOAD].record  = true;
-   perf_mem_events[PERF_MEM_EVENTS__STORE].record = true;
+   e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+   e->record = true;
+
+   e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
+   e->record = true;
}
 
-   if (perf_mem_events[PERF_MEM_EVENTS__LOAD].record)
+   e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+   if (e->record)
rec_argv[i++] = "-W";
 
rec_argv[i++] = "-d";
@@ -2953,12 +2958,13 @@ static int perf_c2c__record(int argc, const char **argv)
rec_argv[i++] = "--sample-cpu";
 
for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
-   if (!perf_mem_events[j].record)
+   e = perf_mem_events__ptr(j);
+   if (!e->record)
continue;
 
-   if (!perf_mem_events[j].supported) {
+   if (!e->supported) {
pr_err("failed: event '%s' not supported\n",
-  perf_mem_events[j].name);
+  perf_mem_events__name(j));
free(rec_argv);
return -1;
}
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 3523279af6af..9a7df8d01296 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -64,6 +64,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
const char **rec_argv;
int ret;
bool all_user = false, all_kernel = false;
+   struct perf_mem_event *e;
struct option options[] = {
OPT_CALLBACK('e', "event", &mem, "event",
 "event selector. use 'perf mem record -e list' to list available events",
@@ -86,13 +87,18 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
 
rec_argv[i++] = "record";
 
-   if (mem->operation & MEM_OPERATION_LOAD)
-   perf_mem_events[PERF_MEM_EVENTS__LOAD].record = true;
+   if (mem->operation & MEM_OPERATION_LOAD) {
+   e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+   e->record = true;
+   }
 
-   if (mem->operation & MEM_OPERATION_STORE)
-   perf_mem_events[PERF_MEM_EVENTS__STORE].record = true;
+   if (mem->operation & MEM_OPERATION_STORE) {
+   e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
+   e->record = true;
+   }
 
-   if (perf_mem_events[PERF_MEM_EVENTS__LOAD].record)
+   e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+   if (e->record)
rec_argv[i++] = "-W";
 
rec_argv[i++] = "-d";
@@ -101,10 +107,11 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
rec_argv[i++] = "--phys-data";
 
for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
-   if (!perf_mem_events[j].record)
+   e = perf_mem_events__ptr(j);
+   if (!e->record)
continue;
 
-   if (!perf_mem_events[j].supported) {
+   if (!e->supported) {
pr_err("failed: event '%s' not supported\n",
   perf_mem_events__name(j));
free(rec_argv);
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 35c8d175a9d2..7a5a0d699e27 100644
--- a/tools/perf/util/mem-events.c
++
