Re: [RESEND PATCH 2/3] nouveau: fix mixed normal and device private page migration

2020-06-22 Thread Ralph Campbell



On 6/22/20 5:30 PM, John Hubbard wrote:

On 2020-06-22 16:38, Ralph Campbell wrote:

The OpenCL function clEnqueueSVMMigrateMem(), without any flags, will
migrate memory in the given address range to device private memory. The
source pages might already have been migrated to device private memory.
In that case, the source struct page is not checked to see if it is
a device private page and incorrectly computes the GPU's physical
address of local memory leading to data corruption.
Fix this by checking the source struct page and computing the correct
physical address.

Signed-off-by: Ralph Campbell 
---
  drivers/gpu/drm/nouveau/nouveau_dmem.c | 8 ++++++++
  1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index cc9993837508..f6a806ba3caa 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -540,6 +540,12 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
  if (!(src & MIGRATE_PFN_MIGRATE))
  goto out;
+    if (spage && is_device_private_page(spage)) {
+    paddr = nouveau_dmem_page_addr(spage);
+    *dma_addr = DMA_MAPPING_ERROR;
+    goto done;
+    }
+
  dpage = nouveau_dmem_page_alloc_locked(drm);
  if (!dpage)
  goto out;
@@ -560,6 +566,7 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
  goto out_free_page;
  }
+done:
  *pfn = NVIF_VMM_PFNMAP_V0_V | NVIF_VMM_PFNMAP_V0_VRAM |
  ((paddr >> PAGE_SHIFT) << NVIF_VMM_PFNMAP_V0_ADDR_SHIFT);
  if (src & MIGRATE_PFN_WRITE)
@@ -615,6 +622,7 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
  struct migrate_vma args = {
  .vma    = vma,
  .start    = start,
+    .src_owner    = drm->dev,


Hi Ralph,

This .src_owner setting does look like a required fix, but it seems like
a completely separate fix from what is listed in this patch's commit
description, right? (It feels like a casualty of rearranging the patches.)


thanks,


It's a bit more complex. There is a catch-22 here with the change to mm/migrate.c.
Without this patch or the mm/migrate.c change, a second call to
clEnqueueSVMMigrateMem() for the same address range will invalidate the GPU
mapping to device private memory created by the first call.
With this patch but not the mm/migrate.c change, the first call to
clEnqueueSVMMigrateMem() will fail to migrate normal anonymous memory to device
private memory.
Without this patch but with the mm/migrate.c change, a second call to
clEnqueueSVMMigrateMem() will crash the kernel because dma_map_page() will be
called with the device private PFN, which is not a valid CPU physical address.
With both changes, a range of anonymous and device private pages can be migrated
to the GPU and the GPU page tables updated properly.


Re: [PATCH v2 1/3] mfd: core: Make a best effort attempt to match devices with the correct of_nodes

2020-06-22 Thread Frank Rowand
On 2020-06-22 20:17, Frank Rowand wrote:
> On 2020-06-22 17:23, Frank Rowand wrote:
>> On 2020-06-22 14:11, Lee Jones wrote:
>>> On Mon, 22 Jun 2020, Frank Rowand wrote:
>>>
 On 2020-06-22 10:10, Lee Jones wrote:
> On Mon, 22 Jun 2020, Frank Rowand wrote:
>
>> On 2020-06-22 03:50, Lee Jones wrote:
>>> On Thu, 18 Jun 2020, Frank Rowand wrote:
>>>
 On 2020-06-15 04:26, Lee Jones wrote:
> On Sun, 14 Jun 2020, Frank Rowand wrote:
>
>> Hi Lee,
>>
>> I'm looking at 5.8-rc1.
>>
>> The only use of OF_MFD_CELL() where the same compatible is specified
>> for multiple elements of a struct mfd_cell array is for compatible
>> "stericsson,ab8500-pwm" in drivers/mfd/ab8500-core.c:
>>
>> OF_MFD_CELL("ab8500-pwm",
>> NULL, NULL, 0, 1, "stericsson,ab8500-pwm"),
>> OF_MFD_CELL("ab8500-pwm",
>> NULL, NULL, 0, 2, "stericsson,ab8500-pwm"),
>> OF_MFD_CELL("ab8500-pwm",
>> NULL, NULL, 0, 3, "stericsson,ab8500-pwm"),

  OF_MFD_CELL("ab8500-pwm",
  NULL, NULL, 0, 0, "stericsson,ab8500-pwm"),

  OF_MFD_CELL_REG("ab8500-pwm-mc",
  NULL, NULL, 0, 0, "stericsson,ab8500-pwm", 0),
  OF_MFD_CELL_REG("ab8500-pwm-mc",
  NULL, NULL, 0, 1, "stericsson,ab8500-pwm", 1),
  OF_MFD_CELL_REG("ab8500-pwm-mc",
  NULL, NULL, 0, 2, "stericsson,ab8500-pwm", 2),

>>
>> The only .dts or .dtsi files where I see compatible 
>> "stericsson,ab8500-pwm"
>> are:
>>
>>arch/arm/boot/dts/ste-ab8500.dtsi
>>arch/arm/boot/dts/ste-ab8505.dtsi
>>
>> These two .dtsi files only have a single node with this compatible.
>> Chasing back to .dts and .dtsi files that include these two .dtsi
>> files, I see no case where there are multiple nodes with this
>> compatible.
>>
>> So it looks to me like there is no .dts in mainline that is providing
>> the three "stericsson,ab8500-pwm" nodes that 
>> drivers/mfd/ab8500-core.c
>> is expecting.  No case that there are multiple mfd child nodes where
>> mfd_add_device() would assign the first of n child nodes with the
>> same compatible to multiple devices.
>>
>> So it appears to me that drivers/mfd/ab8500-core.c is currently 
>> broken.
>> Am I missing something here?
>>
>> If I am correct, then either drivers/mfd/ab8500-core.c or
>> ste-ab8500.dtsi and ste-ab8505.dtsi need to be fixed.
>
> Your analysis is correct.

 OK, if I'm not overlooking anything, that is good news.

 Existing .dts source files only have one "ab8500-pwm" child.  They 
 already
 work correcly.

 Create a new compatible for the case of multiple children.  In my 
 example
 I will add "-mc" (multiple children) to the existing compatible.  There
 is likely a better name, but this lets me provide an example.

 Modify drivers/mfd/ab8500-core.c to use the new compatible, and new 
 .dts
 source files with multiple children use the new compatible:

  OF_MFD_CELL("ab8500-pwm",
  NULL, NULL, 0, 0, "stericsson,ab8500-pwm"),

  OF_MFD_CELL_REG("ab8500-pwm-mc",
  NULL, NULL, 0, 0, "stericsson,ab8500-pwm", 0),
  OF_MFD_CELL_REG("ab8500-pwm-mc",
  NULL, NULL, 0, 1, "stericsson,ab8500-pwm", 1),
  OF_MFD_CELL_REG("ab8500-pwm-mc",
  NULL, NULL, 0, 2, "stericsson,ab8500-pwm", 2),

 The "OF_MFD_CELL" entry is the existing entry, which will handle 
 current
 .dts source files.  The new "OF_MFD_CELL_REG" entries will handle new
 .dts source files.
>>>
>>> Sorry, but I'm not sure what the above exercise is supposed to solve.
>>>
>>> Could you explain it for me please?
>>
>> The OF_MFD_CELL() entry handles all of the existing .dts source files
>> that only have one ab8500-pwm child nodes.  So existing .dtb blobs
>> continue to work.
>>
>> The OF_MFD_CELL_REG() entries will handle all of the new .dts source
>> files that will have up to 3 ab8500-pwm child nodes.
>>
>> Compatibility is maintained for existing .dtb files.  A new kernel
>> version with the changes will support new .dtb files that contain
>> multiple ab8500-pwm child nodes.
>
> I can see *what* you're trying to do.  I was 

Re: [PATCH 07/10] media: mtk-vcodec: venc: remove redundant code

2020-06-22 Thread Tiffany Lin
On Mon, 2020-06-22 at 22:10 +0900, Alexandre Courbot wrote:
> On Fri, Jun 19, 2020 at 3:59 PM Tiffany Lin  wrote:
> >
> > On Wed, 2020-05-20 at 17:27 +0900, Alexandre Courbot wrote:
> > > vidioc_try_fmt() does clamp height and width when called on the OUTPUT
> > > queue, so clamping them prior to calling this function is redundant. Set
> > > the queue's parameters after calling vidioc_try_fmt() so we can use the
> > > values it computed.
> > >
> >
> > vidioc_try_fmt clamps height and width only when f->type ==
> > V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE
> >
> > Does this cleanup pass v4l2 compliance test?
> 
> It doesn't result in more tests failing, at least. :) Although I
> cannot test with a pristine upstream version, it seems like some tests
> are not passing to begin with. If you have different results with a
> true upstream I would like to hear about it. Otherwise I am willing to
> help with getting all the tests into the green.
> 
> Regarding this particular patch, you are right that we may end up
> writing an unclamped size in q_data. It's probably better to drop it
> for now.
> 
I did attach the compliance test results when I upstreamed the first version.
That is how the maintainer makes sure all v4l2 drivers implement the
interfaces the same way: conformance to the interface spec is checked
automatically rather than relying on the review flow alone.


> > I recall the compliance test will try different formats and make sure the
> > driver responds with enough information?
> >
> >
> > > Signed-off-by: Alexandre Courbot 
> > > ---
> > >  .../media/platform/mtk-vcodec/mtk_vcodec_enc.c   | 16 ++++------------
> > >  1 file changed, 4 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c b/drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c
> > > index 05743a745a11..f0af78f112db 100644
> > > --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c
> > > +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c
> > > @@ -449,7 +448,6 @@ static int vidioc_venc_s_fmt_out(struct file *file, void *priv,
> > >   struct mtk_q_data *q_data;
> > >   int ret, i;
> > >   const struct mtk_video_fmt *fmt;
> > > - struct v4l2_pix_format_mplane *pix_fmt_mp = &f->fmt.pix_mp;
> > >
> > >   vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
> > >   if (!vq) {
> > > @@ -474,20 +473,13 @@ static int vidioc_venc_s_fmt_out(struct file *file, void *priv,
> > >   f->fmt.pix.pixelformat = fmt->fourcc;
> > >   }
> > >
> > > - pix_fmt_mp->height = clamp(pix_fmt_mp->height,
> > > - MTK_VENC_MIN_H,
> > > - MTK_VENC_MAX_H);
> > > - pix_fmt_mp->width = clamp(pix_fmt_mp->width,
> > > - MTK_VENC_MIN_W,
> > > - MTK_VENC_MAX_W);
> > > -
> > > - q_data->visible_width = f->fmt.pix_mp.width;
> > > - q_data->visible_height = f->fmt.pix_mp.height;
> > > - q_data->fmt = fmt;
> > > - ret = vidioc_try_fmt(f, q_data->fmt);
> > > + ret = vidioc_try_fmt(f, fmt);
> > >   if (ret)
> > >   return ret;
> > >
> > > + q_data->fmt = fmt;
> > > + q_data->visible_width = f->fmt.pix_mp.width;
> > > + q_data->visible_height = f->fmt.pix_mp.height;
> > >   q_data->coded_width = f->fmt.pix_mp.width;
> > >   q_data->coded_height = f->fmt.pix_mp.height;
> > >
> >



[PATCH] drm/omap: Remove aggregate initialization of new_mode in omap_connector_mode_valid

2020-06-22 Thread Nathan Chancellor
After commit 42acb06b01b1 ("drm: pahole struct drm_display_mode"), clang
warns:

drivers/gpu/drm/omapdrm/omap_connector.c:92:39: warning: braces around
scalar initializer [-Wbraced-scalar-init]
struct drm_display_mode new_mode = { { 0 } };
                                     ^~~~~
1 warning generated.

After the struct was shuffled, the second set of braces is no longer
needed because we are not initializing a structure (struct list_head)
but a regular integer (int clock).

However, looking into it further, this initialization is pointless
because new_mode is used as the destination of drm_mode_copy, where the
members of new_mode will just be completely overwritten with the members
of mode. Just remove the initialization of new_mode so that there is no
more warning and we don't need to worry about updating the
initialization if the structure ever get shuffled again.

Link: https://github.com/ClangBuiltLinux/linux/issues/1059
Suggested-by: Nick Desaulniers 
Signed-off-by: Nathan Chancellor 
---
 drivers/gpu/drm/omapdrm/omap_connector.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/omapdrm/omap_connector.c b/drivers/gpu/drm/omapdrm/omap_connector.c
index 528764566b17..ce4da1511920 100644
--- a/drivers/gpu/drm/omapdrm/omap_connector.c
+++ b/drivers/gpu/drm/omapdrm/omap_connector.c
@@ -89,7 +89,7 @@ static enum drm_mode_status omap_connector_mode_valid(struct drm_connector *conn
 struct drm_display_mode *mode)
 {
struct omap_connector *omap_connector = to_omap_connector(connector);
-   struct drm_display_mode new_mode = { { 0 } };
+   struct drm_display_mode new_mode;
enum drm_mode_status status;
 
status = omap_connector_mode_fixup(omap_connector->output, mode,

base-commit: 27f11fea33608cbd321a97cbecfa2ef97dcc1821
-- 
2.27.0



Re: [PATCH] tpm_tis_spi: Prefer async probe

2020-06-22 Thread Jarkko Sakkinen
On Fri, Jun 19, 2020 at 02:20:01PM -0700, Douglas Anderson wrote:
> On a Chromebook I'm working on I noticed a big (~1 second) delay
> during bootup where nothing was happening.  Right around this big
> delay there were messages about the TPM:
> 
> [2.311352] tpm_tis_spi spi0.0: TPM ready IRQ confirmed on attempt 2
> [3.332790] tpm_tis_spi spi0.0: Cr50 firmware version: ...
> 
> I put a few printouts in and saw that tpm_tis_spi_init() (specifically
> tpm_chip_register() in that function) was taking the lion's share of
> this time, though ~115 ms of the time was in cr50_print_fw_version().
> 
> Let's make a one-line change to prefer async probe for tpm_tis_spi.
> There's no reason we need to block other drivers from probing while we
> load.
> 
> NOTES:
> * It's possible that other hardware runs through the init sequence
>   faster than Cr50 and this isn't such a big problem for them.
>   However, even if they are faster they are still doing _some_
>   transfers over a SPI bus so this should benefit everyone even if to
>   a lesser extent.
> * It's possible that there are extra delays in the code that could be
>   optimized out.  I didn't dig since once I enabled async probe they
>   no longer impacted me.
> 
> Signed-off-by: Douglas Anderson 
> ---
> 
>  drivers/char/tpm/tpm_tis_spi_main.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/char/tpm/tpm_tis_spi_main.c b/drivers/char/tpm/tpm_tis_spi_main.c
> index d96755935529..422766445373 100644
> --- a/drivers/char/tpm/tpm_tis_spi_main.c
> +++ b/drivers/char/tpm/tpm_tis_spi_main.c
> @@ -288,6 +288,7 @@ static struct spi_driver tpm_tis_spi_driver = {
>   .pm = &tpm_tis_pm,
>   .of_match_table = of_match_ptr(of_tis_spi_match),
>   .acpi_match_table = ACPI_PTR(acpi_tis_spi_match),
> + .probe_type = PROBE_PREFER_ASYNCHRONOUS,
>   },
>   .probe = tpm_tis_spi_driver_probe,
>   .remove = tpm_tis_spi_remove,
> -- 
> 2.27.0.111.gc72c7da667-goog
> 


Reviewed-by: Jarkko Sakkinen 

/Jarkko


linux-next: manual merge of the drm-intel tree with Linus' tree

2020-06-22 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the drm-intel tree got a conflict in:

  drivers/gpu/drm/i915/i915_drv.h

between commit:

  7fb81e9d8073 ("drm/i915: Use drmm_add_final_kfree")

from Linus' tree and commit:

  8a25c4be583d ("drm/i915/params: switch to device specific parameters")

from the drm-intel tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/gpu/drm/i915/i915_drv.h
index adb9bf34cf97,2697960f15a9..
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@@ -826,9 -827,9 +827,12 @@@ struct i915_selftest_stash 
  struct drm_i915_private {
struct drm_device drm;
  
+   /* i915 device parameters */
+   struct i915_params params;
+ 
 +  /* FIXME: Device release actions should all be moved to drmm_ */
 +  bool do_release;
 +
const struct intel_device_info __info; /* Use INTEL_INFO() to access. */
struct intel_runtime_info __runtime; /* Use RUNTIME_INFO() to access. */
struct intel_driver_caps caps;




Re: [PATCH 3/4] powerpc/pseries/iommu: Move window-removing part of remove_ddw into remove_dma_window

2020-06-22 Thread Oliver O'Halloran
On Tue, Jun 23, 2020 at 11:12 AM Alexey Kardashevskiy  wrote:
>
> On 23/06/2020 04:59, Leonardo Bras wrote:
> >
> >> Also, despite this particular file, the "pdn" name is usually used for
> >> struct pci_dn (not device_node), let's keep it that way.
> >
> > Sure, I got confused for some time about this, as we have:
> > static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn).
> > but on *_ddw() we have "struct pci_dn *pdn".
>
> True again, not the cleanest style here.
>
>
> > I will also add a patch that renames those 'struct device_node *pdn' to
> > something like 'struct device_node *parent_dn'.

I usually go with "np" or "node". In this case I'd use "parent_np" or
just "parent." As you said pci_dn conventionally uses pdn so that
should be avoided if at all possible. There's some places that just
use "dn" for device_node, but I don't think that's something we should
encourage due to how similar it is to pdn.

> I would not go that far, we (well, Oliver) are getting rid of many
> occurrences of pci_dn and Oliver may have a stronger opinion here.

I'm trying to remove the use of pci_dn from non-RTAS platforms which
doesn't apply to pseries. For RTAS platforms having pci_dn sort of
makes sense since it's used to cache data from the device_node and
having it saves you from needing to parse and validate the DT at
runtime since we're supposed to be relying on the FW provided settings
in the DT. I want to get rid of it on PowerNV because it's become a
dumping ground for random bits and pieces of platform specific data.
It's confusing at best and IMO it duplicates a lot of what's already
available in the per-PHB structures which the platform specific stuff
should actually be looking at.

Oliver


Re: [virtio-dev] Re: [PATCH v5 0/3] Support virtio cross-device resources

2020-06-22 Thread David Stevens
Unless there are any remaining objections to these patches, what are
the next steps towards getting these merged? Sorry, I'm not familiar
with the workflow for contributing patches to Linux.

Thanks,
David

On Tue, Jun 9, 2020 at 6:53 PM Michael S. Tsirkin  wrote:
>
> On Tue, Jun 09, 2020 at 10:25:15AM +0900, David Stevens wrote:
> > This patchset implements the current proposal for virtio cross-device
> > resource sharing [1]. It will be used to import virtio resources into
> > the virtio-video driver currently under discussion [2]. The patch
> > under consideration to add support in the virtio-video driver is [3].
> > It uses the APIs from v3 of this series, but the changes to update it
> > are relatively minor.
> >
> > This patchset adds a new flavor of dma-bufs that supports querying the
> > underlying virtio object UUID, as well as adding support for exporting
> > resources from virtgpu.
>
> Gerd, David, if possible, please test this in configuration with
> virtual VTD enabled but with iommu_platform=off
> to make sure we didn't break this config.
>
>
> Besides that, for virtio parts:
>
> Acked-by: Michael S. Tsirkin 
>
>
> > [1] https://markmail.org/thread/2ypjt5cfeu3m6lxu
> > [2] https://markmail.org/thread/p5d3k566srtdtute
> > [3] https://markmail.org/thread/j4xlqaaim266qpks
> >
> > v4 -> v5 changes:
> >  - Remove virtio_dma_buf_export_info.
> >
> > David Stevens (3):
> >   virtio: add dma-buf support for exported objects
> >   virtio-gpu: add VIRTIO_GPU_F_RESOURCE_UUID feature
> >   drm/virtio: Support virtgpu exported resources
> >
> >  drivers/gpu/drm/virtio/virtgpu_drv.c   |  3 +
> >  drivers/gpu/drm/virtio/virtgpu_drv.h   | 20 ++
> >  drivers/gpu/drm/virtio/virtgpu_kms.c   |  4 ++
> >  drivers/gpu/drm/virtio/virtgpu_prime.c | 96 +-
> >  drivers/gpu/drm/virtio/virtgpu_vq.c| 55 +++
> >  drivers/virtio/Makefile                |  2 +-
> >  drivers/virtio/virtio.c|  6 ++
> >  drivers/virtio/virtio_dma_buf.c| 82 ++
> >  include/linux/virtio.h |  1 +
> >  include/linux/virtio_dma_buf.h | 37 ++
> >  include/uapi/linux/virtio_gpu.h| 19 +
> >  11 files changed, 321 insertions(+), 4 deletions(-)
> >  create mode 100644 drivers/virtio/virtio_dma_buf.c
> >  create mode 100644 include/linux/virtio_dma_buf.h
> >
> > --
> > 2.27.0.278.ge193c7cf3a9-goog
>
>
>


Re: [PATCH] KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROL

2020-06-22 Thread Sean Christopherson
On Tue, Jun 23, 2020 at 09:21:28AM +0800, Xiaoyao Li wrote:
> On 6/23/2020 8:51 AM, Sean Christopherson wrote:
> >Remove support for context switching between the guest's and host's
> >desired UMWAIT_CONTROL.  Propagating the guest's value to hardware isn't
> >required for correct functionality, e.g. KVM intercepts reads and writes
> >to the MSR, and the latency effects of the settings controlled by the
> >MSR are not architecturally visible.
> >
> >As a general rule, KVM should not allow the guest to control power
> >management settings unless explicitly enabled by userspace, e.g. see
> >KVM_CAP_X86_DISABLE_EXITS.  E.g. Intel's SDM explicitly states that C0.2
> >can improve the performance of SMT siblings.  A devious guest could
>disable C0.2 so as to improve the performance of their workloads to the
>detriment of workloads running in the host or on other VMs.
> >
> >Wholesale removal of UMWAIT_CONTROL context switching also fixes a race
> >condition where updates from the host may cause KVM to enter the guest
>with the incorrect value.  Because updates are propagated to all
> >CPUs via IPI (SMP function callback), the value in hardware may be
> >stale with respect to the cached value and KVM could enter the guest
> >with the wrong value in hardware.  As above, the guest can't observe the
> >bad value, but it's a weird and confusing wart in the implementation.
> >
> >Removal also fixes the unnecessary usage of VMX's atomic load/store MSR
> >lists.  Using the lists is only necessary for MSRs that are required for
> >correct functionality immediately upon VM-Enter/VM-Exit, e.g. EFER on
> >old hardware, or for MSRs that need to-the-uop precision, e.g. perf
> >related MSRs.  For UMWAIT_CONTROL, the effects are only visible in the
> >kernel via TPAUSE/delay(), and KVM doesn't do any form of delay in
> >vcpu_vmx_run().
> 
> >Using the atomic lists is undesirable as they are more
> >expensive than direct RDMSR/WRMSR.
> 
> Do you mean the extra handling of atomic list facility in kvm? Or just mean
> vm-exit/-entry MSR-load/save in VMX hardware is expensive than direct
> RDMSR/WRMSR instruction?

Both.  The KVM handling is the bigger cost, e.g. requires two VMWRITEs to
update the list counts, on top of the list processing.  The actual ucode
cost is also somewhat expensive if adding an MSR to the list causes the
load/store lists to be activated, e.g. on top of the memory accesses for
the list, VM-Enter ucode needs to do its consistency checks.

Expensive is obviously relative, but as far as the lists are concerned it's
an easy penalty to avoid.


Re: [PATCH 06/10] media: mtk-vcodec: venc: specify supported formats per-chip

2020-06-22 Thread Tiffany Lin
On Mon, 2020-06-22 at 21:44 +0900, Alexandre Courbot wrote:
> On Fri, Jun 19, 2020 at 4:26 PM Tiffany Lin  wrote:
> >
> > On Wed, 2020-05-20 at 17:27 +0900, Alexandre Courbot wrote:
> > > Different chips have different supported bitrate ranges. Move the list
> > > of supported formats to the platform data, and split the output and
> > > capture formats into two lists to make it easier to find the default
> > > format for each queue.
> > >
> >
> > Does this patch pass v4l2 compliance test?
> 
> This should not change the behavior towards userspace at all (it's
> just moving data around and making it more flexible), so the test
> results should not be affected either.
> 
I remember that passing the compliance tests is required for upstreaming.
The tests try to make sure that all V4L2 drivers implement the interfaces in
the same way, so that user space applications can discover and enumerate the
hardware's capabilities.



> >
> >
> > > Signed-off-by: Alexandre Courbot 
> > > ---
> > >  .../platform/mtk-vcodec/mtk_vcodec_drv.h  |   8 ++
> > >  .../platform/mtk-vcodec/mtk_vcodec_enc.c  | 122 +++---
> > >  .../platform/mtk-vcodec/mtk_vcodec_enc_drv.c  |  40 ++
> > >  3 files changed, 95 insertions(+), 75 deletions(-)
> > >
> > > diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h b/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h
> > > index b8f913de8d80..59b4b750666b 100644
> > > --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h
> > > +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h
> > > @@ -313,6 +313,10 @@ enum mtk_chip {
> > >   * @has_lt_irq: whether the encoder uses the LT irq
> > >   * @min_birate: minimum supported encoding bitrate
> > >   * @max_bitrate: maximum supported encoding bitrate
> > > + * @capture_formats: array of supported capture formats
> > > + * @num_capture_formats: number of entries in capture_formats
> > > + * @output_formats: array of supported output formats
> > > + * @num_output_formats: number of entries in output_formats
> > >   */
> > >  struct mtk_vcodec_enc_pdata {
> > >   enum mtk_chip chip;
> > > @@ -321,6 +325,10 @@ struct mtk_vcodec_enc_pdata {
> > >   bool has_lt_irq;
> > >   unsigned long min_bitrate;
> > >   unsigned long max_bitrate;
> > > + const struct mtk_video_fmt *capture_formats;
> > > + size_t num_capture_formats;
> > > + const struct mtk_video_fmt *output_formats;
> > > + size_t num_output_formats;
> > >  };
> > >
> > >  /**
> > > diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c b/drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c
> > > index 50ba9da59153..05743a745a11 100644
> > > --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c
> > > +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c
> > > @@ -23,47 +23,9 @@
> > >  #define DFT_CFG_WIDTHMTK_VENC_MIN_W
> > >  #define DFT_CFG_HEIGHT   MTK_VENC_MIN_H
> > >  #define MTK_MAX_CTRLS_HINT   20
> > > -#define OUT_FMT_IDX  0
> > > -#define CAP_FMT_IDX  4
> > > -
> > >
> > >  static void mtk_venc_worker(struct work_struct *work);
> > >
> > > -static const struct mtk_video_fmt mtk_video_formats[] = {
> > > - {
> > > - .fourcc = V4L2_PIX_FMT_NV12M,
> > > - .type = MTK_FMT_FRAME,
> > > - .num_planes = 2,
> > > - },
> > > - {
> > > - .fourcc = V4L2_PIX_FMT_NV21M,
> > > - .type = MTK_FMT_FRAME,
> > > - .num_planes = 2,
> > > - },
> > > - {
> > > - .fourcc = V4L2_PIX_FMT_YUV420M,
> > > - .type = MTK_FMT_FRAME,
> > > - .num_planes = 3,
> > > - },
> > > - {
> > > - .fourcc = V4L2_PIX_FMT_YVU420M,
> > > - .type = MTK_FMT_FRAME,
> > > - .num_planes = 3,
> > > - },
> > > - {
> > > - .fourcc = V4L2_PIX_FMT_H264,
> > > - .type = MTK_FMT_ENC,
> > > - .num_planes = 1,
> > > - },
> > > - {
> > > - .fourcc = V4L2_PIX_FMT_VP8,
> > > - .type = MTK_FMT_ENC,
> > > - .num_planes = 1,
> > > - },
> > > -};
> > > -
> > > -#define NUM_FORMATS ARRAY_SIZE(mtk_video_formats)
> > > -
> > >  static const struct mtk_codec_framesizes mtk_venc_framesizes[] = {
> > >   {
> > >   .fourcc = V4L2_PIX_FMT_H264,
> > > @@ -156,27 +118,17 @@ static const struct v4l2_ctrl_ops mtk_vcodec_enc_ctrl_ops = {
> > >   .s_ctrl = vidioc_venc_s_ctrl,
> > >  };
> > >
> > > -static int vidioc_enum_fmt(struct v4l2_fmtdesc *f, bool output_queue)
> > > +static int vidioc_enum_fmt(struct v4l2_fmtdesc *f,
> > > +const struct mtk_video_fmt *formats,
> > > +size_t num_formats)
> > >  {
> > > - const struct mtk_video_fmt *fmt;
> > > - int i, j = 0;
> > > + if (f->index >= num_formats)
> > > + return -EINVAL;
> > >
> > > - for (i = 0; i < NUM_FORMATS; ++i) {
> > > - if (output_queue && 

[PATCH v2] arm64/module: Optimize module load time by optimizing PLT counting

2020-06-22 Thread Saravana Kannan
When loading a module, module_frob_arch_sections() tries to figure out
the number of PLTs that'll be needed to handle all the RELAs. While
doing this, it tries to dedupe PLT allocations for multiple
R_AARCH64_CALL26 relocations to the same symbol. It does the same for
R_AARCH64_JUMP26 relocations.

To make checks for duplicates easier/faster, it sorts the relocation
list by type, symbol and addend. That way, to check for a duplicate
relocation, it just needs to compare with the previous entry.

However, sorting the entire relocation array is unnecessary and
expensive (O(n log n)) because there are a lot of other relocation types
that don't need deduping or can't be deduped.

So this commit partitions the array into entries that need deduping and
those that don't. And then sorts just the part that needs deduping. And
when CONFIG_RANDOMIZE_BASE is disabled, the sorting is skipped entirely
because PLTs are not allocated for R_AARCH64_CALL26 and R_AARCH64_JUMP26
if it's disabled.

This gives significant reduction in module load time for modules with
large number of relocations with no measurable impact on modules with a
small number of relocations. In my test setup with CONFIG_RANDOMIZE_BASE
enabled, these were the results for a few downstream modules:

Module          Size (MB)
wlan            14
video codec     3.8
drm             1.8
IPA             2.5
audio           1.2
gpu             1.8

Without this patch:
Module          Number of entries sorted    Module load time (ms)
wlan            243739                      283
video codec     74029                       138
drm             53837                       67
IPA             42800                       90
audio           21326                       27
gpu             20967                       32

Total time to load all these modules: 637 ms

With this patch:
Module          Number of entries sorted    Module load time (ms)
wlan            22454                       61
video codec     10150                       47
drm             13014                       40
IPA             8097                        63
audio           4606                        16
gpu             6527                        20

Total time to load all these modules: 247 ms

Time saved during boot for just these 6 modules: 390 ms

Cc: Ard Biesheuvel 
Signed-off-by: Saravana Kannan 
---

v1 -> v2:
- Provided more details in the commit text
- Pulled in Will's comments on the coding style
- Pulled in Ard's suggestion about skipping jumps with the same section
  index (parts of Will's suggested code)

 arch/arm64/kernel/module-plts.c | 46 ++---
 1 file changed, 43 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/module-plts.c b/arch/arm64/kernel/module-plts.c
index 65b08a74aec6..0ce3a28e3347 100644
--- a/arch/arm64/kernel/module-plts.c
+++ b/arch/arm64/kernel/module-plts.c
@@ -253,6 +253,40 @@ static unsigned int count_plts(Elf64_Sym *syms, Elf64_Rela *rela, int num,
return ret;
 }
 
+static bool branch_rela_needs_plt(Elf64_Sym *syms, Elf64_Rela *rela,
+ Elf64_Word dstidx)
+{
+
+   Elf64_Sym *s = syms + ELF64_R_SYM(rela->r_info);
+
+   if (s->st_shndx == dstidx)
+   return false;
+
+   return ELF64_R_TYPE(rela->r_info) == R_AARCH64_JUMP26 ||
+  ELF64_R_TYPE(rela->r_info) == R_AARCH64_CALL26;
+}
+
+/* Group branch PLT relas at the front end of the array. */
+static int partition_branch_plt_relas(Elf64_Sym *syms, Elf64_Rela *rela,
+ int numrels, Elf64_Word dstidx)
+{
+   int i = 0, j = numrels - 1;
+
+   if (!IS_ENABLED(CONFIG_RANDOMIZE_BASE))
+   return 0;
+
+   while (i < j) {
+   if (branch_rela_needs_plt(syms, &rela[i], dstidx))
+   i++;
+   else if (branch_rela_needs_plt(syms, &rela[j], dstidx))
+   swap(rela[i], rela[j]);
+   else
+   j--;
+   }
+
+   return i;
+}
+
 int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
  char *secstrings, struct module *mod)
 {
@@ -290,7 +324,7 @@ int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
 
for (i = 0; i < ehdr->e_shnum; i++) {
Elf64_Rela *rels = (void *)ehdr + sechdrs[i].sh_offset;
-   int numrels = sechdrs[i].sh_size / sizeof(Elf64_Rela);
+   int nents, numrels = sechdrs[i].sh_size / sizeof(Elf64_Rela);
Elf64_Shdr *dstsec = sechdrs + sechdrs[i].sh_info;
 
if (sechdrs[i].sh_type != SHT_RELA)
@@ -300,8 +334,14 @@ int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
if (!(dstsec->sh_flags & SHF_EXECINSTR))
continue;
 
-   /* sort by type, symbol index and addend */
-   sort(rels, numrels, sizeof(Elf64_Rela), 

Re: [PATCH v2 3/3] mm/shuffle: remove dynamic reconfiguration

2020-06-22 Thread Wei Yang
On Fri, Jun 19, 2020 at 02:59:22PM +0200, David Hildenbrand wrote:
>Commit e900a918b098 ("mm: shuffle initial free memory to improve
>memory-side-cache utilization") promised "autodetection of a
>memory-side-cache (to be added in a follow-on patch)" over a year ago.
>
>The original series included patches [1], however, they were dropped
>during review [2] to be followed-up later.
>
>Due to lack of platforms that publish an HMAT, autodetection is currently
>not implemented. However, manual activation is actively used [3]. Let's
>simplify for now and re-add when really (ever?) needed.
>
>[1] 
>https://lkml.kernel.org/r/154510700291.1941238.817190985966612531.st...@dwillia2-desk3.amr.corp.intel.com
>[2] 
>https://lkml.kernel.org/r/154690326478.676627.103843791978176914.st...@dwillia2-desk3.amr.corp.intel.com
>[3] 
>https://lkml.kernel.org/r/CAPcyv4irwGUU2x+c6b4L=kbb1dnasnkaazd6ospyjl9kfsn...@mail.gmail.com
>
>Cc: Andrew Morton 
>Cc: Johannes Weiner 
>Cc: Michal Hocko 
>Cc: Minchan Kim 
>Cc: Huang Ying 
>Cc: Wei Yang 
>Cc: Mel Gorman 
>Cc: Dan Williams 
>Signed-off-by: David Hildenbrand 

Reviewed-by: Wei Yang 

-- 
Wei Yang
Help you, Help me


Re: [PATCH] isofs: fix High Sierra dirent flag accesses

2020-06-22 Thread Egor Chelak
On 6/23/2020 12:22 AM, Matthew Wilcox wrote:
> It's been about 22 years since I contributed the patch which added
> support for the Acorn extensions ;-)  But I'm pretty sure that it's not
> possible to have an Acorn CD-ROM that is also an HSF CD-ROM.  That is,
> all Acorn formatted CD-ROMs are ISO-9660 compatible.  So I think this
> chunk of the patch is not required.

I couldn't find any info on Acorn extensions online, so I wasn't sure if
they were mutually exclusive or not, and fixed it there too, just to be
safe. Still, even though it won't be needed in practice, I think it's
better to access the flags in the same way everywhere. Having the same
field accessed differently in different places raises the question "why
it's done differently here?". If we go that way, at the very least there
should be an explanatory comment saying HSF+Acorn is an impossible
combination, and perhaps some logic to prevent HSF discs from mounting
with -o map=acorn. Just leaving it be doesn't seem like a clean
solution.

On 6/23/2020 12:31 AM, Matthew Wilcox wrote:
> Also, ew.  Why on earth do we do 'de->flags[-sbi->s_high_sierra]'?
> I'm surprised we don't have any tools that warn about references outside
> an array.  I would do this as ...
> 
> static inline u8 de_flags(struct isofs_sb_info *sbi,
>   struct iso_directory_record *de)
> {
>   if (sbi->s_high_sierra)
>   return de->date[6];
>   return de->flags;
> }
I would do something like that, but for this patch I'm just trying to do
a simple bugfix. The isofs code definitely needs a clean up, and perhaps
I'll do it in a future patch. I haven't submitted a patch before, so I
want to start with something simple and uncontroversial, while I learn
the process. :-)


Re: [PATCH] KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROL

2020-06-22 Thread Xiaoyao Li

On 6/23/2020 8:51 AM, Sean Christopherson wrote:

Remove support for context switching between the guest's and host's
desired UMWAIT_CONTROL.  Propagating the guest's value to hardware isn't
required for correct functionality, e.g. KVM intercepts reads and writes
to the MSR, and the latency effects of the settings controlled by the
MSR are not architecturally visible.

As a general rule, KVM should not allow the guest to control power
management settings unless explicitly enabled by userspace, e.g. see
KVM_CAP_X86_DISABLE_EXITS.  E.g. Intel's SDM explicitly states that C0.2
can improve the performance of SMT siblings.  A devious guest could
disable C0.2 so as to improve the performance of their workloads to the
detriment of workloads running in the host or on other VMs.

Wholesale removal of UMWAIT_CONTROL context switching also fixes a race
condition where updates from the host may cause KVM to enter the guest
with the incorrect value.  Because updates are propagated to all
CPUs via IPI (SMP function callback), the value in hardware may be
stale with respect to the cached value and KVM could enter the guest
with the wrong value in hardware.  As above, the guest can't observe the
bad value, but it's a weird and confusing wart in the implementation.

Removal also fixes the unnecessary usage of VMX's atomic load/store MSR
lists.  Using the lists is only necessary for MSRs that are required for
correct functionality immediately upon VM-Enter/VM-Exit, e.g. EFER on
old hardware, or for MSRs that need to-the-uop precision, e.g. perf
related MSRs.  For UMWAIT_CONTROL, the effects are only visible in the
kernel via TPAUSE/delay(), and KVM doesn't do any form of delay in
vcpu_vmx_run(). 



Using the atomic lists is undesirable as they are more
expensive than direct RDMSR/WRMSR.


Do you mean the extra handling of the atomic list facility in KVM? Or
that VM-exit/-entry MSR-load/save in VMX hardware is more expensive than
direct RDMSR/WRMSR instructions?




Re: [PATCH] arm64: defconfig: Enable Qualcomm SM8250 pinctrl driver

2020-06-22 Thread Jeffrey Hugo
On Mon, Jun 22, 2020 at 4:03 PM Bjorn Andersson
 wrote:
>
> The SM8250 pinctrl driver provides pin configuration, pin muxing and
> GPIO pin control for many pins on the SM8250 SoC.
>
> Signed-off-by: Bjorn Andersson 

Looks sane to me.

Reviewed-by: Jeffrey Hugo 


Re: [PATCH 1/2] media: omap3isp: Remove cacheflush.h

2020-06-22 Thread Laurent Pinchart
Hi Nathan,

Thank you for the patch.

On Mon, Jun 22, 2020 at 04:47:39PM -0700, Nathan Chancellor wrote:
> After mm.h was removed from the asm-generic version of cacheflush.h,
> s390 allyesconfig shows several warnings of the following nature:
> 
> In file included from ./arch/s390/include/generated/asm/cacheflush.h:1,
>  from drivers/media/platform/omap3isp/isp.c:42:
> ./include/asm-generic/cacheflush.h:16:42: warning: 'struct mm_struct'
> declared inside parameter list will not be visible outside of this
> definition or declaration
> 
> As Geert and Laurent point out, this driver does not need this header in
> the two files that include it. Remove it so there are no warnings.
> 
> Fixes: e0cf615d725c ("asm-generic: don't include  in cacheflush.h")
> Suggested-by: Geert Uytterhoeven 
> Suggested-by: Laurent Pinchart 
> Signed-off-by: Nathan Chancellor 

Reviewed-by: Laurent Pinchart 

> ---
>  drivers/media/platform/omap3isp/isp.c  | 2 --
>  drivers/media/platform/omap3isp/ispvideo.c | 1 -
>  2 files changed, 3 deletions(-)
> 
> diff --git a/drivers/media/platform/omap3isp/isp.c 
> b/drivers/media/platform/omap3isp/isp.c
> index a4ee6b86663e..b91e472ee764 100644
> --- a/drivers/media/platform/omap3isp/isp.c
> +++ b/drivers/media/platform/omap3isp/isp.c
> @@ -39,8 +39,6 @@
>   *   Troy Laramy 
>   */
>  
> -#include 
> -
>  #include 
>  #include 
>  #include 
> diff --git a/drivers/media/platform/omap3isp/ispvideo.c 
> b/drivers/media/platform/omap3isp/ispvideo.c
> index 10c214bd0903..1ac9aef70dff 100644
> --- a/drivers/media/platform/omap3isp/ispvideo.c
> +++ b/drivers/media/platform/omap3isp/ispvideo.c
> @@ -18,7 +18,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  
>  #include 
>  #include 
> 
> base-commit: 27f11fea33608cbd321a97cbecfa2ef97dcc1821

-- 
Regards,

Laurent Pinchart


Re: [PATCH net-next v9 2/5] net: phy: Add a helper to return the index for of the internal delay

2020-06-22 Thread Dan Murphy

David

On 6/22/20 5:40 PM, David Miller wrote:

From: Dan Murphy 
Date: Fri, 19 Jun 2020 11:18:10 -0500


+s32 phy_get_internal_delay(struct phy_device *phydev, struct device *dev,
+  const int *delay_values, int size, bool is_rx)
+{
+   int i;
+   s32 delay;

Please use reverse christmas tree ordering for local variables.


OK.

Dan



Re: [PATCH net-next v9 1/5] dt-bindings: net: Add tx and rx internal delays

2020-06-22 Thread Dan Murphy

David

Thanks for the review

On 6/22/20 5:40 PM, David Miller wrote:

From: Dan Murphy 
Date: Fri, 19 Jun 2020 11:18:09 -0500


@@ -162,6 +162,19 @@ properties:
  description:
Specifies a reference to a node representing a SFP cage.
  
+

+  rx-internal-delay-ps:

Do you really want two empty lines between these two sections?


No.  Will fix.

Dan



Re: [PATCH v2 1/3] mfd: core: Make a best effort attempt to match devices with the correct of_nodes

2020-06-22 Thread Frank Rowand
On 2020-06-22 17:23, Frank Rowand wrote:
> On 2020-06-22 14:11, Lee Jones wrote:
>> On Mon, 22 Jun 2020, Frank Rowand wrote:
>>
>>> On 2020-06-22 10:10, Lee Jones wrote:
 On Mon, 22 Jun 2020, Frank Rowand wrote:

> On 2020-06-22 03:50, Lee Jones wrote:
>> On Thu, 18 Jun 2020, Frank Rowand wrote:
>>
>>> On 2020-06-15 04:26, Lee Jones wrote:
 On Sun, 14 Jun 2020, Frank Rowand wrote:

> Hi Lee,
>
> I'm looking at 5.8-rc1.
>
> The only use of OF_MFD_CELL() where the same compatible is specified
> for multiple elements of a struct mfd_cell array is for compatible
> "stericsson,ab8500-pwm" in drivers/mfd/ab8500-core.c:
>
> OF_MFD_CELL("ab8500-pwm",
> NULL, NULL, 0, 1, "stericsson,ab8500-pwm"),
> OF_MFD_CELL("ab8500-pwm",
> NULL, NULL, 0, 2, "stericsson,ab8500-pwm"),
> OF_MFD_CELL("ab8500-pwm",
> NULL, NULL, 0, 3, "stericsson,ab8500-pwm"),
>>>
>>>  OF_MFD_CELL("ab8500-pwm",
>>>  NULL, NULL, 0, 0, "stericsson,ab8500-pwm"),
>>>
>>>  OF_MFD_CELL_REG("ab8500-pwm-mc",
>>>  NULL, NULL, 0, 0, "stericsson,ab8500-pwm", 0),
>>>  OF_MFD_CELL_REG("ab8500-pwm-mc",
>>>  NULL, NULL, 0, 1, "stericsson,ab8500-pwm", 1),
>>>  OF_MFD_CELL_REG("ab8500-pwm-mc",
>>>  NULL, NULL, 0, 2, "stericsson,ab8500-pwm", 2),
>>>
>
> The only .dts or .dtsi files where I see compatible 
> "stericsson,ab8500-pwm"
> are:
>
>arch/arm/boot/dts/ste-ab8500.dtsi
>arch/arm/boot/dts/ste-ab8505.dtsi
>
> These two .dtsi files only have a single node with this compatible.
> Chasing back to .dts and .dtsi files that include these two .dtsi
> files, I see no case where there are multiple nodes with this
> compatible.
>
> So it looks to me like there is no .dts in mainline that is providing
> the three "stericsson,ab8500-pwm" nodes that drivers/mfd/ab8500-core.c
> is expecting.  No case that there are multiple mfd child nodes where
> mfd_add_device() would assign the first of n child nodes with the
> same compatible to multiple devices.
>
> So it appears to me that drivers/mfd/ab8500-core.c is currently 
> broken.
> Am I missing something here?
>
> If I am correct, then either drivers/mfd/ab8500-core.c or
> ste-ab8500.dtsi and ste-ab8505.dtsi need to be fixed.

 Your analysis is correct.
>>>
>>> OK, if I'm not overlooking anything, that is good news.
>>>
>>> Existing .dts source files only have one "ab8500-pwm" child.  They 
>>> already
>>> work correcly.
>>>
>>> Create a new compatible for the case of multiple children.  In my 
>>> example
>>> I will add "-mc" (multiple children) to the existing compatible.  There
>>> is likely a better name, but this lets me provide an example.
>>>
>>> Modify drivers/mfd/ab8500-core.c to use the new compatible, and new .dts
>>> source files with multiple children use the new compatible:
>>>
>>>  OF_MFD_CELL("ab8500-pwm",
>>>  NULL, NULL, 0, 0, "stericsson,ab8500-pwm"),
>>>
>>>  OF_MFD_CELL_REG("ab8500-pwm-mc",
>>>  NULL, NULL, 0, 0, "stericsson,ab8500-pwm", 0),
>>>  OF_MFD_CELL_REG("ab8500-pwm-mc",
>>>  NULL, NULL, 0, 1, "stericsson,ab8500-pwm", 1),
>>>  OF_MFD_CELL_REG("ab8500-pwm-mc",
>>>  NULL, NULL, 0, 2, "stericsson,ab8500-pwm", 2),
>>>
>>> The "OF_MFD_CELL" entry is the existing entry, which will handle current
>>> .dts source files.  The new "OF_MFD_CELL_REG" entries will handle new
>>> .dts source files.
>>
>> Sorry, but I'm not sure what the above exercise is supposed to solve.
>>
>> Could you explain it for me please?
>
> The OF_MFD_CELL() entry handles all of the existing .dts source files
> that only have one ab8500-pwm child nodes.  So existing .dtb blobs
> continue to work.
>
> The OF_MFD_CELL_REG() entries will handle all of the new .dts source
> files that will have up to 3 ab8500-pwm child nodes.
>
> Compatibility is maintained for existing .dtb files.  A new kernel
> version with the changes will support new .dtb files that contain
> multiple ab8500-pwm child nodes.

 I can see *what* you're trying to do.  I was looking for an
 explanation of *how* you think that will work.  FWIW, I don't think
 what you're proposing will work as you envisage.  I thought that
 perhaps I was missing 

Re: [PATCH] KVM: x86/mmu: Don't put invalid SPs back on the list of active pages

2020-06-22 Thread Sean Christopherson
On Tue, Jun 23, 2020 at 02:23:53AM +0200, Paolo Bonzini wrote:
> On 22/06/20 21:18, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index fdd05c233308..fa5bd3f987dd 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -2757,10 +2757,13 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
> > if (!sp->root_count) {
> > /* Count self */
> > (*nr_zapped)++;
> > -   list_move(&sp->link, invalid_list);
> > +   if (sp->role.invalid)
> > +   list_add(&sp->link, invalid_list);
> > +   else
> > +   list_move(&sp->link, invalid_list);
> 
> It's late here, but I think this part needs a comment anyway...

No argument here.  I'll spin a v2, I just realized there is a separate
optimization that can build on this patch.  I was planning on sending it
separately, but I misread the loop in make_mmu_pages_available().


Re: [PATCH 3/4] powerpc/pseries/iommu: Move window-removing part of remove_ddw into remove_dma_window

2020-06-22 Thread Alexey Kardashevskiy



On 23/06/2020 04:59, Leonardo Bras wrote:
> Hello Alexey, thanks for the feedback!
> 
> On Mon, 2020-06-22 at 20:02 +1000, Alexey Kardashevskiy wrote:
>>
>> On 19/06/2020 15:06, Leonardo Bras wrote:
>>> Move the window-removing part of remove_ddw into a new function
>>> (remove_dma_window), so it can be used to remove other DMA windows.
>>>
>>> It's useful for removing DMA windows that don't create DIRECT64_PROPNAME
>>> property, like the default DMA window from the device, which uses
>>> "ibm,dma-window".
>>>
>>> Signed-off-by: Leonardo Bras 
>>> ---
>>>  arch/powerpc/platforms/pseries/iommu.c | 53 +++---
>>>  1 file changed, 31 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
>>> b/arch/powerpc/platforms/pseries/iommu.c
>>> index 5e1fbc176a37..de633f6ae093 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -767,25 +767,14 @@ static int __init disable_ddw_setup(char *str)
>>>  
>>>  early_param("disable_ddw", disable_ddw_setup);
>>>  
>>> -static void remove_ddw(struct device_node *np, bool remove_prop)
>>> +static void remove_dma_window(struct device_node *pdn, u32 *ddw_avail,
>>
>> You do not need the entire ddw_avail here, pass just the token you need.
> 
> Well, I just emulated the behavior of create_ddw() and query_ddw() as
> both just pass the array instead of the token, even though they only
> use a single token. 

True, there is a pattern.

> I think it's to make the rest of the code independent of the design of
> the "ibm,ddw-applicable" array, and if it changes, only local changes
> on the functions will be needed.

The helper removes a window, if you are going to call other operations
in remove_dma_window(), then you'll have to change its name ;)


>> Also, despite this particular file, the "pdn" name is usually used for
>> struct pci_dn (not device_node), let's keep it that way.
> 
> Sure, I got confused for some time about this, as we have:
> static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn).
> but on *_ddw() we have "struct pci_dn *pdn".

True again, not the cleanest style here.


> I will also add a patch that renames those 'struct device_node *pdn' to
> something like 'struct device_node *parent_dn'.

I would not go that far, we (well, Oliver) are getting rid of many
occurrences of pci_dn and Oliver may have a stronger opinion here.


> 
>>> + struct property *win)
>>>  {
>>> struct dynamic_dma_window_prop *dwp;
>>> -   struct property *win64;
>>> -   u32 ddw_avail[3];
>>> u64 liobn;
>>> -   int ret = 0;
>>> -
>>> -   ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
>>> -&ddw_avail[0], 3);
>>> -
>>> -   win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
>>> -   if (!win64)
>>> -   return;
>>> -
>>> -   if (ret || win64->length < sizeof(*dwp))
>>> -   goto delprop;
>>> +   int ret;
>>>  
>>> -   dwp = win64->value;
>>> +   dwp = win->value;
>>> liobn = (u64)be32_to_cpu(dwp->liobn);
>>>  
>>> /* clear the whole window, note the arg is in kernel pages */
>>> @@ -793,24 +782,44 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
>>> 1ULL << (be32_to_cpu(dwp->window_shift) - PAGE_SHIFT), dwp);
>>> if (ret)
>>> pr_warn("%pOF failed to clear tces in window.\n",
>>> -   np);
>>> +   pdn);
>>> else
>>> pr_debug("%pOF successfully cleared tces in window.\n",
>>> -np);
>>> +pdn);
>>>  
>>> ret = rtas_call(ddw_avail[2], 1, 1, NULL, liobn);
>>> if (ret)
>>> pr_warn("%pOF: failed to remove direct window: rtas returned "
>>> "%d to ibm,remove-pe-dma-window(%x) %llx\n",
>>> -   np, ret, ddw_avail[2], liobn);
>>> +   pdn, ret, ddw_avail[2], liobn);
>>> else
>>> pr_debug("%pOF: successfully removed direct window: rtas 
>>> returned "
>>> "%d to ibm,remove-pe-dma-window(%x) %llx\n",
>>> -   np, ret, ddw_avail[2], liobn);
>>> +   pdn, ret, ddw_avail[2], liobn);
>>> +}
>>> +
>>> +static void remove_ddw(struct device_node *np, bool remove_prop)
>>> +{
>>> +   struct property *win;
>>> +   u32 ddw_avail[3];
>>> +   int ret = 0;
>>> +
>>> +   ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
>>> +&ddw_avail[0], 3);
>>> +   if (ret)
>>> +   return;
>>> +
>>> +   win = of_find_property(np, DIRECT64_PROPNAME, NULL);
>>> +   if (!win)
>>> +   return;
>>> +
>>> +   if (win->length >= sizeof(struct dynamic_dma_window_prop))
>>
>> Any good reason not to make it "=="? Is there something optional or we
>> expect extension (which may not grow from the end but may add cells in
>> between). Thanks,
> 
> Well, it comes from the old behavior of 

Re: [PATCH 2/4] powerpc/pseries/iommu: Implement ibm,reset-pe-dma-windows rtas call

2020-06-22 Thread Alexey Kardashevskiy



On 23/06/2020 04:58, Leonardo Bras wrote:
> Hello Alexey, thanks for the feedback!
> 
> On Mon, 2020-06-22 at 20:02 +1000, Alexey Kardashevskiy wrote:
>>
>> On 19/06/2020 15:06, Leonardo Bras wrote:
>>> Platforms supporting the DDW option starting with LoPAR level 2.7 implement
>>> ibm,ddw-extensions. The first extension available (index 2) carries the
>>> token for ibm,reset-pe-dma-windows rtas call, which is used to restore
>>> the default DMA window for a device, if it has been deleted.
>>>
>>> It does so by resetting the TCE table allocation for the PE to its
>>> boot time value, available in the "ibm,dma-window" device tree node.
>>>
>>> Signed-off-by: Leonardo Bras 
>>> ---
>>>  arch/powerpc/platforms/pseries/iommu.c | 33 ++
>>>  1 file changed, 33 insertions(+)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
>>> b/arch/powerpc/platforms/pseries/iommu.c
>>> index e5a617738c8b..5e1fbc176a37 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -1012,6 +1012,39 @@ static phys_addr_t ddw_memory_hotplug_max(void)
>>> return max_addr;
>>>  }
>>>  
>>> +/*
>>> + * Platforms supporting the DDW option starting with LoPAR level 2.7 
>>> implement
>>> + * ibm,ddw-extensions, which carries the rtas token for
>>> + * ibm,reset-pe-dma-windows.
>>> + * That rtas-call can be used to restore the default DMA window for the 
>>> device.
>>> + */
>>> +static void reset_dma_window(struct pci_dev *dev, struct device_node 
>>> *par_dn)
>>> +{
>>> +   int ret;
>>> +   u32 cfg_addr, ddw_ext[3];
>>> +   u64 buid;
>>> +   struct device_node *dn;
>>> +   struct pci_dn *pdn;
>>> +
>>> +   ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
>>> +&ddw_ext[0], 3);
>>
>> s/3/2/ as for the reset extension you do not need the "64bit largest
>> block" extension.
> 
> Sure, I will update this.
> 
>>
>>
>>> +   if (ret)
>>> +   return;
>>> +
>>> +   dn = pci_device_to_OF_node(dev);
>>> +   pdn = PCI_DN(dn);
>>> +   buid = pdn->phb->buid;
>>> +   cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
>>> +
>>> +   ret = rtas_call(ddw_ext[1], 3, 1, NULL, cfg_addr,
>>
>> Here the "reset" extention is in ddw_ext[1]. Hm. 1/4 has a bug then.
> 
> Humm, in 1/4 I used ddw_ext[0] (how many extensions) and ddw_ext[2] (64-
> bit largest window count). I fail to see the bug here.

There is none, my bad :)


>> And I am pretty sure it won't compile as reset_dma_window() is not used
>> and it is static so fold it into one the next patches. Thanks,
> 
> Sure, I will do that. 
> I was questioning myself about this and thought it would be better to
> split for easier revision.

People separate things when a patch is really huge but even then I miss
the point - I'd really like to see a new function _and_ its uses in the
same patch, otherwise I either need to jump between mails or apply the
series, either is little but extra work :) Thanks,


>>
>>
>>> +   BUID_HI(buid), BUID_LO(buid));
>>> +   if (ret)
>>> +   dev_info(&dev->dev,
>>> +"ibm,reset-pe-dma-windows(%x) %x %x %x returned %d ",
>>> +ddw_ext[1], cfg_addr, BUID_HI(buid), BUID_LO(buid),
>>> +ret);
>>> +}
>>> +
>>>  /*
>>>   * If the PE supports dynamic dma windows, and there is space for a table
>>>   * that can map all pages in a linear offset, then setup such a table,
>>>
> 
> Best regards,
> Leonardo
> 

-- 
Alexey


Re: [PATCH 1/4] powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows

2020-06-22 Thread Alexey Kardashevskiy



On 23/06/2020 04:58, Leonardo Bras wrote:
> Hello Alexey, thank you for the feedback!
> 
> On Mon, 2020-06-22 at 20:02 +1000, Alexey Kardashevskiy wrote:
>>
>> On 19/06/2020 15:06, Leonardo Bras wrote:
>>> From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can make the number of
>>> outputs from "ibm,query-pe-dma-windows" go from 5 to 6.
>>>
>>> This change of output size is meant to expand the address size of
>>> largest_available_block PE TCE from 32-bit to 64-bit, which ends up
>>> shifting page_size and migration_capable.
>>>
>>> This ends up requiring the update of
>>> ddw_query_response->largest_available_block from u32 to u64, and manually
>>> assigning the values from the buffer into this struct, according to
>>> output size.
>>>
>>> Signed-off-by: Leonardo Bras 
>>> ---
>>>  arch/powerpc/platforms/pseries/iommu.c | 57 +-
>>>  1 file changed, 46 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
>>> b/arch/powerpc/platforms/pseries/iommu.c
>>> index 6d47b4a3ce39..e5a617738c8b 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -334,7 +334,7 @@ struct direct_window {
>>>  /* Dynamic DMA Window support */
>>>  struct ddw_query_response {
>>> u32 windows_available;
>>> -   u32 largest_available_block;
>>> +   u64 largest_available_block;
>>> u32 page_size;
>>> u32 migration_capable;
>>>  };
>>> @@ -869,14 +869,32 @@ static int find_existing_ddw_windows(void)
>>>  }
>>>  machine_arch_initcall(pseries, find_existing_ddw_windows);
>>>  
>>> +/*
>>> + * From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can rule how many 
>>> output
>>> + * parameters ibm,query-pe-dma-windows will have, ranging from 5 to 6.
>>> + */
>>> +
>>> +static int query_ddw_out_sz(struct device_node *par_dn)
>>
>> Can easily be folded into query_ddw().
> 
> Sure, but it will get inlined by the compiler, and I think it reads
> better this way. 


> I mean, I understand you have a reason to think it's better to fold it
> in query_ddw(), and I would like to better understand that to improve
> my code in the future.


You have numbers 5 and 6 (the number of parameters) twice in the file,
this is why I brought it up. query_ddw_out_sz() can potentially return
something else than 5 or 6 and you will have to change the callsite(s)
then, since these are not macros, this allows to think there may be more
places with 5 and 6. Dunno. A single function will simplify things imho.


> 
>>> +{
>>> +   int ret;
>>> +   u32 ddw_ext[3];
>>> +
>>> +   ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
>>> +&ddw_ext[0], 3);
>>> +   if (ret || ddw_ext[0] < 2 || ddw_ext[2] != 1)
>>
>> Oh that PAPR thing again :-/
>>
>> ===
>> The “ibm,ddw-extensions” property value is a list of integers the first
>> integer indicates the number of extensions implemented and subsequent
>> integers, one per extension, provide a value associated with that
>> extension.
>> ===
>>
>> So ddw_ext[0] is length.
>> Listindex==2 is for "reset" says PAPR and
>> Listindex==3 is for this new 64bit "largest_available_block".
>>
>> So I'd expect ddw_ext[2] to have the "reset" token and ddw_ext[3] to
>> have "1" for this new feature but indexes are smaller. I am confused.
>> Either way these "2" and "3" needs to be defined in macros, "0" probably
>> too.
> 
> Remember these indexes are not C-like 0-starting indexes, where the
> size would be Listindex==1.

Yeah I can see that is the assumption but out of curiosity - is it
written anywhere? Across PAPR, they index bytes from 1 but bits from 0 :-/

Either way make them macros.


> Basically, in C-like array it's :
> a[0] == size, 
> a[1] == reset_token, 
> a[2] == new 64bit "largest_available_block"
> 
>> Please post 'lsprop "ibm,ddw-extensions"' here. Thanks,
> 
> Sure:
> [root@host pci@80029004005]# lsprop "ibm,ddw-extensions"
ibm,ddw-extensions
>  0002 0056 

Right, good. Thanks,


> 
> 
>>
>>> +   return 5;
>>> +   return 6;
>>> +}
>>> +
>>>  static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
>>> -   struct ddw_query_response *query)
>>> +struct ddw_query_response *query,
>>> +struct device_node *par_dn)
>>>  {
>>> struct device_node *dn;
>>> struct pci_dn *pdn;
>>> -   u32 cfg_addr;
>>> +   u32 cfg_addr, query_out[5];
>>> u64 buid;
>>> -   int ret;
>>> +   int ret, out_sz;
>>>  
>>> /*
>>>  * Get the config address and phb buid of the PE window.
>>> @@ -888,12 +906,29 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
>>> pdn = PCI_DN(dn);
>>> buid = pdn->phb->buid;
>>> cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
>>> +   out_sz = query_ddw_out_sz(par_dn);
>>> +
>>> +   ret = rtas_call(ddw_avail[0], 3, out_sz, query_out,
>>> +   cfg_addr, BUID_HI(buid), BUID_LO(buid));

Re: [PATCH 4/4] powerpc/pseries/iommu: Remove default DMA window before creating DDW

2020-06-22 Thread Alexey Kardashevskiy



On 23/06/2020 04:59, Leonardo Bras wrote:
> Hello Alexey, thanks for the feedback!
> 
> On Mon, 2020-06-22 at 20:02 +1000, Alexey Kardashevskiy wrote:
>>
>> On 19/06/2020 15:06, Leonardo Bras wrote:
>>> On LoPAR "DMA Window Manipulation Calls", it's recommended to remove the
>>> default DMA window for the device, before attempting to configure a DDW,
>>> in order to make the maximum resources available for the next DDW to be
>>> created.
>>>
>>> This is a requirement for some devices to use DDW, given they only
>>> allow one DMA window.
>>>
>>> If setting up a new DDW fails anywhere after the removal of this
>>> default DMA window, restore it using reset_dma_window.
>>
>> Nah... If we do it like this, then under pHyp we lose 32bit DMA for good
>> as pHyp can only create a single window and it has to map at
>> 0x800.... They probably do not care though.
>>
>> Under KVM, this will fail as VFIO allows creating  2 windows and it
>> starts from 0 but the existing iommu_bypass_supported_pSeriesLP() treats
>> the window address == 0 as a failure. And we want to keep both DMA
>> windows for PCI adapters with both 64bit and 32bit PCI functions (I
>> heard AMD GPU video + audio are like this) or someone could hotplug
>> 32bit DMA device on a vphb with already present 64bit DMA window so we
>> do not remove the default window.
> 
> Well, then I would suggest doing something like this:
>   query_ddw(...);
>   if (query.windows_available == 0){
>   remove_dma_window(...,default_win);
>   query_ddw(...);
>   }
> 
> This would make sure to cover cases of windows available == 1
> and windows available > 1; 


Is "1" what pHyp returns on query? And was it always like that? Then it
is probably ok. I just never really explored the idea of removing the
default window as we did not have to.


>> The last discussed thing I remember was that there was supposed to be a
>> new bit in "ibm,architecture-vec-5" (forgot the details), we could use
>> that to decide whether to keep the default window or not, like this.
> 
> I checked on the latest LoPAR draft (soon to be published), for the
> ibm,architecture-vec 'option array 5' and this entry was the only
> recently added one that is related to this patchset:
> 
> Byte 8 - Bit 0:
> SRIOV Virtual Functions Support Dynamic DMA Windows (DDW):
> 0: SRIOV Virtual Functions do not support DDW
> 1: SRIOV Virtual Functions do support DDW
> 
> Isn't this equivalent to having a "ibm,ddw-applicable" property?

I am not sure, is there anything else to this new bit? I'd think if the
client supports it, then pHyp will create one 64bit window per a PE and
DDW API won't be needed. Thanks,


> 
> 
>>
>>> Signed-off-by: Leonardo Bras 
>>> ---
>>>  arch/powerpc/platforms/pseries/iommu.c | 20 +---
>>>  1 file changed, 17 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
>>> b/arch/powerpc/platforms/pseries/iommu.c
>>> index de633f6ae093..68d1ea957ac7 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -1074,8 +1074,9 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
>>> device_node *pdn)
>>> u64 dma_addr, max_addr;
>>> struct device_node *dn;
>>> u32 ddw_avail[3];
>>> +
>>> struct direct_window *window;
>>> -   struct property *win64;
>>> +   struct property *win64, *dfl_win;
>>
>> Make it "default_win" or "def_win", "dfl" hurts to read :)
> 
> Sure, no problem :)
> 
>>
>>> struct dynamic_dma_window_prop *ddwprop;
>>> struct failed_ddw_pdn *fpdn;
>>>  
>>> @@ -1110,8 +1111,19 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>>> if (ret)
>>> goto out_failed;
>>>  
>>> -   /*
>>> -* Query if there is a second window of size to map the
>>> +   /*
>>> +* First step of setting up DDW is removing the default DMA window,
>>> +* if it's present. It will make all the resources available to the
>>> +* new DDW window.
>>> +* If anything fails after this, we need to restore it.
>>> +*/
>>> +
>>> +   dfl_win = of_find_property(pdn, "ibm,dma-window", NULL);
>>> +   if (dfl_win)
>>> +   remove_dma_window(pdn, ddw_avail, dfl_win);
>>
>> Before doing so, you want to make sure that the "reset" is actually
>> supported. Thanks,
> 
> Good catch, I will improve that.
> 
>>
>>
>>> +
>>> +   /*
>>> +* Query if there is a window of size to map the
>>>  * whole partition.  Query returns number of windows, largest
>>>  * block assigned to PE (partition endpoint), and two bitmasks
>>>  * of page sizes: supported and supported for migrate-dma.
>>> @@ -1219,6 +1231,8 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
>>> device_node *pdn)
>>> kfree(win64);
>>>  
>>>  out_failed:
>>> +   if (dfl_win)
>>> +   reset_dma_window(dev, pdn);
>>>  
>>> fpdn = kzalloc(sizeof(*fpdn), GFP_KERNEL);
>>> if (!fpdn)
>>>
> 
> Best regards,

Re: [PATCH 4/4] iommu/arm-smmu-v3: Remove cmpxchg() in arm_smmu_cmdq_issue_cmdlist()

2020-06-22 Thread kernel test robot
Hi John,

I love your patch! Perhaps something to improve:

[auto build test WARNING on iommu/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use  as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/John-Garry/iommu-arm-smmu-v3-Improve-cmdq-lock-efficiency/20200623-013438
base:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
config: arm64-randconfig-c024-20200622 (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>, old ones prefixed by <<):

In file included from include/linux/bits.h:23,
from include/linux/ioport.h:15,
from include/linux/acpi.h:12,
from drivers/iommu/arm-smmu-v3.c:12:
drivers/iommu/arm-smmu-v3.c: In function 'arm_smmu_cmdq_issue_cmdlist':
include/linux/bits.h:26:28: warning: comparison of unsigned expression < 0 is 
always false [-Wtype-limits]
26 |   __builtin_constant_p((l) > (h)), (l) > (h), 0)))
|^
include/linux/build_bug.h:16:62: note: in definition of macro 
'BUILD_BUG_ON_ZERO'
16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
|  ^
include/linux/bits.h:39:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
39 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
|   ^~~
>> drivers/iommu/arm-smmu-v3.c:1404:18: note: in expansion of macro 'GENMASK'
1404 |  u32 prod_mask = GENMASK(cmdq->q.llq.max_n_shift, 0);
|  ^~~
include/linux/bits.h:26:40: warning: comparison of unsigned expression < 0 is 
always false [-Wtype-limits]
26 |   __builtin_constant_p((l) > (h)), (l) > (h), 0)))
|^
include/linux/build_bug.h:16:62: note: in definition of macro 
'BUILD_BUG_ON_ZERO'
16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
|  ^
include/linux/bits.h:39:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
39 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
|   ^~~
>> drivers/iommu/arm-smmu-v3.c:1404:18: note: in expansion of macro 'GENMASK'
1404 |  u32 prod_mask = GENMASK(cmdq->q.llq.max_n_shift, 0);
|  ^~~

vim +/GENMASK +1404 drivers/iommu/arm-smmu-v3.c

  1369  
  1370  /*
  1371   * This is the actual insertion function, and provides the following
  1372   * ordering guarantees to callers:
  1373   *
  1374   * - There is a dma_wmb() before publishing any commands to the queue.
  1375   *   This can be relied upon to order prior writes to data structures
  1376   *   in memory (such as a CD or an STE) before the command.
  1377   *
  1378   * - On completion of a CMD_SYNC, there is a control dependency.
  1379   *   This can be relied upon to order subsequent writes to memory (e.g.
  1380   *   freeing an IOVA) after completion of the CMD_SYNC.
  1381   *
  1382   * - Command insertion is totally ordered, so if two CPUs each race to
  1383   *   insert their own list of commands then all of the commands from one
  1384   *   CPU will appear before any of the commands from the other CPU.
  1385   *
  1386   * - A CMD_SYNC is always inserted, which ensures we limit the prod 
pointer
  1387   *   for when the cmdq is full, such that we don't wrap more than twice.
  1388   *   It also makes it easy for the owner to know by how many to 
increment the
  1389   *   cmdq lock.
  1390   */
  1391  static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
  1392 u64 *cmds, int n)
  1393  {
  1394  u64 cmd_sync[CMDQ_ENT_DWORDS];
  1395  const int sync = 1;
  1396  u32 prod;
  1397  unsigned long flags;
  1398  bool owner;
  1399  struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
  1400  struct arm_smmu_ll_queue llq = {
  1401  .max_n_shift = cmdq->q.llq.max_n_shift,
  1402  }, head = llq, space = llq;
  1403  u32 owner_val = 1 << cmdq->q.llq.owner_count_shift;
> 1404  u32 prod_mask = GENMASK(cmdq->q.llq.max_n_shift, 0);
  1405  u32 owner_mask = GENMASK(30, cmdq->q.llq.owner_count_shift);
  1406  int ret = 0;
  1407  
  1408  /* 1. Allocate some space in the queue */
  1409  local_irq_save(flags);
  1410  
  1411  prod = atomic_fetch_add(n + sync + owner_val,
  1412  &cmdq->q.llq.atomic.prod);
  1413  
  1414  owner = !(prod & owner_mask);
  1415  llq.prod = prod_mask & prod;
  1416  head.prod = queue_inc_prod_n(&llq, n + sync);
  1417  
  1418  /*
  1419   * Ensure it's safe to write the entries. For th

[PATCH v5 1/2] remoteproc: qcom: Add per subsystem SSR notification

2020-06-22 Thread Rishabh Bhatnagar
Currently there is a single notification chain which is called whenever any
remoteproc shuts down. This leads to all the listeners being notified, and
is not an optimal design as kernel drivers might only be interested in
listening to notifications from a particular remoteproc. Create a global
list of remoteproc notification info data structures. This will hold the
name and notifier_list information for a particular remoteproc. The API
to register for notifications will use name argument to retrieve the
notification info data structure and the notifier block will be added to
that data structure's notification chain. Also move from blocking notifier
to srcu notifer based implementation to support dynamic notifier head
creation.

Signed-off-by: Siddharth Gupta 
Signed-off-by: Rishabh Bhatnagar 
---
 drivers/remoteproc/qcom_common.c  | 86 +--
 drivers/remoteproc/qcom_common.h  |  5 +-
 include/linux/remoteproc/qcom_rproc.h | 20 ++--
 3 files changed, 91 insertions(+), 20 deletions(-)

diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c
index 9028cea..658f2ca 100644
--- a/drivers/remoteproc/qcom_common.c
+++ b/drivers/remoteproc/qcom_common.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -23,7 +24,14 @@
 #define to_smd_subdev(d) container_of(d, struct qcom_rproc_subdev, subdev)
 #define to_ssr_subdev(d) container_of(d, struct qcom_rproc_ssr, subdev)
 
-static BLOCKING_NOTIFIER_HEAD(ssr_notifiers);
+struct qcom_ssr_subsystem {
+   const char *name;
+   struct srcu_notifier_head notifier_list;
+   struct list_head list;
+};
+
+static LIST_HEAD(qcom_ssr_subsystem_list);
+static DEFINE_MUTEX(qcom_ssr_subsys_lock);
 
 static int glink_subdev_start(struct rproc_subdev *subdev)
 {
@@ -189,37 +197,80 @@ void qcom_remove_smd_subdev(struct rproc *rproc, struct 
qcom_rproc_subdev *smd)
 }
 EXPORT_SYMBOL_GPL(qcom_remove_smd_subdev);
 
+static struct qcom_ssr_subsystem *qcom_ssr_get_subsys(const char *name)
+{
+   struct qcom_ssr_subsystem *info;
+
+   mutex_lock(&qcom_ssr_subsys_lock);
+   /* Match in the global qcom_ssr_subsystem_list with name */
+   list_for_each_entry(info, &qcom_ssr_subsystem_list, list)
+   if (!strcmp(info->name, name))
+   return info;
+
+   info = kzalloc(sizeof(*info), GFP_KERNEL);
+   if (!info)
+   return ERR_PTR(-ENOMEM);
+   info->name = kstrdup_const(name, GFP_KERNEL);
+   srcu_init_notifier_head(&info->notifier_list);
+
+   /* Add to global notification list */
+   list_add_tail(&info->list, &qcom_ssr_subsystem_list);
+   mutex_unlock(&qcom_ssr_subsys_lock);
+
+   return info;
+}
+
 /**
  * qcom_register_ssr_notifier() - register SSR notification handler
- * @nb:        notifier_block to notify for restart notifications
+ * @name:  Subsystem's SSR name
+ * @nb:        notifier_block to be invoked upon subsystem's state change
  *
- * Returns 0 on success, negative errno on failure.
+ * This registers the @nb notifier block as part of the notifier chain for a
+ * remoteproc associated with @name. The notifier block's callback
+ * will be invoked when the remote processor's SSR events occur
+ * (pre/post startup and pre/post shutdown).
  *
- * This register the @notify function as handler for restart notifications. As
- * remote processors are stopped this function will be called, with the SSR
- * name passed as a parameter.
+ * Return: a subsystem cookie on success, ERR_PTR on failure.
  */
-int qcom_register_ssr_notifier(struct notifier_block *nb)
+void *qcom_register_ssr_notifier(const char *name, struct notifier_block *nb)
 {
-   return blocking_notifier_chain_register(&ssr_notifiers, nb);
+   struct qcom_ssr_subsystem *info;
+
+   info = qcom_ssr_get_subsys(name);
+   if (IS_ERR(info))
+   return info;
+
+   srcu_notifier_chain_register(&info->notifier_list, nb);
+
+   return &info->notifier_list;
 }
 EXPORT_SYMBOL_GPL(qcom_register_ssr_notifier);
 
 /**
  * qcom_unregister_ssr_notifier() - unregister SSR notification handler
+ * @notify:    subsystem cookie returned from qcom_register_ssr_notifier
  * @nb:notifier_block to unregister
+ *
+ * This function will unregister the notifier from the particular notifier
+ * chain.
+ *
+ * Return: 0 on success, %ENOENT otherwise.
  */
-void qcom_unregister_ssr_notifier(struct notifier_block *nb)
+int qcom_unregister_ssr_notifier(void *notify, struct notifier_block *nb)
 {
-   blocking_notifier_chain_unregister(&ssr_notifiers, nb);
+   return srcu_notifier_chain_unregister(notify, nb);
 }
 EXPORT_SYMBOL_GPL(qcom_unregister_ssr_notifier);
 
 static void ssr_notify_unprepare(struct rproc_subdev *subdev)
 {
struct qcom_rproc_ssr *ssr = to_ssr_subdev(subdev);
+   struct qcom_ssr_notif_data data = {
+   .name = ssr->info->name,
+   .crashed = false,
+   };
 

[PATCH v5 0/2] Extend SSR notifications framework

2020-06-22 Thread Rishabh Bhatnagar
This set of patches gives kernel client drivers the ability to register
for a particular remoteproc's SSR notifications. Also the notifications
are extended to before/after-powerup/shutdown stages.
It also fixes the bug where clients need to register for notifications
again if the platform driver is removed. This is done by creating a
global list of per-remoteproc notification info data structures which
remain static. An API is exported to register for a remoteproc's SSR
notifications and uses remoteproc's ssr_name and notifier block as
arguments.

Changelog:
v5 -> v4:
- Make qcom_ssr_get_subsys static function
- Fix mutex locking
- Fix function comments

v4 -> v3:
- Fix naming convention 

v3 -> v2:
- Create a global list of per remoteproc notification data structure
- Pass ssr_name and crashed information as part of notification data
- Move notification type enum to qcom_rproc.h from remoteproc.h

v2 -> v1:
- Fix commit text

Rishabh Bhatnagar (1):
  remoteproc: qcom: Add per subsystem SSR notification

Siddharth Gupta (1):
  remoteproc: qcom: Add notification types to SSR

 drivers/remoteproc/qcom_common.c  | 128 ++
 drivers/remoteproc/qcom_common.h  |   5 +-
 include/linux/remoteproc/qcom_rproc.h |  36 --
 3 files changed, 149 insertions(+), 20 deletions(-)

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v5 2/2] remoteproc: qcom: Add notification types to SSR

2020-06-22 Thread Rishabh Bhatnagar
The SSR subdevice only adds callback for the unprepare event. Add callbacks
for prepare, start and stop events. The client driver for a particular
remoteproc might be interested in knowing the status of the remoteproc
while undergoing SSR, not just when the remoteproc has finished shutting
down.

Signed-off-by: Siddharth Gupta 
Signed-off-by: Rishabh Bhatnagar 
---
 drivers/remoteproc/qcom_common.c  | 44 ++-
 include/linux/remoteproc/qcom_rproc.h | 16 +
 2 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c
index 658f2ca..0848bf1 100644
--- a/drivers/remoteproc/qcom_common.c
+++ b/drivers/remoteproc/qcom_common.c
@@ -262,6 +262,44 @@ int qcom_unregister_ssr_notifier(void *notify, struct 
notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(qcom_unregister_ssr_notifier);
 
+static int ssr_notify_prepare(struct rproc_subdev *subdev)
+{
+   struct qcom_rproc_ssr *ssr = to_ssr_subdev(subdev);
+   struct qcom_ssr_notif_data data = {
+   .name = ssr->info->name,
+   .crashed = false,
+   };
+
+   srcu_notifier_call_chain(&ssr->info->notifier_list,
+QCOM_SSR_BEFORE_POWERUP, &data);
+   return 0;
+}
+
+static int ssr_notify_start(struct rproc_subdev *subdev)
+{
+   struct qcom_rproc_ssr *ssr = to_ssr_subdev(subdev);
+   struct qcom_ssr_notif_data data = {
+   .name = ssr->info->name,
+   .crashed = false,
+   };
+
+   srcu_notifier_call_chain(&ssr->info->notifier_list,
+QCOM_SSR_AFTER_POWERUP, &data);
+   return 0;
+}
+
+static void ssr_notify_stop(struct rproc_subdev *subdev, bool crashed)
+{
+   struct qcom_rproc_ssr *ssr = to_ssr_subdev(subdev);
+   struct qcom_ssr_notif_data data = {
+   .name = ssr->info->name,
+   .crashed = crashed,
+   };
+
+   srcu_notifier_call_chain(&ssr->info->notifier_list,
+QCOM_SSR_BEFORE_SHUTDOWN, &data);
+}
+
 static void ssr_notify_unprepare(struct rproc_subdev *subdev)
 {
struct qcom_rproc_ssr *ssr = to_ssr_subdev(subdev);
@@ -270,7 +308,8 @@ static void ssr_notify_unprepare(struct rproc_subdev 
*subdev)
.crashed = false,
};
 
-   srcu_notifier_call_chain(&ssr->info->notifier_list, 0, &data);
+   srcu_notifier_call_chain(&ssr->info->notifier_list,
+QCOM_SSR_AFTER_SHUTDOWN, &data);
 }
 
 /**
@@ -294,6 +333,9 @@ void qcom_add_ssr_subdev(struct rproc *rproc, struct 
qcom_rproc_ssr *ssr,
}
 
ssr->info = info;
+   ssr->subdev.prepare = ssr_notify_prepare;
+   ssr->subdev.start = ssr_notify_start;
+   ssr->subdev.stop = ssr_notify_stop;
ssr->subdev.unprepare = ssr_notify_unprepare;
 
	rproc_add_subdev(rproc, &ssr->subdev);
diff --git a/include/linux/remoteproc/qcom_rproc.h 
b/include/linux/remoteproc/qcom_rproc.h
index 58422b1..83ac8e8 100644
--- a/include/linux/remoteproc/qcom_rproc.h
+++ b/include/linux/remoteproc/qcom_rproc.h
@@ -5,6 +5,22 @@ struct notifier_block;
 
 #if IS_ENABLED(CONFIG_QCOM_RPROC_COMMON)
 
+/**
+ * enum qcom_ssr_notif_type - Startup/Shutdown events related to a remoteproc
+ * processor.
+ *
+ * @QCOM_SSR_BEFORE_POWERUP:   Remoteproc about to start (prepare stage)
+ * @QCOM_SSR_AFTER_POWERUP:Remoteproc is running (start stage)
+ * @QCOM_SSR_BEFORE_SHUTDOWN:  Remoteproc crashed or shutting down (stop stage)
+ * @QCOM_SSR_AFTER_SHUTDOWN:   Remoteproc is down (unprepare stage)
+ */
+enum qcom_ssr_notif_type {
+   QCOM_SSR_BEFORE_POWERUP,
+   QCOM_SSR_AFTER_POWERUP,
+   QCOM_SSR_BEFORE_SHUTDOWN,
+   QCOM_SSR_AFTER_SHUTDOWN,
+};
+
 struct qcom_ssr_notif_data {
const char *name;
bool crashed;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH] mailbox: imx: Mark PM functions as __maybe_unused

2020-06-22 Thread Nathan Chancellor
When CONFIG_PM and CONFIG_PM_SLEEP are unset, the following warnings
occur:

drivers/mailbox/imx-mailbox.c:638:12: warning: 'imx_mu_runtime_resume'
defined but not used [-Wunused-function]
  638 | static int imx_mu_runtime_resume(struct device *dev)
  |^
drivers/mailbox/imx-mailbox.c:629:12: warning: 'imx_mu_runtime_suspend'
defined but not used [-Wunused-function]
  629 | static int imx_mu_runtime_suspend(struct device *dev)
  |^~
drivers/mailbox/imx-mailbox.c:611:12: warning: 'imx_mu_resume_noirq'
defined but not used [-Wunused-function]
  611 | static int imx_mu_resume_noirq(struct device *dev)
  |^~~
drivers/mailbox/imx-mailbox.c:601:12: warning: 'imx_mu_suspend_noirq'
defined but not used [-Wunused-function]
  601 | static int imx_mu_suspend_noirq(struct device *dev)
  |^~~~

Mark these functions as __maybe_unused, which is the standard procedure
for PM functions.

Fixes: bb2b2624dbe2 ("mailbox: imx: Add runtime PM callback to handle MU 
clocks")
Signed-off-by: Nathan Chancellor 
---
 drivers/mailbox/imx-mailbox.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/mailbox/imx-mailbox.c b/drivers/mailbox/imx-mailbox.c
index 7205b825c8b5..2543c7b6948b 100644
--- a/drivers/mailbox/imx-mailbox.c
+++ b/drivers/mailbox/imx-mailbox.c
@@ -598,7 +598,7 @@ static const struct of_device_id imx_mu_dt_ids[] = {
 };
 MODULE_DEVICE_TABLE(of, imx_mu_dt_ids);
 
-static int imx_mu_suspend_noirq(struct device *dev)
+static int __maybe_unused imx_mu_suspend_noirq(struct device *dev)
 {
struct imx_mu_priv *priv = dev_get_drvdata(dev);
 
@@ -608,7 +608,7 @@ static int imx_mu_suspend_noirq(struct device *dev)
return 0;
 }
 
-static int imx_mu_resume_noirq(struct device *dev)
+static int __maybe_unused imx_mu_resume_noirq(struct device *dev)
 {
struct imx_mu_priv *priv = dev_get_drvdata(dev);
 
@@ -626,7 +626,7 @@ static int imx_mu_resume_noirq(struct device *dev)
return 0;
 }
 
-static int imx_mu_runtime_suspend(struct device *dev)
+static int __maybe_unused imx_mu_runtime_suspend(struct device *dev)
 {
struct imx_mu_priv *priv = dev_get_drvdata(dev);
 
@@ -635,7 +635,7 @@ static int imx_mu_runtime_suspend(struct device *dev)
return 0;
 }
 
-static int imx_mu_runtime_resume(struct device *dev)
+static int __maybe_unused imx_mu_runtime_resume(struct device *dev)
 {
struct imx_mu_priv *priv = dev_get_drvdata(dev);
int ret;

base-commit: 27f11fea33608cbd321a97cbecfa2ef97dcc1821
-- 
2.27.0



Re: [PATCH v2 0/3] Preventing job distribution to isolated CPUs

2020-06-22 Thread Nitesh Narayan Lal

On 6/22/20 7:45 PM, Nitesh Narayan Lal wrote:
>
> Testing
> ===
> * Patch 1: 
>   Fix for cpumask_local_spread() is tested by creating VFs, loading
>   iavf module and by adding a tracepoint to confirm that only housekeeping 
>   CPUs are picked when an appropriate profile is set up and all remaining  
>   CPUs when no CPU isolation is configured.
>
> * Patch 2: 
>   To test the PCI fix, I hotplugged a virtio-net-pci from qemu console 
>   and forced its addition to a specific node to trigger the code path that 
>   includes the proposed fix and verified that only housekeeping CPUs   
>   are included via tracepoint. 
>
> * Patch 3: 
>   To test the fix in store_rps_map(), I tried configuring an isolated  
>   CPU by writing to /sys/class/net/en*/queues/rx*/rps_cpus which   
>   resulted in 'write error: Invalid argument' error. For the case  
>   where a non-isolated CPU is writing in rps_cpus the above operation  
>   succeeded without any error. 
>
>
> Changes from v1:   
> ===
> - Included the suggestions made by Bjorn Helgaas in the commit messages.
> - Included the 'Reviewed-by' and 'Acked-by' received for Patch-2.  
>
> [1] 
> https://patchwork.ozlabs.org/project/netdev/patch/51102eebe62336c6a4e584c7a503553b9f90e01c.ca...@marvell.com/
>
> Alex Belits (3):   
>   lib: Restrict cpumask_local_spread to houskeeping CPUs   
>   PCI: Restrict probe functions to housekeeping CPUs   
>   net: Restrict receive packets queuing to housekeeping CPUs   
>
>  drivers/pci/pci-driver.c |  5 -   
>  lib/cpumask.c| 43 +++-
>  net/core/net-sysfs.c | 10 +-  
>  3 files changed, 38 insertions(+), 20 deletions(-)
>
> --
>

Hi,

It seems that the cover email got messed up while I was sending the patches.
I am putting my intended cover-email below for now. I can send a v3 with proper
cover-email if needed. The reason, I am not sending it right now, is that if I
get some comments in my patches I will prefer including them as well in my
v3 posting.


"
This patch-set is originated from one of the patches that have been
posted earlier as a part of "Task_isolation" mode [1] patch series
by Alex Belits . There are only a couple of
changes that I am proposing in this patch-set compared to what Alex
has posted earlier.


Context
===
On a broad level, all three patches that are included in this patch
set are meant to improve the driver/library to respect isolated
CPUs by not pinning any job on it. Not doing so could impact
the latency values in RT use-cases.


Patches
===
* Patch1:
  The first patch is meant to make cpumask_local_spread()
  aware of the isolated CPUs. It ensures that the CPUs that
  are returned by this API only includes housekeeping CPUs.

* Patch2:
  This patch ensures that a probe function that is called
  using work_on_cpu() doesn't run any task on an isolated CPU.

* Patch3:
  This patch makes store_rps_map() aware of the isolated
  CPUs so that rps don't queue any jobs on an isolated CPU.


Proposed Changes
===
To fix the above-mentioned issues Alex has used housekeeping_cpumask().
The only changes that I am proposing here are:
- Removing the dependency on CONFIG_TASK_ISOLATION that was proposed by
  Alex. As it should be safe to rely on housekeeping_cpumask()
  even when we don't have any isolated CPUs and we want
  to fall back to using all available CPUs in any of the above scenarios.
- Using both HK_FLAG_DOMAIN and HK_FLAG_WQ in all 

linux-next: manual merge of the net-next tree with the net tree

2020-06-22 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  net/xfrm/xfrm_device.c

between commit:

  94579ac3f6d0 ("xfrm: Fix double ESP trailer insertion in IPsec crypto 
offload.")

from the net tree and commit:

  272c2330adc9 ("xfrm: bail early on slave pass over skb")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc net/xfrm/xfrm_device.c
index 626096bd0d29,b8918fc5248b..
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@@ -106,9 -106,10 +106,10 @@@ struct sk_buff *validate_xmit_xfrm(stru
struct sk_buff *skb2, *nskb, *pskb = NULL;
netdev_features_t esp_features = features;
struct xfrm_offload *xo = xfrm_offload(skb);
+   struct net_device *dev = skb->dev;
struct sec_path *sp;
  
 -  if (!xo)
 +  if (!xo || (xo->flags & XFRM_XMIT))
return skb;
  
if (!(features & NETIF_F_HW_ESP))
@@@ -129,27 -134,20 +134,22 @@@
return skb;
}
  
 +  xo->flags |= XFRM_XMIT;
 +
-   if (skb_is_gso(skb)) {
-   struct net_device *dev = skb->dev;
- 
-   if (unlikely(x->xso.dev != dev)) {
-   struct sk_buff *segs;
+   if (skb_is_gso(skb) && unlikely(x->xso.dev != dev)) {
+   struct sk_buff *segs;
  
-   /* Packet got rerouted, fixup features and segment it. 
*/
-   esp_features = esp_features & ~(NETIF_F_HW_ESP
-   | NETIF_F_GSO_ESP);
+   /* Packet got rerouted, fixup features and segment it. */
+   esp_features = esp_features & ~(NETIF_F_HW_ESP | 
NETIF_F_GSO_ESP);
  
-   segs = skb_gso_segment(skb, esp_features);
-   if (IS_ERR(segs)) {
-   kfree_skb(skb);
-   atomic_long_inc(&dev->tx_dropped);
-   return NULL;
-   } else {
-   consume_skb(skb);
-   skb = segs;
-   }
+   segs = skb_gso_segment(skb, esp_features);
+   if (IS_ERR(segs)) {
+   kfree_skb(skb);
+   atomic_long_inc(&dev->tx_dropped);
+   return NULL;
+   } else {
+   consume_skb(skb);
+   skb = segs;
}
}
  




Re: [PATCH v3] acpi: Extend TPM2 ACPI table with missing log fields

2020-06-22 Thread Jarkko Sakkinen
On Tue, Jun 23, 2020 at 03:56:53AM +0300, Jarkko Sakkinen wrote:
> On Fri, Jun 19, 2020 at 11:14:20AM -0400, Stefan Berger wrote:
> > On 4/2/20 3:21 PM, Jarkko Sakkinen wrote:
> > > On Wed, Apr 01, 2020 at 11:05:36AM +0200, Rafael J. Wysocki wrote:
> > > > On Wed, Apr 1, 2020 at 10:37 AM Jarkko Sakkinen
> > > >  wrote:
> > > > > On Tue, Mar 31, 2020 at 05:49:49PM -0400, Stefan Berger wrote:
> > > > > > From: Stefan Berger 
> > > > > > 
> > > > > > Recent extensions of the TPM2 ACPI table added 3 more fields
> > > > > > including 12 bytes of start method specific parameters and Log Area
> > > > > > Minimum Length (u32) and Log Area Start Address (u64). So, we extend
> > > > > > the existing structure with these fields to allow non-UEFI systems
> > > > > > to access the TPM2's log.
> > > > > > 
> > > > > > The specification that has the new fields is the following:
> > > > > >TCG ACPI Specification
> > > > > >Family "1.2" and "2.0"
> > > > > >Version 1.2, Revision 8
> > > > > > 
> > > > > > Adapt all existing table size calculations to use
> > > > > > offsetof(struct acpi_table_tpm2, start_method_specific)
> > > > > > [where start_method_specific is a newly added field]
> > > > > > rather than sizeof(struct acpi_table_tpm2) so that the addition
> > > > > > of the new fields does not affect current systems that may not
> > > > > > have them.
> > > > > > 
> > > > > > Signed-off-by: Stefan Berger 
> > > > > > Cc: linux-a...@vger.kernel.org
> > > > > I think I'm cool with this but needs an ack from ACPI maintainer.
> > > > > 
> > > > > Rafael, given that this not an intrusive change in any possible means,
> > > > > can I pick this patch and put it to my next pull request?
> > > > Yes, please.
> > > > 
> > > > Thanks!
> > > Great, thanks Rafael.
> > > 
> > > Reviewed-by: Jarkko Sakkinen 
> > > 
> > > Do you mind if I add your ack to the commit?
> > 
> > 
> > Any chance to get v4 applied?
> 
> You should split the actbl3.h change to a separate patch and add 'Cc:'
> tag to Rafael to the commit message.

Send v5 with Rafael's ack (no need to split anymore).

/Jarkko


Re: [PATCH v3] acpi: Extend TPM2 ACPI table with missing log fields

2020-06-22 Thread Jarkko Sakkinen
On Fri, Jun 19, 2020 at 05:55:19PM +0200, Rafael J. Wysocki wrote:
> On Fri, Jun 19, 2020 at 5:14 PM Stefan Berger  wrote:
> >
> > On 4/2/20 3:21 PM, Jarkko Sakkinen wrote:
> > > On Wed, Apr 01, 2020 at 11:05:36AM +0200, Rafael J. Wysocki wrote:
> > >> On Wed, Apr 1, 2020 at 10:37 AM Jarkko Sakkinen
> > >>  wrote:
> > >>> On Tue, Mar 31, 2020 at 05:49:49PM -0400, Stefan Berger wrote:
> >  From: Stefan Berger 
> > 
> >  Recent extensions of the TPM2 ACPI table added 3 more fields
> >  including 12 bytes of start method specific parameters and Log Area
> >  Minimum Length (u32) and Log Area Start Address (u64). So, we extend
> >  the existing structure with these fields to allow non-UEFI systems
> >  to access the TPM2's log.
> > 
> >  The specification that has the new fields is the following:
> > TCG ACPI Specification
> > Family "1.2" and "2.0"
> > Version 1.2, Revision 8
> > 
> >  Adapt all existing table size calculations to use
> >  offsetof(struct acpi_table_tpm2, start_method_specific)
> >  [where start_method_specific is a newly added field]
> >  rather than sizeof(struct acpi_table_tpm2) so that the addition
> >  of the new fields does not affect current systems that may not
> >  have them.
> > 
> >  Signed-off-by: Stefan Berger 
> >  Cc: linux-a...@vger.kernel.org
> > >>> I think I'm cool with this but needs an ack from ACPI maintainer.
> > >>>
> > >>> Rafael, given that this not an intrusive change in any possible means,
> > >>> can I pick this patch and put it to my next pull request?
> > >> Yes, please.
> > >>
> > >> Thanks!
> > > Great, thanks Rafael.
> > >
> > > Reviewed-by: Jarkko Sakkinen 
> > >
> > > Do you mind if I add your ack to the commit?
> >
> 
> It looks like I missed the previous message from Jarkko.
> 
> Yes, please, feel free to add my ACK to the patch, thanks!

OK, this is great, thanks. I'll pick it to my tree then.

/Jarkko


Re: [RESEND PATCH 3/3] nouveau: make nvkm_vmm_ctor() and nvkm_mmu_ptp_get() static

2020-06-22 Thread John Hubbard

On 2020-06-22 16:38, Ralph Campbell wrote:

The functions nvkm_vmm_ctor() and nvkm_mmu_ptp_get() are not called outside
of the file defining them so make them static.

Signed-off-by: Ralph Campbell 
---
  drivers/gpu/drm/nouveau/nvkm/subdev/mmu/base.c | 2 +-
  drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c  | 2 +-
  drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h  | 3 ---
  3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/base.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/base.c
index ee11ccaf0563..de91e9a26172 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/base.c
@@ -61,7 +61,7 @@ nvkm_mmu_ptp_put(struct nvkm_mmu *mmu, bool force, struct 
nvkm_mmu_pt *pt)
kfree(pt);
  }
  
-struct nvkm_mmu_pt *
+static struct nvkm_mmu_pt *
  nvkm_mmu_ptp_get(struct nvkm_mmu *mmu, u32 size, bool zero)
  {
struct nvkm_mmu_pt *pt;
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
index 199f94e15c5f..67b00dcef4b8 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
@@ -1030,7 +1030,7 @@ nvkm_vmm_ctor_managed(struct nvkm_vmm *vmm, u64 addr, u64 
size)
return 0;
  }
  
-int
+static int
  nvkm_vmm_ctor(const struct nvkm_vmm_func *func, struct nvkm_mmu *mmu,
  u32 pd_header, bool managed, u64 addr, u64 size,
  struct lock_class_key *key, const char *name,
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h 
b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h
index d3f8f916d0db..a2b179568970 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h
@@ -163,9 +163,6 @@ int nvkm_vmm_new_(const struct nvkm_vmm_func *, struct 
nvkm_mmu *,
  u32 pd_header, bool managed, u64 addr, u64 size,
  struct lock_class_key *, const char *name,
  struct nvkm_vmm **);
-int nvkm_vmm_ctor(const struct nvkm_vmm_func *, struct nvkm_mmu *,
- u32 pd_header, bool managed, u64 addr, u64 size,
- struct lock_class_key *, const char *name, struct nvkm_vmm *);
  struct nvkm_vma *nvkm_vmm_node_search(struct nvkm_vmm *, u64 addr);
  struct nvkm_vma *nvkm_vmm_node_split(struct nvkm_vmm *, struct nvkm_vma *,
 u64 addr, u64 size);



Looks accurate: the order within vmm.c (now that there is no .h
declaration) is still good, and I found no other uses of either function
within the linux.git tree, so


Reviewed-by: John Hubbard 

Re: [PATCH tip/core/rcu 02/26] mm/mmap.c: Add cond_resched() for exit_mmap() CPU stalls

2020-06-22 Thread Paul E. McKenney
On Mon, Jun 22, 2020 at 05:47:19PM -0700, Shakeel Butt wrote:
> On Mon, Jun 22, 2020 at 5:22 PM  wrote:
> >
> > From: "Paul E. McKenney" 
> >
> > A large process running on a heavily loaded system can encounter the
> > following RCU CPU stall warning:
> >
> >   rcu: INFO: rcu_sched self-detected stall on CPU
> >   rcu: \x093-: (20998 ticks this GP) idle=4ea/1/0x4002 
> > softirq=556558/556558 fqs=5190
> >   \x09(t=21013 jiffies g=1005461 q=132576)
> >   NMI backtrace for cpu 3
> >   CPU: 3 PID: 501900 Comm: aio-free-ring-w Kdump: loaded Not tainted 
> > 5.2.9-108_fbk12_rc3_3858_gb83b75af7909 #1
> >   Hardware name: Wiwynn   HoneyBadger/PantherPlus, BIOS HBM6.71 02/03/2016
> >   Call Trace:
> >
> >dump_stack+0x46/0x60
> >nmi_cpu_backtrace.cold.3+0x13/0x50
> >? lapic_can_unplug_cpu.cold.27+0x34/0x34
> >nmi_trigger_cpumask_backtrace+0xba/0xca
> >rcu_dump_cpu_stacks+0x99/0xc7
> >rcu_sched_clock_irq.cold.87+0x1aa/0x397
> >? tick_sched_do_timer+0x60/0x60
> >update_process_times+0x28/0x60
> >tick_sched_timer+0x37/0x70
> >__hrtimer_run_queues+0xfe/0x270
> >hrtimer_interrupt+0xf4/0x210
> >smp_apic_timer_interrupt+0x5e/0x120
> >apic_timer_interrupt+0xf/0x20
> >
> >   RIP: 0010:kmem_cache_free+0x223/0x300
> >   Code: 88 00 00 00 0f 85 ca 00 00 00 41 8b 55 18 31 f6 f7 da 41 f6 45 0a 
> > 02 40 0f 94 c6 83 c6 05 9c 41 5e fa e8 a0 a7 01 00 41 56 9d <49> 8b 47 08 
> > a8 03 0f 85 87 00 00 00 65 48 ff 08 e9 3d fe ff ff 65
> >   RSP: 0018:c9000e8e3da8 EFLAGS: 0206 ORIG_RAX: ff13
> >   RAX: 0002 RBX: 88861b9de960 RCX: 0030
> >   RDX: fffe41e8 RSI: 60777fe3a100 RDI: 0001be18
> >   RBP: ea00186e7780 R08:  R09: 
> >   R10: 88861b9dea28 R11: 7ffde000 R12: 81230a1f
> >   R13: 54684dc0 R14: 0206 R15: 547dbc00
> >? remove_vma+0x4f/0x60
> >remove_vma+0x4f/0x60
> >exit_mmap+0xd6/0x160
> >mmput+0x4a/0x110
> >do_exit+0x278/0xae0
> >? syscall_trace_enter+0x1d3/0x2b0
> >? handle_mm_fault+0xaa/0x1c0
> >do_group_exit+0x3a/0xa0
> >__x64_sys_exit_group+0x14/0x20
> >do_syscall_64+0x42/0x100
> >entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > And on a PREEMPT=n kernel, the "while (vma)" loop in exit_mmap() can run
> > for a very long time given a large process.  This commit therefore adds
> > a cond_resched() to this loop, providing RCU any needed quiescent states.
> >
> > Cc: Andrew Morton 
> > Cc: 
> > Signed-off-by: Paul E. McKenney 
> 
> We have exactly the same change in our internal kernel since 2018. We
> mostly observed the need_resched warnings on the processes mapping the
> hugetlbfs.
> 
> Reviewed-by: Shakeel Butt 

Thank you very much, I will apply your Reviewed-by on the next rebase.

Any other patches we should know about?  ;-)

Thanx, Paul

> > ---
> >  mm/mmap.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 59a4682..972f839 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -3159,6 +3159,7 @@ void exit_mmap(struct mm_struct *mm)
> > if (vma->vm_flags & VM_ACCOUNT)
> > nr_accounted += vma_pages(vma);
> > vma = remove_vma(vma);
> > +   cond_resched();
> > }
> > vm_unacct_memory(nr_accounted);
> >  }
> > --
> > 2.9.5
> >


Re: [PATCH v3] acpi: Extend TPM2 ACPI table with missing log fields

2020-06-22 Thread Jarkko Sakkinen
On Fri, Jun 19, 2020 at 11:14:20AM -0400, Stefan Berger wrote:
> On 4/2/20 3:21 PM, Jarkko Sakkinen wrote:
> > On Wed, Apr 01, 2020 at 11:05:36AM +0200, Rafael J. Wysocki wrote:
> > > On Wed, Apr 1, 2020 at 10:37 AM Jarkko Sakkinen
> > >  wrote:
> > > > On Tue, Mar 31, 2020 at 05:49:49PM -0400, Stefan Berger wrote:
> > > > > From: Stefan Berger 
> > > > > 
> > > > > Recent extensions of the TPM2 ACPI table added 3 more fields
> > > > > including 12 bytes of start method specific parameters and Log Area
> > > > > Minimum Length (u32) and Log Area Start Address (u64). So, we extend
> > > > > the existing structure with these fields to allow non-UEFI systems
> > > > > to access the TPM2's log.
> > > > > 
> > > > > The specification that has the new fields is the following:
> > > > >TCG ACPI Specification
> > > > >Family "1.2" and "2.0"
> > > > >Version 1.2, Revision 8
> > > > > 
> > > > > Adapt all existing table size calculations to use
> > > > > offsetof(struct acpi_table_tpm2, start_method_specific)
> > > > > [where start_method_specific is a newly added field]
> > > > > rather than sizeof(struct acpi_table_tpm2) so that the addition
> > > > > of the new fields does not affect current systems that may not
> > > > > have them.
> > > > > 
> > > > > Signed-off-by: Stefan Berger 
> > > > > Cc: linux-a...@vger.kernel.org
> > > > I think I'm cool with this but needs an ack from ACPI maintainer.
> > > > 
> > > > Rafael, given that this not an intrusive change in any possible means,
> > > > can I pick this patch and put it to my next pull request?
> > > Yes, please.
> > > 
> > > Thanks!
> > Great, thanks Rafael.
> > 
> > Reviewed-by: Jarkko Sakkinen 
> > 
> > Do you mind if I add your ack to the commit?
> 
> 
> Any chance to get v4 applied?

You should split the actbl3.h change to a separate patch and add 'Cc:'
tag to Rafael to the commit message.

/Jarkko


Re: [PATCH] KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROL

2020-06-22 Thread Paolo Bonzini
On 23/06/20 02:51, Sean Christopherson wrote:
> Remove support for context switching between the guest's and host's
> desired UMWAIT_CONTROL.  Propagating the guest's value to hardware isn't
> required for correct functionality, e.g. KVM intercepts reads and writes
> to the MSR, and the latency effects of the settings controlled by the
> MSR are not architecturally visible.
> 
> As a general rule, KVM should not allow the guest to control power
> management settings unless explicitly enabled by userspace, e.g. see
> KVM_CAP_X86_DISABLE_EXITS.  E.g. Intel's SDM explicitly states that C0.2
> can improve the performance of SMT siblings.  A devious guest could
> disable C0.2 so as to improve the performance of their workloads to the
> detriment of workloads running in the host or on other VMs.
> 
> Wholesale removal of UMWAIT_CONTROL context switching also fixes a race
> condition where updates from the host may cause KVM to enter the guest
> with the incorrect value.  Because updates are propagated to all
> CPUs via IPI (SMP function callback), the value in hardware may be
> stale with respect to the cached value and KVM could enter the guest
> with the wrong value in hardware.  As above, the guest can't observe the
> bad value, but it's a weird and confusing wart in the implementation.
> 
> Removal also fixes the unnecessary usage of VMX's atomic load/store MSR
> lists.  Using the lists is only necessary for MSRs that are required for
> correct functionality immediately upon VM-Enter/VM-Exit, e.g. EFER on
> old hardware, or for MSRs that need to-the-uop precision, e.g. perf
> related MSRs.  For UMWAIT_CONTROL, the effects are only visible in the
> kernel via TPAUSE/delay(), and KVM doesn't do any form of delay in
> vcpu_vmx_run().  Using the atomic lists is undesirable as they are more
> expensive than direct RDMSR/WRMSR.
> 
> Furthermore, even if giving the guest control of the MSR is legitimate,
> e.g. in pass-through scenarios, it's not clear that the benefits would
> outweigh the overhead.  E.g. saving and restoring an MSR across a VMX
> roundtrip costs ~250 cycles, and if the guest diverged from the host
> that cost would be paid on every run of the guest.  In other words, if
> there is a legitimate use case then it should be enabled by a new
> per-VM capability.
> 
> Note, KVM still needs to emulate MSR_IA32_UMWAIT_CONTROL so that it can
> correctly expose other WAITPKG features to the guest, e.g. TPAUSE,
> UMWAIT and UMONITOR.
> 
> Fixes: 6e3ba4abcea56 ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL")
> Cc: sta...@vger.kernel.org
> Cc: Jingqi Liu 
> Cc: Tao Xu 
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/include/asm/mwait.h |  2 --
>  arch/x86/kernel/cpu/umwait.c |  6 --
>  arch/x86/kvm/vmx/vmx.c   | 18 --
>  3 files changed, 26 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 73d997aa2966..e039a933aca3 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -25,8 +25,6 @@
>  #define TPAUSE_C01_STATE 1
>  #define TPAUSE_C02_STATE 0
>  
> -u32 get_umwait_control_msr(void);
> -
>  static inline void __monitor(const void *eax, unsigned long ecx,
>unsigned long edx)
>  {
> diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
> index 300e3fd5ade3..ec8064c0ae03 100644
> --- a/arch/x86/kernel/cpu/umwait.c
> +++ b/arch/x86/kernel/cpu/umwait.c
> @@ -18,12 +18,6 @@
>   */
>  static u32 umwait_control_cached = UMWAIT_CTRL_VAL(10, 
> UMWAIT_C02_ENABLE);
>  
> -u32 get_umwait_control_msr(void)
> -{
> - return umwait_control_cached;
> -}
> -EXPORT_SYMBOL_GPL(get_umwait_control_msr);
> -
>  /*
>   * Cache the original IA32_UMWAIT_CONTROL MSR value which is configured by
>   * hardware or BIOS before kernel boot.
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 08e26a9518c2..b2447c1ee362 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6606,23 +6606,6 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
>   msrs[i].host, false);
>  }
>  
> -static void atomic_switch_umwait_control_msr(struct vcpu_vmx *vmx)
> -{
> - u32 host_umwait_control;
> -
> - if (!vmx_has_waitpkg(vmx))
> - return;
> -
> - host_umwait_control = get_umwait_control_msr();
> -
> - if (vmx->msr_ia32_umwait_control != host_umwait_control)
> - add_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL,
> - vmx->msr_ia32_umwait_control,
> - host_umwait_control, false);
> - else
> - clear_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL);
> -}
> -
>  static void vmx_update_hv_timer(struct kvm_vcpu *vcpu)
>  {
>   struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -6730,7 +6713,6 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  
>   if (vcpu_to_pmu(vcpu)->version)
>  

Re: [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall

2020-06-22 Thread Kees Cook
On Mon, Jun 22, 2020 at 08:05:10PM -0400, Arvind Sankar wrote:
> But I still don't see anything _stopping_ the compiler from optimizing
> this better in the future. The "=m" is not a barrier: it just informs
> the compiler that the asm produces an output value in *ptr (and no other
> outputs). If nothing can consume that output, it doesn't stop the
> compiler from freeing the allocation immediately after the asm instead
> of at the end of the function.

Ah, yeah, I get what you mean.

> I'm talking about something like
>   asm volatile("" : : "r" (ptr) : "memory");
> which tells the compiler that the asm may change memory arbitrarily.

Yeah, I will adjust it.

> Here, we don't use it really as a barrier, but to tell the compiler that
> the asm may have stashed the value of ptr somewhere in memory, so it's
> not free to reuse the space that it pointed to until the function
> returns (unless it can prove that nothing accesses memory, not just that
> nothing accesses ptr).

-- 
Kees Cook


Re: [PATCH 01/12] ima: Have the LSM free its audit rule

2020-06-22 Thread Casey Schaufler
On 6/22/2020 5:32 PM, Tyler Hicks wrote:
> Ask the LSM to free its audit rule rather than directly calling kfree().
> Both AppArmor and SELinux do additional work in their audit_rule_free()
> hooks. Fix memory leaks by allowing the LSMs to perform necessary work.
>
> Fixes: b16942455193 ("ima: use the lsm policy update notifier")
> Signed-off-by: Tyler Hicks 
> Cc: Janne Karhunen 
> ---
>  security/integrity/ima/ima.h| 6 ++
>  security/integrity/ima/ima_policy.c | 2 +-
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
> index df93ac258e01..de05d7f1d3ec 100644
> --- a/security/integrity/ima/ima.h
> +++ b/security/integrity/ima/ima.h
> @@ -404,6 +404,7 @@ static inline void ima_free_modsig(struct modsig *modsig)
>  #ifdef CONFIG_IMA_LSM_RULES
>  
>  #define security_filter_rule_init security_audit_rule_init
> +#define security_filter_rule_free security_audit_rule_free
>  #define security_filter_rule_match security_audit_rule_match

In context this seems perfectly reasonable. If, however, you're
working with the LSM infrastructure this set of #defines is maddening.
The existing ones have been driving me nuts for the past few years,
so I'd like to discourage adding another. Since the security_filter_rule
functions are IMA specific, they shouldn't be prefixed security_. I know
that it seems to be code churn/bikeshedding, but can we please change these:

static inline int ima_filter_rule_init(.)
{
return security_audit_rule_init(.);
}

and so forth. I understand if you don't want to make the change.
I have plenty of other things driving me crazy just now, so this
doesn't seem likely to push me over the edge.

>  
>  #else
> @@ -414,6 +415,11 @@ static inline int security_filter_rule_init(u32 field, 
> u32 op, char *rulestr,
>   return -EINVAL;
>  }
>  
> +static inline void security_filter_rule_free(void *lsmrule)
> +{
> +
> +}
> +
>  static inline int security_filter_rule_match(u32 secid, u32 field, u32 op,
>void *lsmrule)
>  {
> diff --git a/security/integrity/ima/ima_policy.c 
> b/security/integrity/ima/ima_policy.c
> index e493063a3c34..236a731492d1 100644
> --- a/security/integrity/ima/ima_policy.c
> +++ b/security/integrity/ima/ima_policy.c
> @@ -258,7 +258,7 @@ static void ima_lsm_free_rule(struct ima_rule_entry *entry)
>   int i;
>  
>   for (i = 0; i < MAX_LSM_RULES; i++) {
> - kfree(entry->lsm[i].rule);
> + security_filter_rule_free(entry->lsm[i].rule);
>   kfree(entry->lsm[i].args_p);
>   }
>   kfree(entry);



Re: [PATCH v1 03/11] soc: mediatek: cmdq: add write_s function

2020-06-22 Thread Dennis-YC Hsieh
Hi Matthias,


On Mon, 2020-06-22 at 19:08 +0200, Matthias Brugger wrote:
> 
> On 22/06/2020 18:12, Dennis-YC Hsieh wrote:
> > Hi Matthias,
> > 
> > On Mon, 2020-06-22 at 17:54 +0200, Matthias Brugger wrote:
> >>
> >> On 22/06/2020 17:36, Dennis-YC Hsieh wrote:
> >>> Hi Matthias,
> >>>
> >>> thanks for your comment.
> >>>
> >>> On Mon, 2020-06-22 at 13:07 +0200, Matthias Brugger wrote:
> 
>  On 21/06/2020 16:18, Dennis YC Hsieh wrote:
> > Add the write_s function to the cmdq helper functions, which
> > writes the value contained in an internal register to an address,
> > with large DMA access support.
> >
> > Signed-off-by: Dennis YC Hsieh 
> > ---
> >  drivers/soc/mediatek/mtk-cmdq-helper.c   |   19 +++
> >  include/linux/mailbox/mtk-cmdq-mailbox.h |1 +
> >  include/linux/soc/mediatek/mtk-cmdq.h|   19 +++
> >  3 files changed, 39 insertions(+)
> >
> > diff --git a/drivers/soc/mediatek/mtk-cmdq-helper.c 
> > b/drivers/soc/mediatek/mtk-cmdq-helper.c
> > index bf32e3b2ca6c..817a5a97dbe5 100644
> > --- a/drivers/soc/mediatek/mtk-cmdq-helper.c
> > +++ b/drivers/soc/mediatek/mtk-cmdq-helper.c
> > @@ -18,6 +18,10 @@ struct cmdq_instruction {
> > union {
> > u32 value;
> > u32 mask;
> > +   struct {
> > +   u16 arg_c;
> > +   u16 src_reg;
> > +   };
> > };
> > union {
> > u16 offset;
> > @@ -222,6 +226,21 @@ int cmdq_pkt_write_mask(struct cmdq_pkt *pkt, u8 subsys,
> >  }
> >  EXPORT_SYMBOL(cmdq_pkt_write_mask);
> >  
> > +int cmdq_pkt_write_s(struct cmdq_pkt *pkt, u16 high_addr_reg_idx,
> > +u16 addr_low, u16 src_reg_idx)
> > +{
> 
>  Do I understand correctly that we use CMDQ_ADDR_HIGH(addr) and
>  CMDQ_ADDR_LOW(addr) to calculate in the client high_addr_reg_idx and 
>  addr_low
>  respectively?
> 
>  In that case I think a better interface would be to pass the address and 
>  do the
>  high/low calculation in the cmdq_pkt_write_s
> >>>
> >>> Not exactly. The high_addr_reg_idx parameter is the index of an internal
> >>> register (which stores address bits [47:16]), not the result of
> >>> CMDQ_ADDR_HIGH(addr).
> >>>
> >>> The CMDQ_ADDR_HIGH macro is used in the patch 02/11 cmdq_pkt_assign() API.
> >>> This API assigns address bits [47:16] into one of the internal registers
> >>> by index, and the same index can then be used in cmdq_pkt_write_s(). The
> >>> GCE combines bits [47:16] from the internal register with bits [15:0]
> >>> from the addr_low parameter into the final address. So it is better to
> >>> keep the interface this way.
> >>>
> >>
> >> Got it, but then why don't we call cmdq_pkt_assign() in 
> >> cmdq_pkt_write_s()? This
> >> way we would get a clean API for what we want to do.
> >> Do we expect other users of cmdq_pkt_assign()? Otherwise we could keep it
> >> private the this file and don't export it.
> > 
> > Consider this case: writing 2 registers, 0xaabb00c0 and 0xaabb00d0.
> > 
> > If we call assign inside write_s api it will be:
> > assign aabb to internal reg 0
> > write reg 0 + 0x00c0
> > assign aabb to internal reg 0
> > write reg 0 + 0x00d0
> > 
> > 
> > But if we let client decide timing to call assign, it will be like:
> > assign aabb to internal reg 0
> > write reg 0 + 0x00c0
> > write reg 0 + 0x00d0
> > 
> 
> Ok, thanks for clarification. Is this something you exepect to see in the gce
> consumer driver?
> 

Yes, it is. Fewer commands mean better performance and less memory use,
so it is good practice for the consumer.

> > 
> > The first way uses 4 commands and the second one uses only 3.
> > Thus it is better to let the client call assign explicitly.
> > 
> >>
> >> By the way, why do you postfix the _s, I understand that it reflects the 
> >> large
> >> DMA access but I wonder why you choose '_s'.
> >>
> > 
> > The name of this command is "write_s", which comes from the hardware spec.
> > I'm just following it since it is the common language between GCE SW/HW
> > designers.
> > 
> 
> Ok, I will probably have to look that up every time have a look at the driver,
> but that's OK.
> 

ok thanks for your comment



Regards,
Dennis

> Regards,
> Matthias
> 
> > 
> > Regards,
> > Dennis
> > 
> >> Regards,
> >> Matthias
> >>
> >>>
> >>> Regards,
> >>> Dennis
> >>>
> 
>  Regards,
>  Matthias
> 
> > +   struct cmdq_instruction inst = {};
> > +
> > +   inst.op = CMDQ_CODE_WRITE_S;
> > +   inst.src_t = CMDQ_REG_TYPE;
> > +   inst.sop = high_addr_reg_idx;
> > +   inst.offset = addr_low;
> > +   inst.src_reg = src_reg_idx;
> > +
> > +   return cmdq_pkt_append_command(pkt, inst);
> > +}
> > +EXPORT_SYMBOL(cmdq_pkt_write_s);
> > +
> >  int cmdq_pkt_wfe(struct cmdq_pkt *pkt, u16 event)
> >  {
> > struct 

[PATCH tip/core/rcu 12/14] Documentation/litmus-tests: Cite an RCU litmus test

2020-06-22 Thread paulmck
From: "Joel Fernandes (Google)" 

This commit cites a pertinent RCU-related litmus test.

Co-developed-by: Joel Fernandes (Google) 
Co-developed-by: Akira Yokosawa 
[Alan: grammar nit]
[ paulmck: Update commit log and title per Akira feedback. ]
Suggested-by: Alan Stern 
Signed-off-by: Joel Fernandes (Google) 
Signed-off-by: Akira Yokosawa 
Signed-off-by: Paul E. McKenney 
---
 Documentation/litmus-tests/README | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/litmus-tests/README 
b/Documentation/litmus-tests/README
index ac0b270..b79e640 100644
--- a/Documentation/litmus-tests/README
+++ b/Documentation/litmus-tests/README
@@ -24,6 +24,10 @@ Atomic-RMW-ops-are-atomic-WRT-atomic_set.litmus
 RCU (/rcu directory)
 
 
+MP+onceassign+derefonce.litmus (under tools/memory-model/litmus-tests/)
+Demonstrates the use of rcu_assign_pointer() and rcu_dereference() to
+ensure that an RCU reader will not see pre-initialization garbage.
+
 RCU+sync+read.litmus
 RCU+sync+free.litmus
 Both the above litmus tests demonstrate the RCU grace period guarantee
-- 
2.9.5



[PATCH tip/core/rcu 08/14] Documentation/litmus-tests/atomic: Add a test for atomic_set()

2020-06-22 Thread paulmck
From: Boqun Feng 

We already use a litmus test in atomic_t.txt to describe the behavior of
an atomic_set() with an atomic RMW, so add it into the atomic-tests
directory to make it easily accessible for anyone who cares about the
semantics of our atomic APIs.

Besides, currently the litmus test "atomic-set" in atomic_t.txt has a few
things to be improved:

1)  The CPU/Processor numbers "P1,P2" are not only inconsistent with
the rest of the document, which uses "CPU0" and "CPU1", but also
unacceptable to the herd tool, which requires processors to start
at "P0".

2)  The initialization block uses an "atomic_set()", which is OK, but
it's better to use ATOMIC_INIT() to make clear this is an
initialization.

3)  The return value of atomic_add_unless() is discarded
implicitly, which is OK for the C language, but it will be helpful
to the herd tool if we use a void cast to make the discard
explicit.

4)  The name and the paragraph describing the test need to be more
accurate and aligned with our wording in LKMM.

Therefore fix these in both atomic_t.txt and the new added litmus test.

Acked-by: Andrea Parri 
Acked-by: Alan Stern 
Signed-off-by: Boqun Feng 
Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Paul E. McKenney 
---
 Documentation/atomic_t.txt | 14 ++---
 ...Atomic-RMW-ops-are-atomic-WRT-atomic_set.litmus | 24 ++
 Documentation/litmus-tests/atomic/README   |  7 +++
 3 files changed, 38 insertions(+), 7 deletions(-)
 create mode 100644 
Documentation/litmus-tests/atomic/Atomic-RMW-ops-are-atomic-WRT-atomic_set.litmus

diff --git a/Documentation/atomic_t.txt b/Documentation/atomic_t.txt
index 0ab747e..67d1d99f 100644
--- a/Documentation/atomic_t.txt
+++ b/Documentation/atomic_t.txt
@@ -85,21 +85,21 @@ smp_store_release() respectively. Therefore, if you find yourself only using
 the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all
 and are doing it wrong.
 
-A subtle detail of atomic_set{}() is that it should be observable to the RMW
-ops. That is:
+A note for the implementation of atomic_set{}() is that it must not break the
+atomicity of the RMW ops. That is:
 
-  C atomic-set
+  C Atomic-RMW-ops-are-atomic-WRT-atomic_set
 
   {
-atomic_set(v, 1);
+atomic_t v = ATOMIC_INIT(1);
   }
 
-  P1(atomic_t *v)
+  P0(atomic_t *v)
   {
-atomic_add_unless(v, 1, 0);
+(void)atomic_add_unless(v, 1, 0);
   }
 
-  P2(atomic_t *v)
+  P1(atomic_t *v)
   {
 atomic_set(v, 0);
   }
diff --git 
a/Documentation/litmus-tests/atomic/Atomic-RMW-ops-are-atomic-WRT-atomic_set.litmus
 
b/Documentation/litmus-tests/atomic/Atomic-RMW-ops-are-atomic-WRT-atomic_set.litmus
new file mode 100644
index 000..4938531
--- /dev/null
+++ 
b/Documentation/litmus-tests/atomic/Atomic-RMW-ops-are-atomic-WRT-atomic_set.litmus
@@ -0,0 +1,24 @@
+C Atomic-RMW-ops-are-atomic-WRT-atomic_set
+
+(*
+ * Result: Never
+ *
+ * Test that atomic_set() cannot break the atomicity of atomic RMWs.
+ *)
+
+{
+   atomic_t v = ATOMIC_INIT(1);
+}
+
+P0(atomic_t *v)
+{
+   (void)atomic_add_unless(v, 1, 0);
+}
+
+P1(atomic_t *v)
+{
+   atomic_set(v, 0);
+}
+
+exists
+(v=2)
diff --git a/Documentation/litmus-tests/atomic/README 
b/Documentation/litmus-tests/atomic/README
index ae61201..a1b7241 100644
--- a/Documentation/litmus-tests/atomic/README
+++ b/Documentation/litmus-tests/atomic/README
@@ -2,3 +2,10 @@ This directory contains litmus tests that are typical to describe the semantics
 of our atomic APIs. For more information about how to "run" a litmus test or
 how to generate a kernel test module based on a litmus test, please see
 tools/memory-model/README.
+
+
+LITMUS TESTS
+
+
+Atomic-RMW-ops-are-atomic-WRT-atomic_set.litmus
+   Test that atomic_set() cannot break the atomicity of atomic RMWs.
-- 
2.9.5



[PATCH tip/core/rcu 05/14] MAINTAINERS: Update maintainers for new Documentation/litmus-tests

2020-06-22 Thread paulmck
From: "Joel Fernandes (Google)" 

This commit adds Joel Fernandes as an official LKMM reviewer.

Acked-by: Boqun Feng 
Acked-by: Andrea Parri 
Signed-off-by: Joel Fernandes (Google) 
[ paulmck: Apply Joe Perches alphabetization feedback. ]
Signed-off-by: Paul E. McKenney 
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 68f21d4..696a02f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9960,6 +9960,7 @@ M:Luc Maranget 
 M: "Paul E. McKenney" 
 R: Akira Yokosawa 
 R: Daniel Lustig 
+R: Joel Fernandes 
 L: linux-kernel@vger.kernel.org
 L: linux-a...@vger.kernel.org
 S: Supported
@@ -9968,6 +9969,7 @@ F:Documentation/atomic_bitops.txt
 F: Documentation/atomic_t.txt
 F: Documentation/core-api/atomic_ops.rst
 F: Documentation/core-api/refcount-vs-atomic.rst
+F: Documentation/litmus-tests/
 F: Documentation/memory-barriers.txt
 F: tools/memory-model/
 
-- 
2.9.5



Re: [PATCH v4 3/3] arm64: dts: realtek: Add RTD1319 SoC and Realtek Pym Particles EVB

2020-06-22 Thread Andreas Färber

On 21.06.20 at 01:32, Andreas Färber wrote:

diff --git a/arch/arm64/boot/dts/realtek/rtd13xx.dtsi 
b/arch/arm64/boot/dts/realtek/rtd13xx.dtsi
new file mode 100644
index ..8c5b6fc7b8eb
--- /dev/null
+++ b/arch/arm64/boot/dts/realtek/rtd13xx.dtsi

[...]

+ {
+   uart0: serial0@800 {


Node name should be serial, not serial0.


+   compatible = "snps,dw-apb-uart";
+   reg = <0x800 0x400>;
+   reg-shift = <2>;
+   reg-io-width = <4>;
+   interrupts = ;
+   clock-frequency = <43200>;
+   status = "disabled";
+   };
+};
+
+ {
+   uart1: serial1@200 {


Ditto, serial.


+   compatible = "snps,dw-apb-uart";
+   reg = <0x200 0x400>;
+   reg-shift = <2>;
+   reg-io-width = <4>;
+   interrupts = ;
+   clock-frequency = <43200>;
+   status = "disabled";
+   };
+
+   uart2: serial2@400 {


Ditto.


+   compatible = "snps,dw-apb-uart";
+   reg = <0x400 0x400>;
+   reg-shift = <2>;
+   reg-io-width = <4>;
+   interrupts = ;
+   clock-frequency = <43200>;
+   status = "disabled";
+   };
+};


Regards,
Andreas

--
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer
HRB 36809 (AG Nürnberg)


[PATCH tip/core/rcu 01/14] tools/memory-model: Add recent references

2020-06-22 Thread paulmck
From: "Paul E. McKenney" 

This commit updates the list of LKMM-related publications in
Documentation/references.txt.

Signed-off-by: Paul E. McKenney 
Acked-by: Andrea Parri 
---
 tools/memory-model/Documentation/references.txt | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/tools/memory-model/Documentation/references.txt 
b/tools/memory-model/Documentation/references.txt
index b177f3e..ecbbaa5 100644
--- a/tools/memory-model/Documentation/references.txt
+++ b/tools/memory-model/Documentation/references.txt
@@ -73,6 +73,18 @@ o  Christopher Pulte, Shaked Flur, Will Deacon, Jon French,
 Linux-kernel memory model
 =
 
+o  Jade Alglave, Will Deacon, Boqun Feng, David Howells, Daniel
+   Lustig, Luc Maranget, Paul E. McKenney, Andrea Parri, Nicholas
+   Piggin, Alan Stern, Akira Yokosawa, and Peter Zijlstra.
+   2019. "Calibrating your fear of big bad optimizing compilers"
+   Linux Weekly News.  https://lwn.net/Articles/799218/
+
+o  Jade Alglave, Will Deacon, Boqun Feng, David Howells, Daniel
+   Lustig, Luc Maranget, Paul E. McKenney, Andrea Parri, Nicholas
+   Piggin, Alan Stern, Akira Yokosawa, and Peter Zijlstra.
+   2019. "Who's afraid of a big bad optimizing compiler?"
+   Linux Weekly News.  https://lwn.net/Articles/793253/
+
 o  Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
Alan Stern.  2018. "Frightening small children and disconcerting
grown-ups: Concurrency in the Linux kernel". In Proceedings of
@@ -88,6 +100,11 @@ o   Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
Alan Stern.  2017.  "A formal kernel memory-ordering model (part 2)"
Linux Weekly News.  https://lwn.net/Articles/720550/
 
+o  Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
+   Alan Stern.  2017-2019.  "A Formal Model of Linux-Kernel Memory
+   Ordering" (backup material for the LWN articles)
+   
https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/LWNLinuxMM/
+
 
 Memory-model tooling
 
@@ -110,5 +127,5 @@ Memory-model comparisons
 
 
 o  Paul E. McKenney, Ulrich Weigand, Andrea Parri, and Boqun
-   Feng. 2016. "Linux-Kernel Memory Model". (6 June 2016).
-   http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0124r2.html.
+   Feng. 2018. "Linux-Kernel Memory Model". (27 September 2018).
+   http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0124r6.html.
-- 
2.9.5



[PATCH] IB/hfi1: Add explicit cast OPA_MTU_8192 to 'enum ib_mtu'

2020-06-22 Thread Nathan Chancellor
Clang warns:

drivers/infiniband/hw/hfi1/qp.c:198:9: warning: implicit conversion from
enumeration type 'enum opa_mtu' to different enumeration type 'enum
ib_mtu' [-Wenum-conversion]
mtu = OPA_MTU_8192;
~ ^~~~
1 warning generated.

enum opa_mtu extends enum ib_mtu. There are typically two ways to deal
with this:

* Remove the expected types and just use 'int' for all parameters and
  types.

* Explicitly cast the enums between each other.

This driver chooses to do the latter, so do the same thing here.

Fixes: 6d72344cf6c4 ("IB/ipoib: Increase ipoib Datagram mode MTU's upper limit")
Link: https://github.com/ClangBuiltLinux/linux/issues/1062
Link: 
https://lore.kernel.org/linux-rdma/20200527040350.GA3118979@ubuntu-s3-xlarge-x86/
Signed-off-by: Nathan Chancellor 
---
 drivers/infiniband/hw/hfi1/qp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hfi1/qp.c b/drivers/infiniband/hw/hfi1/qp.c
index 0c2ae9f7b3e8..2f3d9ce077d3 100644
--- a/drivers/infiniband/hw/hfi1/qp.c
+++ b/drivers/infiniband/hw/hfi1/qp.c
@@ -195,7 +195,7 @@ static inline int verbs_mtu_enum_to_int(struct ib_device *dev, enum ib_mtu mtu)
 {
/* Constraining 10KB packets to 8KB packets */
if (mtu == (enum ib_mtu)OPA_MTU_10240)
-   mtu = OPA_MTU_8192;
+   mtu = (enum ib_mtu)OPA_MTU_8192;
return opa_mtu_enum_to_int((enum opa_mtu)mtu);
 }
 

base-commit: 27f11fea33608cbd321a97cbecfa2ef97dcc1821
-- 
2.27.0



Re: [PATCH v2 00/11] KVM: Support guest MAXPHYADDR < host MAXPHYADDR

2020-06-22 Thread Paolo Bonzini
On 23/06/20 01:47, Andy Lutomirski wrote:
> I believe that Xen does this.  Linux does not.)  For a guest to
> actually be functional in this case, the guest needs to make sure
> that it is not setting bits that are not, in fact, reserved on the
> CPU.  This means the guest needs to check MAXPHYADDR and do something
> different on different CPUs.
> 
> Do such guests exist?

I don't know; at least KVM does it too when EPT is disabled, though.  It
tries to minimize the effect of this issue by preferring bit 51, but
this does not help if the host MAXPHYADDR is 52.

> As far as I know, Xen is busted on systems
> with unusually large MAXPHYADDR regardless of any virtualization
> issues, so, at best, this series would make Xen, running as a KVM
> guest, work better on new hardware than it does running bare metal on
> that hardware.  This seems like an insufficient justification for a
> performance-eating series like this.
> 
> And, unless I've misunderstood, this will eat performance quite
> badly. Linux guests [0] (and probably many other guests), in quite a
> few workloads, are fairly sensitive to the performance of ordinary 
> write-protect or not-present faults.  Promoting these to VM exits 
> because you want to check for bits above the guest's MAXPHYADDR is
> going to hurt.

The series needs benchmarking indeed, however note that the vmexits do
not occur for not-present faults.  QEMU sets a fixed MAXPHYADDR of 40
but that is generally a bad idea and several distros change that to just
use host MAXPHYADDR instead (which would disable the new code).

> (Also, I'm confused.  Wouldn't faults like this be EPT/NPT
> violations, not page faults?)

Only if the pages are actually accessible.  Otherwise, W/U/F faults
would prevail over the RSVD fault.  Tom is saying that there's no
architectural promise that RSVD faults prevail, either, so that would
remove the need to trap #PF.

Paolo

> --Andy
> 
> 
> [0] From rather out-of-date memory, Linux doesn't make as much use
> as one might expect of the A bit.  Instead it uses minor faults.
> Ouch.



[PATCH tip/core/rcu 09/14] Documentation/litmus-tests/atomic: Add a test for smp_mb__after_atomic()

2020-06-22 Thread paulmck
From: Boqun Feng 

We already use a litmus test in atomic_t.txt to describe that an atomic RMW +
smp_mb__after_atomic() sequence is stronger than acquire (both the read and
the write parts are ordered). So make it a litmus test in the atomic-tests
directory, so that people can access it easily.

Additionally, change the processor numbers "P1, P2" to "P0, P1" in
atomic_t.txt for the consistency with the processor numbers in the
litmus test, which herd can handle.

Acked-by: Alan Stern 
Acked-by: Andrea Parri 
Signed-off-by: Boqun Feng 
Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Paul E. McKenney 
---
 Documentation/atomic_t.txt | 10 +++
 ...b__after_atomic-is-stronger-than-acquire.litmus | 32 ++
 Documentation/litmus-tests/atomic/README   |  5 
 3 files changed, 42 insertions(+), 5 deletions(-)
 create mode 100644 
Documentation/litmus-tests/atomic/Atomic-RMW+mb__after_atomic-is-stronger-than-acquire.litmus

diff --git a/Documentation/atomic_t.txt b/Documentation/atomic_t.txt
index 67d1d99f..0f1fded 100644
--- a/Documentation/atomic_t.txt
+++ b/Documentation/atomic_t.txt
@@ -233,19 +233,19 @@ as well. Similarly, something like:
 is an ACQUIRE pattern (though very much not typical), but again the barrier is
 strictly stronger than ACQUIRE. As illustrated:
 
-  C strong-acquire
+  C Atomic-RMW+mb__after_atomic-is-stronger-than-acquire
 
   {
   }
 
-  P1(int *x, atomic_t *y)
+  P0(int *x, atomic_t *y)
   {
 r0 = READ_ONCE(*x);
 smp_rmb();
 r1 = atomic_read(y);
   }
 
-  P2(int *x, atomic_t *y)
+  P1(int *x, atomic_t *y)
   {
 atomic_inc(y);
 smp_mb__after_atomic();
@@ -253,14 +253,14 @@ strictly stronger than ACQUIRE. As illustrated:
   }
 
   exists
-  (r0=1 /\ r1=0)
+  (0:r0=1 /\ 0:r1=0)
 
 This should not happen; but a hypothetical atomic_inc_acquire() --
 (void)atomic_fetch_inc_acquire() for instance -- would allow the outcome,
 because it would not order the W part of the RMW against the following
 WRITE_ONCE.  Thus:
 
-  P1   P2
+  P0   P1
 
t = LL.acq *y (0)
t++;
diff --git 
a/Documentation/litmus-tests/atomic/Atomic-RMW+mb__after_atomic-is-stronger-than-acquire.litmus
 
b/Documentation/litmus-tests/atomic/Atomic-RMW+mb__after_atomic-is-stronger-than-acquire.litmus
new file mode 100644
index 000..9a8e31a
--- /dev/null
+++ 
b/Documentation/litmus-tests/atomic/Atomic-RMW+mb__after_atomic-is-stronger-than-acquire.litmus
@@ -0,0 +1,32 @@
+C Atomic-RMW+mb__after_atomic-is-stronger-than-acquire
+
+(*
+ * Result: Never
+ *
+ * Test that an atomic RMW followed by a smp_mb__after_atomic() is
+ * stronger than a normal acquire: both the read and write parts of
+ * the RMW are ordered before the subsequent memory accesses.
+ *)
+
+{
+}
+
+P0(int *x, atomic_t *y)
+{
+   int r0;
+   int r1;
+
+   r0 = READ_ONCE(*x);
+   smp_rmb();
+   r1 = atomic_read(y);
+}
+
+P1(int *x, atomic_t *y)
+{
+   atomic_inc(y);
+   smp_mb__after_atomic();
+   WRITE_ONCE(*x, 1);
+}
+
+exists
+(0:r0=1 /\ 0:r1=0)
diff --git a/Documentation/litmus-tests/atomic/README 
b/Documentation/litmus-tests/atomic/README
index a1b7241..714cf93 100644
--- a/Documentation/litmus-tests/atomic/README
+++ b/Documentation/litmus-tests/atomic/README
@@ -7,5 +7,10 @@ tools/memory-model/README.
 LITMUS TESTS
 
 
+Atomic-RMW+mb__after_atomic-is-stronger-than-acquire
+   Test that an atomic RMW followed by a smp_mb__after_atomic() is
+   stronger than a normal acquire: both the read and write parts of
   the RMW are ordered before the subsequent memory accesses.
+
 Atomic-RMW-ops-are-atomic-WRT-atomic_set.litmus
Test that atomic_set() cannot break the atomicity of atomic RMWs.
-- 
2.9.5



[PATCH tip/core/rcu 06/14] tools/memory-model: Add an exception for limitations on _unless() family

2020-06-22 Thread paulmck
From: Boqun Feng 

According to Luc, atomic_add_unless() is directly provided by herd7,
therefore it can be used in litmus tests. So change the limitation
section in README to unlimit the use of atomic_add_unless().

Cc: Luc Maranget 
Acked-by: Andrea Parri 
Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Boqun Feng 
Signed-off-by: Paul E. McKenney 
---
 tools/memory-model/README | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/memory-model/README b/tools/memory-model/README
index fc07b52..b9c562e 100644
--- a/tools/memory-model/README
+++ b/tools/memory-model/README
@@ -207,11 +207,15 @@ The Linux-kernel memory model (LKMM) has the following 
limitations:
case as a store release.
 
b.  The "unless" RMW operations are not currently modeled:
-   atomic_long_add_unless(), atomic_add_unless(),
-   atomic_inc_unless_negative(), and
-   atomic_dec_unless_positive().  These can be emulated
+   atomic_long_add_unless(), atomic_inc_unless_negative(),
+   and atomic_dec_unless_positive().  These can be emulated
in litmus tests, for example, by using atomic_cmpxchg().
 
+   One exception of this limitation is atomic_add_unless(),
+   which is provided directly by herd7 (so no corresponding
+   definition in linux-kernel.def).  atomic_add_unless() is
+   modeled by herd7 therefore it can be used in litmus tests.
+
c.  The call_rcu() function is not modeled.  It can be
emulated in litmus tests by adding another process that
invokes synchronize_rcu() and the body of the callback
-- 
2.9.5



[PATCH tip/core/rcu 03/14] Documentation: LKMM: Add litmus test for RCU GP guarantee where updater frees object

2020-06-22 Thread paulmck
From: "Joel Fernandes (Google)" 

This adds an example for the important RCU grace period guarantee, which
shows an RCU reader can never span a grace period.

Acked-by: Andrea Parri 
Signed-off-by: Joel Fernandes (Google) 
Signed-off-by: Paul E. McKenney 
---
 .../litmus-tests/rcu/RCU+sync+free.litmus  | 42 ++
 1 file changed, 42 insertions(+)
 create mode 100644 Documentation/litmus-tests/rcu/RCU+sync+free.litmus

diff --git a/Documentation/litmus-tests/rcu/RCU+sync+free.litmus 
b/Documentation/litmus-tests/rcu/RCU+sync+free.litmus
new file mode 100644
index 000..4ee67e1
--- /dev/null
+++ b/Documentation/litmus-tests/rcu/RCU+sync+free.litmus
@@ -0,0 +1,42 @@
+C RCU+sync+free
+
+(*
+ * Result: Never
+ *
+ * This litmus test demonstrates that an RCU reader can never see a write that
+ * follows a grace period, if it did not see writes that precede that grace
+ * period.
+ *
+ * This is a typical pattern of RCU usage, where the write before the grace
+ * period assigns a pointer, and the writes following the grace period destroy
+ * the object that the pointer used to point to.
+ *
+ * This is one implication of the RCU grace-period guarantee, which says (among
+ * other things) that an RCU read-side critical section cannot span a grace 
period.
+ *)
+
+{
+int x = 1;
+int *y = &x;
+int z = 1;
+}
+
+P0(int *x, int *z, int **y)
+{
+   int *r0;
+   int r1;
+
+   rcu_read_lock();
+   r0 = rcu_dereference(*y);
+   r1 = READ_ONCE(*r0);
+   rcu_read_unlock();
+}
+
+P1(int *x, int *z, int **y)
+{
+   rcu_assign_pointer(*y, z);
+   synchronize_rcu();
+   WRITE_ONCE(*x, 0);
+}
+
+exists (0:r0=x /\ 0:r1=0)
-- 
2.9.5



[PATCH tip/core/rcu 11/14] Documentation/litmus-tests: Merge atomic's README into top-level one

2020-06-22 Thread paulmck
From: Akira Yokosawa 

Where Documentation/litmus-tests/README lists RCU litmus tests,
Documentation/litmus-tests/atomic/README lists atomic litmus tests.
For symmetry, merge the latter into the former, with some context
adjustment in the introduction.

Acked-by: Andrea Parri 
Acked-by: Joel Fernandes (Google) 
Acked-by: Boqun Feng 
Signed-off-by: Akira Yokosawa 
Signed-off-by: Paul E. McKenney 
---
 Documentation/litmus-tests/README| 19 +++
 Documentation/litmus-tests/atomic/README | 16 
 2 files changed, 19 insertions(+), 16 deletions(-)
 delete mode 100644 Documentation/litmus-tests/atomic/README

diff --git a/Documentation/litmus-tests/README 
b/Documentation/litmus-tests/README
index c4307ea..ac0b270 100644
--- a/Documentation/litmus-tests/README
+++ b/Documentation/litmus-tests/README
@@ -2,6 +2,25 @@
 LITMUS TESTS
 
 
+Each subdirectory contains litmus tests that are typical to describe the
+semantics of respective kernel APIs.
+For more information about how to "run" a litmus test or how to generate
+a kernel test module based on a litmus test, please see
+tools/memory-model/README.
+
+
+atomic (/atomic directory)
+--
+
+Atomic-RMW+mb__after_atomic-is-stronger-than-acquire.litmus
+Test that an atomic RMW followed by a smp_mb__after_atomic() is
+stronger than a normal acquire: both the read and write parts of
+the RMW are ordered before the subsequent memory accesses.
+
+Atomic-RMW-ops-are-atomic-WRT-atomic_set.litmus
+Test that atomic_set() cannot break the atomicity of atomic RMWs.
+
+
 RCU (/rcu directory)
 
 
diff --git a/Documentation/litmus-tests/atomic/README 
b/Documentation/litmus-tests/atomic/README
deleted file mode 100644
index 714cf93..000
--- a/Documentation/litmus-tests/atomic/README
+++ /dev/null
@@ -1,16 +0,0 @@
-This directory contains litmus tests that are typical to describe the semantics
-of our atomic APIs. For more information about how to "run" a litmus test or
-how to generate a kernel test module based on a litmus test, please see
-tools/memory-model/README.
-
-
-LITMUS TESTS
-
-
-Atomic-RMW+mb__after_atomic-is-stronger-than-acquire
-   Test that an atomic RMW followed by a smp_mb__after_atomic() is
-   stronger than a normal acquire: both the read and write parts of
-   the RMW are ordered before the subsequent memory accesses.
-
-Atomic-RMW-ops-are-atomic-WRT-atomic_set.litmus
-   Test that atomic_set() cannot break the atomicity of atomic RMWs.
-- 
2.9.5



[PATCH tip/core/rcu 04/14] Documentation: LKMM: Add litmus test for RCU GP guarantee where reader stores

2020-06-22 Thread paulmck
From: "Joel Fernandes (Google)" 

This adds an example for the important RCU grace period guarantee, which
shows an RCU reader can never span a grace period.

Acked-by: Andrea Parri 
Signed-off-by: Joel Fernandes (Google) 
Signed-off-by: Paul E. McKenney 
---
 Documentation/litmus-tests/README  | 11 +++
 .../litmus-tests/rcu/RCU+sync+read.litmus  | 37 ++
 2 files changed, 48 insertions(+)
 create mode 100644 Documentation/litmus-tests/README
 create mode 100644 Documentation/litmus-tests/rcu/RCU+sync+read.litmus

diff --git a/Documentation/litmus-tests/README 
b/Documentation/litmus-tests/README
new file mode 100644
index 000..c4307ea
--- /dev/null
+++ b/Documentation/litmus-tests/README
@@ -0,0 +1,11 @@
+
+LITMUS TESTS
+
+
+RCU (/rcu directory)
+
+
+RCU+sync+read.litmus
+RCU+sync+free.litmus
+Both the above litmus tests demonstrate the RCU grace period guarantee
+that an RCU read-side critical section can never span a grace period.
diff --git a/Documentation/litmus-tests/rcu/RCU+sync+read.litmus 
b/Documentation/litmus-tests/rcu/RCU+sync+read.litmus
new file mode 100644
index 000..f341767
--- /dev/null
+++ b/Documentation/litmus-tests/rcu/RCU+sync+read.litmus
@@ -0,0 +1,37 @@
+C RCU+sync+read
+
+(*
+ * Result: Never
+ *
+ * This litmus test demonstrates that after a grace period, an RCU updater 
always
+ * sees all stores done in prior RCU read-side critical sections. Such
+ * read-side critical sections would have ended before the grace period ended.
+ *
+ * This is one implication of the RCU grace-period guarantee, which says (among
+ * other things) that an RCU read-side critical section cannot span a grace 
period.
+ *)
+
+{
+int x = 0;
+int y = 0;
+}
+
+P0(int *x, int *y)
+{
+   rcu_read_lock();
+   WRITE_ONCE(*x, 1);
+   WRITE_ONCE(*y, 1);
+   rcu_read_unlock();
+}
+
+P1(int *x, int *y)
+{
+   int r0;
+   int r1;
+
+   r0 = READ_ONCE(*x);
+   synchronize_rcu();
+   r1 = READ_ONCE(*y);
+}
+
+exists (1:r0=1 /\ 1:r1=0)
-- 
2.9.5



[PATCH tip/core/rcu 14/14] docs: fix references for DMA*.txt files

2020-06-22 Thread paulmck
From: Mauro Carvalho Chehab 

As we moved those files to core-api, fix references to point
to their newer locations.

Signed-off-by: Mauro Carvalho Chehab 
Signed-off-by: Paul E. McKenney 
---
 Documentation/memory-barriers.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/memory-barriers.txt 
b/Documentation/memory-barriers.txt
index eaabc31..0e4947a 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -546,8 +546,8 @@ There are certain things that the Linux kernel memory 
barriers do not guarantee:
[*] For information on bus mastering DMA and coherency please read:
 
Documentation/driver-api/pci/pci.rst
-   Documentation/DMA-API-HOWTO.txt
-   Documentation/DMA-API.txt
+   Documentation/core-api/dma-api-howto.rst
+   Documentation/core-api/dma-api.rst
 
 
 DATA DEPENDENCY BARRIERS (HISTORICAL)
@@ -1932,7 +1932,7 @@ There are some more advanced barrier functions:
  here.
 
  See the subsection "Kernel I/O barrier effects" for more information on
- relaxed I/O accessors and the Documentation/DMA-API.txt file for more
+ relaxed I/O accessors and the Documentation/core-api/dma-api.rst file for 
more
  information on consistent memory.
 
 
-- 
2.9.5



Re: [RFC] Bypass filesystems for reading cached pages

2020-06-22 Thread Dave Chinner
On Mon, Jun 22, 2020 at 04:35:05PM +0200, Andreas Gruenbacher wrote:
> On Mon, Jun 22, 2020 at 2:32 AM Dave Chinner  wrote:
> > On Fri, Jun 19, 2020 at 08:50:36AM -0700, Matthew Wilcox wrote:
> > >
> > > This patch lifts the IOCB_CACHED idea expressed by Andreas to the VFS.
> > > The advantage of this patch is that we can avoid taking any filesystem
> > > lock, as long as the pages being accessed are in the cache (and we don't
> > > need to readahead any pages into the cache).  We also avoid an indirect
> > > function call in these cases.
> >
> > What does this micro-optimisation actually gain us except for more
> > complexity in the IO path?
> >
> > i.e. if a filesystem lock has such massive overhead that it slows
> > down the cached readahead path in production workloads, then that's
> > something the filesystem needs to address, not unconditionally
> > bypass the filesystem before the IO gets anywhere near it.
> 
> I'm fine with not moving that functionality into the VFS. The problem
> I have in gfs2 is that taking glocks is really expensive. Part of that
> overhead is accidental, but we definitely won't be able to fix it in
> the short term. So something like the IOCB_CACHED flag that prevents
> generic_file_read_iter from issuing readahead I/O would save the day
> for us. Does that idea stand a chance?

I have no problem with a "NOREADAHEAD" flag being passed to
generic_file_read_iter(). It's not a "already cached" flag though,
it's a "don't start any IO" directive, just like the NOWAIT flag is
a "don't block on locks or IO in progress" directive and not an
"already cached" flag. Readahead is something we should be doing,
unless a filesystem has a very good reason not to, such as the gfs2
locking case here...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
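The "don't start any IO" directive discussed above might be used by a
filesystem along the following lines.  This is a hypothetical sketch:
IOCB_NOREADAHEAD and gfs2_locked_read_iter() are invented names for
illustration, not existing kernel APIs.

```c
/*
 * Hypothetical sketch: generic_file_read_iter() serves whatever is
 * already in the page cache but bails out instead of kicking off
 * readahead, so the filesystem takes its expensive cluster locks
 * (glocks) only on the slow path.
 */
static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
	ssize_t ret;

	iocb->ki_flags |= IOCB_NOREADAHEAD;	/* cached pages only, no IO */
	ret = generic_file_read_iter(iocb, to);
	iocb->ki_flags &= ~IOCB_NOREADAHEAD;
	if (ret > 0 || !iov_iter_count(to))
		return ret;			/* fully served from cache */

	/* Slow path: take the glock, then a normal read with readahead. */
	return gfs2_locked_read_iter(iocb, to);
}
```

For reference, a flag along these lines was later merged upstream as
IOCB_NOIO, used by gfs2 in exactly this fast-path/slow-path pattern.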


[PATCH tip/core/rcu 13/14] tools/memory-model/README: Expand dependency of klitmus7

2020-06-22 Thread paulmck
From: Akira Yokosawa 

klitmus7 is independent of the memory model but depends on the
build-target kernel release.
It occasionally lost compatibility due to kernel API changes [1, 2, 3].
It was remedied in a backwards-compatible manner respectively [4, 5, 6].

Reflect this fact in README.

[1]: b899a850431e ("compiler.h: Remove ACCESS_ONCE()")
[2]: 0bb95f80a38f ("Makefile: Globally enable VLA warning")
[3]: d56c0d45f0e2 ("proc: decouple proc from VFS with "struct proc_ops"")
[4]: https://github.com/herd/herdtools7/commit/e87d7f9287d1
 ("klitmus: Use WRITE_ONCE and READ_ONCE in place of deprecated 
ACCESS_ONCE")
[5]: https://github.com/herd/herdtools7/commit/a0cbb10d02be
 ("klitmus: Avoid variable length array")
[6]: https://github.com/herd/herdtools7/commit/46b9412d3a58
 ("klitmus: Linux kernel v5.6.x compat")

NOTE: [5] was ahead of herdtools7 7.53, which did not make an
official release.  Code generated by klitmus7 without [5] can still be
built targeting Linux 4.20--5.5 if you don't care about VLA warnings.

Acked-by: Andrea Parri 
Signed-off-by: Akira Yokosawa 
Signed-off-by: Paul E. McKenney 
---
 tools/memory-model/README | 30 --
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/tools/memory-model/README b/tools/memory-model/README
index b9c562e..90af203 100644
--- a/tools/memory-model/README
+++ b/tools/memory-model/README
@@ -28,8 +28,34 @@ downloaded separately:
 See "herdtools7/INSTALL.md" for installation instructions.
 
 Note that although these tools usually provide backwards compatibility,
-this is not absolutely guaranteed.  Therefore, if a later version does
-not work, please try using the exact version called out above.
+this is not absolutely guaranteed.
+
+For example, a future version of herd7 might not work with the model
+in this release.  A compatible model will likely be made available in
+a later release of Linux kernel.
+
+If you absolutely need to run the model in this particular release,
+please try using the exact version called out above.
+
+klitmus7 is independent of the model provided here.  It has its own
+dependency on a target kernel release where converted code is built
+and executed.  Any change in kernel APIs essential to klitmus7 will
+necessitate an upgrade of klitmus7.
+
+If you find any compatibility issues in klitmus7, please inform the
+memory model maintainers.
+
+klitmus7 Compatibility Table
+
+
+     ==
+   target Linux  herdtools7
+     --
+-- 4.18  7.48 --
+   4.15 -- 4.19  7.49 --
+   4.20 -- 5.5   7.54 --
+   5.6  --   HEAD
+     ==
 
 
 ==
-- 
2.9.5



[PATCH tip/core/rcu 10/14] tools/memory-model: Fix reference to litmus test in recipes.txt

2020-06-22 Thread paulmck
From: Akira Yokosawa 

The name of litmus test doesn't match the one described below.
Fix the name of litmus test.

Acked-by: Andrea Parri 
Acked-by: Joel Fernandes (Google) 
Signed-off-by: Akira Yokosawa 
Signed-off-by: Paul E. McKenney 
---
 tools/memory-model/Documentation/recipes.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/memory-model/Documentation/recipes.txt 
b/tools/memory-model/Documentation/recipes.txt
index 7fe8d7a..63c4adf 100644
--- a/tools/memory-model/Documentation/recipes.txt
+++ b/tools/memory-model/Documentation/recipes.txt
@@ -126,7 +126,7 @@ However, it is not necessarily the case that accesses 
ordered by
 locking will be seen as ordered by CPUs not holding that lock.
 Consider this example:
 
-   /* See Z6.0+pooncerelease+poacquirerelease+fencembonceonce.litmus. */
+   /* See Z6.0+pooncelock+pooncelock+pombonce.litmus. */
void CPU0(void)
{
spin_lock();
-- 
2.9.5



[PATCH tip/core/rcu 02/14] tools/memory-model: Fix "conflict" definition

2020-06-22 Thread paulmck
From: Marco Elver 

The definition of "conflict" should not include the type of access nor
whether the accesses are concurrent or not, which this patch addresses.
The definition of "data race" remains unchanged.

The definition of "conflict" as we know it and is cited by various
papers on memory consistency models appeared in [1]: "Two accesses to
the same variable conflict if at least one is a write; two operations
conflict if they execute conflicting accesses."

The LKMM as well as the C11 memory model are adaptations of
data-race-free, which are based on the work in [2]. Necessarily, we need
both conflicting data operations (plain) and synchronization operations
(marked). For example, C11's definition is based on [3], which defines a
"data race" as: "Two memory operations conflict if they access the same
memory location, and at least one of them is a store, atomic store, or
atomic read-modify-write operation. In a sequentially consistent
execution, two memory operations from different threads form a type 1
data race if they conflict, at least one of them is a data operation,
and they are adjacent in <T (i.e., they can be executed
concurrently)."

[1] D. Shasha, M. Snir, "Efficient and Correct Execution of Parallel
Programs that Share Memory", 1988.
URL: http://snir.cs.illinois.edu/listed/J21.pdf

[2] S. Adve, "Designing Memory Consistency Models for Shared-Memory
Multiprocessors", 1993.
URL: http://sadve.cs.illinois.edu/Publications/thesis.pdf

[3] H.-J. Boehm, S. Adve, "Foundations of the C++ Concurrency Memory
Model", 2008.
URL: https://www.hpl.hp.com/techreports/2008/HPL-2008-56.pdf

Signed-off-by: Marco Elver 
Co-developed-by: Alan Stern 
Signed-off-by: Alan Stern 
Acked-by: Andrea Parri 
Signed-off-by: Paul E. McKenney 
---
 tools/memory-model/Documentation/explanation.txt | 83 +---
 1 file changed, 45 insertions(+), 38 deletions(-)

diff --git a/tools/memory-model/Documentation/explanation.txt 
b/tools/memory-model/Documentation/explanation.txt
index e91a2eb..993f800 100644
--- a/tools/memory-model/Documentation/explanation.txt
+++ b/tools/memory-model/Documentation/explanation.txt
@@ -1987,28 +1987,36 @@ outcome undefined.
 
 In technical terms, the compiler is allowed to assume that when the
 program executes, there will not be any data races.  A "data race"
-occurs when two conflicting memory accesses execute concurrently;
-two memory accesses "conflict" if:
+occurs when there are two memory accesses such that:
 
-   they access the same location,
+1. they access the same location,
 
-   they occur on different CPUs (or in different threads on the
-   same CPU),
+2. at least one of them is a store,
 
-   at least one of them is a plain access,
+3. at least one of them is plain,
 
-   and at least one of them is a store.
+4. they occur on different CPUs (or in different threads on the
+   same CPU), and
 
-The LKMM tries to determine whether a program contains two conflicting
-accesses which may execute concurrently; if it does then the LKMM says
-there is a potential data race and makes no predictions about the
-program's outcome.
+5. they execute concurrently.
 
-Determining whether two accesses conflict is easy; you can see that
-all the concepts involved in the definition above are already part of
-the memory model.  The hard part is telling whether they may execute
-concurrently.  The LKMM takes a conservative attitude, assuming that
-accesses may be concurrent unless it can prove they cannot.
+In the literature, two accesses are said to "conflict" if they satisfy
+1 and 2 above.  We'll go a little farther and say that two accesses
+are "race candidates" if they satisfy 1 - 4.  Thus, whether or not two
+race candidates actually do race in a given execution depends on
+whether they are concurrent.
+
+The LKMM tries to determine whether a program contains race candidates
+which may execute concurrently; if it does then the LKMM says there is
+a potential data race and makes no predictions about the program's
+outcome.
+
+Determining whether two accesses are race candidates is easy; you can
+see that all the concepts involved in the definition above are already
+part of the memory model.  The hard part is telling whether they may
+execute concurrently.  The LKMM takes a conservative attitude,
+assuming that accesses may be concurrent unless it can prove they
+are not.
 
 If two memory accesses aren't concurrent then one must execute before
 the other.  Therefore the LKMM decides two accesses aren't concurrent
@@ -2171,8 +2179,8 @@ again, now using plain accesses for buf:
}
 
 This program does not contain a data race.  Although the U and V
-accesses conflict, the LKMM can prove they are not concurrent as
-follows:
+accesses are race candidates, the LKMM can prove they are not
+concurrent as follows:
 
The smp_wmb() fence in P0 is both a compiler barrier and a
cumul-fence.  It guarantees that no matter what hash of
@@ -2326,12 +2334,11 @@ could now perform the load of x before the load of ptr 
(there might be
 a control dependency but no 

[PATCH tip/core/rcu 07/14] Documentation/litmus-tests: Introduce atomic directory

2020-06-22 Thread paulmck
From: Boqun Feng 

Although we have atomic_t.txt and its friends to describe the semantics
of atomic APIs and lib/atomic64_test.c for build testing and testing in
UP mode, the tests for our atomic APIs in real SMP mode are still
missing. Since now we have the LKMM tool in kernel and litmus tests can
be used to generate kernel modules for testing purpose with "klitmus" (a
tool from the LKMM toolset), it makes sense to put a few typical litmus
tests into kernel so that

1)  they are the examples to describe the conceptual mode of the
semantics of atomic APIs, and

2)  they can be used to generate kernel test modules for anyone
who is interested to test the atomic APIs implementation (in
most cases, is the one who implements the APIs for a new arch)

Therefore, introduce the atomic directory for this purpose. The
directory is maintained by the LKMM group to make sure the litmus tests
are always aligned with our memory model.

Acked-by: Alan Stern 
Acked-by: Andrea Parri 
Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Boqun Feng 
Signed-off-by: Paul E. McKenney 
---
 Documentation/litmus-tests/atomic/README | 4 
 1 file changed, 4 insertions(+)
 create mode 100644 Documentation/litmus-tests/atomic/README

diff --git a/Documentation/litmus-tests/atomic/README 
b/Documentation/litmus-tests/atomic/README
new file mode 100644
index 000..ae61201
--- /dev/null
+++ b/Documentation/litmus-tests/atomic/README
@@ -0,0 +1,4 @@
+This directory contains litmus tests that are typical to describe the semantics
+of our atomic APIs. For more information about how to "run" a litmus test or
+how to generate a kernel test module based on a litmus test, please see
+tools/memory-model/README.
-- 
2.9.5
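As a rough idea of how litmus tests in this directory become kernel
test modules, the workflow documented in tools/memory-model/README
looks like the following; the paths and output directory name are
illustrative:

```sh
# Convert the atomic litmus tests into a loadable kernel test module
# with klitmus7 (from the herdtools7 suite), then build and run it.
mkdir mymodules
klitmus7 -o mymodules Documentation/litmus-tests/atomic/*.litmus
make -C mymodules
sudo sh mymodules/run.sh   # loads the module, reports observed outcomes
```

This requires a klitmus7 release compatible with the target kernel;
see the compatibility notes in tools/memory-model/README.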



Re: [RESEND PATCH 1/3] nouveau: fix migrate page regression

2020-06-22 Thread John Hubbard

On 2020-06-22 16:38, Ralph Campbell wrote:

The patch to add zero page migration to GPU memory inadvertantly included


inadvertently


part of a future change which broke normal page migration to GPU memory
by copying too much data and corrupting GPU memory.
Fix this by only copying one page instead of a byte count.

Fixes: 9d4296a7d4b3 ("drm/nouveau/nouveau/hmm: fix migrate zero page to GPU")
Signed-off-by: Ralph Campbell 
---
  drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index e5c230d9ae24..cc9993837508 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -550,7 +550,7 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct 
nouveau_drm *drm,
 DMA_BIDIRECTIONAL);
if (dma_mapping_error(dev, *dma_addr))
goto out_free_page;
-   if (drm->dmem->migrate.copy_func(drm, page_size(spage),
+   if (drm->dmem->migrate.copy_func(drm, 1,
NOUVEAU_APER_VRAM, paddr, NOUVEAU_APER_HOST, *dma_addr))
goto out_dma_unmap;
} else {




I Am Not A Nouveau Expert, nor is it really clear to me how
page_size(spage) came to contain something other than a page's worth of
byte count, but this fix looks accurate to me. It's better for
maintenance, too, because the function never intends to migrate "some
number of bytes". It intends to migrate exactly one page.

Hope I'm not missing something fundamental, but:

Reviewed-by: John Hubbard 

[PATCH] KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROL

2020-06-22 Thread Sean Christopherson
Remove support for context switching between the guest's and host's
desired UMWAIT_CONTROL.  Propagating the guest's value to hardware isn't
required for correct functionality, e.g. KVM intercepts reads and writes
to the MSR, and the latency effects of the settings controlled by the
MSR are not architecturally visible.

As a general rule, KVM should not allow the guest to control power
management settings unless explicitly enabled by userspace, e.g. see
KVM_CAP_X86_DISABLE_EXITS.  E.g. Intel's SDM explicitly states that C0.2
can improve the performance of SMT siblings.  A devious guest could
disable C0.2 so as to improve the performance of their workloads at the
detriment to workloads running in the host or on other VMs.

Wholesale removal of UMWAIT_CONTROL context switching also fixes a race
condition where updates from the host may cause KVM to enter the guest
with the incorrect value.  Because updates are are propagated to all
CPUs via IPI (SMP function callback), the value in hardware may be
stale with respect to the cached value and KVM could enter the guest
with the wrong value in hardware.  As above, the guest can't observe the
bad value, but it's a weird and confusing wart in the implementation.

Removal also fixes the unnecessary usage of VMX's atomic load/store MSR
lists.  Using the lists is only necessary for MSRs that are required for
correct functionality immediately upon VM-Enter/VM-Exit, e.g. EFER on
old hardware, or for MSRs that need to-the-uop precision, e.g. perf
related MSRs.  For UMWAIT_CONTROL, the effects are only visible in the
kernel via TPAUSE/delay(), and KVM doesn't do any form of delay in
vcpu_vmx_run().  Using the atomic lists is undesirable as they are more
expensive than direct RDMSR/WRMSR.

Furthermore, even if giving the guest control of the MSR is legitimate,
e.g. in pass-through scenarios, it's not clear that the benefits would
outweigh the overhead.  E.g. saving and restoring an MSR across a VMX
roundtrip costs ~250 cycles, and if the guest diverged from the host
that cost would be paid on every run of the guest.  In other words, if
there is a legitimate use case then it should be enabled by a new
per-VM capability.

Note, KVM still needs to emulate MSR_IA32_UMWAIT_CONTROL so that it can
correctly expose other WAITPKG features to the guest, e.g. TPAUSE,
UMWAIT and UMONITOR.

Fixes: 6e3ba4abcea56 ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL")
Cc: sta...@vger.kernel.org
Cc: Jingqi Liu 
Cc: Tao Xu 
Signed-off-by: Sean Christopherson 
---
 arch/x86/include/asm/mwait.h |  2 --
 arch/x86/kernel/cpu/umwait.c |  6 --
 arch/x86/kvm/vmx/vmx.c   | 18 --
 3 files changed, 26 deletions(-)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 73d997aa2966..e039a933aca3 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -25,8 +25,6 @@
 #define TPAUSE_C01_STATE   1
 #define TPAUSE_C02_STATE   0
 
-u32 get_umwait_control_msr(void);
-
 static inline void __monitor(const void *eax, unsigned long ecx,
 unsigned long edx)
 {
diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index 300e3fd5ade3..ec8064c0ae03 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -18,12 +18,6 @@
  */
 static u32 umwait_control_cached = UMWAIT_CTRL_VAL(10, UMWAIT_C02_ENABLE);
 
-u32 get_umwait_control_msr(void)
-{
-   return umwait_control_cached;
-}
-EXPORT_SYMBOL_GPL(get_umwait_control_msr);
-
 /*
  * Cache the original IA32_UMWAIT_CONTROL MSR value which is configured by
  * hardware or BIOS before kernel boot.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 08e26a9518c2..b2447c1ee362 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6606,23 +6606,6 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
msrs[i].host, false);
 }
 
-static void atomic_switch_umwait_control_msr(struct vcpu_vmx *vmx)
-{
-   u32 host_umwait_control;
-
-   if (!vmx_has_waitpkg(vmx))
-   return;
-
-   host_umwait_control = get_umwait_control_msr();
-
-   if (vmx->msr_ia32_umwait_control != host_umwait_control)
-   add_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL,
-   vmx->msr_ia32_umwait_control,
-   host_umwait_control, false);
-   else
-   clear_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL);
-}
-
 static void vmx_update_hv_timer(struct kvm_vcpu *vcpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -6730,7 +6713,6 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
if (vcpu_to_pmu(vcpu)->version)
atomic_switch_perf_msrs(vmx);
-   atomic_switch_umwait_control_msr(vmx);
 
if (enable_preemption_timer)
vmx_update_hv_timer(vcpu);
-- 
2.26.0



[PATCH memory-model 0/14] LKMM updates for v5.9

2020-06-22 Thread Paul E. McKenney
Hello!

This series contains updates to the Linux-kernel memory model:

1.  tools/memory-model: Add recent references.

2.  tools/memory-model: Fix "conflict" definition, courtesy of
Marco Elver.

3.  Documentation: LKMM: Add litmus test for RCU GP guarantee where
updater frees object, courtesy of Joel Fernandes.

4.  Documentation: LKMM: Add litmus test for RCU GP guarantee where
reader stores, courtesy of Joel Fernandes.

5.  MAINTAINERS: Update maintainers for new Documentation/litmus-tests,
courtesy of Joel Fernandes.

6.  tools/memory-model: Add an exception for limitations on _unless()
family, courtesy of Boqun Feng.

7.  Documentation/litmus-tests: Introduce atomic directory, courtesy of
Boqun Feng.

8.  Documentation/litmus-tests/atomic: Add a test for atomic_set()
courtesy of Boqun Feng.

9.  Documentation/litmus-tests/atomic: Add a test for
smp_mb__after_atomic(), courtesy of Boqun Feng.

10. tools/memory-model: Fix reference to litmus test in recipes.txt
courtesy of Akira Yokosawa.

11. Documentation/litmus-tests: Merge atomic's README into top-level
one, courtesy of Akira Yokosawa.

12. Documentation/litmus-tests: Cite an RCU litmus test, courtesy of
Joel Fernandes.

13. tools/memory-model/README: Expand dependency of klitmus7, courtesy
of Akira Yokosawa.

14. fix references for DMA*.txt files, courtesy of Mauro Carvalho Chehab.

Thanx, Paul



 /Documentation/litmus-tests/atomic/README  
 |   16 -
 b/Documentation/atomic_t.txt   
 |   24 +-
 b/Documentation/litmus-tests/README
 |   34 
 
b/Documentation/litmus-tests/atomic/Atomic-RMW+mb__after_atomic-is-stronger-than-acquire.litmus
 |   32 +++
 
b/Documentation/litmus-tests/atomic/Atomic-RMW-ops-are-atomic-WRT-atomic_set.litmus
 |   24 ++
 b/Documentation/litmus-tests/atomic/README 
 |   16 +
 b/Documentation/litmus-tests/rcu/RCU+sync+free.litmus  
 |   42 +
 b/Documentation/litmus-tests/rcu/RCU+sync+read.litmus  
 |   37 
 b/Documentation/memory-barriers.txt
 |6 
 b/MAINTAINERS  
 |2 
 b/tools/memory-model/Documentation/explanation.txt 
 |   83 +-
 b/tools/memory-model/Documentation/recipes.txt 
 |2 
 b/tools/memory-model/Documentation/references.txt  
 |   21 ++
 b/tools/memory-model/README
 |   40 
 14 files changed, 302 insertions(+), 77 deletions(-)


Re: [PULL REQUEST] i2c for 5.8

2020-06-22 Thread John Stultz
On Sat, Jun 13, 2020 at 4:36 AM Wolfram Sang  wrote:
>
> I2C has quite some patches for you this time. I hope it is the move to
> per-driver-maintainers which is now showing results. We will see.
>
> Big news is two new drivers (Nuvoton NPCM and Qualcomm CCI), larger
> refactoring of the Designware, Tegra, and PXA drivers, the Cadence
> driver supports being a slave now, and there is support to
> instantiate SPD EEPROMs for well-known cases (which will be user-visible
> because the i801 driver supports it), and some
> devm_platform_ioremap_resource() conversions which blow up the diffstat.
>
> Note that I applied the Nuvoton driver quite late, so some minor fixup patches
> arrived during the merge window. I chose to apply them right away
> because they were trivial.
>
> Please pull.
>
> Thanks,
>
>Wolfram
>
>
> The following changes since commit 0e698dfa282211e414076f9dc7e83c1c288314fd:
>
>   Linux 5.7-rc4 (2020-05-03 14:56:04 -0700)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git i2c/for-5.8
>
> for you to fetch changes up to d790eeb3db6aefac39ffa06e598eb31b7352ca4f:
>
>   i2c: Drop stray comma in MODULE_AUTHOR statements (2020-06-11 12:32:14 
> +0200)
>
> 
...
> Andy Shevchenko (17):
...
>   i2c: designware: Move ACPI parts into common module

Just as a heads up. I'm seeing a regression from this merge that I
bisected down to the patch above (f9288fcc5c615), with the HiKey
board. It seems the adv7511 (HDMI bridge) driver isn't probing, which
causes graphics to fail.

I've just bisected it down and haven't managed to do much debugging,
but I wanted to raise a flag on this. Let me know if there is anything
you'd like me to try right off.

thanks
-john


Re: [PATCHv8 0/3] optee: register drivers on optee bus

2020-06-22 Thread Jarkko Sakkinen
On Thu, Jun 18, 2020 at 09:56:13AM +0200, Jens Wiklander wrote:
> On Thu, Jun 18, 2020 at 02:37:55AM +0300, Jarkko Sakkinen wrote:
> > On Tue, Jun 16, 2020 at 10:29:07AM +0200, Jens Wiklander wrote:
> > > Hi Maxim and Jarkko,
> > > 
> > > On Mon, Jun 15, 2020 at 05:32:40PM +0300, Maxim Uvarov wrote:
> > > > ping.
> > > > Patchset was reviewed and all comments are covered. Optee-os patches
> > > > were merged. These kernel patches look like they are hanging
> > > > somewhere...
> > > 
> > > I'm almost OK with this patchset, except that
> > > Documentation/ABI/testing/sysfs-bus-optee-devices needs to be updated
> > > for the new kernel version and TEE mailing list which we're changing right
> > > now.
> > > 
> > > The last patch touches files I'm not maintainer of. That patch depends
> > > on the previous patches so it makes sense to keep them together.  If a
> > > TPM device driver maintainer would ack that patch I can take it via my
> > > tree. Or we can do it the other way around (with a v9 patchset),
> > > whichever is preferred.
> > > 
> > > Cheers,
> > > Jens
> > 
> > Probably easier if you pick all three and I ack the one touching TPM.
> 
> Makes sense, let's do that.

Great, thanks!

/Jarkko


Re: [PATCHv2] tpm: ibmvtpm: Wait for ready buffer before probing for TPM2 attributes

2020-06-22 Thread Jarkko Sakkinen
On Fri, Jun 19, 2020 at 01:30:40PM +1000, David Gibson wrote:
> The tpm2_get_cc_attrs_tbl() call will result in TPM commands being issued,
> which will need the use of the internal command/response buffer.  But,
> we're issuing this *before* we've waited to make sure that buffer is
> allocated.
> 
> This can result in intermittent failures to probe if the hypervisor / TPM
> implementation doesn't respond quickly enough.  I find it fails almost
> every time with an 8 vcpu guest under KVM with software emulated TPM.
> 
> To fix it, just move the tpm2_get_cc_attrs_tbl() call after the
> existing code to wait for initialization, which will ensure the buffer
> is allocated.
> 
> Fixes: 18b3670d79ae9 ("tpm: ibmvtpm: Add support for TPM2")
> Signed-off-by: David Gibson 

Reviewed-by: Jarkko Sakkinen 

/Jarkko


Re: [PATCH] proc: Avoid a thundering herd of threads freeing proc dentries

2020-06-22 Thread Matthew Wilcox
On Sun, Jun 21, 2020 at 10:15:39PM -0700, Junxiao Bi wrote:
> On 6/20/20 9:27 AM, Matthew Wilcox wrote:
> > On Fri, Jun 19, 2020 at 05:42:45PM -0500, Eric W. Biederman wrote:
> > > Junxiao Bi  writes:
> > > > Still high lock contention. Collect the following hot path.
> > > A different location this time.
> > > 
> > > I know of at least exit_signal and exit_notify that take thread wide
> > > locks, and it looks like exit_mm is another.  Those don't use the same
> > > locks as flushing proc.
> > > 
> > > 
> > > So I think you are simply seeing a result of the thundering herd of
> > > threads shutting down at once.  Given that thread shutdown is 
> > > fundamentally
> > > a slow path there is only so much that can be done.
> > > 
> > > If you are up for a project to working through this thundering herd I
> > > expect I can help some.  It will be a long process of cleaning up
> > > the entire thread exit process with an eye to performance.
> > Wengang had some tests which produced wall-clock values for this problem,
> > which I agree is more informative.
> > 
> > I'm not entirely sure what the customer workload is that requires a
> > highly threaded workload to also shut down quickly.  To my mind, an
> > overall workload is normally composed of highly-threaded tasks that run
> > for a long time and only shut down rarely (thus performance of shutdown
> > is not important) and single-threaded tasks that run for a short time.
> 
> The real workload is a Java application working in server-agent mode; the
> issue happened on the agent side. All the agent does is wait for work
> dispatched from the server and execute it. To execute one work item, the
> agent starts lots of short-lived threads, so a lot of threads can exit at
> the same time when there is a lot of work to execute. The contention on the
> exit path caused a high %sys time which impacted other workloads.

How about this for a micro?  Executes in about ten seconds on my laptop.
You might need to tweak it a bit to get better timing on a server.

// gcc -pthread -O2 -g -W -Wall
#include <pthread.h>
#include <unistd.h>

void *worker(void *arg)
{
	int i = 0;
	int *p = arg;

	for (;;) {
		while (i < 1000 * 1000) {
			i += *p;
		}
		sleep(1);
	}
}

int main(void)
{
	pthread_t threads[20][100];
	int i, j, one = 1;

	for (i = 0; i < 1000; i++) {
		for (j = 0; j < 100; j++)
			pthread_create(&threads[i % 20][j], NULL, worker, &one);
		if (i < 5)
			continue;
		for (j = 0; j < 100; j++)
			pthread_cancel(threads[(i - 5) % 20][j]);
	}

	return 0;
}


Re: [PATCH tip/core/rcu 02/26] mm/mmap.c: Add cond_resched() for exit_mmap() CPU stalls

2020-06-22 Thread Shakeel Butt
On Mon, Jun 22, 2020 at 5:22 PM  wrote:
>
> From: "Paul E. McKenney" 
>
> A large process running on a heavily loaded system can encounter the
> following RCU CPU stall warning:
>
>   rcu: INFO: rcu_sched self-detected stall on CPU
>   rcu: \x093-: (20998 ticks this GP) idle=4ea/1/0x4002 
> softirq=556558/556558 fqs=5190
>   \x09(t=21013 jiffies g=1005461 q=132576)
>   NMI backtrace for cpu 3
>   CPU: 3 PID: 501900 Comm: aio-free-ring-w Kdump: loaded Not tainted 
> 5.2.9-108_fbk12_rc3_3858_gb83b75af7909 #1
>   Hardware name: Wiwynn   HoneyBadger/PantherPlus, BIOS HBM6.71 02/03/2016
>   Call Trace:
>
>dump_stack+0x46/0x60
>nmi_cpu_backtrace.cold.3+0x13/0x50
>? lapic_can_unplug_cpu.cold.27+0x34/0x34
>nmi_trigger_cpumask_backtrace+0xba/0xca
>rcu_dump_cpu_stacks+0x99/0xc7
>rcu_sched_clock_irq.cold.87+0x1aa/0x397
>? tick_sched_do_timer+0x60/0x60
>update_process_times+0x28/0x60
>tick_sched_timer+0x37/0x70
>__hrtimer_run_queues+0xfe/0x270
>hrtimer_interrupt+0xf4/0x210
>smp_apic_timer_interrupt+0x5e/0x120
>apic_timer_interrupt+0xf/0x20
>
>   RIP: 0010:kmem_cache_free+0x223/0x300
>   Code: 88 00 00 00 0f 85 ca 00 00 00 41 8b 55 18 31 f6 f7 da 41 f6 45 0a 02 
> 40 0f 94 c6 83 c6 05 9c 41 5e fa e8 a0 a7 01 00 41 56 9d <49> 8b 47 08 a8 03 
> 0f 85 87 00 00 00 65 48 ff 08 e9 3d fe ff ff 65
>   RSP: 0018:c9000e8e3da8 EFLAGS: 0206 ORIG_RAX: ff13
>   RAX: 0002 RBX: 88861b9de960 RCX: 0030
>   RDX: fffe41e8 RSI: 60777fe3a100 RDI: 0001be18
>   RBP: ea00186e7780 R08:  R09: 
>   R10: 88861b9dea28 R11: 7ffde000 R12: 81230a1f
>   R13: 54684dc0 R14: 0206 R15: 547dbc00
>? remove_vma+0x4f/0x60
>remove_vma+0x4f/0x60
>exit_mmap+0xd6/0x160
>mmput+0x4a/0x110
>do_exit+0x278/0xae0
>? syscall_trace_enter+0x1d3/0x2b0
>? handle_mm_fault+0xaa/0x1c0
>do_group_exit+0x3a/0xa0
>__x64_sys_exit_group+0x14/0x20
>do_syscall_64+0x42/0x100
>entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> And on a PREEMPT=n kernel, the "while (vma)" loop in exit_mmap() can run
> for a very long time given a large process.  This commit therefore adds
> a cond_resched() to this loop, providing RCU any needed quiescent states.
>
> Cc: Andrew Morton 
> Cc: 
> Signed-off-by: Paul E. McKenney 

We have exactly the same change in our internal kernel since 2018. We
mostly observed the need_resched warnings on the processes mapping the
hugetlbfs.

Reviewed-by: Shakeel Butt 

> ---
>  mm/mmap.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 59a4682..972f839 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3159,6 +3159,7 @@ void exit_mmap(struct mm_struct *mm)
> if (vma->vm_flags & VM_ACCOUNT)
> nr_accounted += vma_pages(vma);
> vma = remove_vma(vma);
> +   cond_resched();
> }
> vm_unacct_memory(nr_accounted);
>  }
> --
> 2.9.5
>


[PATCH tip/core/rcu 23/23] rcutorture: Remove KCSAN stubs

2020-06-22 Thread paulmck
From: "Paul E. McKenney" 

KCSAN is now in mainline, so this commit removes the stubs for the
data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS()
macros.

Signed-off-by: Paul E. McKenney 
---
 kernel/rcu/rcutorture.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 9c31001..f78c646 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -52,19 +52,6 @@
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Paul E. McKenney  and Josh Triplett 
");
 
-#ifndef data_race
-#define data_race(expr)\
-   ({  \
-   expr;   \
-   })
-#endif
-#ifndef ASSERT_EXCLUSIVE_WRITER
-#define ASSERT_EXCLUSIVE_WRITER(var) do { } while (0)
-#endif
-#ifndef ASSERT_EXCLUSIVE_ACCESS
-#define ASSERT_EXCLUSIVE_ACCESS(var) do { } while (0)
-#endif
-
 /* Bits for ->extendables field, extendables param, and related definitions. */
 #define RCUTORTURE_RDR_SHIFT8  /* Put SRCU index in upper bits. */
 #define RCUTORTURE_RDR_MASK ((1 << RCUTORTURE_RDR_SHIFT) - 1)
-- 
2.9.5



[PATCH v1 2/2] romfs: address performance regression since v3.10

2020-06-22 Thread Sven Van Asbroeck
Problem
---
romfs sequential read performance has regressed very badly since
v3.10. Currently, reading a large file inside a romfs image is
up to 12x slower compared to reading the romfs image directly.

Benchmarks:
- use a romfs image which contains a single 250M file
- calculate the md5sum of the romfs image directly (test 1)
  $ time md5sum image.romfs
- loop-mount the romfs image, and calc the md5sum of the file
  inside it (test 2)
  $ mount -o loop,ro image.romfs /mnt/romfs
  $ time md5sum /mnt/romfs/file
- drop caches in between
  $ echo 3 > /proc/sys/vm/drop_caches

imx6 (arm cortex a9) on emmc, running v5.7.2:
(test 1)  5 seconds
(test 2) 60 seconds (12x slower)

Intel i7-3630QM on Samsung SSD 850 EVO (EMT02B6Q),
running Ubuntu with v4.15.0-106-generic:
(test 1) 1.3 seconds
(test 2) 3.3 seconds (2.5x slower)

To show that a regression has occurred since v3.10:

imx6 on emmc, running v3.10.17:
(test 1) 16 seconds
(test 2) 18 seconds

Proposed Solution
-
Increase the blocksize from 1K to PAGE_SIZE. This brings the
sequential read performance close to where it was on v3.10:

imx6 on emmc, running v5.7.2:
(test 2 1K blocksize) 60 seconds
(test 2 4K blocksize) 22 seconds

Intel on Ubuntu running v4.15:
(test 2 1K blocksize) 3.3 seconds
(test 2 4K blocksize) 1.9 seconds

There is a risk that this may increase latency on random-
access workloads. But the test below suggests that this
is not a concern:

Benchmark:
- use a 630M romfs image consisting of 9600 files
- loop-mount the romfs image
  $ mount -o loop,ro image.romfs /mnt/romfs
- drop all caches
- list all files in the filesystem (test 3)
  $ time find /mnt/romfs > /dev/null

imx6 on emmc, running v5.7.2:
(test 3 1K blocksize) 9.5 seconds
(test 3 4K blocksize) 9   seconds

Intel on Ubuntu, running v4.15:
(test 3 1K blocksize) 1.4 seconds
(test 3 4K blocksize) 1.2 seconds

Practical Solution
--
Introduce a mount-option called 'largeblocks'. If present,
increase the blocksize for much better sequential performance.

Note that the Linux block layer can only support n-K blocks if
the underlying block device length is also aligned to n-K. This
may not always be the case. Therefore, the driver will pick the
largest blocksize which the underlying block device can support.

Cc: Al Viro 
Cc: Deepa Dinamani 
Cc: David Howells 
Cc: "Darrick J. Wong" 
Cc: Janos Farkas 
Cc: Jeff Layton 
To: linux-kernel@vger.kernel.org
Signed-off-by: Sven Van Asbroeck 
---
 fs/romfs/super.c | 62 
 1 file changed, 57 insertions(+), 5 deletions(-)

diff --git a/fs/romfs/super.c b/fs/romfs/super.c
index 6fecdea791f1..93565aeaa43c 100644
--- a/fs/romfs/super.c
+++ b/fs/romfs/super.c
@@ -65,7 +65,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -460,6 +460,54 @@ static __u32 romfs_checksum(const void *data, int size)
return sum;
 }
 
+enum romfs_param {
+   Opt_largeblocks,
+};
+
+static const struct fs_parameter_spec romfs_fs_parameters[] = {
+   fsparam_flag("largeblocks", Opt_largeblocks),
+   {}
+};
+
+/*
+ * Parse a single mount parameter.
+ */
+static int romfs_parse_param(struct fs_context *fc, struct fs_parameter *param)
+{
+   struct fs_parse_result result;
+   int opt;
+
+   opt = fs_parse(fc, romfs_fs_parameters, param, &result);
+   if (opt < 0)
+   return opt;
+
+   switch (opt) {
+   case Opt_largeblocks:
+   fc->fs_private = (void *) 1;
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
+/*
+ * pick the largest blocksize which the underlying block device
+ * is a multiple of. Or fall back to legacy (ROMBSIZE).
+ */
+static int romfs_largest_blocksize(struct super_block *sb)
+{
+   loff_t device_sz = i_size_read(sb->s_bdev->bd_inode);
+   int blksz;
+
+   for (blksz = PAGE_SIZE; blksz > ROMBSIZE; blksz >>= 1)
+   if ((device_sz % blksz) == 0)
+   break;
+
+   return blksz;
+}
+
 /*
  * fill in the superblock
  */
@@ -467,17 +515,19 @@ static int romfs_fill_super(struct super_block *sb, struct fs_context *fc)
 {
struct romfs_super_block *rsb;
struct inode *root;
-   unsigned long pos, img_size;
+   unsigned long pos, img_size, dev_blocksize;
const char *storage;
size_t len;
int ret;
 
 #ifdef CONFIG_BLOCK
+   dev_blocksize = fc->fs_private ? romfs_largest_blocksize(sb) :
+ROMBSIZE;
if (!sb->s_mtd) {
-   sb_set_blocksize(sb, ROMBSIZE);
+   sb_set_blocksize(sb, dev_blocksize);
} else {
-   sb->s_blocksize = ROMBSIZE;
-   sb->s_blocksize_bits = blksize_bits(ROMBSIZE);
+   sb->s_blocksize = dev_blocksize;
+   sb->s_blocksize_bits = blksize_bits(dev_blocksize);
}
 #endif
 
@@ -573,6 +623,7 @@ 

[PATCH v1 1/2] romfs: use s_blocksize(_bits) if CONFIG_BLOCK

2020-06-22 Thread Sven Van Asbroeck
The super_block fields s_blocksize and s_blocksize_bits always
reflect the actual configured blocksize for a filesystem.

Use these in all calculations where blocksize is required.
This allows us to easily change the blocksize in a later patch.

Note that I cannot determine what happens if !CONFIG_BLOCK, as
I have no access to such a system. Out of an abundance of caution,
I have left all !CONFIG_BLOCK codepaths in their original state.

Cc: Al Viro 
Cc: Deepa Dinamani 
Cc: David Howells 
Cc: "Darrick J. Wong" 
Cc: Janos Farkas 
Cc: Jeff Layton 
To: linux-kernel@vger.kernel.org
Signed-off-by: Sven Van Asbroeck 
---
 fs/romfs/storage.c | 25 +
 fs/romfs/super.c   |  9 -
 2 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/fs/romfs/storage.c b/fs/romfs/storage.c
index 6b2b4362089e..5e84efadac3f 100644
--- a/fs/romfs/storage.c
+++ b/fs/romfs/storage.c
@@ -109,9 +109,9 @@ static int romfs_blk_read(struct super_block *sb, unsigned long pos,
 
/* copy the string up to blocksize bytes at a time */
while (buflen > 0) {
-   offset = pos & (ROMBSIZE - 1);
-   segment = min_t(size_t, buflen, ROMBSIZE - offset);
-   bh = sb_bread(sb, pos >> ROMBSBITS);
+   offset = pos & (sb->s_blocksize - 1);
+   segment = min_t(size_t, buflen, sb->s_blocksize - offset);
+   bh = sb_bread(sb, pos >> sb->s_blocksize_bits);
if (!bh)
return -EIO;
memcpy(buf, bh->b_data + offset, segment);
@@ -138,9 +138,9 @@ static ssize_t romfs_blk_strnlen(struct super_block *sb,
 
/* scan the string up to blocksize bytes at a time */
while (limit > 0) {
-   offset = pos & (ROMBSIZE - 1);
-   segment = min_t(size_t, limit, ROMBSIZE - offset);
-   bh = sb_bread(sb, pos >> ROMBSBITS);
+   offset = pos & (sb->s_blocksize - 1);
+   segment = min_t(size_t, limit, sb->s_blocksize - offset);
+   bh = sb_bread(sb, pos >> sb->s_blocksize_bits);
if (!bh)
return -EIO;
buf = bh->b_data + offset;
@@ -170,9 +170,9 @@ static int romfs_blk_strcmp(struct super_block *sb, unsigned long pos,
 
/* compare string up to a block at a time */
while (size > 0) {
-   offset = pos & (ROMBSIZE - 1);
-   segment = min_t(size_t, size, ROMBSIZE - offset);
-   bh = sb_bread(sb, pos >> ROMBSBITS);
+   offset = pos & (sb->s_blocksize - 1);
+   segment = min_t(size_t, size, sb->s_blocksize - offset);
+   bh = sb_bread(sb, pos >> sb->s_blocksize_bits);
if (!bh)
return -EIO;
matched = (memcmp(bh->b_data + offset, str, segment) == 0);
@@ -180,7 +180,8 @@ static int romfs_blk_strcmp(struct super_block *sb, unsigned long pos,
size -= segment;
pos += segment;
str += segment;
-   if (matched && size == 0 && offset + segment < ROMBSIZE) {
+   if (matched && size == 0 &&
+   offset + segment < sb->s_blocksize) {
if (!bh->b_data[offset + segment])
terminated = true;
else
@@ -194,8 +195,8 @@ static int romfs_blk_strcmp(struct super_block *sb, unsigned long pos,
if (!terminated) {
/* the terminating NUL must be on the first byte of the next
 * block */
-   BUG_ON((pos & (ROMBSIZE - 1)) != 0);
-   bh = sb_bread(sb, pos >> ROMBSBITS);
+   BUG_ON((pos & (sb->s_blocksize - 1)) != 0);
+   bh = sb_bread(sb, pos >> sb->s_blocksize_bits);
if (!bh)
return -EIO;
matched = !bh->b_data[0];
diff --git a/fs/romfs/super.c b/fs/romfs/super.c
index e582d001f792..6fecdea791f1 100644
--- a/fs/romfs/super.c
+++ b/fs/romfs/super.c
@@ -411,10 +411,17 @@ static int romfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 
buf->f_type = ROMFS_MAGIC;
buf->f_namelen = ROMFS_MAXFN;
-   buf->f_bsize = ROMBSIZE;
buf->f_bfree = buf->f_bavail = buf->f_ffree;
+#ifdef CONFIG_BLOCK
+   buf->f_bsize = sb->s_blocksize;
+   buf->f_blocks =
+   (romfs_maxsize(dentry->d_sb) + sb->s_blocksize - 1) >>
+   sb->s_blocksize_bits;
+#else
+   buf->f_bsize = ROMBSIZE;
buf->f_blocks =
(romfs_maxsize(dentry->d_sb) + ROMBSIZE - 1) >> ROMBSBITS;
+#endif
buf->f_fsid.val[0] = (u32)id;
buf->f_fsid.val[1] = (u32)(id >> 32);
return 0;
-- 
2.17.1



Re: KVM/RCU related warning on latest mainline kernel

2020-06-22 Thread Wanpeng Li
On Sun, 21 Jun 2020 at 17:34, Maxim Levitsky  wrote:
>
> I started to see this warning recently:
>
>
> [  474.827893] [ cut here ]
> [  474.827894] WARNING: CPU: 28 PID: 3984 at kernel/rcu/tree.c:453 
> rcu_is_cpu_rrupt_from_idle+0x29/0x40
> [  474.827894] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio 
> xfs rfcomm xt_MASQUERADE xt_conntrack ipt_REJECT iptable_mangle iptable_nat 
> nf_nat ebtable_filter ebtables ip6table_filter
> ip6_tables tun bridge pmbus pmbus_core cmac ee1004 jc42 bnep sunrpc vfat fat 
> dm_mirror dm_region_hash dm_log iwlmvm wmi_bmof mac80211 libarc4 uvcvideo 
> iwlwifi videobuf2_vmalloc btusb videobuf2_memops
> kvm_amd snd_usb_audio videobuf2_v4l2 videobuf2_common snd_hda_codec_hdmi 
> btrtl kvm btbcm btintel snd_usbmidi_lib snd_hda_intel videodev input_leds 
> joydev snd_rawmidi snd_intel_dspcfg bluetooth mc
> snd_hda_codec cfg80211 snd_hwdep xpad ff_memless thunderbolt snd_seq 
> snd_hda_core irqbypass ecdh_generic i2c_nvidia_gpu efi_pstore ecc pcspkr 
> snd_seq_device rfkill snd_pcm bfq snd_timer i2c_piix4 snd
> zenpower rtc_cmos tpm_crb tpm_tis tpm_tis_core wmi tpm button binfmt_misc 
> dm_crypt sd_mod uas usb_storage hid_generic usbhid hid ext4 mbcache jbd2 
> amdgpu gpu_sched ttm ahci drm_kms_helper syscopyarea
> libahci
> [  474.827913]  sysfillrect crc32_pclmul sysimgblt crc32c_intel fb_sys_fops 
> igb ccp libata xhci_pci cec i2c_algo_bit rng_core nvme xhci_hcd nvme_core drm 
> t10_pi nbd usbmon it87 hwmon_vid fuse i2c_dev
> i2c_core ipv6 autofs4 [last unloaded: nvidia]
> [  474.827918] CPU: 28 PID: 3984 Comm: CPU 0/KVM Tainted: P   O  
> 5.8.0-rc1.stable #118
> [  474.827919] Hardware name: Gigabyte Technology Co., Ltd. TRX40 
> DESIGNARE/TRX40 DESIGNARE, BIOS F4c 03/05/2020
> [  474.827919] RIP: 0010:rcu_is_cpu_rrupt_from_idle+0x29/0x40
> [  474.827920] Code: 00 0f 1f 44 00 00 31 c0 65 48 8b 15 21 1e ea 7e 48 83 fa 
> 01 7f 27 48 85 d2 75 11 65 48 8b 04 25 00 25 01 00 f6 40 24 02 75 02 <0f> 0b 
> 65 48 8b 05 f5 1d ea 7e 48 85 c0 0f 94 c0 0f
> b6 c0 c3 0f 1f
> [  474.827920] RSP: 0018:c99d0e80 EFLAGS: 00010046
> [  474.827921] RAX: 889775e6d580 RBX: c9000476fce8 RCX: 
> 0001
> [  474.827921] RDX:  RSI:  RDI: 
> 
> [  474.827922] RBP:  R08:  R09: 
> 006e802f88c2
> [  474.827922] R10:  R11:  R12: 
> 006e802f8b10
> [  474.827923] R13: 889fbe719280 R14: 889fbe719378 R15: 
> 889fbe7197e0
> [  474.827923] FS:  (0008) GS:889fbe70(0008) 
> knlGS:
> [  474.827924] CS:  0010 DS:  ES:  CR0: 80050033
> [  474.827924] CR2:  CR3: 00176872a000 CR4: 
> 00340ea0
> [  474.827924] Call Trace:
> [  474.827924]  
> [  474.827925]  rcu_sched_clock_irq+0x49/0x500
> [  474.827925]  update_process_times+0x24/0x50
> [  474.827925]  tick_sched_handle.isra.0+0x1f/0x60
> [  474.827926]  tick_sched_timer+0x3b/0x80
> [  474.827926]  ? tick_sched_handle.isra.0+0x60/0x60
> [  474.827926]  __hrtimer_run_queues+0xf3/0x260
> [  474.827927]  hrtimer_interrupt+0x10e/0x240
> [  474.827927]  __sysvec_apic_timer_interrupt+0x51/0xe0
> [  474.827927]  asm_call_on_stack+0xf/0x20
> [  474.827928]  
> [  474.827928]  sysvec_apic_timer_interrupt+0x6c/0x80
> [  474.827928]  asm_sysvec_apic_timer_interrupt+0x12/0x20
> [  474.827929] RIP: 0010:kvm_arch_vcpu_ioctl_run+0xdca/0x1c80 [kvm]
> [  474.827930] Code: f0 41 c7 45 30 00 00 00 00 49 89 85 e8 19 00 00 4c 89 ef 
> ff 15 cf 41 07 00 65 4c 89 2d 07 b6 b4 5d fb 49 83 85 c8 00 00 00 01  65 
> 48 c7 05 f1 b5 b4 5d 00 00 00 00 e9 cc 00 00
> 00 e9 a5 00 00
> [  474.827930] RSP: 0018:c9000476fd90 EFLAGS: 0212
> [  474.827931] RAX: 0004eca3dfff RBX:  RCX: 
> 7b29b4fe
> [  474.827931] RDX: 0001 RSI: fe40717a2b01 RDI: 
> 889f9f46
> [  474.827931] RBP: c9000476fe60 R08:  R09: 
> 
> [  474.827932] R10:  R11:  R12: 
> 
> [  474.827932] R13: 889f9f46 R14: 8000 R15: 
> c900047161d0
> [  474.827932]  kvm_vcpu_ioctl+0x211/0x5b0 [kvm]
> [  474.827933]  ? mprotect_fixup+0x1cf/0x2f0
> [  474.827933]  ksys_ioctl+0x84/0xc0
> [  474.827933]  __x64_sys_ioctl+0x16/0x20
> [  474.827934]  do_syscall_64+0x41/0xc0
> [  474.827934]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  474.827934] RIP: 0033:0x7f39ad07435b
> [  474.827934] Code: Bad RIP value.
> [  474.827935] RSP: 002b:7f39a89c8728 EFLAGS: 0246 ORIG_RAX: 
> 0010
> [  474.827936] RAX: ffda RBX: 561ae9351460 RCX: 
> 7f39ad07435b
> [  474.827936] RDX:  RSI: ae80 RDI: 
> 0016
> [  474.827936] RBP: 7f39a89c8820 R08: 561ae71e2d10 R09: 
> 00ff
> [  474.827937] R10: 561ae6b3dc87 R11: 

Re: kprobe: __blkdev_put probe is missed

2020-06-22 Thread Masami Hiramatsu
On Tue, 23 Jun 2020 08:47:06 +0900
Masami Hiramatsu  wrote:

> On Mon, 22 Jun 2020 09:01:48 -0400
> Steven Rostedt  wrote:
> 
> > On Mon, 22 Jun 2020 08:27:53 +0800
> > Ming Lei  wrote:
> > 
> > > Can you kprobe guys improve the implementation for covering this case?
> > > For example, put probe on 3) in case the above situation is recognized.
> > 
> > To do so would require solving the halting problem.
> > 
> >   https://en.wikipedia.org/wiki/Halting_problem
> > 
> > Or perhaps reading the DWARF output of the compiler to determine if it
> > optimized the location you are looking for.
> 
> As far as I can see, gcc-9.3 doesn't generate this information :(
> Maybe the optimizer forgot to push the tail-call callsite information
> to dwarf generator when making a recursive tail-call to a loop.
> 
> > The first case is impossible to solve, the second would take a lot of
> > work, (are you going to fund it?)
> 
> What I can provide is "--skip-prologue" option for the perf-probe
> which will be similar to the "-P" option. If the compiler correctly
> generates the information, we can enable it automatically. But
> as far as I can see, it doesn't.
> 
> [OT] DWARF has its option(and GNU extension) but it seems not correctly
> implemented yet.
>  
> http://www.dwarfstd.org/ShowIssue.php?issue=100909.2

Oops, sorry, I missed the following sentences.

"Tail calls are jump-like instructions which transfer control to the start
of some subprogram, but the call site location address isn't visible in the
unwind information."

"Tail recursion is a call to the current function which is compiled as a
loop into the middle of the current function."

"The DW_TAG_call_site entries describe normal and tail calls."

This means that gcc is implemented correctly, and this __blkdev_put() case
is NOT covered by DW_TAG_call_site.
So we cannot detect it from the debuginfo.

Thank you,

-- 
Masami Hiramatsu 


[PATCH v1 0/2] address romfs performance regression

2020-06-22 Thread Sven Van Asbroeck
Tree: next-20200613

Cc: Al Viro 
Cc: Deepa Dinamani 
Cc: David Howells 
Cc: "Darrick J. Wong" 
Cc: Janos Farkas 
Cc: Jeff Layton 
To: linux-kernel@vger.kernel.org

Sven Van Asbroeck (2):
  romfs: use s_blocksize(_bits) if CONFIG_BLOCK
  romfs: address performance regression since v3.10

 fs/romfs/storage.c | 25 
 fs/romfs/super.c   | 71 ++
 2 files changed, 78 insertions(+), 18 deletions(-)

-- 
2.17.1



[PATCH tip/core/rcu 02/10] x86/mm/pat: Mark an intentional data race

2020-06-22 Thread paulmck
From: Qian Cai 

cpa_4k_install could be accessed concurrently as noticed by KCSAN,

read to 0xaa59a000 of 8 bytes by interrupt on cpu 7:
cpa_inc_4k_install arch/x86/mm/pat/set_memory.c:131 [inline]
__change_page_attr+0x10cf/0x1840 arch/x86/mm/pat/set_memory.c:1514
__change_page_attr_set_clr+0xce/0x490 arch/x86/mm/pat/set_memory.c:1636
__set_pages_np+0xc4/0xf0 arch/x86/mm/pat/set_memory.c:2148
__kernel_map_pages+0xb0/0xc8 arch/x86/mm/pat/set_memory.c:2178
kernel_map_pages include/linux/mm.h:2719 [inline] 

write to 0xaa59a000 of 8 bytes by task 1 on cpu 6:
cpa_inc_4k_install arch/x86/mm/pat/set_memory.c:131 [inline]
__change_page_attr+0x10ea/0x1840 arch/x86/mm/pat/set_memory.c:1514
__change_page_attr_set_clr+0xce/0x490 arch/x86/mm/pat/set_memory.c:1636
__set_pages_p+0xc4/0xf0 arch/x86/mm/pat/set_memory.c:2129
__kernel_map_pages+0x2e/0xc8 arch/x86/mm/pat/set_memory.c:2176
kernel_map_pages include/linux/mm.h:2719 [inline] 

Both accesses are due to the same "cpa_4k_install++" in
cpa_inc_4k_install. A data race here could be potentially undesirable:
depending on compiler optimizations or how x86 executes a non-LOCK'd
increment, it may lose increments, corrupt the counter, etc. Since this
counter only seems to be used for printing some stats, this data race
itself is unlikely to cause harm to the system though. Thus, mark this
intentional data race using the data_race() macro.

Suggested-by: Marco Elver 
Signed-off-by: Qian Cai 
Acked-by: Borislav Petkov 
Signed-off-by: Paul E. McKenney 
---
 arch/x86/mm/pat/set_memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 77e0430..d1b2a88 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -135,7 +135,7 @@ static inline void cpa_inc_2m_checked(void)
 
 static inline void cpa_inc_4k_install(void)
 {
-   cpa_4k_install++;
+   data_race(cpa_4k_install++);
 }
 
 static inline void cpa_inc_lp_sameprot(int level)
-- 
2.9.5



[PATCH tip/core/rcu 08/10] kcsan: Rename test.c to selftest.c

2020-06-22 Thread paulmck
From: Marco Elver 

Rename 'test.c' to 'selftest.c' to better reflect its purpose (Kconfig
variable and code inside already match this). This is to avoid confusion
with the test suite module in 'kcsan-test.c'.

No functional change.

Signed-off-by: Marco Elver 
Signed-off-by: Paul E. McKenney 
---
 kernel/kcsan/Makefile   | 2 +-
 kernel/kcsan/{test.c => selftest.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename kernel/kcsan/{test.c => selftest.c} (100%)

diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
index 14533cf..092ce58 100644
--- a/kernel/kcsan/Makefile
+++ b/kernel/kcsan/Makefile
@@ -11,7 +11,7 @@ CFLAGS_core.o := $(call cc-option,-fno-conserve-stack,) \
$(call cc-option,-fno-stack-protector,)
 
 obj-y := core.o debugfs.o report.o
-obj-$(CONFIG_KCSAN_SELFTEST) += test.o
+obj-$(CONFIG_KCSAN_SELFTEST) += selftest.o
 
 CFLAGS_kcsan-test.o := $(CFLAGS_KCSAN) -g -fno-omit-frame-pointer
 obj-$(CONFIG_KCSAN_TEST) += kcsan-test.o
diff --git a/kernel/kcsan/test.c b/kernel/kcsan/selftest.c
similarity index 100%
rename from kernel/kcsan/test.c
rename to kernel/kcsan/selftest.c
-- 
2.9.5



[PATCH tip/core/rcu 10/10] kcsan: Add jiffies test to test suite

2020-06-22 Thread paulmck
From: Marco Elver 

Add a test to ensure that neither KCSAN nor the compiler gets confused about
accesses to jiffies on different architectures.

Signed-off-by: Marco Elver 
Signed-off-by: Paul E. McKenney 
---
 kernel/kcsan/kcsan-test.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/kernel/kcsan/kcsan-test.c b/kernel/kcsan/kcsan-test.c
index 3af420a..fed6fcb 100644
--- a/kernel/kcsan/kcsan-test.c
+++ b/kernel/kcsan/kcsan-test.c
@@ -366,6 +366,11 @@ static noinline void test_kernel_read_struct_zero_size(void)
kcsan_check_read(_struct.val[3], 0);
 }
 
+static noinline void test_kernel_jiffies_reader(void)
+{
+   sink_value((long)jiffies);
+}
+
 static noinline void test_kernel_seqlock_reader(void)
 {
unsigned int seq;
@@ -817,6 +822,23 @@ static void test_assert_exclusive_access_scoped(struct kunit *test)
KUNIT_EXPECT_TRUE(test, match_expect_inscope);
 }
 
+/*
+ * jiffies is special (declared to be volatile) and its accesses are typically
+ * not marked; this test ensures that neither the compiler nor KCSAN gets
+ * confused about jiffies's declaration on different architectures.
+ */
+__no_kcsan
+static void test_jiffies_noreport(struct kunit *test)
+{
+   bool match_never = false;
+
+   begin_test_checks(test_kernel_jiffies_reader, test_kernel_jiffies_reader);
+   do {
+   match_never = report_available();
+   } while (!end_test_checks(match_never));
+   KUNIT_EXPECT_FALSE(test, match_never);
+}
+
 /* Test that racing accesses in seqlock critical sections are not reported. */
 __no_kcsan
 static void test_seqlock_noreport(struct kunit *test)
@@ -867,6 +889,7 @@ static struct kunit_case kcsan_test_cases[] = {
KCSAN_KUNIT_CASE(test_assert_exclusive_bits_nochange),
KCSAN_KUNIT_CASE(test_assert_exclusive_writer_scoped),
KCSAN_KUNIT_CASE(test_assert_exclusive_access_scoped),
+   KCSAN_KUNIT_CASE(test_jiffies_noreport),
KCSAN_KUNIT_CASE(test_seqlock_noreport),
{},
 };
-- 
2.9.5



[PATCH kcsan 0/10] KCSAN updates for v5.9

2020-06-22 Thread Paul E. McKenney
Hello!

This series provides KCSAN updates:

1.  Annotate a data race in vm_area_dup(), courtesy of Qian Cai.

2.  x86/mm/pat: Mark an intentional data race, courtesy of Qian Cai.

3.  Add ASSERT_EXCLUSIVE_ACCESS() to __list_splice_init_rcu().

4.  Add test suite, courtesy of Marco Elver.

5.  locking/osq_lock: Annotate a data race in osq_lock.

6.  Prefer '__no_kcsan inline' in test, courtesy of Marco Elver.

7.  Silence -Wmissing-prototypes warning with W=1, courtesy of Qian Cai.

8.  Rename test.c to selftest.c, courtesy of Marco Elver.

9.  Remove existing special atomic rules, courtesy of Marco Elver.

10. Add jiffies test to test suite, courtesy of Marco Elver.

Thanx, Paul



 arch/x86/mm/pat/set_memory.c |2 
 include/linux/rculist.h  |2 
 kernel/fork.c|8 
 kernel/kcsan/Makefile|5 
 kernel/kcsan/atomic.h|6 
 kernel/kcsan/core.c  |9 
 kernel/kcsan/kcsan-test.c|  ++-
 kernel/kcsan/selftest.c  |1 
 kernel/locking/osq_lock.c|6 
 lib/Kconfig.kcsan|   23 
 10 files changed, 1161 insertions(+), 12 deletions(-)


[PATCH tip/core/rcu 05/10] locking/osq_lock: Annotate a data race in osq_lock

2020-06-22 Thread paulmck
From: Qian Cai 

The prev->next pointer can be accessed concurrently as noticed by KCSAN:

 write (marked) to 0x9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
  osq_lock+0x25f/0x350
  osq_wait_next at kernel/locking/osq_lock.c:79
  (inlined by) osq_lock at kernel/locking/osq_lock.c:185
  rwsem_optimistic_spin
  

 read to 0x9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
  osq_lock+0x196/0x350
  osq_lock at kernel/locking/osq_lock.c:157
  rwsem_optimistic_spin
  

The write only stores NULL to prev->next, and the read merely tests
whether prev->next equals this_cpu_ptr(&osq_node). Even if the read
value is torn, the code still works correctly. Thus, mark it as an
intentional data race using the data_race() macro.
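For readers unfamiliar with the annotation, the shape of the check can be sketched in user space. The data_race() stand-in below is an assumption for illustration only; the real kernel macro additionally suppresses KCSAN instrumentation for the access.

```c
#include <assert.h>
#include <stddef.h>

/*
 * User-space stand-in for the kernel's data_race() macro. The real
 * macro also tells KCSAN not to report the access; here it is a plain
 * read, which is enough to show the shape of the osq_lock check.
 */
#define data_race(expr) (expr)

struct optimistic_spin_node {
	struct optimistic_spin_node *next;
};

/*
 * Returns 1 when prev's next pointer currently points at node. A
 * concurrent store of NULL to prev->next can only make the comparison
 * fail; it cannot cause incorrect behaviour, hence the annotation.
 */
static int prev_points_at(struct optimistic_spin_node *prev,
			  struct optimistic_spin_node *node)
{
	return data_race(prev->next) == node;
}

/* Exercise both outcomes single-threaded. */
static int demo(void)
{
	struct optimistic_spin_node a, b;

	a.next = &b;
	if (!prev_points_at(&a, &b))
		return 0;
	a.next = NULL;
	return !prev_points_at(&a, &b);
}
```

The point of the annotation is purely documentary plus tooling-related: the generated code is unchanged, but KCSAN knows the race is intentional.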

Signed-off-by: Qian Cai 
Signed-off-by: Paul E. McKenney 
---
 kernel/locking/osq_lock.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 1f77349..1de006e 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 */
 
for (;;) {
-   if (prev->next == node &&
+   /*
+* cpu_relax() below implies a compiler barrier which would
+* prevent this comparison being optimized away.
+*/
+   if (data_race(prev->next) == node &&
	    cmpxchg(&prev->next, node, NULL) == node)
break;
 
-- 
2.9.5



[PATCH tip/core/rcu 03/10] rculist: Add ASSERT_EXCLUSIVE_ACCESS() to __list_splice_init_rcu()

2020-06-22 Thread paulmck
From: "Paul E. McKenney" 

After the sync() in __list_splice_init_rcu(), there should be no
readers traversing the old list.  This commit therefore enlists the
help of KCSAN to verify this condition via a pair of calls to
ASSERT_EXCLUSIVE_ACCESS().
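As a toy model of the claim being asserted (this counter-based sketch is an assumption; the kernel macro needs no reader count, since KCSAN itself watches the address for concurrent accesses):

```c
#include <assert.h>

/*
 * Models the condition ASSERT_EXCLUSIVE_ACCESS(*first) asserts after
 * sync() returns: no reader is still traversing the old list. An
 * explicit reader count stands in for KCSAN's watchpoint machinery.
 */
static int active_readers;

static int exclusive_access_ok(void)
{
	return active_readers == 0;
}
```

If a reader were still active when the splice proceeds, the real assertion would flag a KCSAN report rather than return false.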

Signed-off-by: Paul E. McKenney 
Cc: Marco Elver 
---
 include/linux/rculist.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index df587d1..2ebd112 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -248,6 +248,8 @@ static inline void __list_splice_init_rcu(struct list_head 
*list,
 */
 
sync();
+   ASSERT_EXCLUSIVE_ACCESS(*first);
+   ASSERT_EXCLUSIVE_ACCESS(*last);
 
/*
 * Readers are finished with the source list, so perform splice.
-- 
2.9.5



[PATCH tip/core/rcu 01/10] fork: Annotate a data race in vm_area_dup()

2020-06-22 Thread paulmck
From: Qian Cai 

struct vm_area_struct could be accessed concurrently as noticed by
KCSAN,

 write to 0x9cf8bba08ad8 of 8 bytes by task 14263 on cpu 35:
  vma_interval_tree_insert+0x101/0x150:
  rb_insert_augmented_cached at include/linux/rbtree_augmented.h:58
  (inlined by) vma_interval_tree_insert at mm/interval_tree.c:23
  __vma_link_file+0x6e/0xe0
  __vma_link_file at mm/mmap.c:629
  vma_link+0xa2/0x120
  mmap_region+0x753/0xb90
  do_mmap+0x45c/0x710
  vm_mmap_pgoff+0xc0/0x130
  ksys_mmap_pgoff+0x1d1/0x300
  __x64_sys_mmap+0x33/0x40
  do_syscall_64+0x91/0xc44
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 read to 0x9cf8bba08a80 of 200 bytes by task 14262 on cpu 122:
  vm_area_dup+0x6a/0xe0
  vm_area_dup at kernel/fork.c:362
  __split_vma+0x72/0x2a0
  __split_vma at mm/mmap.c:2661
  split_vma+0x5a/0x80
  mprotect_fixup+0x368/0x3f0
  do_mprotect_pkey+0x263/0x420
  __x64_sys_mprotect+0x51/0x70
  do_syscall_64+0x91/0xc44
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

vm_area_dup() blindly copies all fields of original VMA to the new one.
This includes copying vm_area_struct::shared.rb which is normally
protected by i_mmap_lock. But this is fine because the read value will
be overwritten on the following __vma_link_file() under proper
protection. Thus, mark it as an intentional data race and insert a few
assertions for the fields that should not be modified concurrently.
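The "copy everything, then reinitialize the racy field" pattern can be shown with a hypothetical two-field stand-in for vm_area_struct (the struct and the simplified data_race() below are assumptions for illustration):

```c
#include <assert.h>

/* User-space stand-in for data_race(); the kernel macro also
 * suppresses KCSAN reporting for the access. */
#define data_race(expr) (expr)

/* Hypothetical miniature of vm_area_struct. */
struct vma_like {
	unsigned long vm_flags;	/* expected stable during the copy */
	int shared_rb;		/* may be written concurrently */
};

/*
 * Mirrors the vm_area_dup() pattern: copy the whole struct under
 * data_race(), then reinitialize the field whose copied value may be
 * stale -- as __vma_link_file() later does for shared.rb -- so the
 * racy read is never actually used.
 */
static struct vma_like dup_vma(const struct vma_like *orig)
{
	struct vma_like new_vma = data_race(*orig);

	new_vma.shared_rb = 0;
	return new_vma;
}
```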

Signed-off-by: Qian Cai 
Signed-off-by: Paul E. McKenney 
---
 kernel/fork.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 142b236..bba10fb 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -359,7 +359,13 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct 
*orig)
	struct vm_area_struct *new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
 
if (new) {
-   *new = *orig;
+   ASSERT_EXCLUSIVE_WRITER(orig->vm_flags);
+   ASSERT_EXCLUSIVE_WRITER(orig->vm_file);
+   /*
+* orig->shared.rb may be modified concurrently, but the clone
+* will be reinitialized.
+*/
+   *new = data_race(*orig);
INIT_LIST_HEAD(>anon_vma_chain);
new->vm_next = new->vm_prev = NULL;
}
-- 
2.9.5



[PATCH tip/core/rcu 07/10] kcsan: Silence -Wmissing-prototypes warning with W=1

2020-06-22 Thread paulmck
From: Marco Elver 

The functions here should not be forward declared for explicit use
elsewhere in the kernel, as they should only be emitted by the compiler
due to sanitizer instrumentation.  Add forward declarations a line above
their definition to shut up warnings in W=1 builds.
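The same pattern in miniature: with -Wmissing-prototypes, defining a non-static function without a prior declaration draws a warning even when, as with the __tsan_* hooks, it is only ever called from compiler-emitted code. The hook name and counter below are hypothetical; the real hooks call check_access().

```c
#include <assert.h>

static int instrumented_calls;

void instrumentation_hook(void *ptr);	/* silences -Wmissing-prototypes */
void instrumentation_hook(void *ptr)
{
	(void)ptr;		/* a real hook would check_access(ptr, ...) */
	instrumented_calls++;
}
```

The declaration immediately above the definition keeps the symbol external (so instrumented calls still link) while satisfying the warning.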

Link: https://lkml.kernel.org/r/202006060103.jscpnv1g%...@intel.com
Reported-by: kernel test robot 
Signed-off-by: Marco Elver 
Signed-off-by: Paul E. McKenney 
---
 kernel/kcsan/core.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
index 15f6794..1866baf 100644
--- a/kernel/kcsan/core.c
+++ b/kernel/kcsan/core.c
@@ -754,6 +754,7 @@ EXPORT_SYMBOL(__kcsan_check_access);
  */
 
 #define DEFINE_TSAN_READ_WRITE(size)   
\
+   void __tsan_read##size(void *ptr); \
void __tsan_read##size(void *ptr)  \
{  \
check_access(ptr, size, 0);\
@@ -762,6 +763,7 @@ EXPORT_SYMBOL(__kcsan_check_access);
void __tsan_unaligned_read##size(void *ptr)\
__alias(__tsan_read##size);\
EXPORT_SYMBOL(__tsan_unaligned_read##size);\
+   void __tsan_write##size(void *ptr);\
void __tsan_write##size(void *ptr) \
{  \
check_access(ptr, size, KCSAN_ACCESS_WRITE);   \
@@ -777,12 +779,14 @@ DEFINE_TSAN_READ_WRITE(4);
 DEFINE_TSAN_READ_WRITE(8);
 DEFINE_TSAN_READ_WRITE(16);
 
+void __tsan_read_range(void *ptr, size_t size);
 void __tsan_read_range(void *ptr, size_t size)
 {
check_access(ptr, size, 0);
 }
 EXPORT_SYMBOL(__tsan_read_range);
 
+void __tsan_write_range(void *ptr, size_t size);
 void __tsan_write_range(void *ptr, size_t size)
 {
check_access(ptr, size, KCSAN_ACCESS_WRITE);
@@ -799,6 +803,7 @@ EXPORT_SYMBOL(__tsan_write_range);
  * the size-check of compiletime_assert_rwonce_type().
  */
 #define DEFINE_TSAN_VOLATILE_READ_WRITE(size)  
\
+   void __tsan_volatile_read##size(void *ptr);\
void __tsan_volatile_read##size(void *ptr) \
{  \
const bool is_atomic = size <= sizeof(long long) &&\
@@ -811,6 +816,7 @@ EXPORT_SYMBOL(__tsan_write_range);
void __tsan_unaligned_volatile_read##size(void *ptr)   \
__alias(__tsan_volatile_read##size);   \
EXPORT_SYMBOL(__tsan_unaligned_volatile_read##size);   \
+   void __tsan_volatile_write##size(void *ptr);   \
void __tsan_volatile_write##size(void *ptr)\
{  \
const bool is_atomic = size <= sizeof(long long) &&\
@@ -836,14 +842,17 @@ DEFINE_TSAN_VOLATILE_READ_WRITE(16);
  * The below are not required by KCSAN, but can still be emitted by the
  * compiler.
  */
+void __tsan_func_entry(void *call_pc);
 void __tsan_func_entry(void *call_pc)
 {
 }
 EXPORT_SYMBOL(__tsan_func_entry);
+void __tsan_func_exit(void);
 void __tsan_func_exit(void)
 {
 }
 EXPORT_SYMBOL(__tsan_func_exit);
+void __tsan_init(void);
 void __tsan_init(void)
 {
 }
-- 
2.9.5



[PATCH tip/core/rcu 09/10] kcsan: Remove existing special atomic rules

2020-06-22 Thread paulmck
From: Marco Elver 

Remove existing special atomic rules from kcsan_is_atomic_special()
because they are no longer needed. Since we rely on the compiler
emitting instrumentation distinguishing volatile accesses, the rules
have become redundant.

Let's keep kcsan_is_atomic_special() around, so that we have an obvious
place to add special rules should the need arise in future.

Signed-off-by: Marco Elver 
Signed-off-by: Paul E. McKenney 
---
 kernel/kcsan/atomic.h | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/kcsan/atomic.h b/kernel/kcsan/atomic.h
index be9e625..75fe701 100644
--- a/kernel/kcsan/atomic.h
+++ b/kernel/kcsan/atomic.h
@@ -3,8 +3,7 @@
 #ifndef _KERNEL_KCSAN_ATOMIC_H
 #define _KERNEL_KCSAN_ATOMIC_H
 
-#include <linux/jiffies.h>
-#include <linux/sched.h>
+#include <linux/types.h>
 
 /*
  * Special rules for certain memory where concurrent conflicting accesses are
@@ -13,8 +12,7 @@
  */
 static bool kcsan_is_atomic_special(const volatile void *ptr)
 {
-   /* volatile globals that have been observed in data races. */
-	return ptr == &jiffies || ptr == &current->state;
+   return false;
 }
 
 #endif /* _KERNEL_KCSAN_ATOMIC_H */
-- 
2.9.5



[PATCH tip/core/rcu 04/10] kcsan: Add test suite

2020-06-22 Thread paulmck
From: Marco Elver 

This adds a KCSAN test suite focusing on the behaviour of the integrated
runtime. It tests various race scenarios, and verifies the reports
generated to the console. Makes use of KUnit for test organization, and
the Torture framework for test thread control.

Signed-off-by: Marco Elver 
Signed-off-by: Paul E. McKenney 
---
 kernel/kcsan/Makefile |3 +
 kernel/kcsan/kcsan-test.c | 1084 +
 lib/Kconfig.kcsan |   23 +-
 3 files changed, 1109 insertions(+), 1 deletion(-)
 create mode 100644 kernel/kcsan/kcsan-test.c

diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
index d4999b3..14533cf 100644
--- a/kernel/kcsan/Makefile
+++ b/kernel/kcsan/Makefile
@@ -12,3 +12,6 @@ CFLAGS_core.o := $(call cc-option,-fno-conserve-stack,) \
 
 obj-y := core.o debugfs.o report.o
 obj-$(CONFIG_KCSAN_SELFTEST) += test.o
+
+CFLAGS_kcsan-test.o := $(CFLAGS_KCSAN) -g -fno-omit-frame-pointer
+obj-$(CONFIG_KCSAN_TEST) += kcsan-test.o
diff --git a/kernel/kcsan/kcsan-test.c b/kernel/kcsan/kcsan-test.c
new file mode 100644
index 000..a8c1150
--- /dev/null
+++ b/kernel/kcsan/kcsan-test.c
@@ -0,0 +1,1084 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KCSAN test with various race scenarios to test runtime behaviour. Since the
+ * interface with which KCSAN's reports are obtained is via the console, this is
+ * the output we should verify. For each test case we check the presence (or
+ * absence) of generated reports. Relies on 'console' tracepoint to capture
+ * reports as they appear in the kernel log.
+ *
+ * Makes use of KUnit for test organization, and the Torture framework for test
+ * thread control.
+ *
+ * Copyright (C) 2020, Google LLC.
+ * Author: Marco Elver 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Points to current test-case memory access "kernels". */
+static void (*access_kernels[2])(void);
+
+static struct task_struct **threads; /* Lists of threads. */
+static unsigned long end_time;   /* End time of test. */
+
+/* Report as observed from console. */
+static struct {
+   spinlock_t lock;
+   int nlines;
+   char lines[3][512];
+} observed = {
+   .lock = __SPIN_LOCK_UNLOCKED(observed.lock),
+};
+
+/* Setup test checking loop. */
+static __no_kcsan_or_inline void
+begin_test_checks(void (*func1)(void), void (*func2)(void))
+{
+   kcsan_disable_current();
+
+   /*
+* Require at least as long as KCSAN_REPORT_ONCE_IN_MS, to ensure at
+* least one race is reported.
+*/
+	end_time = jiffies + msecs_to_jiffies(CONFIG_KCSAN_REPORT_ONCE_IN_MS + 500);
+
+   /* Signal start; release potential initialization of shared data. */
+	smp_store_release(&access_kernels[0], func1);
+	smp_store_release(&access_kernels[1], func2);
+}
+
+/* End test checking loop. */
+static __no_kcsan_or_inline bool
+end_test_checks(bool stop)
+{
+   if (!stop && time_before(jiffies, end_time)) {
+   /* Continue checking */
+   might_sleep();
+   return false;
+   }
+
+   kcsan_enable_current();
+   return true;
+}
+
+/*
+ * Probe for console output: checks if a race was reported, and obtains observed
+ * lines of interest.
+ */
+__no_kcsan
+static void probe_console(void *ignore, const char *buf, size_t len)
+{
+   unsigned long flags;
+   int nlines;
+
+   /*
+* Note that KCSAN reports under a global lock, so we do not risk the
+* possibility of having multiple reports interleaved. If that were the
+* case, we'd expect tests to fail.
+*/
+
+	spin_lock_irqsave(&observed.lock, flags);
+   nlines = observed.nlines;
+
+   if (strnstr(buf, "BUG: KCSAN: ", len) && strnstr(buf, "test_", len)) {
+   /*
+* KCSAN report and related to the test.
+*
+* The provided @buf is not NUL-terminated; copy no more than
+* @len bytes and let strscpy() add the missing NUL-terminator.
+*/
+	strscpy(observed.lines[0], buf, min(len + 1, sizeof(observed.lines[0])));
+   nlines = 1;
+	} else if ((nlines == 1 || nlines == 2) && strnstr(buf, "bytes by", len)) {
+		strscpy(observed.lines[nlines++], buf, min(len + 1, sizeof(observed.lines[0])));
+
+   if (strnstr(buf, "race at unknown origin", len)) {
+   if (WARN_ON(nlines != 2))
+   goto out;
+
+   /* No second line of interest. */
+   strcpy(observed.lines[nlines++], "");
+   }
+   }
+
+out:
+   WRITE_ONCE(observed.nlines, nlines); /* Publish new nlines. */
+	spin_unlock_irqrestore(&observed.lock, flags);
+}
+
+/* Check if a report related to the test exists. */
+__no_kcsan
+static bool report_available(void)
+{
+   return 

[PATCH v1 0/2] address romfs performance regression

2020-06-22 Thread Sven Van Asbroeck
Tree: next-20200613

Cc: Al Viro 
Cc: Deepa Dinamani 
Cc: David Howells 
Cc: "Darrick J. Wong" 
Cc: Janos Farkas 
Cc: Jeff Layton 
To: linux-kernel@vger.kernel.org

Sven Van Asbroeck (2):
  romfs: use s_blocksize(_bits) if CONFIG_BLOCK
  romfs: address performance regression since v3.10

 fs/romfs/storage.c | 25 
 fs/romfs/super.c   | 71 ++
 2 files changed, 78 insertions(+), 18 deletions(-)

-- 
2.17.1



Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation]

2020-06-22 Thread Anchal Agarwal
On Mon, Jun 22, 2020 at 10:38:46AM +0200, Roger Pau Monné wrote:
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open attachments unless you can confirm the sender and know the 
> content is safe.
> 
> 
> 
> On Fri, Jun 19, 2020 at 11:43:12PM +, Anchal Agarwal wrote:
> > On Wed, Jun 17, 2020 at 10:35:28AM +0200, Roger Pau Monné wrote:
> > > CAUTION: This email originated from outside of the organization. Do not 
> > > click links or open attachments unless you can confirm the sender and 
> > > know the content is safe.
> > >
> > >
> > >
> > > On Tue, Jun 16, 2020 at 09:49:25PM +, Anchal Agarwal wrote:
> > > > On Thu, Jun 04, 2020 at 09:05:48AM +0200, Roger Pau Monné wrote:
> > > > > CAUTION: This email originated from outside of the organization. Do 
> > > > > not click links or open attachments unless you can confirm the sender 
> > > > > and know the content is safe.
> > > > > On Wed, Jun 03, 2020 at 11:33:52PM +, Agarwal, Anchal wrote:
> > > > > >  CAUTION: This email originated from outside of the organization. 
> > > > > > Do not click links or open attachments unless you can confirm the 
> > > > > > sender and know the content is safe.
> > > > > > > + xenbus_dev_error(dev, err, "Freezing timed out;"
> > > > > > > +  "the device may become 
> > > > > > inconsistent state");
> > > > > >
> > > > > > Leaving the device in this state is quite bad, as it's in a 
> > > > > > closed
> > > > > > state and with the queues frozen. You should make an attempt to
> > > > > > restore things to a working state.
> > > > > >
> > > > > > You mean if backend closed after timeout? Is there a way to know 
> > > > > > that? I understand it's not good to
> > > > > > leave it in this state however, I am still trying to find if there 
> > > > > > is a good way to know if backend is still connected after timeout.
> > > > > > Hence the message " the device may become inconsistent state".  I 
> > > > > > didn't see a timeout not even once on my end so that's why
> > > > > > I may be looking for an alternate perspective here. may be need to 
> > > > > > thaw everything back intentionally is one thing I could think of.
> > > > >
> > > > > You can manually force this state, and then check that it will behave
> > > > > correctly. I would expect that on a failure to disconnect from the
> > > > > backend you should switch the frontend to the 'Init' state in order to
> > > > > try to reconnect to the backend when possible.
> > > > >
> > > > From what I understand forcing manually is, failing the freeze without
> > > > disconnect and try to revive the connection by unfreezing the
> > > > queues->reconnecting to backend [which never got diconnected]. May be 
> > > > even
> > > > tearing down things manually because I am not sure what state will 
> > > > frontend
> > > > see if backend fails to to disconnect at any point in time. I assumed 
> > > > connected.
> > > > Then again if its "CONNECTED" I may not need to tear down everything 
> > > > and start
> > > > from Initialising state because that may not work.
> > > >
> > > > So I am not so sure about backend's state so much, lets say if  
> > > > xen_blkif_disconnect fail,
> > > > I don't see it getting handled in the backend then what will be 
> > > > backend's state?
> > > > Will it still switch xenbus state to 'Closed'? If not what will 
> > > > frontend see,
> > > > if it tries to read backend's state through xenbus_read_driver_state ?
> > > >
> > > > So the flow be like:
> > > > Front end marks XenbusStateClosing
> > > > Backend marks its state as XenbusStateClosing
> > > > Frontend marks XenbusStateClosed
> > > > Backend disconnects calls xen_blkif_disconnect
> > > >Backend fails to disconnect, the above function returns EBUSY
> > > >What will be state of backend here?
> > >
> > > Backend should stay in state 'Closing' then, until it can finish
> > > tearing down.
> > >
> > It disconnects the ring after switching to connected state too.
> > > >Frontend did not tear down the rings if backend does not 
> > > > switches the
> > > >state to 'Closed' in case of failure.
> > > >
> > > > If backend stays in CONNECTED state, then even if we mark it 
> > > > Initialised in frontend, backend
> > >
> > > Backend will stay in state 'Closing' I think.
> > >
> > > > won't be calling connect(). {From reading code in frontend_changed}
> > > > IMU, Initialising will fail since backend dev->state != 
> > > > XenbusStateClosed plus
> > > > we did not tear down anything so calling talk_to_blkback may not be 
> > > > needed
> > > >
> > > > Does that sound correct?
> > >
> > > I think switching to the initial state in order to try to attempt a
> > > reconnection would be our best bet here.
> > >
> > It does not seem to work correctly; I get hung tasks all over and all the
> > requests to the filesystem get stuck. Backend does show the state as 

[PATCH tip/core/rcu 06/10] kcsan: Prefer '__no_kcsan inline' in test

2020-06-22 Thread paulmck
From: Marco Elver 

Instead of __no_kcsan_or_inline, prefer '__no_kcsan inline' in test --
this is in case we decide to remove __no_kcsan_or_inline.

Suggested-by: Peter Zijlstra 
Signed-off-by: Marco Elver 
Signed-off-by: Paul E. McKenney 
---
 kernel/kcsan/kcsan-test.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/kcsan/kcsan-test.c b/kernel/kcsan/kcsan-test.c
index a8c1150..3af420a 100644
--- a/kernel/kcsan/kcsan-test.c
+++ b/kernel/kcsan/kcsan-test.c
@@ -43,7 +43,7 @@ static struct {
 };
 
 /* Setup test checking loop. */
-static __no_kcsan_or_inline void
+static __no_kcsan inline void
 begin_test_checks(void (*func1)(void), void (*func2)(void))
 {
kcsan_disable_current();
@@ -60,7 +60,7 @@ begin_test_checks(void (*func1)(void), void (*func2)(void))
 }
 
 /* End test checking loop. */
-static __no_kcsan_or_inline bool
+static __no_kcsan inline bool
 end_test_checks(bool stop)
 {
if (!stop && time_before(jiffies, end_time)) {
-- 
2.9.5



[PATCH v1 2/2] romfs: address performance regression since v3.10

2020-06-22 Thread Sven Van Asbroeck
Problem
---
romfs sequential read performance has regressed very badly since
v3.10. Currently, reading a large file inside a romfs image is
up to 12x slower compared to reading the romfs image directly.

Benchmarks:
- use a romfs image which contains a single 250M file
- calculate the md5sum of the romfs image directly (test 1)
  $ time md5sum image.romfs
- loop-mount the romfs image, and calc the md5sum of the file
  inside it (test 2)
  $ mount -o loop,ro image.romfs /mnt/romfs
  $ time md5sum /mnt/romfs/file
- drop caches in between
  $ echo 3 > /proc/sys/vm/drop_caches

imx6 (arm cortex a9) on emmc, running v5.7.2:
(test 1)  5 seconds
(test 2) 60 seconds (12x slower)

Intel i7-3630QM on Samsung SSD 850 EVO (EMT02B6Q),
running Ubuntu with v4.15.0-106-generic:
(test 1) 1.3 seconds
(test 2) 3.3 seconds (2.5x slower)

To show that a regression has occurred since v3.10:

imx6 on emmc, running v3.10.17:
(test 1) 16 seconds
(test 2) 18 seconds

Proposed Solution
-
Increase the blocksize from 1K to PAGE_SIZE. This brings the
sequential read performance close to where it was on v3.10:

imx6 on emmc, running v5.7.2:
(test 2 1K blocksize) 60 seconds
(test 2 4K blocksize) 22 seconds

Intel on Ubuntu running v4.15:
(test 2 1K blocksize) 3.3 seconds
(test 2 4K blocksize) 1.9 seconds

There is a risk that this may increase latency on random-
access workloads. But the test below suggests that this
is not a concern:

Benchmark:
- use a 630M romfs image consisting of 9600 files
- loop-mount the romfs image
  $ mount -o loop,ro image.romfs /mnt/romfs
- drop all caches
- list all files in the filesystem (test 3)
  $ time find /mnt/romfs > /dev/null

imx6 on emmc, running v5.7.2:
(test 3 1K blocksize) 9.5 seconds
(test 3 4K blocksize) 9   seconds

Intel on Ubuntu, running v4.15:
(test 3 1K blocksize) 1.4 seconds
(test 3 4K blocksize) 1.2 seconds

Practical Solution
--
Introduce a mount-option called 'largeblocks'. If present,
increase the blocksize for much better sequential performance.

Note that the Linux block layer can only support n-K blocks if
the underlying block device length is also aligned to n-K. This
may not always be the case. Therefore, the driver will pick the
largest blocksize which the underlying block device can support.
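The selection logic is pure arithmetic, so it can be sketched and exercised in user space (the 4K page size and 1K ROMBSIZE values below are assumptions matching common configurations):

```c
#include <assert.h>

enum {
	PAGE_SIZE_4K = 4096,	/* assumed page size */
	ROMBSIZE_1K  = 1024	/* legacy romfs block size */
};

/*
 * Pure-arithmetic version of the patch's romfs_largest_blocksize():
 * start at the page size and halve until the candidate divides the
 * device length, falling back to the legacy 1K ROMBSIZE.
 */
static int largest_blocksize(long long device_sz)
{
	int blksz;

	for (blksz = PAGE_SIZE_4K; blksz > ROMBSIZE_1K; blksz >>= 1)
		if ((device_sz % blksz) == 0)
			break;

	return blksz;
}
```

Note the fallback: a device whose length is not a multiple of any candidate still gets the legacy 1K block size, preserving today's behaviour.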

Signed-off-by: Sven Van Asbroeck 
---
 fs/romfs/super.c | 62 
 1 file changed, 57 insertions(+), 5 deletions(-)

diff --git a/fs/romfs/super.c b/fs/romfs/super.c
index 6fecdea791f1..93565aeaa43c 100644
--- a/fs/romfs/super.c
+++ b/fs/romfs/super.c
@@ -65,7 +65,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -460,6 +460,54 @@ static __u32 romfs_checksum(const void *data, int size)
return sum;
 }
 
+enum romfs_param {
+   Opt_largeblocks,
+};
+
+static const struct fs_parameter_spec romfs_fs_parameters[] = {
+   fsparam_flag("largeblocks", Opt_largeblocks),
+   {}
+};
+
+/*
+ * Parse a single mount parameter.
+ */
+static int romfs_parse_param(struct fs_context *fc, struct fs_parameter *param)
+{
+   struct fs_parse_result result;
+   int opt;
+
+	opt = fs_parse(fc, romfs_fs_parameters, param, &result);
+   if (opt < 0)
+   return opt;
+
+   switch (opt) {
+   case Opt_largeblocks:
+   fc->fs_private = (void *) 1;
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
+/*
+ * pick the largest blocksize which the underlying block device
+ * is a multiple of. Or fall back to legacy (ROMBSIZE).
+ */
+static int romfs_largest_blocksize(struct super_block *sb)
+{
+   loff_t device_sz = i_size_read(sb->s_bdev->bd_inode);
+   int blksz;
+
+   for (blksz = PAGE_SIZE; blksz > ROMBSIZE; blksz >>= 1)
+   if ((device_sz % blksz) == 0)
+   break;
+
+   return blksz;
+}
+
 /*
  * fill in the superblock
  */
@@ -467,17 +515,19 @@ static int romfs_fill_super(struct super_block *sb, 
struct fs_context *fc)
 {
struct romfs_super_block *rsb;
struct inode *root;
-   unsigned long pos, img_size;
+   unsigned long pos, img_size, dev_blocksize;
const char *storage;
size_t len;
int ret;
 
 #ifdef CONFIG_BLOCK
+	dev_blocksize = fc->fs_private ? romfs_largest_blocksize(sb) : ROMBSIZE;
if (!sb->s_mtd) {
-   sb_set_blocksize(sb, ROMBSIZE);
+   sb_set_blocksize(sb, dev_blocksize);
} else {
-   sb->s_blocksize = ROMBSIZE;
-   sb->s_blocksize_bits = blksize_bits(ROMBSIZE);
+   sb->s_blocksize = dev_blocksize;
+   sb->s_blocksize_bits = blksize_bits(dev_blocksize);
}
 #endif
 
@@ -573,6 +623,7 @@ static int romfs_get_tree(struct fs_context *fc)
 static const struct fs_context_operations romfs_context_ops = {
.get_tree   = 

[PATCH v1 1/2] romfs: use s_blocksize(_bits) if CONFIG_BLOCK

2020-06-22 Thread Sven Van Asbroeck
The super_block fields s_blocksize and s_blocksize_bits always
reflect the actual configured blocksize for a filesystem.

Use these in all calculations where blocksize is required.
This allows us to easily change the blocksize in a later patch.

Note that I cannot determine what happens if !CONFIG_BLOCK, as
I have no access to such a system. Out of an abundance of caution,
I have left all !CONFIG_BLOCK codepaths in their original state.
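The per-iteration arithmetic that changes when ROMBSIZE is replaced by sb->s_blocksize can be checked standalone (the helper name is hypothetical; it mirrors the loop bodies in romfs_blk_read() and friends):

```c
#include <assert.h>

/*
 * Split a (pos, buflen) request into the offset inside the current
 * block and the number of bytes ("segment") available from that block.
 * blocksize must be a power of two, as s_blocksize always is.
 */
static unsigned long read_segment(unsigned long pos, unsigned long buflen,
				  unsigned long blocksize,
				  unsigned long *offset)
{
	unsigned long avail;

	*offset = pos & (blocksize - 1);
	avail = blocksize - *offset;
	return buflen < avail ? buflen : avail;
}
```

Because only the blocksize constant changes, switching from ROMBSIZE to sb->s_blocksize leaves this arithmetic identical in structure, which is why the conversion is mechanical.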

Signed-off-by: Sven Van Asbroeck 
---
 fs/romfs/storage.c | 25 +
 fs/romfs/super.c   |  9 -
 2 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/fs/romfs/storage.c b/fs/romfs/storage.c
index 6b2b4362089e..5e84efadac3f 100644
--- a/fs/romfs/storage.c
+++ b/fs/romfs/storage.c
@@ -109,9 +109,9 @@ static int romfs_blk_read(struct super_block *sb, unsigned 
long pos,
 
/* copy the string up to blocksize bytes at a time */
while (buflen > 0) {
-   offset = pos & (ROMBSIZE - 1);
-   segment = min_t(size_t, buflen, ROMBSIZE - offset);
-   bh = sb_bread(sb, pos >> ROMBSBITS);
+   offset = pos & (sb->s_blocksize - 1);
+   segment = min_t(size_t, buflen, sb->s_blocksize - offset);
+   bh = sb_bread(sb, pos >> sb->s_blocksize_bits);
if (!bh)
return -EIO;
memcpy(buf, bh->b_data + offset, segment);
@@ -138,9 +138,9 @@ static ssize_t romfs_blk_strnlen(struct super_block *sb,
 
/* scan the string up to blocksize bytes at a time */
while (limit > 0) {
-   offset = pos & (ROMBSIZE - 1);
-   segment = min_t(size_t, limit, ROMBSIZE - offset);
-   bh = sb_bread(sb, pos >> ROMBSBITS);
+   offset = pos & (sb->s_blocksize - 1);
+   segment = min_t(size_t, limit, sb->s_blocksize - offset);
+   bh = sb_bread(sb, pos >> sb->s_blocksize_bits);
if (!bh)
return -EIO;
buf = bh->b_data + offset;
@@ -170,9 +170,9 @@ static int romfs_blk_strcmp(struct super_block *sb, 
unsigned long pos,
 
/* compare string up to a block at a time */
while (size > 0) {
-   offset = pos & (ROMBSIZE - 1);
-   segment = min_t(size_t, size, ROMBSIZE - offset);
-   bh = sb_bread(sb, pos >> ROMBSBITS);
+   offset = pos & (sb->s_blocksize - 1);
+   segment = min_t(size_t, size, sb->s_blocksize - offset);
+   bh = sb_bread(sb, pos >> sb->s_blocksize_bits);
if (!bh)
return -EIO;
matched = (memcmp(bh->b_data + offset, str, segment) == 0);
@@ -180,7 +180,8 @@ static int romfs_blk_strcmp(struct super_block *sb, 
unsigned long pos,
size -= segment;
pos += segment;
str += segment;
-   if (matched && size == 0 && offset + segment < ROMBSIZE) {
+   if (matched && size == 0 &&
+   offset + segment < sb->s_blocksize) {
if (!bh->b_data[offset + segment])
terminated = true;
else
@@ -194,8 +195,8 @@ static int romfs_blk_strcmp(struct super_block *sb, 
unsigned long pos,
if (!terminated) {
/* the terminating NUL must be on the first byte of the next
 * block */
-   BUG_ON((pos & (ROMBSIZE - 1)) != 0);
-   bh = sb_bread(sb, pos >> ROMBSBITS);
+   BUG_ON((pos & (sb->s_blocksize - 1)) != 0);
+   bh = sb_bread(sb, pos >> sb->s_blocksize_bits);
if (!bh)
return -EIO;
matched = !bh->b_data[0];
diff --git a/fs/romfs/super.c b/fs/romfs/super.c
index e582d001f792..6fecdea791f1 100644
--- a/fs/romfs/super.c
+++ b/fs/romfs/super.c
@@ -411,10 +411,17 @@ static int romfs_statfs(struct dentry *dentry, struct 
kstatfs *buf)
 
buf->f_type = ROMFS_MAGIC;
buf->f_namelen = ROMFS_MAXFN;
-   buf->f_bsize = ROMBSIZE;
buf->f_bfree = buf->f_bavail = buf->f_ffree;
+#ifdef CONFIG_BLOCK
+   buf->f_bsize = sb->s_blocksize;
+   buf->f_blocks =
+   (romfs_maxsize(dentry->d_sb) + sb->s_blocksize - 1) >>
+   sb->s_blocksize_bits;
+#else
+   buf->f_bsize = ROMBSIZE;
buf->f_blocks =
(romfs_maxsize(dentry->d_sb) + ROMBSIZE - 1) >> ROMBSBITS;
+#endif
buf->f_fsid.val[0] = (u32)id;
buf->f_fsid.val[1] = (u32)(id >> 32);
return 0;
-- 
2.17.1



[PATCH 1/2] media: omap3isp: Remove cacheflush.h

2020-06-22 Thread Nathan Chancellor
After mm.h was removed from the asm-generic version of cacheflush.h,
s390 allyesconfig shows several warnings of the following nature:

In file included from ./arch/s390/include/generated/asm/cacheflush.h:1,
 from drivers/media/platform/omap3isp/isp.c:42:
./include/asm-generic/cacheflush.h:16:42: warning: 'struct mm_struct'
declared inside parameter list will not be visible outside of this
definition or declaration

As Geert and Laurent point out, this driver does not need this header in
the two files that include it. Remove it so there are no warnings.

Fixes: e0cf615d725c ("asm-generic: don't include <linux/mm.h> in cacheflush.h")
Suggested-by: Geert Uytterhoeven 
Suggested-by: Laurent Pinchart 
Signed-off-by: Nathan Chancellor 
---
 drivers/media/platform/omap3isp/isp.c  | 2 --
 drivers/media/platform/omap3isp/ispvideo.c | 1 -
 2 files changed, 3 deletions(-)

diff --git a/drivers/media/platform/omap3isp/isp.c 
b/drivers/media/platform/omap3isp/isp.c
index a4ee6b86663e..b91e472ee764 100644
--- a/drivers/media/platform/omap3isp/isp.c
+++ b/drivers/media/platform/omap3isp/isp.c
@@ -39,8 +39,6 @@
  * Troy Laramy 
  */
 
-#include <asm/cacheflush.h>
-
 #include 
 #include 
 #include 
diff --git a/drivers/media/platform/omap3isp/ispvideo.c 
b/drivers/media/platform/omap3isp/ispvideo.c
index 10c214bd0903..1ac9aef70dff 100644
--- a/drivers/media/platform/omap3isp/ispvideo.c
+++ b/drivers/media/platform/omap3isp/ispvideo.c
@@ -18,7 +18,6 @@
 #include 
 #include 
 #include 
-#include <asm/cacheflush.h>
 
 #include 
 #include 

base-commit: 27f11fea33608cbd321a97cbecfa2ef97dcc1821
-- 
2.27.0



[PATCH 2/2] asm-generic: Make cacheflush.h self-contained

2020-06-22 Thread Nathan Chancellor
Currently, cacheflush.h has to be included after mm.h to avoid several
-Wvisibility warnings:

$ clang -Wvisibility -fsyntax-only include/asm-generic/cacheflush.h
include/asm-generic/cacheflush.h:16:42: warning: declaration of 'struct
mm_struct' will not be visible outside of this function [-Wvisibility]
static inline void flush_cache_mm(struct mm_struct *mm)
 ^
...
include/asm-generic/cacheflush.h:28:45: warning: declaration of 'struct
vm_area_struct' will not be visible outside of this function
[-Wvisibility]
static inline void flush_cache_range(struct vm_area_struct *vma,
^
...

Add a few forward declarations so that there are no warnings and the
ordering of the includes does not matter.
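The same fix in miniature: forward-declaring the struct tags lets inline stubs that only take pointers compile standalone, whatever the include order. Signatures are simplified from the real cacheflush.h (flush_cache_range() also takes start/end), and the empty bodies stand in for the real no-op flushes.

```c
#include <assert.h>

/* Incomplete types are all that pointer parameters require. */
struct mm_struct;
struct vm_area_struct;

static inline void flush_cache_mm(struct mm_struct *mm) { (void)mm; }
static inline void flush_cache_range(struct vm_area_struct *vma) { (void)vma; }

/* Compiles and runs with no other header defining the structs. */
static int compiles_standalone(void)
{
	flush_cache_mm((struct mm_struct *)0);
	flush_cache_range((struct vm_area_struct *)0);
	return 1;
}
```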

Fixes: e0cf615d725c ("asm-generic: don't include <linux/mm.h> in cacheflush.h")
Signed-off-by: Nathan Chancellor 
---
 include/asm-generic/cacheflush.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
index 907fa5d16494..093b743da596 100644
--- a/include/asm-generic/cacheflush.h
+++ b/include/asm-generic/cacheflush.h
@@ -2,6 +2,11 @@
 #ifndef _ASM_GENERIC_CACHEFLUSH_H
 #define _ASM_GENERIC_CACHEFLUSH_H
 
+struct address_space;
+struct mm_struct;
+struct page;
+struct vm_area_struct;
+
 /*
  * The cache doesn't need to be flushed when TLB entries change when
  * the cache is mapped to physical memory, not virtual memory
-- 
2.27.0



[PATCH 0/2] Small fixes around cacheflush.h

2020-06-22 Thread Nathan Chancellor
Hi all,

These two patches are the culmination of the small discussion here:

https://lore.kernel.org/lkml/camuhmdvsdutoi5bugf9slqdgadwyl1+qalwskgin1teolgh...@mail.gmail.com/

I have fallen behind on fixing issues so sorry for not sending these
sooner and letting these warnings slip into mainline. Please let me know
if there are any comments or concerns. They are two completely
independent patches so if they need to be routed via separate trees,
that is fine. It was just easier to send them together since they are
dealing with the same problem.

Cheers,
Nathan




[PATCH tip/core/rcu 01/23] torture: Remove qemu dependency on EFI firmware

2020-06-22 Thread paulmck
From: "Paul E. McKenney" 

On some (probably misconfigured) systems, the torture-test scripting
will cause qemu to complain about missing EFI firmware, often because
qemu is trying to traverse broken symbolic links to find that firmware.
Which is a bit silly given that the default torture-test guest OS has
but a single binary for its userspace, and thus is unlikely to do much
in the way of networking in any case.

This commit therefore avoids such problems by specifying "-net none"
to qemu unless the TORTURE_QEMU_INTERACTIVE environment variable is set
(for example, by having specified "--interactive" to kvm.sh), in which
case "-net nic -net user" is specified to qemu instead.  Either choice
may be overridden by specifying the "-net" argument of your choice to
the kvm.sh "--qemu-args" parameter.
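For reference, the precedence described above can be exercised standalone; this sketch mirrors the new specify_qemu_net helper added by this patch (the argument strings are hypothetical):

```shell
# Sketch of the selection logic: an existing -net argument wins, then
# interactive mode, then "-net none" (inputs here are made up).
specify_qemu_net () {
	if echo $1 | grep -q -e -net
	then
		echo $1
	elif test -n "$TORTURE_QEMU_INTERACTIVE"
	then
		echo $1 -net nic -net user
	else
		echo $1 -net none
	fi
}

unset TORTURE_QEMU_INTERACTIVE
specify_qemu_net "-smp 4"              # → -smp 4 -net none
specify_qemu_net "-smp 4 -net bridge"  # → -smp 4 -net bridge (unchanged)
TORTURE_QEMU_INTERACTIVE=1
specify_qemu_net "-smp 4"              # → -smp 4 -net nic -net user
```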

Link: https://lore.kernel.org/lkml/20190701141403.ga246...@google.com
Reported-by: Joel Fernandes 
Signed-off-by: Paul E. McKenney 
Cc: Sebastian Andrzej Siewior 
---
 tools/testing/selftests/rcutorture/bin/functions.sh | 21 ++---
 .../selftests/rcutorture/bin/kvm-test-1-run.sh  |  1 +
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/rcutorture/bin/functions.sh b/tools/testing/selftests/rcutorture/bin/functions.sh
index 1281022..436b154 100644
--- a/tools/testing/selftests/rcutorture/bin/functions.sh
+++ b/tools/testing/selftests/rcutorture/bin/functions.sh
@@ -215,9 +215,6 @@ identify_qemu_args () {
then
echo -device 
spapr-vlan,netdev=net0,mac=$TORTURE_QEMU_MAC
echo -netdev bridge,br=br0,id=net0
-   elif test -n "$TORTURE_QEMU_INTERACTIVE"
-   then
-   echo -net nic -net user
fi
;;
esac
@@ -275,3 +272,21 @@ specify_qemu_cpus () {
esac
fi
 }
+
+# specify_qemu_net qemu-args
+#
+# Appends a string containing "-net none" to qemu-args, unless the incoming
+# qemu-args already contains "-net" or unless the TORTURE_QEMU_INTERACTIVE
+# environment variable is set, in which case the string that is added is
+# instead "-net nic -net user".
+specify_qemu_net () {
+   if echo $1 | grep -q -e -net
+   then
+   echo $1
+   elif test -n "$TORTURE_QEMU_INTERACTIVE"
+   then
+   echo $1 -net nic -net user
+   else
+   echo $1 -net none
+   fi
+}
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
index 6ff611c..1b9aebd 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
@@ -141,6 +141,7 @@ then
cpu_count=$TORTURE_ALLOTED_CPUS
 fi
 qemu_args="`specify_qemu_cpus "$QEMU" "$qemu_args" "$cpu_count"`"
+qemu_args="`specify_qemu_net "$qemu_args"`"
 
 # Generate architecture-specific and interaction-specific qemu arguments
 qemu_args="$qemu_args `identify_qemu_args "$QEMU" "$resdir/console.log"`"
-- 
2.9.5



[PATCH tip/core/rcu 04/23] rcutorture: Add races with task-exit processing

2020-06-22 Thread paulmck
From: "Paul E. McKenney" 

Several variants of Linux-kernel RCU interact with task-exit processing,
including preemptible RCU, Tasks RCU, and Tasks Trace RCU.  This commit
therefore adds testing of this interaction to rcutorture by adding
rcutorture.read_exit_burst and rcutorture.read_exit_delay kernel-boot
parameters.  These kernel parameters control the frequency and spacing
of special read-then-exit kthreads that are spawned.

[ paulmck: Apply feedback from Dan Carpenter's static checker. ]
[ paulmck: Reduce latency to avoid false-positive shutdown hangs. ]
Signed-off-by: Paul E. McKenney 
---
 Documentation/admin-guide/kernel-parameters.txt |  14 +++
 include/linux/torture.h |   5 ++
 kernel/rcu/rcutorture.c | 112 +++-
 3 files changed, 128 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fb95fad..a0dcc92 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4258,6 +4258,20 @@
Set time (jiffies) between CPU-hotplug operations,
or zero to disable CPU-hotplug testing.
 
+   rcutorture.read_exit= [KNL]
+   Set the number of read-then-exit kthreads used
+   to test the interaction of RCU updaters and
+   task-exit processing.
+
+   rcutorture.read_exit_burst= [KNL]
+   The number of times in a given read-then-exit
+   episode that a set of read-then-exit kthreads
+   is spawned.
+
+   rcutorture.read_exit_delay= [KNL]
+   The delay, in seconds, between successive
+   read-then-exit testing episodes.
+
rcutorture.shuffle_interval= [KNL]
Set task-shuffle interval (s).  Shuffling tasks
allows some CPUs to go into dyntick-idle mode
diff --git a/include/linux/torture.h b/include/linux/torture.h
index 629b66e..7f65bd1 100644
--- a/include/linux/torture.h
+++ b/include/linux/torture.h
@@ -55,6 +55,11 @@ struct torture_random_state {
 #define DEFINE_TORTURE_RANDOM_PERCPU(name) \
DEFINE_PER_CPU(struct torture_random_state, name)
 unsigned long torture_random(struct torture_random_state *trsp);
+static inline void torture_random_init(struct torture_random_state *trsp)
+{
+   trsp->trs_state = 0;
+   trsp->trs_count = 0;
+}
 
 /* Task shuffler, which causes CPUs to occasionally go idle. */
 void torture_shuffle_task_register(struct task_struct *tp);
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index efb792e..2621a33 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -109,6 +109,10 @@ torture_param(int, object_debug, 0,
 torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)");
 torture_param(int, onoff_interval, 0,
 "Time between CPU hotplugs (jiffies), 0=disable");
+torture_param(int, read_exit_delay, 13,
+ "Delay between read-then-exit episodes (s)");
+torture_param(int, read_exit_burst, 16,
+ "# of read-then-exit bursts per episode, zero to disable");
 torture_param(int, shuffle_interval, 3, "Number of seconds between shuffles");
 torture_param(int, shutdown_secs, 0, "Shutdown time (s), <= zero to disable.");
 torture_param(int, stall_cpu, 0, "Stall duration (s), zero to disable.");
@@ -146,6 +150,7 @@ static struct task_struct *stall_task;
 static struct task_struct *fwd_prog_task;
 static struct task_struct **barrier_cbs_tasks;
 static struct task_struct *barrier_task;
+static struct task_struct *read_exit_task;
 
 #define RCU_TORTURE_PIPE_LEN 10
 
@@ -177,6 +182,7 @@ static long n_rcu_torture_boosts;
 static atomic_long_t n_rcu_torture_timers;
 static long n_barrier_attempts;
 static long n_barrier_successes; /* did rcu_barrier test succeed? */
+static unsigned long n_read_exits;
 static struct list_head rcu_torture_removed;
 static unsigned long shutdown_jiffies;
 
@@ -1539,10 +1545,11 @@ rcu_torture_stats_print(void)
n_rcu_torture_boosts,
	atomic_long_read(&n_rcu_torture_timers));
torture_onoff_stats();
-   pr_cont("barrier: %ld/%ld:%ld\n",
+   pr_cont("barrier: %ld/%ld:%ld ",
data_race(n_barrier_successes),
data_race(n_barrier_attempts),
data_race(n_rcu_torture_barrier_error));
+   pr_cont("read-exits: %ld\n", data_race(n_read_exits));
 
pr_alert("%s%s ", torture_type, TORTURE_FLAG);
	if (atomic_read(&n_rcu_torture_mberror) ||
@@ -1634,7 +1641,8 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag)
 "stall_cpu=%d stall_cpu_holdoff=%d stall_cpu_irqsoff=%d "
 "stall_cpu_block=%d "
 "n_barrier_cbs=%d "
-

[PATCH tip/core/rcu 02/23] torture: Add script to smoke-test commits in a branch

2020-06-22 Thread paulmck
From: "Paul E. McKenney" 

This commit adds a kvm-check-branches.sh script that takes a list
of commits and commit ranges and runs a short rcutorture test on all
scenarios on each specified commit.  A summary is printed at the end, and
the script returns success if all rcutorture runs completed without error.
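The script places each per-commit run in a directory numbered in run order ("0001", "0002", and so on); that zero-padded counter is a small awk printf, which can be sketched standalone (the commit IDs below are hypothetical):

```shell
# Sketch of the run-directory numbering used by kvm-check-branches.sh:
# an incrementing counter, zero-padded to four digits via awk.
ntry=0
for commit in 1a2b3c 4d5e6f
do
	ntry=`expr $ntry + 1`
	idir=`awk -v ntry="$ntry" 'END { printf "%04d", ntry; }' < /dev/null`
	echo "$idir $commit"
done
# prints:
#   0001 1a2b3c
#   0002 4d5e6f
```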

Signed-off-by: Paul E. McKenney 
---
 .../selftests/rcutorture/bin/kvm-check-branches.sh | 108 +
 1 file changed, 108 insertions(+)
 create mode 100755 tools/testing/selftests/rcutorture/bin/kvm-check-branches.sh

diff --git a/tools/testing/selftests/rcutorture/bin/kvm-check-branches.sh b/tools/testing/selftests/rcutorture/bin/kvm-check-branches.sh
new file mode 100755
index 000..6e65c13
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/bin/kvm-check-branches.sh
@@ -0,0 +1,108 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0+
+#
+# Run a group of kvm.sh tests on the specified commits.  This currently
+# unconditionally does three-minute runs on each scenario in CFLIST,
+# taking advantage of all available CPUs and trusting the "make" utility.
+# In the short term, adjustments can be made by editing this script and
+# CFLIST.  If some adjustments appear to have ongoing value, this script
+# might grow some command-line arguments.
+#
+# Usage: kvm-check-branches.sh commit1 commit2..commit3 commit4 ...
+#
+# This script considers its arguments one at a time.  If more elaborate
+# specification of commits is needed, please use "git rev-list" to
+# produce something that this simple script can understand.  The reason
+# for retaining the simplicity is that it allows the user to more easily
+# see which commit came from which branch.
+#
+# This script creates a yyyy.mm.dd-hh.mm.ss-group entry in the "res"
+# directory.  The calls to kvm.sh create the usual entries, but this script
+# moves them under the yyyy.mm.dd-hh.mm.ss-group entry, each in its own
+# directory numbered in run order, that is, "0001", "0002", and so on.
+# For successful runs, the large build artifacts are removed.  Doing this
+# reduces the disk space required by about two orders of magnitude for
+# successful runs.
+#
+# Copyright (C) Facebook, 2020
+#
+# Authors: Paul E. McKenney 
+
+if ! git status > /dev/null 2>&1
+then
+   echo '!!!' This script needs to run in a git archive. 1>&2
+   echo '!!!' Giving up. 1>&2
+   exit 1
+fi
+
+# Remember where we started so that we can get back and the end.
+curcommit="`git status | head -1 | awk '{ print $NF }'`"
+
+nfail=0
+ntry=0
+resdir="tools/testing/selftests/rcutorture/res"
+ds="`date +%Y.%m.%d-%H.%M.%S`-group"
+if ! test -e $resdir
+then
+   mkdir $resdir || :
+fi
+mkdir $resdir/$ds
+echo Results directory: $resdir/$ds
+
+KVM="`pwd`/tools/testing/selftests/rcutorture"; export KVM
+PATH=${KVM}/bin:$PATH; export PATH
+. functions.sh
+cpus="`identify_qemu_vcpus`"
+echo Using up to $cpus CPUs.
+
+# Each pass through this loop does one command-line argument.
+for gitbr in $@
+do
+   echo ' --- git branch ' $gitbr
+
+   # Each pass through this loop tests one commit.
+   for i in `git rev-list "$gitbr"`
+   do
+   ntry=`expr $ntry + 1`
+   idir=`awk -v ntry="$ntry" 'END { printf "%04d", ntry; }' < /dev/null`
+   echo ' --- commit ' $i from branch $gitbr
+   date
+   mkdir $resdir/$ds/$idir
+   echo $gitbr > $resdir/$ds/$idir/gitbr
+   echo $i >> $resdir/$ds/$idir/gitbr
+
+   # Test the specified commit.
+   git checkout $i > $resdir/$ds/$idir/git-checkout.out 2>&1
+   echo git checkout return code: $? "(Commit $ntry: $i)"
+   kvm.sh --cpus $cpus --duration 3 --trust-make > $resdir/$ds/$idir/kvm.sh.out 2>&1
+   ret=$?
+   echo kvm.sh return code $ret for commit $i from branch $gitbr
+
+   # Move the build products to their resting place.
+   runresdir="`grep -m 1 '^Results directory:' < $resdir/$ds/$idir/kvm.sh.out | sed -e 's/^Results directory://'`"
+   mv $runresdir $resdir/$ds/$idir
+   rrd="`echo $runresdir | sed -e 's,^.*/,,'`"
+   echo Run results: $resdir/$ds/$idir/$rrd
+   if test "$ret" -ne 0
+   then
+   # Failure, so leave all evidence intact.
+   nfail=`expr $nfail + 1`
+   else
+   # Success, so remove large files to save about 1GB.
+   ( cd $resdir/$ds/$idir/$rrd; rm -f */vmlinux */bzImage */System.map */Module.symvers )
+   fi
+   done
+done
+date
+
+# Go back to the original commit.
+git checkout "$curcommit"
+
+if test $nfail -ne 0
+then
+   echo '!!! ' $nfail failures in $ntry 'runs!!!'
+   exit 1
+else
+   echo No failures in $ntry runs.
+   exit 0
+fi
-- 
2.9.5



[PATCH tip/core/rcu 17/23] torture: Improve diagnostic for KCSAN-incapable compilers

2020-06-22 Thread paulmck
From: "Paul E. McKenney" 

Using --kcsan when the compiler does not support KCSAN results in this:

:CONFIG_KCSAN=y: improperly set
:CONFIG_KCSAN_REPORT_ONCE_IN_MS=10: improperly set
:CONFIG_KCSAN_VERBOSE=y: improperly set
:CONFIG_KCSAN_INTERRUPT_WATCHER=y: improperly set
Clean KCSAN run in 
/home/git/linux-rcu/tools/testing/selftests/rcutorture/res/2020.06.16-09.53.16

This is a bit obtuse, so this commit adds checks resulting in this:

:CONFIG_KCSAN=y: improperly set
:CONFIG_KCSAN_REPORT_ONCE_IN_MS=10: improperly set
:CONFIG_KCSAN_VERBOSE=y: improperly set
:CONFIG_KCSAN_INTERRUPT_WATCHER=y: improperly set
Compiler or architecture does not support KCSAN!
Did you forget to switch your compiler with --kmake-arg CC=?
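The added check boils down to grepping the saved configcheck.sh output for a still-unsatisfied CONFIG_KCSAN=y; a standalone sketch with hypothetical file contents:

```shell
# Sketch: if configcheck.sh output still flags CONFIG_KCSAN=y as
# improperly set, the toolchain lacks KCSAN support ($T is a scratch file).
T=$(mktemp)
echo ':CONFIG_KCSAN=y: improperly set' > $T   # hypothetical configcheck output
if grep -q CONFIG_KCSAN=y $T
then
	echo "Compiler or architecture does not support KCSAN!"
fi
rm -f $T
```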

Suggested-by: Marco Elver 
Signed-off-by: Paul E. McKenney 
Acked-by: Marco Elver 
---
 tools/testing/selftests/rcutorture/bin/kvm-recheck.sh | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh
index 357899c..840a467 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh
@@ -44,7 +44,8 @@ do
then
echo QEMU killed
fi
-   configcheck.sh $i/.config $i/ConfigFragment
+   configcheck.sh $i/.config $i/ConfigFragment > $T 2>&1
+   cat $T
if test -r $i/Make.oldconfig.err
then
cat $i/Make.oldconfig.err
@@ -73,7 +74,11 @@ do
done
if test -f "$rd/kcsan.sum"
then
-   if test -s "$rd/kcsan.sum"
+   if grep -q CONFIG_KCSAN=y $T
+   then
+   echo "Compiler or architecture does not support KCSAN!"
+   echo Did you forget to switch your compiler with '--kmake-arg CC='?
+   elif test -s "$rd/kcsan.sum"
then
echo KCSAN summary in $rd/kcsan.sum
else
-- 
2.9.5



[PATCH tip/core/rcu 05/23] torture: Set configfile variable to current scenario

2020-06-22 Thread paulmck
From: "Paul E. McKenney" 

The torture-test recheck logic fails to set the configfile variable to
the current scenario, so this commit properly initializes this variable.
This change isn't critical given that all errors for a given scenario
follow that scenario's heading, but it is easier on the eyes to repeat it.
And this repetition also prevents confusion as to whether a given message
goes with the previous heading or the next one.
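The initialization is a basename extraction via sed, stripping everything through the final slash of the per-scenario results path; a sketch with a hypothetical path:

```shell
# Sketch of the configfile initialization: keep only the last path
# component of the results directory (path here is made up).
i="res/2020.06.22-12.00.00/TREE01"
configfile=`echo $i | sed -e 's,^.*/,,'`
echo $configfile    # → TREE01
```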

Signed-off-by: Paul E. McKenney 
---
 tools/testing/selftests/rcutorture/bin/kvm-recheck.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh
index 736f047..2261aa6 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh
@@ -31,6 +31,7 @@ do
head -1 $resdir/log
fi
TORTURE_SUITE="`cat $i/../TORTURE_SUITE`"
+   configfile=`echo $i | sed -e 's,^.*/,,'`
rm -f $i/console.log.*.diags
kvm-recheck-${TORTURE_SUITE}.sh $i
	if test -f "$i/qemu-retval" && test "`cat $i/qemu-retval`" -ne 0 && test "`cat $i/qemu-retval`" -ne 137
-- 
2.9.5



[PATCH tip/core/rcu 06/23] rcutorture: Handle non-statistic bang-string error messages

2020-06-22 Thread paulmck
From: "Paul E. McKenney" 

The current console parsing assumes that console lines containing "!!!"
are statistics lines from which it can parse the number of rcutorture
too-short grace-period failures.  This prints confusing output for
other problems, including memory exhaustion.  This commit therefore
differentiates between these cases and prints an appropriate error string.
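The distinction can be sketched standalone: a "!!!" line whose trailing nine fields are all numeric sums to an instance count, while anything else is reported verbatim (the wrapper function and input lines below are hypothetical):

```shell
# Sketch: classify a "!!!" console line either as rcutorture statistics
# (nine numeric trailing fields, summed) or as a literal error string.
classify () {
	echo "$1" | awk '
	{
		for (i = NF - 8; i <= NF; i++) {
			if (i <= 0 || $i !~ /^[0-9]*$/) {
				print $0;           # not a statistics line
				exit 0;
			}
			sum += $i;
		}
		print sum " instances";
	}'
}

classify "!!! PIPE: 1 2 3 4 5 6 7 8 9"   # → 45 instances
classify "!!! Out of memory"             # → !!! Out of memory
```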

Signed-off-by: Paul E. McKenney 
---
 .../testing/selftests/rcutorture/bin/parse-console.sh  | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/rcutorture/bin/parse-console.sh b/tools/testing/selftests/rcutorture/bin/parse-console.sh
index 4bf62d7..1c64ca8 100755
--- a/tools/testing/selftests/rcutorture/bin/parse-console.sh
+++ b/tools/testing/selftests/rcutorture/bin/parse-console.sh
@@ -44,11 +44,23 @@ then
tail -1 |
awk '
{
-   for (i=NF-8;i<=NF;i++)
+   normalexit = 1;
+   for (i=NF-8;i<=NF;i++) {
+   if (i <= 0 || $i !~ /^[0-9]*$/) {
+   bangstring = $0;
+   gsub(/^\[[^]]*] /, "", bangstring);
+   print bangstring;
+   normalexit = 0;
+   exit 0;
+   }
sum+=$i;
+   }
}
-   END { print sum }'`
-   print_bug $title FAILURE, $nerrs instances
+   END {
+   if (normalexit)
+   print sum " instances"
+   }'`
+   print_bug $title FAILURE, $nerrs
exit
fi
 
-- 
2.9.5



[PATCH tip/core/rcu 21/23] torture: Avoid duplicate specification of qemu command

2020-06-22 Thread paulmck
From: "Paul E. McKenney" 

Currently, the qemu command is constructed twice, once to dump it
to the qemu-cmd file and again to execute it.  This is of course an
accident waiting to happen, but is done to ensure that the remainder
of the script has an accurate idea of the running qemu command's PID.
This commit therefore places both the qemu command and the PID capture
into a new temporary file and sources that temporary file.  Thus the
single construction of the qemu command into the qemu-cmd file suffices
for both purposes.
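The decorate-then-source trick can be exercised outside the harness; this sketch uses a hypothetical long-running command in place of the real qemu invocation:

```shell
# Sketch of the patch's approach: write the command once, decorate it with
# redirection, backgrounding, and PID capture, then source the result
# ("sleep" stands in for qemu here).
T=$(mktemp -d)
resdir=$T/res; mkdir $resdir
echo 'sleep 5' > $resdir/qemu-cmd           # single construction of the command

sed -e 's/$/ 2>\&1 \&/' < $resdir/qemu-cmd > $T/qemu-cmd
echo 'echo $! > $resdir/qemu_pid' >> $T/qemu-cmd

( . $T/qemu-cmd; wait `cat $resdir/qemu_pid`; echo $? > $resdir/qemu-retval ) &
sleep 1                                     # let the PID file appear
kill `cat $resdir/qemu_pid`                 # the PID is visible to the rest of the script
```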

Signed-off-by: Paul E. McKenney 
---
 tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
index 5ec095d..484445b 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
@@ -161,8 +161,16 @@ then
touch $resdir/buildonly
exit 0
 fi
+
+# Decorate qemu-cmd with redirection, backgrounding, and PID capture
+sed -e 's/$/ 2>\&1 \&/' < $resdir/qemu-cmd > $T/qemu-cmd
+echo 'echo $! > $resdir/qemu_pid' >> $T/qemu-cmd
+
+# In case qemu refuses to run...
 echo "NOTE: $QEMU either did not run or was interactive" > $resdir/console.log
-( $QEMU $qemu_args -m $TORTURE_QEMU_MEM -kernel $KERNEL -append "$qemu_append $boot_args" > $resdir/qemu-output 2>&1 & echo $! > $resdir/qemu_pid; wait `cat $resdir/qemu_pid`; echo $? > $resdir/qemu-retval ) &
+
+# Attempt to run qemu
+( . $T/qemu-cmd; wait `cat $resdir/qemu_pid`; echo $? > $resdir/qemu-retval ) &
 commandcompleted=0
 sleep 10 # Give qemu's pid a chance to reach the file
 if test -s "$resdir/qemu_pid"
-- 
2.9.5



[PATCH tip/core/rcu 10/23] rcu/rcutorture: Replace 0 with false

2020-06-22 Thread paulmck
From: Jules Irenge 

Coccinelle reports a warning

WARNING: Assignment of 0/1 to bool variable

The root cause is that the variable lastphase is a bool, but is
initialised with integer 0.  This commit therefore replaces the 0 with
a false.

Signed-off-by: Jules Irenge 
Signed-off-by: Paul E. McKenney 
---
 kernel/rcu/rcutorture.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 5911207..37455a1 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -2185,7 +2185,7 @@ static void rcu_torture_barrier1cb(void *rcu_void)
 static int rcu_torture_barrier_cbs(void *arg)
 {
long myid = (long)arg;
-   bool lastphase = 0;
+   bool lastphase = false;
bool newphase;
struct rcu_head rcu;
 
-- 
2.9.5



[PATCH tip/core/rcu 07/23] rcutorture: NULL rcu_torture_current earlier in cleanup code

2020-06-22 Thread paulmck
From: "Paul E. McKenney" 

Currently, the rcu_torture_current variable remains non-NULL until after
all readers have stopped.  During this time, rcu_torture_stats_print()
will think that the test is still ongoing, which can result in confusing
dmesg output.  This commit therefore NULLs rcu_torture_current immediately
after the rcu_torture_writer() kthread has decided to stop, thus informing
rcu_torture_stats_print() much sooner.

Signed-off-by: Paul E. McKenney 
---
 kernel/rcu/rcutorture.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 2621a33..5911207 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1172,6 +1172,7 @@ rcu_torture_writer(void *arg)
WARN(1, "%s: rtort_pipe_count: %d\n", 
__func__, rcu_tortures[i].rtort_pipe_count);
}
} while (!torture_must_stop());
+   rcu_torture_current = NULL;  // Let stats task know that we are done.
/* Reset expediting back to unexpedited. */
if (expediting > 0)
expediting = -expediting;
@@ -2473,7 +2474,6 @@ rcu_torture_cleanup(void)
 reader_tasks[i]);
kfree(reader_tasks);
}
-   rcu_torture_current = NULL;
 
if (fakewriter_tasks) {
for (i = 0; i < nfakewriters; i++) {
-- 
2.9.5



[PATCH tip/core/rcu 22/23] torture: Remove obsolete "cd $KVM"

2020-06-22 Thread paulmck
From: "Paul E. McKenney" 

In the dim distant past, qemu commands needed to be run from the
rcutorture directory, but this is no longer the case.  This commit
therefore removes the now-useless "cd $KVM" from the kvm-test-1-run.sh
script.

Signed-off-by: Paul E. McKenney 
---
 tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
index 484445b..e07779a 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
@@ -124,7 +124,6 @@ seconds=$4
 qemu_args=$5
 boot_args=$6
 
-cd $KVM
 kstarttime=`gawk 'BEGIN { print systime() }' < /dev/null`
 if test -z "$TORTURE_BUILDONLY"
 then
-- 
2.9.5


