[PATCH v3 0/7] selftests/ftrace: Add requires list for each test case

2020-06-02 Thread Masami Hiramatsu
Hi,

Here is the 3rd version of the series of "requires:" list for
simplifying and unifying requirement checks for each test case.

The previous version is here.

https://lkml.kernel.org/r/15910259.42416.547252366885528860.stgit@devnote2

I've fixed a comment in the template file in this version.

In short, this series introduces "requires:" line instead of
checking required ftrace interfaces in each test case.

The requires line supports the following checks:
 - tracefs interface check: Check whether the given file or directory
   exists in the tracefs. (No suffix) [3/7],[4/7],[5/7]
 - available tracer check: Check whether the given tracer is available
   (":tracer" suffix) [6/7]
 - README feature check: Check whether the given string is in the
   README (":README" suffix) [7/7]
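The suffix rules above amount to a small classification step. Below is a minimal sketch in C, assuming one whitespace-free entry per call. ftracetest itself implements this in shell, and `classify_req()` and `enum req_kind` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <string.h>

/* Kind of requirement, chosen by the suffix of a "requires:" entry. */
enum req_kind {
	REQ_TRACEFS,	/* no suffix: file or directory in tracefs */
	REQ_TRACER,	/* ":tracer" suffix: name in available_tracers */
	REQ_README,	/* ":README" suffix: string searched in README */
};

/*
 * Hypothetical helper: classify one entry and copy its name, with the
 * suffix stripped, into @name (at most @len bytes including the NUL).
 */
static enum req_kind classify_req(const char *entry, char *name, size_t len)
{
	/* Use the last ':' so README strings may contain ':' themselves. */
	const char *colon = strrchr(entry, ':');
	enum req_kind kind = REQ_TRACEFS;
	size_t n = strlen(entry);	/* default: whole entry is a path */

	if (colon && strcmp(colon, ":tracer") == 0)
		kind = REQ_TRACER;
	else if (colon && strcmp(colon, ":README") == 0)
		kind = REQ_README;

	if (kind != REQ_TRACEFS)
		n = (size_t)(colon - entry);	/* strip the suffix */

	assert(len > 0);
	if (n >= len)
		n = len - 1;			/* truncate rather than overflow */
	memcpy(name, entry, n);
	name[n] = '\0';
	return kind;
}
```

For example, `classify_req("function_graph:tracer", buf, sizeof(buf))` yields `REQ_TRACER` with `buf` holding `function_graph`, while an entry with an unknown suffix is treated as a plain tracefs path.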

This series also includes the description line fix and
unresolved -> unsupported change ([1/7] and [2/7]).

This series depends on the following 2 commits:

commit 619ee76f5c9f ("selftests/ftrace: Return unsupported if no
 error_log file") on Shuah's Kselftest tree
commit bea24f766efc ("selftests/ftrace: Distinguish between hist
 and synthetic event checks") on Steven's Tracing tree

It can be applied to a tree that has merged both of them.
You can also get the series from the following branch:

 git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git ftracetest-requires-v3


Thank you,

---

Masami Hiramatsu (7):
  selftests/ftrace: Allow ":" in description
  selftests/ftrace: Return unsupported for the unconfigured features
  selftests/ftrace: Add "requires:" list support
  selftests/ftrace: Convert required interface checks into requires list
  selftests/ftrace: Convert check_filter_file() with requires list
  selftests/ftrace: Support ":tracer" suffix for requires
  selftests/ftrace: Support ":README" suffix for requires


 tools/testing/selftests/ftrace/ftracetest  |   11 ++-
 .../selftests/ftrace/test.d/00basic/snapshot.tc|3 +-
 .../selftests/ftrace/test.d/00basic/trace_pipe.tc  |3 +-
 .../ftrace/test.d/direct/kprobe-direct.tc  |6 +---
 .../ftrace/test.d/dynevent/add_remove_kprobe.tc|6 +---
 .../ftrace/test.d/dynevent/add_remove_synth.tc |5 +--
 .../ftrace/test.d/dynevent/clear_select_events.tc  |   11 +--
 .../ftrace/test.d/dynevent/generic_clear_event.tc  |8 +
 .../selftests/ftrace/test.d/event/event-enable.tc  |6 +---
 .../selftests/ftrace/test.d/event/event-no-pid.tc  |   11 +--
 .../selftests/ftrace/test.d/event/event-pid.tc |   11 +--
 .../ftrace/test.d/event/subsystem-enable.tc|6 +---
 .../ftrace/test.d/event/toplevel-enable.tc |6 +---
 .../ftrace/test.d/ftrace/fgraph-filter-stack.tc|   14 +
 .../ftrace/test.d/ftrace/fgraph-filter.tc  |8 +
 .../ftrace/test.d/ftrace/func-filter-glob.tc   |8 +
 .../test.d/ftrace/func-filter-notrace-pid.tc   |   13 +---
 .../ftrace/test.d/ftrace/func-filter-pid.tc|   13 +---
 .../ftrace/test.d/ftrace/func-filter-stacktrace.tc |3 +-
 .../selftests/ftrace/test.d/ftrace/func_cpumask.tc |6 +---
 .../ftrace/test.d/ftrace/func_event_triggers.tc|7 ++---
 .../ftrace/test.d/ftrace/func_mod_trace.tc |3 +-
 .../ftrace/test.d/ftrace/func_profile_stat.tc  |3 +-
 .../ftrace/test.d/ftrace/func_profiler.tc  |   12 +---
 .../ftrace/test.d/ftrace/func_set_ftrace_file.tc   |6 ++--
 .../ftrace/test.d/ftrace/func_stack_tracer.tc  |8 +
 .../test.d/ftrace/func_traceonoff_triggers.tc  |6 ++--
 .../ftrace/test.d/ftrace/tracing-error-log.tc  |   12 ++--
 tools/testing/selftests/ftrace/test.d/functions|   28 ++
 .../ftrace/test.d/instances/instance-event.tc  |6 +---
 .../selftests/ftrace/test.d/instances/instance.tc  |6 +---
 .../ftrace/test.d/kprobe/add_and_remove.tc |3 +-
 .../selftests/ftrace/test.d/kprobe/busy_check.tc   |3 +-
 .../selftests/ftrace/test.d/kprobe/kprobe_args.tc  |3 +-
 .../ftrace/test.d/kprobe/kprobe_args_comm.tc   |3 +-
 .../ftrace/test.d/kprobe/kprobe_args_string.tc |3 +-
 .../ftrace/test.d/kprobe/kprobe_args_symbol.tc |3 +-
 .../ftrace/test.d/kprobe/kprobe_args_syntax.tc |5 +--
 .../ftrace/test.d/kprobe/kprobe_args_type.tc   |5 +--
 .../ftrace/test.d/kprobe/kprobe_args_user.tc   |4 +--
 .../ftrace/test.d/kprobe/kprobe_eventname.tc   |3 +-
 .../ftrace/test.d/kprobe/kprobe_ftrace.tc  |6 +---
 .../ftrace/test.d/kprobe/kprobe_module.tc  |3 +-
 .../ftrace/test.d/kprobe/kprobe_multiprobe.tc  |5 +--
 .../ftrace/test.d/kprobe/kprobe_syntax_errors.tc   |5 +--
 .../ftrace/test.d/kprobe/kretprobe_args.tc |3 +-
 .../ftrace/test.d/kprobe/kretprobe_maxactive.tc|4 +--
 .../ftrace/test.d/kprobe/multiple_kprobes.tc   |3 +-
 .../selftests/ftrace/test.d/kprobe/probepoint.tc 

linux-next: manual merge of the v4l-dvb-next tree with the v4l-dvb tree

2020-06-02 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the v4l-dvb-next tree got a conflict in:

  drivers/staging/media/atomisp/pci/sh_css.c

between commit:

  27333dadef57 ("media: atomisp: adjust some code at sh_css that could be broken")

from the v4l-dvb tree and commits:

  815618c139d7 ("media: atomisp: fix pipeline initialization code")
  be1fdab273a9 ("media: atomisp: change the detection of ISP2401 at runtime")

from the v4l-dvb-next tree.

I fixed it up (I used the version from the latter tree) and can carry the
fix as necessary. This is now fixed as far as linux-next is concerned,
but any non trivial conflicts should be mentioned to your upstream
maintainer when your tree is submitted for merging.  You may also want
to consider cooperating with the maintainer of the conflicting tree to
minimise any particularly complex conflicts.

Can you please make sure that the v4l-dvb tree and v4l-dvb-next tree
are in sync?  They share some patches that are not the same commits.

-- 
Cheers,
Stephen Rothwell




Re: [PATCH v10 00/10] exynos-ufs: Add support for UFS HCI

2020-06-02 Thread Martin K. Petersen
On Thu, 28 May 2020 06:46:48 +0530, Alim Akhtar wrote:

> This patch-set introduces UFS (Universal Flash Storage) host
> controller support for Samsung family SoC. Mostly, it consists of
> UFS PHY and host specific driver.
> [...]

Applied [1,2,3,4,5,9] to 5.9/scsi-queue. The series won't show up in
my public tree until shortly after -rc1 is released.

Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: qedf: remove redundant initialization of variable rc

2020-06-02 Thread Martin K. Petersen
On Wed, 27 May 2020 12:52:42 +0100, Colin King wrote:

> The variable rc is being initialized with a value that is never read
> and it is being updated later with a new value.  The initialization is
> redundant and can be removed.

Applied to 5.8/scsi-queue, thanks!

[1/1] scsi: qedf: Remove redundant initialization of variable rc
  https://git.kernel.org/mkp/scsi/c/89523cb8a67c

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v1 1/1] scsi: ufs: Don't update urgent bkops level when toggle auto bkops

2020-06-02 Thread Martin K. Petersen
On Wed, 27 May 2020 19:24:42 -0700, Can Guo wrote:

> Urgent bkops level is used to compare against actual bkops status read
> from UFS device. Urgent bkops level is set during initialization and might
> be updated in exception event handler during runtime, but it should not be
> updated to the actual bkops status every time when auto bkops is toggled.
> Otherwise, if urgent bkops level is updated to 0, auto bkops shall always
> be kept enabled.
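The failure mode described in the quoted commit message can be seen in a simplified comparison. This sketch is an illustrative model, not ufshcd code: `auto_bkops_wanted()` is a hypothetical name, and it assumes the UFS background-operation status scale of 0 (no operation needed) to 3 (critical):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: auto bkops stays enabled while the status read
 * from the device has reached the urgent level.
 */
static bool auto_bkops_wanted(int device_status, int urgent_level)
{
	assert(device_status >= 0 && device_status <= 3);
	return device_status >= urgent_level;
}
```

With `urgent_level` overwritten to 0, `device_status >= 0` holds for every status, so the comparison can never request disabling auto bkops, which is why the level must not track the status on every toggle.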

Applied to 5.8/scsi-queue, thanks!

[1/1] scsi: ufs: Don't update urgent bkops level when toggling auto bkops
  https://git.kernel.org/mkp/scsi/c/be32acff4380

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: Fix reference count leak in iscsi_boot_create_kobj.

2020-06-02 Thread Martin K. Petersen
On Thu, 28 May 2020 15:13:53 -0500, wu000...@umn.edu wrote:

> kobject_init_and_add() should be handled when it returns an error,
> because kobject_init_and_add() takes a reference even when it fails.
> If this function returns an error, kobject_put() must be called to
> properly clean up the memory associated with the object. Previous
> commit "b8eb718348b8" fixed a similar problem. Thus replace calling
> kfree() by calling kobject_put().
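The rule the patch relies on, that an error path must drop the reference rather than free the object directly, can be modelled in user space. This is a sketch with hypothetical `toy_*` names and does not use the real kobject API; the release callback stands in for the place where the kobject core would free resources acquired during initialization (such as the object's name):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a kobject: one refcount plus a release hook. */
struct toy_kobj {
	int refcount;
	bool *released;		/* set when the release path runs */
};

static void toy_release(struct toy_kobj *k)
{
	*k->released = true;	/* real code would also free the name here */
	free(k);
}

static void toy_put(struct toy_kobj *k)
{
	assert(k->refcount > 0);
	if (--k->refcount == 0)
		toy_release(k);
}

/* Like kobject_init_and_add(): the reference is taken even on failure. */
static int toy_init_and_add(struct toy_kobj *k, bool fail)
{
	k->refcount = 1;
	return fail ? -1 : 0;
}

static struct toy_kobj *toy_create(bool fail, bool *released)
{
	struct toy_kobj *k = calloc(1, sizeof(*k));

	if (!k)
		return NULL;
	k->released = released;
	if (toy_init_and_add(k, fail)) {
		toy_put(k);	/* not free(k): let the release path clean up */
		return NULL;
	}
	return k;
}
```

Calling `free(k)` in the error branch instead would skip `toy_release()`, which is the model's equivalent of leaking whatever the init step acquired.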

Applied to 5.8/scsi-queue, thanks!

[1/1] scsi: iscsi: Fix reference count leak in iscsi_boot_create_kobj
  https://git.kernel.org/mkp/scsi/c/0267ffce562c

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: ufs: Remove redundant urgent_bkop_lvl initialization

2020-06-02 Thread Martin K. Petersen
On Sat, 30 May 2020 22:12:00 +0800, Stanley Chu wrote:

> In ufshcd_probe_hba(), all BKOP SW tracking variables can be reset
> together in ufshcd_force_reset_auto_bkops(), thus urgent_bkop_lvl
> initialization in the beginning of ufshcd_probe_hba() can be merged
> into ufshcd_force_reset_auto_bkops().

Applied to 5.8/scsi-queue, thanks!

[1/1] scsi: ufs: Remove redundant urgent_bkop_lvl initialization
  https://git.kernel.org/mkp/scsi/c/7b6668d8b806

-- 
Martin K. Petersen  Oracle Linux Engineering


arch/powerpc/boot/decompress.c:133: undefined reference to `__decompress'

2020-06-02 Thread kbuild test robot
Hi Nathan,

It's probably a bug fix that unveils the link errors.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   d9afbb3509900a953f5cf90bc57e793ee80c1108
commit: 5990cdee689c6885b27c6d969a3d58b09002b0bc lib/mpi: Fix building for powerpc with clang
date:   6 weeks ago
config: powerpc-randconfig-r032-20200602 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 2388a096e7865c043e83ece4e26654bd3d1a20d5)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc cross compiling tool for clang build
# apt-get install binutils-powerpc-linux-gnu
git checkout 5990cdee689c6885b27c6d969a3d58b09002b0bc
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>, old ones prefixed by <<):

powerpc-linux-gnu-ld: arch/powerpc/boot/wrapper.a(decompress.o): in function `partial_decompress':
>> arch/powerpc/boot/decompress.c:133: undefined reference to `__decompress'

vim +133 arch/powerpc/boot/decompress.c

1b7898ee276b39 Oliver O'Halloran 2016-09-22   98  
1b7898ee276b39 Oliver O'Halloran 2016-09-22   99  /**
1b7898ee276b39 Oliver O'Halloran 2016-09-22  100   * partial_decompress - decompresses part or all of a compressed buffer
1b7898ee276b39 Oliver O'Halloran 2016-09-22  101   * @inbuf:   input buffer
1b7898ee276b39 Oliver O'Halloran 2016-09-22  102   * @input_size:  length of the input buffer
1b7898ee276b39 Oliver O'Halloran 2016-09-22  103   * @outbuf:  input buffer
1b7898ee276b39 Oliver O'Halloran 2016-09-22  104   * @output_size: length of the input buffer
1b7898ee276b39 Oliver O'Halloran 2016-09-22  105   * @skip number of output bytes to ignore
1b7898ee276b39 Oliver O'Halloran 2016-09-22  106   *
1b7898ee276b39 Oliver O'Halloran 2016-09-22  107   * This function takes compressed data from inbuf, decompresses and write it to
1b7898ee276b39 Oliver O'Halloran 2016-09-22  108   * outbuf. Once output_size bytes are written to the output buffer, or the
1b7898ee276b39 Oliver O'Halloran 2016-09-22  109   * stream is exhausted the function will return the number of bytes that were
1b7898ee276b39 Oliver O'Halloran 2016-09-22  110   * decompressed. Otherwise it will return whatever error code the decompressor
1b7898ee276b39 Oliver O'Halloran 2016-09-22  111   * reported (NB: This is specific to each decompressor type).
1b7898ee276b39 Oliver O'Halloran 2016-09-22  112   *
1b7898ee276b39 Oliver O'Halloran 2016-09-22  113   * The skip functionality is mainly there so the program and discover
1b7898ee276b39 Oliver O'Halloran 2016-09-22  114   * the size of the compressed image so that it can ask firmware (if present)
1b7898ee276b39 Oliver O'Halloran 2016-09-22  115   * for an appropriately sized buffer.
1b7898ee276b39 Oliver O'Halloran 2016-09-22  116   */
1b7898ee276b39 Oliver O'Halloran 2016-09-22  117  long partial_decompress(void *inbuf, unsigned long input_size,
1b7898ee276b39 Oliver O'Halloran 2016-09-22  118  	void *outbuf, unsigned long output_size, unsigned long _skip)
1b7898ee276b39 Oliver O'Halloran 2016-09-22  119  {
1b7898ee276b39 Oliver O'Halloran 2016-09-22  120  	int ret;
1b7898ee276b39 Oliver O'Halloran 2016-09-22  121  
1b7898ee276b39 Oliver O'Halloran 2016-09-22  122  	/*
1b7898ee276b39 Oliver O'Halloran 2016-09-22  123  	 * The skipped bytes needs to be included in the size of data we want
1b7898ee276b39 Oliver O'Halloran 2016-09-22  124  	 * to decompress.
1b7898ee276b39 Oliver O'Halloran 2016-09-22  125  	 */
1b7898ee276b39 Oliver O'Halloran 2016-09-22  126  	output_size += _skip;
1b7898ee276b39 Oliver O'Halloran 2016-09-22  127  
1b7898ee276b39 Oliver O'Halloran 2016-09-22  128  	decompressed_bytes = 0;
1b7898ee276b39 Oliver O'Halloran 2016-09-22  129  	output_buffer = outbuf;
1b7898ee276b39 Oliver O'Halloran 2016-09-22  130  	limit = output_size;
1b7898ee276b39 Oliver O'Halloran 2016-09-22  131  	skip = _skip;
1b7898ee276b39 Oliver O'Halloran 2016-09-22  132  
1b7898ee276b39 Oliver O'Halloran 2016-09-22 @133  	ret = __decompress(inbuf, input_size, NULL, flush, outbuf,

:: The code at line 133 was first introduced by commit
:: 1b7898ee276b39e54d870dc4ef3374f663d0b426 powerpc/boot: Use the pre-boot decompression API

:: TO: Oliver O'Halloran 
:: CC: Michael Ellerman 

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PATCH] tcp: fix TCP socks unreleased in BBR mode

2020-06-02 Thread Eric Dumazet
On Tue, Jun 2, 2020 at 6:53 PM Jason Xing  wrote:
>
> Hi Eric,
>
> I'm sorry that I didn't write clearly enough. We're running the
> pristine 4.19.125 linux kernel (the latest LTS version) and have been
> haunted by such an issue. This patch is highly important, I think. So
> I'm going to resend this email with the [patch 4.19] on the headline
> and cc Greg.

Yes, please always state which tree a patch is meant for.

The problem is that your patch is not correct.
In these old kernels, tcp_internal_pacing() is called _after_ the
packet has been sent.
It is too late to 'give up pacing'.

The packet should not have been sent if the pacing timer is queued
(otherwise this means we do not respect pacing).

So the bug should be caught earlier. Check where tcp_pacing_check()
calls are missing.
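The reference leak and the suggested fix can be illustrated with a simplified user-space model. It is not kernel code: `struct model_sock` and the `model_*` functions are hypothetical, and the model assumes that pacing takes one reference each time it (re)arms the timer, while a timer that is re-armed while already queued still expires, and drops a reference, only once:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model, not kernel code: one socket, one pacing timer. */
struct model_sock {
	int refcnt;		/* stands in for sk_refcnt */
	bool timer_queued;	/* stands in for the pacing hrtimer state */
};

/* Arm the pacing timer and take a reference for the timer to drop. */
static void model_internal_pacing(struct model_sock *sk)
{
	sk->timer_queued = true;	/* re-arming keeps a single expiry */
	sk->refcnt++;			/* like sock_hold() */
}

/* Timer expiry: one callback, one reference dropped. */
static void model_timer_fire(struct model_sock *sk)
{
	if (!sk->timer_queued)
		return;
	sk->timer_queued = false;
	assert(sk->refcnt > 0);
	sk->refcnt--;			/* like sock_put() */
}

/* Check made before sending, in the spirit of tcp_pacing_check(). */
static bool model_pacing_check(const struct model_sock *sk)
{
	return sk->timer_queued;
}

/* Try to send one packet; returns true if it went out. */
static bool model_send(struct model_sock *sk, bool check_first)
{
	if (check_first && model_pacing_check(sk))
		return false;		/* defer: pacing must be respected */
	/* ... packet transmitted here ... */
	model_internal_pacing(sk);	/* runs after the send, as in 4.19 */
	return true;
}
```

Sending twice without the check leaves one extra reference after the timer fires, while checking first defers the second packet and keeps the count balanced, which is why the fix belongs before transmission rather than inside the pacing step.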



>
>
> Thanks,
> Jason
>
> On Tue, Jun 2, 2020 at 9:05 PM Eric Dumazet  wrote:
> >
> > On Tue, Jun 2, 2020 at 1:05 AM  wrote:
> > >
> > > From: Jason Xing 
> > >
> > > TCP socks cannot be released because of the sock_hold() increasing the
> > > sk_refcnt in the manner of tcp_internal_pacing() when RTO happens.
> > > Therefore, this situation could increase the slab memory and then trigger
> > > the OOM if the machine has been running for a long time. This issue,
> > > however, can happen on some machines that have only been running for a few days.
> > >
> > > We add one exception case to avoid unneeded use of sock_hold if the
> > > pacing_timer is enqueued.
> > >
> > > Reproduce procedure:
> > > 0) cat /proc/slabinfo | grep TCP
> > > 1) switch net.ipv4.tcp_congestion_control to bbr
> > > 2) use the wrk tool or something similar to send packets
> > > 3) use tc to increase the delay in the dev to simulate the busy case.
> > > 4) cat /proc/slabinfo | grep TCP
> > > 5) kill the wrk command and observe the number of objects and slabs in TCP.
> > > 6) at last, you could notice that the number would not decrease.
> > >
> > > Signed-off-by: Jason Xing 
> > > Signed-off-by: liweishi 
> > > Signed-off-by: Shujin Li 
> > > ---
> > >  net/ipv4/tcp_output.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > index cc4ba42..5cf63d9 100644
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -969,7 +969,8 @@ static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
> > > u64 len_ns;
> > > u32 rate;
> > >
> > > -   if (!tcp_needs_internal_pacing(sk))
> > > +   if (!tcp_needs_internal_pacing(sk) ||
> > > +   hrtimer_is_queued(&tcp_sk(sk)->pacing_timer))
> > > return;
> > > rate = sk->sk_pacing_rate;
> > > if (!rate || rate == ~0U)
> > > --
> > > 1.8.3.1
> > >
> >
> > Hi Jason.
> >
> > Please do not send patches that do not apply to current upstream trees.
> >
> > Instead, backport to your kernels the needed fixes.
> >
> > I suspect that you are not using a pristine linux kernel, but some
> > heavily modified one and something went wrong in your backports.
> > Do not ask us to spend time finding what went wrong.
> >
> > Thank you.


Re: [RFC] Restrict the untrusted devices, to bind to only a set of "whitelisted" drivers

2020-06-02 Thread Rajat Jain
On Mon, Jun 1, 2020 at 10:06 PM Greg Kroah-Hartman
 wrote:
>
> On Mon, Jun 01, 2020 at 06:25:42PM -0500, Bjorn Helgaas wrote:
> > [+cc Greg, linux-kernel for wider exposure]
>
> Thanks for the cc:, missed this...
>
> >
> > On Tue, May 26, 2020 at 09:30:08AM -0700, Rajat Jain wrote:
> > > On Thu, May 14, 2020 at 7:18 PM Rajat Jain  wrote:
> > > > On Thu, May 14, 2020 at 12:13 PM Raj, Ashok  wrote:
> > > > > On Wed, May 13, 2020 at 02:26:18PM -0700, Rajat Jain wrote:
> > > > > > On Wed, May 13, 2020 at 8:19 AM Bjorn Helgaas  wrote:
> > > > > > > On Fri, May 01, 2020 at 04:07:10PM -0700, Rajat Jain wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Currently, the PCI subsystem marks the PCI devices as "untrusted", if
> > > > > > > > the firmware asks it to:
> > > > > > > >
> > > > > > > > 617654aae50e ("PCI / ACPI: Identify untrusted PCI devices")
> > > > > > > > 9cb30a71acd4 ("PCI: OF: Support "external-facing" property")
> > > > > > > >
> > > > > > > > An "untrusted" device indicates a (likely external facing) device that
> > > > > > > > may be malicious, and can trigger DMA attacks on the system. It may
> > > > > > > > also try to exploit any vulnerabilities exposed by the driver, that
> > > > > > > > may allow it to read/write unintended addresses in the host (e.g. if
> > > > > > > > DMA buffers for the device, share memory pages with other driver data
> > > > > > > > structures or code etc).
> > > > > > > >
> > > > > > > > High Level proposal
> > > > > > > > ===================
> > > > > > > > Currently, the "untrusted" device property is used as a hint to enable
> > > > > > > > IOMMU restrictions (on Intel), disable ATS (on ARM) etc. We'd like to
> > > > > > > > go a step further, and allow the administrator to build a list of
> > > > > > > > whitelisted drivers for these "untrusted" devices. This whitelist of
> > > > > > > > drivers are the ones that he trusts enough to have little or no
> > > > > > > > vulnerabilities. (He may have built this list of whitelisted drivers
> > > > > > > > by a combination of code analysis of drivers, or by extensive testing
> > > > > > > > using PCIe fuzzing etc). We propose that the administrator be allowed
> > > > > > > > to specify this list of whitelisted drivers to the kernel, and the PCI
> > > > > > > > subsystem to impose this behavior:
> > > > > > > >
> > > > > > > > 1) The "untrusted" devices can bind to only "whitelisted drivers".
> > > > > > > > 2) The other devices (i.e. dev->untrusted=0) can bind to any driver.
> > > > > > > >
> > > > > > > > Of course this behavior is to be imposed only if such a whitelist is
> > > > > > > > provided by the administrator.
> > >
> > > I haven't heard much on this proposal after the initial inputs (to
> > > which I responded). Essentially, I agree that IO-MMU and ACS
> > > restrictions need to be put in place. But I think we need this
> > > additionally. Does this look acceptable to you? I wanted to start
> > > spinning out the patches, but wanted to see if there are any pending
> > > comments or concerns.
> >
> > I think it makes sense to code this up and see what it would look
> > like.  The bare minimum seems like a driver "bind-to-external-devices"
> > bit that's visible in sysfs plus something in the driver probe path
> > that checks it.  Neither is inherently PCI-specific, but maybe the
> > right place will become obvious when implementing it.


Agree. I'll try to code it up.

My proposal became PCI specific because:

* The need for my proposal arose from the potentially malicious
*external* devices that can (NOW, with the advent of Thunderbolt)
directly DMA into the CPU memory space. PCI (enabled by Thunderbolt 3
and USB4) is the only interface that fits the bill, for laptops at
least (there are a few more interfaces that allow DMA directly into host
memory, such as LPC, but they are all internal so far).

* It hinges on the "untrusted" attribute (I see your concerns on this,
and more on this later), which is part of "struct pci_dev". If we
can move that flag higher up to "struct device", then I think we can
make this proposal not PCI specific.
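The two binding rules from the proposal, plus the "only if a whitelist is provided" condition, reduce to a small predicate. This is a user-space sketch: `struct bind_policy` and `may_bind()` are hypothetical and are not an existing kernel API, and the driver names in the usage note are only examples:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical admin-provided policy for the proposal. */
struct bind_policy {
	const char *const *allowed;	/* whitelisted driver names */
	size_t n_allowed;
	bool have_list;			/* false: no whitelist, no restriction */
};

static bool may_bind(const struct bind_policy *p, bool dev_untrusted,
		     const char *driver)
{
	size_t i;

	assert(driver != NULL);
	if (!p->have_list || !dev_untrusted)
		return true;		/* rule 2, or no whitelist given */
	for (i = 0; i < p->n_allowed; i++)
		if (strcmp(p->allowed[i], driver) == 0)
			return true;	/* rule 1: whitelisted driver */
	return false;
}
```

With a list of `{ "xhci_hcd", "thunderbolt" }`, an untrusted device may bind to `xhci_hcd` but not to `e1000e`, while a trusted device may bind to either. The per-driver "bind-to-external-devices" bit suggested in the thread would replace the list lookup with a flag read, leaving the rest of the check unchanged.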

> >
> > I'm still not 100% sure the device "external/untrusted" bit is the
> > right thing to check.  If you don't trust a driver enough to expose it
> > to an external device, is it reasonable to trust it for internal
> > devices?  It seems like one could attack the driver of even an
> > internal device like a NIC by controlling the data fed to it.
> >
> > The existing use of "external/untrusted" for IOMMU protection is
> > different.  There we're acknowledging that the *device* itself is
> > unknown and we need to protect ourselves from malicious DMA.
> >
> > Here 

Re: linux-next: manual merge of the jc_docs tree with the ext4 tree

2020-06-02 Thread Stephen Rothwell
Hi all,

On Fri, 22 May 2020 13:06:16 +1000 Stephen Rothwell  wrote:
>
> Today's linux-next merge of the jc_docs tree got a conflict in:
> 
>   Documentation/filesystems/fiemap.rst
> 
> between commit:
> 
>   469581d9e5c9 ("fs: move fiemap range validation into the file systems instances")
> 
> from the ext4 tree and commit:
> 
>   e6f7df74ec1a ("docs: filesystems: convert fiemap.txt to ReST")
> 
> from the jc_docs tree.
> 
> diff --cc Documentation/filesystems/fiemap.rst
> index 35c8571eccb6,2a572e7edc08..
> --- a/Documentation/filesystems/fiemap.rst
> +++ b/Documentation/filesystems/fiemap.rst
> @@@ -203,10 -206,9 +206,10 @@@ EINTR once fatal signal received
>   
>   
>   Flag checking should be done at the beginning of the ->fiemap callback via the
> - fiemap_prep() helper:
>  -fiemap_check_flags() helper::
> ++fiemap_prep() helper::
>   
> - int fiemap_prep(struct inode *inode, struct fiemap_extent_info *fieinfo,
> - u64 start, u64 *len, u32 supported_flags);
>  -  int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
> ++  int fiemap_prep(struct inode *inode, struct fiemap_extent_info *fieinfo,
> ++  u64 start, u64 *len, u32 supported_flags);
>   
>   The struct fieinfo should be passed in as received from ioctl_fiemap(). The
>   set of fiemap flags which the fs understands should be passed via fs_flags. If

This is now a conflict between the ext4 tree and Linus' tree.

-- 
Cheers,
Stephen Rothwell




Re: Re: [PATCH] drm/nouveau/clk/gm20b: Fix memory leak in gm20b_clk_new()

2020-06-02 Thread dinghao . liu

> On Tue, Jun 02, 2020 at 01:10:34PM +0200, Markus Elfring wrote:
> > > The original patch was basically fine.
> > 
> > I propose to reconsider the interpretation of the software situation once 
> > more.
> > 
> > * Should the allocated clock object be kept usable even after
> >   a successful return from this function?
> 
> Heh.  You're right.  The patch is freeing "clk" on the success path so
> that doesn't work.
> 

Ben has explained this problem:
https://lore.kernel.org/patchwork/patch/1249592/
Since the caller will check "pclk" on failure, we don't need to free
"clk" in gm20b_clk_new() and I think this patch is no longer needed.
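The ownership pattern described above, where the constructor publishes the object through the caller's out-parameter before any step that can fail and the caller cleans up on failure, can be sketched in user space. All `toy_*` names are hypothetical; this is not the nouveau code:

```c
#include <assert.h>
#include <stdlib.h>

struct toy_clk { int dummy; };

static int toy_destroyed;	/* counts destructor calls, for the demo */

/* Caller-side destructor: safe to call on whatever was published. */
static void toy_clk_del(struct toy_clk **pclk)
{
	if (*pclk) {
		free(*pclk);
		*pclk = NULL;
		toy_destroyed++;
	}
}

/* Constructor: publish through *pclk first, then do fallible work. */
static int toy_clk_new(struct toy_clk **pclk, int fail_late)
{
	struct toy_clk *clk = calloc(1, sizeof(*clk));

	assert(pclk != NULL);
	*pclk = clk;
	if (!clk)
		return -1;
	if (fail_late)
		return -1;	/* no free here: the caller now owns *pclk */
	return 0;
}
```

Because the caller destroys whatever `*pclk` points to on failure, an extra free inside the constructor would be a double free on the error path, and freeing on the success path would hand the caller a dangling pointer.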

Regards,
Dinghao

Re: [PATCH] net: genetlink: Fix memleak in genl_family_rcv_msg_dumpit()

2020-06-02 Thread Yuehaibing
On 2020/6/3 2:04, Cong Wang wrote:
> On Mon, Jun 1, 2020 at 11:47 PM YueHaibing  wrote:
>> @@ -630,6 +625,9 @@ static int genl_family_rcv_msg_dumpit(const struct genl_family *family,
>> err = __netlink_dump_start(net->genl_sock, skb, nlh, &c);
>> }
>>
>> +   genl_family_rcv_msg_attrs_free(info->family, info->attrs, true);
>> +   genl_dumpit_info_free(info);
>> +
>> return err;
>>  }
> 
> I do not think you can just move it after __netlink_dump_start(),
> because cb->done() can be called, for example, in netlink_sock_destruct()
> too.

netlink_sock_destruct() calls cb->done() when nlk->cb_running is true.
If nlk->cb_running is not set to true in __netlink_dump_start() before
it returns, the memleak still occurs.

> 
> 



[PATCH 0/2] Update CascadelakeX and SkylakeX events list

2020-06-02 Thread Jin Yao
This patchset updates CascadelakeX events to v1.08 and
updates SkylakeX events to v1.21.

The events have been tested on CascadelakeX and SkylakeX
servers with the latest perf/core branch.

Jin Yao (2):
  perf vendor events: Update CascadelakeX events to v1.08
  perf vendor events: Update SkylakeX events to v1.21

 .../arch/x86/cascadelakex/cache.json  |   28 +-
 .../arch/x86/cascadelakex/clx-metrics.json|  153 +-
 .../arch/x86/cascadelakex/frontend.json   |   34 +
 .../arch/x86/cascadelakex/memory.json |  704 ++---
 .../arch/x86/cascadelakex/other.json  | 1100 
 .../arch/x86/cascadelakex/pipeline.json   |   10 -
 .../arch/x86/cascadelakex/uncore-other.json   |   21 +
 .../pmu-events/arch/x86/skylakex/cache.json   | 2348 +
 .../arch/x86/skylakex/floating-point.json |   96 +-
 .../arch/x86/skylakex/frontend.json   |  656 ++---
 .../pmu-events/arch/x86/skylakex/memory.json  | 1977 +++---
 .../pmu-events/arch/x86/skylakex/other.json   |  172 +-
 .../arch/x86/skylakex/pipeline.json   | 1206 +
 .../arch/x86/skylakex/skx-metrics.json|  141 +-
 .../arch/x86/skylakex/uncore-memory.json  |   26 +-
 .../arch/x86/skylakex/uncore-other.json   |  730 -
 .../arch/x86/skylakex/virtual-memory.json |  358 +--
 17 files changed, 5198 insertions(+), 4562 deletions(-)

-- 
2.17.1



RE: [PATCH] exfat: fix memory leak in exfat_parse_param()

2020-06-02 Thread Namjae Jeon
> On Wed, Jun 03, 2020 at 10:29:57AM +0900, Namjae Jeon wrote:
> 
> > exfat_free() should call exfat_free_iocharset() after stealing
> > param->string instead of kstrdup in exfat_parse_param().
> 
> ITYM
>   exfat_free() should call exfat_free_iocharset(), to prevent a leak
> in case we fail after parsing iocharset= but before calling
> get_tree_bdev()
> 
>   Additionally, there's no point copying param->string in
> exfat_parse_param() - just steal it, leaving NULL in param->string.
> That's independent from the leak or fix thereof - it's simply avoiding
> an extra copy.
Updated it in v2.
Thanks!



Re: [GIT PULL] General notification queue and key notifications

2020-06-02 Thread Ian Kent
On Tue, 2020-06-02 at 16:55 +0100, David Howells wrote:
> 
> [[ With regard to the mount/sb notifications and fsinfo(), Karel Zak and
>    Ian Kent have been working on making libmount use them, preparatory to
>    working on systemd:
> 
>   https://github.com/karelzak/util-linux/commits/topic/fsinfo
>   https://github.com/raven-au/util-linux/commits/topic/fsinfo.public
> 
>    Development has stalled briefly due to other commitments, so I'm not
>    sure I can ask you to pull those parts of the series for now.  Christian
>    Brauner would like to use them in lxc, but hasn't started.
>    ]]

Linus,

Just so you're aware of what has been done and where we are at, here's
a summary.

Karel has done quite a bit of work on libmount (at this stage it's
getting hold of the mount information, aka. fsinfo()), and most of
what I have done is included in that too (which you can see in Karel's
repo above). You can see a couple of bug fixes and a little bit of
new code in my repo which haven't been sent over to Karel
yet.

This infrastructure is essential before notifications work is started
which is where we will see the most improvement.

It turns out that while systemd uses libmount, it has its own
notifications handling sub-system, as it deals with several event
types, not just mount information, in the same area. So, unfortunately,
changes will need to be made there as well as in libmount, more so
than the trivial changes to use fsinfo() via libmount.

That's where we are at the moment and I will get back to it once
I've dealt with a few things I postponed to work on libmount.

If you would like a more detailed account of what we have found I
can provide that too.

Is there anything else you would like from me or Karel?

Ian



[PATCH v2] exfat: fix memory leak in exfat_parse_param()

2020-06-02 Thread Namjae Jeon
From: Al Viro 

butt3rflyh4ck reported a memory leak found by syzkaller.

A copy of param->string held by exfat_mount_options is leaked.

BUG: memory leak

unreferenced object 0x88801972e090 (size 8):
  comm "syz-executor.2", pid 16298, jiffies 4295172466 (age 14.060s)
  hex dump (first 8 bytes):
6b 6f 69 38 2d 75 00 00  koi8-u..
  backtrace:
[<5bfe35d6>] kstrdup+0x36/0x70 mm/util.c:60
[<18ed3277>] exfat_parse_param+0x160/0x5e0 fs/exfat/super.c:276
[<7680462b>] vfs_parse_fs_param+0x2b4/0x610 fs/fs_context.c:147
[<97c027f2>] vfs_parse_fs_string+0xe6/0x150 fs/fs_context.c:191
[<371bf78f>] generic_parse_monolithic+0x16f/0x1f0 fs/fs_context.c:231
[<5ce5eb1b>] do_new_mount fs/namespace.c:2812 [inline]
[<5ce5eb1b>] do_mount+0x12bb/0x1b30 fs/namespace.c:3141
[] __do_sys_mount fs/namespace.c:3350 [inline]
[] __se_sys_mount fs/namespace.c:3327 [inline]
[] __x64_sys_mount+0x18f/0x230 fs/namespace.c:3327
[<3b024e98>] do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
[] entry_SYSCALL_64_after_hwframe+0x49/0xb3

exfat_free() should call exfat_free_iocharset(), to prevent a leak
in case we fail after parsing iocharset= but before calling
get_tree_bdev().

Additionally, there's no point copying param->string in
exfat_parse_param() - just steal it, leaving NULL in param->string.
That's independent from the leak or fix thereof - it's simply
avoiding an extra copy.
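The "steal it" idea is an ownership transfer: the option struct takes over the parameter's buffer and the parameter's pointer is cleared, so the later generic free of param->string becomes a no-op and exactly one owner remains. A user-space sketch, with hypothetical `toy_*` types standing in for the fs_parameter and mount-option structures and `free()` standing in for kfree()/exfat_free_iocharset():

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for fs_parameter and the mount options. */
struct toy_param { char *string; };
struct toy_opts { char *iocharset; };

/* Take ownership of param->string instead of duplicating it. */
static void toy_set_charset(struct toy_opts *opts, struct toy_param *param)
{
	free(opts->iocharset);		/* drop any earlier iocharset= value */
	opts->iocharset = param->string;
	param->string = NULL;		/* caller's later free() is a no-op */
}

/* Teardown that also releases the option string (the leak fix). */
static void toy_free_opts(struct toy_opts *opts)
{
	free(opts->iocharset);
	opts->iocharset = NULL;
}
```

Both halves matter: without the teardown in `toy_free_opts()`, a mount that fails after parsing iocharset= would still leak the string, whether it was copied or stolen.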

Fixes: 719c1e182916 ("exfat: add super block operations")
Cc: sta...@vger.kernel.org # v5.7
Reported-by: butt3rflyh4ck 
Signed-off-by: Al Viro 
---
 v2:
   - update patch description in more detail.

 fs/exfat/super.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/exfat/super.c b/fs/exfat/super.c
index 405717e4e3ea..e650e65536f8 100644
--- a/fs/exfat/super.c
+++ b/fs/exfat/super.c
@@ -273,9 +273,8 @@ static int exfat_parse_param(struct fs_context *fc, struct fs_parameter *param)
 		break;
 	case Opt_charset:
 		exfat_free_iocharset(sbi);
-		opts->iocharset = kstrdup(param->string, GFP_KERNEL);
-		if (!opts->iocharset)
-			return -ENOMEM;
+		opts->iocharset = param->string;
+		param->string = NULL;
 		break;
 	case Opt_errors:
 		opts->errors = result.uint_32;
@@ -686,7 +685,12 @@ static int exfat_get_tree(struct fs_context *fc)
 
 static void exfat_free(struct fs_context *fc)
 {
-	kfree(fc->s_fs_info);
+	struct exfat_sb_info *sbi = fc->s_fs_info;
+
+	if (sbi) {
+		exfat_free_iocharset(sbi);
+		kfree(sbi);
+	}
 }
 
 static const struct fs_context_operations exfat_context_ops = {
-- 
2.17.1



Re: [PATCH] tcp: fix TCP socks unreleased in BBR mode

2020-06-02 Thread David Miller
From: Jason Xing 
Date: Wed, 3 Jun 2020 09:53:10 +0800

> I'm sorry that I didn't write clearly enough. We're running the
> pristine 4.19.125 linux kernel (the latest LTS version) and have been
> haunted by such an issue. This patch is highly important, I think. So
> I'm going to resend this email with the [patch 4.19] on the headline
> and cc Greg.

That's not the appropriate thing to do.


Re: [RFC 02/16] x86/kvm: Introduce KVM memory protection feature

2020-06-02 Thread Huang, Kai
On Wed, 2020-05-27 at 10:39 +0200, Vitaly Kuznetsov wrote:
> Sean Christopherson  writes:
> 
> > On Mon, May 25, 2020 at 06:15:25PM +0300, Kirill A. Shutemov wrote:
> > > On Mon, May 25, 2020 at 04:58:51PM +0200, Vitaly Kuznetsov wrote:
> > > > > @@ -727,6 +734,15 @@ static void __init kvm_init_platform(void)
> > > > >  {
> > > > >   kvmclock_init();
> > > > >   x86_platform.apic_post_init = kvm_apic_init;
> > > > > +
> > > > > + if (kvm_para_has_feature(KVM_FEATURE_MEM_PROTECTED)) {
> > > > > + if (kvm_hypercall0(KVM_HC_ENABLE_MEM_PROTECTED)) {
> > > > > + pr_err("Failed to enable KVM memory protection\n");
> > > > > + return;
> > > > > + }
> > > > > +
> > > > > + mem_protected = true;
> > > > > + }
> > > > >  }
> > > > 
> > > > Personally, I'd prefer to do this via setting a bit in a KVM-specific
> > > > MSR instead. The benefit is that the guest doesn't need to remember if
> > > > it enabled the feature or not, it can always read the config msr. May
> > > > come handy for e.g. kexec/kdump.
> > > 
> > > I think we would need to remember it anyway. Accessing MSR is somewhat
> > > expensive. But, okay, I can rework it to use an MSR if needed.
> > 
> > I think Vitaly is talking about the case where the kernel can't easily get
> > at its cached state, e.g. after booting into a new kernel.  The kernel would
> > still have an X86_FEATURE bit or whatever, providing a virtual MSR would be
> > purely for rare slow paths.
> > 
> > That being said, a hypercall plus CPUID bit might be better, e.g. that'd
> > allow the guest to query the state without risking a #GP.
> 
> We have rdmsr_safe() for that! :-) MSR (and hypercall for that matter)
> should have an associated CPUID feature bit of course.
> 
> Yes, hypercall + CPUID would do but normally we treat CPUID data as
> static and in this case we'll make it a dynamically flipping
> bit. Especially if we introduce 'KVM_HC_DISABLE_MEM_PROTECTED' later.

Not sure why KVM_HC_DISABLE_MEM_PROTECTED is needed?

> 
> > > Note that we can avoid the enabling altogether if we modify the BIOS to deal
> > > with private/shared memory. Currently the BIOS crashes the system if we enable
> > > the feature from time zero.
> > 
> > Which would mesh better with a CPUID feature bit.
> > 
> 
> And maybe even help us to resolve 'reboot' problem.

IMO we can ask QEMU to call the hypercall to 'enable' memory protection when
creating the VM, and the guest kernel then *queries* whether it is protected
via a CPUID feature bit.
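
With that flow, where the VMM turns protection on when the VM is created and the guest only reads a CPUID feature bit, the guest-side check stays trivial. A minimal sketch, assuming the caller has already read EAX of the KVM features CPUID leaf; the bit number for KVM_FEATURE_MEM_PROTECTED is not fixed in this thread, so it is passed in as a parameter, and guest_mem_protected() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: 'kvm_features_eax' is EAX from the KVM features
 * CPUID leaf; 'feature_bit' is the (assumed, not yet allocated) bit
 * number for KVM_FEATURE_MEM_PROTECTED.
 */
static bool guest_mem_protected(uint32_t kvm_features_eax, unsigned int feature_bit)
{
	if (feature_bit >= 32)
		return false;	/* out-of-range bit: treat as unsupported */
	return (kvm_features_eax >> feature_bit) & 1u;
}
```

Since the bit would be set before the guest boots and never flipped afterwards, this also avoids treating CPUID data as a dynamically changing bit.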



[PATCH 0/3] Convert i.MX/MXS I2C/LPI2C binding doc to json-schema

2020-06-02 Thread Anson Huang
Convert the i.MX/MXS I2C/LPI2C binding docs to json-schema. Some examples
are too old, so update them based on the latest DT files, and add more
compatibles based on the supported SoCs.

Anson Huang (3):
  dt-bindings: i2c: Convert imx lpi2c to json-schema
  dt-bindings: i2c: Convert mxs i2c to json-schema
  dt-bindings: i2c: Convert imx i2c to json-schema

 .../devicetree/bindings/i2c/i2c-imx-lpi2c.txt  |  20 
 .../devicetree/bindings/i2c/i2c-imx-lpi2c.yaml |  45 
 Documentation/devicetree/bindings/i2c/i2c-imx.txt  |  49 -
 Documentation/devicetree/bindings/i2c/i2c-imx.yaml | 118 +
 Documentation/devicetree/bindings/i2c/i2c-mxs.txt  |  25 -
 Documentation/devicetree/bindings/i2c/i2c-mxs.yaml |  55 ++
 6 files changed, 218 insertions(+), 94 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/i2c/i2c-imx-lpi2c.txt
 create mode 100644 Documentation/devicetree/bindings/i2c/i2c-imx-lpi2c.yaml
 delete mode 100644 Documentation/devicetree/bindings/i2c/i2c-imx.txt
 create mode 100644 Documentation/devicetree/bindings/i2c/i2c-imx.yaml
 delete mode 100644 Documentation/devicetree/bindings/i2c/i2c-mxs.txt
 create mode 100644 Documentation/devicetree/bindings/i2c/i2c-mxs.yaml

-- 
2.7.4



[PATCH 2/3] dt-bindings: i2c: Convert mxs i2c to json-schema

2020-06-02 Thread Anson Huang
Convert the MXS I2C binding to DT schema format using json-schema

Signed-off-by: Anson Huang 
---
 Documentation/devicetree/bindings/i2c/i2c-mxs.txt  | 25 --
 Documentation/devicetree/bindings/i2c/i2c-mxs.yaml | 55 ++
 2 files changed, 55 insertions(+), 25 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/i2c/i2c-mxs.txt
 create mode 100644 Documentation/devicetree/bindings/i2c/i2c-mxs.yaml

diff --git a/Documentation/devicetree/bindings/i2c/i2c-mxs.txt b/Documentation/devicetree/bindings/i2c/i2c-mxs.txt
deleted file mode 100644
index 4e1c8ac..0000000
--- a/Documentation/devicetree/bindings/i2c/i2c-mxs.txt
+++ /dev/null
@@ -1,25 +0,0 @@
-* Freescale MXS Inter IC (I2C) Controller
-
-Required properties:
-- compatible: Should be "fsl,-i2c"
-- reg: Should contain registers location and length
-- interrupts: Should contain ERROR interrupt number
-- clock-frequency: Desired I2C bus clock frequency in Hz.
-   Only 100000Hz and 400000Hz modes are supported.
-- dmas: DMA specifier, consisting of a phandle to DMA controller node
-  and I2C DMA channel ID.
-  Refer to dma.txt and fsl-mxs-dma.txt for details.
-- dma-names: Must be "rx-tx".
-
-Examples:
-
-i2c0: i2c@80058000 {
-   #address-cells = <1>;
-   #size-cells = <0>;
-   compatible = "fsl,imx28-i2c";
-   reg = <0x80058000 2000>;
-   interrupts = <111>;
-   clock-frequency = <100000>;
-   dmas = <_apbx 6>;
-   dma-names = "rx-tx";
-};
diff --git a/Documentation/devicetree/bindings/i2c/i2c-mxs.yaml b/Documentation/devicetree/bindings/i2c/i2c-mxs.yaml
new file mode 100644
index 0000000..7adcba3
--- /dev/null
+++ b/Documentation/devicetree/bindings/i2c/i2c-mxs.yaml
@@ -0,0 +1,55 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/i2c/i2c-mxs.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Freescale MXS Inter IC (I2C) Controller
+
+maintainers:
+  - Shawn Guo 
+
+properties:
+  compatible:
+    enum:
+      - fsl,imx23-i2c
+      - fsl,imx28-i2c
+
+  reg:
+    maxItems: 1
+
+  interrupts:
+    maxItems: 1
+
+  clock-frequency:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    description: |
+      Desired I2C bus clock frequency in Hz, only 100000Hz and 400000Hz
+      modes are supported.
+    default: 100000
+
+  dmas:
+    maxItems: 1
+
+  dma-names:
+    const: rx-tx
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - dmas
+  - dma-names
+
+examples:
+  - |
+    i2c@80058000 {
+        #address-cells = <1>;
+        #size-cells = <0>;
+        compatible = "fsl,imx28-i2c";
+        reg = <0x80058000 2000>;
+        interrupts = <111>;
+        clock-frequency = <100000>;
+        dmas = <_apbx 6>;
+        dma-names = "rx-tx";
+    };
-- 
2.7.4
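
The MXS binding records its speed restriction (100 kHz and 400 kHz only) in the description text rather than as a schema constraint. A driver-side check could look like the following sketch; mxs_i2c_resolve_speed() and the macros are hypothetical names for illustration, not code from the i2c-mxs driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two bus speeds the binding allows. */
#define MXS_I2C_STANDARD_HZ 100000u
#define MXS_I2C_FAST_HZ     400000u

/*
 * Resolve the effective bus speed: apply the binding's default when the
 * property is absent, and reject anything other than the two modes.
 */
static bool mxs_i2c_resolve_speed(bool has_prop, uint32_t prop_hz, uint32_t *out_hz)
{
	uint32_t hz = has_prop ? prop_hz : MXS_I2C_STANDARD_HZ;

	if (hz != MXS_I2C_STANDARD_HZ && hz != MXS_I2C_FAST_HZ)
		return false;	/* unsupported mode */
	*out_hz = hz;
	return true;
}
```

Encoding the same rule in the YAML itself, for example with an enum: of the two values under clock-frequency, would arguably let schema validation reject unsupported speeds before the driver ever sees them.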



[PATCH 3/3] dt-bindings: i2c: Convert imx i2c to json-schema

2020-06-02 Thread Anson Huang
Convert the i.MX I2C binding to DT schema format using json-schema.
Some improvements are applied as well: the example is updated based on
the latest DT file, more compatibles are added for existing SoCs, and
the unnecessary common property "pinctrl" is removed.

Signed-off-by: Anson Huang 
---
 Documentation/devicetree/bindings/i2c/i2c-imx.txt  |  49 -
 Documentation/devicetree/bindings/i2c/i2c-imx.yaml | 118 +
 2 files changed, 118 insertions(+), 49 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/i2c/i2c-imx.txt
 create mode 100644 Documentation/devicetree/bindings/i2c/i2c-imx.yaml

diff --git a/Documentation/devicetree/bindings/i2c/i2c-imx.txt b/Documentation/devicetree/bindings/i2c/i2c-imx.txt
deleted file mode 100644
index b967544..0000000
--- a/Documentation/devicetree/bindings/i2c/i2c-imx.txt
+++ /dev/null
@@ -1,49 +0,0 @@
-* Freescale Inter IC (I2C) and High Speed Inter IC (HS-I2C) for i.MX
-
-Required properties:
-- compatible :
-  - "fsl,imx1-i2c" for I2C compatible with the one integrated on i.MX1 SoC
-  - "fsl,imx21-i2c" for I2C compatible with the one integrated on i.MX21 SoC
-  - "fsl,vf610-i2c" for I2C compatible with the one integrated on Vybrid vf610 SoC
-- reg : Should contain I2C/HS-I2C registers location and length
-- interrupts : Should contain I2C/HS-I2C interrupt
-- clocks : Should contain the I2C/HS-I2C clock specifier
-
-Optional properties:
-- clock-frequency : Constains desired I2C/HS-I2C bus clock frequency in Hz.
-  The absence of the property indicates the default frequency 100 kHz.
-- dmas: A list of two dma specifiers, one for each entry in dma-names.
-- dma-names: should contain "tx" and "rx".
-- scl-gpios: specify the gpio related to SCL pin
-- sda-gpios: specify the gpio related to SDA pin
-- pinctrl: add extra pinctrl to configure i2c pins to gpio function for i2c
-  bus recovery, call it "gpio" state
-
-Examples:
-
-i2c@83fc4000 { /* I2C2 on i.MX51 */
-   compatible = "fsl,imx51-i2c", "fsl,imx21-i2c";
-   reg = <0x83fc4000 0x4000>;
-   interrupts = <63>;
-};
-
-i2c@70038000 { /* HS-I2C on i.MX51 */
-   compatible = "fsl,imx51-i2c", "fsl,imx21-i2c";
-   reg = <0x70038000 0x4000>;
-   interrupts = <64>;
-   clock-frequency = <400000>;
-};
-
-i2c0: i2c@40066000 { /* i2c0 on vf610 */
-   compatible = "fsl,vf610-i2c";
-   reg = <0x40066000 0x1000>;
-   interrupts =<0 71 0x04>;
-   dmas = < 0 50>,
-   < 0 51>;
-   dma-names = "rx","tx";
-   pinctrl-names = "default", "gpio";
-   pinctrl-0 = <_i2c1>;
-   pinctrl-1 = <_i2c1_gpio>;
-   scl-gpios = < 26 GPIO_ACTIVE_HIGH>;
-   sda-gpios = < 27 GPIO_ACTIVE_HIGH>;
-};
diff --git a/Documentation/devicetree/bindings/i2c/i2c-imx.yaml b/Documentation/devicetree/bindings/i2c/i2c-imx.yaml
new file mode 100644
index 0000000..0d31d1c
--- /dev/null
+++ b/Documentation/devicetree/bindings/i2c/i2c-imx.yaml
@@ -0,0 +1,118 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/i2c/i2c-imx.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Freescale Inter IC (I2C) and High Speed Inter IC (HS-I2C) for i.MX
+
+maintainers:
+  - Wolfram Sang 
+
+properties:
+  compatible:
+    oneOf:
+      - const: fsl,imx1-i2c
+      - const: fsl,imx21-i2c
+      - const: fsl,vf610-i2c
+      - items:
+          - const: fsl,imx35-i2c
+          - const: fsl,imx1-i2c
+      - items:
+          - enum:
+              - fsl,imx25-i2c
+              - fsl,imx27-i2c
+              - fsl,imx31-i2c
+              - fsl,imx50-i2c
+              - fsl,imx51-i2c
+              - fsl,imx53-i2c
+              - fsl,imx6q-i2c
+              - fsl,imx6sl-i2c
+              - fsl,imx6sx-i2c
+              - fsl,imx6sll-i2c
+              - fsl,imx6ul-i2c
+              - fsl,imx7s-i2c
+              - fsl,imx8mq-i2c
+              - fsl,imx8mm-i2c
+              - fsl,imx8mn-i2c
+              - fsl,imx8mp-i2c
+          - const: fsl,imx21-i2c
+
+  reg:
+    maxItems: 1
+
+  interrupts:
+    maxItems: 1
+
+  clocks:
+    maxItems: 1
+
+  clock-frequency:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    description: |
+      Contains desired I2C/HS-I2C bus clock frequency in Hz.
+      The absence of the property indicates the default frequency 100 kHz.
+    default: 100000
+
+  dmas:
+    items:
+      - description: DMA controller phandle and request line for RX
+      - description: DMA controller phandle and request line for TX
+
+  dma-names:
+    items:
+      - const: rx
+      - const: tx
+
+  sda-gpios:
+    $ref: '/schemas/types.yaml#/definitions/phandle'
+    description: |
+      gpio used for the sda signal, this should be flagged as
+      active high using open drain with (GPIO_ACTIVE_HIGH|GPIO_OPEN_DRAIN)
+      from  since the signal is by definition
+      open drain.
+    maxItems: 1
+
+  scl-gpios:
+    $ref: '/schemas/types.yaml#/definitions/phandle'
+    description: |
+      gpio used for the scl 
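
The oneOf under compatible in the i.MX binding accepts either one of three standalone strings or a two-entry list whose second entry is a fallback (fsl,imx1-i2c after fsl,imx35-i2c, or fsl,imx21-i2c after one of the SoC-specific strings). The C sketch below mirrors that acceptance rule so the cases can be checked by hand; it is an illustrative model with an abbreviated SoC table, not how dt-schema actually validates a binding.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Abbreviated copy of the SoC-specific enum that precedes fsl,imx21-i2c. */
static const char *const imx21_socs[] = {
	"fsl,imx25-i2c", "fsl,imx51-i2c", "fsl,imx6q-i2c", "fsl,imx8mq-i2c",
};

static bool str_in(const char *s, const char *const *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(s, list[i]) == 0)
			return true;
	return false;
}

/* True if the n-entry compatible list matches one branch of the oneOf. */
static bool imx_i2c_compatible_ok(const char *const *compat, size_t n)
{
	if (n == 1)	/* the three standalone const branches */
		return strcmp(compat[0], "fsl,imx1-i2c") == 0 ||
		       strcmp(compat[0], "fsl,imx21-i2c") == 0 ||
		       strcmp(compat[0], "fsl,vf610-i2c") == 0;
	if (n != 2)
		return false;
	if (strcmp(compat[1], "fsl,imx1-i2c") == 0)	/* imx35 + fallback */
		return strcmp(compat[0], "fsl,imx35-i2c") == 0;
	if (strcmp(compat[1], "fsl,imx21-i2c") == 0)	/* SoC + fallback */
		return str_in(compat[0], imx21_socs,
			      sizeof(imx21_socs) / sizeof(imx21_socs[0]));
	return false;
}
```

Note that the order matters: the SoC-specific string must come first and the generic fallback second, which is why a reversed pair is rejected.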

[PATCH 1/3] dt-bindings: i2c: Convert imx lpi2c to json-schema

2020-06-02 Thread Anson Huang
Convert the i.MX LPI2C binding to DT schema format using json-schema

Signed-off-by: Anson Huang 
---
 .../devicetree/bindings/i2c/i2c-imx-lpi2c.txt  | 20 --
 .../devicetree/bindings/i2c/i2c-imx-lpi2c.yaml | 45 ++
 2 files changed, 45 insertions(+), 20 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/i2c/i2c-imx-lpi2c.txt
 create mode 100644 Documentation/devicetree/bindings/i2c/i2c-imx-lpi2c.yaml

diff --git a/Documentation/devicetree/bindings/i2c/i2c-imx-lpi2c.txt b/Documentation/devicetree/bindings/i2c/i2c-imx-lpi2c.txt
deleted file mode 100644
index f0c072f..0000000
--- a/Documentation/devicetree/bindings/i2c/i2c-imx-lpi2c.txt
+++ /dev/null
@@ -1,20 +0,0 @@
-* Freescale Low Power Inter IC (LPI2C) for i.MX
-
-Required properties:
-- compatible :
-  - "fsl,imx7ulp-lpi2c" for LPI2C compatible with the one integrated on i.MX7ULP soc
-  - "fsl,imx8qxp-lpi2c" for LPI2C compatible with the one integrated on i.MX8QXP soc
-  - "fsl,imx8qm-lpi2c" for LPI2C compatible with the one integrated on i.MX8QM soc
-- reg : address and length of the lpi2c master registers
-- interrupts : lpi2c interrupt
-- clocks : lpi2c clock specifier
-
-Examples:
-
-lpi2c7: lpi2c7@40a50000 {
-   compatible = "fsl,imx7ulp-lpi2c";
-   reg = <0x40A50000 0x10000>;
-   interrupt-parent = <>;
-   interrupts = ;
-   clocks = < IMX7ULP_CLK_LPI2C7>;
-};
diff --git a/Documentation/devicetree/bindings/i2c/i2c-imx-lpi2c.yaml b/Documentation/devicetree/bindings/i2c/i2c-imx-lpi2c.yaml
new file mode 100644
index 0000000..3c0be0c
--- /dev/null
+++ b/Documentation/devicetree/bindings/i2c/i2c-imx-lpi2c.yaml
@@ -0,0 +1,45 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/i2c/i2c-imx-lpi2c.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Freescale Low Power Inter IC (LPI2C) for i.MX
+
+maintainers:
+  - Anson Huang 
+
+properties:
+  compatible:
+    enum:
+      - fsl,imx7ulp-lpi2c
+      - fsl,imx8qxp-lpi2c
+      - fsl,imx8qm-lpi2c
+
+  reg:
+    maxItems: 1
+
+  interrupts:
+    maxItems: 1
+
+  clocks:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - clocks
+
+examples:
+  - |
+    #include 
+    #include 
+
+    lpi2c7@40a50000 {
+        compatible = "fsl,imx7ulp-lpi2c";
+        reg = <0x40A50000 0x10000>;
+        interrupt-parent = <>;
+        interrupts = ;
+        clocks = < IMX7ULP_CLK_LPI2C7>;
+    };
-- 
2.7.4



Re: [PATCH -next] vgacon: Fix an out-of-bounds in vgacon_scrollback_update()

2020-06-02 Thread Yang Yingliang

ping

On 2020/5/13 10:28, Yang Yingliang wrote:

I got a slab-out-of-bounds report while doing a fuzz test.

[  334.989515] ==================================================================
[  334.989577] BUG: KASAN: slab-out-of-bounds in vgacon_scroll+0x57a/0x8ed
[  334.989588] Write of size 1766 at addr ffff8883de69ff3e by task test/2658
[  334.989593]
[  334.989608] CPU: 3 PID: 2658 Comm: test Not tainted 5.7.0-rc5-00005-g152036d1379f #789
[  334.989617] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[  334.989624] Call Trace:
[  334.989646]  dump_stack+0xe4/0x14e
[  334.989676]  print_address_description.constprop.5+0x3f/0x60
[  334.989699]  ? vgacon_scroll+0x57a/0x8ed
[  334.989710]  __kasan_report.cold.8+0x92/0xaf
[  334.989735]  ? vgacon_scroll+0x57a/0x8ed
[  334.989761]  kasan_report+0x37/0x50
[  334.989789]  check_memory_region+0x1c1/0x1e0
[  334.989806]  memcpy+0x38/0x60
[  334.989824]  vgacon_scroll+0x57a/0x8ed
[  334.989876]  con_scroll+0x4ef/0x5e0
[  334.989904]  ? lockdep_hardirqs_on+0x5e0/0x5e0
[  334.989934]  lf+0x24f/0x2a0
[  334.989951]  ? con_scroll+0x5e0/0x5e0
[  334.989975]  ? find_held_lock+0x33/0x1c0
[  334.990005]  do_con_trol+0x313/0x5ff0
[  334.990027]  ? lock_downgrade+0x730/0x730
[  334.990045]  ? reset_palette+0x440/0x440
[  334.990070]  ? _raw_spin_unlock_irqrestore+0x4b/0x60
[  334.990095]  ? notifier_call_chain+0x120/0x170
[  334.990132]  ? __atomic_notifier_call_chain+0xf0/0x180
[  334.990160]  do_con_write.part.16+0xb2b/0x1b20
[  334.990238]  ? do_con_trol+0x5ff0/0x5ff0
[  334.990258]  ? mutex_lock_io_nested+0x1280/0x1280
[  334.990269]  ? rcu_read_unlock+0x50/0x50
[  334.990315]  ? __mutex_unlock_slowpath+0xd9/0x670
[  334.990340]  ? lockdep_hardirqs_on+0x3a2/0x5e0
[  334.990368]  con_write+0x36/0xc0
[  334.990389]  do_output_char+0x561/0x780
[  334.990414]  n_tty_write+0x58e/0xd30
[  334.990478]  ? n_tty_read+0x1800/0x1800
[  334.990500]  ? prepare_to_wait_exclusive+0x300/0x300
[  334.990525]  ? __might_fault+0x17a/0x1c0
[  334.990557]  tty_write+0x430/0x960
[  334.990568]  ? n_tty_read+0x1800/0x1800
[  334.990600]  ? tty_release+0x1280/0x1280
[  334.990622]  __vfs_write+0x81/0x100
[  334.990648]  vfs_write+0x1ce/0x510
[  334.990676]  ksys_write+0x104/0x200
[  334.990691]  ? __ia32_sys_read+0xb0/0xb0
[  334.990708]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  334.990725]  ? trace_hardirqs_off_caller+0x40/0x1a0
[  334.990744]  ? do_syscall_64+0x3b/0x5e0
[  334.990775]  do_syscall_64+0xc8/0x5e0
[  334.990798]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[  334.990811] RIP: 0033:0x44f369
[  334.990827] Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
[  334.990834] RSP: 002b:00007ffe9ace0968 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  334.990848] RAX: ffffffffffffffda RBX: 0000000000400418 RCX: 000000000044f369
[  334.990856] RDX: 0000000000000381 RSI: 0000000020003500 RDI: 0000000000000003
[  334.990865] RBP: 00007ffe9ace0980 R08: 0000000020003530 R09: 00007ffe9ace0980
[  334.990873] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000402110
[  334.990881] R13: 0000000000000000 R14: 00000000006bf018 R15: 0000000000000000
[  334.990937]
[  334.990943] The buggy address belongs to the page:
[  334.990962] page:ffffea000f79a400 refcount:1 mapcount:0 mapping:000000002bff47b3 index:0x0 head:ffffea000f79a400 order:4 compound_mapcount:0 compound_pincount:0
[  334.990973] flags: 0x2f8001(head)
[  334.990992] raw: 002f8001 dead0100 dead0122 

[  334.991006] raw:   0001 

[  334.991013] page dumped because: kasan: bad access detected
[  334.991017]
[  334.991023] Memory state around the buggy address:
[  334.991034]  ffff8883de6a0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  334.991044]  ffff8883de6a0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  334.991054] >ffff8883de6a0100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc
[  334.991061]  ^
[  334.991071]  ffff8883de6a0180: fc fc fc fc fc fc 00 00 00 00 00 00 00 00 00 00
[  334.991082]  ffff8883de6a0200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  334.991088] ==================================================================

This happens because vgacon_scrollback_cur->tail plus the memcpy size is
greater than vgacon_scrollback_cur->size. Fix this by checking the memcpy size.

Reported-by: Hulk Robot 
Signed-off-by: Yang Yingliang 
---
  drivers/video/console/vgacon.c | 11 ---
  1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/video/console/vgacon.c b/drivers/video/console/vgacon.c
index 998b0de1812f..b51ffb9a208d 100644
--- a/drivers/video/console/vgacon.c
+++ b/drivers/video/console/vgacon.c
@@ 
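
The report attributes the overrun to tail plus the memcpy size exceeding size. A common way to keep such a ring-buffer write in bounds is to split the copy at the end of the buffer. The sketch below shows that general technique with a hypothetical struct ring; it is not the patch's own change, which the description only summarises as checking the memcpy size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scrollback ring: 'tail' is the next write offset (< size). */
struct ring {
	unsigned char *data;
	size_t size;
	size_t tail;
};

/*
 * Copy 'len' bytes into the ring without writing past data + size:
 * whatever does not fit before the end wraps around to offset 0.
 * Callers must ensure len <= r->size.
 */
static void ring_write(struct ring *r, const unsigned char *src, size_t len)
{
	size_t first = r->size - r->tail;	/* room before the end */

	if (len <= first) {
		memcpy(r->data + r->tail, src, len);
	} else {
		memcpy(r->data + r->tail, src, first);
		memcpy(r->data, src + first, len - first);
	}
	r->tail = (r->tail + len) % r->size;
}
```

Clamping the copy size instead of wrapping would also avoid the overflow, at the cost of dropping the data that does not fit.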

Re: linux-next: manual merge of the akpm-current tree with the btrfs tree

2020-06-02 Thread Stephen Rothwell
Hi all,

On Mon, 25 May 2020 21:11:28 +1000 Stephen Rothwell wrote:
>
> Today's linux-next merge of the akpm-current tree got a conflict in:
> 
>   fs/btrfs/inode.c
> 
> between commit:
> 
>   f31e5f70919f ("btrfs: switch to iomap_dio_rw() for dio")
> 
> from the btrfs tree and commit:
> 
>   2167c1133b8b ("btrfs: convert from readpages to readahead")
> 
> from the akpm-current tree.
> 
> diff --cc fs/btrfs/inode.c
> index fb95efeb63ed,8b3489f229c7..
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@@ -10075,8 -10538,8 +10060,8 @@@ static const struct address_space_opera
>   .readpage   = btrfs_readpage,
>   .writepage  = btrfs_writepage,
>   .writepages = btrfs_writepages,
> - .readpages  = btrfs_readpages,
> + .readahead  = btrfs_readahead,
>  -.direct_IO  = btrfs_direct_IO,
>  +.direct_IO  = noop_direct_IO,
>   .invalidatepage = btrfs_invalidatepage,
>   .releasepage= btrfs_releasepage,
>   #ifdef CONFIG_MIGRATION

This is now a conflict between commit

  ba206a026ff4 ("btrfs: convert from readpages to readahead")

from Linus' tree and commit

  a43a67a2d715 ("btrfs: switch to iomap_dio_rw() for dio")

from the btrfs tree.
-- 
Cheers,
Stephen Rothwell




Re: [GIT PULL] io_uring updates for 5.8-rc1

2020-06-02 Thread Jens Axboe
On 6/2/20 5:03 PM, Linus Torvalds wrote:
> On Mon, Jun 1, 2020 at 10:55 AM Jens Axboe  wrote:
>>
>>   git://git.kernel.dk/linux-block.git for-5.8/io_uring-2020-06-01
> 
> I'm not sure why pr-tracker-bot didn't like your io_uring pull request.
> 
> It replied to your two other pull requests, but not to this one. I'm
> not seeing any hugely fundamental differences between this and the two
> others..

Pretty sure that happened last time too, but I don't know why. For
the incremental ones after the merge window, it seemed to work fine...

-- 
Jens Axboe



Re: [Question]: about 'cpuinfo_cur_freq' shown in sysfs when the CPU is in idle state

2020-06-02 Thread Hanjun Guo

On 2020/6/2 11:34, Xiongfeng Wang wrote:

Hi Viresh,

Sorry to bother you with another problem, described below.

CPPC uses the increments of the Delivered Performance counter and the Reference
Performance counter to compute the CPU frequency, which is shown in sysfs through
'cpuinfo_cur_freq'. But ACPI CPPC doesn't specifically define the behavior of
these two counters when the CPU is idle, for example whether they stop
incrementing.

The ARMv8.4 extension introduced support for the Activity Monitors Unit (AMU). The
processor frequency cycles and constant frequency cycles counters in the AMU can be
used as the Delivered Performance counter and the Reference Performance counter.
These two AMU counters do not increment while the PE is in WFI or WFE, so the
increment is zero in that state. This causes no issue, because
'cppc_get_rate_from_fbctrs()' in the cppc_cpufreq driver checks the increment
and returns the desired performance if the increment is zero.

But when the CPU enters a power-down idle state, reading these two AMU counters
through their memory-mapped addresses returns zero. For example, if CPU1 enters a
power-down idle state and CPU0 tries to get the frequency of CPU1, a very large
value is displayed for 'cpuinfo_cur_freq' in sysfs. Do you have any advice
about this problem?


Just a wild guess: how about just returning 0 for idle CPUs, meaning
the frequency is 0 for idle CPUs?



I was thinking about the following idea. We can run 'cppc_cpufreq_get_rate()' on
the CPU to be measured, so that we can make sure the CPU is in C0 state when we
access the two counters. We can also return the actual frequency rather than the
desired performance when the CPU is in WFI/WFE. But this modification would
change the existing logic, and I am not sure whether it will cause any bad
effects.


diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 257d726..ded3bcc 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -396,9 +396,10 @@ static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,
 	return cppc_cpufreq_perf_to_khz(cpu, delivered_perf);
 }
 
-static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
+static int cppc_cpufreq_get_rate_cpu(void *info)
 {
 	struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
+	unsigned int cpunum = *(unsigned int *)info;
 	struct cppc_cpudata *cpu = all_cpu_data[cpunum];
 	int ret;
 
@@ -418,6 +419,22 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
 	return cppc_get_rate_from_fbctrs(cpu, fb_ctrs_t0, fb_ctrs_t1);
 }
 
+static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
+{
+	int ret;	/* signed, so the error check below can fire */
+
+	ret = smp_call_on_cpu(cpunum, cppc_cpufreq_get_rate_cpu, &cpunum, true);
+
+	/*
+	 * Convert a negative error code to zero; otherwise an odd value
+	 * would be displayed for 'cpuinfo_cur_freq' in sysfs.
+	 */
+	if (ret < 0)
+		ret = 0;
+
+	return ret;
+}
+
 static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
 {
 	struct cppc_cpudata *cpudata;



It would wake the CPU up if it is idle, which is not friendly to
power saving :)

Thanks
Hanjun
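
One plausible way a powered-down CPU produces a very large 'cpuinfo_cur_freq' is unsigned wraparound: if a counter sample reads as zero after an earlier non-zero sample, t1 - t0 on unsigned 64-bit values becomes enormous. The sketch below models the calculation delivered = reference_perf * delta_delivered / delta_reference with a guard for that case. It is an assumption-laden illustration with hypothetical names, not the cppc_cpufreq code.

```c
#include <stdint.h>

/*
 * Hypothetical model of a CPPC-style rate calculation from two samples
 * of the delivered and reference feedback counters. 'fallback_perf'
 * stands in for the desired performance reported when the counters
 * give no usable information. Genuine counter wraparound is ignored.
 */
static uint64_t model_delivered_perf(uint64_t ref_perf,
				     uint64_t del_t0, uint64_t del_t1,
				     uint64_t ref_t0, uint64_t ref_t1,
				     uint64_t fallback_perf)
{
	/*
	 * A sample that went backwards (e.g. 0 read from a powered-down
	 * CPU) would otherwise become a huge unsigned delta.
	 */
	if (del_t1 < del_t0 || ref_t1 < ref_t0)
		return fallback_perf;

	uint64_t d_del = del_t1 - del_t0;
	uint64_t d_ref = ref_t1 - ref_t0;

	/* No increment at all, e.g. the PE sat in WFI/WFE. */
	if (d_del == 0 || d_ref == 0)
		return fallback_perf;

	return ref_perf * d_del / d_ref;
}
```

The guard only catches a sample that goes backwards; a zero first sample followed by a valid second one would still need separate handling. Unlike reading the counters on the target CPU, it does not wake an idle CPU.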



Re: [PATCH 1/2] perf tools: check libasan and libubsan in Makefile.config

2020-06-02 Thread Tiezhu Yang

On 06/02/2020 10:15 PM, Jiri Olsa wrote:

On Tue, Jun 02, 2020 at 12:15:03PM +0800, Tiezhu Yang wrote:

When building perf with ASan or UBSan, if libasan or libubsan cannot be found,
feature-glibc is 0 and the following error is printed. The error is misleading,
because gnu/libc-version.h does exist in /usr/include and glibc-devel is
installed.

[yangtiezhu@linux perf]$ make DEBUG=1 EXTRA_CFLAGS='-fno-omit-frame-pointer -fsanitize=address'
   BUILD:   Doing 'make -j4' parallel build
   HOSTCC   fixdep.o
   HOSTLD   fixdep-in.o
   LINK fixdep
:1:0: warning: -fsanitize=address and -fsanitize=kernel-address are not supported for this target
:1:0: warning: -fsanitize=address not supported for this target

Auto-detecting system features:
... dwarf: [ OFF ]
...dwarf_getlocations: [ OFF ]
... glibc: [ OFF ]
...  gtk2: [ OFF ]
...  libaudit: [ OFF ]
...libbfd: [ OFF ]
...libcap: [ OFF ]
...libelf: [ OFF ]
...   libnuma: [ OFF ]
...numa_num_possible_cpus: [ OFF ]
...   libperl: [ OFF ]
... libpython: [ OFF ]
... libcrypto: [ OFF ]
... libunwind: [ OFF ]
...libdw-dwarf-unwind: [ OFF ]
...  zlib: [ OFF ]
...  lzma: [ OFF ]
... get_cpuid: [ OFF ]
...   bpf: [ OFF ]
...libaio: [ OFF ]
...   libzstd: [ OFF ]
...disassembler-four-args: [ OFF ]

Makefile.config:393: *** No gnu/libc-version.h found, please install glibc-dev[el].  Stop.
Makefile.perf:224: recipe for target 'sub-make' failed
make[1]: *** [sub-make] Error 2
Makefile:69: recipe for target 'all' failed
make: *** [all] Error 2
[yangtiezhu@linux perf]$ ls /usr/include/gnu/libc-version.h
/usr/include/gnu/libc-version.h

After installing libasan and libubsan, feature-glibc is 1 and the build
succeeds, so the failure is caused by the missing libasan or libubsan. We
should check for them and print an error message that reflects the real cause.

Signed-off-by: Tiezhu Yang 
---
  tools/perf/Makefile.config | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 12a8204..b699d21 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -387,6 +387,12 @@ else
NO_LIBBPF := 1
NO_JVMTI := 1
  else
+  ifneq ($(shell ldconfig -p | grep libasan >/dev/null 2>&1; echo $$?), 0)
+msg := $(error No libasan found, please install libasan);
+  endif
+  ifneq ($(shell ldconfig -p | grep libubsan >/dev/null 2>&1; echo $$?), 0)
+msg := $(error No libubsan found, please install libubsan);
+  endif

hum, would it be better to have check for this in tools/build/features?


Hi Jiri,

Thanks for your suggestion.

Do you mean that it is better to add this check at the end of file
tools/build/Makefile.feature?



jirka


ifneq ($(filter s% -static%,$(LDFLAGS),),)
  msg := $(error No static glibc found, please install glibc-static);
else
--
2.1.0





Re: linux-next: manual merge of the rcu tree with the powerpc tree

2020-06-02 Thread Stephen Rothwell
Hi all,

On Tue, 19 May 2020 17:23:16 +1000 Stephen Rothwell wrote:
>
> Hi all,
> 
> Today's linux-next merge of the rcu tree got a conflict in:
> 
>   arch/powerpc/kernel/traps.c
> 
> between commit:
> 
>   116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting")
> 
> from the powerpc tree and commit:
> 
>   187416eeb388 ("hardirq/nmi: Allow nested nmi_enter()")
> 
> from the rcu tree.

This is now a conflict between commit

  69ea03b56ed2 ("hardirq/nmi: Allow nested nmi_enter()")

from Linus' tree and the above powerpc tree commit.
-- 
Cheers,
Stephen Rothwell




Re: [RFC PATCH v1] irqchip: Add IRQCHIP_MODULE_BEGIN/END helper macros

2020-06-02 Thread Saravana Kannan
On Fri, May 1, 2020 at 1:23 PM Saravana Kannan  wrote:
>
> On Fri, May 1, 2020 at 1:48 AM Marc Zyngier  wrote:
> >
> > On 2020-04-29 20:04, Saravana Kannan wrote:
> > > On Wed, Apr 29, 2020 at 2:28 AM Marc Zyngier  wrote:
> >
> > [...]
> >
> > >> One thing though: this seems to be exclusively DT driven. Have you
> > >> looked into how that would look like for other firmware types such as
> > >> ACPI?
> > >
> > > I'm not very familiar with ACPI at all. I've just started to learn
> > > about how it works in the past few months poking at code when I have
> > > some time. So I haven't tried to get this to work with ACPI nor do I
> > > think I'll be able to do that anytime in the near future. I hope that
> > > doesn't block this from being used for DT based platforms.
> >
> > As long as you don't try to modularise a driver that does both DT and
> > ACPI, you'll be safe. I'm also actively trying to discourage people
> > from inventing custom irqchips on ACPI platforms (the spec almost
> > forbids them, but not quite).
> >
> > >> Another thing is the handling of dependencies. Statically built
> > >> irqchips are initialized in the right order based on the topology
> > >> described in DT, and are initialized early enough that client devices
> > >> will find their irqchip. This doesn't work here, obviously.
> > >
> > > Yeah, I read that code thoroughly :)
> > >
> > >> How do you
> > >> propose we handle these dependencies, both between irqchip drivers and
> > >> client drivers?
> > >
> > > For client drivers, we don't need to do anything. The IRQ apis seem to
> > > already handle -EPROBE_DEFER correctly in this case.
> > >
> > > For irqchip drivers, the easy answer can be: Load the IRQ modules
> > > early if you make them modules.
> >
> > Uhuh. I'm afraid that's not a practical solution. We need to offer the
> > same behaviour for both and not rely on the user to understand the
> > topology of the SoC.
> >
> > > But in my case, I've been testing this with fw_devlink=on. The TL;DR
> > > of "fw_devlink=on" in this context is that the IRQ devices will get
> > > device links created based on "interrupt-parent" property. So, with
> > > the magic of device links, these IRQ devices will probe in the right
> > > topological order without any wasted deferred probe attempts. For
> > > cases without fw_devlink=on, I think I can improve
> > > platform_irqchip_probe() in my patch to check if the parent device has
> > > probed and defer if it hasn't.
> >
> > Seems like an interesting option. Two things then:
> >
> > - Can we enforce the use of fw_devlink for modularized irqchips?
>
> fw_devlink doesn't have any config and it's a command line option. So
> not sure how you can enforce that.
>
> > - For those irqchips that can be modularized, it is apparent that they
> >    should have been written as platform devices in the first place. Maybe
> >    we should just do that (long term, though).
>
> I agree. If they can be platform devices, they should be. But when
> those platform device drivers are built in, you'll either need:
> 1) fw_devlink=on to enforce the topological init order
> Or
> 2) have a generic irqchip probe helper function that ensures that.
> My patch with some additional checks added to platform_irqchip_probe()
> can provide (2).
>
> In the short term, my patch series also makes it easier to convert
> existing non-platform drivers into platform drivers.
>
> So if I fix up platform_irqchip_probe() to also do -EPROBE_DEFER to
> enforce topology, will that make this patch acceptable?
>

Friendly reminder.

-Saravana
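
The alternative Saravana describes for builds without fw_devlink=on is for platform_irqchip_probe() to return -EPROBE_DEFER until the interrupt parent has probed, so repeated probe passes settle into topological order. The userspace simulation below illustrates that idea; struct chip, chip_probe() and the retry loop are hypothetical, and the real driver core schedules deferred probes differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same value as EPROBE_DEFER in the kernel's include/linux/errno.h. */
#define EPROBE_DEFER 517

/* Hypothetical irqchip: 'parent' indexes the table, -1 for the root. */
struct chip {
	int parent;
	bool probed;
};

/* Defer until this chip's interrupt parent has probed. */
static int chip_probe(struct chip *chips, size_t idx)
{
	int p = chips[idx].parent;

	if (p >= 0 && !chips[p].probed)
		return -EPROBE_DEFER;
	chips[idx].probed = true;
	return 0;
}

/*
 * Retry deferred chips until a full pass makes no progress.
 * 'order' receives indices in the order they probed; returns how
 * many chips probed (fewer than n means an unresolvable parent).
 */
static size_t probe_all(struct chip *chips, size_t n, size_t *order)
{
	size_t done = 0;
	bool progress = true;

	while (progress) {
		progress = false;
		for (size_t i = 0; i < n; i++) {
			if (chips[i].probed)
				continue;
			if (chip_probe(chips, i) == 0) {
				order[done++] = i;
				progress = true;
			}
		}
	}
	return done;
}
```

With fw_devlink=on, device links would establish the same order up front, so this deferral check would rarely trigger and no probe attempts would be wasted.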


Re: [PATCHES] uaccess hpsa

2020-06-02 Thread Martin K. Petersen


>   hpsa compat ioctl done (hopefully) saner.  I really want
> to kill compat_alloc_user_space() off - it's always trouble and
> for driver-private ioctls it's absolutely pointless.
>
>   Note that this is only compile-tested - I don't have the
> hardware to test it on *or* userland to issue the ioctls in
> question.  So this series definitely needs a review and testing
> from hpsa maintainers before it might go anywhere.

Don: Please test and review. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] exfat: fix memory leak in exfat_parse_param()

2020-06-02 Thread Al Viro
On Wed, Jun 03, 2020 at 10:29:57AM +0900, Namjae Jeon wrote:

> exfat_free() should call exfat_free_iocharset() after stealing
> param->string instead of kstrdup in exfat_parse_param().

ITYM
exfat_free() should call exfat_free_iocharset(), to prevent
a leak in case we fail after parsing iocharset= but before calling
get_tree_bdev().

Additionally, there's no point copying param->string in
exfat_parse_param() - just steal it, leaving NULL in param->string.
That's independent from the leak or fix thereof - it's simply
avoiding an extra copy.
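A minimal user-space sketch of the "just steal it" pattern, assuming hypothetical model_param/model_opts structs in place of struct fs_parameter and the exfat mount options, with free() standing in for kfree() and exfat_free_iocharset().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for struct fs_parameter and the exfat mount options. */
struct model_param {
	char *string;
};

struct model_opts {
	char *iocharset;
};

/* Parse iocharset= by taking ownership of the parameter's string. */
static void model_parse_charset(struct model_opts *opts, struct model_param *param)
{
	free(opts->iocharset);		/* drop any earlier iocharset= value */
	opts->iocharset = param->string;
	param->string = NULL;		/* the caller's later free() becomes a no-op */
}

/* Teardown path: releases iocharset even if the mount never got far. */
static void model_free(struct model_opts *opts)
{
	free(opts->iocharset);
	opts->iocharset = NULL;
}
```

No copy is made, so there is no allocation failure to handle in the parser, and the only owner left is the options struct.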


Re: [PATCH] xfs/XXX: Add xfs/XXX

2020-06-02 Thread Xiao Yang

On 2020/6/3 2:14, Darrick J. Wong wrote:

On Tue, Jun 02, 2020 at 04:51:48PM +0800, Xiao Yang wrote:

On 2020/4/14 0:30, Darrick J. Wong wrote:

This might be a good time to introduce a few new helpers:

_require_scratch_dax ("Does $SCRATCH_DEV support DAX?")
_require_scratch_dax_mountopt ("Does the fs support the DAX mount options?")
_require_scratch_dax_iflag ("Does the fs support FS_XFLAG_DAX?")

Hi Darrick,

Now, I am trying to introduce these new helpers and have some questions:
1) There are five testcases related to the old DAX implementation. Should we
only convert them to the new DAX implementation, or make them compatible with
both the old and new implementations?


What is the 'old' DAX implementation?  ext2 XIP?

Hi Darrick,

Thanks for your quick feedback.

Right, the 'old' DAX implementation means the old dax mount option (i.e. -o dax).

Comparing the new and old dax mount options on ext4 and xfs, is the
following logic right?

-o dax=always == -o dax
-o dax=never == without dax
-o dax=inode == nothing

Of course, we should use the new option if ext4/xfs supports the new dax
mount option on a distro. But should we fall back to the old option if
ext4/xfs doesn't support the new dax mount option on some old distros?

btw:
it seems hard for testcases to use two different sets of mount
options (i.e. old and new), so do you have any suggestions?





2) I think _require_xfs_io_command "chattr" "x" is enough to check if fs
supports FS_XFLAG_DAX.  Is it necessary to add _require_scratch_dax_iflag()?
like this:
_require_scratch_dax_iflag()
{
	_require_xfs_io_command "chattr" "x"
}


I suggested that list based on the major control knobs that will be
visible to userspace programs.  Even if this is just a one-line helper,
its name is useful for recognizing which of those knobs we're looking
for.

Yes, you could probably save a trivial amount of time by skipping one
iteration of bash function calling, but now everyone has to remember
that the xfs_io chattr "x" flag means the dax inode flag, and not
confuse it for chmod +x or something else.


Got it, thanks for your detailed explanation.

Best Regards,
Xiao Yang


--D


Best Regards,
Xiao Yang












Re: [PATCH] tcp: fix TCP socks unreleased in BBR mode

2020-06-02 Thread Jason Xing
Hi Eric,

I'm sorry that I didn't write clearly enough. We're running the
pristine 4.19.125 Linux kernel (the latest LTS version) and have been
hit by this issue. I think this patch is important, so I'm going to
resend this email with [patch 4.19] in the subject line and cc Greg.

Thanks,
Jason

On Tue, Jun 2, 2020 at 9:05 PM Eric Dumazet  wrote:
>
> On Tue, Jun 2, 2020 at 1:05 AM  wrote:
> >
> > From: Jason Xing 
> >
> > TCP socks cannot be released because sock_hold() in tcp_internal_pacing()
> > increases sk_refcnt when an RTO happens. As a result, slab memory usage
> > can grow and eventually trigger an OOM if the machine has been running
> > for a long time. In practice, this issue can appear on some machines
> > after only a few days of uptime.
> >
> > We add one exception case to avoid an unneeded sock_hold() if the
> > pacing_timer is already enqueued.
> >
> > Reproduce procedure:
> > 0) cat /proc/slabinfo | grep TCP
> > 1) switch net.ipv4.tcp_congestion_control to bbr
> > 2) use wrk or a similar tool to send packets
> > 3) use tc to increase the delay on the device to simulate a busy network
> > 4) cat /proc/slabinfo | grep TCP
> > 5) kill the wrk command and observe the number of objects and slabs in TCP
> > 6) notice that the number does not decrease
> >
> > Signed-off-by: Jason Xing 
> > Signed-off-by: liweishi 
> > Signed-off-by: Shujin Li 
> > ---
> >  net/ipv4/tcp_output.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index cc4ba42..5cf63d9 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -969,7 +969,8 @@ static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
> > u64 len_ns;
> > u32 rate;
> >
> > -   if (!tcp_needs_internal_pacing(sk))
> > +   if (!tcp_needs_internal_pacing(sk) ||
> > +   hrtimer_is_queued(&tcp_sk(sk)->pacing_timer))
> > return;
> > rate = sk->sk_pacing_rate;
> > if (!rate || rate == ~0U)
> > --
> > 1.8.3.1
> >
>
> Hi Jason.
>
> Please do not send patches that do not apply to current upstream trees.
>
> Instead, backport to your kernels the needed fixes.
>
> I suspect that you are not using a pristine linux kernel, but some
> heavily modified one and something went wrong in your backports.
> Do not ask us to spend time finding what went wrong.
>
> Thank you.
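A simplified user-space model of the reference accounting the patch describes, assuming that re-arming an already queued hrtimer still produces only one expiry (and hence only one sock_put()). The names are hypothetical, and the model says nothing about whether current upstream code is affected, which is Eric's point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model holding only the fields that matter for the leak. */
struct model_sock {
	int refcnt;		/* stands in for sk->sk_refcnt */
	bool timer_queued;	/* stands in for hrtimer_is_queued(&tcp_sk(sk)->pacing_timer) */
};

/* Arm the pacing timer; fixed=false mimics the path without the new check. */
static void model_internal_pacing(struct model_sock *sk, bool fixed)
{
	if (fixed && sk->timer_queued)
		return;			/* the queued timer already owns a reference */
	sk->refcnt++;			/* sock_hold() */
	sk->timer_queued = true;	/* re-arming does not queue the timer twice */
}

/* Timer expiry: the callback drops exactly one reference. */
static void model_pacing_timer_fire(struct model_sock *sk)
{
	if (!sk->timer_queued)
		return;
	sk->timer_queued = false;
	sk->refcnt--;			/* sock_put() */
}
```

Two arming calls before one expiry leave an extra reference in the unfixed variant, which is the imbalance the commit message blames for the unreleased sockets.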


Re: [RFC PATCH v4 07/10] vfio/pci: introduce a new irq type VFIO_IRQ_TYPE_REMAP_BAR_REGION

2020-06-02 Thread Yan Zhao
On Tue, Jun 02, 2020 at 01:34:35PM -0600, Alex Williamson wrote:
> I'm not at all happy with this.  Why do we need to hide the migration
> sparse mmap from the user until migration time?  What if instead we
> introduced a new VFIO_REGION_INFO_CAP_SPARSE_MMAP_SAVING capability
> where the existing capability is the normal runtime sparse setup and
> the user is required to use this new one prior to enabling device_state
> with _SAVING.  The vendor driver could then simply track mmap vmas to
> the region and refuse to change device_state if there are outstanding
> mmaps conflicting with the _SAVING sparse mmap layout.  No new IRQs
> required, no new irqfds, an incremental change to the protocol,
> backwards compatible to the extent that a vendor driver requiring this
> will automatically fail migration.
> 
Right, it looks like we need to use this approach to solve the problem.
Thanks for your guidance.
So I'll abandon the current remap-IRQ approach for dirty tracking during
live migration.
But anyway, it demonstrates how to customize irq_types in vendor drivers.
So, what do you think about patches 1-5?
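A user-space sketch of the vendor-driver check in Alex's proposal: track the user's mmaps of the region and refuse the transition to _SAVING while any of them falls outside the _SAVING sparse layout. The range struct, the function names and the choice of -EBUSY are assumptions, and for simplicity a mapping must fit inside a single _SAVING area.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical half-open byte range [start, end) within a region. */
struct range {
	unsigned long start;
	unsigned long end;
};

/* Return 1 if r lies entirely inside one of the allowed areas. */
static int range_allowed(const struct range *r, const struct range *allowed, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (r->start >= allowed[i].start && r->end <= allowed[i].end)
			return 1;
	return 0;
}

/*
 * Refuse the device_state change to _SAVING while any tracked user mmap
 * conflicts with the _SAVING sparse mmap layout.
 */
static int check_saving_transition(const struct range *mmaps, size_t nmaps,
				   const struct range *saving, size_t nsaving)
{
	for (size_t i = 0; i < nmaps; i++)
		if (!range_allowed(&mmaps[i], saving, nsaving))
			return -EBUSY;
	return 0;
}
```

Userspace that has already switched to the _SAVING layout passes the check; stale mappings of now-trapped ranges make the transition fail instead of silently escaping dirty tracking.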

> > > What happens if the mmap re-evaluation occurs asynchronous to the
> > > device_state write?  The vendor driver can track outstanding mmap vmas
> > > to areas it's trying to revoke, so the vendor driver can know when
> > > userspace has reached an acceptable state (assuming we require
> > > userspace to munmap areas that are no longer valid).  We should also
> > > consider what we can accomplish by invalidating user mmaps, ex. can we
> > > fault them back in on a per-page basis and continue to mark them dirty
> > > in the migration state, re-invalidating on each iteration until they've
> > > finally been closed.   It seems the vendor driver needs to handle
> > > incrementally closing each mmap anyway, there's no requirement to the
> > > user to stop the device (ie. block all access), make these changes,
> > > then restart the device.  So perhaps the vendor driver can "limp" along
> > > until userspace completes the changes.  I think we can assume we are in
> > > a cooperative environment here, userspace wants to perform a migration,
> > > disabling direct access to some regions is for mediating those accesses
> > > during migration, not for preventing the user from accessing something
> > > they shouldn't have access to, userspace is only delaying the migration
> > > or affecting the state of their device by not promptly participating in
> > > the protocol.
> > >   
> > The problem is that the mmap re-evaluation has to be done before
> > device_state is successfully set to SAVING; otherwise, QEMU may
> > have left the save_setup stage and it's too late to start dirty tracking.
> > And the reason for us to trap the BAR regions is not that there is
> > dirty data in these regions; it is that we want to know when the device
> > registers mapped in the BARs are written, so we can do dirty page tracking
> > of system memory in software.
> 
> I think my proposal above resolves this.
>
yes.

> > > Another problem I see though is what about p2p DMA?  If the vendor
> > > driver invalidates an mmap we're removing it from both direct CPU as
> > > well as DMA access via the IOMMU.  We can't signal to the guest OS that
> > > a DMA channel they've been using is suddenly no longer valid.  Is QEMU
> > > going to need to avoid ever IOMMU mapping device_ram for regions
> > > subject to mmap invalidation?  That would introduce an undesirable need
> > > to choose whether we want to support p2p or migration unless we had an
> > > IOMMU that could provide dirty tracking via p2p, right?  Thanks,  
> > 
> > Yes, if there is device memory mapped in the BARs to be remapped, p2p
> > DMA would be affected. Perhaps the vendor driver should be aware of this
> > and know what it is doing before sending out the remap IRQ?
> > In the i40e VF's case, the BAR 0 to be remapped is only for device
> > registers, so is it still OK?
> 
> No, we can't design the interface based on one vendor driver's
> implementation of the interface or the requirements of a single device.
> If we took the approach above where the user is provided both the
> normal sparse mmap and the _SAVING sparse mmap, perhaps QEMU could
> avoid DMA mapping portions that don't exist in the _SAVING version, at
> least then the p2p DMA mappings would be consistent across the
> transition.  QEMU might be able to combine the sparse mmap maps such
> that it can easily drop ranges not present during _SAVING.  QEMU would
> need to munmap() the dropped ranges rather than simply mark the
> MemoryRegion disabled though for the vendor driver to have visibility
> of the vm_ops.close callback.  Thanks,
>
OK, got it! Thank you!

Yan


linux-next: build failure after merge of the overlayfs tree

2020-06-02 Thread Stephen Rothwell
Hi all,

After merging the overlayfs tree, today's linux-next build (x86_64
allmodconfig) failed like this:

ERROR: modpost: "security_file_ioctl" [fs/overlayfs/overlay.ko] undefined!

Caused by commit

  b5940870e166 ("ovl: call secutiry hook in ovl_real_ioctl()")

I have applied this patch for today.

From: Stephen Rothwell 
Date: Wed, 3 Jun 2020 11:44:19 +1000
Subject: [PATCH] export security_file_ioctl

Signed-off-by: Stephen Rothwell 
---
 security/security.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/security/security.c b/security/security.c
index 51de970fbb1e..077ac86faacf 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1459,6 +1459,7 @@ int security_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
return call_int_hook(file_ioctl, 0, file, cmd, arg);
 }
+EXPORT_SYMBOL_GPL(security_file_ioctl);
 
 static inline unsigned long mmap_prot(struct file *file, unsigned long prot)
 {
-- 
2.26.2

-- 
Cheers,
Stephen Rothwell




Re: [PATCH RFC] uaccess: user_access_begin_after_access_ok()

2020-06-02 Thread Al Viro
On Tue, Jun 02, 2020 at 04:45:05AM -0400, Michael S. Tsirkin wrote:
> So vhost needs to poke at userspace *a lot* in a quick succession.  It
> is thus beneficial to enable userspace access, do our thing, then
> disable. Except access_ok has already been pre-validated with all the
> relevant nospec checks, so we don't need that.  Add an API to allow
> userspace access after access_ok and barrier_nospec are done.

BTW, what are you going to do about vq->iotlb != NULL case?  Because
you sure as hell do *NOT* want e.g. translate_desc() under STAC.
Disable it around the calls of translate_desc()?

How widely do you hope to stretch the user_access areas, anyway?

BTW, speaking of possible annotations: looks like there's a large
subset of call graph that can be reached only from vhost_worker()
or from several ioctls, with all uaccess limited to that subgraph
(thankfully).  Having that explicitly marked might be a good idea...

Unrelated question, while we are at it: is there any point having
vhost_get_user() a polymorphic macro?  In all callers the third
argument is __virtio16 __user * and the second one is an explicit
* where  is __virtio16 *.  Similar for
vhost_put_user(): in all callers the third argument is
__virtio16 __user * and the second - cpu_to_vhost16(vq, something).

Incidentally, who had come up with the name __vhost_get_user?
Makes for a lovely WTF moment for readers - esp. in vhost_put_user()...


Re: [git pull] a couple of sparc ptrace fixes

2020-06-02 Thread David Miller
From: Al Viro 
Date: Sun, 31 May 2020 02:04:14 +0100

> The following changes since commit 8f3d9f354286745c751374f5f1fcafee6b3f3136:
> 
>   Linux 5.7-rc1 (2020-04-12 12:35:55 -0700)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-davem

Pulled, thanks Al.



Re: [PATCH 0/3] sparc32 SRMMU fixes for SMP

2020-06-02 Thread David Miller
From: Will Deacon 
Date: Tue, 26 May 2020 18:32:59 +0100

> Hi folks,
> 
> Enabling SMP for sparc32 uncovered some issues in the SRMMU page-table
> allocation code. One of these was introduced by me, but the other two
> seem to have been there a while and are probably just exposed more
> easily by my recent changes.
> 
> Tested on QEMU. I'm assuming these will go via David's tree.

Series applied, thanks Will.



Re: [PATCH] cxl: Fix kobject memory leak in cxl_sysfs_afu_new_cr()

2020-06-02 Thread wanghai (M)



On 2020/6/3 1:20, Markus Elfring wrote:

Fix it by adding a call to kobject_put() in the error path of
kobject_init_and_add().

Thanks for another completion of the exception handling.

Would another patch subject be a bit nicer?

Thanks for the guidance. I will improve this description and send a v2.


…

+++ b/drivers/misc/cxl/sysfs.c
@@ -624,7 +624,7 @@ static struct afu_config_record *cxl_sysfs_afu_new_cr(struct cxl_afu *afu, int c
rc = kobject_init_and_add(&cr->kobj, &afu_config_record_type,
  &afu->dev.kobj, "cr%i", cr->cr);
if (rc)
-   goto err;
+   goto err1;

…

Would another label be more reasonable here?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=f359287765c04711ff54fbd11645271d8e5ff763#n465
I just used the original author's labels. Should I replace all of his
labels, like 'err' and 'err1', with more descriptive ones?
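A user-space model of why the error path needs kobject_put(): kobject_init_and_add() leaves the object holding its initial reference even when the add step fails, so the caller must drop it to trigger release(). The model_* names are hypothetical, and the label is named after what it undoes, in the spirit of the coding-style section linked above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for the kobject embedded in cr. */
struct model_kobj {
	int refcnt;
	bool released;
};

static void model_kobject_put(struct model_kobj *k)
{
	if (--k->refcnt == 0)
		k->released = true;	/* the ktype release() would free cr here */
}

/* Like kobject_init_and_add(): init takes a reference even if add fails. */
static int model_kobject_init_and_add(struct model_kobj *k, bool add_fails)
{
	k->refcnt = 1;
	k->released = false;
	return add_fails ? -ENOMEM : 0;
}

static int model_new_cr(struct model_kobj *k, bool add_fails)
{
	int rc;

	rc = model_kobject_init_and_add(k, add_fails);
	if (rc)
		goto err_put_kobj;
	return 0;

err_put_kobj:
	model_kobject_put(k);	/* drop the reference taken by the init step */
	return rc;
}
```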




Re: [PATCH] sparc: remove unused header file nfs_fs.h

2020-06-02 Thread David Miller
From: Anupam Aggarwal 
Date: Fri, 29 May 2020 17:56:00 +0530

> Remove unused header file linux/nfs_fs.h
> 
> Signed-off-by: Anupam Aggarwal 
> Signed-off-by: Vivek Trivedi 
> Signed-off-by: Amit Sahrawat 

Applied, thank you.



Re: [PATCH V2] pinctrl: sirf: add missing put_device() call in sirfsoc_gpio_probe()

2020-06-02 Thread yukuai (C)

On 2020/6/3 9:35, yu kuai wrote:

A coccicheck run provided information like the following:

drivers/pinctrl/sirf/pinctrl-sirf.c:798:2-8: ERROR: missing put_device;
call of_find_device_by_node on line 792, but without a corresponding
object release within this function.

Generated by: scripts/coccinelle/free/put_device.cocci

Thus add a jump target to fix the exception handling for this
function implementation.

Fixes: 5130216265f6 ("PINCTRL: SiRF: add GPIO and GPIO irq support in CSR SiRFprimaII")
Signed-off-by: yu kuai 
---
  drivers/pinctrl/sirf/pinctrl-sirf.c | 20 ++--
  1 file changed, 14 insertions(+), 6 deletions(-)

Sorry about the missing change log:

Changes in V2:
 Changed the commit message wording as suggested by Markus.

Best Regards,
Yu Kuai



[PATCH V2] pinctrl: sirf: add missing put_device() call in sirfsoc_gpio_probe()

2020-06-02 Thread yu kuai
A coccicheck run provided information like the following:

drivers/pinctrl/sirf/pinctrl-sirf.c:798:2-8: ERROR: missing put_device;
call of_find_device_by_node on line 792, but without a corresponding
object release within this function.

Generated by: scripts/coccinelle/free/put_device.cocci

Thus add a jump target to fix the exception handling for this
function implementation.

Fixes: 5130216265f6 ("PINCTRL: SiRF: add GPIO and GPIO irq support in CSR SiRFprimaII")
Signed-off-by: yu kuai 
---
 drivers/pinctrl/sirf/pinctrl-sirf.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/pinctrl/sirf/pinctrl-sirf.c b/drivers/pinctrl/sirf/pinctrl-sirf.c
index 1ebcb957c654..63a287d5795f 100644
--- a/drivers/pinctrl/sirf/pinctrl-sirf.c
+++ b/drivers/pinctrl/sirf/pinctrl-sirf.c
@@ -794,13 +794,17 @@ static int sirfsoc_gpio_probe(struct device_node *np)
return -ENODEV;
 
sgpio = devm_kzalloc(&pdev->dev, sizeof(*sgpio), GFP_KERNEL);
-   if (!sgpio)
-   return -ENOMEM;
+   if (!sgpio) {
+   err = -ENOMEM;
+   goto out_put_device;
+   }
spin_lock_init(&sgpio->lock);
 
regs = of_iomap(np, 0);
-   if (!regs)
-   return -ENOMEM;
+   if (!regs) {
+   err = -ENOMEM;
+   goto out_put_device;
+   }
 
sgpio->chip.gc.request = sirfsoc_gpio_request;
sgpio->chip.gc.free = sirfsoc_gpio_free;
@@ -824,8 +828,10 @@ static int sirfsoc_gpio_probe(struct device_node *np)
girq->parents = devm_kcalloc(&pdev->dev, SIRFSOC_GPIO_NO_OF_BANKS,
 sizeof(*girq->parents),
 GFP_KERNEL);
-   if (!girq->parents)
-   return -ENOMEM;
+   if (!girq->parents) {
+   err = -ENOMEM;
+   goto out_put_device;
+   }
for (i = 0; i < SIRFSOC_GPIO_NO_OF_BANKS; i++) {
bank = &sgpio->sgpio_bank[i];
spin_lock_init(&bank->lock);
@@ -868,6 +874,8 @@ static int sirfsoc_gpio_probe(struct device_node *np)
gpiochip_remove(&sgpio->chip.gc);
 out:
iounmap(regs);
+out_put_device:
+   put_device(&pdev->dev);
return err;
 }
 
-- 
2.25.4

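A generic user-space model of the goto ladder this patch extends, with counters in place of the device reference and the register mapping: each label undoes only what succeeded before the jump, in reverse order. The names and the fail_at parameter are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state standing in for the pdev reference and the of_iomap() mapping. */
static int device_refs;
static bool regs_mapped;

/* fail_at: 0 = success, 1 = fail before mapping, 2 = fail after mapping. */
static int model_gpio_probe(int fail_at)
{
	int err;

	device_refs++;			/* of_find_device_by_node() took a reference */

	if (fail_at == 1) {		/* e.g. devm_kzalloc() failed */
		err = -ENOMEM;
		goto out_put_device;
	}

	regs_mapped = true;		/* of_iomap() succeeded */
	if (fail_at == 2) {		/* a later step failed */
		err = -ENOMEM;
		goto out_unmap;
	}
	return 0;			/* success: keep both resources */

out_unmap:
	regs_mapped = false;		/* iounmap() */
out_put_device:
	device_refs--;			/* put_device() */
	return err;
}
```

Because the labels fall through in reverse acquisition order, a failure after the mapping releases both resources while an earlier failure releases only the device reference.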


Re: [RFC 09/16] KVM: Protected memory extension

2020-06-02 Thread Huang, Kai
On Mon, 2020-05-25 at 18:34 +0300, Kirill A. Shutemov wrote:
> On Mon, May 25, 2020 at 05:26:37PM +0200, Vitaly Kuznetsov wrote:
> > "Kirill A. Shutemov"  writes:
> > 
> > > Add infrastructure that handles protected memory extension.
> > > 
> > > Arch-specific code has to provide hypercalls and define non-zero
> > > VM_KVM_PROTECTED.
> > > 
> > > Signed-off-by: Kirill A. Shutemov 
> > > ---
> > >  include/linux/kvm_host.h |   4 ++
> > >  mm/mprotect.c|   1 +
> > >  virt/kvm/kvm_main.c  | 131 +++
> > >  3 files changed, 136 insertions(+)
> > > 
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index bd0bb600f610..d7072f6d6aa0 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -700,6 +700,10 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm);
> > >  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> > >  struct kvm_memory_slot *slot);
> > >  
> > > +int kvm_protect_all_memory(struct kvm *kvm);
> > > +int kvm_protect_memory(struct kvm *kvm,
> > > +unsigned long gfn, unsigned long npages, bool protect);
> > > +
> > >  int gfn_to_page_many_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
> > >   struct page **pages, int nr_pages);
> > >  
> > > diff --git a/mm/mprotect.c b/mm/mprotect.c
> > > index 494192ca954b..552be3b4c80a 100644
> > > --- a/mm/mprotect.c
> > > +++ b/mm/mprotect.c
> > > @@ -505,6 +505,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
> > >   vm_unacct_memory(charged);
> > >   return error;
> > >  }
> > > +EXPORT_SYMBOL_GPL(mprotect_fixup);
> > >  
> > >  /*
> > >   * pkey==-1 when doing a legacy mprotect()
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 530af95efdf3..07d45da5d2aa 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -155,6 +155,8 @@ static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm);
> > >  static unsigned long long kvm_createvm_count;
> > >  static unsigned long long kvm_active_vms;
> > >  
> > > +static int protect_memory(unsigned long start, unsigned long end, bool protect);
> > > +
> > >  __weak int kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
> > >   unsigned long start, unsigned long end, bool blockable)
> > >  {
> > > @@ -1309,6 +1311,14 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > >   if (r)
> > >   goto out_bitmap;
> > >  
> > > + if (mem->memory_size && kvm->mem_protected) {
> > > + r = protect_memory(new.userspace_addr,
> > > +new.userspace_addr + new.npages * PAGE_SIZE,
> > > +true);
> > > + if (r)
> > > + goto out_bitmap;
> > > + }
> > > +
> > >   if (old.dirty_bitmap && !new.dirty_bitmap)
> > >   kvm_destroy_dirty_bitmap(&old);
> > >   return 0;
> > > @@ -2652,6 +2662,127 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn)
> > >  }
> > >  EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);
> > >  
> > > +static int protect_memory(unsigned long start, unsigned long end, bool protect)
> > > +{
> > > + struct mm_struct *mm = current->mm;
> > > + struct vm_area_struct *vma, *prev;
> > > + int ret;
> > > +
> > > + if (down_write_killable(&mm->mmap_sem))
> > > + return -EINTR;
> > > +
> > > + ret = -ENOMEM;
> > > + vma = find_vma(current->mm, start);
> > > + if (!vma)
> > > + goto out;
> > > +
> > > + ret = -EINVAL;
> > > + if (vma->vm_start > start)
> > > + goto out;
> > > +
> > > + if (start > vma->vm_start)
> > > + prev = vma;
> > > + else
> > > + prev = vma->vm_prev;
> > > +
> > > + ret = 0;
> > > + while (true) {
> > > + unsigned long newflags, tmp;
> > > +
> > > + tmp = vma->vm_end;
> > > + if (tmp > end)
> > > + tmp = end;
> > > +
> > > + newflags = vma->vm_flags;
> > > + if (protect)
> > > + newflags |= VM_KVM_PROTECTED;
> > > + else
> > > + newflags &= ~VM_KVM_PROTECTED;
> > > +
> > > + /* The VMA has been handled as part of other memslot */
> > > + if (newflags == vma->vm_flags)
> > > + goto next;
> > > +
> > > + ret = mprotect_fixup(vma, &prev, start, tmp, newflags);
> > > + if (ret)
> > > + goto out;
> > > +
> > > +next:
> > > + start = tmp;
> > > + if (start < prev->vm_end)
> > > + start = prev->vm_end;
> > > +
> > > + if (start >= end)
> > > + goto out;
> > > +
> > > + vma = prev->vm_next;
> > > + if (!vma || vma->vm_start != start) {
> > > + ret = -ENOMEM;
> > > + goto out;
> > > + }
> > > + }
> > > +out:
> > > + up_write(&mm->mmap_sem);
> > > + return ret;
> > > +}
> > > +
> > > +int kvm_protect_memory(struct kvm 

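A simplified user-space model of the VMA walk in protect_memory() above, assuming VMAs are kept in a sorted array. Unlike mprotect_fixup(), it sets or clears the flag on whole VMAs instead of splitting them at the range edges; the model_* names and MODEL_VM_PROTECTED are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_VM_PROTECTED 0x1UL	/* hypothetical stand-in for VM_KVM_PROTECTED */

struct model_vma {
	unsigned long start, end, flags;	/* covers [start, end) */
};

/* Like find_vma(): the first VMA whose end lies above addr, or NULL. */
static struct model_vma *model_find_vma(struct model_vma *v, size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (v[i].end > addr)
			return &v[i];
	return NULL;
}

/* Set or clear the flag on every VMA overlapping [start, end). */
static int model_protect_memory(struct model_vma *v, size_t n,
				unsigned long start, unsigned long end, int protect)
{
	struct model_vma *vma = model_find_vma(v, n, start);

	if (!vma)
		return -ENOMEM;
	if (vma->start > start)
		return -EINVAL;		/* the range begins in a hole */

	for (;;) {
		if (protect)
			vma->flags |= MODEL_VM_PROTECTED;
		else
			vma->flags &= ~MODEL_VM_PROTECTED;

		if (vma->end >= end)
			return 0;
		/* The next VMA must start exactly where this one ends. */
		if (vma + 1 == v + n || (vma + 1)->start != vma->end)
			return -ENOMEM;
		vma++;
	}
}
```

As in the patch, a hole in the middle of the range is reported only after earlier VMAs have already been updated.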
[PATCH] exfat: fix memory leak in exfat_parse_param()

2020-06-02 Thread Namjae Jeon
From: Al Viro 

butt3rflyh4ck reported a memory leak found by syzkaller.

The leaked object is the copy of param->string held by exfat_mount_options.

BUG: memory leak

unreferenced object 0x88801972e090 (size 8):
  comm "syz-executor.2", pid 16298, jiffies 4295172466 (age 14.060s)
  hex dump (first 8 bytes):
6b 6f 69 38 2d 75 00 00  koi8-u..
  backtrace:
[<5bfe35d6>] kstrdup+0x36/0x70 mm/util.c:60
[<18ed3277>] exfat_parse_param+0x160/0x5e0
fs/exfat/super.c:276
[<7680462b>] vfs_parse_fs_param+0x2b4/0x610
fs/fs_context.c:147
[<97c027f2>] vfs_parse_fs_string+0xe6/0x150
fs/fs_context.c:191
[<371bf78f>] generic_parse_monolithic+0x16f/0x1f0
fs/fs_context.c:231
[<5ce5eb1b>] do_new_mount fs/namespace.c:2812 [inline]
[<5ce5eb1b>] do_mount+0x12bb/0x1b30 fs/namespace.c:3141
[] __do_sys_mount fs/namespace.c:3350 [inline]
[] __se_sys_mount fs/namespace.c:3327 [inline]
[] __x64_sys_mount+0x18f/0x230 fs/namespace.c:3327
[<3b024e98>] do_syscall_64+0xf6/0x7d0
arch/x86/entry/common.c:295
[] entry_SYSCALL_64_after_hwframe+0x49/0xb3

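exfat_free() should call exfat_free_iocharset() to prevent a leak in case
we fail after parsing iocharset= but before calling get_tree_bdev().
Additionally, there's no need to kstrdup() param->string in
exfat_parse_param(); just steal it, leaving NULL in param->string.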

Fixes: 719c1e182916 ("exfat: add super block operations")
Cc: sta...@vger.kernel.org # v5.7
Reported-by: butt3rflyh4ck 
Signed-off-by: Al Viro 
---
 fs/exfat/super.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/exfat/super.c b/fs/exfat/super.c
index 405717e4e3ea..e650e65536f8 100644
--- a/fs/exfat/super.c
+++ b/fs/exfat/super.c
@@ -273,9 +273,8 @@ static int exfat_parse_param(struct fs_context *fc, struct fs_parameter *param)
break;
case Opt_charset:
exfat_free_iocharset(sbi);
-   opts->iocharset = kstrdup(param->string, GFP_KERNEL);
-   if (!opts->iocharset)
-   return -ENOMEM;
+   opts->iocharset = param->string;
+   param->string = NULL;
break;
case Opt_errors:
opts->errors = result.uint_32;
@@ -686,7 +685,12 @@ static int exfat_get_tree(struct fs_context *fc)
 
 static void exfat_free(struct fs_context *fc)
 {
-   kfree(fc->s_fs_info);
+   struct exfat_sb_info *sbi = fc->s_fs_info;
+
+   if (sbi) {
+   exfat_free_iocharset(sbi);
+   kfree(sbi);
+   }
 }
 
 static const struct fs_context_operations exfat_context_ops = {
-- 
2.17.1



Re: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung caused by JBOD

2020-06-02 Thread Martin K. Petersen


> When the kernel crashes and kexecs into the kdump kernel, megaraid_sas
> hangs and prints the following error logs:
>
> 24.1485901 sd 0:0:G:0: [sda 1 tag809 BRCfl Debug mfi stat 0x2(1, data len 
> requested/conpleted 0X100
> 0/0x0)]
> 24.1867171 sd 0:0:G :9: [sda I tag861 BRCfl Debug mfft stat 0x2d, data len 
> reques ted/conp1e Led 0X100
> 0/0x0]
> 24.2054191 sd 0:O:6:O: [sda 1 tag861 FAILED Result: hustbyte=DIDGK 
> drioerbyte-DRIUCR SENSE]
> 24.2549711 bik_update_ request ! 1/0 error , dev sda, sector 937782912 op 
> 0x0:(READ) flags 0x0 phys_seg 1 prio class
> 21.2752791 buffer_io_error 2 callbacks suppressed
> 21.2752731 Duffer IO error an dev sda, logical block 117212064, async page 
> read
>
> This bug is caused by commit 59db5a931bbe73f ("scsi: megaraid_sas:
> Handle sequence JBOD map failure at driver level") and can be fixed
> by not setting JBOD when reset_devices is on.

Broadcom: Please review.

Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


RE: memory leak in exfat_parse_param

2020-06-02 Thread Namjae Jeon
> On Tue, Jun 02, 2020 at 01:03:05PM +0800, butt3rflyh4ck wrote:
> > I am reporting a bug (in linux-5.7.0-rc7) found by syzkaller.
> >
> > kernel config:
> > https://github.com/butterflyhack/syzkaller-fuzz/blob/master/config-v5.7.0-rc7
> >
> > and it can be reproduced.
> >
> > A param->string held by exfat_mount_options.
> 
> Humm...
> 
>   First of all, exfat_free() ought to call exfat_free_upcase_table().
> What's more, WTF bother with that kstrdup(), anyway?  Just steal the string 
> and be done with that...
Thanks for your patch. I will push it to the exfat tree.
> 
> Signed-off-by: Al Viro 
> ---
> diff --git a/fs/exfat/super.c b/fs/exfat/super.c
> index 0565d5539d57..01cd7ed1614d 100644
> --- a/fs/exfat/super.c
> +++ b/fs/exfat/super.c
> @@ -259,9 +259,8 @@ static int exfat_parse_param(struct fs_context *fc, struct fs_parameter *param)
>   break;
>   case Opt_charset:
>   exfat_free_iocharset(sbi);
> - opts->iocharset = kstrdup(param->string, GFP_KERNEL);
> - if (!opts->iocharset)
> - return -ENOMEM;
> + opts->iocharset = param->string;
> + param->string = NULL;
>   break;
>   case Opt_errors:
>   opts->errors = result.uint_32;
> @@ -611,7 +610,10 @@ static int exfat_get_tree(struct fs_context *fc)
> 
>  static void exfat_free(struct fs_context *fc)
>  {
> - kfree(fc->s_fs_info);
> + struct exfat_sb_info *sbi = fc->s_fs_info;
> +
> + exfat_free_iocharset(sbi);
> + kfree(sbi);
>  }
> 
>  static const struct fs_context_operations exfat_context_ops = {



Re: [GIT PULL] x86/mm changes for v5.8

2020-06-02 Thread Singh, Balbir
On Tue, 2020-06-02 at 16:28 -0700, Linus Torvalds wrote:
> 
> On Tue, Jun 2, 2020 at 4:01 PM Singh, Balbir  wrote:
> > 
> > >  (c) and if I read the code correctly, trying to flush the L1D$ on
> > > non-intel without the HW support, it causes a WARN_ON_ONCE()! WTF?
> > 
> > That is not correct; the function only complains if we do a software
> > fallback flush without allocating the flush pages.
> 
> Right.
> 
> And if you're not on Intel, then that allocation would never have been
> done, since the allocation function returns an error for non-intel
> systems.
> 
> > That function is not exposed without
> > the user using the prctl() API, which allocates those flush pages.
> 
> See above: it doesn't actually allocate those pages on anything but Intel
> CPUs.
> 
> That said, looking deeper, it then does look like a
> l1d_flush_init_once() failure will also cause the code to avoid
> setting the TIF_SPEC_L1D_FLUSH bit, so non-Intel CPUs will never call
> the actual flushing routines, and thus never hit the WARN_ON. Ok.
> 
> > >  (2) the HW case is done for any vendor, if it reports the "I have the 
> > > MSR"
> > 
> > No, l1d_flush_init_once() fails for users opting in via the prctl(); it
> > succeeds for users of L1TF.
> 
> Yeah, again it looks like this all is basically just a hack for Intel CPUs.
> 
> It should never have been conditional on "do this on Intel".
> 
> It should have been conditional on the L1TF bug.
> 
> Yes, there's certainly overlap there, but it's not complete.
> 
> > >  (3) the VMX support certainly has various sanity checks like "oh, CPU
> > > doesn't have X86_BUG_L1TF, then I won't do this even if there was some
> > > kernel command line to say I should". But the new prctl doesn't have
> > > anything like that. It just enables that L1D$ thing mindlessly,
> > > thinking that user-land software somehow knows what it's doing. BS.
> > 
> > So you'd like to see a double opt-in?
> 
> I'd like it to be gated on being sane by default, together with some
> system option like we have for pretty much all the mitigations.
> 
> > Unfortunately there is no gating
> > of the bug and I tried to make it generic - clearly calling it opt-in
> > flushing for the paranoid, for those who really care about CVE-2020-0550.
> 
> No, you didn't make it generic at all - you made it depend on
> X86_VENDOR_INTEL instead.
> 
> So now the logic is "on Intel, do this thing whether it makes sense or
> not, on other vendors, never do it whether it _would_ make sense or
> not".
> 
> That to me is not sensible. I just don't see the logic.
> 
> This feature should never be enabled unless X86_BUG_L1TF is on, as far
> as I can tell.
> 
> And it should never be enabled if SMT is on.
> 
> At that point, it at least starts making sense. Maybe we don't need
> any further admin options at that point.
> 
> > Would this make you happier?
> > 
> > 1. Remove SW fallback flush
> > 2. Implement a double opt-in (CAP_SYS_ADMIN for the prctl or a
> >system wide disable)?
> > 3. Ensure the flush happens only when the current core has
> >SMT disabled
> 
> I think that (3) case should basically be "X86_BUG_L1TF && !SMT". That
> should basically be the default setting for this.
> 
> The (2) thing I would prefer to just be the same kind of thing we do
> for all the other mitigations: have a kernel command line to override
> the defaults.
> 
> The SW fallback right now feels wrong to me. It does seem to be very
> microarchitecture-specific and I'd really like to understand the
> reason for the magic TLB filling. At the same time, if the feature is
> at least enabled under sane and understandable circumstances, and
> people have a way to turn it off, maybe I don't care too much.
>

I cooked up a quick (as yet untested) patch for comments; it leaves the
current refactoring as is. This should hopefully address your concerns.
This is not the final patch, just the approach reflecting the line of
thinking so far.


diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index a58360c8e6e8..988a9d0c31ec 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -293,6 +293,13 @@ enum taa_mitigations {
TAA_MITIGATION_TSX_DISABLED,
 };
 
+enum l1d_flush_out_mitigations {
+   L1D_FLUSH_OUT_OFF,
+   L1D_FLUSH_OUT_ON,
+};
+
+static enum l1d_flush_out_mitigations l1d_flush_out_mitigation __ro_after_init = L1D_FLUSH_OUT_ON;
+
 /* Default mitigation for TAA-affected CPUs */
 static enum taa_mitigations taa_mitigation __ro_after_init = TAA_MITIGATION_VERW;
 static bool taa_nosmt __ro_after_init;
@@ -376,6 +383,18 @@ static void __init taa_select_mitigation(void)
pr_info("%s\n", taa_strings[taa_mitigation]);
 }
 
+static int __init l1d_flush_out_parse_cmdline(char *str)
+{
+   if 

Re: [PATCH v2] scsi: st: convert get_user_pages() --> pin_user_pages()

2020-06-02 Thread Martin K. Petersen


> This code was using get_user_pages*(), in a "Case 1" scenario (Direct
> IO), using the categorization from [1]. That means that it's time to
> convert the get_user_pages*() + put_page() calls to pin_user_pages*()
> + unpin_user_pages() calls.

Kai: Please review.

Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] mm/vmstat: Add events for PMD based THP migration without split

2020-06-02 Thread Anshuman Khandual



On 06/02/2020 08:31 PM, Matthew Wilcox wrote:
> On Fri, May 22, 2020 at 09:04:04AM +0530, Anshuman Khandual wrote:
>> This adds the following two new VM events which will help in validating PMD
>> based THP migration without split. Statistics reported through these events
>> will help in performance debugging.
>>
>> 1. THP_PMD_MIGRATION_SUCCESS
>> 2. THP_PMD_MIGRATION_FAILURE
> 
> There's nothing actually PMD specific about these events, is there?
> If we have a THP of a non-PMD size, you'd want that reported through the
> same statistic, wouldn't you?

Yes, there is nothing PMD specific here, and we would use the same statistics
for non-PMD size THP migration (if any) as well. But is THP migration really
supported for non-PMD sizes? CONFIG_ARCH_ENABLE_THP_MIGRATION depends on
CONFIG_TRANSPARENT_HUGEPAGE without specifically enabling or ruling out
possible PUD level support. Fair enough, I will drop PMD from the events and
their functions.
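VM events like these are exported as "name value" lines in /proc/vmstat. A minimal sketch of reading one counter from text in that format; the function name is hypothetical, and the counter names used when testing it (thp_migration_success, thp_migration_fail) are only an assumption about what the renamed events would be called.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one counter in text formatted like /proc/vmstat ("name value"
 * per line). Returns 0 and stores the value on success, -1 if the
 * counter is absent or its value cannot be parsed.
 */
static int vmstat_lookup(const char *text, const char *name,
			 unsigned long long *value)
{
	size_t len;
	const char *line = text;

	assert(text && name && value);
	len = strlen(name);

	while (line && *line) {
		/* Require an exact name match followed by the separator. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ')
			return sscanf(line + len + 1, "%llu", value) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

Matching on the trailing space keeps a prefix such as "thp_migration" from accidentally matching a longer counter name.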


[GIT PULL] erofs updates for 5.8-rc1

2020-06-02 Thread Gao Xiang
Hi Linus,

Could you consider this pull request for 5.8-rc1?

The most notable part is the conversion to the new mount API, which is
actually an old patch that has been pending for several cycles. The others
are recent trivial cleanups.

All commits have been tested and have been in linux-next as well.
This merges cleanly with master.

Thanks,
Gao Xiang

The following changes since commit 9cb1fd0efd195590b828b9b865421ad345a4a145:

  Linux 5.7-rc7 (2020-05-24 15:32:54 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git tags/erofs-for-5.8-rc1

for you to fetch changes up to 34f853b849eb6a509eb8f40f2f5946ebb1f62739:

  erofs: suppress false positive last_block warning (2020-05-29 18:58:13 +0800)


Changes since last update:

 - Convert to use the new mount apis;

 - Some random cleanup patches.


Chao Yu (1):
  erofs: convert to use the new mount fs_context api

Chengguang Xu (1):
  erofs: code cleanup by removing ifdef macro surrounding

Gao Xiang (1):
  erofs: suppress false positive last_block warning

 fs/erofs/data.c     |   4 +-
 fs/erofs/inode.c    |   6 --
 fs/erofs/internal.h |  27 +++---
 fs/erofs/namei.c    |   2 -
 fs/erofs/super.c    | 255 +++-
 fs/erofs/xattr.c    |   4 +-
 fs/erofs/xattr.h    |   7 +-
 fs/erofs/zdata.c    |   4 +-
 8 files changed, 136 insertions(+), 173 deletions(-)



Re: [PATCH 08/14] x86/entry: Optimize local_db_save() for virt

2020-06-02 Thread Sean Christopherson
On Fri, May 29, 2020 at 11:27:36PM +0200, Peter Zijlstra wrote:
> Because DRn access is 'difficult' with virt; but the DR7 read is
> cheaper than a cacheline miss on native, add a virt specific
> fast path to local_db_save(), such that when breakpoints are not in
> use we avoid touching DRn entirely.
> 
> Suggested-by: Andy Lutomirski 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  arch/x86/include/asm/debugreg.h |  7 ++-
>  arch/x86/kernel/hw_breakpoint.c | 26 ++
>  arch/x86/kvm/vmx/nested.c       |  2 +-
>  3 files changed, 29 insertions(+), 6 deletions(-)

...

> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3028,9 +3028,9 @@ static int nested_vmx_check_vmentry_hw(s
>   /*
>* VMExit clears RFLAGS.IF and DR7, even on a consistency check.
>*/
> - local_irq_enable();
>   if (hw_breakpoint_active())
>   set_debugreg(__this_cpu_read(cpu_dr7), 7);
> + local_irq_enable();
>   preempt_enable();

This should be a separate patch, probably with:

  Cc: sta...@vger.kernel.org
  Fixes: 52017608da33 ("KVM: nVMX: add option to perform early consistency checks via H/W")
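The fast path described in the quoted commit message can be illustrated with a small userspace model: the cached value stands in for the per-CPU copy of DR7 that hw_breakpoint_active() consults, and a counter stands in for the costly real DR7 access. The helper names are hypothetical; the 0xAA mask selects the architectural global-enable bits G0-G3 of DR7. This is a sketch of the idea, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Global-enable bits G0..G3 of DR7 (bits 1, 3, 5 and 7). */
#define SKETCH_DR7_GLOBAL_ENABLE_MASK 0xAAUL

/* Counts simulated DR7 accesses (expensive under virtualization). */
static unsigned long hw_dr7_reads;

static unsigned long read_hw_dr7(unsigned long hw_value)
{
	hw_dr7_reads++;
	return hw_value;
}

/*
 * Model of the virt fast path: consult the cached copy of DR7 first and
 * only touch the real register when some breakpoint is globally enabled.
 * On native, the register is always read.
 */
static unsigned long db_save_model(bool virtualized, unsigned long cached_dr7,
				   unsigned long hw_value)
{
	if (virtualized && !(cached_dr7 & SKETCH_DR7_GLOBAL_ENABLE_MASK))
		return 0;
	return read_hw_dr7(hw_value);
}
```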



[PATCH v3 3/4] seccomp: Introduce addfd ioctl to seccomp user notifier

2020-06-02 Thread Sargun Dhillon
This adds a seccomp notifier ioctl which allows the listener to "add"
file descriptors to a process which originated a seccomp user
notification. Calls like mount and mknod can already be "implemented"
by the listener, as their return value and arguments are plain data in
memory. Calls like connect, on the other hand, can be "implemented"
using pidfd_getfd.

Unfortunately, calls which return file descriptors, like open, are
vulnerable to TOC-TOU attacks and require that the more privileged
supervisor inspect the arguments and perform the syscall on behalf of
the process that generated the notification. This ioctl allows the
file descriptor generated by that open call to be returned to the
calling process.

In addition, there is functionality to allow replacement of specific
file descriptors, following dup2-like semantics.

This extends a previously added helper (file_receive), and introduces
a new helper built on top of it -- file_receive_replace, which is
meant to assist with calling replace_fd, with files received from
remote processes.
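To illustrate the reference-counting contract of the two helpers (neither consumes the caller's reference, and the replace variant drops the reference held for whatever was previously installed at that slot), here is a userspace model. All types and names are hypothetical, the security hook is omitted, and replace_fd's flag handling is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file: only what the model needs. */
struct file_model {
	int refcount;
	bool is_socket;
	int cgroup_updates;	/* netprio/classid updates applied */
};

#define MODEL_NR_FDS 8
static struct file_model *model_fdtable[MODEL_NR_FDS];

/*
 * Model of the shared body: either replace whatever is installed at fd
 * (dup2-like) or install into a fresh, unused slot, then update socket
 * cgroup data. The caller's own reference is never consumed.
 */
static int file_receive_common(int fd, unsigned int flags,
			       struct file_model *file, bool replace)
{
	(void)flags;	/* O_CLOEXEC-style flags are not modelled */

	if (fd < 0 || fd >= MODEL_NR_FDS)
		return -EBADF;

	if (replace) {
		/* The previously installed file loses the table's reference. */
		if (model_fdtable[fd])
			model_fdtable[fd]->refcount--;
	} else {
		/* Fresh slot only, and flags must be 0 (cf. WARN_ON(flags)). */
		assert(flags == 0 && model_fdtable[fd] == NULL);
	}

	file->refcount++;	/* the table takes its own reference */
	model_fdtable[fd] = file;

	if (file->is_socket)
		file->cgroup_updates++;
	return 0;
}
```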

As a note, the seccomp_notif_addfd structure is laid out based on 8-byte
alignment without requiring packing as there have been packing issues with
uapi highlighted before [1][2]. Although we could overload the newfd field
and use -1 to indicate that it is not to be used, doing so requires
changing the size of the fd field, and introduces struct packing
complexity.
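The no-padding claim can be checked at compile time. This sketch uses the field order of struct seccomp_notif_addfd as copied into the selftest in patch 4/4, with <stdint.h> types in place of __u64/__u32; the struct is renamed so it cannot clash with the real uapi definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field order as the selftest's copy of struct seccomp_notif_addfd. */
struct seccomp_notif_addfd_sketch {
	uint64_t size;
	uint64_t id;
	uint32_t flags;
	uint32_t srcfd;
	uint32_t newfd;
	uint32_t newfd_flags;
};

/*
 * Every member sits at its natural alignment, so the compiler inserts
 * no implicit padding and no __attribute__((packed)) is needed.
 */
_Static_assert(offsetof(struct seccomp_notif_addfd_sketch, id) == 8, "id");
_Static_assert(offsetof(struct seccomp_notif_addfd_sketch, flags) == 16,
	       "flags");
_Static_assert(offsetof(struct seccomp_notif_addfd_sketch, newfd_flags) == 28,
	       "newfd_flags");
_Static_assert(sizeof(struct seccomp_notif_addfd_sketch) == 32, "size");
```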

[1]: https://lore.kernel.org/lkml/87o8w9bcaf@mid.deneb.enyo.de/
[2]: https://lore.kernel.org/lkml/a328b91d-fd8f-4f27-b3c2-91a9c45f1...@rasmusvillemoes.dk/

Signed-off-by: Sargun Dhillon 
Suggested-by: Matt Denton 
Cc: Al Viro 
Cc: Chris Palmer 
Cc: Christian Brauner 
Cc: Jann Horn 
Cc: Kees Cook 
Cc: Robert Sesek 
Cc: Tycho Andersen 
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-...@vger.kernel.org
---
 fs/file.c                    |  29 +-
 include/linux/file.h         |   1 +
 include/uapi/linux/seccomp.h |  25 +
 kernel/seccomp.c             | 184 ++-
 4 files changed, 234 insertions(+), 5 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 5afd76fca8c2..eb413c1fdb7f 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -938,15 +938,19 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
  * File Receive - Receive a file from another process
  *
  * This function is designed to receive files from other tasks. It encapsulates
- * logic around security and cgroups. The file descriptor provided must be a
- * freshly allocated (unused) file descriptor.
+ * logic around security and cgroups. It can either replace an existing file
+ * descriptor, or install the file at a new unused one. If the file is meant
+ * to be installed on a new file descriptor, it must be allocated with the
+ * right flags by the user and the flags passed must be 0 -- as anything else
+ * is ignored.
  *
  * This helper does not consume a reference to the file, so the caller must put
  * their reference.
  *
  * Returns 0 upon success.
  */
-int file_receive(int fd, struct file *file)
+static int __file_receive(int fd, unsigned int flags, struct file *file,
+ bool replace)
 {
struct socket *sock;
int err;
@@ -955,7 +959,14 @@ int file_receive(int fd, struct file *file)
if (err)
return err;
 
-   fd_install(fd, get_file(file));
+   if (replace) {
+   err = replace_fd(fd, file, flags);
+   if (err)
+   return err;
+   } else {
+   WARN_ON(flags);
+   fd_install(fd, get_file(file));
+   }
 
sock = sock_from_file(file, &err);
if (sock) {
@@ -966,6 +977,16 @@ int file_receive(int fd, struct file *file)
return 0;
 }
 
+int file_receive_replace(int fd, unsigned int flags, struct file *file)
+{
+   return __file_receive(fd, flags, file, true);
+}
+
+int file_receive(int fd, struct file *file)
+{
+   return __file_receive(fd, 0, file, false);
+}
+
 static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags)
 {
int err = -EBADF;
diff --git a/include/linux/file.h b/include/linux/file.h
index 7b56dc23e560..e4ca058fb559 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -94,5 +94,6 @@ extern void fd_install(unsigned int fd, struct file *file);
 extern void flush_delayed_fput(void);
 extern void __fput_sync(struct file *);
 
+extern int file_receive_replace(int fd, unsigned int flags, struct file *file);
 extern int file_receive(int fd, struct file *file);
 #endif /* __LINUX_FILE_H */
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index c1735455bc53..aec3e43c4418 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -113,6 +113,27 @@ struct seccomp_notif_resp {
__u32 flags;
 };
 
+/* valid flags for seccomp_notif_addfd */
+#define SECCOMP_ADDFD_FLAG_SETFD   (1UL << 0) /* Specify remote fd */
+
+/**

[PATCH v3 2/4] pid: Use file_receive helper to copy FDs

2020-06-02 Thread Sargun Dhillon
The code to copy file descriptors was duplicated in pidfd_getfd.
Rather than continue to duplicate it, this hoists the code out of
kernel/pid.c and uses the newly added file_receive helper.

Earlier, when this was implemented, there was some back-and-forth
about the semantics of copying file descriptors [1], and it was
decided that the default behaviour should be to not modify cgroup
data. However, as a matter of least surprise, this approach follows
the default semantics of SCM_RIGHTS.

In the future, a flag can be added to avoid manipulating the cgroup
data on copy.

[1]: https://lore.kernel.org/lkml/20200107175927.4558-1-sar...@sargun.me/

Signed-off-by: Sargun Dhillon 
Suggested-by: Kees Cook 
Cc: Al Viro 
Cc: Christian Brauner 
Cc: Daniel Wagner 
Cc: David S. Miller 
Cc: Jann Horn 
Cc: John Fastabend 
Cc: Tejun Heo 
Cc: Tycho Andersen 
Cc: sta...@vger.kernel.org
Cc: cgro...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 kernel/pid.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index c835b844aca7..1642cf940aa1 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -606,7 +606,7 @@ static int pidfd_getfd(struct pid *pid, int fd)
 {
struct task_struct *task;
struct file *file;
-   int ret;
+   int ret, err;
 
task = get_pid_task(pid, PIDTYPE_PID);
if (!task)
@@ -617,18 +617,16 @@ static int pidfd_getfd(struct pid *pid, int fd)
if (IS_ERR(file))
return PTR_ERR(file);
 
-   ret = security_file_receive(file);
-   if (ret) {
-   fput(file);
-   return ret;
-   }
-
ret = get_unused_fd_flags(O_CLOEXEC);
-   if (ret < 0)
-   fput(file);
-   else
-   fd_install(ret, file);
+   if (ret >= 0) {
+   err = file_receive(ret, file);
+   if (err) {
+   put_unused_fd(ret);
+   ret = err;
+   }
+   }
 
+   fput(file);
return ret;
 }
 
-- 
2.25.1



Re: [PATCH v2 1/2] media: dt-bindings: media: xilinx: Add Xilinx UHD-SDI Receiver Subsystem

2020-06-02 Thread Laurent Pinchart
Hi Vishal,

On Mon, Jun 01, 2020 at 03:14:52PM +, Vishal Sagar wrote:
> On Wednesday, May 6, 2020 6:32 PM, Laurent Pinchart wrote:
> > On Wed, Apr 29, 2020 at 07:47:03PM +0530, Vishal Sagar wrote:
> > > Add bindings documentation for Xilinx UHD-SDI Receiver Subsystem.
> > >
> > > The Xilinx UHD-SDI Receiver Subsystem consists of SMPTE UHD-SDI (RX) IP
> > > core, an SDI RX to Video Bridge IP core to convert SDI video to native
> > > video and a Video In to AXI4-Stream IP core to convert native video to
> > > AXI4-Stream.
> > >
> > > Signed-off-by: Vishal Sagar 
> > > ---
> > > v2
> > > - Removed references to xlnx,video*
> > > - Fixed as per Sakari Ailus and Rob Herring's comments
> > > - Converted to yaml format
> > >
> > >  .../bindings/media/xilinx/xlnx,sdirxss.yaml   | 132 ++
> > >  1 file changed, 132 insertions(+)
> > >  create mode 100644 
> > > Documentation/devicetree/bindings/media/xilinx/xlnx,sdirxss.yaml
> > >
> > > diff --git
> > a/Documentation/devicetree/bindings/media/xilinx/xlnx,sdirxss.yaml
> > b/Documentation/devicetree/bindings/media/xilinx/xlnx,sdirxss.yaml
> > > new file mode 100644
> > > index ..9133ad19df55
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/media/xilinx/xlnx,sdirxss.yaml
> > > @@ -0,0 +1,132 @@
> > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > +%YAML 1.2
> > > +---
> > > +$id: http://devicetree.org/schemas/media/xilinx/xlnx,sdirxss.yaml#
> > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > +
> > > +
> > > +title: Xilinx SMPTE UHD-SDI Receiver Subsystem
> > > +
> > > +maintainers:
> > > +  - Vishal Sagar 
> > > +
> > > +description: |
> > > +  The SMPTE UHD-SDI Receiver (RX) Subsystem allows you to quickly create 
> > > systems
> > > +  based on SMPTE SDI protocols. It receives unaligned native SDI streams 
> > > from
> > > +  the SDI GT PHY and outputs an AXI4-Stream video stream, native video, 
> > > or
> > > +  native SDI using Xilinx transceivers as the physical layer.
> > > +
> > > +  The subsystem consists of
> > > +  1 - SMPTE UHD-SDI Rx
> > > +  2 - SDI Rx to Native Video Bridge
> > > +  3 - Video In to AXI4-Stream Bridge
> > > +
> > > +  The subsystem can capture SDI streams in upto 12G mode 8 data streams 
> > > and output
> > 
> > s/upto/up to/
> 
> I will fix this in next version. 
> 
> > > +  a dual pixel per clock RGB/YUV444,422/420 10/12 bits per component 
> > > AXI4-Stream.
> > > +
> > > +properties:
> > > +  compatible:
> > > +items:
> > > +  - enum:
> > > +- xlnx,v-smpte-uhdsdi-rx-ss-2.0
> > > +
> > > +  reg:
> > > +maxItems: 1
> > > +
> > > +  interrupts:
> > > +maxItems: 1
> > > +
> > > +  clocks:
> > > +description: List of clock specifiers
> > > +items:
> > > +  - description: AXI4-Lite clock
> > > +  - description: SMPTE UHD-SDI Rx core clock
> > > +  - description: Video clock
> > > +
> > > +  clock-names:
> > > +items:
> > > +  - const: s_axi_aclk
> > > +  - const: sdi_rx_clk
> > > +  - const: video_out_clk
> > > +
> > > +  xlnx,bpp:
> > > +description: Bits per pixel supported. Can be 10 or 12 bits per 
> > > pixel only.
> > > +allOf:
> > > +  - $ref: "/schemas/types.yaml#/definitions/uint32"
> > > +  - enum: [10, 12]
> > 
> > I don't see this as a design parameter in the documentation (pg290,
> > v2.0). What does it correspond to ? All the BPC mentions in the
> > documentation always state that 10-bit is the only supported value.
> 
> The new version of IP being released will have 10 and 12 bit support. It is 
> already in the Xilinx linux-xlnx repo.
> I will rename this to "xlnx,bpc" instead of "xlnx,bpp" to refer to bits per 
> component.

Is the documentation for the new IP core version available ? Should this
property only be allowed for the new version, given that in v2.0 the BPC
is fixed to 10 ?

> > > +
> > > +  xlnx,line-rate:
> > > +description: |
> > > +  The maximum mode supported by the design. Possible values are as 
> > > below
> > > +  12G_SDI_8DS - 12G mode with 8 data streams
> > > +  6G_SDI  -  6G mode
> > > +  3G_SDI  -  3G mode
> > > +enum:
> > > +  - 12G_SDI_8DS
> > > +  - 6G_SDI
> > > +  - 3G_SDI
> > 
> > How about making this an integer property, with #define in
> > include/dt-bindings/media/xilinx-sdi.h ? As far as I understand, the SDI
> > TX subsystem has the same parameter, so the #define could be shared
> > between the two.
> 
> Yes that is ok with me. I will add this in the next version.
> 
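A minimal sketch of the shared header suggested above. The file path comes from the review; the macro names and numeric values are hypothetical placeholders, and the ordering of the values is arbitrary.

```c
/*
 * Hypothetical include/dt-bindings/media/xilinx-sdi.h: integer values
 * for the xlnx,line-rate property, shared by the SDI RX and TX bindings.
 */
#ifndef _DT_BINDINGS_MEDIA_XILINX_SDI_H
#define _DT_BINDINGS_MEDIA_XILINX_SDI_H

#define XSDI_STD_3G		0	/* 3G mode */
#define XSDI_STD_6G		1	/* 6G mode */
#define XSDI_STD_12G_8DS	2	/* 12G mode with 8 data streams */

#endif /* _DT_BINDINGS_MEDIA_XILINX_SDI_H */
```

The binding would then describe xlnx,line-rate as a uint32 restricted to these values, and a device tree source would include the header and write, for example, xlnx,line-rate = <XSDI_STD_12G_8DS>;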
> > > +
> > > +  xlnx,include-edh:
> > > +type: boolean
> > > +description: |
> > > +  This is present when the Error Detection and Handling processor is
> > > +  enabled in design.
> > > +
> > > +  ports:
> > > +type: object
> > > +description: |
> > > +  Generally the SDI port is connected to a device like SDI Broadcast 
> > > camera
> > > +  which is independently controlled. Hence port@0 is 

[PATCH v3 4/4] selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD

2020-06-02 Thread Sargun Dhillon
Test whether we can add file descriptors in response to notifications.
This injects the file descriptors via notifications, and then uses
kcmp to determine whether or not it has been successful.

It also includes some basic sanity checking for arguments.
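The test compares descriptors across the two processes with kcmp(2). A minimal sketch of such a helper, assuming glibc's syscall() wrapper; the name same_file() is hypothetical, and KCMP_FILE is spelled as its numeric value (0, the first entry of enum kcmp_type in <linux/kcmp.h>) to avoid depending on kernel headers. kcmp may be compiled out or blocked by a sandbox, in which case the helper returns -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* KCMP_FILE is the first entry (0) of enum kcmp_type in <linux/kcmp.h>. */
#define SKETCH_KCMP_FILE 0

/*
 * Returns 0 if fd1 in pid1 and fd2 in pid2 refer to the same open file
 * description, a positive value if they differ, or -1 with errno set
 * (for example ENOSYS when kcmp is unavailable).
 */
static int same_file(pid_t pid1, pid_t pid2, int fd1, int fd2)
{
#ifdef SYS_kcmp
	return (int)syscall(SYS_kcmp, pid1, pid2, SKETCH_KCMP_FILE, fd1, fd2);
#else
	errno = ENOSYS;
	return -1;
#endif
}
```

A descriptor and its dup() share one open file description, so they compare equal; the two ends of a pipe are distinct files and do not.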

Signed-off-by: Sargun Dhillon 
Cc: Al Viro 
Cc: Chris Palmer 
Cc: Christian Brauner 
Cc: Jann Horn 
Cc: Kees Cook 
Cc: Robert Sesek 
Cc: Tycho Andersen 
Cc: Matt Denton 
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 183 ++
 1 file changed, 183 insertions(+)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 402ccb3a4e52..a786b1734ddd 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -182,6 +183,12 @@ struct seccomp_metadata {
 #define SECCOMP_IOCTL_NOTIF_SEND   SECCOMP_IOWR(1, \
struct seccomp_notif_resp)
 #define SECCOMP_IOCTL_NOTIF_ID_VALID   SECCOMP_IOR(2, __u64)
+/* On success, the return value is the remote process's added fd number */
+#define SECCOMP_IOCTL_NOTIF_ADDFD  SECCOMP_IOR(3,  \
+   struct seccomp_notif_addfd)
+
+/* valid flags for seccomp_notif_addfd */
+#define SECCOMP_ADDFD_FLAG_SETFD   (1UL << 0) /* Specify remote fd */
 
 struct seccomp_notif {
__u64 id;
@@ -202,6 +209,15 @@ struct seccomp_notif_sizes {
__u16 seccomp_notif_resp;
__u16 seccomp_data;
 };
+
+struct seccomp_notif_addfd {
+   __u64 size;
+   __u64 id;
+   __u32 flags;
+   __u32 srcfd;
+   __u32 newfd;
+   __u32 newfd_flags;
+};
 #endif
 
 #ifndef PTRACE_EVENTMSG_SYSCALL_ENTRY
@@ -3822,6 +3838,173 @@ TEST(user_notification_filter_empty_threaded)
EXPECT_GT((pollfd.revents & POLLHUP) ?: 0, 0);
 }
 
+TEST(user_notification_sendfd)
+{
+   pid_t pid;
+   long ret;
+   int status, listener, memfd;
+   struct seccomp_notif_addfd addfd = {};
+   struct seccomp_notif req = {};
+   struct seccomp_notif_resp resp = {};
+   /* 100 ms */
+   struct timespec delay = { .tv_nsec = 100000000 };
+
+   memfd = memfd_create("test", 0);
+   ASSERT_GE(memfd, 0);
+
+   ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+   ASSERT_EQ(0, ret) {
+   TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+   }
+
+   /* Check that the basic notification machinery works */
+   listener = user_trap_syscall(__NR_getppid,
+SECCOMP_FILTER_FLAG_NEW_LISTENER);
+   ASSERT_GE(listener, 0);
+
+   pid = fork();
+   ASSERT_GE(pid, 0);
+
+   if (pid == 0) {
+   if (syscall(__NR_getppid) != USER_NOTIF_MAGIC)
+   exit(1);
+   exit(syscall(__NR_getppid) != USER_NOTIF_MAGIC);
+   }
+
+   ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+
+   addfd.size = sizeof(addfd);
+   addfd.srcfd = memfd;
+   addfd.newfd_flags = O_CLOEXEC;
+   addfd.newfd = 0;
+   addfd.id = req.id;
+   addfd.flags = 0xff;
+
+   /* Verify bad flags cannot be set */
+   EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
+   EXPECT_EQ(errno, EINVAL);
+
+   /* Verify that remote_fd cannot be set without setting flags */
+   addfd.flags = 0;
+   addfd.newfd = 1;
+   EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
+   EXPECT_EQ(errno, EINVAL);
+
+   /* Verify we can set an arbitrary remote fd */
+   addfd.newfd = 0;
+
+   ret = ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd);
+   EXPECT_GE(ret, 0);
+   EXPECT_EQ(filecmp(getpid(), pid, memfd, ret), 0);
+
+   /* Verify we can set a specific remote fd */
+   addfd.newfd = 42;
+   addfd.flags = SECCOMP_ADDFD_FLAG_SETFD;
+
+   EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), 42);
+   EXPECT_EQ(filecmp(getpid(), pid, memfd, 42), 0);
+
+   resp.id = req.id;
+   resp.error = 0;
+   resp.val = USER_NOTIF_MAGIC;
+
+   EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+   /*
+* This sets the ID of the ADD FD to the last request plus 1. The
+* notification ID increments 1 per notification.
+*/
+   addfd.id = req.id + 1;
+
+   /* This spins until the underlying notification is generated */
+   while (ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd) == -1 &&
+  errno != EINPROGRESS)
+   nanosleep(&delay, NULL);
+
+   memset(&req, 0, sizeof(req));
+   ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+   ASSERT_EQ(addfd.id, req.id);
+
+   resp.id = req.id;
+   resp.error = 0;
+   resp.val = USER_NOTIF_MAGIC;
+   EXPECT_EQ(ioctl(listener, 

Re: general protection fault in nfsd_reply_cache_free_locked

2020-06-02 Thread J. Bruce Fields
On Mon, May 11, 2020 at 11:55:16PM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:

This is like


https://lore.kernel.org/linux-nfs/5016dd05a5e6b...@google.com/

in that we're discovering the drc is corrupt while destroying it.

I don't see the problem yet.

--b.

> 
> HEAD commit:6e7f2eac Merge tag 'arm64-fixes' of git://git.kernel.org/p..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1456703410
> kernel config:  https://syzkaller.appspot.com/x/.config?x=b0212dbee046bc1f
> dashboard link: https://syzkaller.appspot.com/bug?extid=a29df412692980277f9d
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+a29df412692980277...@syzkaller.appspotmail.com
> 
> general protection fault, probably for non-canonical address 0xdc02:  [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x0010-0x0017]
> CPU: 0 PID: 27932 Comm: kworker/u4:4 Not tainted 5.7.0-rc4-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Workqueue: netns cleanup_net
> RIP: 0010:nfsd_reply_cache_free_locked+0x2d/0x380 fs/nfsd/nfscache.c:122
> Code: 56 41 55 41 54 49 89 fc 55 48 89 f5 53 48 89 d3 e8 08 c0 2f ff 48 8d 7d 
> 61 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 
> 83 e2 07 38 d0 7f 08 84 c0 0f 85 a7 02 00 00
> RSP: 0018:c90008bb7b70 EFLAGS: 00010202
> RAX: dc00 RBX: 88826000 RCX: dc00
> RDX: 0002 RSI: 82436ea8 RDI: 0011
> RBP: ffb0 R08: 888093792400 R09: fbfff185cf3e
> R10: 8c2e79ef R11: fbfff185cf3d R12: 88800010
> R13: 88800018 R14:  R15: 88800010
> FS:  () GS:8880ae60() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 001b2c531000 CR3: 685a2000 CR4: 001426f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  nfsd_reply_cache_shutdown+0x150/0x350 fs/nfsd/nfscache.c:203
>  nfsd_exit_net+0x189/0x4c0 fs/nfsd/nfsctl.c:1504
>  ops_exit_list.isra.0+0xa8/0x150 net/core/net_namespace.c:186
>  cleanup_net+0x511/0xa50 net/core/net_namespace.c:603
>  process_one_work+0x965/0x16a0 kernel/workqueue.c:2268
>  worker_thread+0x96/0xe20 kernel/workqueue.c:2414
>  kthread+0x388/0x470 kernel/kthread.c:268
>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> Modules linked in:
> ---[ end trace 54f06072fc6a1afa ]---
> RIP: 0010:nfsd_reply_cache_free_locked+0x2d/0x380 fs/nfsd/nfscache.c:122
> Code: 56 41 55 41 54 49 89 fc 55 48 89 f5 53 48 89 d3 e8 08 c0 2f ff 48 8d 7d 
> 61 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 
> 83 e2 07 38 d0 7f 08 84 c0 0f 85 a7 02 00 00
> RSP: 0018:c90008bb7b70 EFLAGS: 00010202
> RAX: dc00 RBX: 88826000 RCX: dc00
> RDX: 0002 RSI: 82436ea8 RDI: 0011
> RBP: ffb0 R08: 888093792400 R09: fbfff185cf3e
> R10: 8c2e79ef R11: fbfff185cf3d R12: 88800010
> R13: 88800018 R14:  R15: 88800010
> FS:  () GS:8880ae60() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 001b2c531000 CR3: 94cfb000 CR4: 001426f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> 
> 
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
> 
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


[PATCH v3 0/4] Add seccomp notifier ioctl that enables adding fds

2020-06-02 Thread Sargun Dhillon
This adds the capability for seccomp notifier listeners to add file
descriptors in response to a seccomp notification. This is useful for
syscalls in which the previous capabilities were not sufficient. The
current mechanism works well for syscalls that either have side effects
that are system / namespace wide (mount), or that operate on a specific
set of registers (reboot, mknod), and don't require dereferencing pointers.
The problem with dereferencing pointers in a supervisor is that it leaves
us vulnerable to TOC-TOU [1] style attacks. For syscalls that had a direct
effect on file descriptors, pidfd_getfd was added, allowing those file
descriptors to be operated upon directly by the supervisor [2].

Unfortunately, this leaves system calls which return file descriptors
out of the picture. These are fairly common syscalls, such as openat,
socket, and perf_event_open, that return file descriptors and have
arguments that are pointers. They require that the supervisor be able to
verify the arguments, make the call on behalf of the process at hand,
and pass back the resulting file descriptor. This is where addfd comes
into play.

There is an additional flag that allows you to "set" an FD, rather than
add it with an arbitrary number. This has dup2 style semantics, and
installs the new file at that file descriptor, and atomically closes
the old one if it existed. This is useful for a particular use case
that we have, in which we want to swap out AF_INET sockets for AF_UNIX,
AF_INET6, and sockets in another namespace when doing "upconversion".
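Within a single process, the dup2-style semantics described above are those of dup2(2) itself. The following userspace sketch (the function name is hypothetical, and it does not use the seccomp ioctl) swaps the file behind an existing descriptor and checks that reads now come from the new file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>

/*
 * Replace the file behind an existing descriptor with dup2() and confirm
 * that the descriptor now reads from the new pipe. Returns 1 on success,
 * 0 if the data did not arrive, -1 on setup failure.
 */
static int swap_fd_demo(void)
{
	int old_pipe[2], new_pipe[2];
	char c = 0;
	int ret = -1;

	if (pipe(old_pipe) != 0)
		return -1;
	if (pipe(new_pipe) != 0)
		goto close_old;

	/*
	 * old_pipe[0] plays the existing fd: dup2() closes it and installs
	 * new_pipe[0]'s open file description in the same slot.
	 */
	if (dup2(new_pipe[0], old_pipe[0]) != old_pipe[0])
		goto close_new;

	ret = write(new_pipe[1], "x", 1) == 1 &&
	      read(old_pipe[0], &c, 1) == 1 && c == 'x';

close_new:
	close(new_pipe[0]);
	close(new_pipe[1]);
close_old:
	close(old_pipe[0]);
	close(old_pipe[1]);
	return ret;
}
```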

My specific use case at Netflix is to enable our IPv4-IPv6 transition
mechanism, in which our namespaces have no real IPv4 reachability,
and when it comes time to do a connect(2), we get a socket from a
namespace with global IPv4 reachability.

In addition, we intend to use it for our service mesh: where the
service mesh needs to intercept ingress traffic, the addfd
capability will act as a mechanism to do socket activation.

Addfd is not implemented as a separate syscall, a la pidfd_getfd, as
the VFS makes some optimizations with regard to the fdtable and assumes
that it is not modified by external processes. Although a mechanism
that scheduled something in the context of the task could work, it is
somewhat simpler to do it in the context of the ioctl, as we control
the task while it is in the kernel. In addition, there is no obvious
need for this beyond the seccomp notifier.

This mechanism leaves a potential issue: if the manager is
interrupted while injecting FDs, the child process will be left with
leaked / dangling FDs. This may lead to undefined behaviour. One way
to work around this would be to extend the structure and add a
"rollback" mechanism so that FDs are closed if things fail.

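The "rollback" idea mentioned above could look roughly like the following, modelled within one process: fcntl(F_DUPFD_CLOEXEC) stands in for each injection, and the function name is hypothetical. This is only an illustration of the all-or-nothing pattern; it is not something the ioctl in this series provides.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical all-or-nothing injection: duplicate each fd in src[] and
 * record the new numbers in out[]. If any step fails, close the fds that
 * were already created so nothing is left dangling, then return -1 with
 * the original errno preserved.
 */
static int inject_all_or_nothing(const int *src, int *out, int n)
{
	for (int i = 0; i < n; i++) {
		out[i] = fcntl(src[i], F_DUPFD_CLOEXEC, 0);
		if (out[i] < 0) {
			int saved = errno;

			while (i-- > 0)
				close(out[i]);	/* roll back earlier injections */
			errno = saved;
			return -1;
		}
	}
	return 0;
}
```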
This introduces a new helper, file_receive, which is responsible
for moving fds across processes. The helper replaces code in
SCM_RIGHTS. In the SCM_RIGHTS compat codepath there was a bug that
resulted in the cgroup data not being set at all. This fixes that bug,
and the fix should be cherry-picked into long-term kernels. The
file_receive change should probably go into stable. The file_receive
code also replaces the receive fd logic in pidfd_getfd. This is
somewhat contrary to my original view [5], but I think it is best to
adopt it, following the principle of least surprise. This should be
cherry-picked into stable.

I tested this on amd64 with the x86-64 and x32 ABIs.

Given there is no testing infrastructure for cgroup v1, I opted to
forgo adding new tests there as it is considered deprecated.

Changes since v2:
 * Introduction of the file_receive helper, which hoists the logic to
   manipulate file descriptors out of seccomp.c and into file.c
 * Small fix for the socket's cgroup being manipulated even when the
   receive failed
 * seccomp struct layout
Changes since v1:
 * find_notification has been cleaned up slightly, and it replaces a use
   case in send as well.
 * Fixes ref counting rules to get / release references in the ioctl side,
   rather than the seccomp notifier side [3].
 * Removes the optional move flag, and opts into SCM_RIGHTS
 * Rearranges the seccomp_notif_addfd data structure for greater user
   clarity [4]. In order to avoid unnamed padding it makes size a u64,
   which is a little bit of a waste of space.
 * Changes error codes to return ESRCH upon the process going away on
   notification, and EINPROGRESS if the notification is in an unexpected
   state (and added tests for this behaviour)

[1]: https://lore.kernel.org/lkml/20190918084833.9369-2-christian.brau...@ubuntu.com/
[2]: https://lore.kernel.org/lkml/20200107175927.4558-1-sar...@sargun.me/
[3]: https://lore.kernel.org/lkml/20200525000537.gb23...@zeniv.linux.org.uk/
[4]: https://lore.kernel.org/lkml/20200525135036.vp2nmmx42y7dfznf@wittgenstein/
[5]: https://lore.kernel.org/lkml/20200107175927.4558-1-sar...@sargun.me/

Sargun Dhillon (4):
  fs, net: Standardize on file_receive helper to move fds across processes
  pid: Use file_receive helper to 

[PATCH v3 1/4] fs, net: Standardize on file_receive helper to move fds across processes

2020-06-02 Thread Sargun Dhillon
Previously there were two chunks of code where the logic to receive file
descriptors was duplicated in net. The compat version of copying
file descriptors via SCM_RIGHTS did not have logic to update cgroups.
Logic to change the cgroup data was added in:
commit 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
commit d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")

This was not copied to the compat path. This commit fixes that, and thus
should be cherry-picked into stable.

This introduces a helper (file_receive) which encapsulates the logic for
calling security hooks as well as manipulating cgroup information.
This helper can then be used in other places in the kernel where file
descriptors are copied between processes.

I tested cgroup classid setting on both the compat (x32) path, and the
native path to ensure that when moving the file descriptor the classid
is set.

Signed-off-by: Sargun Dhillon 
Suggested-by: Kees Cook 
Cc: Al Viro 
Cc: Christian Brauner 
Cc: Daniel Wagner 
Cc: David S. Miller 
Cc: Jann Horn ,
Cc: John Fastabend 
Cc: Tejun Heo 
Cc: Tycho Andersen 
Cc: sta...@vger.kernel.org
Cc: cgro...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 fs/file.c| 35 +++
 include/linux/file.h |  1 +
 net/compat.c | 10 +-
 net/core/scm.c   | 14 --
 4 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index abb8b7081d7a..5afd76fca8c2 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -18,6 +18,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 unsigned int sysctl_nr_open __read_mostly = 1024*1024;
 unsigned int sysctl_nr_open_min = BITS_PER_LONG;
@@ -931,6 +934,38 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
return err;
 }
 
+/*
+ * File Receive - Receive a file from another process
+ *
+ * This function is designed to receive files from other tasks. It encapsulates
+ * logic around security and cgroups. The file descriptor provided must be a
+ * freshly allocated (unused) file descriptor.
+ *
+ * This helper does not consume a reference to the file, so the caller must put
+ * their reference.
+ *
+ * Returns 0 upon success.
+ */
+int file_receive(int fd, struct file *file)
+{
+   struct socket *sock;
+   int err;
+
+   err = security_file_receive(file);
+   if (err)
+   return err;
+
+   fd_install(fd, get_file(file));
+
+   sock = sock_from_file(file, &err);
+   if (sock) {
+   sock_update_netprioidx(&sock->sk->sk_cgrp_data);
+   sock_update_classid(&sock->sk->sk_cgrp_data);
+   }
+
+   return 0;
+}
+
 static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags)
 {
int err = -EBADF;
diff --git a/include/linux/file.h b/include/linux/file.h
index 142d102f285e..7b56dc23e560 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -94,4 +94,5 @@ extern void fd_install(unsigned int fd, struct file *file);
 extern void flush_delayed_fput(void);
 extern void __fput_sync(struct file *);
 
+extern int file_receive(int fd, struct file *file);
 #endif /* __LINUX_FILE_H */
diff --git a/net/compat.c b/net/compat.c
index 4bed96e84d9a..8ac0e7e09208 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -293,9 +293,6 @@ void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
 
for (i = 0, cmfptr = (int __user *) CMSG_COMPAT_DATA(cm); i < fdmax; i++, cmfptr++) {
int new_fd;
-   err = security_file_receive(fp[i]);
-   if (err)
-   break;
err = get_unused_fd_flags(MSG_CMSG_CLOEXEC & kmsg->msg_flags
  ? O_CLOEXEC : 0);
if (err < 0)
@@ -306,8 +303,11 @@ void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
put_unused_fd(new_fd);
break;
}
-   /* Bump the usage count and install the file. */
-   fd_install(new_fd, get_file(fp[i]));
+   err = file_receive(new_fd, fp[i]);
+   if (err) {
+   put_unused_fd(new_fd);
+   break;
+   }
}
 
if (i > 0) {
diff --git a/net/core/scm.c b/net/core/scm.c
index dc6fed1f221c..ba93abf2881b 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -303,11 +303,7 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
for (i=0, cmfptr=(__force int __user *)CMSG_DATA(cm); imsg_flags
  ? O_CLOEXEC : 0);
if (err < 0)
@@ -318,13 +314,11 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
put_unused_fd(new_fd);
break;
}
-   /* Bump the usage count and install 
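
The calling convention in the file_receive() comment (the caller reserves a fresh descriptor, the helper takes its own file reference before installing it, and on failure the caller releases the reserved slot) can be modelled in userspace. This is a minimal sketch under stated assumptions: struct fake_file, fake_fd_table and every fake_* helper are hypothetical stand-ins for struct file, the fd table, get_unused_fd_flags(), put_unused_fd(), security_file_receive() and fd_install(); the socket cgroup updates are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_NR_FDS 8

/* Hypothetical stand-in for struct file: only a reference count. */
struct fake_file {
	int refcount;
	bool deny;	/* makes the fake security hook refuse this file */
};

/* Hypothetical fd table: NULL = free, RESERVED = allocated but not installed. */
static struct fake_file reserved_marker;
#define RESERVED (&reserved_marker)
static struct fake_file *fake_fd_table[FAKE_NR_FDS];

/* Like get_unused_fd_flags(): reserve the lowest free slot. */
static int fake_get_unused_fd(void)
{
	for (int fd = 0; fd < FAKE_NR_FDS; fd++) {
		if (fake_fd_table[fd] == NULL) {
			fake_fd_table[fd] = RESERVED;
			return fd;
		}
	}
	return -1;
}

/* Like put_unused_fd(): release a reserved but never-installed slot. */
static void fake_put_unused_fd(int fd)
{
	assert(fake_fd_table[fd] == RESERVED);
	fake_fd_table[fd] = NULL;
}

/* Same order as file_receive(): security check, then install a new reference. */
static int fake_file_receive(int fd, struct fake_file *file)
{
	if (file->deny)
		return -1;		/* security_file_receive() refused */
	assert(fake_fd_table[fd] == RESERVED);
	file->refcount++;		/* like get_file(); caller keeps its own ref */
	fake_fd_table[fd] = file;	/* like fd_install() */
	return 0;
}

/* Caller pattern from scm_detach_fds(): reserve, receive, unwind on error. */
static int fake_receive_one(struct fake_file *file)
{
	int fd = fake_get_unused_fd();

	if (fd < 0)
		return fd;
	if (fake_file_receive(fd, file)) {
		fake_put_unused_fd(fd);
		return -1;
	}
	return fd;
}
```

In this model a refused file leaves both its reference count and the descriptor table unchanged, which is the property the put_unused_fd() calls in the patch preserve.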

Re: [PATCH] pinctrl: sirf: Add missing put_device() call in sirfsoc_gpio_probe()

2020-06-02 Thread yukuai (C)

On 2020/6/3 2:56, Markus Elfring wrote:

In sirfsoc_gpio_probe(), if of_find_device_by_node() succeeds,
put_device() is missing in the error handling path.


What do you think about another wording variant?

A coccicheck run provided information like the following.

drivers/pinctrl/sirf/pinctrl-sirf.c:798:2-8: ERROR: missing put_device;
call of_find_device_by_node on line 792, but without a corresponding
object release within this function.

Generated by: scripts/coccinelle/free/put_device.cocci

Thus add a jump target to fix the exception handling for this
function implementation.


Would you like to add the tag “Fixes” to the commit message?


Will do, thanks for your advice!

Yu Kuai
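The fix discussed above (release the reference taken by of_find_device_by_node() on every error path through a jump target) follows the usual kernel goto-unwind idiom. A minimal userspace sketch, assuming hypothetical fake_find_device(), fake_put_device() and fake_setup_step() in place of the real driver calls; keeping the reference on success is also an assumption of the sketch, not something the thread states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct device: only a reference count. */
struct fake_device {
	int refcount;
};

/* Like of_find_device_by_node(): returns the device with a reference held. */
static struct fake_device *fake_find_device(struct fake_device *dev)
{
	dev->refcount++;
	return dev;
}

/* Like put_device(): drops the reference taken above. */
static void fake_put_device(struct fake_device *dev)
{
	assert(dev->refcount > 0);
	dev->refcount--;
}

/* Hypothetical later probe step that may fail. */
static int fake_setup_step(bool fail)
{
	return fail ? -1 : 0;
}

static int fake_probe(struct fake_device *node_dev, bool fail_setup)
{
	struct fake_device *pdev;
	int err;

	pdev = fake_find_device(node_dev);

	err = fake_setup_step(fail_setup);
	if (err)
		goto out_put_device;	/* the jump target the fix adds */

	/* Success: this sketch keeps the reference (an assumption). */
	return 0;

out_put_device:
	fake_put_device(pdev);
	return err;
}
```

Every failure after the lookup funnels through out_put_device, so an error path added to the function later cannot forget the put.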



Re: [PATCH 4/6] KVM: X86: Split kvm_update_cpuid()

2020-06-02 Thread Sean Christopherson
On Fri, May 29, 2020 at 04:55:43PM +0800, Xiaoyao Li wrote:
> Split the part of updating KVM states from kvm_update_cpuid(), and put
> it into a new kvm_update_state_based_on_cpuid(). So it's clear that
> kvm_update_cpuid() is to update guest CPUID settings, while
> kvm_update_state_based_on_cpuid() is to update KVM states based on the
> updated CPUID settings.

What about kvm_update_vcpu_model()?  "state" isn't necessarily correct
either.


Re: [GIT PULL] SELinux patches for v5.8

2020-06-02 Thread pr-tracker-bot
The pull request you sent on Mon, 1 Jun 2020 21:06:48 -0400:

> git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git 
> tags/selinux-pr-20200601

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/f41030a20b38552a2da3b3f6bc9e7a78637d6c23

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL][Security] lockdown: Allow unprivileged users to see lockdown status

2020-06-02 Thread pr-tracker-bot
The pull request you sent on Tue, 2 Jun 2020 12:15:04 +1000 (AEST):

> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git 
> next-general

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/56f2e3b7d819f4fa44857ba81aa6870f18714ea0

Thank you!



Re: [GIT PULL] Audit patches for v5.8

2020-06-02 Thread pr-tracker-bot
The pull request you sent on Mon, 1 Jun 2020 20:48:59 -0400:

> git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git 
> tags/audit-pr-20200601

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/9d99b1647fa56805c1cfef2d81ee7b9855359b62

Thank you!



Re: [PATCH] sbp-target: add the missed kfree() in an error path

2020-06-02 Thread Martin K. Petersen


Chris,

> I think you might be right. I also don't have much time to maintain it
> these days and the hardware I had is long dead.

In that case I'd appreciate a patch to remove it.

Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [GIT PULL][Security] lockdown: Allow unprivileged users to see lockdown status

2020-06-02 Thread Linus Torvalds
On Mon, Jun 1, 2020 at 7:15 PM James Morris  wrote:
>
> Just one update for the security subsystem: allows unprivileged users to
> see the status of the lockdown feature. From Jeremy Cline.

Hmm.

That branch seems to have sprouted another commit just today.

I ended up taking that too as trivial, but it shows how you seem to
basically send me a pointer to a live branch. Please don't do that.
When you make changes to that branch, I now get those changes that you
may not have meant to send me (and that I get upset for being
surprised by).

An easy solution to that is to send me a signed tag instead of a
pointer to a branch. Then you can continue to update the branch, while
the tag stays stable.

Plus we've been encouraging signed tags for pull requests anyway.

  Linus


Re: [PATCH -next] IB/hfi1: Use free_netdev() in hfi1_netdev_free()

2020-06-02 Thread Jason Gunthorpe
On Tue, Jun 02, 2020 at 02:16:35PM +0800, YueHaibing wrote:
> dummy_netdev should be freed by free_netdev() instead of
> kfree(). Also remove unneeded variable 'priv'.
> 
> Fixes: 4730f4a6c6b2 ("IB/hfi1: Activate the dummy netdev")
> Signed-off-by: YueHaibing 
> Reported-by: kbuild test robot 
> Reported-by: Dan Carpenter 
> Reviewed-by: Dennis Dalessandro 
> ---
>  drivers/infiniband/hw/hfi1/netdev_rx.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)

Applied to for-next, thanks

Jason


Re: [PATCH v3] iommu/vt-d: Don't apply gfx quirks to untrusted devices

2020-06-02 Thread Prashant Malani
(Trimming text)

On Wed, Jun 03, 2020 at 12:23:48AM +, Rajat Jain wrote:
> On Tue, Jun 2, 2020 at 4:49 PM Prashant Malani  wrote:
> >
> > Hi Rajat,
> 
> Hi Prashant, thanks for taking a look.
> 
> >
> > On Tue, Jun 02, 2020 at 04:26:02PM -0700, Rajat Jain wrote:
> > > +static bool risky_device(struct pci_dev *pdev)
> > > +{
> > > + if (pdev->untrusted) {
> > > + pci_warn(pdev,
> > > +  "Skipping IOMMU quirk for dev (%04X:%04X) on untrusted"
> > > +  " PCI link. Please check with your BIOS/Platform"
> > > +  " vendor about this\n", pdev->vendor, pdev->device);
> > > + return true;
> > > + }
> > > + return false;
> > minor suggestion: Perhaps you could use a guard clause here? It would save you
> > a level of indentation, and possibly allow better string splitting
> > (e.g. keeping "untrusted PCI" together). So something like:
> >
> > if (!pdev->untrusted)
> > return false;
> 
> I personally have always found double negation expressions confusing,
> even if the negation is part of the variable name. (For example, I have found I
> always need to stop and convince myself that:
> 
> "if (!pdev->untrusted)"
> 
> 
> conceptually implies
> 
> "if (pdev->trusted)".
> 
> 
> So I tend to keep negations to minimum. In this case, it doesn't buy
> us much either, so I'd prefer to keep it the same unless there are
> more opinions on this. OTOH I don't mind changing it too if you feel
> strongly about this.

Ordinarily, I'd agree with you regarding double-negatives.

However, in this case the condition phrasing is so brief ("not untrusted") that I'd
argue the indentation savings outweigh possible interpretation issues.

That said, I don't have a strong opinion here, so will defer to the 
maintainer's preference.
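For comparison, the two shapes being debated can be sketched side by side. This is a hedged userspace model: struct fake_pci_dev and fprintf() stand in for struct pci_dev and pci_warn(), and the message text is shortened from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the struct pci_dev fields used here. */
struct fake_pci_dev {
	unsigned short vendor;
	unsigned short device;
	bool untrusted;
};

/* Shape used in v3: the warning sits inside the positive test. */
static bool risky_device_nested(const struct fake_pci_dev *pdev)
{
	assert(pdev != NULL);
	if (pdev->untrusted) {
		fprintf(stderr, "Skipping IOMMU quirk for dev (%04X:%04X) on untrusted PCI link\n",
			pdev->vendor, pdev->device);
		return true;
	}
	return false;
}

/* Guard-clause shape suggested in review: return early for trusted devices. */
static bool risky_device_guard(const struct fake_pci_dev *pdev)
{
	assert(pdev != NULL);
	if (!pdev->untrusted)
		return false;

	fprintf(stderr, "Skipping IOMMU quirk for dev (%04X:%04X) on untrusted PCI link\n",
		pdev->vendor, pdev->device);
	return true;
}
```

Both functions return the same value for every input; the guard form only keeps the warning one indentation level shallower, which is what gives the format string more room.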

Best,

> 
> Thanks,
> 
> Rajat
> 
> 
> >
> > pci_warn(...);
> >
> > I also hear the column limit warning is now for 100 chars [1], though
> > I'm not sure how it's being handled in this file.
> >
> > Best regards,
> >
> > -Prashant
> >
> > [1]:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/Documentation/process/coding-style.rst?id=bdc48fa11e46f867ea4d75fa59ee87a7f48be144
> >
> > > +}
> > > +
> > >  const struct iommu_ops intel_iommu_ops = {
> > >   .capable= intel_iommu_capable,
> > >   .domain_alloc   = intel_iommu_domain_alloc,
> > > @@ -6214,6 +6231,9 @@ const struct iommu_ops intel_iommu_ops = {
> > >
> > >  static void quirk_iommu_igfx(struct pci_dev *dev)
> > >  {
> > > + if (risky_device(dev))
> > > + return;
> > > +
> > >   pci_info(dev, "Disabling IOMMU for graphics on this chipset\n");
> > >   dmar_map_gfx = 0;
> > >  }
> > > @@ -6255,6 +6275,9 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x163D, quirk_iommu_igfx);
> > >
> > >  static void quirk_iommu_rwbf(struct pci_dev *dev)
> > >  {
> > > + if (risky_device(dev))
> > > + return;
> > > +
> > >   /*
> > >* Mobile 4 Series Chipset neglects to set RWBF capability,
> > >* but needs it. Same seems to hold for the desktop versions.
> > > @@ -6285,6 +6308,9 @@ static void quirk_calpella_no_shadow_gtt(struct pci_dev *dev)
> > >  {
> > >   unsigned short ggc;
> > >
> > > + if (risky_device(dev))
> > > + return;
> > > +
> > >   if (pci_read_config_word(dev, GGC, &ggc))
> > >   return;
> > >
> > > @@ -6318,6 +6344,12 @@ static void __init check_tylersburg_isoch(void)
> > >   pdev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x3a3e, NULL);
> > >   if (!pdev)
> > >   return;
> > > +
> > > + if (risky_device(pdev)) {
> > > + pci_dev_put(pdev);
> > > + return;
> > > + }
> > > +
> > >   pci_dev_put(pdev);
> > >
> > >   /* System Management Registers. Might be hidden, in which case
> > > @@ -6327,6 +6359,11 @@ static void __init check_tylersburg_isoch(void)
> > >   if (!pdev)
> > >   return;
> > >
> > > + if (risky_device(pdev)) {
> > > + pci_dev_put(pdev);
> > > + return;
> > > + }
> > > +
> > >   if (pci_read_config_dword(pdev, 0x188, )) {
> > >   pci_dev_put(pdev);
> > >   return;
> > > --
> > > 2.27.0.rc2.251.g90737beb825-goog
> > >


Re: [GIT PULL] SELinux patches for v5.8

2020-06-02 Thread Linus Torvalds
On Mon, Jun 1, 2020 at 6:07 PM Paul Moore  wrote:
>
> - A number of improvements to various SELinux internal data structures
> to help improve performance.  We move the role transitions into a hash
> table.  In the context structure we shift from hashing the context
> string (aka SELinux label) to the structure itself, when it is valid.
> This last change not only offers a speedup, but it helps us simplify
> the code some as well.

Side note since you mention performance work: in the past when I've
looked at SELinux performance (generally as part of pathname lookup
etc VFS loads), the biggest cost by far was that all the SELinux data
structures take a ton of cache misses.

Yes, some of the hashing shows up in the profiles, but _most_ of it
was loading the data from inode->i_security etc.

And the reason seemed to be that every single inode ends up having a
separately allocated "struct inode_security_struct" (aka "isec"). Even
if the contents are often all exactly the same for a large set of
inodes that thus _could_ conceptually share the data.

Now, it used to be - before being able to stack security layers -
SElinux would control that pointer, and it could have done some kind
of sharing scheme with copy-on-write behavior (the way we do 'struct
cred' for processes), and it would have caused a much smaller cache
footprint (and thus likely much fewer cache misses).

These days, that sharing of the i_security pointer across different
security layers makes that sound really really painful.

But I do wonder if anybody in selinux land (or general security
subsystem land) has been thinking of maybe at least having a "this
inode has no special labeling" marker that could possibly avoid having
all those extra allocations.

Because it really does end up being visible in profiles how costly it
is to look up any data behind inode->i_security.
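The sharing idea above (many inodes pointing at one refcounted blob that is copied only when a label actually changes, much like struct cred) can be sketched in userspace. This is purely illustrative: struct fake_isec, fake_default_isec, fake_isec_set_sid() and the other names are hypothetical, not SELinux or LSM interfaces, and the sketch ignores locking and LSM stacking.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-inode security data (stands in for inode_security_struct). */
struct fake_isec {
	int refcount;
	unsigned int sid;	/* security identifier */
};

/* One shared blob for every inode with "no special labeling". */
static struct fake_isec fake_default_isec = { .refcount = 1, .sid = 1 };

struct fake_inode {
	struct fake_isec *i_security;
};

static void fake_isec_get(struct fake_isec *isec)
{
	isec->refcount++;
}

static void fake_isec_put(struct fake_isec *isec)
{
	assert(isec->refcount > 0);
	if (--isec->refcount == 0 && isec != &fake_default_isec)
		free(isec);
}

/* New inodes share the default blob: no allocation, no extra cache lines. */
static void fake_inode_init_security(struct fake_inode *inode)
{
	fake_isec_get(&fake_default_isec);
	inode->i_security = &fake_default_isec;
}

/* Copy-on-write: allocate only when this inode's label must differ. */
static int fake_isec_set_sid(struct fake_inode *inode, unsigned int sid)
{
	struct fake_isec *old = inode->i_security;
	struct fake_isec *copy;

	if (old->sid == sid)
		return 0;
	if (old->refcount == 1 && old != &fake_default_isec) {
		old->sid = sid;		/* sole owner: modify in place */
		return 0;
	}
	copy = malloc(sizeof(*copy));
	if (!copy)
		return -1;
	copy->refcount = 1;
	copy->sid = sid;
	inode->i_security = copy;
	fake_isec_put(old);
	return 0;
}
```

Only inodes whose label diverges pay for a private allocation; the rest keep touching the same shared blob.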

   Linus


Re: [PATCH] wireless: ath10k: Return early in ath10k_qmi_event_server_exit() to avoid hard crash on reboot

2020-06-02 Thread Manivannan Sadhasivam
On Tue, Jun 02, 2020 at 01:04:26PM -0700, Brian Norris wrote:
> On Tue, Jun 2, 2020 at 12:40 PM John Stultz  wrote:
> > On Tue, Jun 2, 2020 at 12:16 PM Brian Norris  
> > wrote:
> > > On Mon, Jun 1, 2020 at 10:25 PM John Stultz  
> > > wrote:
> > > >
> > > > Ever since 5.7-rc1, if we call
> > > > ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
> > > > reboot, resulting in the device getting stuck in the usb crash
> > > > debug mode and not coming back up without a hard power off.
> > > >
> > > > This hack avoids the issue by returning early in
> > > > ath10k_qmi_event_server_exit().
> > > >
> > > > A better solution is very much desired!
> > >
> > > Any chance you can bisect what caused this? There are a lot of
> > > non-ath10k pieces involved in this stuff.
> >
> > Amit had spent some work on chasing it down to the in kernel qrtr-ns
> > work, and reported it here:
> >   https://lists.infradead.org/pipermail/ath10k/2020-April/014970.html
> >
> > But that discussion seemingly stalled out, so I came up with this hack
> > to workaround it for us.
> 
> If I'm reading it right, then that means we should revert this stuff
> from v5.7-rc1:
> 
> 0c2204a4ad71 net: qrtr: Migrate nameservice to kernel from userspace
> 
> At least, until people can resolve the tail end of that thread. New
> features (ath11k, etc.) are not a reason to break existing features
> (ath10k/wcn3990).

I don't agree with this. If you read through the replies to the bug report,
it is clear that NS migration uncovered a corner case or even a bug. So we
should try to fix that indeed.

Govind: Did you get a chance to work on fixing this issue?

Thanks,
Mani

> 
> Brian


Re: [PATCH v3] iommu/vt-d: Don't apply gfx quirks to untrusted devices

2020-06-02 Thread Rajat Jain
On Tue, Jun 2, 2020 at 4:49 PM Prashant Malani  wrote:
>
> Hi Rajat,

Hi Prashant, thanks for taking a look.

>
> On Tue, Jun 02, 2020 at 04:26:02PM -0700, Rajat Jain wrote:
> > Currently, an external malicious PCI device can masquerade the VID:PID
> > of faulty gfx devices, and thus apply iommu quirks to effectively
> > disable the IOMMU restrictions for itself.
> >
> > Thus we need to ensure that the device we are applying quirks to, is
> > indeed an internal trusted device.
> >
> > Signed-off-by: Rajat Jain 
> > Acked-by: Lu Baolu 
> > ---
> > v3: - Separate out the warning message in a function to be called from
> >   other places. Change the warning string as suggested.
> > v2: - Change the warning print strings.
> > - Add Lu Baolu's acknowledgement.
> >
> >  drivers/iommu/intel-iommu.c | 37 +
> >  1 file changed, 37 insertions(+)
> >
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index ef0a5246700e5..dc859f02985a0 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -6185,6 +6185,23 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain,
> >   return ret;
> >  }
> >
> > +/*
> > + * Check that the device does not live on an external facing PCI port that is
> > + * marked as untrusted. Such devices should not be able to apply quirks and
> > + * thus not be able to bypass the IOMMU restrictions.
> > + */
> > +static bool risky_device(struct pci_dev *pdev)
> > +{
> > + if (pdev->untrusted) {
> > + pci_warn(pdev,
> > +  "Skipping IOMMU quirk for dev (%04X:%04X) on untrusted"
> > +  " PCI link. Please check with your BIOS/Platform"
> > +  " vendor about this\n", pdev->vendor, pdev->device);
> > + return true;
> > + }
> > + return false;
> minor suggestion: Perhaps you could use a guard clause here? It would save you
> a level of indentation, and possibly allow better string splitting
> (e.g. keeping "untrusted PCI" together). So something like:
>
> if (!pdev->untrusted)
> return false;

I personally have always found double negation expressions confusing,
even if the negation is part of the variable name. (For example, I have found I
always need to stop and convince myself that:

"if (!pdev->untrusted)"


conceptually implies

"if (pdev->trusted)".


So I tend to keep negations to minimum. In this case, it doesn't buy
us much either, so I'd prefer to keep it the same unless there are
more opinions on this. OTOH I don't mind changing it too if you feel
strongly about this.

Thanks,

Rajat


>
> pci_warn(...);
>
> I also hear the column limit warning is now for 100 chars [1], though
> I'm not sure how it's being handled in this file.
>
> Best regards,
>
> -Prashant
>
> [1]:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/Documentation/process/coding-style.rst?id=bdc48fa11e46f867ea4d75fa59ee87a7f48be144
>
> > +}
> > +
> >  const struct iommu_ops intel_iommu_ops = {
> >   .capable= intel_iommu_capable,
> >   .domain_alloc   = intel_iommu_domain_alloc,
> > @@ -6214,6 +6231,9 @@ const struct iommu_ops intel_iommu_ops = {
> >
> >  static void quirk_iommu_igfx(struct pci_dev *dev)
> >  {
> > + if (risky_device(dev))
> > + return;
> > +
> >   pci_info(dev, "Disabling IOMMU for graphics on this chipset\n");
> >   dmar_map_gfx = 0;
> >  }
> > @@ -6255,6 +6275,9 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x163D, quirk_iommu_igfx);
> >
> >  static void quirk_iommu_rwbf(struct pci_dev *dev)
> >  {
> > + if (risky_device(dev))
> > + return;
> > +
> >   /*
> >* Mobile 4 Series Chipset neglects to set RWBF capability,
> >* but needs it. Same seems to hold for the desktop versions.
> > @@ -6285,6 +6308,9 @@ static void quirk_calpella_no_shadow_gtt(struct pci_dev *dev)
> >  {
> >   unsigned short ggc;
> >
> > + if (risky_device(dev))
> > + return;
> > +
> >   if (pci_read_config_word(dev, GGC, &ggc))
> >   return;
> >
> > @@ -6318,6 +6344,12 @@ static void __init check_tylersburg_isoch(void)
> >   pdev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x3a3e, NULL);
> >   if (!pdev)
> >   return;
> > +
> > + if (risky_device(pdev)) {
> > + pci_dev_put(pdev);
> > + return;
> > + }
> > +
> >   pci_dev_put(pdev);
> >
> >   /* System Management Registers. Might be hidden, in which case
> > @@ -6327,6 +6359,11 @@ static void __init check_tylersburg_isoch(void)
> >   if (!pdev)
> >   return;
> >
> > + if (risky_device(pdev)) {
> > + pci_dev_put(pdev);
> > + return;
> > + }
> > +
> >   if (pci_read_config_dword(pdev, 0x188, )) {
> >   pci_dev_put(pdev);
> >   return;
> > --
> > 2.27.0.rc2.251.g90737beb825-goog
> 

Re: kobject_init_and_add is easy to misuse

2020-06-02 Thread Jason Gunthorpe
On Tue, Jun 02, 2020 at 02:51:10PM -0700, James Bottomley wrote:

> My first thought was "what?  I got suckered into creating a patch",
> thanks ;-)  But now I look, all the error paths do unwind back to the
> initial state, so kfree() on error looks to be completely correct. 

It doesn't fully unwind if the kobject is put into a kset, then
another thread can get the kref during kset_find_obj() and kfree() won't
wait for the kref to go to 0. It must use put.
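The race can be modelled with a toy refcounted object: once another user may hold a reference (as one obtained through kset_find_obj() would), the creator must drop its own reference with a put and let the release callback free the memory when the count reaches zero. A minimal single-threaded sketch; struct fake_kobj, fake_kobj_get()/fake_kobj_put() and fake_release() are hypothetical stand-ins for struct kobject, kobject_get()/kobject_put() and ktype->release.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct fake_kobj {
	int refcount;
	void (*release)(struct fake_kobj *kobj);
};

static bool released;	/* records that release() ran */

static void fake_release(struct fake_kobj *kobj)
{
	released = true;
	free(kobj);
}

static struct fake_kobj *fake_kobj_create(void)
{
	struct fake_kobj *kobj = malloc(sizeof(*kobj));

	if (kobj) {
		kobj->refcount = 1;	/* the creator's reference, like kobject_init() */
		kobj->release = fake_release;
	}
	return kobj;
}

/* Another user takes a reference, as a kset lookup would. */
static struct fake_kobj *fake_kobj_get(struct fake_kobj *kobj)
{
	kobj->refcount++;
	return kobj;
}

/* Like kobject_put(): memory goes away only with the last reference. */
static void fake_kobj_put(struct fake_kobj *kobj)
{
	assert(kobj->refcount > 0);
	if (--kobj->refcount == 0)
		kobj->release(kobj);
}
```

Calling free() directly in the creator's error path instead of fake_kobj_put() would leave the other user holding a pointer to freed memory.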

Jason


Re: [PATCH v2 1/2] video: fbdev: amifb: add FIXME about dead APUS support

2020-06-02 Thread Finn Thain
On Tue, 2 Jun 2020, Al Viro wrote:

> I have done that on aranym (which is how I'd been doing all testing for 
> e.g. signal-related m68k patches) and I've seen references to some 
> out-of-tree qemu variant doing quadra, but nothing for amiga 
> emulators...
> 

Laurent Vivier's Quadra 800 emulation is no longer out of tree. It 
appeared in QEMU v4.2.0 and ethernet support was stabilized in QEMU 
v5.0.0.


Re: [GIT PULL] Audit patches for v5.8

2020-06-02 Thread Linus Torvalds
On Mon, Jun 1, 2020 at 5:49 PM Paul Moore  wrote:
>
>   Unfortunately I just noticed
> that one of the commit subject lines is truncated - sorry about that,
> it's my fault not Richard's - but since the important part is there
> ("add subj creds to NETFILTER_CFG") I opted to leave it as-is and not
> disrupt the git log.  If you would rather have the subject line fixed,
> let me know and I'll correct it.

It looks a bit odd, but not worth the churn of fixing up. Thanks, pulled,

  Linus


Re: [PATCH 00/10] fix swiotlb-xen for RPi4

2020-06-02 Thread Boris Ostrovsky
On 6/2/20 5:51 PM, Stefano Stabellini wrote:
> I would like to ask the maintainers, Juergen, Boris, Konrad, whether you
> have any more feedback before I send v2 of the series.


I think I only had one comment and that's all. Most were from Julien.


-boris


>
> Cheers,
>
> Stefano
>
>
> On Wed, 20 May 2020, Stefano Stabellini wrote:
>> Hi all,
>>
>> This series is a collection of fixes to get Linux running on the RPi4 as
>> dom0.
>>
>> Conceptually there are only two significant changes:
>>
>> - make sure not to call virt_to_page on vmalloc virt addresses (patch
>>   #1)
>> - use phys_to_dma and dma_to_phys to translate phys to/from dma
>>   addresses (all other patches)
>>
>> In particular, regarding the second part, the RPi4 is the first
>> board where Xen can run that has the property that dma addresses are
>> different from physical addresses, and swiotlb-xen was written with the
>> assumption that phys addr == dma addr.
>>
>> This series adds the phys_to_dma and dma_to_phys calls to make it work.
>>
>>
>> Cheers,
>>
>> Stefano
>>
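The distinction the series relies on (a DMA address need not equal the CPU physical address) can be illustrated with a single translation window. This is a simplified model: real translation goes through the device's dma-ranges via phys_to_dma()/dma_to_phys(), and the window values below are hypothetical, not the RPi4's actual layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t fake_phys_addr_t;
typedef uint64_t fake_dma_addr_t;

/* Hypothetical window: CPU [cpu_base, cpu_base + size) maps to bus_base. */
struct fake_dma_range {
	fake_phys_addr_t cpu_base;
	fake_dma_addr_t bus_base;
	uint64_t size;
};

static bool fake_phys_in_range(const struct fake_dma_range *r, fake_phys_addr_t paddr)
{
	return paddr >= r->cpu_base && paddr - r->cpu_base < r->size;
}

/* Model of phys_to_dma(): apply the window's offset. */
static fake_dma_addr_t fake_phys_to_dma(const struct fake_dma_range *r,
					fake_phys_addr_t paddr)
{
	assert(fake_phys_in_range(r, paddr));
	return paddr - r->cpu_base + r->bus_base;
}

/* Model of dma_to_phys(): the inverse translation. */
static fake_phys_addr_t fake_dma_to_phys(const struct fake_dma_range *r,
					 fake_dma_addr_t daddr)
{
	assert(daddr >= r->bus_base && daddr - r->bus_base < r->size);
	return daddr - r->bus_base + r->cpu_base;
}
```

With a window of cpu_base 0x0 and bus_base 0xc0000000, code that assumes phys addr == dma addr would hand the device 0x1000 where this model expects 0xc0001000.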



Re: [PATCH v5 0/3] close_range()

2020-06-02 Thread Linus Torvalds
On Tue, Jun 2, 2020 at 4:33 PM Christian Brauner
 wrote:
> >
> > And maybe this _did_ get mentioned last time, and I just don't find
> > it. I also don't see anything like that in the patches, although the
> > flags argument is there.
>
> I spent some good time digging and I couldn't find this mentioned
> anywhere so maybe it just never got sent to the list?

It's entirely possible that it was just a private musing, and you
re-opening this issue just resurrected the thought.

I'm not sure how simple it would be to implement, but looking at it it
shouldn't be problematic to add a "max_fd" argument to unshare_fd()
and dup_fd().

Although the range for unsharing is obviously reversed, so I'd suggest
not trying to make "dup_fd()" take the exact range into account.

More like just making __close_range() do basically something like

	rcu_read_lock();
	cur_max = files_fdtable(files)->max_fds;
	rcu_read_unlock();

	if (flags & CLOSE_RANGE_UNSHARE) {
		unsigned int max_unshare_fd = ~0u;
		if (max_fd >= cur_max)
			max_unshare_fd = fd;
		unshare_fd(max_unshare_fd);
	}

	.. do the rest of __close_range() here ..

and all that "max_unshare_fd" would do would be to limit the top end
of the file descriptor table unsharing: we'd still do the exact range
handling in __close_range() itself.

Because teaching unshare_fd() and dup_fd() about anything more complex
than the above doesn't sound worth it, but adding a way to just avoid
the unnecessary copy of any high file descriptors sounds simple
enough.

But I haven't thought deeply about this. I might have missed something.
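The limit can be computed and applied in isolation. A hedged userspace sketch of the idea only: fake_max_unshare() and fake_copy_fds() are hypothetical, the fd table is a plain array, and ~0u plays the role of "no limit". The limit applies only when the closed range reaches the top of the current table (max_fd >= cur_max), because only then does no descriptor at or above fd survive.

```c
#include <assert.h>
#include <stddef.h>

/* How many low slots must the unshared copy keep when closing [fd, max_fd]? */
static unsigned int fake_max_unshare(unsigned int fd, unsigned int max_fd,
				     unsigned int cur_max)
{
	unsigned int max_unshare_fd = ~0u;	/* default: copy the whole table */

	assert(fd <= max_fd);		/* close_range() rejects fd > max_fd */
	if (max_fd >= cur_max)		/* everything from fd up is being closed */
		max_unshare_fd = fd;
	return max_unshare_fd;
}

/* Copy at most 'limit' of the 'n' slots; returns the number copied. */
static size_t fake_copy_fds(const int *src, int *dst, size_t n, unsigned int limit)
{
	size_t count = n < limit ? n : limit;

	for (size_t i = 0; i < count; i++)
		dst[i] = src[i];
	return count;
}
```

The exact range handling stays in the close loop; the limit only avoids copying descriptors that would be closed immediately afterwards.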

Linus


[PATCH v2] hwmon: bt1-pvt: Define Temp- and Volt-to-N poly as maybe-unused

2020-06-02 Thread Serge Semin
Clang-based kernel building with W=1 warns that some static const
variables are unused:

drivers/hwmon/bt1-pvt.c:67:30: warning: unused variable 'poly_temp_to_N' [-Wunused-const-variable]
static const struct pvt_poly poly_temp_to_N = {
 ^
drivers/hwmon/bt1-pvt.c:99:30: warning: unused variable 'poly_volt_to_N' [-Wunused-const-variable]
static const struct pvt_poly poly_volt_to_N = {
 ^

Indeed, these polynomials are used only when the PVT sensor alarms are
enabled. In that case they convert the temperature and voltage alarm
limits from normal quantities (Volts and degrees Celsius) to the sensor
data representation N = [0, 1023]. Otherwise, when alarms are disabled,
the driver only converts the detected data to human-readable form and
doesn't need those polynomials defined. So let's mark the Temp-to-N and
Volt-to-N polynomials with the __maybe_unused attribute.

Note that gcc with W=1 doesn't notice the problem.

Fixes: 87976ce2825d ("hwmon: Add Baikal-T1 PVT sensor driver")
Reported-by: kbuild test robot 
Signed-off-by: Serge Semin 
Cc: Maxim Kaurkin 
Cc: Alexey Malahov 

---

Link: https://lore.kernel.org/linux-hwmon/20200602091219.24404-1-sergey.se...@baikalelectronics.ru
Changelog v2:
- Replace if-defs with the __maybe_unused attribute.
---
 drivers/hwmon/bt1-pvt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hwmon/bt1-pvt.c b/drivers/hwmon/bt1-pvt.c
index 1a9772fb1f73..8709b3f54086 100644
--- a/drivers/hwmon/bt1-pvt.c
+++ b/drivers/hwmon/bt1-pvt.c
@@ -64,7 +64,7 @@ static const struct pvt_sensor_info pvt_info[] = {
  * 48380,
  * where T = [-48380, 147438] mC and N = [0, 1023].
  */
-static const struct pvt_poly poly_temp_to_N = {
+static const struct pvt_poly __maybe_unused poly_temp_to_N = {
.total_divider = 1,
.terms = {
{4, 18322, 1, 1},
@@ -96,7 +96,7 @@ static const struct pvt_poly poly_N_to_temp = {
  * N = (18658e-3*V - 11572) / 10,
  * V = N * 10^5 / 18658 + 11572 * 10^4 / 18658.
  */
-static const struct pvt_poly poly_volt_to_N = {
+static const struct pvt_poly __maybe_unused poly_volt_to_N = {
.total_divider = 10,
.terms = {
{1, 18658, 1000, 1},
-- 
2.26.2
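The pattern the commit message describes, a const table referenced only when an optional feature is built in, can be reproduced in a small translation unit. This sketch assumes GCC/Clang attribute syntax (the kernel's __maybe_unused expands to __attribute__((__unused__))); FAKE_ALARMS, struct fake_poly and the coefficients are hypothetical placeholders for the driver's alarm Kconfig option and its polynomial tables.

```c
#include <assert.h>

/* Same expansion the kernel uses for __maybe_unused (GCC/Clang). */
#define fake_maybe_unused __attribute__((__unused__))

/* Hypothetical one-term conversion: out = in * mul / div + add. */
struct fake_poly {
	long mul;
	long div;
	long add;
};

static long fake_calc_poly(const struct fake_poly *p, long in)
{
	assert(p->div != 0);
	return in * p->mul / p->div + p->add;
}

/* Always used: raw sensor data to a readable value. */
static const struct fake_poly poly_n_to_value = { .mul = 2, .div = 1, .add = 0 };

/*
 * Used only when alarms are built in. Without the attribute, a clang W=1
 * build with FAKE_ALARMS undefined may warn with -Wunused-const-variable.
 */
static const struct fake_poly fake_maybe_unused poly_value_to_n = { .mul = 1, .div = 2, .add = 0 };

static long fake_read_value(long raw)
{
	return fake_calc_poly(&poly_n_to_value, raw);
}

#ifdef FAKE_ALARMS
static long fake_alarm_limit_to_n(long value)
{
	return fake_calc_poly(&poly_value_to_n, value);
}
#endif
```

Compared with wrapping the tables in #ifdef, the attribute keeps them compiled (and type-checked) in every configuration while silencing the warning when they are unreferenced.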



Re: kobject_init_and_add is easy to misuse

2020-06-02 Thread James Bottomley
On Tue, 2020-06-02 at 14:51 -0700, James Bottomley wrote:
> On Tue, 2020-06-02 at 22:07 +0200, Greg Kroah-Hartman wrote:
> > On Tue, Jun 02, 2020 at 12:54:16PM -0700, James Bottomley wrote:
> 
> [...]
> > > I think the only way we can make the failure semantics consistent
> > > is to have the kobject_init() ones (so kfree on failure).  That
> > > means for the add part, the function would have to unwind
> > > everything it did from init on so kfree() is still an option.  If
> > > people agree, then I can produce the patch ... it's just the
> > > current drive to transform everyone who's doing kfree() into
> > > kobject_put() would become wrong ...
> > 
> > Everyone should be putting their kfree into the kobject release
> > anyway, right?
> 
> No, that's the problem ... for a static kobject you can't free it;
> and the release path may make assumptions which aren't valid depending
> on the kobject state.
> 
> > Anyway, let's see your patch before I start to object further :)
> 
> My first thought was "what?  I got suckered into creating a patch",
> thanks ;-)  But now I look, all the error paths do unwind back to the
> initial state, so kfree() on error looks to be completely correct.

Actually, I spoke too soon.  I did another analysis of the syzkaller
flow in b8eb718348b8 ("net-sysfs: Fix reference count leak in
rx|netdev_queue_add_kobject") and it turns out there is a single piece
of state that's not correctly unwound: the kobj->name which, thanks to
additions after kobject_init_and_add() was created, is now allocated
via kmalloc if it's not a rodata string and is always freed in
kobject_cleanup() via kfree_const().  This problem can be fixed by
unwinding the name allocation at the end of kobject_init_and_add() ...
or it could be unwound in kobject_add_varg, which would also make
kobject_add() unwind correctly.

The unwind step is to kfree_const(kobj->name); kobj->name = NULL; so it
won't interfere if the kobject_put() is called instead of a simple
kfree.
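Why NULLing the pointer matters can be shown with a small model: after the error-path unwind, a caller that still goes through the put/cleanup path only frees a NULL pointer, so there is no double free. A hedged userspace sketch: free() stands in for kfree_const() (which additionally leaves .rodata strings alone), and struct fake_named_obj, fake_add(), fake_init_and_add() and fake_cleanup() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object with a heap-allocated name, like kobj->name. */
struct fake_named_obj {
	char *name;
};

static char *fake_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Like kobject_add_varg(): sets the name, then a later step may fail. */
static int fake_add(struct fake_named_obj *obj, const char *name, bool fail_later)
{
	assert(obj->name == NULL);
	obj->name = fake_strdup(name);
	if (!obj->name)
		return -1;
	return fail_later ? -1 : 0;	/* failure leaves the name allocated */
}

/* Like the proposed kobject_init_and_add(): unwind the name on error. */
static int fake_init_and_add(struct fake_named_obj *obj, const char *name,
			     bool fail_later)
{
	int retval;

	obj->name = NULL;
	retval = fake_add(obj, name, fail_later);
	if (retval && obj->name) {
		free(obj->name);
		obj->name = NULL;
	}
	return retval;
}

/* Like kobject_cleanup(): frees the name unconditionally; free(NULL) is a no-op. */
static void fake_cleanup(struct fake_named_obj *obj)
{
	free(obj->name);
	obj->name = NULL;
}
```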

Would you prefer the unwind in kobject_init_and_add() like the patch
below or in kobject_add_varg()?


James

---

diff --git a/lib/kobject.c b/lib/kobject.c
index 65fa7bf70c57..9991baf43d27 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -472,6 +472,10 @@ int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype,
va_start(args, fmt);
retval = kobject_add_varg(kobj, parent, fmt, args);
va_end(args);
+   if (retval && kobj->name) {
+   kfree_const(kobj->name);
+   kobj->name = NULL;
+   }
 
return retval;
 }


Re: [PATCH v3 6/6] MAINTAINERS: Add maintainers for MIPS core drivers

2020-06-02 Thread Serge Semin
On Tue, Jun 02, 2020 at 11:12:31AM +0100, Marc Zyngier wrote:
> On 2020-06-02 11:09, Serge Semin wrote:
> > Add Thomas and myself as maintainers of the MIPS CPU and GIC IRQchip,
> > MIPS GIC timer and MIPS CPS CPUidle drivers.
> > 
> > Signed-off-by: Serge Semin 
> > 
> > ---
> > 
> > Changelog v3:
> > - Keep the files list alphabetically ordered.
> > - Add Thomas as the co-maintainer of the designated drivers.
> > ---
> >  MAINTAINERS | 11 +++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 2926327e4976..20532e0287d7 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -11278,6 +11278,17 @@ F:  arch/mips/configs/generic/board-boston.config
> >  F: drivers/clk/imgtec/clk-boston.c
> >  F: include/dt-bindings/clock/boston-clock.h
> > 
> > +MIPS CORE DRIVERS
> > +M: Thomas Bogendoerfer 
> > +M: Serge Semin 
> > +L: linux-m...@vger.kernel.org
> > +S: Supported
> > +F: drivers/bus/mips_cdmm.c
> > +F: drivers/clocksource/mips-gic-timer.c
> > +F: drivers/cpuidle/cpuidle-cps.c
> > +F: drivers/irqchip/irq-mips-cpu.c
> > +F: drivers/irqchip/irq-mips-gic.c
> > +
> >  MIPS GENERIC PLATFORM
> >  M: Paul Burton 
> >  L: linux-m...@vger.kernel.org
> 
> Acked-by: Marc Zyngier 
> 
> I assume this will go via the MIPS tree.

Yes, I also think so. Though I suppose first we have to get acks from
Rafael J. Wysocki (CPU IDLE) or Daniel Lezcano (CPU IDLE,
CLOCKSOURCE/CLOCKEVENT) or Thomas Gleixner (CLOCKSOURCE, CLOCKEVENT)
since we are going to maintain the drivers from the subsystems they
support. Am I right?

-Sergey

> 
> Thanks,
> 
> M.
> -- 
> Jazz is not dead. It just smells funny...


RE: [RFC PATCH 1/2] Drivers: hv: vmbus: Re-balance channel interrupts across CPUs at CPU hotplug

2020-06-02 Thread Michael Kelley
From: Andrea Parri (Microsoft)  Sent: Tuesday, May 26, 2020 3:32 PM
> 
> CPU hot removals and additions present an opportunity for (re-)balancing
> the channel interrupts across the available CPUs.  Current code does not
> balance the interrupts at CPU hotplug; furthermore/consequently, the hot
> removal path currently fails (to remove the specified CPU) whenever some
> interrupt is bound to the CPU to be removed and the VMBus is connected.
> 
> Address such issues by implementing vmbus_balance_vp_indexes_at_cpuhp():
> invoke this primitive to balance the channel interrupts across available
> CPUs at CPU hotplug operations.  In the hot removal path, such primitive
> will (try to) move/balance interrupts out of the to-be-removed CPU so as
> to meet the user request to hot remove the CPU.
> 
> The balancing algorithm distributes the channel interrupts evenly across
> the available CPUs and NUMA nodes; to do so, it introduces and maintains
> per-device and per-connection channel statistics/counts to keep track of
> the (current) assignments of the channels to the CPUs/nodes.  By design,
> only "performance"-critical channels/devices are "balanced".
> 
> The proposed algorithm relies on the (recently introduced) capability to
> reassign a channel interrupt to a CPU (cf., the CHANNELMSG_MODIFYCHANNEL
> message type).  As such, the new balancing process is effective starting
> with VMBus version 4.1 (no changes in semantics or behavior are intended
> for VMBus versions lower than 4.1).
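The balancing idea (keep per-CPU channel counts and move interrupts toward the least-loaded eligible CPU, skipping the CPU being removed) can be sketched as a greedy reassignment. This is a hedged illustration, not the algorithm in the patch: it ignores NUMA nodes and the perf/non-perf channel split, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define FAKE_NR_CPUS 4

/* Per-CPU count of channel interrupts currently targeting that CPU. */
static int chan_count[FAKE_NR_CPUS];
static bool cpu_online[FAKE_NR_CPUS];

/* Pick the online CPU with the fewest channels, skipping 'exclude'. */
static int fake_least_loaded_cpu(int exclude)
{
	int best = -1;

	for (int cpu = 0; cpu < FAKE_NR_CPUS; cpu++) {
		if (!cpu_online[cpu] || cpu == exclude)
			continue;
		if (best < 0 || chan_count[cpu] < chan_count[best])
			best = cpu;
	}
	return best;
}

/*
 * Hot-remove path: retarget each channel bound to 'dying' to the currently
 * least-loaded remaining CPU. targets[] holds each channel's target CPU.
 * Returns false if no CPU is left to take the channels.
 */
static bool fake_evacuate_cpu(int *targets, int nr_channels, int dying)
{
	for (int i = 0; i < nr_channels; i++) {
		int dest;

		if (targets[i] != dying)
			continue;
		dest = fake_least_loaded_cpu(dying);
		if (dest < 0)
			return false;
		assert(chan_count[dying] > 0);
		chan_count[dying]--;
		chan_count[dest]++;
		/* A real implementation would send CHANNELMSG_MODIFYCHANNEL here. */
		targets[i] = dest;
	}
	cpu_online[dying] = false;
	return true;
}
```

Picking the least-loaded CPU afresh for each channel spreads the evacuated interrupts instead of piling them onto a single neighbour.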
> 
> Suggested-by: Nuno Das Neves 
> Signed-off-by: Andrea Parri (Microsoft) 
> ---
>  drivers/hv/channel.c  |  38 +++
>  drivers/hv/channel_mgmt.c | 219 ++
>  drivers/hv/connection.c   |  23 ++--
>  drivers/hv/hv.c   |  62 ++-
>  drivers/hv/hyperv_vmbus.h |  72 +
>  drivers/hv/vmbus_drv.c|  45 +++-
>  include/linux/hyperv.h|  22 +++-
>  kernel/cpu.c  |   1 +
>  8 files changed, 416 insertions(+), 66 deletions(-)
> 
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index 90070b337c10d..2974aa9dc956c 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "hyperv_vmbus.h"
> 
> @@ -317,6 +318,43 @@ int vmbus_send_modifychannel(u32 child_relid, u32 target_vp)
>  }
>  EXPORT_SYMBOL_GPL(vmbus_send_modifychannel);
> 
> +bool vmbus_modifychannel(struct vmbus_channel *channel,
> +  u32 origin_cpu, u32 target_cpu)
> +{
> + if (vmbus_send_modifychannel(channel->offermsg.child_relid,
> +  hv_cpu_number_to_vp_number(target_cpu)))
> + return false;
> +
> + /*
> +  * Warning.  At this point, there is *no* guarantee that the host will
> +  * have successfully processed the vmbus_send_modifychannel() request.
> +  * See the header comment of vmbus_send_modifychannel() for more info.
> +  *
> +  * Lags in the processing of the above vmbus_send_modifychannel() can
> +  * result in missed interrupts if the "old" target CPU is taken offline
> +  * before Hyper-V starts sending interrupts to the "new" target CPU.
> +  * But apart from this offlining scenario, the code tolerates such
> +  * lags.  It will function correctly even if a channel interrupt comes
> +  * in on a CPU that is different from the channel target_cpu value.
> +  */
> +
> + channel->target_cpu = target_cpu;
> + channel->target_vp = hv_cpu_number_to_vp_number(target_cpu);
> + channel->numa_node = cpu_to_node(target_cpu);
> +
> + /* See init_vp_index(). */
> + if (hv_is_perf_channel(channel))
> + hv_update_alloced_cpus(origin_cpu, target_cpu);
> +
> + /* Currently set only for storvsc channels. */
> + if (channel->change_target_cpu_callback) {
> + (*channel->change_target_cpu_callback)(channel,
> + origin_cpu, target_cpu);
> + }
> +
> + return true;
> +}
> +
>  /*
>   * create_gpadl_header - Creates a gpadl for the specified buffer
>   */
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 417a95e5094dd..c158f86787940 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -497,10 +497,14 @@ static void vmbus_add_channel_work(struct work_struct *work)
>   /*
>* Start the process of binding the primary channel to the driver
>*/
> +
> + /* See vmbus_balance_vp_indexes_at_cpuhp(). */
> + mutex_lock(&vmbus_connection.channel_mutex);
>   newchannel->device_obj = vmbus_device_create(
>   &newchannel->offermsg.offer.if_type,
>   &newchannel->offermsg.offer.if_instance,
>   newchannel);
> + mutex_unlock(&vmbus_connection.channel_mutex);
>   if (!newchannel->device_obj)
>   goto err_deq_chan;
> 
> @@ -515,6 +519,8 @@ static void vmbus_add_channel_work(struct work_struct *work)
>   
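The balancing described in the VMBus commit message above (spread the performance-critical channel interrupts evenly, and move them off a CPU that is about to be hot-removed) can be sketched as a small userspace model. This is not the patch's vmbus_balance_vp_indexes_at_cpuhp(): it ignores NUMA nodes and the CHANNELMSG_MODIFYCHANNEL round trip to the host, and the names pick_least_loaded_cpu() and drain_cpu() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: chan_count[cpu] is the number of perf-critical
 * channels currently targeting 'cpu'; online[cpu] says whether that CPU
 * may receive interrupts. Returns the least-loaded eligible CPU (lowest
 * index wins ties), or -1 if no CPU other than 'exclude' is online.
 */
static int pick_least_loaded_cpu(const unsigned int *chan_count,
				 const bool *online, int ncpus, int exclude)
{
	int best = -1;

	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (!online[cpu] || cpu == exclude)
			continue;
		if (best < 0 || chan_count[cpu] < chan_count[best])
			best = cpu;
	}
	return best;
}

/*
 * Retarget every channel bound to 'victim' (the CPU being hot-removed)
 * to the currently least-loaded remaining CPU, keeping the per-CPU
 * counts in sync. Returns the number of channels moved, or -1 if there
 * is no CPU left to move them to.
 */
static int drain_cpu(int *chan_cpu, int nchan, unsigned int *chan_count,
		     const bool *online, int ncpus, int victim)
{
	int moved = 0;

	for (int c = 0; c < nchan; c++) {
		if (chan_cpu[c] != victim)
			continue;

		int target = pick_least_loaded_cpu(chan_count, online,
						   ncpus, victim);
		if (target < 0)
			return -1;

		assert(chan_count[victim] > 0);
		chan_count[victim]--;
		chan_count[target]++;
		chan_cpu[c] = target;
		moved++;
	}
	return moved;
}
```

Re-picking the target for each moved channel, rather than once per hot-removal, is what keeps the load even when several channels sit on the departing CPU.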

[PATCH stable 5.4] PM: wakeup: Show statistics for deleted wakeup sources again

2020-06-02 Thread Florian Fainelli
From: zhuguangqing 

commit e976eb4b91e906f20ec25b20c152d53c472fc3fd upstream

After commit 00ee22c28915 (PM / wakeup: Use seq_open() to show wakeup
stats), print_wakeup_source_stats(m, &deleted_ws) is not called from
wakeup_sources_stats_seq_show() any more.

Because deleted_ws is one of the wakeup sources, it should be shown
too, so add it to the end of all other wakeup sources.

Signed-off-by: zhuguangqing 
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Florian Fainelli 
---
 drivers/base/power/wakeup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
index 0bd9b291bb29..92f0960e9014 100644
--- a/drivers/base/power/wakeup.c
+++ b/drivers/base/power/wakeup.c
@@ -1073,6 +1073,9 @@ static void *wakeup_sources_stats_seq_next(struct seq_file *m,
break;
}
 
+   if (!next_ws)
+   print_wakeup_source_stats(m, &deleted_ws);
+
return next_ws;
 }
 
-- 
2.17.1
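The effect of this fix can be modelled outside the kernel: seq_file calls ->show() for an element and then ->next(); when ->next() finds no further element, it now prints the deleted_ws aggregate before returning NULL, so that entry appears exactly once, after all registered sources. The sketch below is a hypothetical userspace model; struct ws_model, append_stats() and show_all() are illustrative names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct ws_model {
	const char *name;
	unsigned long event_count;
};

/* Stand-in for print_wakeup_source_stats(): append one line to buf. */
static void append_stats(char *buf, size_t len, const struct ws_model *ws)
{
	size_t used = strlen(buf);

	assert(used < len);
	snprintf(buf + used, len - used, "%s %lu\n", ws->name, ws->event_count);
}

/*
 * Walk the registered sources the way seq_read() drives ->show() and
 * ->next(): show the current element, then advance. When advancing
 * yields "no next element", print the deleted-sources entry, as the
 * patched wakeup_sources_stats_seq_next() does.
 */
static void show_all(char *buf, size_t len, const struct ws_model *srcs,
		     size_t n, const struct ws_model *deleted)
{
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		append_stats(buf, len, &srcs[i]);               /* ->show() */
		const struct ws_model *next = (i + 1 < n) ? &srcs[i + 1] : NULL;
		if (!next)                                       /* ->next() */
			append_stats(buf, len, deleted);
	}
}
```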



[rcu:dev.2020.06.01b] BUILD SUCCESS 9c814827af953f2e109feef5272154c00a8f4541

2020-06-02 Thread kbuild test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git  dev.2020.06.01b
branch HEAD: 9c814827af953f2e109feef5272154c00a8f4541  refperf: Add test for RCU Tasks Trace readers.

i386-tinyconfig vmlinux size:

 TOTAL    TEXT  built-in.*  arch/x86/events/zhaoxin/built-in.*

  -233    -233              8747b07d1944 Merge branch 'kcsan-dev.2020.04.13c' into HEAD
     0       0              03e8e094dad9 Merge branch 'lkmm-dev.2020.05.16a' into HEAD
     0       0              17e0ee2a3ec9 torture:  Remove qemu dependency on EFI firmware
     0       0              c58148777978 torture: Add script to smoke-test commits in a branch
   +38     +38              396a79cc6818 fork: Annotate a data race in vm_area_dup()
     0       0              8035e0fc710a x86/mm/pat: Mark an intentional data race
     0       0              d7a51c24ee4b rculist: Add ASSERT_EXCLUSIVE_ACCESS() to __list_splice_init
     0       0              e5efa2f1b7b6 locktorture: Use true and false to assign to bool variables
     0       0              7514d7f181ab srcu: Fix a typo in comment "amoritized"->"amortized"
     0       0              9dbd776542e3 rcu: Simplify the calculation of rcu_state.ncpus
     0       0              df12d657bcc0 docs: RCU: Convert checklist.txt to ReST
     0       0              fdfeb779e1bd docs: RCU: Convert lockdep-splat.txt to ReST
     0       0              68b5951f7eb2 docs: RCU: Convert lockdep.txt to ReST
     0       0              ce9edc0c8a82 docs: RCU: Convert rculist_nulls.txt to ReST
     0       0              1bee818b03c7 docs: RCU: Convert torture.txt to ReST
     0       0              9100131711bc docs: RCU: Convert rcuref.txt to ReST
     0       0              080f194cfa87 docs: RCU: Convert stallwarn.txt to ReST
     0       0              6999f47d8456 docs: RCU: Don't duplicate chapter names in rculist_nulls.rs
     0       0              55ce2e8178f2 rcutorture: Add races with task-exit processing
     0       0              1c60a5e52538 torture: Set configfile variable to current scenario
     0       0              9969401f1706 rcutorture: Handle non-statistic bang-string error messages
     0       0              6f099e1b362b rcutorture: NULL rcu_torture_current earlier in cleanup code
     0       0              6816417616c4 kcsan: Add test suite
     0       0              848d16e04f52 doc: Timer problems can cause RCU CPU stall warnings
     0       0              2364a9f967ec rcu: Add callbacks-invoked counters
     0       0              2775724beeef rcu: Add comment documenting rcu_callback_map's purpose
     0       0     +138684  bfd78bca7bdf Revert b8c17e6664c4 ("rcu: Maintain special bits at bottom o
    +1       0     -138684  8903088434e7 rcu/tree: Add better tracing for dyntick-idle
    -1       0              c0601bb42994 rcu/tree: Clean up dynticks counter usage
     0       0              3f3baaf3ac07 rcu/tree: Remove dynticks_nmi_nesting counter
    +1       0              725e4ad9e020 trace: events: rcu: Change description of rcu_dyntick trace
     0       0

[rcu:urgent-for-mingo] BUILD SUCCESS b3e2d20973db3ec87a6dd2fee0c88d3c2e7c2f61

2020-06-02 Thread kbuild test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git  urgent-for-mingo
branch HEAD: b3e2d20973db3ec87a6dd2fee0c88d3c2e7c2f61  rcuperf: Fix printk format warning

elapsed time: 485m

configs tested: 99
configs skipped: 7

The following configs have been built successfully.
More configs may be tested in the coming days.

arm         defconfig
arm         allyesconfig
arm         allmodconfig
arm         allnoconfig
arm64       allyesconfig
arm64       defconfig
arm64       allmodconfig
arm64       allnoconfig
arm         imx_v4_v5_defconfig
h8300       edosk2674_defconfig
parisc      generic-64bit_defconfig
m68k        amcore_defconfig
arm         moxart_defconfig
sh          j2_defconfig
sparc64     allyesconfig
arm         mps2_defconfig
arm         prima2_defconfig
s390        allnoconfig
mips        allnoconfig
mips        gpr_defconfig
sh          sh7710voipgw_defconfig
powerpc     storcenter_defconfig
mips        decstation_64_defconfig
sh          rts7751r2dplus_defconfig
sh          magicpanelr2_defconfig
ia64        allmodconfig
s390        alldefconfig
c6x         evmc6472_defconfig
sh          rsk7203_defconfig
arm         netwinder_defconfig
arm         badge4_defconfig
arc         nsimosci_defconfig
i386        allyesconfig
i386        defconfig
i386        debian-10.3
i386        allnoconfig
ia64        defconfig
ia64        allnoconfig
ia64        allyesconfig
m68k        allmodconfig
m68k        allnoconfig
m68k        sun3_defconfig
m68k        defconfig
m68k        allyesconfig
nios2       defconfig
nios2       allyesconfig
openrisc    defconfig
c6x         allyesconfig
c6x         allnoconfig
openrisc    allyesconfig
nds32       defconfig
nds32       allnoconfig
csky        allyesconfig
csky        defconfig
alpha       defconfig
alpha       allyesconfig
xtensa      allyesconfig
h8300       allyesconfig
h8300       allmodconfig
xtensa      defconfig
arc         defconfig
arc         allyesconfig
sh          allmodconfig
sh          allnoconfig
microblaze  allnoconfig
mips        allyesconfig
mips        allmodconfig
parisc      allnoconfig
parisc      defconfig
parisc      allyesconfig
parisc      allmodconfig
powerpc     allyesconfig
powerpc     rhel-kconfig
powerpc     allmodconfig
powerpc     allnoconfig
powerpc     defconfig
riscv       allyesconfig
riscv       allnoconfig
riscv       defconfig
riscv       allmodconfig
s390        allyesconfig
s390        allmodconfig
s390        defconfig
sparc       allyesconfig
sparc       defconfig
sparc64     defconfig
sparc64     allnoconfig
sparc64     allmodconfig
um          allmodconfig
um          allnoconfig
um          defconfig
um          allyesconfig
x86_64      rhel
x86_64      rhel-7.6
x86_64      rhel-7.6-kselftests
x86_64      rhel-7.2-clear
x86_64      lkp
x86_64      fedora-25
x86_64      kexec

---
0-DAY CI Kernel Test Service, Intel Corporation

Re: [GIT PULL] vfs: improve DAX behavior for 5.8, part 1

2020-06-02 Thread Ira Weiny
On Tue, Jun 02, 2020 at 09:58:52AM -0700, Darrick J. Wong wrote:
> Hi Linus,
> 
> After many years of LKML-wrangling about how to enable programs to query
> and influence the file data access mode (DAX) when a filesystem resides
> on storage devices such as persistent memory, Ira Weiny has emerged with
> a proposed set of standard behaviors that has not been shot down by
> anyone!  We're more or less standardizing on the current XFS behavior
> and adapting ext4 to do the same.

Also, for those interested: The corresponding man page change mentioned in the
commit has been submitted here:

https://lore.kernel.org/lkml/20200505002016.1085071-1-ira.we...@intel.com/

Ira

> 
> This pull request is the first of a handful that will make ext4 and XFS
> present a consistent interface for user programs that care about DAX.
> We add a statx attribute that programs can check to see if DAX is
> enabled on a particular file.  Then, we update the DAX documentation to
> spell out the user-visible behaviors that filesystems will guarantee
> (until the next storage industry shakeup).  The on-disk inode flag has
> been in XFS for a few years now.
> 
> Note that Stephen Rothwell reported a minor merge conflict[1] between
> the first cleanup patch and a different change in the block layer.  The
> resolution looks pretty straightforward, but let me know if you
> encounter problems.
> 
> --D
> 
> [1] 
> https://lore.kernel.org/linux-next/20200522145848.38cdc...@canb.auug.org.au/
> 
> The following changes since commit 0e698dfa282211e414076f9dc7e83c1c288314fd:
> 
>   Linux 5.7-rc4 (2020-05-03 14:56:04 -0700)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git tags/vfs-5.8-merge-1
> 
> for you to fetch changes up to 83d9088659e8f113741bb197324bd9554d159657:
> 
>   Documentation/dax: Update Usage section (2020-05-04 08:49:39 -0700)
> 
> 
> New code for 5.8:
> - Clean up io_is_direct.
> - Add a new statx flag to indicate when file data access is being done
>   via DAX (as opposed to the page cache).
> - Update the documentation for how system administrators and application
>   programmers can take advantage of the (still experimental DAX) feature.
> 
> 
> Ira Weiny (3):
>   fs: Remove unneeded IS_DAX() check in io_is_direct()
>   fs/stat: Define DAX statx attribute
>   Documentation/dax: Update Usage section
> 
>  Documentation/filesystems/dax.txt | 142 +-
>  drivers/block/loop.c  |   6 +-
>  fs/stat.c |   3 +
>  include/linux/fs.h|   7 +-
>  include/uapi/linux/stat.h |   1 +
>  5 files changed, 147 insertions(+), 12 deletions(-)


Re: [PATCH] iommu/amd: Fix event counter availability check

2020-06-02 Thread Shuah Khan

On 5/31/20 1:22 AM, Alexander Monakov wrote:

> Hi,
>
> Adding Shuah Khan to Cc: I've noticed you've seen this issue on Ryzen 2400GE;
> can you have a look at the patch? Would be nice to know if it fixes the
> problem for you too.



I am not seeing any change in behavior on my system. I still see:

I can't read perf counters.

The question I asked in my previous thread on this:


I see 2 banks and 4 counters on my system. Is it sufficient to check
the first bank and first counter? In other words, if the first one
isn't writable, are all counters non-writable?

Should we read the config first and then, try to see if any of the
counters are writable? I have a patch that does that, I can send it
out for review.

I changed the logic to read the config to get the maximum number of banks
and counters before checking whether the counters are writable, and tried
writing to all of them. The result is the same: none of them are writable.
However, when I disable the writable check and assume they are writable, I
can run perf stat -e with every amd_iommu_0 event and get data.

perf stat -e 'amd_iommu_0/cmd_processed/' sleep 10

 Performance counter stats for 'system wide':

56  amd_iommu_0/cmd_processed/

  10.001525171 seconds time elapsed


perf stat -a -e amd_iommu/mem_trans_total/ sleep 10

 Performance counter stats for 'system wide':

 2,696  amd_iommu/mem_trans_total/

  10.001465115 seconds time elapsed

I tried all possible events listed under amd_iommu_0 and I can get
data on all of them. No problems in dmesg.


This patch doesn't really address that question.

thanks,
-- Shuah


Re: [PATCH v8 04/10] drm: bridge: dw_mipi_dsi: allow bridge daisy chaining

2020-06-02 Thread Laurent Pinchart
Hi Adrian,

Thank you for the patch.

On Mon, Apr 27, 2020 at 11:19:46AM +0300, Adrian Ratiu wrote:
> Up until now the assumption was that the Synopsys DSI bridge will
> directly connect to an encoder provided by the platform driver, but
> the current practice for drivers is to leave the encoder empty via
> the simple encoder API and add their logic to their own drm_bridge.
> 
> Thus we need an ability to connect the DSI bridge to another bridge
> provided by the platform driver, so we extend the dw_mipi_dsi bind()
> API with a new "previous bridge" arg instead of just hardcoding NULL.
> 
> Cc: Laurent Pinchart 
> Signed-off-by: Adrian Ratiu 
> ---
> New in v8.
> ---
>  drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.c   | 6 --
>  drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c | 2 +-
>  include/drm/bridge/dw_mipi_dsi.h| 5 -
>  3 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.c b/drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.c
> index 16fd87055e7b7..140ff40fa1b62 100644
> --- a/drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.c
> +++ b/drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.c
> @@ -1456,11 +1456,13 @@ EXPORT_SYMBOL_GPL(dw_mipi_dsi_remove);
>  /*
>   * Bind/unbind API, used from platforms based on the component framework.
>   */
> -int dw_mipi_dsi_bind(struct dw_mipi_dsi *dsi, struct drm_encoder *encoder)
> +int dw_mipi_dsi_bind(struct dw_mipi_dsi *dsi,
> +  struct drm_encoder *encoder,
> +  struct drm_bridge *prev_bridge)
>  {
>   int ret;
>  
> - ret = drm_bridge_attach(encoder, &dsi->bridge, NULL, 0);
> + ret = drm_bridge_attach(encoder, &dsi->bridge, prev_bridge, 0);

Please note that chaining of bridges doesn't work well if multiple
bridges in the chain try to create a connector. This is why a
DRM_BRIDGE_ATTACH_NO_CONNECTOR flag has been added, with a helper to
create a connector for a chain of bridges (drm_bridge_connector_init()).
This won't play well with the component framework. I would recommend
using of_drm_find_bridge() in the rockchip driver instead, and
deprecating dw_mipi_dsi_bind().

>   if (ret) {
>   DRM_ERROR("Failed to initialize bridge with drm\n");
>   return ret;
> diff --git a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
> index 3feff0c45b3f7..83ef43be78135 100644
> --- a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
> +++ b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
> @@ -929,7 +929,7 @@ static int dw_mipi_dsi_rockchip_bind(struct device *dev,
>   return ret;
>   }
>  
> - ret = dw_mipi_dsi_bind(dsi->dmd, &dsi->encoder);
> + ret = dw_mipi_dsi_bind(dsi->dmd, &dsi->encoder, NULL);
>   if (ret) {
>   DRM_DEV_ERROR(dev, "Failed to bind: %d\n", ret);
>   return ret;
> diff --git a/include/drm/bridge/dw_mipi_dsi.h b/include/drm/bridge/dw_mipi_dsi.h
> index b0e390b3288e8..699b3531f5b36 100644
> --- a/include/drm/bridge/dw_mipi_dsi.h
> +++ b/include/drm/bridge/dw_mipi_dsi.h
> @@ -14,6 +14,7 @@
>  #include 
>  
>  struct drm_display_mode;
> +struct drm_bridge;
>  struct drm_encoder;
>  struct dw_mipi_dsi;
>  struct mipi_dsi_device;
> @@ -62,7 +63,9 @@ struct dw_mipi_dsi *dw_mipi_dsi_probe(struct platform_device *pdev,
> const struct dw_mipi_dsi_plat_data
> *plat_data);
>  void dw_mipi_dsi_remove(struct dw_mipi_dsi *dsi);
> -int dw_mipi_dsi_bind(struct dw_mipi_dsi *dsi, struct drm_encoder *encoder);
> +int dw_mipi_dsi_bind(struct dw_mipi_dsi *dsi,
> +  struct drm_encoder *encoder,
> +  struct drm_bridge *prev_bridge);
>  void dw_mipi_dsi_unbind(struct dw_mipi_dsi *dsi);
>  void dw_mipi_dsi_set_slave(struct dw_mipi_dsi *dsi, struct dw_mipi_dsi *slave);
>  

-- 
Regards,

Laurent Pinchart
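Passing a previous bridge changes where the DSI bridge lands in the encoder's chain: with NULL it is attached directly after the encoder, otherwise it is linked right after the given bridge. The toy singly linked model below illustrates that ordering rule only; it is not the DRM implementation (which keeps a list_head bridge chain), and every name in it is hypothetical. It also shows why the rockchip caller, which passes NULL, keeps its previous behaviour.

```c
#include <assert.h>
#include <stddef.h>

struct model_bridge {
	const char *name;
	struct model_bridge *next;   /* next bridge towards the output */
};

struct model_encoder {
	struct model_bridge *first;  /* first bridge after the encoder */
};

/*
 * Toy version of the ordering rule: with prev == NULL the bridge is
 * placed directly after the encoder, otherwise directly after 'prev'.
 */
static void model_bridge_attach(struct model_encoder *enc,
				struct model_bridge *bridge,
				struct model_bridge *prev)
{
	assert(enc != NULL && bridge != NULL);
	if (prev) {
		bridge->next = prev->next;
		prev->next = bridge;
	} else {
		bridge->next = enc->first;
		enc->first = bridge;
	}
}
```

Ordering alone does not address Laurent's caveat: several bridges in one chain may still each try to create a connector, which is what DRM_BRIDGE_ATTACH_NO_CONNECTOR and drm_bridge_connector_init() are meant to solve.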


Re: [PATCH v2 3/7] selftests/ftrace: Add "requires:" list support

2020-06-02 Thread Masami Hiramatsu
On Tue, 2 Jun 2020 09:21:45 -0400
Steven Rostedt  wrote:

> On Tue,  2 Jun 2020 18:08:31 +0900
> Masami Hiramatsu  wrote:
> 
> > +++ b/tools/testing/selftests/ftrace/test.d/template
> > @@ -1,6 +1,7 @@
> >  #!/bin/sh
> >  # SPDX-License-Identifier: GPL-2.0
> >  # description: %HERE DESCRIBE WHAT THIS DOES%
> > +# requires: %HERE LIST UP REQUIRED FILES%
> 
> Not sure what you mean by "LIST UP". Perhaps you mean "LIST OF"?

Ah, perhaps we don't need "UP". Would "list the required files" be OK?

Thank you,

> 
> -- Steve
> 
> 
> >  # you have to add ".tc" extention for your testcase file
> >  # Note that all tests are run with "errexit" option.


-- 
Masami Hiramatsu 


Re: [PATCH v3] iommu/vt-d: Don't apply gfx quirks to untrusted devices

2020-06-02 Thread Prashant Malani
Hi Rajat,

On Tue, Jun 02, 2020 at 04:26:02PM -0700, Rajat Jain wrote:
> Currently, an external malicious PCI device can masquerade the VID:PID
> of faulty gfx devices, and thus apply iommu quirks to effectively
> disable the IOMMU restrictions for itself.
> 
> Thus we need to ensure that the device we are applying quirks to, is
> indeed an internal trusted device.
> 
> Signed-off-by: Rajat Jain 
> Acked-by: Lu Baolu 
> ---
> v3: - Separate out the warning message in a function to be called from
>   other places. Change the warning string as suggested.
> v2: - Change the warning print strings.
> - Add Lu Baolu's acknowledgement.
> 
>  drivers/iommu/intel-iommu.c | 37 +
>  1 file changed, 37 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index ef0a5246700e5..dc859f02985a0 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -6185,6 +6185,23 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain,
>   return ret;
>  }
>  
> +/*
> + * Check that the device does not live on an external facing PCI port that is
> + * marked as untrusted. Such devices should not be able to apply quirks and
> + * thus not be able to bypass the IOMMU restrictions.
> + */
> +static bool risky_device(struct pci_dev *pdev)
> +{
> + if (pdev->untrusted) {
> + pci_warn(pdev,
> +  "Skipping IOMMU quirk for dev (%04X:%04X) on untrusted"
> +  " PCI link. Please check with your BIOS/Platform"
> +  " vendor about this\n", pdev->vendor, pdev->device);
> + return true;
> + }
> + return false;
minor suggestion: Perhaps you could use a guard clause here? It would save you
a level of indentation, and possibly allow better string splitting
(e.g keeping "untrusted PCI" together). So something like:

if (!pdev->untrusted)
return false;

pci_warn(...);

I also hear the column limit warning is now for 100 chars [1], though
I'm not sure how it's being handled in this file.

Best regards,

-Prashant

[1]:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/Documentation/process/coding-style.rst?id=bdc48fa11e46f867ea4d75fa59ee87a7f48be144

> +}
> +
>  const struct iommu_ops intel_iommu_ops = {
>   .capable= intel_iommu_capable,
>   .domain_alloc   = intel_iommu_domain_alloc,
> @@ -6214,6 +6231,9 @@ const struct iommu_ops intel_iommu_ops = {
>  
>  static void quirk_iommu_igfx(struct pci_dev *dev)
>  {
> + if (risky_device(dev))
> + return;
> +
>   pci_info(dev, "Disabling IOMMU for graphics on this chipset\n");
>   dmar_map_gfx = 0;
>  }
> @@ -6255,6 +6275,9 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x163D, quirk_iommu_igfx);
>  
>  static void quirk_iommu_rwbf(struct pci_dev *dev)
>  {
> + if (risky_device(dev))
> + return;
> +
>   /*
>* Mobile 4 Series Chipset neglects to set RWBF capability,
>* but needs it. Same seems to hold for the desktop versions.
> @@ -6285,6 +6308,9 @@ static void quirk_calpella_no_shadow_gtt(struct pci_dev *dev)
>  {
>   unsigned short ggc;
>  
> + if (risky_device(dev))
> + return;
> +
>   if (pci_read_config_word(dev, GGC, &ggc))
>   return;
>  
> @@ -6318,6 +6344,12 @@ static void __init check_tylersburg_isoch(void)
>   pdev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x3a3e, NULL);
>   if (!pdev)
>   return;
> +
> + if (risky_device(pdev)) {
> + pci_dev_put(pdev);
> + return;
> + }
> +
>   pci_dev_put(pdev);
>  
>   /* System Management Registers. Might be hidden, in which case
> @@ -6327,6 +6359,11 @@ static void __init check_tylersburg_isoch(void)
>   if (!pdev)
>   return;
>  
> + if (risky_device(pdev)) {
> + pci_dev_put(pdev);
> + return;
> + }
> +
>   if (pci_read_config_dword(pdev, 0x188, )) {
>   pci_dev_put(pdev);
>   return;
> -- 
> 2.27.0.rc2.251.g90737beb825-goog
> 
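Prashant's guard-clause suggestion applied to risky_device() might look like the following. To keep it compilable outside the kernel, struct pci_dev_model and fprintf() stand in for struct pci_dev and pci_warn(); those substitutions are assumptions of this sketch, while the message text is copied from the v3 patch. With the early return, the warning string can stay together ("untrusted PCI link") without extra indentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the struct pci_dev fields risky_device() reads. */
struct pci_dev_model {
	bool untrusted;
	unsigned short vendor;
	unsigned short device;
};

static bool risky_device(const struct pci_dev_model *pdev)
{
	assert(pdev != NULL);

	/* Guard clause: trusted devices are the common case, bail out early. */
	if (!pdev->untrusted)
		return false;

	/* fprintf(stderr, ...) stands in for pci_warn(pdev, ...). */
	fprintf(stderr,
		"Skipping IOMMU quirk for dev (%04X:%04X) on untrusted PCI link. "
		"Please check with your BIOS/Platform vendor about this\n",
		(unsigned int)pdev->vendor, (unsigned int)pdev->device);
	return true;
}
```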


Re: [PATCH 00/10] spi: Adding support for Microchip Sparx5 SoC

2020-06-02 Thread Serge Semin
On Tue, Jun 02, 2020 at 10:18:28AM +0200, Lars Povlsen wrote:
> 
> Serge Semin writes:
> 
> > Hello Lars,
> >
> > On Wed, May 13, 2020 at 04:00:21PM +0200, Lars Povlsen wrote:
> >> This is an add-on series to the main SoC Sparx5 series
> >> (Message-ID: <20200513125532.24585-1-lars.povl...@microchip.com>).
> >>
> >> The series add support for Sparx5 on top of the existing
> >> ocelot/jaguar2 spi driver.
> >>
> >> It spins off the existing support for the MSCC platforms into a
> >> separate driver, as adding new platforms from the MSCC/Microchip
> >> product lines will further complicate (clutter) the original driver.
> >>
> >> New YAML dt-bindings are provided for the resulting driver.
> >>
> >> It is expected that the DT patches are to be taken directly by the arm-soc
> >> maintainers.
> >
> > Regarding our cooperation, it can be implemented as follows. Since your
> > patchset is less cumbersome than mine and is closer to being ready for
> > integration into the generic DW APB SSI code, it would be better for it
> > to go through Mark's, Andy's and my reviews first and then be merged into
> > the kernel version of the driver. After that I'll alter my code so it can
> > be applied on top of your patches. When everything is done we'll have a
> > more comprehensive DW APB SSI driver with poll-based PIO operations
> > support, new features like rx-delay, etc.
> >
> 

> Hi Serge!
> 
> I think I would be able to work on the SPI patches this week. Should I
> base it on the current spi-next or 5.7? Then address the comments and
> send out a new revision?

I've finally finished part of the review; it should be enough for v2. As Mark
said, the new version is supposed to be based on spi-next, since that branch
has all the recent DW APB SSI patches applied.

-Sergey

> 
> Thanks for reaching out.
> 
> ---Lars
> 
> > Thank you one more time for the series you've shared with us. Let's see
> > what can be done to improve it...
> >
> > -Sergey
> >
> >>
> >> Lars Povlsen (10):
> >>   spi: dw: Add support for polled operation via no IRQ specified in DT
> >>   spi: dw: Add support for RX sample delay register
> >>   spi: dw: Add support for client driver memory operations
> >>   dt-bindings: spi: Add bindings for spi-dw-mchp
> >>   spi: spi-dw-mmio: Spin off MSCC platforms into spi-dw-mchp
> >>   dt-bindings: spi: spi-dw-mchp: Add Sparx5 support
> >>   spi: spi-dw-mchp: Add Sparx5 support
> >>   arm64: dts: sparx5: Add SPI controller
> >>   arm64: dts: sparx5: Add spi-nor support
> >>   arm64: dts: sparx5: Add spi-nand devices
> >>
> >>  .../bindings/spi/mscc,ocelot-spi.yaml |  89 
> >>  .../bindings/spi/snps,dw-apb-ssi.txt  |   7 +-
> >>  MAINTAINERS   |   2 +
> >>  arch/arm64/boot/dts/microchip/sparx5.dtsi |  37 ++
> >>  .../boot/dts/microchip/sparx5_pcb125.dts  |  16 +
> >>  .../boot/dts/microchip/sparx5_pcb134.dts  |  22 +
> >>  .../dts/microchip/sparx5_pcb134_board.dtsi|   9 +
> >>  .../boot/dts/microchip/sparx5_pcb135.dts  |  23 +
> >>  .../dts/microchip/sparx5_pcb135_board.dtsi|   9 +
> >>  arch/mips/configs/generic/board-ocelot.config |   2 +-
> >>  drivers/spi/Kconfig   |   7 +
> >>  drivers/spi/Makefile  |   1 +
> >>  drivers/spi/spi-dw-mchp.c | 399 ++
> >>  drivers/spi/spi-dw-mmio.c |  93 
> >>  drivers/spi/spi-dw.c  |  31 +-
> >>  drivers/spi/spi-dw.h  |   4 +
> >>  16 files changed, 644 insertions(+), 107 deletions(-)
> >>  create mode 100644 Documentation/devicetree/bindings/spi/mscc,ocelot-spi.yaml
> >>  create mode 100644 drivers/spi/spi-dw-mchp.c
> >>
> >> --
> >> 2.26.2
> >>
> >> ___
> >> linux-arm-kernel mailing list
> >> linux-arm-ker...@lists.infradead.org
> >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> --
> Lars Povlsen,
> Microchip


[tip:master] BUILD SUCCESS 16fc229652f8188dd898584c946293ea576abbbb

2020-06-02 Thread kbuild test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git  master
branch HEAD: 16fc229652f8188dd898584c946293ea576abbbb  Merge branch 'sched/core'

elapsed time: 726m

configs tested: 102
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

arm         defconfig
arm         allyesconfig
arm         allmodconfig
arm         allnoconfig
arm64       allyesconfig
arm64       defconfig
arm64       allmodconfig
arm64       allnoconfig
mips        loongson1b_defconfig
sh          sh7763rdp_defconfig
sh          se7619_defconfig
h8300       h8s-sim_defconfig
m68k        m5475evb_defconfig
arm         pxa255-idp_defconfig
arm         am200epdkit_defconfig
sh          hp6xx_defconfig
mips        maltaaprp_defconfig
alpha       alldefconfig
mips        maltasmvp_eva_defconfig
sh          dreamcast_defconfig
sh          lboxre2_defconfig
arm         lpc18xx_defconfig
arm         zx_defconfig
alpha       defconfig
ia64        alldefconfig
mips        nlm_xlp_defconfig
i386        allnoconfig
i386        allyesconfig
i386        defconfig
i386        debian-10.3
ia64        allmodconfig
ia64        defconfig
ia64        allnoconfig
ia64        allyesconfig
m68k        allmodconfig
m68k        allnoconfig
m68k        sun3_defconfig
m68k        defconfig
m68k        allyesconfig
nios2       defconfig
nios2       allyesconfig
openrisc    defconfig
c6x         allyesconfig
c6x         allnoconfig
openrisc    allyesconfig
nds32       defconfig
nds32       allnoconfig
csky        allyesconfig
csky        defconfig
alpha       allyesconfig
xtensa      allyesconfig
h8300       allyesconfig
h8300       allmodconfig
xtensa      defconfig
arc         defconfig
arc         allyesconfig
sh          allmodconfig
sh          allnoconfig
microblaze  allnoconfig
mips        allyesconfig
mips        allnoconfig
mips        allmodconfig
parisc      allnoconfig
parisc      defconfig
parisc      allyesconfig
parisc      allmodconfig
powerpc     defconfig
powerpc     allyesconfig
powerpc     rhel-kconfig
powerpc     allmodconfig
powerpc     allnoconfig
i386        randconfig-a001-20200602
i386        randconfig-a006-20200602
i386        randconfig-a002-20200602
i386        randconfig-a005-20200602
i386        randconfig-a003-20200602
i386        randconfig-a004-20200602
riscv       allyesconfig
riscv       allnoconfig
riscv       defconfig
riscv       allmodconfig
s390        allyesconfig
s390        allnoconfig
s390        allmodconfig
s390        defconfig
sparc       allyesconfig
sparc       defconfig
sparc64     defconfig
sparc64     allnoconfig
sparc64     allyesconfig
sparc64     allmodconfig
um          allmodconfig
um          allnoconfig
um          allyesconfig
um          defconfig
x86_64      rhel
x86_64      rhel-7.6
x86_64      rhel-7.6-kselftests
x86_64      rhel-7.2-clear
x86_64      lkp
x86_64      fedora-25
x86_64

Re: [PATCH 7/7] dt-bindings: display: Document Cadence MHDP HDMI/DP bindings

2020-06-02 Thread Laurent Pinchart
Hi Sandor,

Thank you for the patch.

On Mon, Jun 01, 2020 at 02:17:37PM +0800, sandor...@nxp.com wrote:
> From: Sandor Yu 
> 
> Document the bindings used for the Cadence MHDP HDMI/DP bridge.
> 
> Signed-off-by: Sandor Yu 
> ---
>  .../bindings/display/bridge/cdns,mhdp.yaml| 46 +++
>  .../devicetree/bindings/display/imx/mhdp.yaml | 59 +++

Please split the patch in two.

>  2 files changed, 105 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/display/bridge/cdns,mhdp.yaml
>  create mode 100644 Documentation/devicetree/bindings/display/imx/mhdp.yaml
> 
> diff --git a/Documentation/devicetree/bindings/display/bridge/cdns,mhdp.yaml b/Documentation/devicetree/bindings/display/bridge/cdns,mhdp.yaml
> new file mode 100644
> index ..aa23feba744a
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/display/bridge/cdns,mhdp.yaml
> @@ -0,0 +1,46 @@
> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause))
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/display/bridge/cdns,mhdp.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Cadence MHDP TX Encoder
> +
> +maintainers:
> +  - Sandor Yu 
> +
> +description: |
> +  Cadence MHDP Controller supports one or more of the protocols,
> +  such as HDMI and DisplayPort.
> +  Each protocol requires a different FW binaries.
> +
> +  This document defines device tree properties for the Cadence MHDP Encoder
> +  (CDNS MHDP TX). It doesn't constitue a device tree binding
> +  specification by itself but is meant to be referenced by platform-specific
> +  device tree bindings.
> +
> +  When referenced from platform device tree bindings the properties defined in
> +  this document are defined as follows. The platform device tree bindings are
> +  responsible for defining whether each property is required or optional.
> +
> +properties:
> +  reg:
> +maxItems: 1
> +description: Memory mapped base address and length of the MHDP TX registers.
> +
> +  interrupts:
> +maxItems: 2
> +
> +  interrupt-names:
> +- const: plug_in
> +  description: Hotplug detect interrupter for cable plugin event.
> +- const: plug_out
> +  description: Hotplug detect interrupter for cable plugout event.

Does the IP core really have two different interrupt lines, one for
hot-plug and one for hot-unplug ? That's a very unusual design.

> +
> +  port:
> +type: object
> +description: |
> +  The connectivity of the MHDP TX with the rest of the system is
> +  expressed in using ports as specified in the device graph bindings defined
> +  in Documentation/devicetree/bindings/graph.txt. The numbering of the ports
> +  is platform-specific.
> diff --git a/Documentation/devicetree/bindings/display/imx/mhdp.yaml b/Documentation/devicetree/bindings/display/imx/mhdp.yaml
> new file mode 100644
> index ..17850cfd1cb1
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/display/imx/mhdp.yaml
> @@ -0,0 +1,59 @@
> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/display/bridge/mhdp.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Cadence MHDP Encoder
> +
> +maintainers:
> +  - Sandor Yu 
> +
> +description: |
> +  The MHDP transmitter is a Cadence HD Display TX controller IP
> +  with a companion PHY IP.
> +  The MHDP supports one or more of the protocols,
> +  such as HDMI(1.4 & 2.0), DisplayPort(1.2).
> +  switching between the two modes (HDMI and DisplayPort)
> +  requires reloading the appropriate FW

Does the IP core integrated in the imx8mp SoCs (as that is what this
binding targets) support both HDMI and DP ? If not this should be
reworded to be more specific to the SoC.

> +
> +  These DT bindings follow the Cadence MHDP TX bindings defined in
> +  Documentation/devicetree/bindings/display/bridge/cdns,mhdp.yaml with the
> +  following device-specific properties.
> +
> +Properties:

Have you tried validating this with make dt_binding_check ? See
Documentation/devicetree/writing-schema.rst for more information.

> +  compatible:
> +enum:
> +  - nxp,imx8mq-cdns-hdmi
> +  - nxp,imx8mq-cdns-dp
> +
> +  reg: See cdns,mhdp.yaml.

This isn't how bindings are referenced. You need to reference the parent
binding with $ref, either globally, or on an individual property basis.

> +
> +  interrupts: See cdns,mhdp.yaml.
> +
> +  interrupt-names: See cdns,mhdp.yaml.

That's it? No clocks, no power domains, no resets, no PHYs (especially
given that you mention a PHY companion IP above)?

> +
> +  ports: See cdns,mhdp.yaml.

This isn't correct. Please see of-graph.txt. It can have either one port
node, or one ports node that contains one or more port subnodes. In this
case you need at least two ports, one for the input to the HDMI encoder,
and one for the HDMI output. The latter should be connected to a DT node
representing the HDMI 

Re: [PATCH v5 1/3] open: add close_range()

2020-06-02 Thread Christian Brauner
On Wed, Jun 03, 2020 at 01:30:57AM +0200, Florian Weimer wrote:
> * Christian Brauner:
> 
> > The performance is striking. For good measure, comparing the following
> > simple close_all_fds() userspace implementation that is essentially just
> > glibc's version in [6]:
> >
> > static int close_all_fds(void)
> > {
> > 	int dir_fd;
> > 	DIR *dir;
> > 	struct dirent *direntp;
> >
> > 	dir = opendir("/proc/self/fd");
> > 	if (!dir)
> > 		return -1;
> > 	dir_fd = dirfd(dir);
> > 	while ((direntp = readdir(dir))) {
> > 		int fd;
> > 		if (strcmp(direntp->d_name, ".") == 0)
> > 			continue;
> > 		if (strcmp(direntp->d_name, "..") == 0)
> > 			continue;
> > 		fd = atoi(direntp->d_name);
> > 		if (fd == dir_fd || fd == 0 || fd == 1 || fd == 2)
> > 			continue;
> > 		close(fd);
> > 	}
> > 	closedir(dir);
> > 	return 0;
> > }
> >
> 
> > [6]: 
> > https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/grantpt.c;h=2030e07fa6e652aac32c775b8c6e005844c3c4eb;hb=HEAD#l17
> >  Note that this is an internal implementation that is not exported.
> >  Currently, libc seems to not provide an exported version of this
> >  because of missing kernel support to do this.
> 
> Just to be clear, this code is not compiled into glibc anymore in
> typical configurations.  I have posted a patch to turn grantpt into a
> no-op: 

That's great! (I remember commenting on that thread.)


Re: [PATCH 3/7] drm: bridge: cadence: initial support for MHDP DP bridge driver

2020-06-02 Thread Laurent Pinchart
Hi Sandor,

Thank you for the patch.

On Mon, Jun 01, 2020 at 02:17:33PM +0800, sandor...@nxp.com wrote:
> From: Sandor Yu 
> 
> This adds initial support for MHDP DP bridge driver.
> Basic DP functions are supported, that include:
>  -Video mode set on-the-fly
>  -Cable hotplug detect
>  -MAX support resolution to 3096x2160@60fps
>  -Support DP audio
>  -EDID read via AUX
> 
> Signed-off-by: Sandor Yu 
> ---
>  drivers/gpu/drm/bridge/cadence/Kconfig|   4 +
>  drivers/gpu/drm/bridge/cadence/Makefile   |   1 +
>  drivers/gpu/drm/bridge/cadence/cdns-dp-core.c | 530 ++
>  .../gpu/drm/bridge/cadence/cdns-mhdp-audio.c  | 100 
>  .../gpu/drm/bridge/cadence/cdns-mhdp-common.c |  42 +-
>  .../gpu/drm/bridge/cadence/cdns-mhdp-common.h |   3 +
>  drivers/gpu/drm/bridge/cadence/cdns-mhdp-dp.c |  34 +-
>  drivers/gpu/drm/rockchip/cdn-dp-core.c|   7 +-
>  include/drm/bridge/cdns-mhdp.h|  52 +-
>  9 files changed, 740 insertions(+), 33 deletions(-)
>  create mode 100644 drivers/gpu/drm/bridge/cadence/cdns-dp-core.c
> 
> diff --git a/drivers/gpu/drm/bridge/cadence/Kconfig b/drivers/gpu/drm/bridge/cadence/Kconfig
> index 48c1b0f77dc6..b7b8d30b18b6 100644
> --- a/drivers/gpu/drm/bridge/cadence/Kconfig
> +++ b/drivers/gpu/drm/bridge/cadence/Kconfig
> @@ -5,3 +5,7 @@ config DRM_CDNS_MHDP
>   depends on OF
>   help
> Support Cadence MHDP API library.
> +
> +config DRM_CDNS_DP
> + tristate "Cadence DP DRM driver"
> + depends on DRM_CDNS_MHDP
> diff --git a/drivers/gpu/drm/bridge/cadence/Makefile b/drivers/gpu/drm/bridge/cadence/Makefile
> index ddb2ba4fb852..cb3c88311a64 100644
> --- a/drivers/gpu/drm/bridge/cadence/Makefile
> +++ b/drivers/gpu/drm/bridge/cadence/Makefile
> @@ -1,3 +1,4 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  cdns_mhdp_drmcore-y := cdns-mhdp-common.o cdns-mhdp-audio.o cdns-mhdp-dp.o
> +cdns_mhdp_drmcore-$(CONFIG_DRM_CDNS_DP) += cdns-dp-core.o
>  obj-$(CONFIG_DRM_CDNS_MHDP)  += cdns_mhdp_drmcore.o
> diff --git a/drivers/gpu/drm/bridge/cadence/cdns-dp-core.c b/drivers/gpu/drm/bridge/cadence/cdns-dp-core.c
> new file mode 100644
> index ..b2fe8fdc64ed
> --- /dev/null
> +++ b/drivers/gpu/drm/bridge/cadence/cdns-dp-core.c
> @@ -0,0 +1,530 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Cadence Display Port Interface (DP) driver
> + *
> + * Copyright (C) 2019-2020 NXP Semiconductor, Inc.
> + *
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "cdns-mhdp-common.h"
> +
> +/*
> + * This function only implements native DPDC reads and writes
> + */
> +static ssize_t dp_aux_transfer(struct drm_dp_aux *aux,
> +		struct drm_dp_aux_msg *msg)
> +{
> +	struct cdns_mhdp_device *mhdp = dev_get_drvdata(aux->dev);
> +	bool native = msg->request & (DP_AUX_NATIVE_WRITE & DP_AUX_NATIVE_READ);
> +	int ret;
> +
> +	/* Ignore address only message */
> +	if ((msg->size == 0) || (msg->buffer == NULL)) {
> +		msg->reply = native ?
> +			DP_AUX_NATIVE_REPLY_ACK : DP_AUX_I2C_REPLY_ACK;
> +		return msg->size;
> +	}
> +
> +	if (!native) {
> +		dev_err(mhdp->dev, "%s: only native messages supported\n", __func__);
> +		return -EINVAL;
> +	}
> +
> +	/* msg sanity check */
> +	if (msg->size > DP_AUX_MAX_PAYLOAD_BYTES) {
> +		dev_err(mhdp->dev, "%s: invalid msg: size(%zu), request(%x)\n",
> +			__func__, msg->size, (unsigned int)msg->request);
> +		return -EINVAL;
> +	}
> +
> +	if (msg->request == DP_AUX_NATIVE_WRITE) {
> +		const u8 *buf = msg->buffer;
> +		int i;
> +		for (i = 0; i < msg->size; ++i) {
> +			ret = cdns_mhdp_dpcd_write(mhdp,
> +					msg->address + i, buf[i]);
> +			if (!ret)
> +				continue;
> +
> +			DRM_DEV_ERROR(mhdp->dev, "Failed to write DPCD\n");
> +
> +			return ret;
> +		}
> +		msg->reply = DP_AUX_NATIVE_REPLY_ACK;
> +		return msg->size;
> +	}
> +
> +	if (msg->request == DP_AUX_NATIVE_READ) {
> +		ret = cdns_mhdp_dpcd_read(mhdp, msg->address, msg->buffer, msg->size);
> +		if (ret < 0)
> +			return -EIO;
> +		msg->reply = DP_AUX_NATIVE_REPLY_ACK;
> +		return msg->size;
> +	}
> +	return 0;
> +}
> +
> +static int dp_aux_init(struct cdns_mhdp_device *mhdp,
> +		       struct device *dev)
> +{
> +	int ret;
> +
> +	mhdp->dp.aux.name = "imx_dp_aux";
> +	mhdp->dp.aux.dev = dev;
> +	mhdp->dp.aux.transfer = dp_aux_transfer;
> +
> + ret = 
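The `native` test in `dp_aux_transfer()` is easy to misread: `DP_AUX_NATIVE_WRITE & DP_AUX_NATIVE_READ` evaluates to 0x8, so the expression only checks bit 3 of the request, the bit that separates native AUX requests from I2C-over-AUX ones. A minimal standalone sketch of the same check follows; the request-code values are copied from `include/drm/drm_dp_helper.h` so it compiles outside the kernel, and the helper name `aux_request_is_native()` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* AUX request codes, values as in include/drm/drm_dp_helper.h. */
#define DP_AUX_I2C_WRITE	0x0
#define DP_AUX_I2C_READ		0x1
#define DP_AUX_I2C_MOT		0x4
#define DP_AUX_NATIVE_WRITE	0x8
#define DP_AUX_NATIVE_READ	0x9

/* The mask used in the patch collapses to a single bit (bit 3). */
static_assert((DP_AUX_NATIVE_WRITE & DP_AUX_NATIVE_READ) == 0x8,
	      "native mask is bit 3");

/* Hypothetical helper: true for native AUX requests, false for I2C-over-AUX. */
static bool aux_request_is_native(uint8_t request)
{
	return request & (DP_AUX_NATIVE_WRITE & DP_AUX_NATIVE_READ);
}
```

Writing the mask as an explicit bit-3 test, or comparing against the two native request codes, would state the intent more directly.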

Re: [PATCH v5 0/3] close_range()

2020-06-02 Thread Christian Brauner
On Tue, Jun 02, 2020 at 02:03:09PM -0700, Linus Torvalds wrote:
> On Tue, Jun 2, 2020 at 1:42 PM Christian Brauner
>  wrote:
> >
> > This is a resend of the close_range() syscall, as discussed in [1]. There
> > weren't any outstanding discussions anymore and this was in mergeable
> > shape. I simply hadn't gotten around to moving this into my for-next the
> > last few cycles and then forgot about it. Thanks to Kyle and the Python
> > people, and others for consistently reminding me before every merge window
> > and mea culpa for not moving on this sooner. I plan on moving this into
> > for-next after v5.8-rc1 has been released and targeting the v5.9 merge
> > window.
> 
> Btw, I did have one reaction that I can't find in the original thread,
> which probably means that it got lost.
> 
> If one of the designed uses for this is for dropping file descriptors
> just before execve(), it's possible that we'd want to have the option
> to say "unshare my fd array" as part of close_range().
> 
> Yes, yes, you can do
> 
> unshare(CLONE_FILES);
> close_range(3,~0u);
> 
> to do it as two operations (and you had that as the example typical
> use), but it would actually be better to be able to do
> 
> close_range(3, ~0ul, CLOSE_RANGE_UNSHARE);
> 
> instead. Because otherwise we just waste time copying the file
> descriptors first in the unshare, and then closing them after.. Double
> the work..
> 
> And maybe this _did_ get mentioned last time, and I just don't find
> it. I also don't see anything like that in the patches, although the
> flags argument is there.

I spent some good time digging and I couldn't find this mentioned
anywhere so maybe it just never got sent to the list?
It sounds pretty useful, so yeah let me add a patch for this tomorrow.

Christian
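For reference, the two-step sequence in Linus's message can be approximated from userspace today. Because close_range() is not merged at this point, the sketch below substitutes a plain close() loop bounded by the soft RLIMIT_NOFILE limit; the helper name `unshare_and_close_from()` and the 1 << 20 cap are assumptions for illustration only. It also shows where the doubled work comes from: when the descriptor table is shared, `unshare(CLONE_FILES)` copies it, and the loop then closes most of what was just copied.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>		/* unshare(), CLONE_FILES */
#include <sys/resource.h>	/* getrlimit(), RLIMIT_NOFILE */
#include <unistd.h>		/* close(), pipe(), dup() */

/* Hypothetical helper approximating
 *     unshare(CLONE_FILES);
 *     close_range(lowfd, ~0U);
 * with close() calls. Returns 0 on success, -1 on error. */
static int unshare_and_close_from(int lowfd)
{
	struct rlimit rl;
	rlim_t fd, max;

	assert(lowfd >= 0);

	/* Get a private descriptor table (copied if it was shared). */
	if (unshare(CLONE_FILES) < 0)
		return -1;
	if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
		return -1;

	/* Descriptors at or above the soft limit (possible if the limit was
	 * lowered after they were opened) are missed; the cap is arbitrary. */
	max = rl.rlim_cur;
	if (max == RLIM_INFINITY || max > (rlim_t)1 << 20)
		max = (rlim_t)1 << 20;

	for (fd = (rlim_t)lowfd; fd < max; fd++)
		close((int)fd);	/* EBADF for unused slots is ignored */
	return 0;
}
```

Every slot up to the limit costs a system call here, which is part of why a single kernel-side operation is attractive.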


Re: [PATCH] hwmon: bt1-pvt: Declare Temp- and Volt-to-N poly when alarms are enabled

2020-06-02 Thread Serge Semin
On Tue, Jun 02, 2020 at 07:07:46AM -0700, Guenter Roeck wrote:
> On Tue, Jun 02, 2020 at 12:12:19PM +0300, Serge Semin wrote:
> > Clang-based kernel building with W=1 warns that some static const
> > variables are unused:
> > 
> > drivers/hwmon/bt1-pvt.c:67:30: warning: unused variable 'poly_temp_to_N' [-Wunused-const-variable]
> > static const struct pvt_poly poly_temp_to_N = {
> >  ^
> > drivers/hwmon/bt1-pvt.c:99:30: warning: unused variable 'poly_volt_to_N' [-Wunused-const-variable]
> > static const struct pvt_poly poly_volt_to_N = {
> >  ^
> > 
> > Indeed these polynomials are utilized only when the PVT sensor alarms are
> > enabled. In that case they are used to convert the temperature and
> > voltage alarm limits from normal quantities (Volts and degree Celsius) to
> > the sensor data representation N = [0, 1023]. Otherwise when alarms are
> > disabled the driver only does the detected data conversion to the human
> > readable form and doesn't need those polynomials defined. So let's declare
> > the Temp-to-N and Volt-to-N polynomials only if the PVT alarms are
> > switched on at compile-time.
> > 
> > Note gcc with W=1 doesn't notice the problem.
> > 
> > Fixes: 87976ce2825d ("hwmon: Add Baikal-T1 PVT sensor driver")
> > Reported-by: kbuild test robot 
> > Signed-off-by: Serge Semin 
> > Cc: Maxim Kaurkin 
> > Cc: Alexey Malahov 
> 

> I don't really like the added #if. Can you use __maybe_unused instead ?

Ok. __maybe_unused is much better. Thanks for suggestion.

-Sergey

> 
> Thanks,
> Guenter
> 
> > ---
> >  drivers/hwmon/bt1-pvt.c | 4 
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/hwmon/bt1-pvt.c b/drivers/hwmon/bt1-pvt.c
> > index 1a9772fb1f73..1a5212c04549 100644
> > --- a/drivers/hwmon/bt1-pvt.c
> > +++ b/drivers/hwmon/bt1-pvt.c
> > @@ -64,6 +64,7 @@ static const struct pvt_sensor_info pvt_info[] = {
> >   * 48380,
> >   * where T = [-48380, 147438] mC and N = [0, 1023].
> >   */
> > +#if defined(CONFIG_SENSORS_BT1_PVT_ALARMS)
> >  static const struct pvt_poly poly_temp_to_N = {
> > .total_divider = 1,
> > .terms = {
> > @@ -74,6 +75,7 @@ static const struct pvt_poly poly_temp_to_N = {
> > {0, 1720400, 1, 1}
> > }
> >  };
> > +#endif /* CONFIG_SENSORS_BT1_PVT_ALARMS */
> >  
> >  static const struct pvt_poly poly_N_to_temp = {
> > .total_divider = 1,
> > @@ -96,6 +98,7 @@ static const struct pvt_poly poly_N_to_temp = {
> >   * N = (18658e-3*V - 11572) / 10,
> >   * V = N * 10^5 / 18658 + 11572 * 10^4 / 18658.
> >   */
> > +#if defined(CONFIG_SENSORS_BT1_PVT_ALARMS)
> >  static const struct pvt_poly poly_volt_to_N = {
> > .total_divider = 10,
> > .terms = {
> > @@ -103,6 +106,7 @@ static const struct pvt_poly poly_volt_to_N = {
> > {0, -11572, 1, 1}
> > }
> >  };
> > +#endif /* CONFIG_SENSORS_BT1_PVT_ALARMS */
> >  
> >  static const struct pvt_poly poly_N_to_volt = {
> > .total_divider = 10,
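The `__maybe_unused` alternative Guenter suggests keeps the tables unconditional and only marks them, so builds without `CONFIG_SENSORS_BT1_PVT_ALARMS` stop warning without any `#if` blocks. A minimal userspace sketch of that pattern follows. It defines `__maybe_unused` the way the kernel does (as `__attribute__((__unused__))`) and uses simplified, hypothetical `poly`/`poly_term` types rather than the driver's `struct pvt_poly`; the coefficients are derived from the `N = (18658e-3*V - 11572) / 10` formula in the patch comment.

```c
#include <assert.h>
#include <stddef.h>

#ifndef __maybe_unused
#define __maybe_unused __attribute__((__unused__))
#endif

/* Simplified, hypothetical stand-ins for the driver's polynomial types. */
struct poly_term {
	unsigned int deg;	/* power of x */
	long coef;
	long divider;
};

struct poly {
	long total_divider;
	struct poly_term terms[2];
};

/* N = (18658e-3 * V - 11572) / 10. Only alarm code would use this table;
 * __maybe_unused keeps -Wunused-const-variable quiet when it is compiled out. */
static const struct poly poly_volt_to_N __maybe_unused = {
	.total_divider = 10,
	.terms = {
		{ 1, 18658, 1000 },
		{ 0, -11572, 1 },
	},
};

/* Evaluate sum(coef * x^deg / divider) / total_divider in integer math. */
static long __maybe_unused poly_eval(const struct poly *p, long x)
{
	long sum = 0;
	size_t i;

	assert(p->total_divider != 0);
	for (i = 0; i < sizeof(p->terms) / sizeof(p->terms[0]); i++) {
		long v = p->terms[i].coef;
		unsigned int d;

		for (d = 0; d < p->terms[i].deg; d++)
			v *= x;
		sum += v / p->terms[i].divider;
	}
	return sum / p->total_divider;
}

#if defined(CONFIG_SENSORS_BT1_PVT_ALARMS)
/* Alarm-limit conversion would call poly_eval(&poly_volt_to_N, ...) here. */
#endif
```

Both configurations then compile the same declarations, and an optimizing compiler typically discards a static const table that nothing references.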


Re: [PATCH 1/4] drivers: clk: qcom: Add msm8992 GCC driver

2020-06-02 Thread Bryan O'Donoghue

On 31/05/2020 18:46, Konrad Dybcio wrote:


+static struct clk_fixed_factor xo = {
+   .mult = 1,
+   .div = 1,
+   .hw.init = &(struct clk_init_data)
+   {
+   .name = "xo",
+   .parent_names = (const char *[]) { "xo_board" },
+   .num_parents = 1,
+   .ops = _fixed_factor_ops,
+   },
+};


I think you can drop that, and use the DTS definition.

xo_board: xo_board {
compatible = "fixed-clock";
#clock-cells = <0>;
clock-frequency = <1920>;
};

sleep_clk: sleep_clk {
compatible = "fixed-clock";
#clock-cells = <0>;
clock-frequency = <32768>;
};

clock_gcc: clock-controller@fc40 {
compatible = "qcom,gcc-msm8994";
#clock-cells = <1>;
#reset-cells = <1>;
#power-domain-cells = <1>;
reg = <0xfc40 0x2000>;

+clock-names = "xo",
+  "sleep_clk";
+clocks = <&xo_board>,
+ <&sleep_clk>;

};



+static int gcc_msm8992_probe(struct platform_device *pdev)
+{
+   struct device *dev = &pdev->dev;
+   struct clk *clk;
+
+   clk = devm_clk_register(dev, );
+   if (IS_ERR(clk))
+   return PTR_ERR(clk);


You should drop this too.



+MODULE_ALIAS("platform:gcc-msm8992");


and that.

---
bod



Re: linux-next: manual merge of the block tree with the rdma tree

2020-06-02 Thread Jason Gunthorpe
On Wed, Jun 03, 2020 at 01:40:51AM +0300, Max Gurtovoy wrote:
> 
> On 6/3/2020 12:37 AM, Jens Axboe wrote:
> > On 6/2/20 1:09 PM, Jason Gunthorpe wrote:
> > > On Tue, Jun 02, 2020 at 01:02:55PM -0600, Jens Axboe wrote:
> > > > On 6/2/20 1:01 PM, Jason Gunthorpe wrote:
> > > > > On Tue, Jun 02, 2020 at 11:37:26AM +0300, Max Gurtovoy wrote:
> > > > > > On 6/2/2020 5:56 AM, Stephen Rothwell wrote:
> > > > > > > Hi all,
> > > > > > Hi,
> > > > > > 
> > > > > > This looks good to me.
> > > > > > 
> > > > > > Can you share a pointer to the tree so we can test it in our labs?
> > > > > > 
> > > > > > We need to re-test:
> > > > > > 
> > > > > > 1. srq per core
> > > > > > 
> > > > > > 2. srq per core + T10-PI
> > > > > > 
> > > > > > And both will run with shared CQ.
> > > > > Max, this is too much conflict to send to Linus between your own
> > > > > patches. I am going to drop the nvme part of this from RDMA.
> > > > > 
> > > > > Normally I don't like applying partial series, but due to this tree
> > > > > split, you can send the rebased nvme part through the nvme/block tree
> > > > > at rc1 in two weeks..
> 
> Yes, I'll send it in 2 weeks.
> 
> Actually I hoped the iSER patches for CQ pool would be sent in this series,
> but in the end they were not.
> 
> This way we could have taken only the iser part and the new API.
> 
> I saw the pulled version too late since I wasn't CCed on it and it was
> already merged before I had a chance to warn you about possible conflict.
> 
> I think in general we should try to add new RDMA APIs first with iSER/SRP
> and avoid conflicting trees.

If you are careful we can construct a shared branch and if Jens/etc is
willing he can pull the RDMA base code after RDMA merges the branch
and then apply the nvme parts. This is how things work with netdev

It is tricky and you have to plan for it during your submission step,
but we should be able to manage in most cases if this comes up more
often.

Jason


Re: [PATCH v5 1/3] open: add close_range()

2020-06-02 Thread Florian Weimer
* Christian Brauner:

> The performance is striking. For good measure, comparing the following
> simple close_all_fds() userspace implementation that is essentially just
> glibc's version in [6]:
>
> static int close_all_fds(void)
> {
> 	int dir_fd;
> 	DIR *dir;
> 	struct dirent *direntp;
>
> 	dir = opendir("/proc/self/fd");
> 	if (!dir)
> 		return -1;
> 	dir_fd = dirfd(dir);
> 	while ((direntp = readdir(dir))) {
> 		int fd;
> 		if (strcmp(direntp->d_name, ".") == 0)
> 			continue;
> 		if (strcmp(direntp->d_name, "..") == 0)
> 			continue;
> 		fd = atoi(direntp->d_name);
> 		if (fd == dir_fd || fd == 0 || fd == 1 || fd == 2)
> 			continue;
> 		close(fd);
> 	}
> 	closedir(dir);
> 	return 0;
> }
>

> [6]: 
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/grantpt.c;h=2030e07fa6e652aac32c775b8c6e005844c3c4eb;hb=HEAD#l17
>  Note that this is an internal implementation that is not exported.
>  Currently, libc seems to not provide an exported version of this
>  because of missing kernel support to do this.

Just to be clear, this code is not compiled into glibc anymore in
typical configurations.  I have posted a patch to turn grantpt into a
no-op: 

I'm not entirely convinced that it's safe to keep iterating over
/proc/self/fd while also closing descriptors.  Ideally, I think an
application should call getdents64, process the file names for
descriptors in the buffer, and if any have been closed, seek to zero
before the next getdents64 call.  Maybe procfs is different, but with
other file systems, unlinking files can trigger directory reordering,
and then you get strange effects.  The d_ino behavior for
/proc/self/fd is a bit strange as well (it's not consistently
descriptor plus 3).
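The getdents64 pattern described above (read a batch, close the descriptors it names, and seek back to offset zero before the next read whenever anything was closed) can be sketched in C as follows. This is not glibc code: the helper name `close_fds_from()`, the 4 KiB buffer, and skipping names that do not parse as numbers are assumptions, and it calls the raw system call through `syscall(SYS_getdents64, ...)` with a locally declared `struct linux_dirent64`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>		/* open(), O_DIRECTORY */
#include <stdint.h>
#include <stdlib.h>		/* strtol() */
#include <sys/syscall.h>	/* SYS_getdents64 */
#include <unistd.h>		/* syscall(), close(), lseek() */

/* Record layout returned by getdents64(2), declared locally because not
 * every C library exports it. */
struct linux_dirent64 {
	uint64_t	d_ino;
	int64_t		d_off;
	unsigned short	d_reclen;
	unsigned char	d_type;
	char		d_name[];
};

/* Hypothetical helper: close every descriptor >= lowfd by listing
 * /proc/self/fd. Returns 0 on success, -1 on error. */
static int close_fds_from(int lowfd)
{
	uint64_t buf[512];	/* 4 KiB, 8-byte aligned for the records */
	int dfd;

	assert(lowfd >= 0);
	dfd = open("/proc/self/fd", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (dfd < 0)
		return -1;

	for (;;) {
		long n = syscall(SYS_getdents64, dfd, buf, sizeof(buf));
		int closed_any = 0;
		long off = 0;

		if (n < 0)
			goto fail;
		if (n == 0)
			break;	/* end of directory */

		/* Handle every name in this batch before reading again. */
		while (off < n) {
			struct linux_dirent64 *d =
				(struct linux_dirent64 *)((char *)buf + off);
			char *end;
			long fd;

			off += d->d_reclen;
			fd = strtol(d->d_name, &end, 10);
			if (end == d->d_name || *end != '\0')
				continue;	/* "." and ".." */
			if (fd < lowfd || fd == dfd)
				continue;
			close((int)fd);
			closed_any = 1;
		}

		/* Entries were removed: rescan from offset 0 rather than
		 * trusting the current directory position. */
		if (closed_any && lseek(dfd, 0, SEEK_SET) < 0)
			goto fail;
	}
	close(dfd);
	return 0;

fail:
	close(dfd);
	return -1;
}
```

Restarting from zero costs extra passes over the entries that are kept (descriptors below `lowfd` and the directory descriptor itself), but it never relies on the directory offset staying meaningful after entries disappear.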


Re: [PATCH net-next v5 4/4] net: dp83869: Add RGMII internal delay configuration

2020-06-02 Thread Dan Murphy

Florian

On 6/2/20 6:13 PM, Florian Fainelli wrote:


On 6/2/2020 4:10 PM, Dan Murphy wrote:

Florian

On 6/2/20 5:33 PM, Florian Fainelli wrote:

On 6/2/2020 9:45 AM, Dan Murphy wrote:

Add RGMII internal delay configuration for Rx and Tx.

Signed-off-by: Dan Murphy 
---

[snip]


+
   enum {
   DP83869_PORT_MIRRORING_KEEP,
   DP83869_PORT_MIRRORING_EN,
@@ -108,6 +113,8 @@ enum {
   struct dp83869_private {
   int tx_fifo_depth;
   int rx_fifo_depth;
+    s32 rx_id_delay;
+    s32 tx_id_delay;
   int io_impedance;
   int port_mirroring;
   bool rxctrl_strap_quirk;
@@ -232,6 +239,22 @@ static int dp83869_of_init(struct phy_device *phydev)
    &dp83869->tx_fifo_depth))
   dp83869->tx_fifo_depth = DP83869_PHYCR_FIFO_DEPTH_4_B_NIB;
 
+    ret = of_property_read_u32(of_node, "rx-internal-delay-ps",
+   &dp83869->rx_id_delay);
+    if (ret) {
+    dp83869->rx_id_delay =
+    dp83869_internal_delay[DP83869_CLK_DELAY_DEF];
+    ret = 0;
+    }
+
+    ret = of_property_read_u32(of_node, "tx-internal-delay-ps",
+   &dp83869->tx_id_delay);
+    if (ret) {
+    dp83869->tx_id_delay =
+    dp83869_internal_delay[DP83869_CLK_DELAY_DEF];
+    ret = 0;
+    }

It is still not clear to me why the parsing is not done by the PHY
library helper directly.

Why would we do that for these properties and not any other?

Those properties have a standard name, which makes them suitable for
parsing by the core PHY library.

Unless there is a new precedent being set here by having the PHY
framework do all the DT node parsing for common properties.

You could parse the vendor properties through the driver, let the PHY
library parse the standard properties, and resolve any ordering
precedence within the driver. In general, I would favor standard
properties over vendor properties.

Does this help?
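The split suggested above (standard properties parsed by the PHY library, vendor properties by the driver, with the driver settling precedence in favor of the standard ones) can be sketched as below, together with the kind of table lookup the patch's `dp83869_internal_delay[]` array suggests. Everything here is hypothetical: the function names, the table contents (250 ps steps up to 4000 ps) and the default index are assumptions, not taken from the DP83869 driver or from any existing PHY library helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table of selectable internal delays in ps, one entry per
 * register step (not taken from the DP83869 datasheet). */
static const int32_t delay_table_ps[] = {
	250,  500,  750,  1000, 1250, 1500, 1750, 2000,
	2250, 2500, 2750, 3000, 3250, 3500, 3750, 4000,
};
#define DELAY_TABLE_LEN (sizeof(delay_table_ps) / sizeof(delay_table_ps[0]))
#define DELAY_DEFAULT_IDX 7	/* hypothetical default: 2000 ps */

static_assert(DELAY_DEFAULT_IDX < DELAY_TABLE_LEN, "default out of range");

/* Precedence: standard property, then vendor property, then table default. */
static int32_t resolve_delay_ps(bool has_std, int32_t std_ps,
				bool has_vendor, int32_t vendor_ps)
{
	if (has_std)
		return std_ps;
	if (has_vendor)
		return vendor_ps;
	return delay_table_ps[DELAY_DEFAULT_IDX];
}

/* Map a delay in ps to its register step, or -EINVAL if unsupported. */
static int delay_ps_to_index(int32_t delay_ps)
{
	size_t i;

	for (i = 0; i < DELAY_TABLE_LEN; i++)
		if (delay_table_ps[i] == delay_ps)
			return (int)i;
	return -EINVAL;
}
```

Rejecting values that are not in the table (rather than rounding to the nearest step) is one possible policy; a driver could equally pick the closest supported delay.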


Ok, so a new precedent then.

Because there are common properties like tx-fifo-depth, rx-fifo-depth,
enet-phy-lane-swap and max-speed that the PHY framework should parse as
well.


Dan



Re: [GIT PULL] x86/mm changes for v5.8

2020-06-02 Thread Linus Torvalds
On Tue, Jun 2, 2020 at 4:01 PM Singh, Balbir  wrote:
>
> >  (c) and if I read the code correctly, trying to flush the L1D$ on
> > non-intel without the HW support, it causes a WARN_ON_ONCE()! WTF?
>
> That is not correct, the function only complains if we do a software fallback
> flush without allocating the flush pages.

Right.

And if you're not on Intel, then that allocation would never have been
done, since the allocation function returns an error for non-intel
systems.

> That function is not exposed without
> the user using the prctl() API, which allocates those flush pages.

See above: it doesn't actually allocate those pages on anything but Intel CPUs.

That said, looking deeper, it then does look like a
l1d_flush_init_once() failure will also cause the code to avoid
setting the TIF_SPEC_L1D_FLUSH bit, so non-Intel CPUs will never call
the actual flushing routines, and thus never hit the WARN_ON. Ok.

> >  (2) the HW case is done for any vendor, if it reports the "I have the MSR"
>
> No, l1d_flush_init_once() fails for users opting in via the prctl(), it
> succeeds for users of L1TF.

Yeah, again it looks like this all is basically just a hack for Intel CPUs.

It should never have been conditional on "do this on Intel".

It should have been conditional on the L1TF bug.

Yes, there's certainly overlap there, but it's not complete.

> >  (3) the VMX support certainly has various sanity checks like "oh, CPU
> > doesn't have X86_BUG_L1TF, then I won't do this even if there was some
> > kernel command line to say I should". But the new prctl doesn't have
> > anything like that. It just enables that L1D$ thing mindlessly,
> > thinking that user-land software somehow knows what it's doing. BS.
>
> So you'd like to see a double opt-in?

I'd like it to be gated on being sane by default, together with some
system option like we have for pretty much all the mitigations.

> Unfortunately there is no gating
> of the bug and I tried to make it generic - clearly calling it opt-in
> flushing for the paranoid, for those who really care about CVE-2020-0550.

No, you didn't make it generic at all - you made it depend on
X86_VENDOR_INTEL instead.

So now the logic is "on Intel, do this thing whether it makes sense or
not, on other vendors, never do it whether it _would_ make sense or
not".

That to me is not sensible. I just don't see the logic.

This feature should never be enabled unless X86_BUG_L1TF is on, as far
as I can tell.

And it should never be enabled if SMT is on.

At that point, it at least starts making sense. Maybe we don't need
any further admin options at that point.

> Would this make you happier?
>
> 1. Remove SW fallback flush
> 2. Implement a double opt-in (CAP_SYS_ADMIN for the prctl or a
>system wide disable)?
> 3. Ensure the flush happens only when the current core has
>SMT disabled

I think that (3) case should basically be "X86_BUG_L1TF && !SMT". That
should basically be the default setting for this.

The (2) thing I would prefer to just be the same kind of thing we do
for all the other mitigations: have a kernel command line to override
the defaults.

The SW fallback right now feels wrong to me. It does seem to be very
microarchitecture-specific and I'd really like to understand the
reason for the magic TLB filling. At the same time, if the feature is
at least enabled under sane and understandable circumstances, and
people have a way to turn it off, maybe I don't care too much.

  Linus

