date:20190528

[PATCH] usb: phy: mxs: Disable external charger detect in mxs_phy_hw_init()

2019-05-28 Thread Andrey Smirnov

Since this driver already handles charger detection state, copy the
workaround code currently residing in arch/arm/mach-imx/anatop.c into
this driver to consolidate the places modifying it.

Signed-off-by: Andrey Smirnov 
Cc: Chris Healy 
Cc: Felipe Balbi 
Cc: Shawn Guo 
Cc: Fabio Estevam 
Cc: NXP Linux Team 
Cc: linux-...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---

The intent of this patch is to consolidate all of the code manipulating
charge detection state in a single place, and if this patch is agreed
upon I plan to follow it up with this change:

https://github.com/ndreys/linux/commit/7248f2b85b4706760fd33d2ff970e2ea12d3bea7

Thanks,
Andrey Smirnov

 drivers/usb/phy/phy-mxs-usb.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/usb/phy/phy-mxs-usb.c b/drivers/usb/phy/phy-mxs-usb.c
index 1b1bb0ad40c3..6fa16ab31e2e 100644
--- a/drivers/usb/phy/phy-mxs-usb.c
+++ b/drivers/usb/phy/phy-mxs-usb.c
@@ -63,6 +63,7 @@
 
 #define ANADIG_USB1_CHRG_DETECT_SET         0x1b4
 #define ANADIG_USB1_CHRG_DETECT_CLR         0x1b8
+#define ANADIG_USB2_CHRG_DETECT_SET         0x214
 #define ANADIG_USB1_CHRG_DETECT_EN_B        BIT(20)
 #define ANADIG_USB1_CHRG_DETECT_CHK_CHRG_B  BIT(19)
 #define ANADIG_USB1_CHRG_DETECT_CHK_CONTACT BIT(18)
@@ -250,6 +251,19 @@ static int mxs_phy_hw_init(struct mxs_phy *mxs_phy)
if (mxs_phy->data->flags & MXS_PHY_NEED_IP_FIX)
writel(BM_USBPHY_IP_FIX, base + HW_USBPHY_IP_SET);
 
+   if (mxs_phy->regmap_anatop) {
+   unsigned int reg = mxs_phy->port_id ?
+   ANADIG_USB1_CHRG_DETECT_SET :
+   ANADIG_USB2_CHRG_DETECT_SET;
+   /*
+* The external charger detector needs to be disabled,
+* or the signal at DP will be poor
+*/
+   regmap_write(mxs_phy->regmap_anatop, reg,
+ANADIG_USB1_CHRG_DETECT_EN_B |
+ANADIG_USB1_CHRG_DETECT_CHK_CHRG_B);
+   }
+
mxs_phy_tx_init(mxs_phy);
 
return 0;
-- 
2.21.0

Re: [PATCH net-next] net: link_watch: prevent starvation when processing linkwatch wq

2019-05-28 Thread David Miller

From: Yunsheng Lin 
Date: Mon, 27 May 2019 09:47:54 +0800

> When user has configured a large number of virtual netdev, such
> as 4K vlans, the carrier on/off operation of the real netdev
> will also cause it's virtual netdev's link state to be processed
> in linkwatch. Currently, the processing is done in a work queue,
> which may cause worker starvation problem for other work queue.
> 
> This patch releases the cpu when link watch worker has processed
> a fixed number of netdev' link watch event, and schedule the
> work queue again when there is still link watch event remaining.
> 
> Signed-off-by: Yunsheng Lin 

Why not rtnl_unlock(); yield(); rtnl_lock(); every "100" events
processed?

That seems better than adding all of this overhead to reschedule the
workqueue every 100 items.
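
As a rough userspace illustration of the alternative suggested above (not kernel code), the sketch below keeps processing in one loop but drops the lock and yields after every fixed batch of events, instead of rescheduling the work item. The mock_*() functions and lw_process_events() are hypothetical stand-ins for rtnl_lock(), rtnl_unlock(), yield() and the linkwatch event handling; only the batch size of 100 comes from the message.

```c
#include <assert.h>
#include <stddef.h>

static int lw_lock_held;	/* 1 while the mock lock is held */
static size_t lw_handled;	/* number of events handled so far */

/* Hypothetical stand-ins for rtnl_lock(), rtnl_unlock() and yield(). */
static void mock_lock(void)   { assert(!lw_lock_held); lw_lock_held = 1; }
static void mock_unlock(void) { assert(lw_lock_held); lw_lock_held = 0; }
static void mock_yield(void)  { assert(!lw_lock_held); }

/* Hypothetical per-event handler; it must run with the lock held. */
static void mock_handle_event(void)
{
	assert(lw_lock_held);
	lw_handled++;
}

/*
 * Handle nevents events under the lock, releasing it and yielding after
 * every `budget` events. Returns how many times the lock was dropped.
 */
static size_t lw_process_events(size_t nevents, size_t budget)
{
	size_t i, yields = 0;

	assert(budget > 0);

	mock_lock();
	for (i = 0; i < nevents; i++) {
		mock_handle_event();
		/* Yield only between batches, not after the final event. */
		if ((i + 1) % budget == 0 && i + 1 < nevents) {
			mock_unlock();
			mock_yield();
			mock_lock();
			yields++;
		}
	}
	mock_unlock();
	return yields;
}
```

With 250 queued events and a budget of 100, lw_process_events(250, 100) drops the lock twice and never yields while holding it.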

[PATCH] ARM: dts: imx7d-sdb: Make SW2's voltage fixed

2019-05-28 Thread Anson . Huang

From: Anson Huang 

On i.MX7D SDB board, SW2 supplies a lot of peripheral devices,
its voltage should be fixed at 1.8V. The commit 43967d9b5a7c
("ARM: dts: imx7d-sdb: Assign corresponding power supply for LDOs")
assigns SW2 as the supplier of vdd1p0d, and when its consumers
pcie-phy/mipi-phy try to set the vdd1p0d to 1.0V, regulator core
will also set SW2 to its best(min) voltage to 1.5V, and it will
lead to board reset.

This patch makes SW2's voltage fixed at 1.8V to avoid this issue.

Fixes: 43967d9b5a7c ("ARM: dts: imx7d-sdb: Assign corresponding power supply for LDOs")
Reported-by: Leonard Crestez 
Signed-off-by: Anson Huang 
---
 arch/arm/boot/dts/imx7d-sdb.dts | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/imx7d-sdb.dts b/arch/arm/boot/dts/imx7d-sdb.dts
index efc83bc..a5365b8 100644
--- a/arch/arm/boot/dts/imx7d-sdb.dts
+++ b/arch/arm/boot/dts/imx7d-sdb.dts
@@ -263,8 +263,8 @@
};
 
sw2_reg: sw2 {
-   regulator-min-microvolt = <1500000>;
-   regulator-max-microvolt = <1850000>;
+   regulator-min-microvolt = <1800000>;
+   regulator-max-microvolt = <1800000>;
regulator-boot-on;
regulator-always-on;
};
-- 
2.7.4

Re: [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users

2019-05-28 Thread Masami Hiramatsu

On Wed, 22 May 2019 10:40:27 -0400
Steven Rostedt  wrote:

> On Wed, 22 May 2019 23:19:55 +0900
> Masami Hiramatsu  wrote:
> 
> > >  void *fgraph_reserve_data(int size_in_bytes)
> > > 
> > > Allows the entry function to reserve up to 4 words of data on
> > > the shadow stack. On success, a pointer to the contents is returned.
> > > This may be only called once per entry function.
> > > 
> > >  void *fgraph_retrieve_data(void)
> > > 
> > > Allows the return function to retrieve the reserved data that was
> > > allocated by the entry function.  
> > 
> > Nice! this seems good for kretprobe too. I'll review and try to port
> > kretprobe on this framework.
> 
> If you rather pull from my git repo and not download all the patches,
> they are currently available in my ftrace/fgraph-multi branch.

Hi Steve,

I found that these interfaces seem tightly coupled with fgraph_ops. But that
causes a problem when I'm using it from kretprobe.

kretprobe has 2 handlers, entry handler and return handler, and both need
pt_regs. But fgraph_ops's entryfunc and retfunc do not pass the pt_regs.
That is the biggest issue for me on these APIs.
Can we expand fgraph_ops with regs parameter?

Thank you,

-- 
Masami Hiramatsu

Re: [PATCH] net: dsa: mv88e6xxx: fix handling of upper half of STATS_TYPE_PORT

2019-05-28 Thread Rasmus Villemoes

On 28/05/2019 15.44, Andrew Lunn wrote:
> On Tue, May 28, 2019 at 01:17:10PM +0000, Rasmus Villemoes wrote:
>> Currently, the upper half of a 4-byte STATS_TYPE_PORT statistic ends
>> up in bits 47:32 of the return value, instead of bits 31:16 as they
>> should.
>>
>> Signed-off-by: Rasmus Villemoes 
> 
> Hi Rasmus
> 
> Please include a Fixes tag, to indicate where the problem was
> introduced. In this case, i think it was:
> 
> Fixes: 6e46e2d821bb ("net: dsa: mv88e6xxx: Fix u64 statistics")
> 
> And set the Subject to [PATCH net] to indicate this should be applied
> to the net tree.

Will do.

>> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c 
>> b/drivers/net/dsa/mv88e6xxx/chip.c
>> index 370434bdbdab..317553d2cb21 100644
>> --- a/drivers/net/dsa/mv88e6xxx/chip.c
>> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
>> @@ -785,7 +785,7 @@ static uint64_t _mv88e6xxx_get_ethtool_stat(struct 
>> mv88e6xxx_chip *chip,
>>  err = mv88e6xxx_port_read(chip, port, s->reg + 1, &reg);
>>  if (err)
>>  return U64_MAX;
>> -high = reg;
>> +low |= ((u32)reg) << 16;
>>  }
>>  break;
>>  case STATS_TYPE_BANK1:
> 
> What i don't like about this is how the function finishes:
> 
>   }
> value = (((u64)high) << 32) | low;
> return value;
> }
> 
> A better fix might be
> 
> - break
> + value = (((u64)high) << 16 | low;
> + return value;

Why? It's odd to have the u32 "high" sometimes represent the high 32
bits, sometimes the third 16 bits. It would make it harder to support an
8-byte STATS_TYPE_PORT statistic. I think the code is much cleaner if
each case is just responsible for providing the upper/lower 32 bits,
then have the common case combine them. It's just that in the
STATS_TYPE_BANK cases, the 32 bits are assembled from two 16 bit values
by a helper (mv88e6xxx_g1_stats_read), while it is "open-coded" in the
first case.
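
To make the bit positions concrete, here is a small userspace sketch (not the driver code itself). The function names and the sample register values 0x5678 and 0x1234 are made up for illustration; the final combine step mirrors the `(((u64)high) << 32) | low` expression quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Before the patch: the upper 16-bit half ends up in bits 47:32. */
static uint64_t stat_combine_buggy(uint16_t lo16, uint16_t hi16)
{
	uint32_t low = lo16;
	uint32_t high = hi16;	/* later treated as the upper 32 bits */

	return ((uint64_t)high << 32) | low;
}

/* After the patch: the upper half is folded into bits 31:16 of low. */
static uint64_t stat_combine_fixed(uint16_t lo16, uint16_t hi16)
{
	uint32_t low = lo16;
	uint32_t high = 0;

	low |= ((uint32_t)hi16) << 16;
	return ((uint64_t)high << 32) | low;
}

/* Worked example: lower register reads 0x5678, upper reads 0x1234. */
static void stat_combine_demo(void)
{
	assert(stat_combine_buggy(0x5678, 0x1234) == 0x0000123400005678ULL);
	assert(stat_combine_fixed(0x5678, 0x1234) == 0x0000000012345678ULL);
}
```

Keeping `high` as "the upper 32 bits" in every case is what lets the common combine step stay unchanged.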

I'll resend my patch with the fixes tag (thanks for finding that; I had
already dug way too deep past that one) and fixed subject.

Rasmus

>   Andrew  
> 

-- 
Rasmus Villemoes
Software Developer
Prevas A/S
Hedeager 3
DK-8200 Aarhus N
+45 51210274
rasmus.villem...@prevas.dk
www.prevas.dk

Re: [PATCH v4 00/30] coresight: Support for ACPI bindings

2019-05-28 Thread Leo Yan

On Tue, May 28, 2019 at 08:51:26AM -0600, Mathieu Poirier wrote:
> Good day,
> 
> On Tue, May 28, 2019 at 01:19:24PM +0800, Leo Yan wrote:
> > Hi Suzuki,
> > 
> > On Wed, May 22, 2019 at 11:34:33AM +0100, Suzuki K Poulose wrote:
> > > This series adds the support for CoreSight devices on ACPI based
> > > platforms. The device connections are encoded as _DSD graph property[0],
> > > with CoreSight specific extensions to indicate the direction of data
> > > flow as described in [1]. Components attached to CPUs are listed
> > > as child devices of the corresponding CPU, removing explicit links
> > > to the CPU like we do in the DT.
> > > 
> > > The majority of the series cleans up the driver and prepares the subsystem
> > > for platform agnostic firmware probing, naming scheme, searching etc.
> > > 
> > > We introduce platform independent helpers to parse the platform supplied
> > > information. Thus we rename the platform handling code from:
> > >   of_coresight.c  => coresight-platform.c
> > > 
> > > The CoreSight driver creates shadow devices that appear on the Coresight
> > > bus, in addition to the real devices (e.g, AMBA bus devices). The name
> > > of these devices match the real device. This makes the device name
> > > a bit cryptic for ACPI platform. So this series also introduces a generic
> > > platform agnostic device naming scheme for the shadow Coresight devices.
> > > Towards this we also make changes to the way we lookup devices to resolve
> > > the connections, as we can't use the names to identify the devices. So,
> > > we use the "fwnode_handle" of the real device for the device lookups.
> > > Towards that we clean up the drivers to keep track of the "CoreSight"
> > > device rather than the "real" device. However, all real operations,
> > > like DMA allocation, Power management etc. must be performed on
> > > the real device which is the parent of the shadow device.
> > > 
> > > Finally we add the support for parsing the ACPI platform data. The power
> > > management support is missing in the ACPI (and this is not specific to
> > > CoreSight). The firmware must ensure that the respective power domains
> > > are turned on.
> > > 
> > > Applies on v5.2-rc1
> > > 
> > > Tested on a Juno-r0 board with ACPI bindings patch (Patch 31/30) added on
> > > top of [2]. You would need to make sure that the debug power domain is
> > > turned on before the Linux kernel boots. (e.g, connect the DS-5 to the
> > > Juno board while at UEFI). arm32 code is only compile tested.
> > 
> > After I applied this patch set, I found all device names under
> > '/sys/bus/event_source/devices/cs_etm/sinks/' have been changed as
> > below on my DB410c board:
> > # ls /sys/bus/event_source/devices/cs_etm/sinks/
> > tmc_etf0  tmc_etr0  tpiu0
> 
> Yes, that is the expected behavior.
> 
> > 
> > This leads to below command failure when open PMU device:
> > # perf record -e cs_etm/@826000.etr/ --per-thread uname
> > failed to set sink "826000.etr" on event cs_etm/@826000.etr/ with 2 (No 
> > such file or directory)
> 
> Correct.
> 
> > 
> > I must use below command so that perf can match string with the
> > device name under '/sys/bus/event_source/devices/cs_etm/sinks/':
> > # perf record -e cs_etm/@tmc_etr0/ --per-thread uname
> 
> Correct.
> 
> > 
> > Seems to me, this is an unexpected change and when I worked on the
> > patch set v2, IIRC that version still can use '826000.etr' to open PMU
> > device.
> 
> Correct - v2 did not address the new naming convention for devices present 
> under
> 'event_source', something that was corrected in v3.

Thanks for confirmation, Mathieu.

Re: linux-next: Tree for May 28 (platform/olpc/olpc-xo175-ec)

2019-05-28 Thread Lubomir Rintel

On Tue, 2019-05-28 at 11:05 -0700, Randy Dunlap wrote:
> On 5/27/19 9:58 PM, Stephen Rothwell wrote:
> > Hi all,
> > 
> > Changes since 20190524:
> > 
> 
> on x86, there are some issues with drivers/platform/olpc/olpc-xo175-ec.c:
> 
> a. when CONFIG_SPI is not set/enabled:
> 
> WARNING: unmet direct dependencies detected for SPI_SLAVE
>   Depends on [n]: SPI [=n]
>   Selected by [y]:
>   - OLPC_XO175_EC [=y] && (ARCH_MMP || COMPILE_TEST [=y])
> 
> ld: drivers/platform/olpc/olpc-xo175-ec.o: in function `olpc_xo175_ec_remove':
> olpc-xo175-ec.c:(.text+0x79): undefined reference to `spi_slave_abort'
> ld: drivers/platform/olpc/olpc-xo175-ec.o: in function 
> `olpc_xo175_ec_send_command':
> olpc-xo175-ec.c:(.text+0x24d): undefined reference to `spi_async'
> ld: drivers/platform/olpc/olpc-xo175-ec.o: in function `olpc_xo175_ec_cmd':
> olpc-xo175-ec.c:(.text+0xb3c): undefined reference to `spi_slave_abort'
> ld: drivers/platform/olpc/olpc-xo175-ec.o: in function 
> `olpc_xo175_ec_spi_driver_init':
> olpc-xo175-ec.c:(.init.text+0xa): undefined reference to 
> `__spi_register_driver'
> 
> b. when CONFIG_INPUT is not set/enabled:
> 
> ERROR: "input_register_device" [drivers/platform/olpc/olpc-xo175-ec.ko] 
> undefined!
> ERROR: "input_set_capability" [drivers/platform/olpc/olpc-xo175-ec.ko] 
> undefined!
> ERROR: "devm_input_allocate_device" [drivers/platform/olpc/olpc-xo175-ec.ko] 
> undefined!
> ERROR: "input_event" [drivers/platform/olpc/olpc-xo175-ec.ko] undefined!
> 
> c. when some power mgt. Kconfig symbol is not set/enabled:
> 
> ERROR: "power_supply_put" [drivers/platform/olpc/olpc-xo175-ec.ko] undefined!
> ERROR: "power_supply_changed" [drivers/platform/olpc/olpc-xo175-ec.ko] 
> undefined!
> ERROR: "power_supply_get_by_name" [drivers/platform/olpc/olpc-xo175-ec.ko] 
> undefined!
> 
> d. drivers/platform/olpc/Kconfig needs to use "menuconfig" like all of the 
> other
>Kconfig files in drivers/platform/ so that its menu is listed in the 
> correct
>place in *config interfaces.

Hi

Thanks for the heads up.

I think YueHaibing sent in patches for a. and
b. -- I'll follow up with the fixes for the rest.

> :(

:(

Lubo

Re: [GIT PULL] pin control fixes for v5.2

2019-05-28 Thread Linus Walleij

On Tue, May 28, 2019 at 6:44 PM Linus Torvalds
 wrote:
> On Tue, May 28, 2019 at 1:44 AM Linus Walleij  
> wrote:
> >
> > The outstanding commits are the Intel fixes [..]
>
> Heh. Swedism? "Outstanding" in English means "exceptionally good". I
> suspect you meant commits that "står ut", which translates to "stands
> out".

Dammit it is a Swedishism of course, it happens when I'm stressed.
Luckily there is another one on the other end.

Thanks,
Linus Walleij

[PATCH 1/1] scsi: esas2r: esas2r_init: check return value

2019-05-28 Thread Xidong Wang

In esas2r_resume(), the return value of pci_enable_device() is not
checked before pdev is used.

Signed-off-by: Xidong Wang 
---
 drivers/scsi/esas2r/esas2r_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/esas2r/esas2r_init.c 
b/drivers/scsi/esas2r/esas2r_init.c
index 950cd92..883d35f 100644
--- a/drivers/scsi/esas2r/esas2r_init.c
+++ b/drivers/scsi/esas2r/esas2r_init.c
@@ -686,6 +686,9 @@ int esas2r_resume(struct pci_dev *pdev)
esas2r_log_dev(ESAS2R_LOG_INFO, &(pdev->dev),
   "pci_enable_device() called");
rez = pci_enable_device(pdev);
+   if (rez < 0) {
+   goto error_exit;
+   }
pci_set_master(pdev);
 
if (!a) {
-- 
2.7.4

Re: [PATCH] wd719x: pass GFP_ATOMIC instead of GFP_KERNEL

2019-05-28 Thread Christoph Hellwig

On Wed, May 29, 2019 at 07:05:40AM +0530, Hariprasad Kelam wrote:
> wd719x_chip_init is getting called in interrupt disabled
> mode(spin_lock_irqsave) , so we need to GFP_ATOMIC instead
> of GFP_KERNEL.
> 
> Issue identified by coccicheck

I don't think request_firmware is any more happy being called under
a spinlock.  The right fix is to not hold a spinlock over the board
initialization.

Re: [PATCH 1/1] drm/panel: truly: Add additional delay after pulling down reset gpio

2019-05-28 Thread Vivek Gautam

On 5/28/2019 2:13 PM, Marc Gonzalez wrote:

On 27/05/2019 12:26, Vivek Gautam wrote:


MTP SDM845 panel seems to need additional delay to bring panel
to a workable state. Running modetest without this change displays
blurry artifacts.

Signed-off-by: Vivek Gautam 
---
  drivers/gpu/drm/panel/panel-truly-nt35597.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/panel/panel-truly-nt35597.c 
b/drivers/gpu/drm/panel/panel-truly-nt35597.c
index fc2a66c53db4..aa7153fd3be4 100644
--- a/drivers/gpu/drm/panel/panel-truly-nt35597.c
+++ b/drivers/gpu/drm/panel/panel-truly-nt35597.c
@@ -280,6 +280,7 @@ static int truly_35597_power_on(struct truly_nt35597 *ctx)
gpiod_set_value(ctx->reset_gpio, 1);
usleep_range(10000, 20000);
gpiod_set_value(ctx->reset_gpio, 0);
+   usleep_range(10000, 20000);

I'm not sure usleep_range() makes sense with these values.

AFAIU, usleep_range() is typically used for sub-jiffy sleeps, and is based
on HRT to generate an interrupt.

Once we get into jiffy granularity, it seems to me msleep() is good enough.
IIUC, it would piggy-back on the jiffy timer interrupt.

In short, why not just use msleep(10); ?


I am just maintaining the symmetry across older code.

Thanks
Vivek


Regards.

[PATCH] vmalloc: Don't use flush flag when no exec perm

2019-05-28 Thread Rick Edgecombe

The addition of VM_FLUSH_RESET_PERMS for BPF JIT allocations was
bisected to prevent boot on an UltraSparc III machine. It was found that
sometime shortly after the TLB flush this flag does on vfree of the BPF
program, the machine hung. Further investigation showed that before any of
the changes for this flag were introduced, with CONFIG_DEBUG_PAGEALLOC
configured (which does a similar TLB flush of the vmalloc range on
every vfree), this machine also hung shortly after the first vmalloc
unmap/free.

So the evidence points to there being some existing issue with the
vmalloc TLB flushes, but it's still unknown exactly why these hangs are
happening on sparc. It is also unknown when someone with this hardware
could resolve this, and in the meantime using this flag on it turns a
lurking behavior into something that prevents boot.

However Linux on sparc64 doesn't restrict executable permissions and so
there is actually not really a need to use this flag. If normal memory is
executable, any memory copied from the user could be executed without any
extra steps. There also isn't a need to reset direct map permissions. So
to work around this issue we can just not use the flag in these cases.

So change the helper that sets this flag to simply not set it if the
architecture has these properties. Do this by comparing if PAGE_KERNEL is
the same as PAGE_KERNEL_EXEC. Also make the logic always do the flush if
an architecture has a way to reset direct map permissions by checking
CONFIG_ARCH_HAS_SET_DIRECT_MAP. Place the helper in vmalloc.c to work
around header dependency issues. Also, remove VM_FLUSH_RESET_PERMS from
vmalloc_exec() so it doesn't get set unconditionally anywhere.

Note, today arm has direct map permissions and no
CONFIG_ARCH_HAS_SET_DIRECT_MAP, but it also restricts executable
permissions so this logic will work today. When arm adds
set_direct_map_() implementations and removes the set_memory_() block
from vm_remove_mappings() as currently proposed, then this will be correct
as well.
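
The decision described above can be modelled in a few lines of userspace C. This is only a sketch of the stated rule, not the helper from the patch: `struct arch_props`, `vm_flush_needed()` and the page protection values are hypothetical stand-ins for PAGE_KERNEL, PAGE_KERNEL_EXEC and CONFIG_ARCH_HAS_SET_DIRECT_MAP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of an architecture, for illustration only. */
struct arch_props {
	uint64_t page_kernel;		/* stand-in for PAGE_KERNEL */
	uint64_t page_kernel_exec;	/* stand-in for PAGE_KERNEL_EXEC */
	bool has_set_direct_map;	/* CONFIG_ARCH_HAS_SET_DIRECT_MAP */
};

/*
 * Set the flush flag if the architecture can reset direct map
 * permissions, or if it restricts executable permissions at all.
 */
static bool vm_flush_needed(const struct arch_props *arch)
{
	if (arch->has_set_direct_map)
		return true;
	return arch->page_kernel != arch->page_kernel_exec;
}

/* The protection values below are invented; only equality matters. */
static void vm_flush_demo(void)
{
	/* Normal memory is executable and there is no direct map reset. */
	struct arch_props no_exec_restriction = { 0x3, 0x3, false };
	/* Executable permissions are restricted, no set_direct_map_(). */
	struct arch_props exec_restricted = { 0x13, 0x3, false };
	/* Direct map permissions can be reset. */
	struct arch_props direct_map_reset = { 0x13, 0x3, true };

	assert(!vm_flush_needed(&no_exec_restriction));
	assert(vm_flush_needed(&exec_restricted));
	assert(vm_flush_needed(&direct_map_reset));
}
```

The first case corresponds to the sparc64 situation described above, the second to arm as it stands today.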

This logic could be put in vm_remove_mappings() instead, but doing it this
way leaves the raw flag generic and open for future usages. So change the
name of the helper to match its new conditional properties.

Fixes: d53d2f7 ("bpf: Use vmalloc special flag")
Reported-by: Meelis Roos 
Cc: David S. Miller 
Cc: pet...@infradead.org 
Cc: Nadav Amit 
Cc: Ard Biesheuvel 
Signed-off-by: Rick Edgecombe 
---

Hi,

This is what I came up with for working around the sparc issue. The
other solution I had looked at was making a CONFIG_ARCH_NEEDS_VM_FLUSH
and just opt out only sparc. Very open to suggestions.

 arch/x86/kernel/ftrace.c   |  2 +-
 arch/x86/kernel/kprobes/core.c |  2 +-
 include/linux/filter.h |  4 ++--
 include/linux/vmalloc.h| 10 ++
 kernel/module.c|  4 ++--
 mm/vmalloc.c   | 25 ++---
 6 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 0927bb158ffc..9793f6491882 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -823,7 +823,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int 
*tramp_size)
/* ALLOC_TRAMP flags lets us know we created it */
ops->flags |= FTRACE_OPS_FL_ALLOC_TRAMP;
 
-   set_vm_flush_reset_perms(trampoline);
+   set_vm_flush_if_needed(trampoline);
 
/*
 * Module allocation needs to be completed by making the page
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 9e4fa2484d10..2e3c31c63a6f 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -434,7 +434,7 @@ void *alloc_insn_page(void)
if (!page)
return NULL;
 
-   set_vm_flush_reset_perms(page);
+   set_vm_flush_if_needed(page);
/*
 * First make the page read-only, and only then make it executable to
 * prevent it from being W+X in between.
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 7148bab96943..7b20d43a9cf1 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -735,13 +735,13 @@ bpf_ctx_narrow_access_ok(u32 off, u32 size, u32 
size_default)
 
 static inline void bpf_prog_lock_ro(struct bpf_prog *fp)
 {
-   set_vm_flush_reset_perms(fp);
+   set_vm_flush_if_needed(fp);
set_memory_ro((unsigned long)fp, fp->pages);
 }
 
 static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
 {
-   set_vm_flush_reset_perms(hdr);
+   set_vm_flush_if_needed(hdr);
set_memory_ro((unsigned long)hdr, hdr->pages);
set_memory_x((unsigned long)hdr, hdr->pages);
 }
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 51e131245379..2fdd1d62a603 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -151,13 +151,7 @@ extern int map_kernel_range_noflush(unsigned long start, 
unsigned long size,

Re: [PATCH V2 09/12] soc/tegra: pmc: add pmc wake support for tegra210

2019-05-28 Thread JC Kuo


Hi Sowjanya,

usleep_range() in tegra210_pmc_irq_set_wake() should be replaced with 
udelay() because caller irq_set_irq_wake() acquired spinlock and made 
this context atomic.


Thanks,

JC

On 5/29/19 7:08 AM, Sowjanya Komatineni wrote:

This patch implements PMC wakeup sequence for Tegra210 and defines
common used wake events of RTC alarm and power key.

Signed-off-by: Sowjanya Komatineni 
---
  drivers/soc/tegra/pmc.c | 113 
  1 file changed, 113 insertions(+)

diff --git a/drivers/soc/tegra/pmc.c b/drivers/soc/tegra/pmc.c
index 974b4c9f6ada..54dc8409e353 100644
--- a/drivers/soc/tegra/pmc.c
+++ b/drivers/soc/tegra/pmc.c
@@ -57,6 +57,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #define PMC_CNTRL                     0x0
  #define  PMC_CNTRL_INTR_POLARITY      BIT(17) /* inverts INTR polarity */
@@ -66,6 +67,12 @@
  #define  PMC_CNTRL_SYSCLK_OE          BIT(11) /* system clock enable */
  #define  PMC_CNTRL_SYSCLK_POLARITY    BIT(10) /* sys clk polarity */
  #define  PMC_CNTRL_MAIN_RST           BIT(4)
+#define  PMC_CNTRL_LATCH_WAKEUPS       BIT(5)
+
+#define PMC_WAKE_MASK                  0x0c
+#define PMC_WAKE_LEVEL                 0x10
+#define PMC_WAKE_STATUS                0x14
+#define PMC_SW_WAKE_STATUS             0x18
  
  #define DPD_SAMPLE			0x020

  #define  DPD_SAMPLE_ENABLEBIT(0)
@@ -96,6 +103,11 @@
  
  #define PMC_SCRATCH41			0x140
  
+#define PMC_WAKE2_MASK                 0x160
+#define PMC_WAKE2_LEVEL                0x164
+#define PMC_WAKE2_STATUS               0x168
+#define PMC_SW_WAKE2_STATUS            0x16c
+
  #define PMC_SENSOR_CTRL   0x1b0
  #define  PMC_SENSOR_CTRL_SCRATCH_WRITEBIT(2)
  #define  PMC_SENSOR_CTRL_ENABLE_RST   BIT(1)
@@ -245,6 +257,7 @@ struct tegra_pmc_soc {
  
  	const struct tegra_wake_event *wake_events;

unsigned int num_wake_events;
+   unsigned int max_supported_wake_events;
  };
  
  static const char * const tegra186_reset_sources[] = {

@@ -1917,6 +1930,54 @@ static const struct irq_domain_ops 
tegra_pmc_irq_domain_ops = {
.alloc = tegra_pmc_irq_alloc,
  };
  
+static int tegra210_pmc_irq_set_wake(struct irq_data *data, unsigned int on)
+{
+   struct tegra_pmc *pmc = irq_data_get_irq_chip_data(data);
+   unsigned int offset, bit;
+   u32 value;
+
+   if (data->hwirq == ULONG_MAX)
+   return 0;
+
+   offset = data->hwirq / 32;
+   bit = data->hwirq % 32;
+
+   /*
+* latch wakeups to SW_WAKE_STATUS register to capture events
+* that would not make it into wakeup event register during LP0 exit.
+*/
+   value = tegra_pmc_readl(pmc, PMC_CNTRL);
+   value |= PMC_CNTRL_LATCH_WAKEUPS;
+   tegra_pmc_writel(pmc, value, PMC_CNTRL);
+   usleep_range(110, 120);
+
+   value &= ~PMC_CNTRL_LATCH_WAKEUPS;
+   tegra_pmc_writel(pmc, value, PMC_CNTRL);
+   usleep_range(110, 120);
+
+   tegra_pmc_writel(pmc, 0, PMC_SW_WAKE_STATUS);
+   if (pmc->soc->max_supported_wake_events > 32)
+   tegra_pmc_writel(pmc, 0, PMC_SW_WAKE2_STATUS);
+
+   tegra_pmc_writel(pmc, 0, PMC_WAKE_STATUS);
+   if (pmc->soc->max_supported_wake_events > 32)
+   tegra_pmc_writel(pmc, 0, PMC_WAKE2_STATUS);
+
+   /* enable PMC wake */
+   if (data->hwirq >= 32)
+   offset = PMC_WAKE2_MASK;
+   else
+   offset = PMC_WAKE_MASK;
+   value = tegra_pmc_readl(pmc, offset);
+   if (on)
+   value |= 1 << bit;
+   else
+   value &= ~(1 << bit);
+   tegra_pmc_writel(pmc, value, offset);
+
+   return 0;
+}
+
  static int tegra186_pmc_irq_set_wake(struct irq_data *data, unsigned int on)
  {
struct tegra_pmc *pmc = irq_data_get_irq_chip_data(data);
@@ -1948,6 +2009,48 @@ static int tegra186_pmc_irq_set_wake(struct irq_data 
*data, unsigned int on)
return 0;
  }
  
+static int tegra210_pmc_irq_set_type(struct irq_data *data, unsigned int type)
+{
+   struct tegra_pmc *pmc = irq_data_get_irq_chip_data(data);
+   unsigned int offset, bit;
+   u32 value;
+
+   if (data->hwirq == ULONG_MAX)
+   return 0;
+
+   offset = data->hwirq / 32;
+   bit = data->hwirq % 32;
+
+   if (data->hwirq >= 32)
+   offset = PMC_WAKE2_LEVEL;
+   else
+   offset = PMC_WAKE_LEVEL;
+   value = tegra_pmc_readl(pmc, offset);
+
+   switch (type) {
+   case IRQ_TYPE_EDGE_RISING:
+   case IRQ_TYPE_LEVEL_HIGH:
+   value |= 1 << bit;
+   break;
+
+   case IRQ_TYPE_EDGE_FALLING:
+   case IRQ_TYPE_LEVEL_LOW:
+   value &= ~(1 << bit);
+   break;
+
+   case IRQ_TYPE_EDGE_RISING | IRQ_TYPE_EDGE_FALLING:
+   value ^= 1 << bit;
+   break;
+
+   default:
+   return -EINVAL;
+   }
+
+   tegra_pmc_writel(pmc, value, offset);
+
+   return 0;

[PATCH v7 3/3] x86/acrn: Use HYPERVISOR_CALLBACK_VECTOR for ACRN guest upcall vector

2019-05-28 Thread Zhao Yakui

Linux kernel uses the HYPERVISOR_CALLBACK_VECTOR for hypervisor upcall
vector. It is already used for Xen and HyperV.
After the ACRN hypervisor is detected, it will also use this defined
vector to notify the ACRN guest.

Co-developed-by: Jason Chen CJ 
Signed-off-by: Jason Chen CJ 
Signed-off-by: Zhao Yakui 
Reviewed-by: Thomas Gleixner 
---
V1->V2: Remove the unused API definition of acrn_setup_intr_handler and
acrn_remove_intr_handler.
Adjust the order of header file
Add the declaration of acrn_hv_vector_handler and tracing
definition of acrn_hv_callback_vector.

v2->v3: No change
v3->v4: Refine the file name of acrnhyper.h to acrn.h
v5->v6: Add the "extern" for the function declarations in header file
Add some comments for calling entering_ack_irq
Some other minor changes (unnecessary splitting of two lines
and minor change in commit log)
v6->v7: Include the header file of asm/apic.h to fix the building error
when enabling cflags="-Werror=implicit-function-declaration".
---
 arch/x86/Kconfig|  1 +
 arch/x86/entry/entry_64.S   |  5 +
 arch/x86/include/asm/acrn.h | 11 +++
 arch/x86/kernel/cpu/acrn.c  | 30 ++
 4 files changed, 47 insertions(+)
 create mode 100644 arch/x86/include/asm/acrn.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3020bc7..170d5cf 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -838,6 +838,7 @@ config JAILHOUSE_GUEST
 config ACRN_GUEST
bool "ACRN Guest support"
depends on X86_64
+   select X86_HV_CALLBACK_VECTOR
help
  This option allows to run Linux as guest in ACRN hypervisor. Enabling
  this will allow the kernel to boot in virtualized environment under
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 11aa3b2..2fe6289 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1142,6 +1142,11 @@ apicinterrupt3 HYPERV_STIMER0_VECTOR \
hv_stimer0_callback_vector hv_stimer0_vector_handler
 #endif /* CONFIG_HYPERV */
 
+#if IS_ENABLED(CONFIG_ACRN_GUEST)
+apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
+   acrn_hv_callback_vector acrn_hv_vector_handler
+#endif
+
 idtentry debug          do_debug          has_error_code=0 paranoid=1 shift_ist=IST_INDEX_DB ist_offset=DB_STACK_OFFSET
 idtentry int3           do_int3           has_error_code=0 create_gap=1
 idtentry stack_segment  do_stack_segment  has_error_code=1
diff --git a/arch/x86/include/asm/acrn.h b/arch/x86/include/asm/acrn.h
new file mode 100644
index 0000000..4adb13f
--- /dev/null
+++ b/arch/x86/include/asm/acrn.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_ACRN_H
+#define _ASM_X86_ACRN_H
+
+extern void acrn_hv_callback_vector(void);
+#ifdef CONFIG_TRACING
+#define trace_acrn_hv_callback_vector acrn_hv_callback_vector
+#endif
+
+extern void acrn_hv_vector_handler(struct pt_regs *regs);
+#endif /* _ASM_X86_ACRN_H */
diff --git a/arch/x86/kernel/cpu/acrn.c b/arch/x86/kernel/cpu/acrn.c
index f556640..a110c8b 100644
--- a/arch/x86/kernel/cpu/acrn.c
+++ b/arch/x86/kernel/cpu/acrn.c
@@ -9,7 +9,12 @@
  *
  */
 
+#include 
+#include 
+#include 
+#include 
 #include 
+#include 
 
 static uint32_t __init acrn_detect(void)
 {
@@ -18,6 +23,8 @@ static uint32_t __init acrn_detect(void)
 
 static void __init acrn_init_platform(void)
 {
+   /* Setup the IDT for ACRN hypervisor callback */
+   alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, acrn_hv_callback_vector);
 }
 
 static bool acrn_x2apic_available(void)
@@ -30,6 +37,29 @@ static bool acrn_x2apic_available(void)
return false;
 }
 
+static void (*acrn_intr_handler)(void);
+
+__visible void __irq_entry acrn_hv_vector_handler(struct pt_regs *regs)
+{
+   struct pt_regs *old_regs = set_irq_regs(regs);
+
+   /*
+* The hypervisor requires that the APIC EOI should be acked.
+* If the APIC EOI is not acked, the APIC ISR bit for the
+* HYPERVISOR_CALLBACK_VECTOR will not be cleared and then it
+* will block the interrupt whose vector is lower than
+* HYPERVISOR_CALLBACK_VECTOR.
+*/
+   entering_ack_irq();
+   inc_irq_stat(irq_hv_callback_count);
+
+   if (acrn_intr_handler)
+   acrn_intr_handler();
+
+   exiting_irq();
+   set_irq_regs(old_regs);
+}
+
 const __initconst struct hypervisor_x86 x86_hyper_acrn = {
.name   = "ACRN",
.detect = acrn_detect,
-- 
2.7.4

[PATCH v7 1/3] x86/Kconfig: Add new config symbol to unify conditional definition of hv_irq_callback_count

2019-05-28 Thread Zhao Yakui

Add a special Kconfig symbol X86_HV_CALLBACK_VECTOR so that the guests
using the hypervisor interrupt callback counter can select and thus
enable that counter. Select it when xen or hyperv support is enabled.
No functional changes.

Signed-off-by: Zhao Yakui 
Reviewed-by: Borislav Petkov 
Reviewed-by: Thomas Gleixner 
---
v3->v4: Follow the comments to refine the commit log.
---
 arch/x86/Kconfig   | 3 +++
 arch/x86/include/asm/hardirq.h | 2 +-
 arch/x86/kernel/irq.c  | 2 +-
 arch/x86/xen/Kconfig   | 1 +
 drivers/hv/Kconfig | 1 +
 5 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2bbbd4d..c9ab090 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -781,6 +781,9 @@ config PARAVIRT_SPINLOCKS
 
  If you are unsure how to answer this question, answer Y.
 
+config X86_HV_CALLBACK_VECTOR
+   def_bool n
+
 source "arch/x86/xen/Kconfig"
 
 config KVM_GUEST
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index d9069bb..0753379 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -37,7 +37,7 @@ typedef struct {
 #ifdef CONFIG_X86_MCE_AMD
unsigned int irq_deferred_error_count;
 #endif
-#if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN)
+#ifdef CONFIG_X86_HV_CALLBACK_VECTOR
unsigned int irq_hv_callback_count;
 #endif
 #if IS_ENABLED(CONFIG_HYPERV)
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 9b68b5b..4e8f193 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -135,7 +135,7 @@ int arch_show_interrupts(struct seq_file *p, int prec)
seq_printf(p, "%10u ", per_cpu(mce_poll_count, j));
seq_puts(p, "  Machine check polls\n");
 #endif
-#if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN)
+#ifdef CONFIG_X86_HV_CALLBACK_VECTOR
if (test_bit(HYPERVISOR_CALLBACK_VECTOR, system_vectors)) {
seq_printf(p, "%*s: ", prec, "HYP");
for_each_online_cpu(j)
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index e07abef..ba5a418 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -7,6 +7,7 @@ config XEN
bool "Xen guest support"
depends on PARAVIRT
select PARAVIRT_CLOCK
+   select X86_HV_CALLBACK_VECTOR
depends on X86_64 || (X86_32 && X86_PAE)
depends on X86_LOCAL_APIC && X86_TSC
help
diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 1c1a251..cafcb97 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -6,6 +6,7 @@ config HYPERV
tristate "Microsoft Hyper-V client drivers"
depends on X86 && ACPI && X86_LOCAL_APIC && HYPERVISOR_GUEST
select PARAVIRT
+   select X86_HV_CALLBACK_VECTOR
help
  Select this option to run Linux as a Hyper-V client operating
  system.
-- 
2.7.4

[PATCH v7 2/3] x86: Add the support of Linux guest on ACRN hypervisor

2019-05-28 Thread Zhao Yakui

ACRN is an open-source hypervisor maintained by the Linux Foundation.
It is built for embedded IOT with a small footprint and real-time features.
Add ACRN guest support so that Linux can be booted under the ACRN
hypervisor. Patches following this one will set up the upcall
notification vector, enable hypercalls and provide the interface that is
used to manage the virtualized CPU/memory/device/interrupt for other
guest OSes.

Co-developed-by: Jason Chen CJ 
Signed-off-by: Jason Chen CJ 
Signed-off-by: Zhao Yakui 
Reviewed-by: Thomas Gleixner 
---
v1->v2: Change the CONFIG_ACRN to CONFIG_ACRN_GUEST, which makes it easy to
understand.
Remove the export of x86_hyper_acrn.

v2->v3: Remove the unnecessary dependency of PARAVIRT
v3->v4: Refine the commit log and add more meaningful description in Kconfig
v4->v5: No change
v5->v6: No change
v6->v7: No change
---
 arch/x86/Kconfig  | 12 
 arch/x86/include/asm/hypervisor.h |  1 +
 arch/x86/kernel/cpu/Makefile  |  1 +
 arch/x86/kernel/cpu/acrn.c| 39 +++
 arch/x86/kernel/cpu/hypervisor.c  |  4 
 5 files changed, 57 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/acrn.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c9ab090..3020bc7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -835,6 +835,18 @@ config JAILHOUSE_GUEST
  cell. You can leave this option disabled if you only want to start
  Jailhouse and run Linux afterwards in the root cell.
 
+config ACRN_GUEST
+   bool "ACRN Guest support"
+   depends on X86_64
+   help
+ This option allows to run Linux as guest in ACRN hypervisor. Enabling
+ this will allow the kernel to boot in virtualized environment under
+ the ACRN hypervisor.
+ ACRN is a flexible, lightweight reference open-source hypervisor, built
+ with real-time and safety-criticality in mind. It is built for embedded
+ IOT with small footprint and real-time features. More details can be
+ found in https://projectacrn.org/
+
 endif #HYPERVISOR_GUEST
 
 source "arch/x86/Kconfig.cpu"
diff --git a/arch/x86/include/asm/hypervisor.h b/arch/x86/include/asm/hypervisor.h
index 8c5aaba..50a30f6 100644
--- a/arch/x86/include/asm/hypervisor.h
+++ b/arch/x86/include/asm/hypervisor.h
@@ -29,6 +29,7 @@ enum x86_hypervisor_type {
X86_HYPER_XEN_HVM,
X86_HYPER_KVM,
X86_HYPER_JAILHOUSE,
+   X86_HYPER_ACRN,
 };
 
 #ifdef CONFIG_HYPERVISOR_GUEST
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 1796d2b..b9b3017 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -44,6 +44,7 @@ obj-$(CONFIG_X86_CPU_RESCTRL) += resctrl/
 obj-$(CONFIG_X86_LOCAL_APIC)   += perfctr-watchdog.o
 
 obj-$(CONFIG_HYPERVISOR_GUEST) += vmware.o hypervisor.o mshyperv.o
+obj-$(CONFIG_ACRN_GUEST)   += acrn.o
 
 ifdef CONFIG_X86_FEATURE_NAMES
 quiet_cmd_mkcapflags = MKCAP   $@
diff --git a/arch/x86/kernel/cpu/acrn.c b/arch/x86/kernel/cpu/acrn.c
new file mode 100644
index 000..f556640
--- /dev/null
+++ b/arch/x86/kernel/cpu/acrn.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ACRN detection support
+ *
+ * Copyright (C) 2019 Intel Corporation. All rights reserved.
+ *
+ * Jason Chen CJ 
+ * Zhao Yakui 
+ *
+ */
+
+#include 
+
+static uint32_t __init acrn_detect(void)
+{
+   return hypervisor_cpuid_base("ACRNACRNACRN\0\0", 0);
+}
+
+static void __init acrn_init_platform(void)
+{
+}
+
+static bool acrn_x2apic_available(void)
+{
+   /* x2apic is not supported now.
+* Later it needs to check the X86_FEATURE_X2APIC bit of cpu info
+* returned by CPUID to determine whether the x2apic is
+* supported in Linux guest.
+*/
+   return false;
+}
+
+const __initconst struct hypervisor_x86 x86_hyper_acrn = {
+   .name   = "ACRN",
+   .detect = acrn_detect,
+   .type   = X86_HYPER_ACRN,
+   .init.init_platform = acrn_init_platform,
+   .init.x2apic_available  = acrn_x2apic_available,
+};
diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c
index 479ca47..87e39ad 100644
--- a/arch/x86/kernel/cpu/hypervisor.c
+++ b/arch/x86/kernel/cpu/hypervisor.c
@@ -32,6 +32,7 @@ extern const struct hypervisor_x86 x86_hyper_xen_pv;
 extern const struct hypervisor_x86 x86_hyper_xen_hvm;
 extern const struct hypervisor_x86 x86_hyper_kvm;
 extern const struct hypervisor_x86 x86_hyper_jailhouse;
+extern const struct hypervisor_x86 x86_hyper_acrn;
 
 static const __initconst struct hypervisor_x86 * const hypervisors[] =
 {
@@ -49,6 +50,9 @@ static const __initconst struct hypervisor_x86 * const hypervisors[] =
 #ifdef CONFIG_JAILHOUSE_GUEST
&x86_hyper_jailhouse,
 #endif
+#ifdef CONFIG_ACRN_GUEST
+   &x86_hyper_acrn,
+#endif
 };
 
 enum x86_hypervisor_type x86_h

[PATCH v7 0/3] x86: Add the support of ACRN guest under x86

2019-05-28 Thread Zhao Yakui

ACRN is a flexible, lightweight reference hypervisor, built with real-time
and safety-criticality in mind, optimized to streamline embedded development
through an open source platform. It is built for embedded IOT with small
footprint and real-time features. More details can be found
in https://projectacrn.org/

This patch set allows Linux to run as a guest on the ACRN hypervisor, and it
works together with the follow-up patch set that manages Linux guests on ACRN.
It includes the detection of the ACRN hypervisor, the upcall notification
vector from the hypervisor, and the hypercall. The hypervisor detection is
similar to that of Xen/VMware/Hyper-V. ACRN also uses an upcall notification
mechanism similar to that in Xen/Microsoft Hyper-V when it needs to send a
notification to the Linux guest. The hypercall provides the mechanism that the
Linux guest can use to query/configure the ACRN hypervisor.

Following this patch set, we will send the ACRN driver part, which provides the
interface that can be used to manage the virtualized CPU/memory/device/interrupt
for other guest OSes after the ACRN hypervisor is detected.

v1->v2: Change the CONFIG_ACRN to CONFIG_ACRN_GUEST, which makes it easy to
understand.
Remove the export of x86_hyper_acrn.
Remove the unused API definition of acrn_setup_intr_handler and
acrn_remove_intr_handler.
Adjust the order of header file
Add the declaration of acrn_hv_vector_handler and tracing
definition of acrn_hv_callback_vector.
Refine the comments for the function of acrn_hypercall0/1/2

v2->v3: Add one new config symbol to unify the conditional definition
of hv_irq_callback_count
Use the "vmcall" mnemonic to replace the hard-code byte definition
Remove the unnecessary dependency of CONFIG_PARAVIRT for ACRN_GUEST

v3->v4: Rename the file name of acrnhyper.h to acrn.h
Refine the commit log and some other minor changes (more comments and
redundant ifdef in acrn.h, sorting the header file in acrn.c)

v4->v5: Minor changes of comments/commit log in patch 04
Use _ASM_X86_ACRN_HYPERCALL_H instead of _ASM_X86_ACRNHYPERCALL_H.
Use the "VMCALL" mnemonic in comment/commit log.
Uppercase r8/rdi/rsi/rax for hypercall parameter register in comment.

v5->v6: Remove the explicit register variable for inline assembly
Add the "extern" for the function declaration in acrn.h
Add comments about acking ACPI EOI in acrn_hv_callback_handler
Minor changes for comments/commit log in patch 03/04

v6->v7: Add the missing header file of asm/apic.h in acrn.c
Remove the definition of ACRN hypercall as it is not used in this
patch set.


Zhao Yakui (3):
  x86/Kconfig: Add new config symbol to unify conditional definition of
hv_irq_callback_count
  x86: Add the support of Linux guest on ACRN hypervisor
  x86/acrn: Use HYPERVISOR_CALLBACK_VECTOR for ACRN guest upcall vector

 arch/x86/Kconfig  | 16 +
 arch/x86/entry/entry_64.S |  5 +++
 arch/x86/include/asm/acrn.h   | 11 +++
 arch/x86/include/asm/hardirq.h|  2 +-
 arch/x86/include/asm/hypervisor.h |  1 +
 arch/x86/kernel/cpu/Makefile  |  1 +
 arch/x86/kernel/cpu/acrn.c| 69 +++
 arch/x86/kernel/cpu/hypervisor.c  |  4 +++
 arch/x86/kernel/irq.c |  2 +-
 arch/x86/xen/Kconfig  |  1 +
 drivers/hv/Kconfig|  1 +
 11 files changed, 111 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/asm/acrn.h
 create mode 100644 arch/x86/kernel/cpu/acrn.c

-- 
2.7.4
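
For readers unfamiliar with how the detection step in this series works, here is a minimal user-space sketch of the signature comparison that a helper such as hypervisor_cpuid_base() relies on. It assumes the hypervisor CPUID leaf returns its 12-byte vendor signature in EBX, ECX and EDX; the function name hv_signature_matches is hypothetical and is not a kernel API, and no real CPUID instruction is executed here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical user-space helper (not a kernel API): compare the 12
 * signature bytes that a hypervisor CPUID leaf returns in EBX, ECX, EDX
 * against an expected vendor string such as "ACRNACRNACRN".
 * 'expected' must point to at least 12 readable bytes.
 */
static bool hv_signature_matches(uint32_t ebx, uint32_t ecx, uint32_t edx,
				 const char *expected)
{
	uint32_t regs[3] = { ebx, ecx, edx };
	char sig[12];

	/* The three registers are laid out back to back in memory order. */
	memcpy(sig, regs, sizeof(sig));
	return memcmp(sig, expected, sizeof(sig)) == 0;
}
```

In the kernel the register values would come from the CPUID instruction itself; the sketch takes them as parameters so that it can be exercised without running under a hypervisor.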

Re: [PATCH] extcon: arizona: Correct error handling on regmap_update_bits_check

2019-05-28 Thread Chanwoo Choi

Hi,

On 19. 5. 29. 1:50 AM, Charles Keepax wrote:
> Ensure the case when regmap_update_bits_check fails and the change
> variable is not updated is handled correctly.
> 
> Signed-off-by: Charles Keepax 
> ---
>  drivers/extcon/extcon-arizona.c | 22 +-
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/extcon/extcon-arizona.c b/drivers/extcon/extcon-arizona.c
> index 9327479c719c2..ba2d16de161f8 100644
> --- a/drivers/extcon/extcon-arizona.c
> +++ b/drivers/extcon/extcon-arizona.c
> @@ -335,10 +335,12 @@ static void arizona_start_mic(struct arizona_extcon_info *info)
>  
>   arizona_extcon_pulse_micbias(info);
>  
> - regmap_update_bits_check(arizona->regmap, ARIZONA_MIC_DETECT_1,
> -  ARIZONA_MICD_ENA, ARIZONA_MICD_ENA,
> -  &change);
> - if (!change) {
> + ret = regmap_update_bits_check(arizona->regmap, ARIZONA_MIC_DETECT_1,
> +ARIZONA_MICD_ENA, ARIZONA_MICD_ENA,
> +&change);
> + if (ret < 0) {
> + dev_err(arizona->dev, "Failed to enable micd: %d\n", ret);
> + } else if (!change) {
>   regulator_disable(info->micvdd);
>   pm_runtime_put_autosuspend(info->dev);
>   }
> @@ -350,12 +352,14 @@ static void arizona_stop_mic(struct arizona_extcon_info *info)
>   const char *widget = arizona_extcon_get_micbias(info);
>   struct snd_soc_dapm_context *dapm = arizona->dapm;
>   struct snd_soc_component *component = snd_soc_dapm_to_component(dapm);
> - bool change;
> + bool change = false;
>   int ret;
>  
> - regmap_update_bits_check(arizona->regmap, ARIZONA_MIC_DETECT_1,
> -  ARIZONA_MICD_ENA, 0,
> -  &change);
> + ret = regmap_update_bits_check(arizona->regmap, ARIZONA_MIC_DETECT_1,
> +ARIZONA_MICD_ENA, 0,
> +&change);
> + if (ret < 0)
> + dev_err(arizona->dev, "Failed to disable micd: %d\n", ret);
>  
>   ret = snd_soc_component_disable_pin(component, widget);
>   if (ret != 0)
> @@ -1726,7 +1730,7 @@ static int arizona_extcon_remove(struct platform_device *pdev)
>   struct arizona_extcon_info *info = platform_get_drvdata(pdev);
>   struct arizona *arizona = info->arizona;
>   int jack_irq_rise, jack_irq_fall;
> - bool change;
> + bool change = false;
>  
>   regmap_update_bits_check(arizona->regmap, ARIZONA_MIC_DETECT_1,
>ARIZONA_MICD_ENA, 0,

You had better check the return value here as well, as part of this patch.


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics
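
To illustrate why the patch both initialises 'change' and checks the return value, here is a minimal user-space sketch. It assumes an update helper that, on failure, returns a negative errno and leaves the output flag untouched; fake_update_bits_check and disable_detect are hypothetical stand-ins and are not the regmap API.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a register update helper: on failure it
 * returns a negative errno and leaves *change untouched.
 */
static int fake_update_bits_check(bool fail, unsigned int *reg,
				  unsigned int mask, unsigned int val,
				  bool *change)
{
	unsigned int updated;

	if (fail)
		return -EIO;

	updated = (*reg & ~mask) | (val & mask);
	*change = (updated != *reg);
	*reg = updated;
	return 0;
}

/* Returns true only when the masked bits were really cleared by this call. */
static bool disable_detect(bool fail, unsigned int *reg, unsigned int mask)
{
	bool change = false;	/* defined value even if the update fails */
	int ret;

	ret = fake_update_bits_check(fail, reg, mask, 0, &change);
	if (ret < 0)
		return false;	/* do not act on a stale 'change' */

	return change;
}
```

Without the initialiser, a caller that ignores the return value would read an indeterminate 'change' on the failure path, which is the case the review comment asks to be covered in the remove function too.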

linux-next: Tree for May 29

2019-05-28 Thread Stephen Rothwell

Hi all,

Changes since 20190528:

New tree: s390-fixes

The akpm-current tree still had its build failure due to an interaction
with the ftrace tree for which I reverted 2 commits.

Non-merge commits (relative to Linus' tree): 2617
 2837 files changed, 98532 insertions(+), 49203 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 291 trees (counting Linus' and 70 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (9fb67d643f6f Merge tag 'pinctrl-v5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl)
Merging fixes/master (2bbacd1a9278 Merge tag 'kconfig-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild)
Merging kspp-gustavo/for-next/kspp (034e673710d3 platform/x86: acer-wmi: Mark expected switch fall-throughs)
Merging kbuild-current/fixes (30a28f11b618 kbuild: tar-pkg: enable communication with jobserver)
Merging arc-current/for-curr (4c70850aeb2e ARC: [plat-hsdk]: Add missing FIFO size entry in GMAC node)
Merging arm-current/fixes (e17b1af96b2a ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache)
Merging arm64-fixes/for-next/fixes (edbcf50eb8ae arm64: insn: Add BUILD_BUG_ON() for invalid masks)
Merging m68k-current/for-linus (fdd20ec8786a Documentation/features/time: Mark m68k having modern-timekeeping)
Merging powerpc-fixes/fixes (8b909e354870 powerpc/kexec: Fix loading of kernel + initramfs with kexec_file_load())
Merging s390-fixes/fixes (bef9f0ba300a s390/crypto: fix gcm-aes-s390 selftest failures)
Merging sparc/master (54dee406374c Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (7aae703f8096 dpaa_eth: use only online CPU portals)
Merging bpf/master (bd95e678e0f6 bpf: sockmap, fix use after free from sleep in psock backlog workqueue)
Merging ipsec/master (7c80eb1c7e2b af_key: fix leaks in key_pol_get_resp and dump_sp.)
Merging netfilter/master (3e66b7cc50ef net: tulip: de4x5: Drop redundant MODULE_DEVICE_TABLE())
Merging ipvs/master (e633508a9528 netfilter: nft_fib: Fix existence check support)
Merging wireless-drivers/master (6aca09771db4 rtw88: Make some symbols static)
Merging mac80211/master (551842446ed6 mac80211: mesh: fix RCU warning)
Merging rdma-fixes/for-rc (619122be3d40 RDMA/hns: Fix PD memory leak for internal allocation)
Merging sound-current/for-linus (0b074ab7fc0d ALSA: line6: Assure canceling delayed work at disconnection)
Merging sound-asoc-fixes/for-linus (4796b9d9b0fb Merge branch 'asoc-5.2' into asoc-linus)
Merging regmap-fixes/for-linus (38ee2a8cc70e Merge branch 'regmap-5.2' into regmap-linus)
Merging regulator-fixes/for-linus (ae920866d4fc Merge branch 'regulator-5.2' into regulator-linus)
Merging spi-fixes/for-linus (10f94574dfdc Merge branch 'spi-5.2' into spi-linus)
Merging pci-current/for-linus (a188339ca5a3 Linux 5.2-rc1)
Merging driver-core.current/driver-core-linus (cd6c84d8f0cd Linux 5.2-rc2)
Merging tty.current/tty-linus (a1ad1cc9704f vt/fbcon: deinitialize resources in visual_init() after failed memory allocation)
Merging usb.current/usb-linus (a47686636d84 m

RE: [PATCH RESEND 2/5] ARM: dts: imx7d-sdb: Assign corresponding power supply for LDOs

2019-05-28 Thread Anson Huang

Hi, Leonard

> -----Original Message-----
> From: Leonard Crestez
> Sent: Wednesday, May 29, 2019 3:24 AM
> To: Anson Huang 
> Cc: robh...@kernel.org; mark.rutl...@arm.com; shawn...@kernel.org;
> s.ha...@pengutronix.de; ker...@pengutronix.de; feste...@gmail.com;
> devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linux-
> ker...@vger.kernel.org; dl-linux-imx 
> Subject: Re: [PATCH RESEND 2/5] ARM: dts: imx7d-sdb: Assign corresponding
> power supply for LDOs
> 
> On 12.05.2019 12:57, Anson Huang wrote:
> > On i.MX7D SDB board, sw2 supplies 1p0d/1p2 LDO, this patch assigns
> > corresponding power supply for 1p0d/1p2 LDO to avoid confusion by
> > below log:
> >
> > vdd1p0d: supplied by regulator-dummy
> > vdd1p2: supplied by regulator-dummy
> >
> > With this patch, the power supply is more accurate:
> >
> > vdd1p0d: supplied by SW2
> > vdd1p2: supplied by SW2
> >
> > diff --git a/arch/arm/boot/dts/imx7d-sdb.dts
> > b/arch/arm/boot/dts/imx7d-sdb.dts
> >
> > +&reg_1p0d {
> > +   vin-supply = <&sw2_reg>;
> > +};
> > +
> > +&reg_1p2 {
> > +   vin-supply = <&sw2_reg>;
> > +};
> 
> It's not clear why but this patch breaks imx7d-sdb boot. Checked two
> boards: in a board farm and on my desk.

Thanks for reporting this issue; I can reproduce it now. A quick debug shows
that with this patch, when reg_1p0d's voltage is set to 1.0V, SW2's voltage
is changed to 1.5V, while the expected voltage is 1.8V, so the 1.5V causes a
board reset. The patch below fixes the issue, but I am still checking whether
it is the best fix. Once I figure that out, I will send a fix patch for review:

+++ b/arch/arm/boot/dts/imx7d-sdb.dts
@@ -267,6 +267,7 @@
regulator-max-microvolt = <185>;
regulator-boot-on;
regulator-always-on;
+   regulator-max-step-microvolt = <25000>;
};

Thanks,
Anson

> 
> --
> Regards,
> Leonard

Re: [PATCH v5 0/2] Fix issues with vmalloc flush flag

2019-05-28 Thread Edgecombe, Rick P

On Tue, 2019-05-28 at 17:23 -0700, David Miller wrote:
> From: Rick Edgecombe 
> Date: Mon, 27 May 2019 14:10:56 -0700
> 
> > These two patches address issues with the recently added
> > VM_FLUSH_RESET_PERMS vmalloc flag.
> > 
> > Patch 1 addresses an issue that could cause a crash after other
> > architectures besides x86 rely on this path.
> > 
> > Patch 2 addresses an issue where in a rare case strange arguments
> > could be provided to flush_tlb_kernel_range(). 
> 
> It just occurred to me that there is another situation that would cause
> trouble on sparc64, and that's if somehow the address range of the main
> kernel image ended up being passed to flush_tlb_kernel_range().
> 
> That would flush the locked kernel mapping and crash the kernel
> instantly in a completely non-recoverable way.

Hmm, I haven't received the logs from Meelis that will show the real
ranges being passed into flush_tlb_kernel_range() on sparc, but it
should be flushing a range spanning from the modules to the direct map.
It looks like the kernel is at the very bottom of the address space, so
not included. Or do you mean the pages that hold the kernel text on the
direct map?

But regardless of this new code, DEBUG_PAGEALLOC hangs with the first
vmalloc free/unmap. That should be just flushing a single allocation in
the vmalloc range.

If it is somehow catching a locked entry though... Are there any sparc
flush mechanisms that could be used in vmalloc that won't touch locked
entries? Peter Z was pointing out that flush_tlb_all() might be more
appropriate for vmalloc anyway.

Re: [RFC 1/7] mm: introduce MADV_COOL

2019-05-28 Thread Michal Hocko

On Wed 29-05-19 10:40:33, Hillf Danton wrote:
> 
> On Wed, 29 May 2019 00:11:15 +0800 Michal Hocko wrote:
> > On Tue 28-05-19 23:38:11, Hillf Danton wrote:
> > > 
> > > In short, I prefer to skip IO mapping since any kind of address range
> > > can be expected from userspace, and it may probably cover an IO mapping.
> > > And things can get out of control, if we reclaim some IO pages while
> > > underlying device is trying to fill data into any of them, for instance.
> > 
> > > What do you mean by IO pages and what is the actual problem?
> > 
> IO pages are the backing-store pages of a mapping whose vm_flags has
> VM_IO set, and the comment in mm/memory.c says:
> /*
>  * Physically remapped pages are special. Tell the
>  * rest of the world about it:
>  *   VM_IO tells people not to look at these pages
>  *  (accesses can have side effects).
> 

OK, thanks for the clarification of the first part of the question. Now
to the second and the more important one. What is the actual concern?
AFAIK those pages shouldn't be on LRU list. If they are then they should
be safe to get reclaimed otherwise we would have a problem when
reclaiming them on the normal memory pressure. Why is this madvise any
different?
-- 
Michal Hocko
SUSE Labs

[RFC PATCH v3] rtl8xxxu: Improve TX performance of RTL8723BU on rtl8xxxu driver

2019-05-28 Thread Chris Chiu

We have 3 laptops which connect to wifi with the same RTL8723BU chip.
The PCI VID/PID of the wifi chip is 10EC:B720, which is supported.
They all have the same problem with the in-kernel rtl8xxxu driver:
iperf (as a client to an ethernet-connected server) gets ~1Mbps.
Nevertheless, the signal strength is reported as around -40dBm,
which is quite good. In the wireshark capture, the tx rate of each
data and qos data packet is only 1Mbps. With the driver from
https://github.com/lwfinger/rtl8723bu, the same iperf test gets ~12
Mbps or more, and the signal strength is reported similarly at around
-40dBm. That's what we want to improve.

After reading the source code of the rtl8xxxu driver and Larry's, the
major difference is that Larry's driver has a watchdog which keeps
monitoring the signal quality and updates the rate mask, just like
rtl8xxxu_gen2_update_rate_mask() does, if the signal quality changes.
This kind of watchdog also exists in the rtlwifi driver for some specific
chips, e.g. rtl8192ee, rtl8188ee, rtl8723ae, rtl8821ae, etc. They have
the same member function named dm_watchdog, which invokes the
corresponding dm_refresh_rate_adaptive_mask to adjust the tx rate
mask.

With this commit, the tx rate of each data and qos data packet will
be 39Mbps (MCS4) with 0xF0 as the tx rate mask. The 20th to 23rd bits
mean MCS4 to MCS7. This shows that the firmware still picks
the lowest rate from the rate mask, and explains why the tx rate of
data and qos data is always the lowest, 1Mbps: the default rate mask
passed is always 0xFFF, which covers the basic CCK rates, OFDM rates,
and MCS rates. However, with Larry's driver, the tx rate observed from
wireshark under the same conditions is almost 65Mbps or 72Mbps.

I believe the firmware of RTL8723BU may need a fix. And I think we
can still bring in a dm_watchdog like rtlwifi's to improve things from
the driver side. Please leave comments on my commits and
suggest what I can do better, or suggest any better idea
to fix this. Thanks.

Signed-off-by: Chris Chiu 
---


Notes:
  v2:
   - Fix errors and warnings complained by checkpatch.pl
   - Replace data structure rate_adaptive by 2 member variables
   - Make rtl8xxxu_wireless_mode non-static
   - Runs refresh_rate_mask() only in station mode
  v3:
   - Remove ugly rtl8xxxu_watchdog data structure
   - Make sure only one vif exists


 .../net/wireless/realtek/rtl8xxxu/rtl8xxxu.h  |  49 ++
 .../realtek/rtl8xxxu/rtl8xxxu_8723b.c | 145 ++
 .../wireless/realtek/rtl8xxxu/rtl8xxxu_core.c |  80 +-
 3 files changed, 273 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
index 8828baf26e7b..42e9227f4d19 100644
--- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
+++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
@@ -1195,6 +1195,44 @@ struct rtl8723bu_c2h {
 
 struct rtl8xxxu_fileops;
 
+/*mlme related.*/
+enum wireless_mode {
+   WIRELESS_MODE_UNKNOWN = 0,
+   /* Sub-Element */
+   WIRELESS_MODE_B = BIT(0),
+   WIRELESS_MODE_G = BIT(1),
+   WIRELESS_MODE_A = BIT(2),
+   WIRELESS_MODE_N_24G = BIT(3),
+   WIRELESS_MODE_N_5G = BIT(4),
+   WIRELESS_AUTO = BIT(5),
+   WIRELESS_MODE_AC = BIT(6),
+   WIRELESS_MODE_MAX = 0x7F,
+};
+
+/* from rtlwifi/wifi.h */
+enum ratr_table_mode_new {
+   RATEID_IDX_BGN_40M_2SS = 0,
+   RATEID_IDX_BGN_40M_1SS = 1,
+   RATEID_IDX_BGN_20M_2SS_BN = 2,
+   RATEID_IDX_BGN_20M_1SS_BN = 3,
+   RATEID_IDX_GN_N2SS = 4,
+   RATEID_IDX_GN_N1SS = 5,
+   RATEID_IDX_BG = 6,
+   RATEID_IDX_G = 7,
+   RATEID_IDX_B = 8,
+   RATEID_IDX_VHT_2SS = 9,
+   RATEID_IDX_VHT_1SS = 10,
+   RATEID_IDX_MIX1 = 11,
+   RATEID_IDX_MIX2 = 12,
+   RATEID_IDX_VHT_3SS = 13,
+   RATEID_IDX_BGN_3SS = 14,
+};
+
+#define RTL8XXXU_RATR_STA_INIT 0
+#define RTL8XXXU_RATR_STA_HIGH 1
+#define RTL8XXXU_RATR_STA_MID  2
+#define RTL8XXXU_RATR_STA_LOW  3
+
 struct rtl8xxxu_priv {
struct ieee80211_hw *hw;
struct usb_device *udev;
@@ -1299,6 +1337,14 @@ struct rtl8xxxu_priv {
u8 pi_enabled:1;
u8 no_pape:1;
u8 int_buf[USB_INTR_CONTENT_LENGTH];
+   u8 ratr_index;
+   u8 rssi_level;
+   /*
+* Single virtual interface permitted since the driver supports STATION
+* mode only.
+*/
+   struct ieee80211_vif *vif;
+   struct delayed_work ra_watchdog;
 };
 
 struct rtl8xxxu_rx_urb {
@@ -1335,6 +1381,8 @@ struct rtl8xxxu_fileops {
  bool ht40);
void (*update_rate_mask) (struct rtl8xxxu_priv *priv,
  u32 ramask, int sgi);
+   void (*refresh_rate_mask) (struct rtl8xxxu_priv *priv, int signal,
+  struct ieee80211_sta *sta);
void (*report_connect) (struct rtl8xxxu_priv *priv,
u8 macid, bool con

Re: [PATCH net-next 0/5] PTP support for the SJA1105 DSA driver

2019-05-28 Thread Richard Cochran

On Wed, May 29, 2019 at 02:56:22AM +0300, Vladimir Oltean wrote:
> Not all is rosy, though.

You can sure say that again!
 
> PTP timestamping will only work when the ports are bridged. Otherwise,
> the metadata follow-up frames holding RX timestamps won't be received
> because they will be blocked by the master port's MAC filter. Linuxptp
> tries to put the net device in ALLMULTI/PROMISC mode,

Untrue.

> but DSA doesn't
> pass this on to the master port, which does the actual reception.
> The master port is put in promiscuous mode when the slave ports are
> enslaved to a bridge.
> 
> Also, even with software-corrected timestamps, one can observe a
> negative path delay reported by linuxptp:
> 
> ptp4l[55.600]: master offset  8 s2 freq  +83677 path delay -2390
> ptp4l[56.600]: master offset 17 s2 freq  +83688 path delay -2391
> ptp4l[57.601]: master offset  6 s2 freq  +83682 path delay -2391
> ptp4l[58.601]: master offset -1 s2 freq  +83677 path delay -2391
> 
> Without investigating too deeply, this appears to be introduced by the
> correction applied by linuxptp to t4 (t4c: corrected master rxtstamp)
> during the path delay estimation process (removing the correction makes
> the path delay positive).

No.  The root cause is the time stamps delivered by the hardware or
your driver.  That needs to be addressed before going forward.

Thanks,
Richard
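
For context on the negative path delay quoted above, here is a minimal sketch of the textbook delay request-response estimate. This is a simplification and not linuxptp's actual code, which also applies correction fields and a rate ratio; the function name mean_path_delay_ns is hypothetical. It only shows that a timestamp reported too early is enough to push the estimate below zero.

```c
#include <stdint.h>

/*
 * Textbook delay request-response estimate, all values in nanoseconds:
 *   t1 = master TX of Sync,      t2 = slave RX of Sync,
 *   t3 = slave TX of Delay_Req,  t4 = master RX of Delay_Req.
 * The slave clock offset cancels out when the two legs are summed.
 */
static int64_t mean_path_delay_ns(int64_t t1, int64_t t2,
				  int64_t t3, int64_t t4)
{
	return ((t2 - t1) + (t4 - t3)) / 2;
}
```

With a true one-way delay of 500 ns and a slave offset of 1000 ns, the inputs (0, 1500, 2000, 1500) give 500. If t4 is reported 6000 ns too early, that is (0, 1500, 2000, -4500), the result is -2500, even though the physical delay has not changed.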

[PATCH] amd64-agp: fix arbitrary kernel memory writes

2019-05-28 Thread Young Xiao

pg_start is copied from userspace on AGPIOC_BIND and AGPIOC_UNBIND ioctl
cmds of agp_ioctl() and passed to agpioc_bind_wrap().  As said in the
comment, (pg_start + mem->page_count) may wrap in case of AGPIOC_BIND,
and it is not checked at all in case of AGPIOC_UNBIND.  As a result, a user
with sufficient privileges (usually "video" group) may cause either a
local DoS or privilege escalation.

See commit 194b3da873fd ("agp: fix arbitrary kernel memory writes")
for details.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/char/agp/amd64-agp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/char/agp/amd64-agp.c b/drivers/char/agp/amd64-agp.c
index c69e39f..5daa0e3 100644
--- a/drivers/char/agp/amd64-agp.c
+++ b/drivers/char/agp/amd64-agp.c
@@ -60,7 +60,8 @@ static int amd64_insert_memory(struct agp_memory *mem, off_t pg_start, int type)
 
/* Make sure we can fit the range in the gatt table. */
/* FIXME: could wrap */
-   if (((unsigned long)pg_start + mem->page_count) > num_entries)
+   if (((pg_start + mem->page_count) > num_entries) ||
+   ((pg_start + mem->page_count) < pg_start))
return -EINVAL;
 
j = pg_start;
-- 
2.7.4
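
As a side note on the wrap-around problem described in this patch, here is a minimal sketch of a range check written so that no intermediate sum can overflow. It is an alternative formulation for illustration, not the check used in the patch; the helper name range_fits and the use of size_t are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true when [start, start + count) fits inside a
 * table of num_entries slots. Written with subtraction so that no
 * intermediate sum can wrap around.
 */
static bool range_fits(size_t start, size_t count, size_t num_entries)
{
	if (count > num_entries)
		return false;

	/* num_entries - count cannot underflow after the test above. */
	return start <= num_entries - count;
}
```

Because the comparison never forms start + count, a huge start value supplied by userspace is rejected instead of wrapping to a small number that passes the bound.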

Re: [PATCH net-next 3/5] net: dsa: mv88e6xxx: Let taggers specify a can_timestamp function

2019-05-28 Thread Richard Cochran

On Wed, May 29, 2019 at 02:56:25AM +0300, Vladimir Oltean wrote:
> The newly introduced function is called on both the RX and TX paths.

NAK on this patch.
 
> The boolean returned by port_txtstamp should only return false if the
> driver tried to timestamp the skb but failed.

So you say.
 
> Currently there is some logic in the mv88e6xxx driver that determines
> whether it should timestamp frames or not.
> 
> This is wasteful, because if the decision is to not timestamp them, then
> DSA will have cloned an skb and freed it immediately afterwards.

No, it isn't wasteful.  Look at the tests in that driver to see why.
 
> Additionally other drivers (sja1105) may have other hardware criteria
> for timestamping frames on RX, and the default conditions for
> timestamping a frame are too restrictive.

I'm sorry, but we won't change the frame just for one device that has
design issues.

Please put device specific workarounds into its driver.

Thanks,
Richard

Re: [PATCH net-next 1/5] timecounter: Add helper for reconstructing partial timestamps

2019-05-28 Thread Richard Cochran

On Tue, May 28, 2019 at 07:14:22PM -0700, John Stultz wrote:
> Hrm. Is this actually generic? Would it make more sense to have the
> specific implementations with this quirk implement this in their
> read() handler? If not, why?

Strongly agree that this workaround should stay in the driver.  After
all, we do not want to encourage HW designers to continue in this way.

Thanks,
Richard

[PATCH v8 3/3] i2c-ocores: sifive: add polling mode workaround for FU540-C000 SoC.

2019-05-28 Thread Sagar Shrikant Kadam

The i2c-ocores driver already has a polling mode interface, but it needs
a workaround for the FU540 chipset on the HiFive Unleashed board (RevA00).
There is an erratum in the FU540 chip that prevents interrupt-driven i2c
transfers from working, and the I2C controller's interrupt bit
cannot be cleared once set. Because of this, the existing i2c polling mode
interface added to mainline earlier doesn't work, and the CPU stalls
indefinitely whenever an i2c transfer is initiated.

Ref:
commit dd7dbf0eb090 ("i2c: ocores: refactor setup for polling")

The workaround / fix under OCORES_FLAG_BROKEN_IRQ is specifically for the
FU540-C000 SoC.

The polling function identifies a SiFive device based on the device node
and enables the workaround.

Signed-off-by: Sagar Shrikant Kadam 
---
 drivers/i2c/busses/i2c-ocores.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-ocores.c b/drivers/i2c/busses/i2c-ocores.c
index b334fa2..4117f1a 100644
--- a/drivers/i2c/busses/i2c-ocores.c
+++ b/drivers/i2c/busses/i2c-ocores.c
@@ -35,6 +35,7 @@ struct ocores_i2c {
int iobase;
u32 reg_shift;
u32 reg_io_width;
+   unsigned long flags;
wait_queue_head_t wait;
struct i2c_adapter adap;
struct i2c_msg *msg;
@@ -84,6 +85,8 @@ struct ocores_i2c {
 #define TYPE_GRLIB 1
 #define TYPE_SIFIVE_REV0   2
 
+#define OCORES_FLAG_BROKEN_IRQ BIT(1) /* Broken IRQ for FU540-C000 SoC */
+
 static void oc_setreg_8(struct ocores_i2c *i2c, int reg, u8 value)
 {
iowrite8(value, i2c->base + (reg << i2c->reg_shift));
@@ -236,9 +239,12 @@ static irqreturn_t ocores_isr(int irq, void *dev_id)
struct ocores_i2c *i2c = dev_id;
u8 stat = oc_getreg(i2c, OCI2C_STATUS);
 
-   if (!(stat & OCI2C_STAT_IF))
+   if (i2c->flags & OCORES_FLAG_BROKEN_IRQ) {
+   if ((stat & OCI2C_STAT_IF) && !(stat & OCI2C_STAT_BUSY))
+   return IRQ_NONE;
+   } else if (!(stat & OCI2C_STAT_IF)) {
return IRQ_NONE;
-
+   }
ocores_process(i2c, stat);
 
return IRQ_HANDLED;
@@ -353,6 +359,11 @@ static void ocores_process_polling(struct ocores_i2c *i2c)
ret = ocores_isr(-1, i2c);
if (ret == IRQ_NONE)
break; /* all messages have been transferred */
+   else {
+   if (i2c->flags & OCORES_FLAG_BROKEN_IRQ)
+   if (i2c->state == STATE_DONE)
+   break;
+   }
}
 }
 
@@ -595,6 +606,7 @@ static int ocores_i2c_probe(struct platform_device *pdev)
 {
struct ocores_i2c *i2c;
struct ocores_i2c_platform_data *pdata;
+   const struct of_device_id *match;
struct resource *res;
int irq;
int ret;
@@ -677,6 +689,14 @@ static int ocores_i2c_probe(struct platform_device *pdev)
irq = platform_get_irq(pdev, 0);
if (irq == -ENXIO) {
ocores_algorithm.master_xfer = ocores_xfer_polling;
+
+   /*
+* Set in OCORES_FLAG_BROKEN_IRQ to enable workaround for
+* FU540-C000 SoC in polling mode.
+*/
+   match = of_match_node(ocores_i2c_match, pdev->dev.of_node);
+   if (match && (long)match->data == TYPE_SIFIVE_REV0)
+   i2c->flags |= OCORES_FLAG_BROKEN_IRQ;
} else {
if (irq < 0)
return irq;
-- 
1.9.1
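
Editor's sketch: the status check that the ocores_isr() hunk above introduces can be restated as a small userspace function. This is only an illustration under stated assumptions, not the driver code: ocores_should_process() is a hypothetical helper name, and the status bit values are assumed for the sketch rather than taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Status register bits; the numeric values are assumptions for this sketch. */
#define OCI2C_STAT_IF          0x01u       /* interrupt flag */
#define OCI2C_STAT_BUSY        0x40u       /* bus busy */

/* Same bit position as in the patch: BIT(1). */
#define OCORES_FLAG_BROKEN_IRQ (1ul << 1)

/*
 * Hypothetical helper: returns true when the status should be handed to
 * the transfer state machine, false when the handler would return IRQ_NONE.
 */
bool ocores_should_process(unsigned long flags, uint8_t stat)
{
	if (flags & OCORES_FLAG_BROKEN_IRQ) {
		/*
		 * The interrupt flag cannot be cleared on the affected SoC,
		 * so "IF set and bus idle" is treated as nothing to do.
		 */
		if ((stat & OCI2C_STAT_IF) && !(stat & OCI2C_STAT_BUSY))
			return false;
		return true;
	}

	/* Normal hardware: only act when the interrupt flag is set. */
	return (stat & OCI2C_STAT_IF) != 0;
}
```

With the flag set, a status of IF alone is ignored, while IF together with BUSY is still processed. Without the flag, the behaviour matches the original single check on IF.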

[PATCH v8 0/3] Extend dt bindings to support I2C on sifive devices and a fix broken IRQ in polling mode.

2019-05-28 Thread Sagar Shrikant Kadam

The patch is based on mainline v5.2-rc1 and extends the DT bindings for the
OpenCores-based I2C IP block reimplemented in the FU540 SoC, available on the
HiFive Unleashed board (Rev A00). It also provides a workaround for the broken
IRQ, which affects the I2C polling mode interface already available in
mainline, for FU540-C000 chipsets.

The polling mode workaround patch fixes the CPU stall that occurs whenever an
i2c transfer is initiated.

The workaround checks whether the device is an FU540 chipset based on device
tree information, and checks the OpenCores IF (interrupt flag) and BUSY flags
to break out of the polling loop upon completion of a transfer.

To test the patch, a PMOD-AD2 sensor is connected to the HiFive Unleashed board
over the J1 connector, and an appropriate device node is added to the
board-specific device tree as per the information provided in the dt-bindings in
Documentation/devicetree/bindings/i2c/i2c-ocores.txt.
Without this workaround, the CPU stalls indefinitely.

Busybox i2c utilities used to verify the workaround: i2cdetect, i2cdump, i2cset,
i2cget


Patch History:
V7<->V8:
-Incorporated review comments for cosmetic changes such as spaces, commas and
 periods.

V6<->V7:
-Rectified space and tab issues in dt bindings strings.
-Implemented workaround based on i2c->flags, as per review comment on v6.

V5<->V6:
-Incorporated suggestions on v5 patch as follows:
-Reformatted compatibility strings in dt doc with one valid combination on each
 line.
-Removed interrupt-parents from optional property list.
-With the rebase to v5.2-rc1, the v5 variant of the polling workaround PATCH
 becomes incompatible.
 Until kernel v5.1 the polling mode was enabled based on i2c->flags, whereas in
 kernel v5.2-rc1 polling mode is set as the master transfer algorithm at probe
 time itself, and the i2c->flags checks are removed.
-Modified v5 to check for SiFive device type in polling function and include
 the workaround/fix for broken IRQ.

V4<->V5:
-Removed unnecessary checks of OCORES_FLAG_BROKEN_IRQ.

V3<->V4:
-Incorporated suggestions on v3 patch as follows:
-OCORES_FLAG_BROKEN_IRQ BIT position rectified.
-Updated BROKEN_IRQ flag checks such that if a SiFive device (FU540-C000) is
 identified, polling mode is used as the IRQ is broken.

V2<->V3:
-Incorporated review comments on v2 patch as follows:
-Rectified compatibility string sequence with the most specific one
 first (dt bindings).
-Moved interrupts and interrupt-parent under optional property list
 (dt-bindings).
-Updated reference to sifive-blocks-ip-versioning.txt and URL to IP repository
 used (dt-bindings).
-Removed example for i2c0 device node from binding doc (dt-bindings).
-Included sifive,i2c0 device under compatibility table in i2c-ocores driver
 (i2c-ocores).
-Updated polling mode hooks for SoC specific fix to handle broken IRQ
 (i2c-ocores).


V1<->V2:
-Incorporate review comments from Andrew
-Extend dt bindings into i2c-ocores.txt instead of adding new file
-Rename SIFIVE_FLAG_POLL to OCORES_FLAG_BROKEN_IRQ

V1:
-Update dt bindings for sifive i2c devices
-Fix broken IRQ affecting i2c polling mode interface.

Sagar Shrikant Kadam (3):
  dt-bindings: i2c: extend existing opencore bindings.
  i2c-ocores: sifive: add support for i2c device on FU540-c000 SoC.
  i2c-ocores: sifive: add polling mode workaround for FU540-C000 SoC.

 .../devicetree/bindings/i2c/i2c-ocores.txt |  9 --
 drivers/i2c/busses/i2c-ocores.c| 33 --
 2 files changed, 38 insertions(+), 4 deletions(-)

-- 
1.9.1

[PATCH v8 1/3] dt-bindings: i2c: extend existing opencore bindings.

2019-05-28 Thread Sagar Shrikant Kadam

Reformat the compatibility strings to one valid combination on
each line.
Add FU540-C000 specific device tree bindings to the already available
i2c-ocores file. This device is available on the
HiFive Unleashed Rev A00 board. Move interrupts under the optional
property list, as it can be optional.

The FU540-C000 SoC from SiFive has a reimplementation of the OpenCores
I2C block.

The DT compatibility string for this IP is present in the HDL and available at:
https://github.com/sifive/sifive-blocks/blob/master/src/main/scala/devices/i2c/I2C.scala#L73

Signed-off-by: Sagar Shrikant Kadam 
---
 Documentation/devicetree/bindings/i2c/i2c-ocores.txt | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/i2c/i2c-ocores.txt b/Documentation/devicetree/bindings/i2c/i2c-ocores.txt
index 17bef9a..6b25a80 100644
--- a/Documentation/devicetree/bindings/i2c/i2c-ocores.txt
+++ b/Documentation/devicetree/bindings/i2c/i2c-ocores.txt
@@ -1,9 +1,13 @@
 Device tree configuration for i2c-ocores
 
 Required properties:
-- compatible  : "opencores,i2c-ocores" or "aeroflexgaisler,i2cmst"
+- compatible  : "opencores,i2c-ocores"
+"aeroflexgaisler,i2cmst"
+"sifive,fu540-c000-i2c", "sifive,i2c0"
+For Opencore based I2C IP block reimplemented in
+FU540-C000 SoC. Please refer to sifive-blocks-ip-versioning.txt
+for additional details.
 - reg : bus address start and address range size of device
-- interrupts  : interrupt number
 - clocks  : handle to the controller clock; see the note below.
 Mutually exclusive with opencores,ip-clock-frequency
 - opencores,ip-clock-frequency: frequency of the controller clock in Hz;
@@ -12,6 +16,7 @@ Required properties:
 - #size-cells : should be <0>
 
 Optional properties:
+- interrupts  : interrupt number.
 - clock-frequency : frequency of bus clock in Hz; see the note below.
 Defaults to 100 KHz when the property is not specified
 - reg-shift   : device register offsets are shifted by this value
-- 
1.9.1

[PATCH v8 2/3] i2c-ocores: sifive: add support for i2c device on FU540-c000 SoC.

2019-05-28 Thread Sagar Shrikant Kadam

Update the device ID table for the OpenCores I2C master based reimplementation
used in the FU540-C000 chipset on the HiFive Unleashed platform.

The device IDs include a SiFive SoC-specific device for chip-specific tweaks
and a SiFive IP-block-specific device for the generic programming model.

Signed-off-by: Sagar Shrikant Kadam 
---
 drivers/i2c/busses/i2c-ocores.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/i2c/busses/i2c-ocores.c b/drivers/i2c/busses/i2c-ocores.c
index c3dabee..b334fa2 100644
--- a/drivers/i2c/busses/i2c-ocores.c
+++ b/drivers/i2c/busses/i2c-ocores.c
@@ -82,6 +82,7 @@ struct ocores_i2c {
 
 #define TYPE_OCORES0
 #define TYPE_GRLIB 1
+#define TYPE_SIFIVE_REV0   2
 
 static void oc_setreg_8(struct ocores_i2c *i2c, int reg, u8 value)
 {
@@ -462,6 +463,14 @@ static u32 ocores_func(struct i2c_adapter *adap)
.compatible = "aeroflexgaisler,i2cmst",
.data = (void *)TYPE_GRLIB,
},
+   {
+   .compatible = "sifive,fu540-c000-i2c",
+   .data = (void *)TYPE_SIFIVE_REV0,
+   },
+   {
+   .compatible = "sifive,i2c0",
+   .data = (void *)TYPE_SIFIVE_REV0,
+   },
{},
 };
 MODULE_DEVICE_TABLE(of, ocores_i2c_match);
-- 
1.9.1
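
Editor's sketch: the effect of the two new table entries is that both SiFive compatible strings resolve to the same device type. The lookup below is a minimal userspace illustration of that mapping. It assumes that "opencores,i2c-ocores" maps to TYPE_OCORES (that entry is not visible in the hunk), it uses a plain string comparison instead of the kernel's device tree matching, and ocores_lookup_type() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Type values as defined in the patch above. */
enum {
	TYPE_OCORES      = 0,
	TYPE_GRLIB       = 1,
	TYPE_SIFIVE_REV0 = 2,
};

struct compat_entry {
	const char *compatible;
	int type;
};

/* Mirrors the match table after the patch; the first entry is assumed. */
static const struct compat_entry ocores_compat[] = {
	{ "opencores,i2c-ocores",   TYPE_OCORES },
	{ "aeroflexgaisler,i2cmst", TYPE_GRLIB },
	{ "sifive,fu540-c000-i2c",  TYPE_SIFIVE_REV0 },
	{ "sifive,i2c0",            TYPE_SIFIVE_REV0 },
};

/* Hypothetical helper: returns the type for a compatible string, or -1. */
int ocores_lookup_type(const char *compatible)
{
	size_t i;

	for (i = 0; i < sizeof(ocores_compat) / sizeof(ocores_compat[0]); i++) {
		if (strcmp(ocores_compat[i].compatible, compatible) == 0)
			return ocores_compat[i].type;
	}
	return -1;
}
```

Because the SoC-specific and the IP-block-specific strings share one type, a later patch can key a chip workaround on TYPE_SIFIVE_REV0 regardless of which string matched.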

Re: [PATCH] perf: Fix oops when kthread execs user process

2019-05-28 Thread Michael Ellerman

Will Deacon  writes:
> On Tue, May 28, 2019 at 04:01:03PM +0200, Peter Zijlstra wrote:
>> On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote:
>> > When a kthread calls call_usermodehelper() the steps are:
>> >   1. allocate current->mm
>> >   2. load_elf_binary()
>> >   3. populate current->thread.regs
>> > 
>> > While doing this, interrupts are not disabled. If there is a perf
>> > interrupt in the middle of this process (i.e. step 1 has completed
>> > but not yet reached to step 3) and if perf tries to read userspace
>> > regs, kernel oops.
>
> This seems to be because pt_regs(current) gives NULL for kthreads on Power.

Right, we've done that since roughly forever in copy_thread():

int copy_thread(unsigned long clone_flags, unsigned long usp,
unsigned long kthread_arg, struct task_struct *p)
{
...
/* Copy registers */
sp -= sizeof(struct pt_regs);
childregs = (struct pt_regs *) sp;
if (unlikely(p->flags & PF_KTHREAD)) {
/* kernel thread */
memset(childregs, 0, sizeof(struct pt_regs));
childregs->gpr[1] = sp + sizeof(struct pt_regs);
...
p->thread.regs = NULL;  /* no user register state */

See commit from 2002:

https://github.com/mpe/linux-fullhistory/commit/c0a96c0918d21d8a99270e94d9c4a4a322d04581#diff-edb76bfcc84905163f34d24d2aad3f3aR187

> From the initial report [1], it doesn't look like the mm isn't initialised,
> but rather than we're dereferencing a NULL pt_regs pointer somehow for the
> current task (see previous comment). I don't see how that can happen on
> arm64, given that we put the pt_regs on the kernel stack which is allocated
> during fork.

We have the regs on the stack too (see above), but we're explicitly
NULL'ing the link from task->thread.

Looks like on arm64 and x86 there is no link from task->thread, instead
you get from task to pt_regs via task_stack_page().

That actually seems potentially fishy given the comment on
task_stack_page() about the stack going away for exiting tasks. We
should probably be NULL'ing the regs pointer in free_thread_stack() or
similar. Though that race mustn't be happening because other arches
would see it.

Or are we just wrong and kthreads should have non-NULL regs? I can't
find another arch that does the same as us.

cheers
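
Editor's sketch: the failure mode discussed above comes down to a sampling path dereferencing a register pointer that is NULL for kernel threads. The userspace model below shows the kind of guard such a path needs. It is not the patch under discussion, and every name in it (task_sketch, pt_regs_sketch, sample_user_ip) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for the saved user register state. */
struct pt_regs_sketch {
	unsigned long nip;	/* instruction pointer */
};

/* Reduced stand-in for a task whose thread state links to its user regs. */
struct task_sketch {
	struct pt_regs_sketch *regs;	/* NULL for kernel threads in this model */
};

/*
 * Hypothetical sampling helper: reads the user instruction pointer only
 * when the task actually has user register state.
 * Returns 0 on success, -1 when there is nothing to sample.
 */
int sample_user_ip(const struct task_sketch *task, unsigned long *ip)
{
	if (task == NULL || task->regs == NULL)
		return -1;

	*ip = task->regs->nip;
	return 0;
}
```

The point of the sketch is only that the check has to happen at sample time: a kthread that is in the middle of exec'ing a user process has no user register state until the last step has run.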

Re: [PATCH] x86/fpu: Use fault_in_pages_writeable() for pre-faulting

2019-05-28 Thread Andrew Morton

On Sun, 26 May 2019 19:33:25 +0200 Sebastian Andrzej Siewior wrote:

> From: Hugh Dickins 
> 
> Since commit
> 
>   d9c9ce34ed5c8 ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails")

Please add this as a

Fixes: d9c9ce34ed5c8 ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails")

line so that anyone who backports d9c9ce34ed5c8 has a chance of finding
this patch also.

kernel BUG at mm/swap_state.c:170!

2019-05-28 Thread Mikhail Gavrilov

Hi folks.
I observed a kernel panic after updating to git tag 5.2-rc2.
The crash happens under memory pressure when swap is being used.

Unfortunately, journalctl saved only this:

May 29 08:02:02 localhost.localdomain kernel: page:e9095823 refcount:1 mapcount:1 mapping:8f3ffeb36949 index:0x625002ab2
May 29 08:02:02 localhost.localdomain kernel: anon
May 29 08:02:02 localhost.localdomain kernel: flags: 0x17fffe00080034(uptodate|lru|active|swapbacked)
May 29 08:02:02 localhost.localdomain kernel: raw: 0017fffe00080034 e90944640888 e90956e208c8 8f3ffeb36949
May 29 08:02:02 localhost.localdomain kernel: raw: 000625002ab2  0001 8f41aeeff000
May 29 08:02:02 localhost.localdomain kernel: page dumped because: VM_BUG_ON_PAGE(entry != page)
May 29 08:02:02 localhost.localdomain kernel: page->mem_cgroup:8f41aeeff000
May 29 08:02:02 localhost.localdomain kernel: [ cut here ]
May 29 08:02:02 localhost.localdomain kernel: kernel BUG at mm/swap_state.c:170!




--
Best Regards,
Mike Gavrilov.

Re: [ext4] 079f9927c7: ltp.mmap16.fail

2019-05-28 Thread Theodore Ts'o

On Wed, May 29, 2019 at 10:52:56AM +0800, kernel test robot wrote:
> FYI, we noticed the following commit (built with gcc-7):
> 
> commit: 079f9927c7bfa026d963db1455197159ebe5b534 ("ext4: gracefully handle ext4_break_layouts() failure during truncate")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

Jan --- this is the old version of your patch, which I had dropped
before sending a push request to Linus.  However, I forgot to reset
the dev branch so it still had the old patch on it, and so it got
picked up in linux-next.  Apologies for the confusion.

I've reset the dev branch on ext4.git, and the new version of your
patch will show up there shortly, as I start reviewing patches for the
next merge window.

Cheers,

- Ted

> <<>>
> tag=mmap16 stime=1559078706
> cmdline="mmap16"
> contacts=""
> analysis=exit
> <<>>
> mke2fs 1.43.4 (31-Jan-2017)
> mmap16  0  TINFO  :  Using test device LTP_DEV='/dev/loop0'
> mmap16  0  TINFO  :  Formatting /dev/loop0 with ext4 opts='-b 1024' extra opts='10240'
> mmap16  1  TFAIL  :  mmap16.c:85: Bug is reproduced!
> <<>>
> initiation_status="ok"
> duration=8 termination_type=exited termination_id=1 corefile=no
> cutime=11 cstime=345
> <<>>

Re: [RFC PATCH v5 16/16] dcache: Add CONFIG_DCACHE_SMO

2019-05-28 Thread Tobin C. Harding

On Tue, May 21, 2019 at 02:05:38AM +, Roman Gushchin wrote:
> On Tue, May 21, 2019 at 11:31:18AM +1000, Tobin C. Harding wrote:
> > On Tue, May 21, 2019 at 12:57:47AM +, Roman Gushchin wrote:
> > > On Mon, May 20, 2019 at 03:40:17PM +1000, Tobin C. Harding wrote:
> > > > In an attempt to make the SMO patchset as non-invasive as possible add a
> > > > config option CONFIG_DCACHE_SMO (under "Memory Management options") for
> > > > enabling SMO for the DCACHE.  Whithout this option dcache constructor is
> > > > used but no other code is built in, with this option enabled slab
> > > > mobility is enabled and the isolate/migrate functions are built in.
> > > > 
> > > > Add CONFIG_DCACHE_SMO to guard the partial shrinking of the dcache via
> > > > Slab Movable Objects infrastructure.
> > > 
> > > Hm, isn't it better to make it a static branch? Or basically anything
> > > that allows switching on the fly?
> > 
> > If that is wanted, turning SMO on and off per cache, we can probably do
> > this in the SMO code in SLUB.
> 
> Not necessarily per cache, but without recompiling the kernel.
> > 
> > > It seems that the cost of just building it in shouldn't be that high.
> > > And the question if the defragmentation worth the trouble is so much
> > > easier to answer if it's possible to turn it on and off without rebooting.
> > 
> > If the question is 'is defragmentation worth the trouble for the
> > dcache', I'm not sure having SMO turned off helps answer that question.
> > If one doesn't shrink the dentry cache there should be very little
> > overhead in having SMO enabled.  So if one wants to explore this
> > question then they can turn on the config option.  Please correct me if
> > I'm wrong.
> 
> The problem with a config option is that it's hard to switch over.
> 
> So just to test your changes in production a new kernel should be built,
> tested and rolled out to a representative set of machines (which can be
> measured in thousands of machines). Then if results are questionable,
> it should be rolled back.
> 
> What you're actually guarding is the kmem_cache_setup_mobility() call,
> which can be perfectly avoided using a boot option, for example. Turning
> it on and off completely dynamic isn't that hard too.

Hi Roman,

I've added a boot parameter to SLUB so that admins can enable/disable
SMO at boot time system wide.  Then for each object that implements SMO
(currently XArray and dcache) I've also added a boot parameter to
enable/disable SMO for that cache specifically (these depend on SMO
being enabled system wide).

All three boot parameters default to 'off'; I've added a config option
to default each to 'on'.

I've got a little more testing to do on another part of the set then the
PATCH version is coming at you :)

This is more a courtesy email than a request for comment, but please
feel free to shout if you don't like the method outlined above.

Fully dynamic config is not currently possible because the SMO
implementation does not support disabling mobility for a cache once it
is turned on. A bit of extra logic would need to be added and some state
stored. I'm not sure it warrants it ATM, but that can easily be added
later if wanted.  Maybe Christoph will give his opinion on this.

thanks,
Tobin.
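
Editor's sketch: the scheme outlined above (one system-wide switch, plus per-cache switches that only take effect when the system-wide one is on) can be modelled in a few lines of userspace C. This is an illustration only. The struct, the function names and the "on"/"off" value syntax are assumptions, since the actual parameter names are not given in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the three boot-time switches. */
struct smo_config {
	bool system_wide;	/* SMO enabled in SLUB at all */
	bool dcache;		/* SMO requested for the dcache */
	bool xarray;		/* SMO requested for the XArray cache */
};

/*
 * Parse an "on"/"off" value. Anything else (including NULL) keeps the
 * supplied default, which models the config option that flips the default.
 */
bool smo_parse_on_off(const char *val, bool dflt)
{
	if (val == NULL)
		return dflt;
	if (strcmp(val, "on") == 0)
		return true;
	if (strcmp(val, "off") == 0)
		return false;
	return dflt;
}

/* A per-cache switch only counts when the system-wide switch is on. */
bool smo_dcache_enabled(const struct smo_config *cfg)
{
	return cfg->system_wide && cfg->dcache;
}

bool smo_xarray_enabled(const struct smo_config *cfg)
{
	return cfg->system_wide && cfg->xarray;
}
```

In this model, turning on the dcache switch alone changes nothing. Mobility is only set up for a cache when both its own switch and the system-wide switch are on, which matches the dependency described in the mail.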

Re: [PATCH v2] mm/swap: Fix release_pages() when releasing devmap pages

2019-05-28 Thread Ira Weiny

On Mon, May 27, 2019 at 05:01:07PM +0200, Michal Hocko wrote:
> On Fri 24-05-19 10:36:56, ira.we...@intel.com wrote:
> > From: Ira Weiny 
> > 
> > Device pages can be more than type MEMORY_DEVICE_PUBLIC.
> > 
> > Handle all device pages within release_pages()
> > 
> > This was found via code inspection while determining if release_pages()
> > and the new put_user_pages() could be interchangeable.
> 
> Please expand more about who is such a user and why does it use
> release_pages rather than put_*page API.

Sorry for not being more clear.   The error was discovered while discussing a
proposal to change a use of release_pages() to put_user_pages()[1]

[1] https://lore.kernel.org/lkml/20190523172852.ga27...@iweiny-desk2.sc.intel.com/

In that thread John was saying that release_pages() was functionally equivalent
to a loop around put_page().  He also suggested implementing put_user_pages()
by using release_pages().  On the surface they did not seem the same to me so I
did a deep dive to make sure they were and found this error.

>
> The above changelog doesn't
> really help understanding what is the actual problem. I also do not
> understand the fix and a failure mode from release_pages is just scary.

This is not failing release_pages().  The fix is that not all devmap pages are
"public" type.  So previous to this change devmap pages of other types would
not correctly be accounted for.

The discussion about put_devmap_managed_page() "failing" is not about it
failing directly but rather in how these pages should be accounted for.  Only
devmap pages which require pagemap ops (specifically page_free()) require
put_devmap_managed_page() processing.   Because of the optimized locking in
release_pages() the zone device check is required to release the lock even if
put_devmap_managed_page() does not handle the put.

> It is basically impossible to handle the error case. So what is going on
> here?

I think what has happened is the code in release_pages() and put_page()
diverged at some point.  I think it is worth a clean up in this area but I
don't see a way to do it at the moment which would be any cleaner than what is
there.  So I've refrained from doing so.

Does this help?  Would you like to roll a V3 with some of this in the commit
message?

Ira

>
>
>
> 
> > Cc: Jérôme Glisse 
> > Cc: Michal Hocko 
> > Reviewed-by: Dan Williams 
> > Reviewed-by: John Hubbard 
> > Signed-off-by: Ira Weiny 
> > 
> > ---
> > Changes from V1:
> > Add comment clarifying that put_devmap_managed_page() can still
> > fail.
> > Add Reviewed-by tags.
> > 
> >  mm/swap.c | 11 +++
> >  1 file changed, 7 insertions(+), 4 deletions(-)
> > 
> > diff --git a/mm/swap.c b/mm/swap.c
> > index 9d0432baddb0..f03b7b4bfb4f 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -740,15 +740,18 @@ void release_pages(struct page **pages, int nr)
> > if (is_huge_zero_page(page))
> > continue;
> >  
> > -   /* Device public page can not be huge page */
> > -   if (is_device_public_page(page)) {
> > +   if (is_zone_device_page(page)) {
> > if (locked_pgdat) {
> > spin_unlock_irqrestore(&locked_pgdat->lru_lock,
> >flags);
> > locked_pgdat = NULL;
> > }
> > -   put_devmap_managed_page(page);
> > -   continue;
> > +   /*
> > +* zone-device-pages can still fail here and will
> > +* therefore need put_page_testzero()
> > +*/
> > +   if (put_devmap_managed_page(page))
> > +   continue;
> > }
> >  
> > page = compound_head(page);
> > -- 
> > 2.20.1
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs
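
Editor's sketch: the control flow that the hunk above gives release_pages() can be modelled in userspace as follows. This is a simplified illustration, not kernel code. The page structure, the helper names and the refcount handling are hypothetical stand-ins, and the lock is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced page model. */
struct page_sketch {
	bool zone_device;	/* any ZONE_DEVICE page, not only "public" ones */
	bool devmap_managed;	/* needs pagemap ops such as page_free() */
	int refcount;
};

/* Stand-in for put_devmap_managed_page(): true if it handled the put. */
bool put_devmap_managed_sketch(struct page_sketch *page)
{
	if (!page->devmap_managed)
		return false;
	page->refcount--;
	return true;
}

/*
 * Walks the pages the way the patched loop does and returns how many
 * pages fell through to the ordinary put path.
 */
int release_pages_sketch(struct page_sketch *pages, int nr, bool *lock_held)
{
	int ordinary = 0;
	int i;

	for (i = 0; i < nr; i++) {
		struct page_sketch *page = &pages[i];

		if (page->zone_device) {
			/* The LRU lock is dropped for every device page. */
			*lock_held = false;
			if (put_devmap_managed_sketch(page))
				continue;
			/* Not handled: fall through to the ordinary put. */
		}

		page->refcount--;	/* stands in for put_page_testzero() */
		ordinary++;
	}
	return ordinary;
}
```

The case the fix is about is the third kind of page: a device page that put_devmap_managed_page() does not handle. It still has the lock released for it, but its reference is then dropped by the normal path instead of being skipped.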

[PATCH 1/1] Drivers: hv: vmbus: Break out ISA independent parts of mshyperv.h

2019-05-28 Thread Michael Kelley

Break out parts of mshyperv.h that are ISA independent into a
separate file in include/asm-generic. This move facilitates
ARM64 code reusing these definitions and avoids code
duplication. No functionality or behavior is changed.

Signed-off-by: Michael Kelley 
---
 MAINTAINERS |   1 +
 arch/x86/include/asm/mshyperv.h | 147 +---
 include/asm-generic/mshyperv.h  | 182 
 3 files changed, 187 insertions(+), 143 deletions(-)
 create mode 100644 include/asm-generic/mshyperv.h

diff --git a/MAINTAINERS b/MAINTAINERS
index cf2a5b7..521192d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7308,6 +7308,7 @@ F:net/vmw_vsock/hyperv_transport.c
 F: include/clocksource/hyperv_timer.h
 F: include/linux/hyperv.h
 F: include/uapi/linux/hyperv.h
+F: include/asm-generic/mshyperv.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
 
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index f4fa8a9..2a793bf 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -3,84 +3,15 @@
 #define _ASM_X86_MSHYPER_H
 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
 
-#define VP_INVAL   U32_MAX
-
-struct ms_hyperv_info {
-   u32 features;
-   u32 misc_features;
-   u32 hints;
-   u32 nested_features;
-   u32 max_vp_index;
-   u32 max_lp_index;
-};
-
-extern struct ms_hyperv_info ms_hyperv;
-
-
 typedef int (*hyperv_fill_flush_list_func)(
struct hv_guest_mapping_flush_list *flush,
void *data);
 
-/*
- * Generate the guest ID.
- */
-
-static inline  __u64 generate_guest_id(__u64 d_info1, __u64 kernel_version,
-  __u64 d_info2)
-{
-   __u64 guest_id = 0;
-
-   guest_id = (((__u64)HV_LINUX_VENDOR_ID) << 48);
-   guest_id |= (d_info1 << 48);
-   guest_id |= (kernel_version << 16);
-   guest_id |= d_info2;
-
-   return guest_id;
-}
-
-
-/* Free the message slot and signal end-of-message if required */
-static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
-{
-   /*
-* On crash we're reading some other CPU's message page and we need
-* to be careful: this other CPU may already had cleared the header
-* and the host may already had delivered some other message there.
-* In case we blindly write msg->header.message_type we're going
-* to lose it. We can still lose a message of the same type but
-* we count on the fact that there can only be one
-* CHANNELMSG_UNLOAD_RESPONSE and we don't care about other messages
-* on crash.
-*/
-   if (cmpxchg(&msg->header.message_type, old_msg_type,
-   HVMSG_NONE) != old_msg_type)
-   return;
-
-   /*
-* Make sure the write to MessageType (ie set to
-* HVMSG_NONE) happens before we read the
-* MessagePending and EOMing. Otherwise, the EOMing
-* will not deliver any more messages since there is
-* no empty slot
-*/
-   mb();
-
-   if (msg->header.message_flags.msg_pending) {
-   /*
-* This will cause message queue rescan to
-* possibly deliver another msg from the
-* hypervisor
-*/
-   wrmsrl(HV_X64_MSR_EOM, 0);
-   }
-}
-
 #define hv_init_timer(timer, tick) \
wrmsrl(HV_X64_MSR_STIMER0_COUNT + (2*timer), tick)
 #define hv_init_timer_config(timer, val) \
@@ -97,6 +28,8 @@ static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
 
 #define hv_get_vp_index(index) rdmsrl(HV_X64_MSR_VP_INDEX, index)
 
+#define hv_signal_eom() wrmsrl(HV_X64_MSR_EOM, 0)
+
 #define hv_get_synint_state(int_num, val) \
rdmsrl(HV_X64_MSR_SINT0 + int_num, val)
 #define hv_set_synint_state(int_num, val) \
@@ -122,13 +55,6 @@ static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
 #define trace_hyperv_callback_vector hyperv_callback_vector
 #endif
 void hyperv_vector_handler(struct pt_regs *regs);
-void hv_setup_vmbus_irq(void (*handler)(void));
-void hv_remove_vmbus_irq(void);
-
-void hv_setup_kexec_handler(void (*handler)(void));
-void hv_remove_kexec_handler(void);
-void hv_setup_crash_handler(void (*handler)(struct pt_regs *regs));
-void hv_remove_crash_handler(void);
 
 /*
  * Routines for stimer0 Direct Mode handling.
@@ -136,8 +62,6 @@ static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
  */
 void hv_stimer0_vector_handler(struct pt_regs *regs);
 void hv_stimer0_callback_vector(void);
-int hv_setup_stimer0_irq(int *irq, int *vector, void (*handler)(void));
-void hv_remove_stimer0_irq(int irq);
 
 static inline void hv_enable_stimer0_percpu_irq(int irq) {}
 static inline void hv_disable_stimer0_percpu_irq(int irq) {}
@@ -282,14 +206,6 @@ static inline u64
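
Editor's sketch: generate_guest_id(), one of the ISA-independent helpers moved by this patch, only packs four fields into a 64-bit value, which is why it needs nothing from the x86 header. The userspace restatement below mirrors the shifts exactly as they appear in the removed lines. The function name and the vendor_id parameter are assumptions made for the sketch: the original takes the vendor ID from HV_LINUX_VENDOR_ID, whose value is not shown in this diff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace restatement of generate_guest_id().
 * The shift amounts are copied from the quoted helper, including the
 * shared shift of 48 for the vendor ID and d_info1.
 */
uint64_t generate_guest_id_sketch(uint64_t vendor_id, uint64_t d_info1,
				  uint64_t kernel_version, uint64_t d_info2)
{
	uint64_t guest_id;

	guest_id = vendor_id << 48;
	guest_id |= d_info1 << 48;
	guest_id |= kernel_version << 16;
	guest_id |= d_info2;

	return guest_id;
}
```

Nothing here depends on MSRs or on any other architecture-specific facility, so the same definition can be shared by another architecture.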

[PATCH] NFC: microread/pn544: Fix possible null pointer deference error

2019-05-28 Thread Young Xiao

pn544_hci_i2c_irq_thread_fn and microread_i2c_irq_thread_fn may access
phy->hdev before it has been initialized in pn544_hci_i2c_probe or
microread_i2c_probe.

Therefore, change the order in which xxx_probe and request_threaded_irq
are called, and add a guard on phy->hdev in the xxx_i2c_irq_thread_fn
functions.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/nfc/microread/i2c.c | 19 +++
 drivers/nfc/pn544/i2c.c | 16 
 2 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/drivers/nfc/microread/i2c.c b/drivers/nfc/microread/i2c.c
index 1806d20..80fc6d5 100644
--- a/drivers/nfc/microread/i2c.c
+++ b/drivers/nfc/microread/i2c.c
@@ -212,7 +212,7 @@ static irqreturn_t microread_i2c_irq_thread_fn(int irq, void *phy_id)
struct sk_buff *skb = NULL;
int r;
 
-   if (!phy || irq != phy->i2c_dev->irq) {
+   if (!phy || !phy->hdev || irq != phy->i2c_dev->irq) {
WARN_ON_ONCE(1);
return IRQ_NONE;
}
@@ -257,6 +257,12 @@ static int microread_i2c_probe(struct i2c_client *client,
 
i2c_set_clientdata(client, phy);
phy->i2c_dev = client;
+   r = microread_probe(phy, &i2c_phy_ops, LLC_SHDLC_NAME,
+   MICROREAD_I2C_FRAME_HEADROOM,
+   MICROREAD_I2C_FRAME_TAILROOM,
+   MICROREAD_I2C_LLC_MAX_PAYLOAD, &phy->hdev);
+   if (r < 0)
+   return r;
 
r = request_threaded_irq(client->irq, NULL, microread_i2c_irq_thread_fn,
 IRQF_TRIGGER_RISING | IRQF_ONESHOT,
@@ -266,21 +272,10 @@ static int microread_i2c_probe(struct i2c_client *client,
return r;
}
 
-   r = microread_probe(phy, &i2c_phy_ops, LLC_SHDLC_NAME,
-   MICROREAD_I2C_FRAME_HEADROOM,
-   MICROREAD_I2C_FRAME_TAILROOM,
-   MICROREAD_I2C_LLC_MAX_PAYLOAD, &phy->hdev);
-   if (r < 0)
-   goto err_irq;
 
nfc_info(&client->dev, "Probed\n");
 
return 0;
-
-err_irq:
-   free_irq(client->irq, phy);
-
-   return r;
 }
 
 static int microread_i2c_remove(struct i2c_client *client)
diff --git a/drivers/nfc/pn544/i2c.c b/drivers/nfc/pn544/i2c.c
index d0207f8..c9694c8 100644
--- a/drivers/nfc/pn544/i2c.c
+++ b/drivers/nfc/pn544/i2c.c
@@ -496,7 +496,7 @@ static irqreturn_t pn544_hci_i2c_irq_thread_fn(int irq, void *phy_id)
struct sk_buff *skb = NULL;
int r;
 
-   if (!phy || irq != phy->i2c_dev->irq) {
+   if (!phy || !phy->hdev || irq != phy->i2c_dev->irq) {
WARN_ON_ONCE(1);
return IRQ_NONE;
}
@@ -924,6 +924,13 @@ static int pn544_hci_i2c_probe(struct i2c_client *client,
 
pn544_hci_i2c_platform_init(phy);
 
+   r = pn544_hci_probe(phy, &i2c_phy_ops, LLC_SHDLC_NAME,
+   PN544_I2C_FRAME_HEADROOM, PN544_I2C_FRAME_TAILROOM,
+   PN544_HCI_I2C_LLC_MAX_PAYLOAD,
+   pn544_hci_i2c_fw_download, &phy->hdev);
+   if (r < 0)
+   return r;
+
r = devm_request_threaded_irq(&client->dev, client->irq, NULL,
  pn544_hci_i2c_irq_thread_fn,
  IRQF_TRIGGER_RISING | IRQF_ONESHOT,
@@ -933,13 +940,6 @@ static int pn544_hci_i2c_probe(struct i2c_client *client,
return r;
}
 
-   r = pn544_hci_probe(phy, &i2c_phy_ops, LLC_SHDLC_NAME,
-   PN544_I2C_FRAME_HEADROOM, PN544_I2C_FRAME_TAILROOM,
-   PN544_HCI_I2C_LLC_MAX_PAYLOAD,
-   pn544_hci_i2c_fw_download, &phy->hdev);
-   if (r < 0)
-   return r;
-
return 0;
 }
 
-- 
2.7.4
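
Editor's sketch: the two halves of the change above (initialize phy->hdev before the interrupt can fire, and have the handler tolerate a missing hdev anyway) can be modelled in userspace. This is an illustration only. The structure and function names are hypothetical, and registering the interrupt is reduced to recording the IRQ number.

```c
#include <assert.h>
#include <stddef.h>

enum irq_result_sketch {
	IRQ_NONE_SKETCH = 0,
	IRQ_HANDLED_SKETCH = 1,
};

/* Hypothetical stand-in for the driver's phy structure. */
struct phy_sketch {
	int irq;
	void *hdev;	/* NULL until the core probe has run */
};

/* Models the guarded check at the top of the threaded IRQ handler. */
enum irq_result_sketch irq_thread_sketch(int irq, struct phy_sketch *phy)
{
	if (phy == NULL || phy->hdev == NULL || irq != phy->irq)
		return IRQ_NONE_SKETCH;

	return IRQ_HANDLED_SKETCH;
}

/*
 * Models the reordered probe: hdev is set up first, and only then is
 * the interrupt registered. Returns 0 on success, -1 on failure.
 */
int probe_sketch(struct phy_sketch *phy, void *hdev, int irq)
{
	phy->hdev = hdev;	/* stands in for the xxx_probe() call */
	if (phy->hdev == NULL)
		return -1;

	phy->irq = irq;		/* stands in for request_threaded_irq() */
	return 0;
}
```

Before the probe has run, the handler returns without touching hdev. Afterwards, an interrupt on the registered line is handled normally.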

RE: [PATCH RESEND V13 2/5] thermal: of-thermal: add API for getting sensor ID from DT

2019-05-28 Thread Anson Huang

Hi, Eduardo

> -Original Message-
> From: Eduardo Valentin 
> Sent: Wednesday, May 29, 2019 11:02 AM
> To: Anson Huang 
> Cc: robh...@kernel.org; mark.rutl...@arm.com; shawn...@kernel.org;
> s.ha...@pengutronix.de; ker...@pengutronix.de; feste...@gmail.com;
> catalin.mari...@arm.com; will.dea...@arm.com; rui.zh...@intel.com;
> daniel.lezc...@linaro.org; Aisheng Dong ;
> ulf.hans...@linaro.org; Peng Fan ; Daniel Baluta
> ; maxime.rip...@bootlin.com; o...@lixom.net;
> ja...@amarulasolutions.com; horms+rene...@verge.net.au; Leonard Crestez
> ; bjorn.anders...@linaro.org;
> dingu...@kernel.org; enric.balle...@collabora.com;
> devicet...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; linux...@vger.kernel.org; dl-linux-imx  i...@nxp.com>
> Subject: Re: [PATCH RESEND V13 2/5] thermal: of-thermal: add API for getting
> sensor ID from DT
> 
> On Tue, May 28, 2019 at 02:06:18PM +0800, anson.hu...@nxp.com wrote:
> > From: Anson Huang 
> >
> > On some platforms like i.MX8QXP, the thermal driver needs a real HW
> > sensor ID from DT thermal zone, the HW sensor ID is used to get
> > temperature from SCU firmware, and the virtual sensor ID starting from
> > 0 to N is NOT used at all, this patch adds new API
> > thermal_zone_of_get_sensor_id() to provide the feature of getting
> > sensor ID from DT thermal zone's node.
> >
> > Signed-off-by: Anson Huang 
> > ---
> > Changes since V12:
> > - adjust the second parameter of thermal_zone_of_get_sensor_id() API,
> then caller no need
> >   to pass the of_phandle_args structure and put the sensor_specs.np
> manually, also putting
> >   the sensor node device check inside this API to make it easy for
> > usage;
> 
> What happened to using nxp,resource-id property in your driver?
> Why do we need this as an API in of-thermal? What other drivers may benefit
> from this?
> 
> Regardless, this patch needs to document the new API under Documentation/

Rob has a different opinion about this property: he considers it unnecessary
(see the discussion linked below). That is why I need to add an API to get the
resource ID from the phandle argument.
I am totally confused now. Which approach should we adopt?

https://patchwork.kernel.org/patch/10831397/

Thanks,
Anson

> 
> > ---
> >  drivers/thermal/of-thermal.c | 66 +---
> 
> >  include/linux/thermal.h  | 10 +++
> >  2 files changed, 60 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/thermal/of-thermal.c
> > b/drivers/thermal/of-thermal.c index dc5093b..a53792b 100644
> > --- a/drivers/thermal/of-thermal.c
> > +++ b/drivers/thermal/of-thermal.c
> > @@ -449,6 +449,54 @@ thermal_zone_of_add_sensor(struct device_node
> > *zone,  }
> >
> >  /**
> > + * thermal_zone_of_get_sensor_id - get sensor ID from a DT thermal
> > + zone
> > + * @tz_np: a valid thermal zone device node.
> > + * @sensor_np: a sensor node of a valid sensor device.
> > + * @id: a sensor ID pointer will be passed back.
> > + *
> > + * This function will get sensor ID from a given thermal zone node,
> > + use
> > + * "thermal-sensors" as list name, and get sensor ID from first
> > + phandle's
> > + * argument.
> > + *
> > + * Return: 0 on success, proper error code otherwise.
> > + */
> > +
> > +int thermal_zone_of_get_sensor_id(struct device_node *tz_np,
> > + struct device_node *sensor_np,
> > + u32 *id)
> > +{
> > +   struct of_phandle_args sensor_specs;
> > +   int ret;
> > +
> > +   ret = of_parse_phandle_with_args(tz_np,
> > +"thermal-sensors",
> > +"#thermal-sensor-cells",
> > +0,
> > +&sensor_specs);
> > +   if (ret)
> > +   return ret;
> > +
> > +   if (sensor_specs.np != sensor_np) {
> > +   of_node_put(sensor_specs.np);
> > +   return -ENODEV;
> > +   }
> > +
> > +   if (sensor_specs.args_count >= 1) {
> > +   *id = sensor_specs.args[0];
> > +   WARN(sensor_specs.args_count > 1,
> > +"%pOFn: too many cells in sensor specifier %d\n",
> > +sensor_specs.np, sensor_specs.args_count);
> > +   } else {
> > +   *id = 0;
> > +   }
> > +
> > +   of_node_put(sensor_specs.np);
> > +
> > +   return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(thermal_zone_of_get_sensor_id);
> > +
> > +/**
> >   * thermal_zone_of_sensor_register - registers a sensor to a DT thermal 
> > zone
> >   * @dev: a valid struct device pointer of a sensor device. Must contain
> >   *   a valid .of_node, for the sensor node.
> > @@ -499,36 +547,22 @@ thermal_zone_of_sensor_register(struct device
> *dev, int sensor_id, void *data,
> > sensor_np = of_node_get(dev->of_node);
> >
> > for_each_available_child_of_node(np, child) {
> > -   struct of_phandle_args sensor_specs;
> > int ret, id;
> >
> >

[PATCH] ASoC: cs42xx8: Fix build error with CONFIG_GPIOLIB is not set

2019-05-28 Thread shengjiu . wang

From: Shengjiu Wang 

config: x86_64-randconfig-x000201921-201921
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
make ARCH=x86_64

sound/soc/codecs/cs42xx8.c: In function ‘cs42xx8_probe’:
sound/soc/codecs/cs42xx8.c:472:25: error: implicit declaration of function 
‘devm_gpiod_get_optional’; did you mean ‘devm_clk_get_optional’? 
[-Werror=implicit-function-declaration]
  cs42xx8->gpiod_reset = devm_gpiod_get_optional(dev, "reset",
 ^~~
 devm_clk_get_optional
sound/soc/codecs/cs42xx8.c:473:8: error: ‘GPIOD_OUT_HIGH’ undeclared (first use 
in this function); did you mean ‘GPIOF_INIT_HIGH’?
GPIOD_OUT_HIGH);
^~
GPIOF_INIT_HIGH
sound/soc/codecs/cs42xx8.c:473:8: note: each undeclared identifier is reported 
only once for each function it appears in
sound/soc/codecs/cs42xx8.c:477:2: error: implicit declaration of function 
‘gpiod_set_value_cansleep’; did you mean ‘gpio_set_value_cansleep’? 
[-Werror=implicit-function-declaration]
  gpiod_set_value_cansleep(cs42xx8->gpiod_reset, 0);
  ^~~~
  gpio_set_value_cansleep

Fixes: bfe95dfa4dac ("ASoC: cs42xx8: Add reset gpio handling")
Reported-by: kbuild test robot 
Signed-off-by: Shengjiu Wang 
---
 sound/soc/codecs/cs42xx8.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/codecs/cs42xx8.c b/sound/soc/codecs/cs42xx8.c
index 3e8dbf63adbe..3bbc62322dfe 100644
--- a/sound/soc/codecs/cs42xx8.c
+++ b/sound/soc/codecs/cs42xx8.c
@@ -14,7 +14,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
-- 
2.21.0

ebpf trace doesn't work during cpu hotplug

2019-05-28 Thread Ming Lei

Hi,

It looks like eBPF tracing doesn't work during CPU hotplug; see the following trace:

1) trace two functions called during CPU unplug via bcc/trace

/usr/share/bcc/tools/trace -T 'takedown_cpu "%d", arg1'  'take_cpu_down'

2) put cpu7 offline via:

echo 0 > /sys/devices/system/cpu/cpu7/online

3) only trace on 'takedown_cpu' is dumped via bcc/trace:

TIME     PID  TID  COMM  FUNC          -
03:23:17 733  733  bash  takedown_cpu  7

The trace for 'take_cpu_down' is lost and never shows up, even after
CPU7 is brought back online.

take_cpu_down is called via stop_machine_cpuslocked.

Thanks,
Ming Lei

RE: [EXT] Re: [PATCH] arm64: dts: ls1028a: Add Thermal Monitor Unit node

2019-05-28 Thread Andy Tang

> -Original Message-
> From: Eduardo Valentin 
> Sent: 2019年5月29日 10:54
> To: Andy Tang 
> Cc: shawn...@kernel.org; Leo Li ;
> robh...@kernel.org; mark.rutl...@arm.com;
> linux-arm-ker...@lists.infradead.org; devicet...@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux...@vger.kernel.org;
> daniel.lezc...@linaro.org; rui.zh...@intel.com
> Subject: [EXT] Re: [PATCH] arm64: dts: ls1028a: Add Thermal Monitor Unit
> node
> 
> Caution: EXT Email
> 
> On Thu, Apr 25, 2019 at 04:26:40PM +0800, Yuantian Tang wrote:
> > The Thermal Monitoring Unit (TMU) monitors and reports the temperature
> > from 2 remote temperature measurement sites located on ls1028a chip.
> > Add TMU dts node to enable this feature.
> >
> > Signed-off-by: Yuantian Tang 
> 
> I dont see anything wrong from a thermal standpoint.
> 
> Acked-by: Eduardo Valentin 
> 
> Please get this via your arch tree maintainer to avoid merge conflicts.
Thanks for your review.
The only concern of the arch tree maintainer is that "cooling-maps" is a
required property, so I have to add cooling-maps for each zone.
Since there are two thermal zones but only one cooling device, which is
cpufreq, I have to use cpufreq as the cooling device twice, which may cause
conflicting cooling decisions.
This will get worse when we have 7 thermal zones.
This makes me think that maybe we need to change cooling-maps to an optional
property.
That way, we can assign cooling devices to specific thermal zones and let the
zones without cooling devices take the default action, which is to reset or
power off the SoC.
What's your opinion about this?

BR,
Andy

> 
> > ---
> >  arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi |  114
> > 
> >  1 files changed, 114 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> > b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> > index b045812..a25f5fc 100644
> > --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> > +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> > @@ -29,6 +29,7 @@
> >   clocks = <&clockgen 1 0>;
> >   next-level-cache = <&l2>;
> >   cpu-idle-states = <&CPU_PH20>;
> > + #cooling-cells = <2>;
> >   };
> >
> >   cpu1: cpu@1 {
> > @@ -39,6 +40,7 @@
> >   clocks = <&clockgen 1 0>;
> >   next-level-cache = <&l2>;
> >   cpu-idle-states = <&CPU_PH20>;
> > + #cooling-cells = <2>;
> >   };
> >
> >   l2: l2-cache {
> > @@ -398,6 +400,118 @@
> >   status = "disabled";
> >   };
> >
> > + tmu: tmu@1f0 {
> > + compatible = "fsl,qoriq-tmu";
> > + reg = <0x0 0x1f8 0x0 0x1>;
> > + interrupts = <0 23 0x4>;
> > + fsl,tmu-range = <0xb 0xa0026 0x80048
> 0x70061>;
> > + fsl,tmu-calibration = <0x 0x0024
> > +0x0001
> 0x002b
> > +0x0002
> 0x0031
> > +0x0003
> 0x0038
> > +0x0004
> 0x003f
> > +0x0005
> 0x0045
> > +0x0006
> 0x004c
> > +0x0007
> 0x0053
> > +0x0008
> 0x0059
> > +0x0009
> 0x0060
> > +0x000a
> 0x0066
> > +0x000b
> 0x006d
> > +
> > +0x0001
> 0x001c
> > +0x00010001
> 0x0024
> > +0x00010002
> 0x002c
> > +0x00010003
> 0x0035
> > +0x00010004
> 0x003d
> > +0x00010005
> 0x0045
> > +0x00010006
> 0x004d
> > +0x00010007
> 0x0045
> > +0x00010008
> 0x005e
> > +0x00010009
> 0x0066
> > +0x0001000a
> 0x006e
> > +
> > +0x0002
> 0x0018
> > +0x00020001
> 0x0022
> > +0x00020002
> 0x002d
> > +0x00020003
> 0x0038
> > +

Re: linux-next: Fixes tag needs some work in the cifs tree

2019-05-28 Thread Murphy Zhou

On Fri, May 24, 2019 at 10:14 PM Steve French  wrote:
>
> fixed and repushed to cifs-2.6.git for-next

Thanks!

[resend including mail lists]

>
> On Thu, May 23, 2019 at 11:27 PM Stephen Rothwell  
> wrote:
> >
> > Hi all,
> >
> > In commit
> >
> >   f875253b5fe6 ("fs/cifs/smb2pdu.c: fix buffer free in SMB2_ioctl_free")
> >
> > Fixes tag
> >
> >   Fixes: 2c87d6a ("cifs: Allocate memory for all iovs in smb2_ioctl")
> >
> > has these problem(s):
> >
> >   - SHA1 should be at least 12 digits long
> > Can be fixed by setting core.abbrev to 12 (or more) or (for git v2.11
> > or later) just making sure it is not set (or set to "auto").
> >
> > --
> > Cheers,
> > Stephen Rothwell
>
>
>
> --
> Thanks,
>
> Steve

Re: [PATCH 3/4] vsock/virtio: fix flush of works during the .remove()

2019-05-28 Thread Jason Wang




On 2019/5/28 下午6:56, Stefano Garzarella wrote:

We flush all pending works before calling vdev->config->reset(vdev),
but other works can be queued before vdev->config->del_vqs(vdev),
so we add another flush after it to avoid a use after free.

Suggested-by: Michael S. Tsirkin 
Signed-off-by: Stefano Garzarella 
---
  net/vmw_vsock/virtio_transport.c | 23 +--
  1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index e694df10ab61..ad093ce96693 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -660,6 +660,15 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
return ret;
  }
  
+static void virtio_vsock_flush_works(struct virtio_vsock *vsock)

+{
+   flush_work(&vsock->loopback_work);
+   flush_work(&vsock->rx_work);
+   flush_work(&vsock->tx_work);
+   flush_work(&vsock->event_work);
+   flush_work(&vsock->send_pkt_work);
+}
+
  static void virtio_vsock_remove(struct virtio_device *vdev)
  {
struct virtio_vsock *vsock = vdev->priv;
@@ -668,12 +677,6 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
mutex_lock(&the_virtio_vsock_mutex);
the_virtio_vsock = NULL;
  
-	flush_work(&vsock->loopback_work);

-   flush_work(&vsock->rx_work);
-   flush_work(&vsock->tx_work);
-   flush_work(&vsock->event_work);
-   flush_work(&vsock->send_pkt_work);
-
/* Reset all connected sockets when the device disappear */
vsock_for_each_connected_socket(virtio_vsock_reset_sock);
  
@@ -690,6 +693,9 @@ static void virtio_vsock_remove(struct virtio_device *vdev)

vsock->event_run = false;
mutex_unlock(&vsock->event_lock);
  
+	/* Flush all pending works */

+   virtio_vsock_flush_works(vsock);
+
/* Flush all device writes and interrupts, device will not use any
 * more buffers.
 */
@@ -726,6 +732,11 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
/* Delete virtqueues and flush outstanding callbacks if any */
vdev->config->del_vqs(vdev);
  
+	/* Other works can be queued before 'config->del_vqs()', so we flush

+* all works before to free the vsock object to avoid use after free.
+*/
+   virtio_vsock_flush_works(vsock);



Some questions after a quick glance:

1) It looks to me that work could also be queued from the
vsock_transport_cancel_pkt() path. Is that synchronized here?


2) If we decide to flush after del_vqs(), are tx_run/rx_run/event_run
still needed? It looks to me like we are already done, except that we need
to flush rx_work at the end, since send_pkt_work can requeue rx_work.


Thanks



+
kfree(vsock);
mutex_unlock(&the_virtio_vsock_mutex);
  }

Re: [PATCH] cpumask: Remove error message and backtrace on out-of-memory condition

2019-05-28 Thread Andrew Morton

On Mon, 27 May 2019 14:29:58 +0200 Geert Uytterhoeven  
wrote:

> There is no need to print an error message and backtrace if
> kmalloc_node() fails, as the memory allocation core already takes care
> of that.
> 
> ...
>
> --- a/lib/cpumask.c
> +++ b/lib/cpumask.c
> @@ -114,13 +114,6 @@ bool alloc_cpumask_var_node(cpumask_var_t *mask, gfp_t 
> flags, int node)
>  {
>   *mask = kmalloc_node(cpumask_size(), flags, node);
>  
> -#ifdef CONFIG_DEBUG_PER_CPU_MAPS
> - if (!*mask) {
> - printk(KERN_ERR "=> alloc_cpumask_var: failed!\n");
> - dump_stack();
> - }
> -#endif
> -
>   return *mask != NULL;
>  }
>  EXPORT_SYMBOL(alloc_cpumask_var_node);

Well, not really - as it stands CONFIG_DEBUG_PER_CPU_MAPS=y can override a
caller's __GFP_NOWARN.

I wonder if anyone ever sets CONFIG_DEBUG_PER_CPU_MAPS any more...

Re: [PATCH -next] EDAC: aspeed: Remove set but not used variable 'np'

2019-05-28 Thread Stefan Schaeckeler (sschaeck)

On  Tuesday, May 28, 2019 at 6:27 PM, Andrew Jeffery wrote:
> On Sun, 26 May 2019, at 00:12, YueHaibing wrote:
> > Fixes gcc '-Wunused-but-set-variable' warning:
> >
> > drivers/edac/aspeed_edac.c: In function aspeed_probe:
> > drivers/edac/aspeed_edac.c:284:22: warning: variable np set but not
> > used [-Wunused-but-set-variable]
> >
> > It is never used and can be removed.
> >
> > Signed-off-by: YueHaibing 
>
> Reviewed-by: Andrew Jeffery 

Reviewed-by: Stefan Schaeckeler 

> > ---
> >  drivers/edac/aspeed_edac.c | 4 
> >  1 file changed, 4 deletions(-)
> >
> > diff --git a/drivers/edac/aspeed_edac.c b/drivers/edac/aspeed_edac.c
> > index 11833c0a5d07..5634437bb39d 100644
> > --- a/drivers/edac/aspeed_edac.c
> > +++ b/drivers/edac/aspeed_edac.c
> > @@ -281,15 +281,11 @@ static int aspeed_probe(struct platform_device *pdev)
> > struct device *dev = &pdev->dev;
> > struct edac_mc_layer layers[2];
> > struct mem_ctl_info *mci;
> > -   struct device_node *np;
> > struct resource *res;
> > void __iomem *regs;
> > u32 reg04;
> > int rc;
> >
> > -   /* setup regmap */
> > -   np = dev->of_node;
> > -
> > res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > if (!res)
> > return -ENOENT;
> > --
> > 2.17.1

[PATCH v2] lib: test_overflow: Avoid tainting the kernel and fix wrap size

2019-05-28 Thread Kees Cook

This adds __GFP_NOWARN to the kmalloc()-portions of the overflow test to
avoid tainting the kernel. Additionally fixes up the math on wrap size
to be architecture and page size agnostic.

Reported-by: Randy Dunlap 
Suggested-by: Rasmus Villemoes 
Fixes: ca90800a91ba ("test_overflow: Add memory allocation overflow tests")
Signed-off-by: Kees Cook 
---
v2: fix leftover __GFP_NOWARN (joe)
---
 lib/test_overflow.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/lib/test_overflow.c b/lib/test_overflow.c
index fc680562d8b6..7a4b6f6c5473 100644
--- a/lib/test_overflow.c
+++ b/lib/test_overflow.c
@@ -486,16 +486,17 @@ static int __init test_overflow_shift(void)
  * Deal with the various forms of allocator arguments. See comments above
  * the DEFINE_TEST_ALLOC() instances for mapping of the "bits".
  */
-#define alloc010(alloc, arg, sz) alloc(sz, GFP_KERNEL)
-#define alloc011(alloc, arg, sz) alloc(sz, GFP_KERNEL, NUMA_NO_NODE)
+#define alloc_GFP   (GFP_KERNEL | __GFP_NOWARN)
+#define alloc010(alloc, arg, sz) alloc(sz, alloc_GFP)
+#define alloc011(alloc, arg, sz) alloc(sz, alloc_GFP, NUMA_NO_NODE)
 #define alloc000(alloc, arg, sz) alloc(sz)
 #define alloc001(alloc, arg, sz) alloc(sz, NUMA_NO_NODE)
-#define alloc110(alloc, arg, sz) alloc(arg, sz, GFP_KERNEL)
+#define alloc110(alloc, arg, sz) alloc(arg, sz, alloc_GFP)
 #define free0(free, arg, ptr)   free(ptr)
 #define free1(free, arg, ptr)   free(arg, ptr)
 
-/* Wrap around to 8K */
-#define TEST_SIZE  (9 << PAGE_SHIFT)
+/* Wrap around to 16K */
+#define TEST_SIZE  (5 * 4096)
 
 #define DEFINE_TEST_ALLOC(func, free_func, want_arg, want_gfp, want_node)\
 static int __init test_ ## func (void *arg)\
-- 
2.17.1


-- 
Kees Cook

Re: [PATCH v3 2/4] mtd: rawnand: Add Macronix MX25F0A NAND controller

2019-05-28 Thread masonccyang



Hi Miquel,

> > > > > > +static void mxic_nand_select_chip(struct nand_chip *chip, int chipnr)
> > > > > 
> > > > > _select_target() is preferred now 
> > > > 
> > > > Do you mean I implement mxic_nand_select_target() to control #CS ?
> > > > 
> > > > If so, do I need to call mxic_nand_select_target() to switch #CS on
> > > > and then off in ->exec_op(), since nand_select_target() in nand_base.c
> > > > is still calling chip->legacy.select_chip?
> > > 
> > > You must forget about the ->select_chip() callback. Now it should be
> > > handled directly from the controller driver. Please have a look at the
> > > commit pointed against the marvell_nand.c driver.
> > 
> > I don't have the Marvell NFC datasheet, so I have one question.
> > 
> > In marvell_nand.c, there is no xxx_deselect_target() or
> > anything similar that switches #CS off.
> > marvell_nfc_select_target() always seems to keep the #CS of one chip
> > or die low.
> > 
> > Is that right?
> 
> Yes, AFAIR there is no "de-assert" mechanism in this controller.
> 
> > 
> > How can all #CS lines be kept high so that the NAND can enter
> > low-power standby mode if the driver doesn't use "legacy.select_chip()"?
> 
> See commit 02b4a52604a4 ("mtd: rawnand: Make ->select_chip() optional
> when ->exec_op() is implemented") which states:
> 
> "When [->select_chip() is] not implemented, the core is assuming
>the CS line is automatically asserted/deasserted by the driver
>->exec_op() implementation."
> 
> Of course, the above is right only when the controller driver supports
> the ->exec_op() interface. 

Currently, it seems that we will get incorrect data and failed
operations because the CS line toggles at the wrong time if it is
controlled in ->exec_op().
For example:

1) nand_onfi_detect() calls nand_exec_op() twice, via
nand_read_param_page_op() and nand_read_data_op().

2) nand_write_page_xxx calls nand_exec_op() many times, via
nand_prog_page_begin_op(), nand_write_data_op() and
nand_prog_page_end_op().


Should we consider adding a CS line control hook to struct nand_controller,
for example:

struct nand_controller {
 struct mutex lock;
 const struct nand_controller_ops *ops;
+  void (*select_chip)(struct nand_chip *chip, int cs);
};

to replace legacy.select_chip() ?


nand_select_target() and nand_deselect_target() would then be patched like this:

void nand_select_target(struct nand_chip *chip, unsigned int cs)
{
/*
 * cs should always lie between 0 and chip->numchips, when that's not
 * the case it's a bug and the caller should be fixed.
 */
if (WARN_ON(cs > chip->numchips))
return;

chip->cur_cs = cs;

+   if (chip->controller->select_chip)
+   chip->controller->select_chip(chip, cs);
+
if (chip->legacy.select_chip)
chip->legacy.select_chip(chip, cs);
}

void nand_deselect_target(struct nand_chip *chip)
{
+   if (chip->controller->select_chip)
+   chip->controller->select_chip(chip, -1);
+
if (chip->legacy.select_chip)
chip->legacy.select_chip(chip, -1);

chip->cur_cs = -1;
}


> 
> So if you think it is not too time consuming and worth the trouble to
> assert/deassert the CS at each operation, you may do it in your driver.
> 
> 
> Thanks,
> Miquèl

thanks & best regards,
Mason

CONFIDENTIALITY NOTE:

This e-mail and any attachments may contain confidential information 
and/or personal data, which is protected by applicable laws. Please be 
reminded that duplication, disclosure, distribution, or use of this e-mail 
(and/or its attachments) or any part thereof is prohibited. If you receive 
this e-mail in error, please notify us immediately and delete this mail as 
well as its attachment(s) from your system. In addition, please be 
informed that collection, processing, and/or use of personal data is 
prohibited unless expressly permitted by personal data protection laws. 
Thank you for your attention and cooperation.

Macronix International Co., Ltd.

=






Re: [PATCH] thermal/drivers/of: Add a get_temp_id callback function

2019-05-28 Thread Eduardo Valentin

On Thu, May 23, 2019 at 07:48:56PM -0700, Andrey Smirnov wrote:
> On Mon, Apr 29, 2019 at 9:51 AM Daniel Lezcano
>  wrote:
> >
> > On 24/04/2019 01:08, Daniel Lezcano wrote:
> > > On 23/04/2019 17:44, Eduardo Valentin wrote:
> > >> Hello,
> > >>
> > >> On Tue, Apr 16, 2019 at 07:22:03PM +0200, Daniel Lezcano wrote:
> > >>> Currently when we register a sensor, we specify the sensor id and a data
> > >>> pointer to be passed when the get_temp function is called. However the
> > >>> sensor_id is not passed to the get_temp callback forcing the driver to
> > >>> do extra allocation and adding back pointer to find out from the sensor
> > >>> information the driver data and then back to the sensor id.
> > >>>
> > >>> Add a new callback get_temp_id() which will be called if set. It will
> > >>> call the get_temp_id() with the sensor id.
> > >>>
> > >>> That will be more consistent with the registering function.
> > >>
> > >> I still do not understand why we need to have a get_id callback.
> > >> The use cases I have seen so far, which I have been intentionally
> > >> rejecting, are mainly solvable by creating other compatible entries.
> > >> And really, if you have, say, a bandgap chip that supports multiple
> > >> sensors, but on SoC version A it has 5 sensors, and on SoC version B it
> > >> has only 4, or on SoC version C it has 5 but they are logically located
> > >> in different places (gpu vs iva regions), these are all cases in which
> > >> you want a different compatible!
> > >>
> > >> Do you mind sharing why you need a get sensor id callback?
> > >
> > > > It is not a get sensor id callback, it is a get_temp callback which passes
> > > > the sensor id.
> > > >
> > > > Looking at the different drivers, a common pattern is a structure for
> > > > the driver and a structure for the sensor. When get_temp is called,
> > > > the callback needs info from both the sensor structure and the driver
> > > > structure, so a back pointer to the driver structure is added in the
> > > > sensor structure.
> >

Do you mind sending a patch showing how one could convert an existing
driver to use this new API?

> > Hi Eduardo,
> >
> > does the explanation clarify the purpose of this change?
> >
> 
> Eduardo, did you ever have a chance to revisit this thread? I would
> really like to make some progress on this one to unblock my i.MX8MQ
> hwmon series.

The problem I have with this patch is that it is an API which resides
only in of-thermal. Growing DT-only APIs makes of-thermal diverge from
the thermal core and platform drivers.

Besides, this patch needs to document the API in Documentation/

> 
> Thanks,
> Andrey Smirnov

Re: [PATCH] lib: test_overflow: Avoid taining the kernel and fix wrap size

2019-05-28 Thread Kees Cook

On Tue, May 28, 2019 at 04:40:06PM -0700, Joe Perches wrote:
> On Tue, 2019-05-28 at 15:51 -0700, Kees Cook wrote:
> > This adds __GFP_NOWARN to the kmalloc()-portions of the overflow test to
> > avoid tainting the kernel. Additionally fixes up the math on wrap size
> > to be architecture and page size agnostic.
> []
> > diff --git a/lib/test_overflow.c b/lib/test_overflow.c
> []
> > @@ -486,16 +486,17 @@ static int __init test_overflow_shift(void)
> []
> > +#define alloc_GFP   (GFP_KERNEL | __GFP_NOWARN)
> []
> > +#define alloc110(alloc, arg, sz) alloc(arg, sz, alloc_GFP | __GFP_NOWARN)
> 
> seems redundant.

Whoops. Missed that one. Fixing...

-- 
Kees Cook

Re: [PATCH RESEND V13 2/5] thermal: of-thermal: add API for getting sensor ID from DT

2019-05-28 Thread Eduardo Valentin

On Tue, May 28, 2019 at 02:06:18PM +0800, anson.hu...@nxp.com wrote:
> From: Anson Huang 
> 
> On some platforms like i.MX8QXP, the thermal driver needs a
> real HW sensor ID from DT thermal zone, the HW sensor ID is
> used to get temperature from SCU firmware, and the virtual
> sensor ID starting from 0 to N is NOT used at all, this patch
> adds new API thermal_zone_of_get_sensor_id() to provide the
> feature of getting sensor ID from DT thermal zone's node.
> 
> Signed-off-by: Anson Huang 
> ---
> Changes since V12:
>   - adjust the second parameter of thermal_zone_of_get_sensor_id() API, 
> then caller no need
> to pass the of_phandle_args structure and put the sensor_specs.np 
> manually, also putting
> the sensor node device check inside this API to make it easy for 
> usage;

What happened to using nxp,resource-id property in your driver?
Why do we need this as an API in of-thermal? What other drivers may
benefit from this?

Regardless, this patch needs to document the new API under
Documentation/

> ---
>  drivers/thermal/of-thermal.c | 66 
> +---
>  include/linux/thermal.h  | 10 +++
>  2 files changed, 60 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
> index dc5093b..a53792b 100644
> --- a/drivers/thermal/of-thermal.c
> +++ b/drivers/thermal/of-thermal.c
> @@ -449,6 +449,54 @@ thermal_zone_of_add_sensor(struct device_node *zone,
>  }
>  
>  /**
> + * thermal_zone_of_get_sensor_id - get sensor ID from a DT thermal zone
> + * @tz_np: a valid thermal zone device node.
> + * @sensor_np: a sensor node of a valid sensor device.
> + * @id: a sensor ID pointer will be passed back.
> + *
> + * This function will get sensor ID from a given thermal zone node, use
> + * "thermal-sensors" as list name, and get sensor ID from first phandle's
> + * argument.
> + *
> + * Return: 0 on success, proper error code otherwise.
> + */
> +
> +int thermal_zone_of_get_sensor_id(struct device_node *tz_np,
> +   struct device_node *sensor_np,
> +   u32 *id)
> +{
> + struct of_phandle_args sensor_specs;
> + int ret;
> +
> + ret = of_parse_phandle_with_args(tz_np,
> +  "thermal-sensors",
> +  "#thermal-sensor-cells",
> +  0,
> +  &sensor_specs);
> + if (ret)
> + return ret;
> +
> + if (sensor_specs.np != sensor_np) {
> + of_node_put(sensor_specs.np);
> + return -ENODEV;
> + }
> +
> + if (sensor_specs.args_count >= 1) {
> + *id = sensor_specs.args[0];
> + WARN(sensor_specs.args_count > 1,
> +  "%pOFn: too many cells in sensor specifier %d\n",
> +  sensor_specs.np, sensor_specs.args_count);
> + } else {
> + *id = 0;
> + }
> +
> + of_node_put(sensor_specs.np);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(thermal_zone_of_get_sensor_id);
> +
> +/**
>   * thermal_zone_of_sensor_register - registers a sensor to a DT thermal zone
>   * @dev: a valid struct device pointer of a sensor device. Must contain
>   *   a valid .of_node, for the sensor node.
> @@ -499,36 +547,22 @@ thermal_zone_of_sensor_register(struct device *dev, int 
> sensor_id, void *data,
>   sensor_np = of_node_get(dev->of_node);
>  
>   for_each_available_child_of_node(np, child) {
> - struct of_phandle_args sensor_specs;
>   int ret, id;
>  
>   /* For now, thermal framework supports only 1 sensor per zone */
> - ret = of_parse_phandle_with_args(child, "thermal-sensors",
> -  "#thermal-sensor-cells",
> -  0, &sensor_specs);
> + ret = thermal_zone_of_get_sensor_id(child, sensor_np, &id);
>   if (ret)
>   continue;
>  
> - if (sensor_specs.args_count >= 1) {
> - id = sensor_specs.args[0];
> - WARN(sensor_specs.args_count > 1,
> -  "%pOFn: too many cells in sensor specifier %d\n",
> -  sensor_specs.np, sensor_specs.args_count);
> - } else {
> - id = 0;
> - }
> -
> - if (sensor_specs.np == sensor_np && id == sensor_id) {
> + if (id == sensor_id) {
>   tzd = thermal_zone_of_add_sensor(child, sensor_np,
>data, ops);
>   if (!IS_ERR(tzd))
>   tzd->ops->set_mode(tzd, THERMAL_DEVICE_ENABLED);
>  
> - of_node_put(sensor_specs.np);
>   of_node_put(child);
>   goto exit;
>

[PATCH v2 net-next] net: mvpp2: cls: Remove unnessesary check in mvpp2_ethtool_cls_rule_ins

2019-05-28 Thread YueHaibing

Fix smatch warning:

drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c:1236
 mvpp2_ethtool_cls_rule_ins() warn: unsigned 'info->fs.location' is never less 
than zero.

'info->fs.location' is u32 type, never less than zero.

Signed-off-by: YueHaibing 
---
v2: rework patch based net-next
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c 
b/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c
index bd19a910dc90..e1c90adb2982 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c
@@ -1300,8 +1300,7 @@ int mvpp2_ethtool_cls_rule_ins(struct mvpp2_port *port,
struct mvpp2_ethtool_fs *efs, *old_efs;
int ret = 0;
 
-   if (info->fs.location >= MVPP2_N_RFS_ENTRIES_PER_FLOW ||
-   info->fs.location < 0)
+   if (info->fs.location >= MVPP2_N_RFS_ENTRIES_PER_FLOW)
return -EINVAL;
 
efs = kzalloc(sizeof(*efs), GFP_KERNEL);
-- 
2.20.1

[GIT PULL] tracing: Avoid memory leak in predicate_parse()

2019-05-28 Thread Steven Rostedt



Linus,

This fixes a memory leak from the error path in the event filter logic.


Please pull the latest trace-v5.2-rc2 tree, which can be found at:


  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
trace-v5.2-rc2

Tag SHA1: 0658b13d1bfd40bda1c2bd1ef3738857e1bf4000
Head SHA1: dfb4a6f2191a80c8b790117d0ff592fd712d3296


Tomas Bortoli (1):
  tracing: Avoid memory leak in predicate_parse()


 kernel/trace/trace_events_filter.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)
---
commit dfb4a6f2191a80c8b790117d0ff592fd712d3296
Author: Tomas Bortoli 
Date:   Tue May 28 17:43:38 2019 +0200

tracing: Avoid memory leak in predicate_parse()

In case of errors, predicate_parse() goes to the out_free label
to free memory and to return an error code.

However, predicate_parse() does not free the predicates of the
temporary prog_stack array, thus leaking them.

Link: http://lkml.kernel.org/r/20190528154338.29976-1-tomasbort...@gmail.com

Cc: sta...@vger.kernel.org
Fixes: 80765597bc587 ("tracing: Rewrite filter logic to be simpler and 
faster")
Reported-by: syzbot+6b8e0fb820e570c59...@syzkaller.appspotmail.com
Signed-off-by: Tomas Bortoli 
[ Added protection around freeing prog_stack[i].pred ]
Signed-off-by: Steven Rostedt (VMware) 

diff --git a/kernel/trace/trace_events_filter.c 
b/kernel/trace/trace_events_filter.c
index d3e59312ef40..5079d1db3754 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -428,7 +428,7 @@ predicate_parse(const char *str, int nr_parens, int 
nr_preds,
op_stack = kmalloc_array(nr_parens, sizeof(*op_stack), GFP_KERNEL);
if (!op_stack)
return ERR_PTR(-ENOMEM);
-   prog_stack = kmalloc_array(nr_preds, sizeof(*prog_stack), GFP_KERNEL);
+   prog_stack = kcalloc(nr_preds, sizeof(*prog_stack), GFP_KERNEL);
if (!prog_stack) {
parse_error(pe, -ENOMEM, 0);
goto out_free;
@@ -579,7 +579,11 @@ predicate_parse(const char *str, int nr_parens, int 
nr_preds,
 out_free:
kfree(op_stack);
kfree(inverts);
-   kfree(prog_stack);
+   if (prog_stack) {
+   for (i = 0; prog_stack[i].pred; i++)
+   kfree(prog_stack[i].pred);
+   kfree(prog_stack);
+   }
return ERR_PTR(ret);
 }
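
Editor's note: the fix above relies on two things, a zeroed allocation (kcalloc) so that unused slots hold NULL, and a cleanup loop that stops at the first NULL slot. The following userspace sketch illustrates that pattern with calloc(); the names `prog_entry_sketch`, `stack_alloc` and `stack_free` are hypothetical, and unlike the kernel loop this sketch also bounds the walk by the array length.

```c
#include <stdlib.h>

struct prog_entry_sketch {
	void *pred;	/* owned allocation, NULL while the slot is unused */
};

/* Zeroed allocation: every slot starts out with pred == NULL. */
static struct prog_entry_sketch *stack_alloc(size_t n)
{
	return calloc(n, sizeof(struct prog_entry_sketch));
}

/*
 * Free the filled slots, then the array itself.
 * Returns the number of per-slot allocations that were freed.
 */
static size_t stack_free(struct prog_entry_sketch *stack, size_t n)
{
	size_t i, freed = 0;

	if (!stack)
		return 0;
	for (i = 0; i < n && stack[i].pred; i++) {
		free(stack[i].pred);
		freed++;
	}
	free(stack);
	return freed;
}
```

With an uninitialized allocation (plain malloc or kmalloc_array) the loop would read indeterminate pointers, which is why the patch switches the allocator as well as the cleanup path.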

[RFC PATCH net-next] net: hns3: hclge_dbg_get_m7_stats_info() can be static

2019-05-28 Thread kbuild test robot



Fixes: 33a90e2f20e6 ("net: hns3: add support for dump firmware statistics by 
debugfs")
Signed-off-by: kbuild test robot 
---
 hclge_debugfs.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
index ed1f533..4fbed47a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
@@ -921,7 +921,7 @@ static void hclge_dbg_dump_rst_info(struct hclge_dev *hdev)
 hdev->rst_stats.reset_cnt);
 }
 
-void hclge_dbg_get_m7_stats_info(struct hclge_dev *hdev)
+static void hclge_dbg_get_m7_stats_info(struct hclge_dev *hdev)
 {
struct hclge_desc *desc_src, *desc_tmp;
struct hclge_get_m7_bd_cmd *req;

[net-next:master 161/171] drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:924:6: sparse: sparse: symbol 'hclge_dbg_get_m7_stats_info' was not declared. Should it be static?

2019-05-28 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
master
head:   602e0f295a91813c9a15938f2a292b9c60a416d9
commit: 33a90e2f20e6c455889a0f41857692221172a5ae [161/171] net: hns3: add 
support for dump firmware statistics by debugfs
reproduce:
# apt-get install sparse
# sparse version: v0.6.1-rc1-7-g2b96cd8-dirty
git checkout 33a90e2f20e6c455889a0f41857692221172a5ae
make ARCH=x86_64 allmodconfig
make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 


sparse warnings: (new ones prefixed by >>)

   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:32:17: sparse: 
sparse: cast from restricted __le32
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:564:31: sparse: 
sparse: restricted __le16 degrades to integer
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:598:39: sparse: 
sparse: incorrect type in assignment (different base types) @@expected 
unsigned int @@got restricted __le32unsigned int @@
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:598:39: sparse:   
 expected unsigned int
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:598:39: sparse:   
 got restricted __le32 [usertype] qs_bit_map
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:833:30: sparse: 
sparse: restricted __le16 degrades to integer
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:840:33: sparse: 
sparse: restricted __le16 degrades to integer
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:841:30: sparse: 
sparse: restricted __le16 degrades to integer
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:842:31: sparse: 
sparse: restricted __le16 degrades to integer
   drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:844:33: sparse: 
sparse: restricted __le16 degrades to integer
>> drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:924:6: sparse: 
>> sparse: symbol 'hclge_dbg_get_m7_stats_info' was not declared. Should it be 
>> static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

Re: [PATCH] arm64: dts: ls1028a: Add Thermal Monitor Unit node

2019-05-28 Thread Eduardo Valentin

On Thu, Apr 25, 2019 at 04:26:40PM +0800, Yuantian Tang wrote:
> The Thermal Monitoring Unit (TMU) monitors and reports the
> temperature from 2 remote temperature measurement sites
> located on ls1028a chip.
> Add TMU dts node to enable this feature.
> 
> Signed-off-by: Yuantian Tang 

I don't see anything wrong from a thermal standpoint.

Acked-by: Eduardo Valentin 

Please get this via your arch tree maintainer to avoid merge conflicts.

> ---
>  arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi |  114 
> 
>  1 files changed, 114 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi 
> b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> index b045812..a25f5fc 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> @@ -29,6 +29,7 @@
>   clocks = <&clockgen 1 0>;
>   next-level-cache = <&l2>;
>   cpu-idle-states = <&CPU_PH20>;
> + #cooling-cells = <2>;
>   };
>  
>   cpu1: cpu@1 {
> @@ -39,6 +40,7 @@
>   clocks = <&clockgen 1 0>;
>   next-level-cache = <&l2>;
>   cpu-idle-states = <&CPU_PH20>;
> + #cooling-cells = <2>;
>   };
>  
>   l2: l2-cache {
> @@ -398,6 +400,118 @@
>   status = "disabled";
>   };
>  
> + tmu: tmu@1f0 {
> + compatible = "fsl,qoriq-tmu";
> + reg = <0x0 0x1f8 0x0 0x1>;
> + interrupts = <0 23 0x4>;
> + fsl,tmu-range = <0xb 0xa0026 0x80048 0x70061>;
> + fsl,tmu-calibration = <0x 0x0024
> +0x0001 0x002b
> +0x0002 0x0031
> +0x0003 0x0038
> +0x0004 0x003f
> +0x0005 0x0045
> +0x0006 0x004c
> +0x0007 0x0053
> +0x0008 0x0059
> +0x0009 0x0060
> +0x000a 0x0066
> +0x000b 0x006d
> +
> +0x0001 0x001c
> +0x00010001 0x0024
> +0x00010002 0x002c
> +0x00010003 0x0035
> +0x00010004 0x003d
> +0x00010005 0x0045
> +0x00010006 0x004d
> +0x00010007 0x0045
> +0x00010008 0x005e
> +0x00010009 0x0066
> +0x0001000a 0x006e
> +
> +0x0002 0x0018
> +0x00020001 0x0022
> +0x00020002 0x002d
> +0x00020003 0x0038
> +0x00020004 0x0043
> +0x00020005 0x004d
> +0x00020006 0x0058
> +0x00020007 0x0063
> +0x00020008 0x006e
> +
> +0x0003 0x0010
> +0x00030001 0x001c
> +0x00030002 0x0029
> +0x00030003 0x0036
> +0x00030004 0x0042
> +0x00030005 0x004f
> +0x00030006 0x005b
> +0x00030007 0x0068>;
> + little-endian;
> + #thermal-sensor-cells = <1>;
> + };
> +
> + thermal-zones {
> + core-cluster {
> + polling-delay-passive = <1000>;
> + polling-delay = <5000>;
> + thermal-sensors = <&tmu 0>;
> +
> + trips {
> + core_cluster_a

Re: [PATCH -next] drivers: thermal: tsens: Change hw_id type to int in is_sensor_enabled

2019-05-28 Thread Eduardo Valentin

YueHaibing,

On Mon, May 27, 2019 at 09:41:24PM +0800, YueHaibing wrote:
> Sensor hw_id is of type int rather than u32, so is_sensor_enabled
> should use int for the comparison. This fixes the smatch warning:
> 
> drivers/thermal/qcom/tsens-common.c:72
>  is_sensor_enabled() warn: unsigned 'hw_id' is never less than zero.
> 
> Fixes: 3e6a8fb33084 ("drivers: thermal: tsens: Add new operation to check if 
> a sensor is enabled")

Thanks for the patch, but we had to revert this commit because it was
causing some issues, so your patch is no longer applicable.

> Signed-off-by: YueHaibing 

Thank you anyways.

> ---
>  drivers/thermal/qcom/tsens-common.c | 2 +-
>  drivers/thermal/qcom/tsens.h| 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/thermal/qcom/tsens-common.c 
> b/drivers/thermal/qcom/tsens-common.c
> index 928e8e81ba69..5df4eed84535 100644
> --- a/drivers/thermal/qcom/tsens-common.c
> +++ b/drivers/thermal/qcom/tsens-common.c
> @@ -64,7 +64,7 @@ void compute_intercept_slope(struct tsens_priv *priv, u32 
> *p1,
>   }
>  }
>  
> -bool is_sensor_enabled(struct tsens_priv *priv, u32 hw_id)
> +bool is_sensor_enabled(struct tsens_priv *priv, int hw_id)
>  {
>   u32 val;
>   int ret;
> diff --git a/drivers/thermal/qcom/tsens.h b/drivers/thermal/qcom/tsens.h
> index eefe3844fb4e..15264806f6a8 100644
> --- a/drivers/thermal/qcom/tsens.h
> +++ b/drivers/thermal/qcom/tsens.h
> @@ -315,7 +315,7 @@ void compute_intercept_slope(struct tsens_priv *priv, u32 
> *pt1, u32 *pt2, u32 mo
>  int init_common(struct tsens_priv *priv);
>  int get_temp_tsens_valid(struct tsens_priv *priv, int i, int *temp);
>  int get_temp_common(struct tsens_priv *priv, int i, int *temp);
> -bool is_sensor_enabled(struct tsens_priv *priv, u32 hw_id);
> +bool is_sensor_enabled(struct tsens_priv *priv, int hw_id);
>  
>  /* TSENS target */
>  extern const struct tsens_plat_data data_8960;

Re: [PATCH 1/3] mm: thp: make deferred split shrinker memcg aware

2019-05-28 Thread Yang Shi





On 5/28/19 10:42 PM, Kirill Tkhai wrote:

Hi, Yang,

On 28.05.2019 15:44, Yang Shi wrote:

Currently the THP deferred split shrinker is not memcg aware, which may
cause premature OOM with some configurations. For example, the test below
easily runs into premature OOM:

$ cgcreate -g memory:thp
$ echo 4G > /sys/fs/cgroup/memory/thp/memory.limit_in_bytes
$ cgexec -g memory:thp transhuge-stress 4000

transhuge-stress comes from kernel selftest.

It is easy to hit OOM even though there are still a lot of THPs on the
deferred split queue; memcg direct reclaim can't touch them since the
deferred split shrinker is not memcg aware.

Make the deferred split shrinker memcg aware by introducing a per-memcg
deferred split queue.  A THP is on the per-memcg deferred split queue if
it belongs to a memcg, and on the per-node queue otherwise.  When the page
is migrated to another memcg, it is moved to the target memcg's deferred
split queue too.

Also, when freeing a page, delete the THP from the deferred split queue
before the memcg uncharge so that the page's memcg information is still
available.

Reuse the second tail page's deferred_list for per memcg list since the
same THP can't be on multiple deferred split queues.

Cc: Kirill Tkhai 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: "Kirill A . Shutemov" 
Cc: Hugh Dickins 
Cc: Shakeel Butt 
Signed-off-by: Yang Shi 
---
  include/linux/huge_mm.h|  24 ++
  include/linux/memcontrol.h |   6 ++
  include/linux/mm_types.h   |   7 +-
  mm/huge_memory.c   | 182 +
  mm/memcontrol.c|  20 +
  mm/swap.c  |   4 +
  6 files changed, 194 insertions(+), 49 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7cd5c15..f6d1cde 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -250,6 +250,26 @@ static inline bool thp_migration_supported(void)
return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
  }
  
+static inline struct list_head *page_deferred_list(struct page *page)

+{
+   /*
+* Global deferred list in the second tail pages is occupied by
+* compound_head.
+*/
+   return &page[2].deferred_list;
+}
+
+static inline struct list_head *page_memcg_deferred_list(struct page *page)
+{
+   /*
+* Memcg deferred list in the second tail pages is occupied by
+* compound_head.
+*/
+   return &page[2].memcg_deferred_list;
+}
+
+extern void del_thp_from_deferred_split_queue(struct page *);
+
  #else /* CONFIG_TRANSPARENT_HUGEPAGE */
  #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
  #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })
@@ -368,6 +388,10 @@ static inline bool thp_migration_supported(void)
  {
return false;
  }
+
+static inline void del_thp_from_deferred_split_queue(struct page *page)
+{
+}
  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
  
  #endif /* _LINUX_HUGE_MM_H */

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index bc74d6a..9ff5fab 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -316,6 +316,12 @@ struct mem_cgroup {
struct list_head event_list;
spinlock_t event_list_lock;
  
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE

+   struct list_head split_queue;
+   unsigned long split_queue_len;
+   spinlock_t split_queue_lock;
+#endif
+
struct mem_cgroup_per_node *nodeinfo[0];
/* WARNING: nodeinfo must be the last member here */
  };
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8ec38b1..405f5e6 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -139,7 +139,12 @@ struct page {
struct {/* Second tail page of compound page */
unsigned long _compound_pad_1;  /* compound_head */
unsigned long _compound_pad_2;
-   struct list_head deferred_list;
+   union {
+   /* Global THP deferred split list */
+   struct list_head deferred_list;
+   /* Memcg THP deferred split list */
+   struct list_head memcg_deferred_list;

Why do we need two names for this list entry?

To me it looks redundant: it does not give additional information,
but it leads to duplication (and we have two helpers, page_deferred_list()
and page_memcg_deferred_list(), instead of one).


Yes, kind of. Actually I was also wondering whether this is worth it. My
point is that it may improve code readability: we can tell which split
queue (per node or per memcg) is being manipulated just by the name of
the list.


If most people think this is unnecessary, I'm definitely OK to
just keep one name.
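
Editor's note: the point under discussion is that both names refer to the same storage. A minimal userspace sketch of that layout follows; `list_head_sketch`, `tail_page_sketch` and the two helpers are hypothetical names that only mirror the shape of the hunk above, not the kernel's struct page.

```c
#include <stddef.h>

struct list_head_sketch {
	struct list_head_sketch *next, *prev;
};

/* Shape of the "second tail page" fields in the hunk above (assumed). */
struct tail_page_sketch {
	unsigned long pad_1;
	unsigned long pad_2;
	union {
		/* Global THP deferred split list */
		struct list_head_sketch deferred_list;
		/* Memcg THP deferred split list */
		struct list_head_sketch memcg_deferred_list;
	};
};

/* Both helpers return the same address; only the name differs. */
static struct list_head_sketch *global_entry(struct tail_page_sketch *p)
{
	return &p->deferred_list;
}

static struct list_head_sketch *memcg_entry(struct tail_page_sketch *p)
{
	return &p->memcg_deferred_list;
}
```

Since the union members alias each other, the second name costs no space; the trade-off is purely readability against having two helpers that do the same thing.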





+   };
};
struct {/* Page table pages */
unsigned long _pt_pad_1;/* compound_head */
diff --git a/mm/hu

[PATCH] pinctrl: ns2: Fix potential NULL dereference

2019-05-28 Thread Young Xiao

platform_get_resource() may fail and return NULL, so we should
check its return value to avoid a NULL pointer dereference
a bit later in the code.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/pinctrl/bcm/pinctrl-ns2-mux.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pinctrl/bcm/pinctrl-ns2-mux.c 
b/drivers/pinctrl/bcm/pinctrl-ns2-mux.c
index 4b5cf0e..2bf6af7 100644
--- a/drivers/pinctrl/bcm/pinctrl-ns2-mux.c
+++ b/drivers/pinctrl/bcm/pinctrl-ns2-mux.c
@@ -1048,6 +1048,8 @@ static int ns2_pinmux_probe(struct platform_device *pdev)
return PTR_ERR(pinctrl->base0);
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+   if (!res)
+   return -EINVAL;
pinctrl->base1 = devm_ioremap_nocache(&pdev->dev, res->start,
resource_size(res));
if (!pinctrl->base1) {
-- 
2.7.4
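
Editor's note: the pattern in this patch is "look up, check, then dereference". The following userspace sketch shows it with a mock lookup; `resource_sketch`, `device_sketch`, `get_resource` and `probe_sketch` are hypothetical names and do not reproduce the platform device API.

```c
#include <errno.h>
#include <stddef.h>

struct resource_sketch {
	unsigned long start, end;	/* inclusive range */
};

struct device_sketch {
	struct resource_sketch *res;
	size_t nres;
};

/* Mock lookup: returns NULL when the requested index does not exist. */
static struct resource_sketch *get_resource(struct device_sketch *dev,
					    size_t idx)
{
	return idx < dev->nres ? &dev->res[idx] : NULL;
}

/* Checks the lookup result before using res->start and res->end. */
static int probe_sketch(struct device_sketch *dev, unsigned long *size_out)
{
	struct resource_sketch *res = get_resource(dev, 1);

	if (!res)
		return -EINVAL;
	*size_out = res->end - res->start + 1;
	return 0;
}
```

Without the check, a device that describes only one memory region would make the probe dereference NULL instead of failing cleanly.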

Re: [PATCH] kernel/sys.c: fix possible spectre-v1 in do_prlimit()

2019-05-28 Thread Dianzhang Chen

Hi,

Even when the misprediction is detected and the speculative execution is
dropped, not all of its effects are undone; the cache state, for example,
remains. During the speculative execution, the code:


rlim = tsk->signal->rlim + resource;    // use resource as index

...

*old_rlim = *rlim;


may read some secret data into the cache, and the attacker can then use a
side-channel attack to find out what the secret data is.


Virtually any observable effect of speculatively executed code can be
leveraged to create the covert channel that leaks sensitive
information[1].


A general form of spectre v1 would be[1]:

if (x < array1_size) {
        y = array1[x];
        // do something using y that is
        // observable when speculatively
        // executed
}


[1] https://spectreattack.com/spectre.pdf

On Tue, May 28, 2019 at 3:10 PM Cyrill Gorcunov wrote:
>
> On Tue, May 28, 2019 at 10:37:10AM +0800, Dianzhang Chen wrote:
> > Hi,
> > When I reply to your email, I always get 'Message rejected' from
> > gmail (I get this rejection from all the recipients). I still don't know
> > how to deal with it, so I am replying to your email here:
>
> Hi! This is weird. Next time simply reply to LKML (I CC'ed it back).
>
> > Because of speculative execution, the attacker can bypass the bound
> > check `if (resource >= RLIM_NLIMITS)`.
>
> And then the misprediction gets detected and the execution is dropped.
> So I still don't see a problem here, since we don't leak info even in
> such a case.
>
> That said, I don't mind this patch, but for the sake of code clarity
> rather than because of a spectre issue, which is not relevant here.
>
> > As for array_index_nospec(index, size), it clamps the index to
> > the range [0, size), so an attacker can't exploit speculative
> > execution to push the index outside [0, size).
> >
> >
> > For more detail, please check the link below:
> >
> > https://github.com/torvalds/linux/commit/f3804203306e098dae9ca51540fcd5eb700d7f40
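
Editor's note: the clamping discussed in this thread can be done without a branch, so there is nothing for the CPU to mispredict. The sketch below is a userspace illustration of that idea, modelled on the generic mask computation; `index_mask_sketch` and `clamp_index_sketch` are hypothetical names. It assumes size is nonzero and at most LONG_MAX, and that right-shifting a negative long is an arithmetic shift (true for common compilers, but implementation-defined in C). The kernel's own helper additionally takes care that the compiler cannot turn the computation back into a branch, which this sketch does not attempt.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG_SKETCH (sizeof(long) * CHAR_BIT)

/*
 * Returns all ones when index < size and zero otherwise, using only
 * arithmetic: if index >= size, then (size - 1 - index) wraps and sets
 * the top bit, so the inverted value has a clear sign bit.
 */
static unsigned long index_mask_sketch(unsigned long index,
				       unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (BITS_PER_LONG_SKETCH - 1));
}

/* In-range indexes pass through unchanged; out-of-range ones become 0. */
static unsigned long clamp_index_sketch(unsigned long index,
					unsigned long size)
{
	return index & index_mask_sketch(index, size);
}
```

Applied after the bounds check, the masked index stays inside the array even on a path the CPU executes speculatively with the check mispredicted.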

Re: [PATCH] thermal: tsens: Remove unnecessary comparison of unsigned integer with < 0

2019-05-28 Thread Eduardo Valentin

Gustavo,

On Mon, May 27, 2019 at 11:08:25AM -0500, Gustavo A. R. Silva wrote:
> There is no need to compare hw_id with < 0 because such comparison
> of an unsigned value is always false.
> 
> Fix this by removing such comparison.


Thanks for fixing this, but we had to revert the commit that introduced
this issue, so this patch is no longer applicable.

> 
> Addresses-Coverity-ID: 1445440 ("Unsigned compared against 0")
> Fixes: 3e6a8fb33084 ("drivers: thermal: tsens: Add new operation to check if 
> a sensor is enabled")
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  drivers/thermal/qcom/tsens-common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/qcom/tsens-common.c 
> b/drivers/thermal/qcom/tsens-common.c
> index 928e8e81ba69..94878ad35464 100644
> --- a/drivers/thermal/qcom/tsens-common.c
> +++ b/drivers/thermal/qcom/tsens-common.c
> @@ -69,7 +69,7 @@ bool is_sensor_enabled(struct tsens_priv *priv, u32 hw_id)
>   u32 val;
>   int ret;
>  
> - if ((hw_id > (priv->num_sensors - 1)) || (hw_id < 0))
> + if (hw_id > priv->num_sensors - 1)
>   return -EINVAL;
>   ret = regmap_field_read(priv->rf[SENSOR_EN], &val);
>   if (ret)

[PATCH 1/1] Revert "drivers: thermal: tsens: Add new operation to check if a sensor is enabled"

2019-05-28 Thread Eduardo Valentin

This reverts commit 3e6a8fb3308419129c7a52de6eb42feef5a919a0.

Cc: Andy Gross 
Cc: David Brown 
Cc: Amit Kucheria 
Cc: Zhang Rui 
Cc: Daniel Lezcano 
Suggested-by: Amit Kucheria 
Reported-by: Andy Gross 
Signed-off-by: Eduardo Valentin 
---

Added this for next -rc, as per request.

 drivers/thermal/qcom/tsens-common.c | 14 --
 drivers/thermal/qcom/tsens-v0_1.c   |  1 -
 drivers/thermal/qcom/tsens-v2.c |  1 -
 drivers/thermal/qcom/tsens.c|  5 -
 drivers/thermal/qcom/tsens.h|  1 -
 5 files changed, 22 deletions(-)

diff --git a/drivers/thermal/qcom/tsens-common.c 
b/drivers/thermal/qcom/tsens-common.c
index 928e8e8..528df88 100644
--- a/drivers/thermal/qcom/tsens-common.c
+++ b/drivers/thermal/qcom/tsens-common.c
@@ -64,20 +64,6 @@ void compute_intercept_slope(struct tsens_priv *priv, u32 
*p1,
}
 }
 
-bool is_sensor_enabled(struct tsens_priv *priv, u32 hw_id)
-{
-   u32 val;
-   int ret;
-
-   if ((hw_id > (priv->num_sensors - 1)) || (hw_id < 0))
-   return -EINVAL;
-   ret = regmap_field_read(priv->rf[SENSOR_EN], &val);
-   if (ret)
-   return ret;
-
-   return val & (1 << hw_id);
-}
-
 static inline int code_to_degc(u32 adc_code, const struct tsens_sensor *s)
 {
int degc, num, den;
diff --git a/drivers/thermal/qcom/tsens-v0_1.c 
b/drivers/thermal/qcom/tsens-v0_1.c
index a319283..6f26fad 100644
--- a/drivers/thermal/qcom/tsens-v0_1.c
+++ b/drivers/thermal/qcom/tsens-v0_1.c
@@ -334,7 +334,6 @@ static const struct reg_field 
tsens_v0_1_regfields[MAX_REGFIELDS] = {
/* CTRL_OFFSET */
[TSENS_EN] = REG_FIELD(SROT_CTRL_OFF, 0,  0),
[TSENS_SW_RST] = REG_FIELD(SROT_CTRL_OFF, 1,  1),
-   [SENSOR_EN]= REG_FIELD(SROT_CTRL_OFF, 3, 13),
 
/* - TM -- */
/* INTERRUPT ENABLE */
diff --git a/drivers/thermal/qcom/tsens-v2.c b/drivers/thermal/qcom/tsens-v2.c
index 1099069..0a4f2b8 100644
--- a/drivers/thermal/qcom/tsens-v2.c
+++ b/drivers/thermal/qcom/tsens-v2.c
@@ -44,7 +44,6 @@ static const struct reg_field 
tsens_v2_regfields[MAX_REGFIELDS] = {
/* CTRL_OFF */
[TSENS_EN] = REG_FIELD(SROT_CTRL_OFF,0,  0),
[TSENS_SW_RST] = REG_FIELD(SROT_CTRL_OFF,1,  1),
-   [SENSOR_EN]= REG_FIELD(SROT_CTRL_OFF,3, 18),
 
/* - TM -- */
/* INTERRUPT ENABLE */
diff --git a/drivers/thermal/qcom/tsens.c b/drivers/thermal/qcom/tsens.c
index 36b0b52..0627d86 100644
--- a/drivers/thermal/qcom/tsens.c
+++ b/drivers/thermal/qcom/tsens.c
@@ -85,11 +85,6 @@ static int tsens_register(struct tsens_priv *priv)
struct thermal_zone_device *tzd;
 
for (i = 0;  i < priv->num_sensors; i++) {
-   if (!is_sensor_enabled(priv, priv->sensor[i].hw_id)) {
-   dev_err(priv->dev, "sensor %d: disabled\n",
-   priv->sensor[i].hw_id);
-   continue;
-   }
priv->sensor[i].priv = priv;
priv->sensor[i].id = i;
tzd = devm_thermal_zone_of_sensor_register(priv->dev, i,
diff --git a/drivers/thermal/qcom/tsens.h b/drivers/thermal/qcom/tsens.h
index eefe384..2fd9499 100644
--- a/drivers/thermal/qcom/tsens.h
+++ b/drivers/thermal/qcom/tsens.h
@@ -315,7 +315,6 @@ void compute_intercept_slope(struct tsens_priv *priv, u32 
*pt1, u32 *pt2, u32 mo
 int init_common(struct tsens_priv *priv);
 int get_temp_tsens_valid(struct tsens_priv *priv, int i, int *temp);
 int get_temp_common(struct tsens_priv *priv, int i, int *temp);
-bool is_sensor_enabled(struct tsens_priv *priv, u32 hw_id);
 
 /* TSENS target */
 extern const struct tsens_plat_data data_8960;
-- 
2.1.4
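
Editor's note: independent of why the commit was reverted (the message above does not say), the removed helper shows a C pitfall worth spelling out: a function declared bool cannot report a negative errno, because any nonzero int converts to true. The following userspace sketch contrasts the two shapes; `enabled_as_bool` and `enabled_as_int` are hypothetical names, and the sketch assumes at most 32 sensors so that the shift is well defined.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the removed helper: bool return, errno on bad input. */
static bool enabled_as_bool(uint32_t enable_mask, uint32_t hw_id,
			    uint32_t num_sensors)
{
	int err = -EINVAL;

	if (hw_id >= num_sensors)
		return err;	/* any nonzero int converts to true */
	return (enable_mask & (1u << hw_id)) != 0;
}

/* Alternative shape: negative errno on bad input, otherwise 0 or 1. */
static int enabled_as_int(uint32_t enable_mask, uint32_t hw_id,
			  uint32_t num_sensors)
{
	if (hw_id >= num_sensors)
		return -EINVAL;
	return (enable_mask & (1u << hw_id)) != 0;
}
```

With the bool variant, an out-of-range hw_id is indistinguishable from "enabled", so a caller cannot tell an error from a positive answer.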

Re: [RFC PATCH 0/3] Make deferred split shrinker memcg aware

2019-05-28 Thread Yang Shi





On 5/29/19 9:22 AM, David Rientjes wrote:

On Tue, 28 May 2019, Yang Shi wrote:


I got some reports from our internal application team about memcg OOM.
Even though the application has been killed by the OOM killer, a lot of
THPs still remain, and page reclaim doesn't reclaim them at all.

Some investigation shows that they are on the deferred split queue; memcg
direct reclaim can't shrink them since the THP deferred split shrinker is
not memcg aware, which may cause premature OOM in a memcg.  The issue can
be reproduced easily by the test below:


Right, we've also encountered this.  I talked to Kirill about it a week or
so ago where the suggestion was to split all compound pages on the
deferred split queues under the presence of even memory pressure.

That breaks cgroup isolation and perhaps unfairly penalizes workloads that
are running attached to other memcg hierarchies that are not under
pressure because their compound pages are now split as a side effect.
There is a benefit to keeping these compound pages around while not under
memory pressure if all pages are subsequently mapped again.


Yes, I do agree. I tried other approaches too; making the deferred
split queue per memcg seems to be the optimal one.





$ cgcreate -g memory:thp
$ echo 4G > /sys/fs/cgroup/memory/thp/memory.limit_in_bytes
$ cgexec -g memory:thp ./transhuge-stress 4000

transhuge-stress comes from kernel selftest.

It is easy to hit OOM even though there are still a lot of THPs on the
deferred split queue; memcg direct reclaim can't touch them since the
deferred split shrinker is not memcg aware.


Yes, we have seen this on at least 4.15 as well.


Make the deferred split shrinker memcg aware by introducing a per-memcg
deferred split queue.  A THP is on the per-memcg deferred split queue if it
belongs to a memcg, and on the per-node queue otherwise.  When the page is
migrated to another memcg, it is moved to the target memcg's deferred split
queue too.

Also, when freeing a page, delete the THP from the deferred split queue
before the memcg uncharge so that the page's memcg information is still
available.

Reuse the second tail page's deferred_list for per memcg list since the same
THP can't be on multiple deferred split queues at the same time.

Remove THP specific destructor since it is not used anymore with memcg aware
THP shrinker (Please see the commit log of patch 2/3 for the details).

Make the deferred split shrinker independent of memcg kmem since it is not
slab.  It doesn't make sense to skip shrinking THPs just because memcg kmem
is disabled.

With the above changes, the test shown above no longer triggers OOM, even
with cgroup.memory=nokmem.


I'm curious if your internal applications team is also asking for
statistics on how much memory can be freed if the deferred split queues
can be shrunk?  We have applications that monitor their own memory usage


No, but this reminds me. The THPs on deferred split queue should be 
accounted into available memory too.



through memcg stats or usage and proactively try to reduce that usage when
it is growing too large.  The deferred split queues have significantly
increased both memcg usage and rss when they've upgraded kernels.

How are your applications monitoring how much memory from deferred split
queues can be freed on memory pressure?  Any thoughts on providing it as a
memcg stat?


I don't think they have such a monitor. I saw that rss_huge is abnormal in
the memcg stat even after the application is killed by OOM, so I realized
the deferred split queue may play a role here.


The memcg stat doesn't have counters for available memory like the global
vmstat does. It may be better to have such statistics, or to extend
reclaimable "slab" to shrinkable/reclaimable "memory".




Thanks!

Re: [PATCH net-next] net: stmmac: Switch to devm_alloc_etherdev_mqs

2019-05-28 Thread Jisheng Zhang

On Tue, 28 May 2019 11:07:53 -0700 David Miller wrote:

> 
> You never even tried to compile this patch.
> 

Oops, my bad. I patched another branch and tested the patch there, but when
I manually patched the net-next tree, I made a mistake. Sorry.

[PATCH net-next v2] net: stmmac: Switch to devm_alloc_etherdev_mqs

2019-05-28 Thread Jisheng Zhang

Make use of devm_alloc_etherdev_mqs() to simplify the code.

Signed-off-by: Jisheng Zhang 
---
Since V1:
 - fix the build error, sorry, my bad.

 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index a87ec70b19f1..4defdcb4f237 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -4243,9 +4243,8 @@ int stmmac_dvr_probe(struct device *device,
u32 queue, maxq;
int ret = 0;
 
-   ndev = alloc_etherdev_mqs(sizeof(struct stmmac_priv),
- MTL_MAX_TX_QUEUES,
- MTL_MAX_RX_QUEUES);
+   ndev = devm_alloc_etherdev_mqs(device, sizeof(struct stmmac_priv),
+  MTL_MAX_TX_QUEUES, MTL_MAX_RX_QUEUES);
if (!ndev)
return -ENOMEM;
 
@@ -4277,8 +4276,7 @@ int stmmac_dvr_probe(struct device *device,
priv->wq = create_singlethread_workqueue("stmmac_wq");
if (!priv->wq) {
dev_err(priv->device, "failed to create workqueue\n");
-   ret = -ENOMEM;
-   goto error_wq;
+   return -ENOMEM;
}
 
INIT_WORK(&priv->service_task, stmmac_service_task);
@@ -4434,8 +4432,6 @@ int stmmac_dvr_probe(struct device *device,
}
 error_hw_init:
destroy_workqueue(priv->wq);
-error_wq:
-   free_netdev(ndev);
 
return ret;
 }
@@ -4472,7 +4468,6 @@ int stmmac_dvr_remove(struct device *dev)
stmmac_mdio_unregister(ndev);
destroy_workqueue(priv->wq);
mutex_destroy(&priv->lock);
-   free_netdev(ndev);
 
return 0;
 }
-- 
2.20.1
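
Editor's note: the patch can drop the free_netdev() calls because device-managed (devm_) allocations are released by the driver core when probe fails or the device is unbound. The following userspace sketch mimics that bookkeeping in a much simplified form; `managed_dev`, `devres_node`, `managed_alloc` and `managed_release_all` are hypothetical names and are not the kernel's devres API.

```c
#include <stdlib.h>

struct devres_node {
	struct devres_node *next;
	void *data;
};

/* A "device" that remembers every allocation made on its behalf. */
struct managed_dev {
	struct devres_node *head;
};

/* Allocate zeroed memory and record it on the device's resource list. */
static void *managed_alloc(struct managed_dev *dev, size_t size)
{
	struct devres_node *node = malloc(sizeof(*node));

	if (!node)
		return NULL;
	node->data = calloc(1, size);
	if (!node->data) {
		free(node);
		return NULL;
	}
	node->next = dev->head;
	dev->head = node;
	return node->data;
}

/* Release everything in one place; returns the number of resources freed. */
static int managed_release_all(struct managed_dev *dev)
{
	int released = 0;

	while (dev->head) {
		struct devres_node *node = dev->head;

		dev->head = node->next;
		free(node->data);
		free(node);
		released++;
	}
	return released;
}
```

Because cleanup happens in one place, the error paths no longer need a dedicated label per allocation, which is the simplification the diff above makes.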

[PATCH] sparc: perf: fix updated event period in response to PERF_EVENT_IOC_PERIOD

2019-05-28 Thread Young Xiao

The PERF_EVENT_IOC_PERIOD ioctl command can be used to change the
sample period of a running perf_event. Consequently, when calculating
the next event period, the new period will only be considered after the
previous one has overflowed.

This patch changes the calculation of the remaining event ticks so that
they are offset if the period has changed.

See commit 3581fe0ef37c ("ARM: 7556/1: perf: fix updated event period in
response to PERF_EVENT_IOC_PERIOD") for details.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 arch/sparc/kernel/perf_event.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index 6de7c68..a58ae9c 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -891,6 +891,10 @@ static int sparc_perf_event_set_period(struct perf_event 
*event,
s64 period = hwc->sample_period;
int ret = 0;
 
+   /* The period may have been changed by PERF_EVENT_IOC_PERIOD */
+   if (unlikely(period != hwc->last_period))
+   left = period - (hwc->last_period - left);
+
if (unlikely(left <= -period)) {
left = period;
local64_set(&hwc->period_left, left);
-- 
2.7.4
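
Editor's note: the adjustment added by this patch is easier to follow with numbers. The sketch below isolates just that one step in userspace; `adjust_left` is a hypothetical name, and the later clamping done by sparc_perf_event_set_period() is deliberately left out.

```c
#include <stdint.h>

/*
 * (last_period - left) is the number of ticks already counted under the
 * old period. Subtracting that from the new period gives the ticks still
 * remaining under the new period; a negative result means the new period
 * has already been exceeded.
 */
static int64_t adjust_left(int64_t period, int64_t last_period, int64_t left)
{
	if (period != last_period)
		left = period - (last_period - left);
	return left;
}
```

For example, with an old period of 1000 and 300 ticks left, 700 ticks have elapsed. Shrinking the period to 400 yields 400 - 700 = -300, while growing it to 2000 yields 2000 - 700 = 1300; an unchanged period leaves the remainder at 300.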

[PATCH net-next v3 0/5] net: stmmac: enable EHL SGMII

2019-05-28 Thread Voon Weifeng

This patch set enables the Ethernet controller (DW Ethernet QoS and
DW Ethernet PCS) with an SGMII interface in Elkhart Lake.
The DW Ethernet PCS is the Physical Coding Sublayer that sits between the
Ethernet MAC and the PHY and uses MDIO Clause 45 for communication.

Kweh Hock Leong (1):
  net: stmmac: enable clause 45 mdio support

Ong Boon Leong (3):
  net: stmmac: introducing support for DWC xPCS logics
  net: stmmac: add xpcs function hooks into main driver and ethtool
  net: stmmac: add xPCS functions for device with DWMACv5.1

Voon Weifeng (1):
  net: stmmac: add EHL SGMII 1Gbps PCI info and PCI ID

 drivers/net/ethernet/stmicro/stmmac/Makefile   |   2 +-
 drivers/net/ethernet/stmicro/stmmac/common.h   |   1 +
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c  |  33 
 drivers/net/ethernet/stmicro/stmmac/dwxpcs.c   | 198 +
 drivers/net/ethernet/stmicro/stmmac/dwxpcs.h   |  51 ++
 drivers/net/ethernet/stmicro/stmmac/hwif.c |  41 -
 drivers/net/ethernet/stmicro/stmmac/hwif.h |  21 +++
 drivers/net/ethernet/stmicro/stmmac/stmmac.h   |   2 +
 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   |  50 --
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 152 
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  |  40 -
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c   | 111 
 include/linux/phy.h|   2 +
 include/linux/stmmac.h |   3 +
 14 files changed, 649 insertions(+), 58 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxpcs.c
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxpcs.h

-- 
Changelog v2:
*Added support for the C37 AN for 1000BASE-X and SGMII (MAC side SGMII only)
*removed and submitted the fix patch to net
 "net: stmmac: dma channel control register need to be init first"
*Squash the following 2 patches and move it to the end of the patch set:
 "net: stmmac: add EHL SGMII 1Gbps platform data and PCI ID"
 "net: stmmac: add xPCS platform data for EHL"
Changelog v3:
*Applied reversed christmas tree
1.9.1

[PATCH net-next v3 5/5] net: stmmac: add EHL SGMII 1Gbps PCI info and PCI ID

2019-05-28 Thread Voon Weifeng

Add the EHL SGMII 1Gbps PCI ID. Each MII type and speed combination has
a different PCI ID.

Signed-off-by: Voon Weifeng 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 111 +++
 1 file changed, 111 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
index 7cbc01f316fa..f2225c1eafc2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
@@ -23,6 +23,7 @@
 #include 
 
 #include "stmmac.h"
+#include "dwxpcs.h"
 
 /*
  * This struct is used to associate PCI Function of MAC controller on a board,
@@ -118,6 +119,113 @@ static int stmmac_default_data(struct pci_dev *pdev,
.setup = stmmac_default_data,
 };
 
+static int ehl_common_data(struct pci_dev *pdev,
+  struct plat_stmmacenet_data *plat)
+{
+   int i;
+
+   plat->bus_id = 1;
+   plat->phy_addr = 0;
+   plat->clk_csr = 5;
+   plat->has_gmac = 0;
+   plat->has_gmac4 = 1;
+   plat->xpcs_phy_addr = 0x16;
+   plat->pcs_mode = AN_CTRL_PCS_MD_C37_SGMII;
+   plat->force_sf_dma_mode = 0;
+   plat->tso_en = 1;
+
+   plat->rx_queues_to_use = 8;
+   plat->tx_queues_to_use = 8;
+   plat->rx_sched_algorithm = MTL_RX_ALGORITHM_SP;
+
+   for (i = 0; i < plat->rx_queues_to_use; i++) {
+   plat->rx_queues_cfg[i].mode_to_use = MTL_QUEUE_DCB;
+   plat->rx_queues_cfg[i].chan = i;
+
+   /* Disable Priority config by default */
+   plat->rx_queues_cfg[i].use_prio = false;
+
+   /* Disable RX queues routing by default */
+   plat->rx_queues_cfg[i].pkt_route = 0x0;
+   }
+
+   for (i = 0; i < plat->tx_queues_to_use; i++) {
+   plat->tx_queues_cfg[i].mode_to_use = MTL_QUEUE_DCB;
+
+   /* Disable Priority config by default */
+   plat->tx_queues_cfg[i].use_prio = false;
+   }
+
+   plat->tx_sched_algorithm = MTL_TX_ALGORITHM_WRR;
+   plat->tx_queues_cfg[0].weight = 0x09;
+   plat->tx_queues_cfg[1].weight = 0x0A;
+   plat->tx_queues_cfg[2].weight = 0x0B;
+   plat->tx_queues_cfg[3].weight = 0x0C;
+   plat->tx_queues_cfg[4].weight = 0x0D;
+   plat->tx_queues_cfg[5].weight = 0x0E;
+   plat->tx_queues_cfg[6].weight = 0x0F;
+   plat->tx_queues_cfg[7].weight = 0x10;
+
+   plat->mdio_bus_data->phy_reset = NULL;
+   plat->mdio_bus_data->phy_mask = 0;
+
+   plat->dma_cfg->pbl = 32;
+   plat->dma_cfg->pblx8 = true;
+   plat->dma_cfg->fixed_burst = 0;
+   plat->dma_cfg->mixed_burst = 0;
+   plat->dma_cfg->aal = 0;
+
+   plat->axi = devm_kzalloc(&pdev->dev, sizeof(*plat->axi),
+GFP_KERNEL);
+   if (!plat->axi)
+   return -ENOMEM;
+   plat->axi->axi_lpi_en = 0;
+   plat->axi->axi_xit_frm = 0;
+   plat->axi->axi_wr_osr_lmt = 0;
+   plat->axi->axi_rd_osr_lmt = 2;
+   plat->axi->axi_blen[0] = 4;
+   plat->axi->axi_blen[1] = 8;
+   plat->axi->axi_blen[2] = 16;
+
+   /* Set default value for multicast hash bins */
+   plat->multicast_filter_bins = HASH_TABLE_SIZE;
+
+   /* Set default value for unicast filter entries */
+   plat->unicast_filter_entries = 1;
+
+   /* Set the maxmtu to a default of JUMBO_LEN */
+   plat->maxmtu = JUMBO_LEN;
+
+   /* Set 32KB fifo size as the advertised fifo size in
+* the HW features is not the same as the HW implementation
+*/
+   plat->tx_fifo_size = 32768;
+   plat->rx_fifo_size = 32768;
+
+   return 0;
+}
+
+static int ehl_sgmii1g_data(struct pci_dev *pdev,
+   struct plat_stmmacenet_data *plat)
+{
+   int ret;
+
+   /* Set common default data first */
+   ret = ehl_common_data(pdev, plat);
+
+   if (ret)
+   return ret;
+
+   plat->interface = PHY_INTERFACE_MODE_SGMII;
+   plat->has_xpcs = 1;
+
+   return 0;
+}
+
+static struct stmmac_pci_info ehl_sgmii1g_pci_info = {
+   .setup = ehl_sgmii1g_data,
+};
+
 static const struct stmmac_pci_func_data galileo_stmmac_func_data[] = {
{
.func = 6,
@@ -290,6 +398,7 @@ static int stmmac_pci_probe(struct pci_dev *pdev,
res.addr = pcim_iomap_table(pdev)[i];
res.wol_irq = pdev->irq;
res.irq = pdev->irq;
+   res.xpcs_irq = 0;
 
return stmmac_dvr_probe(&pdev->dev, plat, &res);
 }
@@ -359,6 +468,7 @@ static int __maybe_unused stmmac_pci_resume(struct device 
*dev)
 
 #define STMMAC_QUARK_ID  0x0937
 #define STMMAC_DEVICE_ID 0x1108
+#define STMMAC_EHL_SGMII1G_ID   0x4b31
 
 #define STMMAC_DEVICE(vendor_id, dev_id, info) {   \
PCI_VDEVICE(vendor_id, dev_id), \
@@ -369,6 +479,7 @@ static int __maybe_unused stmmac_pci_resume(struct device 
*dev)
STMMAC_DEVICE(STMMAC, STMMAC_DEVICE_ID, stmmac_pci_info),
STMMAC_DEV

Re: [PATCH v5 3/7] iommu/vt-d: Introduce is_downstream_to_pci_bridge helper

2019-05-28 Thread Lu Baolu


Hi,

On 5/28/19 7:50 PM, Eric Auger wrote:

Several call sites are about to check whether a device belongs
to the PCI sub-hierarchy of a candidate PCI-PCI bridge.
Introduce a helper to perform that check.



This looks good to me.

Reviewed-by: Lu Baolu 

Best regards,
Baolu



Signed-off-by: Eric Auger 
---
  drivers/iommu/intel-iommu.c | 37 +
  1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5ec8b5bd308f..879f11c82b05 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -736,12 +736,39 @@ static int iommu_dummy(struct device *dev)
return dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO;
  }
  
+/* is_downstream_to_pci_bridge - test if a device belongs to the
+ * PCI sub-hierarchy of a candidate PCI-PCI bridge
+ *
+ * @dev: candidate PCI device belonging to @bridge PCI sub-hierarchy
+ * @bridge: the candidate PCI-PCI bridge
+ *
+ * Return: true if @dev belongs to @bridge PCI sub-hierarchy
+ */
+static bool
+is_downstream_to_pci_bridge(struct device *dev, struct device *bridge)
+{
+   struct pci_dev *pdev, *pbridge;
+
+   if (!dev_is_pci(dev) || !dev_is_pci(bridge))
+   return false;
+
+   pdev = to_pci_dev(dev);
+   pbridge = to_pci_dev(bridge);
+
+   if (pbridge->subordinate &&
+   pbridge->subordinate->number <= pdev->bus->number &&
+   pbridge->subordinate->busn_res.end >= pdev->bus->number)
+   return true;
+
+   return false;
+}
+
  static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 
*devfn)
  {
struct dmar_drhd_unit *drhd = NULL;
struct intel_iommu *iommu;
struct device *tmp;
-   struct pci_dev *ptmp, *pdev = NULL;
+   struct pci_dev *pdev = NULL;
u16 segment = 0;
int i;
  
@@ -787,13 +814,7 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devf
goto out;
}
  
-			if (!pdev || !dev_is_pci(tmp))
-   continue;
-
-   ptmp = to_pci_dev(tmp);
-   if (ptmp->subordinate &&
-   ptmp->subordinate->number <= pdev->bus->number &&
-   ptmp->subordinate->busn_res.end >= 
pdev->bus->number)
+   if (is_downstream_to_pci_bridge(dev, tmp))
goto got_pdev;
}
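
The check performed by is_downstream_to_pci_bridge() above is a bus-number
range test. Below is a minimal standalone sketch of that idea, not the kernel
code: struct bus_range and is_downstream() are hypothetical stand-ins, with
"secondary" playing the role of pbridge->subordinate->number and
"subordinate" the role of pbridge->subordinate->busn_res.end.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bus numbers a PCI-PCI bridge forwards. */
struct bus_range {
	unsigned int secondary;   /* first bus number behind the bridge */
	unsigned int subordinate; /* last bus number behind the bridge */
};

/*
 * A device is downstream of the bridge when the number of the bus it
 * sits on falls inside the inclusive range the bridge forwards.
 * A NULL range models a device that is not a bridge at all.
 */
static bool is_downstream(unsigned int dev_bus, const struct bus_range *bridge)
{
	if (!bridge)
		return false;

	return bridge->secondary <= dev_bus && dev_bus <= bridge->subordinate;
}
```

With a bridge that forwards buses 2 through 5, a device on bus 3 is
downstream while devices on bus 1 or bus 6 are not.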

[PATCH net-next v3 2/5] net: stmmac: introducing support for DWC xPCS logics

2019-05-28 Thread Voon Weifeng

From: Ong Boon Leong 

xPCS is the DWC Ethernet Physical Coding Sublayer, which may be integrated
into a GbE controller that uses the DWC EQoS MAC controller. An example
HW configuration is shown below:

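  <-----------------GBE Controller---------->|<--External PHY chip-->

  +----------+         +----+    +---+               +--------------+
  |   EQoS   | <-GMII->| DW |<-->|PHY| <-- SGMII --> | External GbE |
  |   MAC    |         |xPCS|    |IF |               | PHY Chip     |
  +----------+         +----+    +---+               +--------------+
         ^               ^                                  ^
         |               |                                  |
         +---------------------MDIO-------------------------+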

xPCS is a Clause-45 MDIO Manageable Device (MMD) and we need a way to
differentiate it from the external PHY chip that is discovered over MDIO.
Therefore, xpcs_phy_addr is introduced in the stmmac platform data
(plat_stmmacenet_data) to distinguish xPCS from 'phy_addr', which
belongs to the external PHY.

Basic functionality for initializing xPCS and configuring auto
negotiation (AN), loopback, link status, AN advertisement and Link
Partner ability is implemented. The implementation supports C37
AN for 1000BASE-X and SGMII (MAC side SGMII only).

Tested-by: Tan, Tee Min 
Reviewed-by: Voon Weifeng 
Reviewed-by: Kweh Hock Leong 
Signed-off-by: Ong Boon Leong 
Signed-off-by: Voon Weifeng 
---
 drivers/net/ethernet/stmicro/stmmac/Makefile |   2 +-
 drivers/net/ethernet/stmicro/stmmac/common.h |   1 +
 drivers/net/ethernet/stmicro/stmmac/dwxpcs.c | 198 +++
 drivers/net/ethernet/stmicro/stmmac/dwxpcs.h |  51 +++
 drivers/net/ethernet/stmicro/stmmac/hwif.h   |  19 +++
 include/linux/stmmac.h   |   1 +
 6 files changed, 271 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxpcs.c
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxpcs.h

diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile 
b/drivers/net/ethernet/stmicro/stmmac/Makefile
index c529c21e9bdd..57ca648fae4e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -6,7 +6,7 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o 
ring_mode.o  \
  mmc_core.o stmmac_hwtstamp.o stmmac_ptp.o dwmac4_descs.o  \
  dwmac4_dma.o dwmac4_lib.o dwmac4_core.o dwmac5.o hwif.o \
  stmmac_tc.o dwxgmac2_core.o dwxgmac2_dma.o dwxgmac2_descs.o \
- $(stmmac-y)
+ dwxpcs.o $(stmmac-y)
 
 # Ordering matters. Generic driver must be last.
 obj-$(CONFIG_STMMAC_PLATFORM)  += stmmac-platform.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
b/drivers/net/ethernet/stmicro/stmmac/common.h
index 272b9ca66314..67d03a5a21af 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -419,6 +419,7 @@ struct mii_regs {
 
 struct mac_device_info {
const struct stmmac_ops *mac;
+   const struct stmmac_xpcs *xpcs;
const struct stmmac_desc_ops *desc;
const struct stmmac_dma_ops *dma;
const struct stmmac_mode_ops *mode;
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxpcs.c 
b/drivers/net/ethernet/stmicro/stmmac/dwxpcs.c
new file mode 100644
index ..081d3631afd2
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxpcs.c
@@ -0,0 +1,198 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2019, Intel Corporation.
+ * DWC Ethernet Physical Coding Sublayer
+ */
+#include 
+#include 
+#include "dwxpcs.h"
+#include "stmmac.h"
+
+/* DW xPCS mdiobus_read and mdiobus_write helper functions */
+#define xpcs_read(dev, reg) \
+   mdiobus_read(priv->mii, xpcs_phy_addr, \
+MII_ADDR_C45 | (reg) | \
+((dev) << MII_DEVADDR_C45_SHIFT))
+#define xpcs_write(dev, reg, val) \
+   mdiobus_write(priv->mii, xpcs_phy_addr, \
+ MII_ADDR_C45 | (reg) | \
+ ((dev) << MII_DEVADDR_C45_SHIFT), val)
+
+static void dw_xpcs_init(struct net_device *ndev, int pcs_mode)
+{
+   struct stmmac_priv *priv = netdev_priv(ndev);
+   int xpcs_phy_addr = priv->plat->xpcs_phy_addr;
+   int phydata;
+
+   if (pcs_mode == AN_CTRL_PCS_MD_C37_SGMII) {
+   /* For AN for SGMII mode, the settings are :-
+* 1) VR_MII_AN_CTRL Bit(2:1)[PCS_MODE] = 10b (SGMII AN)
+* 2) VR_MII_AN_CTRL Bit(3) [TX_CONFIG] = 0b (MAC side SGMII)
+*DW xPCS used with DW EQoS MAC is always MAC
+*side SGMII.
+* 3) VR_MII_AN_CTRL Bit(0) [AN_INTR_EN] = 1b (AN Interrupt
+*enabled)
+* 4) VR_MII_DIG_CTRL1 Bit(9) [MAC_AUTO_SW] = 1b (Automatic
+*speed mode change after SGMII AN complete)
+* Note: Since it is MAC side SGMII, there is no need to set
+

[PATCH net-next v3 4/5] net: stmmac: add xPCS functions for device with DWMACv5.1

2019-05-28 Thread Voon Weifeng

From: Ong Boon Leong 

Introduce support for devices that have the v5.10 IP and also use
xPCS as an MMD. This can easily be enabled for other products that
integrate xPCS and do not use the v5.00 IP.

Reviewed-by: Chuah Kim Tatt 
Reviewed-by: Voon Weifeng 
Reviewed-by: Kweh Hock Leong 
Reviewed-by: Baoli Zhang 
Signed-off-by: Ong Boon Leong 
Signed-off-by: Voon Weifeng 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 33 ++
 drivers/net/ethernet/stmicro/stmmac/hwif.c| 41 ++-
 drivers/net/ethernet/stmicro/stmmac/hwif.h|  2 ++
 3 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index b4bb5629de38..34f05068142e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -801,6 +801,39 @@ static void dwmac4_debug(void __iomem *ioaddr, struct 
stmmac_extra_stats *x,
.flex_pps_config = dwmac5_flex_pps_config,
 };
 
+const struct stmmac_ops dwmac510_xpcs_ops = {
+   .core_init = dwmac4_core_init,
+   .set_mac = stmmac_dwmac4_set_mac,
+   .rx_ipc = dwmac4_rx_ipc_enable,
+   .rx_queue_enable = dwmac4_rx_queue_enable,
+   .rx_queue_prio = dwmac4_rx_queue_priority,
+   .tx_queue_prio = dwmac4_tx_queue_priority,
+   .rx_queue_routing = dwmac4_rx_queue_routing,
+   .prog_mtl_rx_algorithms = dwmac4_prog_mtl_rx_algorithms,
+   .prog_mtl_tx_algorithms = dwmac4_prog_mtl_tx_algorithms,
+   .set_mtl_tx_queue_weight = dwmac4_set_mtl_tx_queue_weight,
+   .map_mtl_to_dma = dwmac4_map_mtl_dma,
+   .config_cbs = dwmac4_config_cbs,
+   .dump_regs = dwmac4_dump_regs,
+   .host_irq_status = dwmac4_irq_status,
+   .host_mtl_irq_status = dwmac4_irq_mtl_status,
+   .flow_ctrl = dwmac4_flow_ctrl,
+   .pmt = dwmac4_pmt,
+   .set_umac_addr = dwmac4_set_umac_addr,
+   .get_umac_addr = dwmac4_get_umac_addr,
+   .set_eee_mode = dwmac4_set_eee_mode,
+   .reset_eee_mode = dwmac4_reset_eee_mode,
+   .set_eee_timer = dwmac4_set_eee_timer,
+   .set_eee_pls = dwmac4_set_eee_pls,
+   .debug = dwmac4_debug,
+   .set_filter = dwmac4_set_filter,
+   .safety_feat_config = dwmac5_safety_feat_config,
+   .safety_feat_irq_status = dwmac5_safety_feat_irq_status,
+   .safety_feat_dump = dwmac5_safety_feat_dump,
+   .rxp_config = dwmac5_rxp_config,
+   .flex_pps_config = dwmac5_flex_pps_config,
+};
+
 int dwmac4_setup(struct stmmac_priv *priv)
 {
struct mac_device_info *mac = priv->hw;
diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c 
b/drivers/net/ethernet/stmicro/stmmac/hwif.c
index 81b966a8261b..f1cb3ce165e5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.c
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c
@@ -73,11 +73,13 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
bool gmac;
bool gmac4;
bool xgmac;
+   bool has_xpcs;
u32 min_id;
const struct stmmac_regs_off regs;
const void *desc;
const void *dma;
const void *mac;
+   const void *xpcs;
const void *hwtimestamp;
const void *mode;
const void *tc;
@@ -89,6 +91,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
.gmac = false,
.gmac4 = false,
.xgmac = false,
+   .has_xpcs = false,
.min_id = 0,
.regs = {
.ptp_off = PTP_GMAC3_X_OFFSET,
@@ -97,6 +100,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
.desc = NULL,
.dma = &dwmac100_dma_ops,
.mac = &dwmac100_ops,
+   .xpcs = NULL,
.hwtimestamp = &stmmac_ptp,
.mode = NULL,
.tc = NULL,
@@ -106,6 +110,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
.gmac = true,
.gmac4 = false,
.xgmac = false,
+   .has_xpcs = false,
.min_id = 0,
.regs = {
.ptp_off = PTP_GMAC3_X_OFFSET,
@@ -114,6 +119,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
.desc = NULL,
.dma = &dwmac1000_dma_ops,
.mac = &dwmac1000_ops,
+   .xpcs = NULL,
.hwtimestamp = &stmmac_ptp,
.mode = NULL,
.tc = NULL,
@@ -123,6 +129,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
.gmac = false,
.gmac4 = true,
.xgmac = false,
+   .has_xpcs = false,
.min_id = 0,
.regs = {
.ptp_off = PTP_GMAC4_OFFSET,
@@ -130,6 +137,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
},
.des

[PATCH net-next v3 3/5] net: stmmac: add xpcs function hooks into main driver and ethtool

2019-05-28 Thread Voon Weifeng

From: Ong Boon Leong 

With the xPCS functions now ready, we add them into the main driver and
ethtool logic. To differentiate between the EQoS MAC PCS and the DWC
Ethernet xPCS, we introduce 'has_xpcs' in platform data as a means to
indicate whether the GBE controller includes xPCS or not.

To support platform-specific C37 AN PCS mode selection for MII MMD,
we introduce 'pcs_mode' in platform data.

The basic framework for xPCS interrupt handling is implemented too.

Reviewed-by: Chuah Kim Tatt 
Reviewed-by: Voon Weifeng 
Reviewed-by: Kweh Hock Leong 
Reviewed-by: Baoli Zhang 
Signed-off-by: Ong Boon Leong 
Signed-off-by: Voon Weifeng 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h   |   2 +
 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   |  50 +--
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 152 -
 include/linux/stmmac.h |   2 +
 4 files changed, 158 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h 
b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index dd95d959c1ce..0b8460a4a220 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -36,6 +36,7 @@ struct stmmac_resources {
const char *mac;
int wol_irq;
int lpi_irq;
+   int xpcs_irq;
int irq;
 };
 
@@ -168,6 +169,7 @@ struct stmmac_priv {
int clk_csr;
struct timer_list eee_ctrl_timer;
int lpi_irq;
+   int xpcs_irq;
int eee_enabled;
int eee_active;
int tx_lpi_timer;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index e09522c5509a..f0815d196147 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -28,6 +28,7 @@
 
 #include "stmmac.h"
 #include "dwmac_dma.h"
+#include "dwxpcs.h"
 
 #define REG_SPACE_SIZE 0x1060
 #define MAC100_ETHTOOL_NAME"st_mac100"
@@ -277,7 +278,8 @@ static int stmmac_ethtool_get_link_ksettings(struct 
net_device *dev,
struct phy_device *phy = dev->phydev;
 
if (priv->hw->pcs & STMMAC_PCS_RGMII ||
-   priv->hw->pcs & STMMAC_PCS_SGMII) {
+   priv->hw->pcs & STMMAC_PCS_SGMII ||
+   priv->plat->pcs_mode == AN_CTRL_PCS_MD_C37_1000BASEX) {
struct rgmii_adv adv;
u32 supported, advertising, lp_advertising;
 
@@ -294,6 +296,11 @@ static int stmmac_ethtool_get_link_ksettings(struct 
net_device *dev,
if (stmmac_pcs_get_adv_lp(priv, priv->ioaddr, &adv))
return -EOPNOTSUPP; /* should never happen indeed */
 
+   /* Getting ADV & LPA is only applicable to 1000BASE-X C37.
+* For MAC side SGMII AN, get ADV & LPA from PHY.
+*/
+   stmmac_xpcs_get_adv_lp(priv, dev, &adv, priv->plat->pcs_mode);
+
/* Encoding of PSE bits is defined in 802.3z, 37.2.1.4 */
 
ethtool_convert_link_mode_to_legacy_u32(
@@ -376,22 +383,23 @@ static int stmmac_ethtool_get_link_ksettings(struct 
net_device *dev,
int rc;
 
if (priv->hw->pcs & STMMAC_PCS_RGMII ||
-   priv->hw->pcs & STMMAC_PCS_SGMII) {
-   u32 mask = ADVERTISED_Autoneg | ADVERTISED_Pause;
-
+   priv->hw->pcs & STMMAC_PCS_SGMII ||
+   priv->plat->pcs_mode == AN_CTRL_PCS_MD_C37_1000BASEX) {
/* Only support ANE */
if (cmd->base.autoneg != AUTONEG_ENABLE)
return -EINVAL;
 
-   mask &= (ADVERTISED_1000baseT_Half |
-   ADVERTISED_1000baseT_Full |
-   ADVERTISED_100baseT_Half |
-   ADVERTISED_100baseT_Full |
-   ADVERTISED_10baseT_Half |
-   ADVERTISED_10baseT_Full);
-
mutex_lock(&priv->lock);
stmmac_pcs_ctrl_ane(priv, priv->ioaddr, 1, priv->hw->ps, 0);
+
+   /* For 1000BASE-X C37 AN, it is always 1000Mbps. And, we only
+* support FD which is set by default in SR_MII_AN_ADV
+* during XPCS init. So, we don't need to set FD again.
+* For SGMII C37 AN, we let user to change link settings
+* through PHY since it is MAC side SGMII.
+*/
+   stmmac_xpcs_ctrl_ane(priv, dev, 1, 0);
+
mutex_unlock(&priv->lock);
 
return 0;
@@ -457,6 +465,16 @@ static void stmmac_ethtool_gregs(struct net_device *dev,
pause->autoneg = 1;
if (!adv_lp.pause)
return;
+   } else if (priv->plat->pcs_mode == AN_CTRL_PCS_MD_C37_1000BASEX &&
+  !stmmac_xpcs_get_adv_lp(priv, netdev, &adv_lp,
+  priv->plat->pcs_mode)) {
+   /* DW xPCS 1000BASE-X C37 AN mode only because for

[PATCH net-next v3 1/5] net: stmmac: enable clause 45 mdio support

2019-05-28 Thread Voon Weifeng

From: Kweh Hock Leong 

DWMAC4 is capable of supporting Clause 45 MDIO communication.
This patch enables the feature in stmmac_mdio_write() and
stmmac_mdio_read() by following the mdiobus read/write format
used by phy_write_mmd() and phy_read_mmd().

Reviewed-by: Li, Yifan 
Signed-off-by: Kweh Hock Leong 
Signed-off-by: Ong Boon Leong 
Signed-off-by: Weifeng Voon 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 40 ++-
 include/linux/phy.h   |  2 ++
 2 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index bdd351597b55..c3d8f1d145ec 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -34,11 +34,27 @@
 
 #define MII_BUSY 0x0001
 #define MII_WRITE 0x0002
+#define MII_DATA_MASK GENMASK(15, 0)
 
 /* GMAC4 defines */
 #define MII_GMAC4_GOC_SHIFT2
+#define MII_GMAC4_REG_ADDR_SHIFT   16
 #define MII_GMAC4_WRITE(1 << MII_GMAC4_GOC_SHIFT)
 #define MII_GMAC4_READ (3 << MII_GMAC4_GOC_SHIFT)
+#define MII_GMAC4_C45E BIT(1)
+
+static void stmmac_mdio_c45_setup(struct stmmac_priv *priv, int phyreg,
+ u32 *val, u32 *data)
+{
+   unsigned int reg_shift = priv->hw->mii.reg_shift;
+   unsigned int reg_mask = priv->hw->mii.reg_mask;
+
+   *val |= MII_GMAC4_C45E;
+   *val &= ~reg_mask;
+   *val |= ((phyreg >> MII_DEVADDR_C45_SHIFT) << reg_shift) & reg_mask;
+
+   *data |= (phyreg & MII_REGADDR_C45_MASK) << MII_GMAC4_REG_ADDR_SHIFT;
+}
 
 /* XGMAC defines */
 #define MII_XGMAC_SADDRBIT(18)
@@ -165,22 +181,26 @@ static int stmmac_mdio_read(struct mii_bus *bus, int 
phyaddr, int phyreg)
struct stmmac_priv *priv = netdev_priv(ndev);
unsigned int mii_address = priv->hw->mii.addr;
unsigned int mii_data = priv->hw->mii.data;
-   u32 v;
-   int data;
u32 value = MII_BUSY;
+   int data = 0;
+   u32 v;
 
value |= (phyaddr << priv->hw->mii.addr_shift)
& priv->hw->mii.addr_mask;
value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;
value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
& priv->hw->mii.clk_csr_mask;
-   if (priv->plat->has_gmac4)
+   if (priv->plat->has_gmac4) {
value |= MII_GMAC4_READ;
+   if (phyreg & MII_ADDR_C45)
+   stmmac_mdio_c45_setup(priv, phyreg, &value, &data);
+   }
 
if (readl_poll_timeout(priv->ioaddr + mii_address, v, !(v & MII_BUSY),
   100, 1))
return -EBUSY;
 
+   writel(data, priv->ioaddr + mii_data);
writel(value, priv->ioaddr + mii_address);
 
if (readl_poll_timeout(priv->ioaddr + mii_address, v, !(v & MII_BUSY),
@@ -188,7 +208,7 @@ static int stmmac_mdio_read(struct mii_bus *bus, int 
phyaddr, int phyreg)
return -EBUSY;
 
/* Read the data from the MII data register */
-   data = (int)readl(priv->ioaddr + mii_data);
+   data = (int)readl(priv->ioaddr + mii_data) & MII_DATA_MASK;
 
return data;
 }
@@ -208,8 +228,9 @@ static int stmmac_mdio_write(struct mii_bus *bus, int 
phyaddr, int phyreg,
struct stmmac_priv *priv = netdev_priv(ndev);
unsigned int mii_address = priv->hw->mii.addr;
unsigned int mii_data = priv->hw->mii.data;
-   u32 v;
u32 value = MII_BUSY;
+   int data = phydata;
+   u32 v;
 
value |= (phyaddr << priv->hw->mii.addr_shift)
& priv->hw->mii.addr_mask;
@@ -217,10 +238,13 @@ static int stmmac_mdio_write(struct mii_bus *bus, int 
phyaddr, int phyreg,
 
value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
& priv->hw->mii.clk_csr_mask;
-   if (priv->plat->has_gmac4)
+   if (priv->plat->has_gmac4) {
value |= MII_GMAC4_WRITE;
-   else
+   if (phyreg & MII_ADDR_C45)
+   stmmac_mdio_c45_setup(priv, phyreg, &value, &data);
+   } else {
value |= MII_WRITE;
+   }
 
/* Wait until any existing MII operation is complete */
if (readl_poll_timeout(priv->ioaddr + mii_address, v, !(v & MII_BUSY),
@@ -228,7 +252,7 @@ static int stmmac_mdio_write(struct mii_bus *bus, int 
phyaddr, int phyreg,
return -EBUSY;
 
/* Set the MII address register to write */
-   writel(phydata, priv->ioaddr + mii_data);
+   writel(data, priv->ioaddr + mii_data);
writel(value, priv->ioaddr + mii_address);
 
/* Wait until any existing MII operation is complete */
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 073fb151b5a9..d3daac8ec686 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@
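
The Clause 45 handling added in stmmac_mdio_c45_setup() above splits a packed
register number into a device address, which goes into the MDIO address
register, and a 16-bit register address, which goes into the upper half of
the data register. Below is a minimal standalone sketch of that split. The
constant values are assumptions for illustration: MII_ADDR_C45,
MII_DEVADDR_C45_SHIFT and MII_REGADDR_C45_MASK are given the values I believe
the kernel headers use, and the GMAC4_* register-field layout is a
hypothetical stand-in for priv->hw->mii.reg_shift and reg_mask.

```c
#include <stdint.h>

/* Assumed values of the kernel's C45 helpers (not shown in the patch). */
#define MII_ADDR_C45          (1u << 30)
#define MII_DEVADDR_C45_SHIFT 16
#define MII_REGADDR_C45_MASK  0xffffu

/* Hypothetical GMAC4 field layout, standing in for priv->hw->mii.*. */
#define GMAC4_C45E            (1u << 1)
#define GMAC4_REG_SHIFT       16
#define GMAC4_REG_MASK        (0x1fu << GMAC4_REG_SHIFT)
#define GMAC4_REG_ADDR_SHIFT  16

/* Pack an MMD device address and register into one register number. */
static uint32_t c45_regnum(uint32_t devad, uint32_t reg)
{
	return MII_ADDR_C45 | (devad << MII_DEVADDR_C45_SHIFT) |
	       (reg & MII_REGADDR_C45_MASK);
}

/* Same steps as stmmac_mdio_c45_setup(), on plain integers. */
static void c45_setup(uint32_t phyreg, uint32_t *val, uint32_t *data)
{
	/* Flag the transaction as Clause 45. */
	*val |= GMAC4_C45E;

	/* Replace the C22 register field with the MMD device address. */
	*val &= ~GMAC4_REG_MASK;
	*val |= ((phyreg >> MII_DEVADDR_C45_SHIFT) << GMAC4_REG_SHIFT) &
		GMAC4_REG_MASK;

	/* The 16-bit register address travels in the data register. */
	*data |= (phyreg & MII_REGADDR_C45_MASK) << GMAC4_REG_ADDR_SHIFT;
}
```

For a write, data starts out holding the 16-bit payload, so the register
address ends up in bits 31:16 and the payload stays in bits 15:0. That is
also why the read path masks the result with MII_DATA_MASK.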

[PATCH] falcon: pass valid pointer from ef4_enqueue_unwind.

2019-05-28 Thread Young Xiao

The bytes_compl and pkts_compl pointers passed to ef4_dequeue_buffer
cannot be NULL. Add a paranoid warning to check this condition and fix
the one case where they were NULL.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/net/ethernet/sfc/falcon/tx.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sfc/falcon/tx.c 
b/drivers/net/ethernet/sfc/falcon/tx.c
index c5059f4..ed89bc6 100644
--- a/drivers/net/ethernet/sfc/falcon/tx.c
+++ b/drivers/net/ethernet/sfc/falcon/tx.c
@@ -69,6 +69,7 @@ static void ef4_dequeue_buffer(struct ef4_tx_queue *tx_queue,
}
 
if (buffer->flags & EF4_TX_BUF_SKB) {
+   EF4_WARN_ON_PARANOID(!pkts_compl || !bytes_compl);
(*pkts_compl)++;
(*bytes_compl) += buffer->skb->len;
dev_consume_skb_any((struct sk_buff *)buffer->skb);
@@ -271,12 +272,14 @@ static int ef4_tx_map_data(struct ef4_tx_queue *tx_queue, 
struct sk_buff *skb)
 static void ef4_enqueue_unwind(struct ef4_tx_queue *tx_queue)
 {
struct ef4_tx_buffer *buffer;
+   unsigned int bytes_compl = 0;
+   unsigned int pkts_compl = 0;
 
/* Work backwards until we hit the original insert pointer value */
while (tx_queue->insert_count != tx_queue->write_count) {
--tx_queue->insert_count;
buffer = __ef4_tx_queue_get_insert_buffer(tx_queue);
-   ef4_dequeue_buffer(tx_queue, buffer, NULL, NULL);
+   ef4_dequeue_buffer(tx_queue, buffer, &pkts_compl, &bytes_compl);
}
 }
 
-- 
2.7.4
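
The fix above follows a common pattern: when a callee unconditionally
dereferences its output pointers, a caller that does not care about the
results still passes the addresses of throwaway locals instead of NULL.
Below is a minimal standalone sketch of that pattern. struct buf,
dequeue_buffer() and unwind() are hypothetical simplifications and not the
sfc/falcon driver types; unwind() returns the packet count only so that the
behaviour is observable.

```c
#include <stddef.h>

/* Hypothetical, simplified TX buffer. */
struct buf {
	int is_skb;       /* buffer carries a packet */
	unsigned int len; /* packet length in bytes */
};

/* The callee updates both counters without checking for NULL. */
static void dequeue_buffer(const struct buf *b, unsigned int *pkts_compl,
			   unsigned int *bytes_compl)
{
	if (b->is_skb) {
		(*pkts_compl)++;
		*bytes_compl += b->len;
	}
}

/* The caller ignores the byte count but still supplies valid storage. */
static unsigned int unwind(const struct buf *bufs, size_t n)
{
	unsigned int bytes_compl = 0;
	unsigned int pkts_compl = 0;

	/* Walk backwards, as the unwind path in the patch does. */
	while (n--)
		dequeue_buffer(&bufs[n], &pkts_compl, &bytes_compl);

	return pkts_compl;
}
```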

Re: [PATCH net-next 1/5] timecounter: Add helper for reconstructing partial timestamps

2019-05-28 Thread John Stultz

On Tue, May 28, 2019 at 4:58 PM Vladimir Oltean  wrote:
>
> Some PTP hardware offers a 64-bit free-running counter whose snapshots
> are used for timestamping, but only makes part of that snapshot
> available as timestamps (low-order bits).
>
> In that case, timecounter/cyclecounter users must bring the cyclecounter
> and timestamps to the same bit width, and they currently have two
> options of doing so:
>
> - Trim the higher bits of the timecounter itself to the number of bits
>   of the timestamps.  This might work for some setups, but if the
>   wraparound of the timecounter in this case becomes high (~10 times per
>   second) then this causes additional strain on the system, which must
>   read the clock that often just to avoid missing the wraparounds.
>
> - Reconstruct the timestamp by racing to read the PTP time within one
>   wraparound cycle since the timestamp was generated.  This is
>   preferable when the wraparound time is small (do a time-critical
>   readout once vs doing it periodically), and it has no drawback even
>   when the wraparound is comfortably sized.
>
> Signed-off-by: Vladimir Oltean 
> ---
>  include/linux/timecounter.h |  7 +++
>  kernel/time/timecounter.c   | 33 +
>  2 files changed, 40 insertions(+)
>
> diff --git a/include/linux/timecounter.h b/include/linux/timecounter.h
> index 2496ad4cfc99..03eab1f3bb9c 100644
> --- a/include/linux/timecounter.h
> +++ b/include/linux/timecounter.h
> @@ -30,6 +30,9 @@
>   * by the implementor and user of specific instances of this API.
>   *
>   * @read:  returns the current cycle value
> + * @partial_tstamp_mask:bitmask in case the hardware emits timestamps
> + * which only capture low-order bits of the full
> + * counter, and should be reconstructed.
>   * @mask:  bitmask for two's complement
>   * subtraction of non 64 bit counters,
>   * see CYCLECOUNTER_MASK() helper macro
> @@ -38,6 +41,7 @@
>   */
>  struct cyclecounter {
> u64 (*read)(const struct cyclecounter *cc);
> +   u64 partial_tstamp_mask;
> u64 mask;
> u32 mult;
> u32 shift;
> @@ -136,4 +140,7 @@ extern u64 timecounter_read(struct timecounter *tc);
>  extern u64 timecounter_cyc2time(struct timecounter *tc,
> u64 cycle_tstamp);
>
> +extern u64 cyclecounter_reconstruct(const struct cyclecounter *cc,
> +   u64 ts_partial);
> +
>  #endif
> diff --git a/kernel/time/timecounter.c b/kernel/time/timecounter.c
> index 85b98e727306..d4657d64e38d 100644
> --- a/kernel/time/timecounter.c
> +++ b/kernel/time/timecounter.c
> @@ -97,3 +97,36 @@ u64 timecounter_cyc2time(struct timecounter *tc,
> return nsec;
>  }
>  EXPORT_SYMBOL_GPL(timecounter_cyc2time);
> +
> +/**
> + * cyclecounter_reconstruct - reconstructs @ts_partial
> + * @cc:Pointer to cycle counter.
> + * @ts_partial:Typically RX or TX NIC timestamp, provided by 
> hardware as
> + * the lower @partial_tstamp_mask bits of the cycle counter,
> + * sampled at the time the timestamp was collected.
> + * To reconstruct into a full @mask bit-wide timestamp, the
> + * cycle counter is read and the high-order bits (up to @mask) 
> are
> + * filled in.
> + * Must be called within one wraparound of @partial_tstamp_mask
> + * bits of the cycle counter.
> + */
> +u64 cyclecounter_reconstruct(const struct cyclecounter *cc, u64 ts_partial)
> +{
> +   u64 ts_reconstructed;
> +   u64 cycle_now;
> +
> +   cycle_now = cc->read(cc);
> +
> +   ts_reconstructed = (cycle_now & ~cc->partial_tstamp_mask) |
> +   ts_partial;
> +
> +   /* Check lower bits of current cycle counter against the timestamp.
> +* If the current cycle counter is lower than the partial timestamp,
> +* then wraparound surely occurred and must be accounted for.
> +*/
> +   if ((cycle_now & cc->partial_tstamp_mask) <= ts_partial)
> +   ts_reconstructed -= (cc->partial_tstamp_mask + 1);
> +
> +   return ts_reconstructed;
> +}
> +EXPORT_SYMBOL_GPL(cyclecounter_reconstruct);

Hrm. Is this actually generic? Would it make more sense to have the
specific implementations with this quirk implement this in their
read() handler? If not, why?

thanks
-john
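
For reference, the reconstruction under discussion can be sketched on plain
integers. This is a standalone illustration, not the proposed kernel helper:
reconstruct_ts() is a hypothetical name and takes the current counter value
as a parameter instead of calling cc->read(). It uses a strict "<" to match
the wording of the quoted comment ("lower than"), whereas the quoted code
uses "<=", which also treats equal low-order bits as a wraparound.

```c
#include <stdint.h>

/*
 * Rebuild a full timestamp from its low-order bits.
 *
 * cycle_now:    full counter value, read after the timestamp was taken
 * partial_mask: mask of the bits the hardware reports (e.g. 0xffff)
 * ts_partial:   the low-order bits captured by the hardware
 *
 * Only valid if fewer than one wraparound of the partial bits elapsed
 * between the timestamp and the read of cycle_now.
 */
static uint64_t reconstruct_ts(uint64_t cycle_now, uint64_t partial_mask,
			       uint64_t ts_partial)
{
	/* Borrow the high-order bits from the current counter value. */
	uint64_t ts = (cycle_now & ~partial_mask) | ts_partial;

	/*
	 * If the low bits of the counter are now below the captured low
	 * bits, they wrapped since the timestamp, so the borrowed
	 * high-order bits are one period too far ahead.
	 */
	if ((cycle_now & partial_mask) < ts_partial)
		ts -= partial_mask + 1;

	return ts;
}
```

With a 16-bit partial timestamp and a current counter of 0x12345, a captured
value of 0x2000 reconstructs to 0x12000, while a captured value of 0x3000
must predate the last wrap and reconstructs to 0x03000.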

Re: [PATCH v2 1/3] KVM: x86: add support for user wait instructions

2019-05-28 Thread Tao Xu




On 29/05/2019 09:24, Paolo Bonzini wrote:

On 24/05/19 09:56, Tao Xu wrote:

+7.19 KVM_CAP_ENABLE_USR_WAIT_PAUSE
+
+Architectures: x86
+Parameters: args[0] whether feature should be enabled or not
+
+With this capability enabled, a VM can use UMONITOR, UMWAIT and TPAUSE
+instructions. If the instruction causes a delay, the amount of
+time delayed is called here the physical delay. The physical delay is
+first computed by determining the virtual delay (the time to delay
+relative to the VM’s timestamp counter). Otherwise, UMONITOR, UMWAIT
+and TPAUSE cause an invalid-opcode exception(#UD).
+


There is no need to make it a capability.  You can just check the guest
CPUID and see if it includes X86_FEATURE_WAITPKG.

Paolo



Thank you Paolo, but I have another question. Is it appropriate to
enable X86_FEATURE_WAITPKG when QEMU uses "-overcommit cpu-pm=on"?
Or should X86_FEATURE_WAITPKG be enabled only when QEMU adds the
feature with "-cpu host,+waitpkg"? The user wait instructions are wait
or pause instructions that may be executed at any privilege level, but
IA32_UMWAIT_CONTROL can be used to set the maximum wait time.
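
As a small illustration of the CPUID check Paolo suggests: WAITPKG is
enumerated in CPUID leaf 7, sub-leaf 0, ECX bit 5. The sketch below only
tests that bit on a value the caller already holds. guest_has_waitpkg() and
the macro name are hypothetical and this is not the KVM code path; how the
guest's CPUID entry is looked up is left out on purpose.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):ECX bit 5 enumerates UMONITOR/UMWAIT/TPAUSE. */
#define CPUID7_ECX_WAITPKG (1u << 5)

/* Hypothetical helper: takes the ECX value of the guest's leaf 7 entry. */
static bool guest_has_waitpkg(uint32_t cpuid7_ecx)
{
	return (cpuid7_ecx & CPUID7_ECX_WAITPKG) != 0;
}
```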

Re: [PATCH 2/2] Revert "mm, thp: restore node-local hugepage allocations"

2019-05-28 Thread David Rientjes

On Fri, 24 May 2019, Andrea Arcangeli wrote:

> > > We are going in circles, *yes* there is a problem for potential swap 
> > > storms today because of the poor interaction between memory compaction 
> > > and 
> > > directed reclaim but this is a result of a poor API that does not allow 
> > > userspace to specify that its workload really will span multiple sockets 
> > > so faulting remotely is the best course of action.  The fix is not to 
> > > cause regressions for others who have implemented a userspace stack that 
> > > is based on the past 3+ years of long standing behavior or for 
> > > specialized 
> > > workloads where it is known that it spans multiple sockets so we want 
> > > some 
> > > kind of different behavior.  We need to provide a clear and stable API to 
> > > define these terms for the page allocator that is independent of any 
> > > global setting of thp enabled, defrag, zone_reclaim_mode, etc.  It's 
> > > workload dependent.
> > 
> > um, who is going to do this work?
> 
> That's a good question. It's going to be a not simple patch to
> backport to -stable: it'll be intrusive and it will affect
> mm/page_alloc.c significantly so it'll reject heavy. I wouldn't
> consider it -stable material at least in the short term, it will
> require some testing.
> 

Hi Andrea,

I'm not sure what patch you're referring to, unfortunately.  The above 
comment was referring to APIs that are made available to userspace to 
define when to fault locally vs remotely and what the preference should be 
for any form of compaction or reclaim to achieve that.  Today we have 
global enabling options, global defrag settings, enabling prctls, and 
madvise options.  The point it makes is that whether a specific workload 
fits into a single socket is workload dependant and thus we are left with 
prctls and madvise options.  The prctl either enables thp or it doesn't, 
it is not interesting here; the madvise is overloaded in four different 
ways (enabling, stalling at fault, collapsability, defrag) so it's not 
surprising that continuing to overload it for existing users will cause 
undesired results.  It makes an argument that we need a clear and stable 
means of defining the behavior, not changing the 4+ year behavior and 
giving those who regress no workaround.

> This is why applying a simple fix that avoids the swap storms (and the
> swap-less pathological THP regression for vfio device assignment GUP
> pinning) is preferable before adding an alloc_pages_multi_order (or
> equivalent) so that it'll be the allocator that will decide when
> exactly to fallback from 2M to 4k depending on the NUMA distance and
> memory availability during the zonelist walk. The basic idea is to
> call alloc_pages just once (not first for 2M and then for 4k) and
> alloc_pages will decide which page "order" to return.
> 

The commit description doesn't mention the swap storms that you're trying 
to fix, it's probably better to describe that again and why it is not 
beneficial to swap unless an entire pageblock can become free or memory 
compaction has indicated that additional memory freeing would allow 
migration to make an entire pageblock free.  I understand that's an 
invasive code change, but merging this patch changes the 4+ year behavior 
that started here:

commit 077fcf116c8c2bd7ee9487b645aa3b50368db7e1
Author: Aneesh Kumar K.V 
Date:   Wed Feb 11 15:27:12 2015 -0800

mm/thp: allocate transparent hugepages on local node

And that commit's description describes quite well the regression that we 
encounter if we remove __GFP_THISNODE here.  That's because the access 
latency regression is much more substantial than what was reported for 
Naples in your changelog.

In the interest of making forward progress, can we agree that swapping 
from the local node *never* makes sense unless we can show that an entire 
pageblock can become free or that it enables memory compaction to migrate 
memory that can make an entire pageblock free?  Are you reporting swap 
storms for the local node when one of these is true?

> > Implementing a new API doesn't help existing userspace which is hurting
> > from the problem which this patch addresses.
> 
> Yes, we can't change all apps that may not fit in a single NUMA
> node. Currently it's unsafe to turn "transparent_hugepages/defrag =
> always" or the bad behavior can then materialize also outside of
> MADV_HUGEPAGE. Those apps that use MADV_HUGEPAGE on their long lived
> allocations (i.e. guest physical memory) like qemu are affected even
> with the default "defrag = madvise". Those apps are using
> MADV_HUGEPAGE for more than 3 years and they are widely used and open
> source of course.
> 

I continue to reiterate that the 4+ year long standing behavior of 
MADV_HUGEPAGE is overloaded; you are anticipating a specific behavior for 
workloads that do not fit in a single NUMA node whereas other users 
developed in the past four years are anticipating a different behavior.  
I'm trying to propose

Re: [PATCH v2 2/7] drivers/soc: Add Aspeed XDMA Engine Driver

2019-05-28 Thread Andrew Jeffery




On Sat, 25 May 2019, at 01:39, Eddie James wrote:
> 
> On 5/21/19 7:02 AM, Arnd Bergmann wrote:
> > On Mon, May 20, 2019 at 10:19 PM Eddie James  wrote:
> >> diff --git a/include/uapi/linux/aspeed-xdma.h 
> >> b/include/uapi/linux/aspeed-xdma.h
> >> new file mode 100644
> >> index 000..2a4bd13
> >> --- /dev/null
> >> +++ b/include/uapi/linux/aspeed-xdma.h
> >> @@ -0,0 +1,26 @@
> >> +/* SPDX-License-Identifier: GPL-2.0+ */
> >> +/* Copyright IBM Corp 2019 */
> >> +
> >> +#ifndef _UAPI_LINUX_ASPEED_XDMA_H_
> >> +#define _UAPI_LINUX_ASPEED_XDMA_H_
> >> +
> >> +#include 
> >> +
> >> +/*
> >> + * aspeed_xdma_op
> >> + *
> >> + * upstream: boolean indicating the direction of the DMA operation; 
> >> upstream
> >> + *   means a transfer from the BMC to the host
> >> + *
> >> + * host_addr: the DMA address on the host side, typically configured by 
> >> PCI
> >> + *subsystem
> >> + *
> >> + * len: the size of the transfer in bytes; it should be a multiple of 16 
> >> bytes
> >> + */
> >> +struct aspeed_xdma_op {
> >> +   __u32 upstream;
> >> +   __u64 host_addr;
> >> +   __u32 len;
> >> +};
> >> +
> >> +#endif /* _UAPI_LINUX_ASPEED_XDMA_H_ */
> > If this is a user space interface, please remove the holes in the
> > data structure.
> 
> 
> Surely it's 4-byte aligned and there won't be holes??

__u64 is 8-byte aligned, so you have a hole after upstream.

Easiest just to put upstream after len?

Andrew
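
To make the padding concrete, here is a small userspace sketch (not part
of the patch). It mirrors the posted field order with <stdint.h> types and
compares it with a reordered variant along the lines suggested above. The
struct and function names are made up for illustration, and the exact
padding depends on the ABI: with 8-byte alignment for 64-bit integers the
posted order gains 8 bytes of padding, while some 32-bit ABIs add none,
which is itself a reason to avoid the implicit hole in a uapi struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field order as posted. Where uint64_t is 8-byte aligned, the compiler
 * inserts 4 bytes after 'upstream' and 4 more after 'len'. */
struct xdma_op_holed {
	uint32_t upstream;
	uint64_t host_addr;
	uint32_t len;
};

/* Reordered (hypothetical): the 64-bit member first, then both 32-bit
 * members, so no implicit padding is needed. */
struct xdma_op_packed {
	uint64_t host_addr;
	uint32_t len;
	uint32_t upstream;
};

static_assert(offsetof(struct xdma_op_packed, len) == 8, "len follows host_addr");
static_assert(offsetof(struct xdma_op_packed, upstream) == 12, "upstream follows len");
static_assert(sizeof(struct xdma_op_packed) == 16, "no padding in reordered layout");

/* Bytes of padding in each layout: total size minus the sum of the members. */
static size_t holed_padding(void)
{
	return sizeof(struct xdma_op_holed) -
	       (2 * sizeof(uint32_t) + sizeof(uint64_t));
}

static size_t packed_padding(void)
{
	return sizeof(struct xdma_op_packed) -
	       (2 * sizeof(uint32_t) + sizeof(uint64_t));
}
```

The reordered layout is the same size everywhere the static assertions
compile; the posted layout is not, which is the portability problem for a
user space interface.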

[PATCH] intel_menlow: avoid null pointer deference error

2019-05-28 Thread Young Xiao

Fix a null pointer dereference in acpi_driver_data() when device is
null (dereference before check). We should only set cdev, and check
that it is valid, after we are sure device is not null.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/platform/x86/intel_menlow.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/platform/x86/intel_menlow.c 
b/drivers/platform/x86/intel_menlow.c
index 77eb870..28feb5c 100644
--- a/drivers/platform/x86/intel_menlow.c
+++ b/drivers/platform/x86/intel_menlow.c
@@ -180,9 +180,13 @@ static int intel_menlow_memory_add(struct acpi_device 
*device)
 
 static int intel_menlow_memory_remove(struct acpi_device *device)
 {
-   struct thermal_cooling_device *cdev = acpi_driver_data(device);
+   struct thermal_cooling_device *cdev;
+
+   if (!device)
+   return -EINVAL;
 
-   if (!device || !cdev)
+   cdev = acpi_driver_data(device);
+   if (!cdev)
return -EINVAL;
 
sysfs_remove_link(&device->dev.kobj, "thermal_cooling");
-- 
2.7.4
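
The ordering issue can be reproduced in a userspace sketch. Everything
below is a stand-in: struct fake_device, fake_driver_data() and
fake_remove() are hypothetical names, and the accessor is assumed to read
a member of the device the way acpi_driver_data() is understood to read
device->driver_data. The point is only that the accessor must not run
until the pointer has been checked.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct acpi_device. */
struct fake_device {
	void *driver_data;
};

/* Stand-in accessor: it reads a member of 'd', so 'd' must be non-NULL. */
static void *fake_driver_data(struct fake_device *d)
{
	return d->driver_data;
}

/* Check the device first, and only then fetch and check its driver data. */
static int fake_remove(struct fake_device *device)
{
	void *cdev;

	if (!device)
		return -EINVAL;

	cdev = fake_driver_data(device);
	if (!cdev)
		return -EINVAL;

	return 0;
}
```

Calling fake_remove(NULL) returns -EINVAL without touching the pointer;
with the accessor in the declaration initializer, the same call would
dereference NULL before the check runs.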

[PATCH] wcd9335: fix a incorrect use of kstrndup()

2019-05-28 Thread Gen Zhang

In wcd9335_codec_enable_dec(), 'widget_name' is allocated by kstrndup().
However, according to doc: "Note: Use kmemdup_nul() instead if the size
is known exactly." So we should use kmemdup_nul() here instead of
kstrndup().

Signed-off-by: Gen Zhang 
---
diff --git a/sound/soc/codecs/wcd9335.c b/sound/soc/codecs/wcd9335.c
index a04a7ce..85737fe 100644
--- a/sound/soc/codecs/wcd9335.c
+++ b/sound/soc/codecs/wcd9335.c
@@ -2734,7 +2734,7 @@ static int wcd9335_codec_enable_dec(struct 
snd_soc_dapm_widget *w,
char *dec;
u8 hpf_coff_freq;
 
-   widget_name = kstrndup(w->name, 15, GFP_KERNEL);
+   widget_name = kmemdup_nul(w->name, 15, GFP_KERNEL);
if (!widget_name)
return -ENOMEM;
 
---
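
For readers comparing the two helpers, a userspace sketch of their
documented behaviour follows. sketch_strndup() and sketch_memdup_nul()
are illustrative re-implementations, not the kernel functions. The
difference that matters: the first stops at a NUL inside the source, the
second always reads exactly 'len' bytes, so it is only safe when the
source is known to be at least that long.

```c
#include <stdlib.h>
#include <string.h>

/* Like kstrndup(): copy at most 'max' bytes, stopping early at a NUL. */
static char *sketch_strndup(const char *s, size_t max)
{
	size_t len = 0;
	char *buf;

	while (len < max && s[len] != '\0')
		len++;

	buf = malloc(len + 1);
	if (!buf)
		return NULL;
	memcpy(buf, s, len);
	buf[len] = '\0';
	return buf;
}

/* Like kmemdup_nul(): copy exactly 'len' bytes and append a NUL.
 * The caller must guarantee 's' has at least 'len' readable bytes. */
static char *sketch_memdup_nul(const char *s, size_t len)
{
	char *buf = malloc(len + 1);

	if (!buf)
		return NULL;
	memcpy(buf, s, len);
	buf[len] = '\0';
	return buf;
}
```

Whether the exact-length variant is appropriate in a given caller depends
on whether every source string is guaranteed to hold at least that many
bytes; the sketch does not settle that question for w->name.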

Re: [PATCH RESEND 2/7] csky: entry: Remove unneeded need_resched() loop

2019-05-28 Thread Guo Ren

Thx Valentin,

You are right, Approved.

Best Regards
 Guo Ren

On Tue, May 28, 2019 at 11:48:43AM +0100, Valentin Schneider wrote:
> Since the enabling and disabling of IRQs within preempt_schedule_irq()
> is contained in a need_resched() loop, we don't need the outer arch
> code loop.
> 
> Signed-off-by: Valentin Schneider 
> Cc: Guo Ren 
> ---
>  arch/csky/kernel/entry.S | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/arch/csky/kernel/entry.S b/arch/csky/kernel/entry.S
> index a7e84bd8..679afbcc2001 100644
> --- a/arch/csky/kernel/entry.S
> +++ b/arch/csky/kernel/entry.S
> @@ -292,11 +292,7 @@ ENTRY(csky_irq)
>   ldw r8, (r9, TINFO_FLAGS)
>   btsti   r8, TIF_NEED_RESCHED
>   bf  2f
> -1:
>   jbsrpreempt_schedule_irq/* irq en/disable is done inside */
> - ldw r7, (r9, TINFO_FLAGS)   /* get new tasks TI_FLAGS */
> - btsti   r7, TIF_NEED_RESCHED
> - bt  1b  /* go again */
>  #endif
>  2:
>   jmpiret_from_exception
> -- 
> 2.20.1
>

Re: [PATCH] perf: Fix oops when kthread execs user process

2019-05-28 Thread Michael Ellerman

Peter Zijlstra  writes:
> On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote:
>> When a kthread calls call_usermodehelper() the steps are:
>>   1. allocate current->mm
>>   2. load_elf_binary()
>>   3. populate current->thread.regs
>> 
>> While doing this, interrupts are not disabled. If there is a perf
>> interrupt in the middle of this process (i.e. step 1 has completed
>> but not yet reached to step 3) and if perf tries to read userspace
>> regs, kernel oops.
>> 
>> Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace
>> pt_regs are not set.
>> 
>> See commit bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs
>> user process") for details.
>
> Why the hell do we set current->mm before it is complete? Note that
> normally exec() builds the new mm before attaching it, see exec_mmap()
> in flush_old_exec().
>
> Also, why did those PPC folks 'fix' this in isolation? And why didn't
> you Cc them?

We just assumed it was our bug, 'cause we have plenty of those :)

cheers

[v4, PATCH] add some features in stmmac

2019-05-28 Thread Biao Huang

Changes in v4:
retain the reverse xmas tree ordering.

Changes in v3:
rewrite the patch based on the series in
https://patchwork.ozlabs.org/project/netdev/list/?series=109699

Changes in v2:
1. reverse Christmas tree order in dwmac4_set_filter.
2. remove clause 45 patch, waiting for cl45 patch from Boon Leong

v1:
This series adds some features to the stmmac driver.
1. add support for hash table size 128/256
2. add mdio clause 45 access from mac device for dwmac4.

Biao Huang (1): 
  net: stmmac: add support for hash table size 128/256 in dwmac4

 drivers/net/ethernet/stmicro/stmmac/common.h  |7 +--   
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h  |4 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c |   49 - 
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c  |1 + 
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |4 ++
 5 files changed, 40 insertions(+), 25 deletions(-) 

--  
1.7.9.5

[v4, PATCH] net: stmmac: add support for hash table size 128/256 in dwmac4

2019-05-28 Thread Biao Huang

1. get the hash table size from the hw feature register, and add support
for larger hash tables (128/256) in dwmac4.
2. only clear the GMAC_PACKET_FILTER bits used in this function,
to avoid side effects on the other bits.
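
As a rough illustration of the register arithmetic, assuming the table is
stored as consecutive 32-bit registers starting at offset 0x10: a table of
64, 128 or 256 bins needs 2, 4 or 8 registers, and a filter bit is picked
from the upper bits of a 32-bit hash. The names below are hypothetical,
and how the driver derives the hash value is not shown here; 'hash' is
simply taken as an input, and 'bits_log2' is assumed to be 6, 7 or 8.

```c
#include <stdint.h>

/* Byte offset of 32-bit hash table register 'x'. Same arithmetic as
 * GMAC_HASH_TAB(x), with the argument parenthesized. */
#define SKETCH_HASH_TAB(x)	(0x10 + (x) * 4)

/* Number of 32-bit registers needed for a table of 'bins' bits. */
static unsigned int sketch_num_hash_regs(unsigned int bins)
{
	return bins >> 5;
}

/* Set the filter bit selected by the upper 'bits_log2' bits of 'hash'.
 * 'mc_filter' must have room for (1 << bits_log2) / 32 words. */
static void sketch_set_hash_bit(uint32_t *mc_filter, uint32_t hash,
				unsigned int bits_log2)
{
	uint32_t bit_nr = hash >> (32 - bits_log2);

	/* Upper bits of bit_nr pick the register, low 5 bits pick the bit. */
	mc_filter[bit_nr >> 5] |= UINT32_C(1) << (bit_nr & 0x1f);
}
```

With a 256-bin table, the largest bit number is 255, which lands in bit 31
of the eighth register at offset 0x2c.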

Signed-off-by: Biao Huang 
---
 drivers/net/ethernet/stmicro/stmmac/common.h  |7 +--
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h  |4 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c |   49 -
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c  |1 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |4 ++
 5 files changed, 40 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
b/drivers/net/ethernet/stmicro/stmmac/common.h
index 1961fe9..26bbcd8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -335,6 +335,7 @@ struct dma_features {
/* 802.3az - Energy-Efficient Ethernet (EEE) */
unsigned int eee;
unsigned int av;
+   unsigned int hash_tb_sz;
unsigned int tsoen;
/* TX and RX csum */
unsigned int tx_coe;
@@ -428,9 +429,9 @@ struct mac_device_info {
struct mii_regs mii;/* MII register Addresses */
struct mac_link link;
void __iomem *pcsr; /* vpointer to device CSRs */
-   int multicast_filter_bins;
-   int unicast_filter_entries;
-   int mcast_bits_log2;
+   unsigned int multicast_filter_bins;
+   unsigned int unicast_filter_entries;
+   unsigned int mcast_bits_log2;
unsigned int rx_csum;
unsigned int pcs;
unsigned int pmt;
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index 01c1089..a37e09b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -18,8 +18,7 @@
 /*  MAC registers */
 #define GMAC_CONFIG0x
 #define GMAC_PACKET_FILTER 0x0008
-#define GMAC_HASH_TAB_0_31 0x0010
-#define GMAC_HASH_TAB_32_630x0014
+#define GMAC_HASH_TAB(x)   (0x10 + x * 4)
 #define GMAC_RX_FLOW_CTRL  0x0090
 #define GMAC_QX_TX_FLOW_CTRL(x)(0x70 + x * 4)
 #define GMAC_TXQ_PRTY_MAP0 0x98
@@ -184,6 +183,7 @@ enum power_event {
 #define GMAC_HW_FEAT_MIISELBIT(0)
 
 /* MAC HW features1 bitmap */
+#define GMAC_HW_HASH_TB_SZ GENMASK(25, 24)
 #define GMAC_HW_FEAT_AVSEL BIT(20)
 #define GMAC_HW_TSOEN  BIT(18)
 #define GMAC_HW_TXFIFOSIZE GENMASK(10, 6)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index 5e98da4..2544cff 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -403,41 +403,50 @@ static void dwmac4_set_filter(struct mac_device_info *hw,
  struct net_device *dev)
 {
void __iomem *ioaddr = (void __iomem *)dev->base_addr;
-   unsigned int value = 0;
+   int numhashregs = (hw->multicast_filter_bins >> 5);
+   int mcbitslog2 = hw->mcast_bits_log2;
+   unsigned int value;
+   int i;
 
+   value = readl(ioaddr + GMAC_PACKET_FILTER);
+   value &= ~GMAC_PACKET_FILTER_HMC;
+   value &= ~GMAC_PACKET_FILTER_HPF;
+   value &= ~GMAC_PACKET_FILTER_PCF;
+   value &= ~GMAC_PACKET_FILTER_PM;
+   value &= ~GMAC_PACKET_FILTER_PR;
if (dev->flags & IFF_PROMISC) {
value = GMAC_PACKET_FILTER_PR | GMAC_PACKET_FILTER_PCF;
} else if ((dev->flags & IFF_ALLMULTI) ||
-   (netdev_mc_count(dev) > HASH_TABLE_SIZE)) {
+  (netdev_mc_count(dev) > hw->multicast_filter_bins)) {
/* Pass all multi */
-   value = GMAC_PACKET_FILTER_PM;
-   /* Set the 64 bits of the HASH tab. To be updated if taller
-* hash table is used
-*/
-   writel(0x, ioaddr + GMAC_HASH_TAB_0_31);
-   writel(0x, ioaddr + GMAC_HASH_TAB_32_63);
+   value |= GMAC_PACKET_FILTER_PM;
+   /* Set all the bits of the HASH tab */
+   for (i = 0; i < numhashregs; i++)
+   writel(0x, ioaddr + GMAC_HASH_TAB(i));
} else if (!netdev_mc_empty(dev)) {
-   u32 mc_filter[2];
+   u32 mc_filter[8];
struct netdev_hw_addr *ha;
 
/* Hash filter for multicast */
-   value = GMAC_PACKET_FILTER_HMC;
+   value |= GMAC_PACKET_FILTER_HMC;
 
memset(mc_filter, 0, sizeof(mc_filter));
netdev_for_each_mc_addr(ha, dev) {
-   /* The upper 6 bits of the calculated CRC are used to
-* index the content of the Hash Table Reg 0 and

[PATCH] wd719x: pass GFP_ATOMIC instead of GFP_KERNEL

2019-05-28 Thread Hariprasad Kelam

wd719x_chip_init() is called with interrupts disabled (under
spin_lock_irqsave()), so we need to use GFP_ATOMIC instead
of GFP_KERNEL.

Issue identified by coccicheck.

Signed-off-by: Hariprasad Kelam 
---
 drivers/scsi/wd719x.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/wd719x.c b/drivers/scsi/wd719x.c
index c2f4006..f300fd7 100644
--- a/drivers/scsi/wd719x.c
+++ b/drivers/scsi/wd719x.c
@@ -319,7 +319,7 @@ static int wd719x_chip_init(struct wd719x *wd)
 
if (!wd->fw_virt)
wd->fw_virt = dma_alloc_coherent(&wd->pdev->dev, wd->fw_size,
-&wd->fw_phys, GFP_KERNEL);
+&wd->fw_phys, GFP_ATOMIC);
if (!wd->fw_virt) {
ret = -ENOMEM;
goto wd719x_init_end;
-- 
2.7.4

RE: [EXT] Re: Issue: regmap: use debugfs even when no device

2019-05-28 Thread Andy Duan

From: Mark Brown  Sent: Tuesday, May 28, 2019 9:27 PM
> On Tue, May 28, 2019 at 02:20:15AM +, Andy Duan wrote:
> 
> > So on i.MX8MM/8QM/8QXP platforms, we catch the issue that user dump
> > regmap registers without power cause system hang.
> > Maybe revert the patch is more reasonable ?
> 
> This is an issue with or without a device - you can have the same issue with
> devices that are powered off.  Typically where power is dynamic the driver
> will use a register cache so the registers are always available.

Correct, regmap without a device also has the issue when power is off, because
regmap doesn't implement runtime PM for the device, though the device driver
may implement it.

So how should regmap manage the clock and power when registers are accessed
through debugfs?

Andy

[PATCH] dm-init: fix 2 incorrect use of kstrndup()

2019-05-28 Thread Gen Zhang

In drivers/md/dm-init.c, kstrndup() is called twice with the size and gfp
arguments swapped.

The prototype is: char *kstrndup(const char *s, size_t max, gfp_t gfp);

Signed-off-by: Gen Zhang 
---
diff --git a/drivers/md/dm-init.c b/drivers/md/dm-init.c
index 352e803..526e261 100644
--- a/drivers/md/dm-init.c
+++ b/drivers/md/dm-init.c
@@ -140,8 +140,8 @@ static char __init *dm_parse_table_entry(struct dm_device 
*dev, char *str)
return ERR_PTR(-EINVAL);
}
/* target_args */
-   dev->target_args_array[n] = kstrndup(field[3], GFP_KERNEL,
-DM_MAX_STR_SIZE);
+   dev->target_args_array[n] = kstrndup(field[3], DM_MAX_STR_SIZE,
+   GFP_KERNEL);
if (!dev->target_args_array[n])
return ERR_PTR(-ENOMEM);
 
@@ -275,7 +275,7 @@ static int __init dm_init_init(void)
DMERR("Argument is too big. Limit is %d\n", DM_MAX_STR_SIZE);
return -EINVAL;
}
-   str = kstrndup(create, GFP_KERNEL, DM_MAX_STR_SIZE);
+   str = kstrndup(create, DM_MAX_STR_SIZE, GFP_KERNEL);
if (!str)
return -ENOMEM;
 
---

Re: [PATCH -next] EDAC: aspeed: Remove set but not used variable 'np'

2019-05-28 Thread Andrew Jeffery




On Sun, 26 May 2019, at 00:12, YueHaibing wrote:
> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> drivers/edac/aspeed_edac.c: In function aspeed_probe:
> drivers/edac/aspeed_edac.c:284:22: warning: variable np set but not 
> used [-Wunused-but-set-variable]
> 
> It is never used and can be removed.
> 
> Signed-off-by: YueHaibing 

Reviewed-by: Andrew Jeffery 

> ---
>  drivers/edac/aspeed_edac.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/edac/aspeed_edac.c b/drivers/edac/aspeed_edac.c
> index 11833c0a5d07..5634437bb39d 100644
> --- a/drivers/edac/aspeed_edac.c
> +++ b/drivers/edac/aspeed_edac.c
> @@ -281,15 +281,11 @@ static int aspeed_probe(struct platform_device *pdev)
>   struct device *dev = &pdev->dev;
>   struct edac_mc_layer layers[2];
>   struct mem_ctl_info *mci;
> - struct device_node *np;
>   struct resource *res;
>   void __iomem *regs;
>   u32 reg04;
>   int rc;
>  
> - /* setup regmap */
> - np = dev->of_node;
> -
>   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>   if (!res)
>   return -ENOENT;
> -- 
> 2.17.1
> 
> 
>

Re: [RFC PATCH 0/3] Make deferred split shrinker memcg aware

2019-05-28 Thread David Rientjes

On Tue, 28 May 2019, Yang Shi wrote:

> 
> I got some reports from our internal application team about memcg OOM.
> Even though the application has been killed by oom killer, there are
> still a lot THPs reside, page reclaim doesn't reclaim them at all.
> 
> Some investigation shows they are on deferred split queue, memcg direct
> reclaim can't shrink them since THP deferred split shrinker is not memcg
> aware, this may cause premature OOM in memcg.  The issue can be
> reproduced easily by the below test:
> 

Right, we've also encountered this.  I talked to Kirill about it a week or 
so ago where the suggestion was to split all compound pages on the 
deferred split queues under the presence of even memory pressure.

That breaks cgroup isolation and perhaps unfairly penalizes workloads that 
are running attached to other memcg hierarchies that are not under 
pressure because their compound pages are now split as a side effect.  
There is a benefit to keeping these compound pages around while not under 
memory pressure if all pages are subsequently mapped again.

> $ cgcreate -g memory:thp
> $ echo 4G > /sys/fs/cgroup/memory/thp/memory.limit_in_bytes
> $ cgexec -g memory:thp ./transhuge-stress 4000
> 
> transhuge-stress comes from kernel selftest.
> 
> It is easy to hit OOM, but there are still a lot THP on the deferred split
> queue, memcg direct reclaim can't touch them since the deferred split
> shrinker is not memcg aware.
> 

Yes, we have seen this on at least 4.15 as well.

> Convert deferred split shrinker memcg aware by introducing per memcg deferred
> split queue.  The THP should be on either per node or per memcg deferred
> split queue if it belongs to a memcg.  When the page is immigrated to the
> other memcg, it will be immigrated to the target memcg's deferred split queue
> too.
> 
> And, move deleting THP from deferred split queue in page free before memcg
> uncharge so that the page's memcg information is available.
> 
> Reuse the second tail page's deferred_list for per memcg list since the same
> THP can't be on multiple deferred split queues at the same time.
> 
> Remove THP specific destructor since it is not used anymore with memcg aware
> THP shrinker (Please see the commit log of patch 2/3 for the details).
> 
> Make deferred split shrinker not depend on memcg kmem since it is not slab.
> It doesn't make sense to not shrink THP even though memcg kmem is disabled.
> 
> With the above change the test demonstrated above doesn't trigger OOM anymore
> even though with cgroup.memory=nokmem.
> 

I'm curious if your internal applications team is also asking for 
statistics on how much memory can be freed if the deferred split queues 
can be shrunk?  We have applications that monitor their own memory usage 
through memcg stats or usage and proactively try to reduce that usage when 
it is growing too large.  The deferred split queues have significantly 
increased both memcg usage and rss when they've upgraded kernels.

How are your applications monitoring how much memory from deferred split 
queues can be freed on memory pressure?  Any thoughts on providing it as a 
memcg stat?

Thanks!

[UPSTREAM KERNEL] mm/zsmalloc.c: Add module parameter malloc_force_movable

2019-05-28 Thread Hui Zhu

zswap compresses swap pages into a dynamically allocated RAM-based
memory pool.  The memory pool can be zbud, z3fold or zsmalloc.
All of them allocate unmovable pages, which increases the number
of unmovable page blocks and is bad for anti-fragmentation.

zsmalloc supports page migration if a movable page is requested:
handle = zs_malloc(zram->mem_pool, comp_len,
GFP_NOIO | __GFP_HIGHMEM |
__GFP_MOVABLE);

This commit adds the module parameter malloc_force_movable to enable
or disable forcing zs_malloc to allocate blocks with gfp
__GFP_HIGHMEM | __GFP_MOVABLE (disabled by default).

The following is a test log from a PC that has 8G memory and 2G swap.

When it is disabled:
~# echo lz4 > /sys/module/zswap/parameters/compressor
~# echo zsmalloc > /sys/module/zswap/parameters/zpool
~# echo 1 > /sys/module/zswap/parameters/enabled
~# swapon /swapfile
~# cd /home/teawater/kernel/vm-scalability/
/home/teawater/kernel/vm-scalability# export unit_size=$((9 * 1024 * 1024 * 1024))
/home/teawater/kernel/vm-scalability# ./case-anon-w-seq
2717908992 bytes / 4410183 usecs = 601836 KB/s
2717908992 bytes / 4524375 usecs = 586646 KB/s
2717908992 bytes / 4558583 usecs = 582244 KB/s
2717908992 bytes / 4824261 usecs = 550179 KB/s
348046 usecs to free memory
401680 usecs to free memory
369660 usecs to free memory
180867 usecs to free memory
/home/teawater/kernel/vm-scalability# cat /proc/pagetypeinfo
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order   0  1  2  3  4  
5  6  7  8  9 10
Node0, zone  DMA, typeUnmovable  1  1  1  0  2  
1  1  0  1  0  0
Node0, zone  DMA, type  Movable  0  0  0  0  0  
0  0  0  0  1  3
Node0, zone  DMA, type  Reclaimable  0  0  0  0  0  
0  0  0  0  0  0
Node0, zone  DMA, type   HighAtomic  0  0  0  0  0  
0  0  0  0  0  0
Node0, zone  DMA, type  CMA  0  0  0  0  0  
0  0  0  0  0  0
Node0, zone  DMA, type  Isolate  0  0  0  0  0  
0  0  0  0  0  0
Node0, zoneDMA32, typeUnmovable 13 11 10 11 10  
6  7  3  1  0  0
Node0, zoneDMA32, type  Movable 36 26 39 40 37  
   36 24 29 14  6767
Node0, zoneDMA32, type  Reclaimable  0  0  0  0  0  
0  0  0  0  0  1
Node0, zoneDMA32, type   HighAtomic  0  0  0  0  0  
0  0  0  0  0  0
Node0, zoneDMA32, type  CMA  0  0  0  0  0  
0  0  0  0  0  0
Node0, zoneDMA32, type  Isolate  0  0  0  0  0  
0  0  0  0  0  0
Node0, zone   Normal, typeUnmovable   7744   7519   6900   5964   4583  
 2878   1346448146  1  0
Node0, zone   Normal, type  Movable645   1930   1685   1339   1020  
  670363210106310399
Node0, zone   Normal, type  Reclaimable 53 70116 48 13  
0  0  0  0  0  0
Node0, zone   Normal, type   HighAtomic  0  0  0  0  0  
0  0  0  0  0  0
Node0, zone   Normal, type  CMA  0  0  0  0  0  
0  0  0  0  0  0
Node0, zone   Normal, type  Isolate  0  0  0  0  0  
0  0  0  0  0  0

Number of blocks type Unmovable  Movable  Reclaimable   HighAtomic  
CMA  Isolate
Node 0, zone  DMA1700   
 00
Node 0, zoneDMA324 165020   
 00
Node 0, zone   Normal  947 1469   150   
 00

When it is enabled:
~# echo 1 > /sys/module/zsmalloc/parameters/malloc_force_movable
~# echo lz4 > /sys/module/zswap/parameters/compressor
~# echo zsmalloc > /sys/module/zswap/parameters/zpool
~# echo 1 > /sys/module/zswap/parameters/enabled
~# swapon /swapfile
~# cd /home/teawater/kernel/vm-scalability/
/home/teawater/kernel/vm-scalability# export unit_size=$((9 * 1024 * 1024 * 1024))
/home/teawater/kernel/vm-scalability# ./case-anon-w-seq
2717908992 bytes / 4779235 usecs = 555362 KB/s
2717908992 bytes / 4856673 usecs = 546507 KB/s
2717908992 bytes / 4920079 usecs = 539464 KB/s
2717908992 bytes / 4935505 usecs = 537778 KB/s
354839 usecs to free memory
368167 usecs to free memory
355460 usecs to free memory
385452 usecs to free memory
/home/teawater/kernel/vm-scalability# cat /proc/pagetypeinfo

Re: [PATCH] ARM: dts: aspeed: g4: add video engine support

2019-05-28 Thread Andrew Jeffery




On Mon, 27 May 2019, at 20:58, Alexander Filippov wrote:
> Add a node to describe the video engine and VGA scratch registers on
> AST2400.
> 
> These changes were copied from aspeed-g5.dtsi
> 
> Signed-off-by: Alexander Filippov 

Ugh, I should really sort out the bmc-misc stuff, I don't like to see it
propagate in its current form. That's not your problem though, and I hope to
address it in the near future.

For the OpenBMC kernel tree:

Acked-by: Andrew Jeffery 

> ---
>  arch/arm/boot/dts/aspeed-g4.dtsi | 62 
>  1 file changed, 62 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/aspeed-g4.dtsi 
> b/arch/arm/boot/dts/aspeed-g4.dtsi
> index 6011692df15a..adc1804918df 100644
> --- a/arch/arm/boot/dts/aspeed-g4.dtsi
> +++ b/arch/arm/boot/dts/aspeed-g4.dtsi
> @@ -168,6 +168,10 @@
>   compatible = "aspeed,g4-pinctrl";
>   };
>  
> + vga_scratch: scratch {
> + compatible = "aspeed,bmc-misc";
> + };
> +
>   p2a: p2a-control {
>   compatible = "aspeed,ast2400-p2a-ctrl";
>   status = "disabled";
> @@ -195,6 +199,16 @@
>   reg = <0x1e72 0x8000>;  // 32K
>   };
>  
> + video: video@1e70 {
> + compatible = "aspeed,ast2400-video-engine";
> + reg = <0x1e70 0x1000>;
> + clocks = <&syscon ASPEED_CLK_GATE_VCLK>,
> +  <&syscon ASPEED_CLK_GATE_ECLK>;
> + clock-names = "vclk", "eclk";
> + interrupts = <7>;
> + status = "disabled";
> + };
> +
>   gpio: gpio@1e78 {
>   #gpio-cells = <2>;
>   gpio-controller;
> @@ -1408,6 +1422,54 @@
>   };
>  };
>  
> +&vga_scratch {
> + dac_mux {
> + offset = <0x2c>;
> + bit-mask = <0x3>;
> + bit-shift = <16>;
> + };
> + vga0 {
> + offset = <0x50>;
> + bit-mask = <0x>;
> + bit-shift = <0>;
> + };
> + vga1 {
> + offset = <0x54>;
> + bit-mask = <0x>;
> + bit-shift = <0>;
> + };
> + vga2 {
> + offset = <0x58>;
> + bit-mask = <0x>;
> + bit-shift = <0>;
> + };
> + vga3 {
> + offset = <0x5c>;
> + bit-mask = <0x>;
> + bit-shift = <0>;
> + };
> + vga4 {
> + offset = <0x60>;
> + bit-mask = <0x>;
> + bit-shift = <0>;
> + };
> + vga5 {
> + offset = <0x64>;
> + bit-mask = <0x>;
> + bit-shift = <0>;
> + };
> + vga6 {
> + offset = <0x68>;
> + bit-mask = <0x>;
> + bit-shift = <0>;
> + };
> + vga7 {
> + offset = <0x6c>;
> + bit-mask = <0x>;
> + bit-shift = <0>;
> + };
> +};
> +
>  &sio_regs {
>   sio_2b {
>   offset = <0xf0>;
> -- 
> 2.20.1
> 
>

[PATCH] signal/ptrace: Don't leak unitialized kernel memory with PTRACE_PEEK_SIGINFO

2019-05-28 Thread Eric W. Biederman



Recently syzbot in conjunction with KMSAN reported that
ptrace_peek_siginfo can copy an uninitialized siginfo to userspace.
Inspecting ptrace_peek_siginfo confirms this.

The problem is that off when initialized from args.off can be
initialized to a negative value.  At which point the "if (off >= 0)"
test to see if off became negative fails because off started off
negative.

Prevent the core problem by adding a variable found that is only true
if a siginfo is found and copied to a temporary in preparation for
being copied to userspace.

Prevent args.off from being truncated when being assigned to off by
testing that off is <= the maximum possible value of off.  Convert off
to an unsigned long so that we should not have to truncate args.off,
we have well defined overflow behavior so if we add another check we
won't risk fighting undefined compiler behavior, and so that we have a
type whose maximum value is easy to test for.

Cc: Andrei Vagin 
Cc: sta...@vger.kernel.org
Reported-by: syzbot+0d602a1b0d8c95bdf...@syzkaller.appspotmail.com
Fixes: 84c751bd4aeb ("ptrace: add ability to retrieve signals without removing 
from a queue (v4)")
Signed-off-by: "Eric W. Biederman" 
---

Comments?
Concerns?

Otherwise I will queue this up and send it to Linus.

 kernel/ptrace.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 6f357f4fc859..4c2b24a885d3 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -704,6 +704,10 @@ static int ptrace_peek_siginfo(struct task_struct *child,
if (arg.nr < 0)
return -EINVAL;
 
+   /* Ensure arg.off fits in an unsigned */
+   if (arg.off > ULONG_MAX)
+   return 0;
+
if (arg.flags & PTRACE_PEEKSIGINFO_SHARED)
pending = &child->signal->shared_pending;
else
@@ -711,18 +715,20 @@ static int ptrace_peek_siginfo(struct task_struct *child,
 
for (i = 0; i < arg.nr; ) {
kernel_siginfo_t info;
-   s32 off = arg.off + i;
+   unsigned long off = arg.off + i;
+   bool found = false;
 
spin_lock_irq(&child->sighand->siglock);
list_for_each_entry(q, &pending->list, list) {
if (!off--) {
+   found = true;
copy_siginfo(&info, &q->info);
break;
}
}
spin_unlock_irq(&child->sighand->siglock);
 
-   if (off >= 0) /* beyond the end of the list */
+   if (!found) /* beyond the end of the list */
break;
 
 #ifdef CONFIG_COMPAT
-- 
2.21.0.dirty
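
The failure mode can be modelled outside the kernel. The sketch below is
an assumption-laden simplification: the pending list is reduced to a
count, and old_would_copy() / new_would_copy() are hypothetical names that
only report whether the caller would go on to copy 'info' to userspace.
It shows that with a signed offset that starts negative, the old
"off >= 0" test cannot tell "found" from "walked off the end".

```c
#include <stdbool.h>
#include <stdint.h>

/* Old logic: signed 32-bit offset, "off >= 0" afterwards means the walk
 * went beyond the end of the list. Returns true when the copy proceeds. */
static bool old_would_copy(int32_t off, int list_len)
{
	for (int n = 0; n < list_len; n++) {
		if (!off--)
			break;		/* entry found, info filled in */
	}
	return !(off >= 0);
}

/* New logic: unsigned offset plus an explicit 'found' flag. */
static bool new_would_copy(unsigned long off, int list_len)
{
	bool found = false;

	for (int n = 0; n < list_len; n++) {
		if (!off--) {
			found = true;
			break;
		}
	}
	return found;
}
```

With a three-entry list, old_would_copy(-1, 3) reports that the copy
proceeds even though no entry was ever selected, which corresponds to the
uninitialized siginfo reaching userspace. new_would_copy() reports a copy
only when an entry was actually found.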

Re: [PATCH v2] qcom: apr: Make apr callbacks in non-atomic context

2019-05-28 Thread Bjorn Andersson

On Fri 08 Feb 09:55 PST 2019, Srinivas Kandagatla wrote:

> APR communication with DSP is not atomic in nature.
> It's request-response type. Trying to pretend that these are atomic
> and invoking apr client callbacks directly under atomic/irq context has
> endless issues with soundcard. It makes more sense to convert these
> to nonatomic calls. This also converts all the dais to be nonatomic.
> 
> All the callbacks are now invoked as part of rx work queue.
> 
> Signed-off-by: Srinivas Kandagatla 
> Reviewed-by: Bjorn Andersson 

Picked up

Thanks,
Bjorn

> ---
> Changes since v1:
>  - flush and destroy work queue after removing the device
>to avoid active communication from device. suggested by Bjorn.
> 
>  drivers/soc/qcom/apr.c | 74 +++---
>  1 file changed, 69 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/soc/qcom/apr.c b/drivers/soc/qcom/apr.c
> index 74f8b9607daa..039e3aa6f5e0 100644
> --- a/drivers/soc/qcom/apr.c
> +++ b/drivers/soc/qcom/apr.c
> @@ -8,6 +8,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -17,8 +18,18 @@ struct apr {
>   struct rpmsg_endpoint *ch;
>   struct device *dev;
>   spinlock_t svcs_lock;
> + spinlock_t rx_lock;
>   struct idr svcs_idr;
>   int dest_domain_id;
> + struct workqueue_struct *rxwq;
> + struct work_struct rx_work;
> + struct list_head rx_list;
> +};
> +
> +struct apr_rx_buf {
> + struct list_head node;
> + int len;
> + uint8_t buf[];
>  };
>  
>  /**
> @@ -62,11 +73,7 @@ static int apr_callback(struct rpmsg_device *rpdev, void 
> *buf,
> int len, void *priv, u32 addr)
>  {
>   struct apr *apr = dev_get_drvdata(&rpdev->dev);
> - uint16_t hdr_size, msg_type, ver, svc_id;
> - struct apr_device *svc = NULL;
> - struct apr_driver *adrv = NULL;
> - struct apr_resp_pkt resp;
> - struct apr_hdr *hdr;
> + struct apr_rx_buf *abuf;
>   unsigned long flags;
>  
>   if (len <= APR_HDR_SIZE) {
> @@ -75,6 +82,34 @@ static int apr_callback(struct rpmsg_device *rpdev, void 
> *buf,
>   return -EINVAL;
>   }
>  
> + abuf = kzalloc(sizeof(*abuf) + len, GFP_ATOMIC);
> + if (!abuf)
> + return -ENOMEM;
> +
> + abuf->len = len;
> + memcpy(abuf->buf, buf, len);
> +
> + spin_lock_irqsave(&apr->rx_lock, flags);
> + list_add_tail(&abuf->node, &apr->rx_list);
> + spin_unlock_irqrestore(&apr->rx_lock, flags);
> +
> + queue_work(apr->rxwq, &apr->rx_work);
> +
> + return 0;
> +}
> +
> +
> +static int apr_do_rx_callback(struct apr *apr, struct apr_rx_buf *abuf)
> +{
> + uint16_t hdr_size, msg_type, ver, svc_id;
> + struct apr_device *svc = NULL;
> + struct apr_driver *adrv = NULL;
> + struct apr_resp_pkt resp;
> + struct apr_hdr *hdr;
> + unsigned long flags;
> + void *buf = abuf->buf;
> + int len = abuf->len;
> +
>   hdr = buf;
>   ver = APR_HDR_FIELD_VER(hdr->hdr_field);
>   if (ver > APR_PKT_VER + 1)
> @@ -132,6 +167,23 @@ static int apr_callback(struct rpmsg_device *rpdev, void 
> *buf,
>   return 0;
>  }
>  
> +static void apr_rxwq(struct work_struct *work)
> +{
> + struct apr *apr = container_of(work, struct apr, rx_work);
> + struct apr_rx_buf *abuf, *b;
> + unsigned long flags;
> +
> + if (!list_empty(&apr->rx_list)) {
> + list_for_each_entry_safe(abuf, b, &apr->rx_list, node) {
> + apr_do_rx_callback(apr, abuf);
> + spin_lock_irqsave(&apr->rx_lock, flags);
> + list_del(&abuf->node);
> + spin_unlock_irqrestore(&apr->rx_lock, flags);
> + kfree(abuf);
> + }
> + }
> +}
> +
>  static int apr_device_match(struct device *dev, struct device_driver *drv)
>  {
>   struct apr_device *adev = to_apr_device(dev);
> @@ -285,6 +337,14 @@ static int apr_probe(struct rpmsg_device *rpdev)
>   dev_set_drvdata(dev, apr);
>   apr->ch = rpdev->ept;
>   apr->dev = dev;
> + apr->rxwq = create_singlethread_workqueue("qcom_apr_rx");
> + if (!apr->rxwq) {
> + dev_err(apr->dev, "Failed to start Rx WQ\n");
> + return -ENOMEM;
> + }
> + INIT_WORK(&apr->rx_work, apr_rxwq);
> + INIT_LIST_HEAD(&apr->rx_list);
> + spin_lock_init(&apr->rx_lock);
>   spin_lock_init(&apr->svcs_lock);
>   idr_init(&apr->svcs_idr);
>   of_register_apr_devices(dev);
> @@ -303,7 +363,11 @@ static int apr_remove_device(struct device *dev, void 
> *null)
>  
>  static void apr_remove(struct rpmsg_device *rpdev)
>  {
> + struct apr *apr = dev_get_drvdata(&rpdev->dev);
> +
>   device_for_each_child(&rpdev->dev, NULL, apr_remove_device);
> + flush_workqueue(apr->rxwq);
> + destroy_workqueue(apr->rxwq);
>  }
>  
>  /*
> -- 
> 2.20.1
>
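
The quoted patch moves packet handling out of the rpmsg callback: the callback only copies the payload into a list node that ends in a flexible array member, and a work item drains the list later. Below is a minimal userspace sketch of that copy-then-drain pattern. It is not the driver code: there is no spinlock or workqueue, the names `rx_buf`, `rx_queue`, `rx_enqueue`, `rx_drain` and `sum_bytes` are hypothetical, and unlike the patch the sketch unlinks a node before handling it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One queued message: header plus payload in a single allocation. */
struct rx_buf {
	struct rx_buf *next;
	size_t len;
	uint8_t buf[];		/* flexible array member, as in apr_rx_buf */
};

struct rx_queue {
	struct rx_buf *head;
	struct rx_buf *tail;
};

/* Producer side: copy the payload so the caller can reuse its buffer. */
static int rx_enqueue(struct rx_queue *q, const void *data, size_t len)
{
	struct rx_buf *b = malloc(sizeof(*b) + len);

	if (!b)
		return -1;
	b->next = NULL;
	b->len = len;
	memcpy(b->buf, data, len);

	/* The kernel version takes a spinlock around the list update. */
	if (q->tail)
		q->tail->next = b;
	else
		q->head = b;
	q->tail = b;
	return 0;
}

/* Consumer side: detach each node, process it, then free it. */
static size_t rx_drain(struct rx_queue *q,
		       void (*handle)(const uint8_t *buf, size_t len, void *ctx),
		       void *ctx)
{
	size_t n = 0;

	while (q->head) {
		struct rx_buf *b = q->head;

		q->head = b->next;
		if (!q->head)
			q->tail = NULL;
		handle(b->buf, b->len, ctx);
		free(b);
		n++;
	}
	return n;
}

/* Example handler: adds every payload byte to the counter behind ctx. */
static void sum_bytes(const uint8_t *buf, size_t len, void *ctx)
{
	size_t *total = ctx;

	for (size_t i = 0; i < len; i++)
		*total += buf[i];
}
```

Because the payload is copied at enqueue time, the producer may overwrite its own buffer as soon as `rx_enqueue()` returns, which is the property the rpmsg callback needs.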

Re: [PATCH v3 0/2] Qualcomm PCIe2 PHY

2019-05-28 Thread Bjorn Andersson

On Wed 01 May 17:14 PDT 2019, Bjorn Andersson wrote:

> The Qualcomm PCIe2 PHY is based on design from Synopsys and found in
> several different platforms where the QMP PHY isn't used.
> 

Kishon, any feedback on this or would you be willing to pick it up?

Regards,
Bjorn

> Bjorn Andersson (2):
>   dt-bindings: phy: Add binding for Qualcomm PCIe2 PHY
>   phy: qcom: Add Qualcomm PCIe2 PHY driver
> 
>  .../bindings/phy/qcom-pcie2-phy.txt   |  42 +++
>  drivers/phy/qualcomm/Kconfig  |   8 +
>  drivers/phy/qualcomm/Makefile |   1 +
>  drivers/phy/qualcomm/phy-qcom-pcie2.c | 331 ++
>  4 files changed, 382 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/phy/qcom-pcie2-phy.txt
>  create mode 100644 drivers/phy/qualcomm/phy-qcom-pcie2.c
> 
> -- 
> 2.18.0
>

Re: [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket

2019-05-28 Thread Jason Wang




On 2019/5/29 12:45 AM, Stefano Garzarella wrote:

On Wed, May 15, 2019 at 10:48:44AM +0800, Jason Wang wrote:

On 2019/5/15 12:35 AM, Stefano Garzarella wrote:

On Tue, May 14, 2019 at 11:25:34AM +0800, Jason Wang wrote:

On 2019/5/14 1:23 AM, Stefano Garzarella wrote:

On Mon, May 13, 2019 at 05:58:53PM +0800, Jason Wang wrote:

On 2019/5/10 8:58 PM, Stefano Garzarella wrote:

+static struct virtio_vsock_buf *
+virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
+{
+   struct virtio_vsock_buf *buf;
+
+   if (pkt->len == 0)
+   return NULL;
+
+   buf = kzalloc(sizeof(*buf), GFP_KERNEL);
+   if (!buf)
+   return NULL;
+
+   /* If the buffer in the virtio_vsock_pkt is full, we can move it to
+* the new virtio_vsock_buf avoiding the copy, because we are sure that
+* we do not use more memory than that counted by the credit mechanism.
+*/
+   if (zero_copy && pkt->len == pkt->buf_len) {
+   buf->addr = pkt->buf;
+   pkt->buf = NULL;
+   } else {

Is the copy still needed if we're just a few bytes short? We met a similar
issue in virtio-net, and virtio-net solves it by always copying the first
128 bytes for big packets.

See receive_big()

I see, it is more sophisticated.
IIUC, virtio-net allocates a sk_buff with 128 bytes of buffer, then copies the
first 128 bytes, then adds the buffer used to receive the packet as a frag to
the skb.

Yes, and the point is that if the packet is smaller than 128 bytes the pages
will be recycled.



So it avoids the overhead of allocating a large buffer. I got it.

Just out of curiosity, why is the threshold 128 bytes?


From its name (GOOD_COPY_LEN), I think it is just a value that won't lose much
performance, e.g. the size of two cachelines.
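
The behaviour discussed above (always copy up to the first 128 bytes, and recycle the receive page when nothing is left in it) can be sketched in userspace C. This models only the decision, not the virtio-net code itself: `COPY_LEN`, `rx_result` and `rx_split` are hypothetical names, and treating a packet of exactly 128 bytes as fully copied is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COPY_LEN 128	/* mirrors the 128-byte threshold discussed above */

struct rx_result {
	uint8_t head[COPY_LEN];	/* small linear area, always copied into */
	size_t head_len;	/* bytes copied into head */
	const uint8_t *frag;	/* rest of the packet, still in the rx page */
	size_t frag_len;	/* bytes left in the rx page */
	bool page_recycled;	/* true if the rx page can be reused at once */
};

static void rx_split(const uint8_t *page, size_t pkt_len, struct rx_result *r)
{
	size_t copy = pkt_len < COPY_LEN ? pkt_len : COPY_LEN;

	memcpy(r->head, page, copy);
	r->head_len = copy;

	if (pkt_len > copy) {
		/* Large packet: keep the page and reference the remainder. */
		r->frag = page + copy;
		r->frag_len = pkt_len - copy;
		r->page_recycled = false;
	} else {
		/* Whole packet fit in the copy: the page is free again. */
		r->frag = NULL;
		r->frag_len = 0;
		r->page_recycled = true;
	}
}
```

Small packets therefore cost one short memcpy and no page, while large packets pay the same short copy and keep the bulk of the data in place.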


Jason, Stefan,
since I'm removing the patches that increase the buffers to 64 KiB and I'm
adding a threshold for small packets, I would simplify this patch by
removing the new buffer allocation and copying small packets into the
buffers already queued (if there is space).
This way, I should solve the issue of 1-byte packets.

Do you think that would be better?
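
A rough sketch of the idea proposed here, in userspace C: a packet at or below a small-packet threshold is appended to the last buffer already queued, provided that buffer still has room. This is only an illustration of the proposal, not code from any posted patch; `BUF_SIZE`, `SMALL_PKT`, `queued_buf` and `try_merge` are hypothetical, and both constants are made-up values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE	4096	/* hypothetical size of one queued rx buffer */
#define SMALL_PKT	128	/* hypothetical small-packet threshold */

struct queued_buf {
	uint8_t data[BUF_SIZE];
	size_t len;		/* bytes already used */
};

/*
 * Try to append a small packet to the last queued buffer.
 * Returns true if the payload was merged, false if the caller
 * has to queue the packet's own buffer instead.
 */
static bool try_merge(struct queued_buf *last, const uint8_t *pkt, size_t len)
{
	if (!last || len > SMALL_PKT)
		return false;
	if (len > BUF_SIZE - last->len)
		return false;	/* no space left in the queued buffer */

	memcpy(last->data + last->len, pkt, len);
	last->len += len;
	return true;
}
```

With this scheme a stream of 1-byte packets fills one queued buffer instead of pinning a full receive buffer per byte.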



I think so.

Thanks




Thanks,
Stefano

Re: [PATCH v3 1/3] PCI: qcom: Use clk_bulk API for 2.4.0 controllers

2019-05-28 Thread Bjorn Andersson

On Thu 16 May 02:14 PDT 2019, Stanimir Varbanov wrote:

> Hi Bjorn,
> 
> On 5/2/19 3:19 AM, Bjorn Andersson wrote:
> > Before introducing the QCS404 platform, which uses the same PCIe
> > controller as IPQ4019, migrate this to use the bulk clock API, in order
> > to make the error paths slightly cleaner.
> > 
> > Acked-by: Stanimir Varbanov 
> > Reviewed-by: Niklas Cassel 
> > Signed-off-by: Bjorn Andersson 
> > ---
> > 
> > Changes since v2:
> > - Defined QCOM_PCIE_2_4_0_MAX_CLOCKS
> > 
> >  drivers/pci/controller/dwc/pcie-qcom.c | 49 --
> >  1 file changed, 14 insertions(+), 35 deletions(-)
> > 
> > diff --git a/drivers/pci/controller/dwc/pcie-qcom.c 
> > b/drivers/pci/controller/dwc/pcie-qcom.c
> > index 0ed235d560e3..d740cbe0e56d 100644
> > --- a/drivers/pci/controller/dwc/pcie-qcom.c
> > +++ b/drivers/pci/controller/dwc/pcie-qcom.c
> > @@ -112,10 +112,10 @@ struct qcom_pcie_resources_2_3_2 {
> > struct regulator_bulk_data supplies[QCOM_PCIE_2_3_2_MAX_SUPPLY];
> >  };
> >  
> > +#define QCOM_PCIE_2_4_0_MAX_CLOCKS 3
> >  struct qcom_pcie_resources_2_4_0 {
> > -   struct clk *aux_clk;
> > -   struct clk *master_clk;
> > -   struct clk *slave_clk;
> > +   struct clk_bulk_data clks[QCOM_PCIE_2_4_0_MAX_CLOCKS];
> > +   int num_clks;
> > struct reset_control *axi_m_reset;
> > struct reset_control *axi_s_reset;
> > struct reset_control *pipe_reset;
> > @@ -638,18 +638,17 @@ static int qcom_pcie_get_resources_2_4_0(struct 
> > qcom_pcie *pcie)
> > struct qcom_pcie_resources_2_4_0 *res = &pcie->res.v2_4_0;
> > struct dw_pcie *pci = pcie->pci;
> > struct device *dev = pci->dev;
> > +   int ret;
> >  
> > -   res->aux_clk = devm_clk_get(dev, "aux");
> > -   if (IS_ERR(res->aux_clk))
> > -   return PTR_ERR(res->aux_clk);
> > +   res->clks[0].id = "aux";
> > +   res->clks[1].id = "master_bus";
> > +   res->clks[2].id = "slave_bus";
> >  
> > -   res->master_clk = devm_clk_get(dev, "master_bus");
> > -   if (IS_ERR(res->master_clk))
> > -   return PTR_ERR(res->master_clk);
> > +   res->num_clks = 3;
> 
> Use the new fresh define QCOM_PCIE_2_4_0_MAX_CLOCKS?
> 

As I replace it in patch 3/3 with a value different from "max clocks", I
don't think it makes sense to use the define here. So I'm leaving this
as is.

> >  
> > -   res->slave_clk = devm_clk_get(dev, "slave_bus");
> > -   if (IS_ERR(res->slave_clk))
> > -   return PTR_ERR(res->slave_clk);
> > +   ret = devm_clk_bulk_get(dev, res->num_clks, res->clks);
> > +   if (ret < 0)
> > +   return ret;
> >  
> > res->axi_m_reset = devm_reset_control_get_exclusive(dev, "axi_m");
> > if (IS_ERR(res->axi_m_reset))
> > @@ -719,9 +718,7 @@ static void qcom_pcie_deinit_2_4_0(struct qcom_pcie 
> > *pcie)
> > reset_control_assert(res->axi_m_sticky_reset);
> > reset_control_assert(res->pwr_reset);
> > reset_control_assert(res->ahb_reset);
> > -   clk_disable_unprepare(res->aux_clk);
> > -   clk_disable_unprepare(res->master_clk);
> > -   clk_disable_unprepare(res->slave_clk);
> > +   clk_bulk_disable_unprepare(res->num_clks, res->clks);
> >  }
> >  
> >  static int qcom_pcie_init_2_4_0(struct qcom_pcie *pcie)
> > @@ -850,23 +847,9 @@ static int qcom_pcie_init_2_4_0(struct qcom_pcie *pcie)
> >  
> > usleep_range(1, 12000);
> >  
> > -   ret = clk_prepare_enable(res->aux_clk);
> > -   if (ret) {
> > -   dev_err(dev, "cannot prepare/enable iface clock\n");
> > +   ret = clk_bulk_prepare_enable(res->num_clks, res->clks);
> > +   if (ret)
> > goto err_clk_aux;
> 
> Maybe you have to change the name of the label too?
> 

Updated this and posted v5. Should be good to be merged now.

Thanks for your reviews!

Regards,
Bjorn
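
The "cleaner error paths" argument from the quoted commit message can be illustrated with a small userspace model of an all-or-nothing bulk lookup: one call and one error check replace a check per clock. This is not how the kernel implements devm_clk_bulk_get(); `res_bulk`, `res_get` and `res_bulk_get` are hypothetical names, and the provider that knows only three names is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct res_bulk {
	const char *id;		/* name to look up, like clk_bulk_data.id */
	int handle;		/* 0 means "not acquired" */
};

/* Hypothetical provider: knows three names, fails for anything else. */
static int res_get(const char *id)
{
	static const char *const known[] = { "aux", "master_bus", "slave_bus" };

	for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++)
		if (strcmp(id, known[i]) == 0)
			return (int)i + 1;
	return 0;
}

/* Acquire all entries or none: one error path instead of one per clock. */
static int res_bulk_get(struct res_bulk *r, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		r[i].handle = res_get(r[i].id);
		if (!r[i].handle) {
			while (i--)
				r[i].handle = 0;	/* undo earlier gets */
			return -1;
		}
	}
	return 0;
}
```

The caller fills in the `id` fields, passes the count it actually needs, and checks a single return value, which is the shape the patch gives qcom_pcie_get_resources_2_4_0().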

[PATCH v5 3/3] PCI: qcom: Add QCS404 PCIe controller support

2019-05-28 Thread Bjorn Andersson

The QCS404 platform contains a PCIe controller of version 2.4.0 and a
Qualcomm PCIe2 PHY. The driver already supports version 2.4.0, for the
IPQ4019, but this support touches clocks and resets related to the PHY
as well, and there's no upstream driver for the PHY.

On QCS404 we must initialize the PHY, so a separate PHY driver is
implemented to take care of this and the controller driver is updated to
not require the PHY related resources. This is done by relying on the
fact that operations in both the clock and reset framework are nops when
passed NULL, so we can isolate this change to only the get_resource
function.

For QCS404 we also need to enable the AHB (iface) clock, in order to
access the register space of the controller, but as this is not part of
the IPQ4019 DT binding this is only added for new users of the 2.4.0
controller.
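
The reliance on NULL handles being no-ops can be shown with a small userspace model: only the lookup step differs between platforms, and the teardown code is shared because operating on a NULL handle does nothing. The types and functions below (`reset_line`, `reset_assert`, `ctrl_res`, `ctrl_get_resources`, `ctrl_deinit`) are hypothetical stand-ins, not the kernel reset API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct reset_line {
	bool asserted;
};

/* Like the kernel helpers, treat a NULL handle as "nothing to do". */
static int reset_assert(struct reset_line *r)
{
	if (!r)
		return 0;
	r->asserted = true;
	return 0;
}

struct ctrl_res {
	struct reset_line *axi_m;	/* always present */
	struct reset_line *phy;		/* left NULL when the PHY driver owns it */
};

/* Only the lookup differs per platform... */
static void ctrl_get_resources(struct ctrl_res *res, bool owns_phy,
			       struct reset_line *axi_m, struct reset_line *phy)
{
	res->axi_m = axi_m;
	res->phy = owns_phy ? phy : NULL;
}

/* ...while the teardown path stays identical for both. */
static void ctrl_deinit(struct ctrl_res *res)
{
	reset_assert(res->axi_m);
	reset_assert(res->phy);
}
```
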

Acked-by: Stanimir Varbanov 
Reviewed-by: Niklas Cassel 
Reviewed-by: Vinod Koul 
Signed-off-by: Bjorn Andersson 
---

Changes since v4:
- Picked up Vinod's r-b and Stanimir's a-b

 drivers/pci/controller/dwc/pcie-qcom.c | 64 +++---
 1 file changed, 38 insertions(+), 26 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-qcom.c 
b/drivers/pci/controller/dwc/pcie-qcom.c
index 23dc01212508..da5dd3639a49 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -112,7 +112,7 @@ struct qcom_pcie_resources_2_3_2 {
struct regulator_bulk_data supplies[QCOM_PCIE_2_3_2_MAX_SUPPLY];
 };
 
-#define QCOM_PCIE_2_4_0_MAX_CLOCKS 3
+#define QCOM_PCIE_2_4_0_MAX_CLOCKS 4
 struct qcom_pcie_resources_2_4_0 {
struct clk_bulk_data clks[QCOM_PCIE_2_4_0_MAX_CLOCKS];
int num_clks;
@@ -638,13 +638,16 @@ static int qcom_pcie_get_resources_2_4_0(struct qcom_pcie 
*pcie)
struct qcom_pcie_resources_2_4_0 *res = &pcie->res.v2_4_0;
struct dw_pcie *pci = pcie->pci;
struct device *dev = pci->dev;
+   bool is_ipq = of_device_is_compatible(dev->of_node, "qcom,pcie-ipq4019");
int ret;
 
res->clks[0].id = "aux";
res->clks[1].id = "master_bus";
res->clks[2].id = "slave_bus";
+   res->clks[3].id = "iface";
 
-   res->num_clks = 3;
+   /* qcom,pcie-ipq4019 is defined without "iface" */
+   res->num_clks = is_ipq ? 3 : 4;
 
ret = devm_clk_bulk_get(dev, res->num_clks, res->clks);
if (ret < 0)
@@ -658,27 +661,33 @@ static int qcom_pcie_get_resources_2_4_0(struct qcom_pcie 
*pcie)
if (IS_ERR(res->axi_s_reset))
return PTR_ERR(res->axi_s_reset);
 
-   res->pipe_reset = devm_reset_control_get_exclusive(dev, "pipe");
-   if (IS_ERR(res->pipe_reset))
-   return PTR_ERR(res->pipe_reset);
-
-   res->axi_m_vmid_reset = devm_reset_control_get_exclusive(dev,
-"axi_m_vmid");
-   if (IS_ERR(res->axi_m_vmid_reset))
-   return PTR_ERR(res->axi_m_vmid_reset);
-
-   res->axi_s_xpu_reset = devm_reset_control_get_exclusive(dev,
-   "axi_s_xpu");
-   if (IS_ERR(res->axi_s_xpu_reset))
-   return PTR_ERR(res->axi_s_xpu_reset);
-
-   res->parf_reset = devm_reset_control_get_exclusive(dev, "parf");
-   if (IS_ERR(res->parf_reset))
-   return PTR_ERR(res->parf_reset);
-
-   res->phy_reset = devm_reset_control_get_exclusive(dev, "phy");
-   if (IS_ERR(res->phy_reset))
-   return PTR_ERR(res->phy_reset);
+   if (is_ipq) {
+   /*
+* These resources relate to the PHY or are secure clocks, but
+* are controlled here for IPQ4019
+*/
+   res->pipe_reset = devm_reset_control_get_exclusive(dev, "pipe");
+   if (IS_ERR(res->pipe_reset))
+   return PTR_ERR(res->pipe_reset);
+
+   res->axi_m_vmid_reset = devm_reset_control_get_exclusive(dev,
+   "axi_m_vmid");
+   if (IS_ERR(res->axi_m_vmid_reset))
+   return PTR_ERR(res->axi_m_vmid_reset);
+
+   res->axi_s_xpu_reset = devm_reset_control_get_exclusive(dev,
+   "axi_s_xpu");
+   if (IS_ERR(res->axi_s_xpu_reset))
+   return PTR_ERR(res->axi_s_xpu_reset);
+
+   res->parf_reset = devm_reset_control_get_exclusive(dev, "parf");
+   if (IS_ERR(res->parf_reset))
+   return PTR_ERR(res->parf_reset);
+
+   res->phy_reset = devm_reset_control_get_exclusive(dev, "phy");
+   if (IS_ERR(res->phy_reset))
+   return PTR_ERR(res->phy_reset);
+   }
 
res->axi_m_sticky_reset = devm_reset_control_get_exclusive(dev,

[PATCH v5 2/3] dt-bindings: PCI: qcom: Add QCS404 to the binding

2019-05-28 Thread Bjorn Andersson

The Qualcomm QCS404 platform contains a PCIe controller; add this to the
Qualcomm PCI binding document. The controller is the same version as the
one used in IPQ4019, but the PHY part is described separately, hence the
difference in clocks and resets.

Reviewed-by: Rob Herring 
Reviewed-by: Vinod Koul 
Signed-off-by: Bjorn Andersson 
---

Changes since v4:
- Picked up Vinod's r-b

 .../devicetree/bindings/pci/qcom,pcie.txt | 25 +--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/pci/qcom,pcie.txt 
b/Documentation/devicetree/bindings/pci/qcom,pcie.txt
index 1fd703bd73e0..ada80b01bf0c 100644
--- a/Documentation/devicetree/bindings/pci/qcom,pcie.txt
+++ b/Documentation/devicetree/bindings/pci/qcom,pcie.txt
@@ -10,6 +10,7 @@
- "qcom,pcie-msm8996" for msm8996 or apq8096
- "qcom,pcie-ipq4019" for ipq4019
- "qcom,pcie-ipq8074" for ipq8074
+   - "qcom,pcie-qcs404" for qcs404
 
 - reg:
Usage: required
@@ -116,6 +117,15 @@
- "ahb" AHB clock
- "aux" Auxiliary clock
 
+- clock-names:
+   Usage: required for qcs404
+   Value type: 
+   Definition: Should contain the following entries
+   - "iface"   AHB clock
+   - "aux" Auxiliary clock
+   - "master_bus"  AXI Master clock
+   - "slave_bus"   AXI Slave clock
+
 - resets:
Usage: required
Value type: 
@@ -167,6 +177,17 @@
- "ahb" AHB Reset
- "axi_m_sticky"AXI Master Sticky reset
 
+- reset-names:
+   Usage: required for qcs404
+   Value type: 
+   Definition: Should contain the following entries
+   - "axi_m"   AXI Master reset
+   - "axi_s"   AXI Slave reset
+   - "axi_m_sticky"AXI Master Sticky reset
+   - "pipe_sticky" PIPE sticky reset
+   - "pwr" PWR reset
+   - "ahb" AHB reset
+
 - power-domains:
Usage: required for apq8084 and msm8996/apq8096
Value type: 
@@ -195,12 +216,12 @@
Definition: A phandle to the PCIe endpoint power supply
 
 - phys:
-   Usage: required for apq8084
+   Usage: required for apq8084 and qcs404
Value type: 
Definition: List of phandle(s) as listed in phy-names property
 
 - phy-names:
-   Usage: required for apq8084
+   Usage: required for apq8084 and qcs404
Value type: 
Definition: Should contain "pciephy"
 
-- 
2.18.0
