Re: [PATCH 1/2] staging: bcm2835-audio: Check if workqueue allocation failed

2018-07-12 Thread Dan Carpenter
On Fri, Jul 13, 2018 at 09:48:16AM +0300, Dan Carpenter wrote:
> On Fri, Jul 13, 2018 at 12:54:16AM +0300, Tuomas Tynkkynen wrote:
> > @@ -424,7 +411,9 @@ int bcm2835_audio_open(struct bcm2835_alsa_stream 
> > *alsa_stream)
> > int status;
> > int ret;
> >  
> > -   my_workqueue_init(alsa_stream);
> > +   alsa_stream->my_wq = alloc_workqueue("my_queue", WQ_HIGHPRI, 1);
> > +   if (!alsa_stream->my_wq)
> > +   return -ENOMEM;
> >  
> > ret = bcm2835_audio_open_connection(alsa_stream);
> > if (ret) {
> 
> This patch is good but if bcm2835_audio_open_connection() fails then
> we need to release alsa_stream->my_wq.

Never mind, you handle it in the next patch.  The bug *was* there in the
original code as well, so that's a legit way to split the patches.

regards,
dan carpenter


Re: [PATCH v3] checkpatch: Warn if missing author Signed-off-by

2018-07-12 Thread Linus Walleij
On Thu, Jul 12, 2018 at 12:03 PM Geert Uytterhoeven
 wrote:

> Print a warning if none of the Signed-off-by lines cover the patch
> author.
>
> Non-ASCII quoted printable encoding in From: headers and (lack of)
> double quotes are handled.
> Split From: headers are not fully handled: only the first part is
> compared.
>
> Signed-off-by: Geert Uytterhoeven 
> ---
> Tested using a set of ca. 4000 real world commits.
>
> Most common offenders are people using:
>   - different email addresses for author and Sob,
>   - different firstname/lastname order, or other different name
> spelling,
>   - suse.de vs. suse.com.

Pretty cool patch!
Acked-by: Linus Walleij 

Yours,
Linus Walleij


Re: [PATCH 1/2] staging: bcm2835-audio: Check if workqueue allocation failed

2018-07-12 Thread Dan Carpenter
On Fri, Jul 13, 2018 at 12:54:16AM +0300, Tuomas Tynkkynen wrote:
> @@ -424,7 +411,9 @@ int bcm2835_audio_open(struct bcm2835_alsa_stream 
> *alsa_stream)
>   int status;
>   int ret;
>  
> - my_workqueue_init(alsa_stream);
> + alsa_stream->my_wq = alloc_workqueue("my_queue", WQ_HIGHPRI, 1);
> + if (!alsa_stream->my_wq)
> + return -ENOMEM;
>  
>   ret = bcm2835_audio_open_connection(alsa_stream);
>   if (ret) {

This patch is good but if bcm2835_audio_open_connection() fails then
we need to release alsa_stream->my_wq.
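For reference, a minimal sketch of the cleanup being asked for (illustrative only; it
assumes the open path simply returns ret on failure):

	alsa_stream->my_wq = alloc_workqueue("my_queue", WQ_HIGHPRI, 1);
	if (!alsa_stream->my_wq)
		return -ENOMEM;

	ret = bcm2835_audio_open_connection(alsa_stream);
	if (ret) {
		/* don't leak the workqueue when the open fails */
		destroy_workqueue(alsa_stream->my_wq);
		alsa_stream->my_wq = NULL;
		return ret;
	}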

regards,
dan carpenter




[PATCH] fat: Fix potential shift wrap with FITRIM ioctl on FAT

2018-07-12 Thread OGAWA Hirofumi
This patch is a fix for fat-add-fitrim-ioctl-for-fat-file-system.patch.
It may be better to merge it into that patch (if that is easy).

Anyway, please apply this together with the above patch.


From: Wentao Wang 

If we keep "trimmed" as an u32, there will be a potential shift wrap.

It would be a problem on a larger than 4GB partition with
FAT32. Though most tools who call this ioctl would ignore this value,
it would be great to fix it.
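A hedged, self-contained illustration of the shift wrap (simplified; the numbers and
variable names below are made up, not the fatent.c code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Hypothetical case: 262144 clusters of 32 KiB trimmed on a
	 * large FAT32 volume, i.e. cluster_bits = 15 (8 GiB total). */
	uint32_t trimmed32 = 0x40000;
	unsigned int cluster_bits = 15;

	/* The shift is evaluated in 32 bits and wraps to 0 before the
	 * result is widened for fstrim_range.len. */
	uint64_t wrong = (uint64_t)(trimmed32 << cluster_bits);

	/* With a 64-bit "trimmed" the same shift cannot wrap. */
	uint64_t trimmed64 = trimmed32;
	uint64_t right = trimmed64 << cluster_bits;

	printf("u32 shift: %llu, u64 shift: %llu\n",
	       (unsigned long long)wrong, (unsigned long long)right);
	return 0;
}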

Signed-off-by: Wentao Wang 
Signed-off-by: OGAWA Hirofumi 
---

 fs/fat/fatent.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN fs/fat/fatent.c~fat-fitrim-fix fs/fat/fatent.c
--- linux/fs/fat/fatent.c~fat-fitrim-fix	2018-07-13 15:39:14.417110998 +0900
+++ linux-hirofumi/fs/fat/fatent.c	2018-07-13 15:39:14.418110996 +0900
@@ -705,8 +705,8 @@ int fat_trim_fs(struct inode *inode, str
struct msdos_sb_info *sbi = MSDOS_SB(sb);
const struct fatent_operations *ops = sbi->fatent_ops;
struct fat_entry fatent;
-   u64 ent_start, ent_end, minlen;
-   u32 free = 0, trimmed = 0;
+   u64 ent_start, ent_end, minlen, trimmed = 0;
+   u32 free = 0;
unsigned long reada_blocks, reada_mask, cur_block = 0;
int err = 0;
 
_

-- 
OGAWA Hirofumi 


Re: [PATCH v2 0/4] vfs: track per-sb writeback errors and report them via fsinfo()

2018-07-12 Thread Carlos Maiolino
On Tue, Jul 10, 2018 at 10:01:23AM -0400, Jeff Layton wrote:
> v2: drop buffer.c patch to record wb errors when underlying blockdev
> flush fails. We may eventually want that, but at this point I don't have
> a clear way to test it to determine its efficacy.
> 
> At LSF/MM this year, the PostgreSQL developers mentioned that they'd
> like to have some mechanism to check whether there have been any
> writeback errors on a filesystem, without necessarily flushing any of
> the cached data first.
> 
> Given that we have a new fsinfo syscall being introduced, we may as well
> use it to report writeback errors on a per superblock basis. This allows
> us to provide the info that the PostgreSQL developers wanted, without
> needing to change an existing interface.
> 
> This seems to do the right thing when tested by hand, but I don't yet
> have an xfstest for it, since the syscall is still quite new. Once that
> goes in and we get fsinfo support in xfs_io, it should be rather
> trivial to roll a testcase for this.
> 

Whole patch sounds fine, you can add:

Reviewed-by: Carlos Maiolino 

Cheers

> Al, if this looks ok, could you pull this into the branch where you
> have David's fsinfo patches queued up?
> 
> Thanks,
> Jeff
> 
> Jeff Layton (4):
>   vfs: track per-sb writeback errors
>   errseq: add a new errseq_scrape function
>   vfs: allow fsinfo to fetch the current state of s_wb_err
>   samples: extend test-fsinfo to access error_state
> 
>  fs/statfs.c |  9 +
>  include/linux/errseq.h  |  1 +
>  include/linux/fs.h  |  3 +++
>  include/linux/pagemap.h |  5 -
>  include/uapi/linux/fsinfo.h | 11 +++
>  lib/errseq.c| 33 +++--
>  samples/statx/test-fsinfo.c | 13 +
>  7 files changed, 72 insertions(+), 3 deletions(-)
> 
> -- 
> 2.17.1
> 

-- 
Carlos


Re: [v2,3/3] i2c: at91: added slave mode support

2018-07-12 Thread Ludovic Desroches
On Thu, Jul 12, 2018 at 11:56:24PM +0200, Wolfram Sang wrote:
> 
> > Yes sure, you can add my Ack. I would be pleased to see the slave
> > support taken.
> 
> Sadly, I can't get it to apply cleanly. Could you rebase and retest?
> 

Ok I'll handle it and add my Acked-by.

Ludovic


[PATCH] PCI: Unify pci and normal dma direction definition

2018-07-12 Thread Shunyong Yang
Current DMA direction definitions in pci-dma-compat.h and dma-direction.h
are mirrored in value. Unify them to enhance readability and avoid
possible inconsistency.
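As an aside, a hedged sketch (not from this patch) of why the values must stay mirrored:
old-style callers mix the pci_* compat wrappers with the generic DMA API, which only works
because PCI_DMA_* and DMA_* share the same values.

#include <linux/pci.h>
#include <linux/dma-mapping.h>

/* Hypothetical driver fragment: map through the legacy compat wrapper,
 * unmap through the generic API.  Correct only because PCI_DMA_TODEVICE
 * and DMA_TO_DEVICE have the same value. */
static int example_xmit(struct pci_dev *pdev, void *buf, size_t len)
{
	dma_addr_t handle;

	handle = pci_map_single(pdev, buf, len, PCI_DMA_TODEVICE);
	if (dma_mapping_error(&pdev->dev, handle))
		return -ENOMEM;

	/* ... hand "handle" to the hardware ... */

	dma_unmap_single(&pdev->dev, handle, len, DMA_TO_DEVICE);
	return 0;
}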

Cc: Joey Zheng 
Signed-off-by: Shunyong Yang 
---
 include/linux/dma-direction.h  | 2 +-
 include/linux/pci-dma-compat.h | 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/dma-direction.h b/include/linux/dma-direction.h
index 3649a031893a..9d52716e9218 100644
--- a/include/linux/dma-direction.h
+++ b/include/linux/dma-direction.h
@@ -2,7 +2,7 @@
 #ifndef _LINUX_DMA_DIRECTION_H
 #define _LINUX_DMA_DIRECTION_H
 /*
- * These definitions mirror those in pci.h, so they can be used
+ * These definitions mirror those in pci-dma-compat.h, so they can be used
  * interchangeably with their PCI_ counterparts.
  */
 enum dma_data_direction {
diff --git a/include/linux/pci-dma-compat.h b/include/linux/pci-dma-compat.h
index 0dd1a3f7b309..c1c8d49b6072 100644
--- a/include/linux/pci-dma-compat.h
+++ b/include/linux/pci-dma-compat.h
@@ -8,10 +8,10 @@
 #include 
 
 /* This defines the direction arg to the DMA mapping routines. */
-#define PCI_DMA_BIDIRECTIONAL  0
-#define PCI_DMA_TODEVICE   1
-#define PCI_DMA_FROMDEVICE 2
-#define PCI_DMA_NONE   3
+#define PCI_DMA_BIDIRECTIONAL  (DMA_BIDIRECTIONAL)
+#define PCI_DMA_TODEVICE   (DMA_TO_DEVICE)
+#define PCI_DMA_FROMDEVICE (DMA_FROM_DEVICE)
+#define PCI_DMA_NONE   (DMA_NONE)
 
 static inline void *
 pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
-- 
1.8.3.1



linux-next: Tree for Jul 13

2018-07-12 Thread Stephen Rothwell
Hi all,

Changes since 20180712:

The net-next tree gained a conflict against the net tree.

The drm-intel tree gained a build failure due to an interaction with
Linus' tree for which I added a merge fix patch.

I removed a patch from the akpm-current tree to fix the PowerPC boot
failures.

Non-merge commits (relative to Linus' tree): 5724
 5689 files changed, 206787 insertions(+), 117769 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 283 trees (counting Linus' and 65 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (63f047771621 Merge tag 'mtd/fixes-for-4.18-rc5' of 
git://git.infradead.org/linux-mtd)
Merging fixes/master (147a89bc71e7 Merge tag 'kconfig-v4.17' of 
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild)
Merging kbuild-current/fixes (6d79a7b424a5 kbuild: suppress warnings from 
'getconf LFS_*')
Merging arc-current/for-curr (6e3761145a9b ARC: Fix CONFIG_SWAP)
Merging arm-current/fixes (6ef09e48c2bc ARM: 8780/1: ftrace: Only set kernel 
memory back to read-only after boot)
Merging arm64-fixes/for-next/fixes (2fd8eb4ad871 arm64: neon: Fix function 
may_use_simd() return error status)
Merging m68k-current/for-linus (b12c8a70643f m68k: Set default dma mask for 
platform devices)
Merging powerpc-fixes/fixes (22db552b50fa powerpc/powermac: Fix rtc read/write 
functions)
Merging sparc/master (1aaccb5fa0ea Merge tag 'rtc-4.18' of 
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (70b7ff130224 tcp: allow user to create repair socket 
without window probes)
Merging bpf/master (c7a897843224 bpf: don't leave partial mangled prog in 
jit_subprogs error path)
Merging ipsec/master (7284fdf39a91 esp6: fix memleak on error path in 
esp6_input)
Merging netfilter/master (0026129c8629 rhashtable: add restart routine in 
rhashtable_free_and_destroy())
Merging ipvs/master (312564269535 net: netsec: reduce DMA mask to 40 bits)
Merging wireless-drivers/master (248c690a2dc8 Merge tag 
'wireless-drivers-for-davem-2018-07-03' of 
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers)
Merging mac80211/master (5cf3006cc81d nl80211: Add a missing break in 
parse_station_flags)
Merging rdma-fixes/for-rc (d63c46734c54 RDMA/mlx5: Fix memory leak in 
mlx5_ib_create_srq() error path)
Merging sound-current/for-linus (c5a59d2477ab ALSA: hda/ca0132: Update a pci 
quirk device name)
Merging sound-asoc-fixes/for-linus (7dc4ac12ac2b Merge branch 'asoc-4.18' into 
asoc-linus)
Merging regmap-fixes/for-linus (1e4b044d2251 Linux 4.18-rc4)
Merging regulator-fixes/for-linus (c1362eb9f806 Merge branch 'regulator-4.18' 
into regulator-linus)
Merging spi-fixes/for-linus (3c81743ecf5b Merge branch 'spi-4.18' into 
spi-linus)
Merging pci-current/for-linus (a83a21734416 PCI: endpoint: Fix NULL pointer 
dereference error when CONFIGFS is disabled)
Merging driver-core.current/driver-core-linus (722e5f2b1eec driver core: 
Partially revert "driver core: correct device's shutdown order")
Merging tty.current/tty-linus (021c91791a5e Linux 4.18-rc3)
Merging usb.current/usb-linus (c25c

Re: [RFC PATCH v2 1/4] dt-bindings: misc: Add bindings for misc. BMC control fields

2018-07-12 Thread Andrew Jeffery
Hi Rob, Ben,

I've replied to you both inline below, hopefully it's clear enough from the 
context.

On Fri, 13 Jul 2018, at 10:25, Benjamin Herrenschmidt wrote:
> On Thu, 2018-07-12 at 09:11 -0600, Rob Herring wrote:
> > On Wed, Jul 11, 2018 at 6:54 PM Andrew Jeffery  wrote:
> > > 
> > > Hi Rob,
> > > 
> > > Thanks for the response.
> > > 
> > > On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote:
> > > > On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote:
> > > > > Baseboard Management Controllers (BMCs) are embedded SoCs that exist 
> > > > > to
> > > > > provide remote management of (primarily) server platforms. BMCs are
> > > > > often tightly coupled to the platform in terms of behaviour and 
> > > > > provide
> > > > > many hardware features integral to booting and running the host 
> > > > > system.
> > > > > 
> > > > > Some of these hardware features are simple, for example scratch
> > > > > registers provided by the BMC that are exposed to both the host and 
> > > > > the
> > > > > BMC. In other cases there's a single bit switch to enable or disable
> > > > > some of the provided functionality.
> > > > > 
> > > > > The documentation defines bindings for fields in registers that do not
> > > > > integrate well into other driver models yet must be described to allow
> > > > > the BMC kernel to assume control of these features.
> > > > 
> > > > So we'll get a new binding when that happens? That will break
> > > > compatibility.
> > > 
> > > Can you please expand on this? I'm not following.
> > 
> > If we have a subsystem in the future, then there would likely be an
> > associated binding which would be different. So if you update the DT,
> > then old kernels won't work with it.
> 
> What kind of "subsystem" ? There is almost no way there could be one
> for that sort of BMC tunables. We've look at several BMC chips out
> there and requirements from several vendors, BIOS and system
> manufacturers and it's all over the place.

Right - This is the fundamental principle backing these patches: There will 
never be a coherent subsystem catering to any of what we want to describe with 
these bindings.

> 
> > > I feel like this is an argument of tradition. Maybe people have
> > > been dissuaded from doing so when they don't have a reasonable use-
> > > case? I'm not saying that what I'm proposing is unquestionably
> > > reasonable, but I don't want to dismiss it out of hand.
> > 
...
> > 
> > It comes up with system controller type blocks too that just have a
> > bunch of random registers. 

This matches the situation at hand.

> > Those change in every SoC and not in any
> > controlled or ordered way that would make describing the individual
> > sub-functions in DT worthwhile.

"Not worthwhile" is what I'm pushing back against for our use cases. I think 
they are narrow and limited enough to make it worthwhile.

Obviously we want to avoid describing these things *badly* - you mentioned the 
clock bindings - so I'm happy to hash out what the right representation should 
be. But I struggle to think the solution is not describing some of our hardware 
features at all.

> 
> So what's the alternative ? Because without something like what we
> propose, what's going to happen is /dev/mem ... that's what people do
> today.

Yep. And I've outlined in the cover letter what I think are the advantages of 
what I'm proposing over /dev/mem. It's not an incredible gain, but it has several 
nice-to-have properties.

> 
> > > > A node per register bit doesn't scale.
> > > 
> > > It isn't meant to scale in terms of a single system. Using it
> > > extensively is very likely wrong. Separately, register-bit-led does
> > > pretty much the same thing. Doesn't the scale argument apply there?
> > > Who is to stop me from attaching an insane number of LEDs to a
> > > system?
> > 
> > Review.
> > 
> > If you look, register-bit-led is rarely used outside of some ARM, Ltd.
> > boards. It's simply quite rare to have MMIO register bits that have a
> > fixed function of LED control.
> 
> Well, same here, we hope to review what goes upstream to make it
> reasonable. Otherwise it doens't matter. If a random vendor, let's say
> IBM, chose to chip a system where they put an insane amount of cruft in
> there, it will only affect those systems's BMC and the userspace stack
> on it.
> 
> Thankfully that stack is OpenBMC and IBM is aiming at having their
> device-tree's upstream, thus reviewed, thus it won't happen.
> 
> *Anything* can be abused. The point here is that we have a number,
> thankfully rather small, maybe a dozen or two, of tunables that are
> quite specific to a combination (system vendor, bmc vendor, system
> model) which control a few HW features that essentially do *NOT* fit in
> a subsystem.

Exactly. I tried to head off the abuse vector by requiring that uses be listed 
in the bindings document, and thus enforce some level of review. It might not 
be the most effective approach at the end of the day, but at least i

[PATCH v4 2/2] ARM: dts: am335x: add am335x-sancloud-bbe board support

2018-07-12 Thread Koen Kooi
The "Beaglebone Enhanced" by Sancloud is based on the Beaglebone Black,
but with the following differences:

 * Gigabit capable PHY
 * Extra USB hub, optional i2c control
 * lps3331ap barometer connected over i2c
 * MPU6050 6 axis MEMS accelerometer/gyro connected over i2c
 * 1GiB DDR3 RAM
 * RTL8723 Wifi/Bluetooth connected over USB

Tested on a revision G board.

Signed-off-by: Koen Kooi 
---
v4: No changes
v3: Drop oppnitro-10, not needed on the versions Sancloud is using
v2: * Add missing #include 
* Fix Barometer compatible string
v1: Initial submission

 arch/arm/boot/dts/Makefile|   1 +
 arch/arm/boot/dts/am335x-sancloud-bbe.dts | 136 ++
 2 files changed, 137 insertions(+)
 create mode 100644 arch/arm/boot/dts/am335x-sancloud-bbe.dts

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 37a3de7..83a4d61 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -695,6 +695,7 @@ dtb-$(CONFIG_SOC_AM33XX) += \
am335x-pepper.dtb \
am335x-phycore-rdk.dtb \
am335x-pocketbeagle.dtb \
+   am335x-sancloud-bbe.dtb \
am335x-shc.dtb \
am335x-sbc-t335.dtb \
am335x-sl50.dtb \
diff --git a/arch/arm/boot/dts/am335x-sancloud-bbe.dts 
b/arch/arm/boot/dts/am335x-sancloud-bbe.dts
new file mode 100644
index 000..ba5f4bd
--- /dev/null
+++ b/arch/arm/boot/dts/am335x-sancloud-bbe.dts
@@ -0,0 +1,136 @@
+/*
+ * Copyright (C) 2012 Texas Instruments Incorporated - http://www.ti.com/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+/dts-v1/;
+
+#include "am33xx.dtsi"
+#include "am335x-bone-common.dtsi"
+#include "am335x-boneblack-common.dtsi"
+#include 
+
+/ {
+   model = "SanCloud BeagleBone Enhanced";
+   compatible = "sancloud,am335x-boneenhanced", "ti,am335x-bone-black", 
"ti,am335x-bone", "ti,am33xx";
+};
+
+&am33xx_pinmux {
+   pinctrl-names = "default";
+
+   cpsw_default: cpsw_default {
+   pinctrl-single,pins = <
+   /* Slave 1 */
+   AM33XX_IOPAD(0x914, PIN_OUTPUT_PULLDOWN | MUX_MODE2)
/* mii1_txen.rgmii1_tctl */
+   AM33XX_IOPAD(0x918, PIN_INPUT_PULLDOWN | MUX_MODE2) 
/* mii1_rxdv.rgmii1_rctl */
+   AM33XX_IOPAD(0x91c, PIN_OUTPUT_PULLDOWN | MUX_MODE2)
/* mii1_txd3.rgmii1_td3 */
+   AM33XX_IOPAD(0x920, PIN_OUTPUT_PULLDOWN | MUX_MODE2)
/* mii1_txd2.rgmii1_td2 */
+   AM33XX_IOPAD(0x924, PIN_OUTPUT_PULLDOWN | MUX_MODE2)
/* mii1_txd1.rgmii1_td1 */
+   AM33XX_IOPAD(0x928, PIN_OUTPUT_PULLDOWN | MUX_MODE2)
/* mii1_txd0.rgmii1_td0 */
+   AM33XX_IOPAD(0x92c, PIN_OUTPUT_PULLDOWN | MUX_MODE2)
/* mii1_txclk.rgmii1_tclk */
+   AM33XX_IOPAD(0x930, PIN_INPUT_PULLDOWN | MUX_MODE2) 
/* mii1_rxclk.rgmii1_rclk */
+   AM33XX_IOPAD(0x934, PIN_INPUT_PULLDOWN | MUX_MODE2) 
/* mii1_rxd3.rgmii1_rd3 */
+   AM33XX_IOPAD(0x938, PIN_INPUT_PULLDOWN | MUX_MODE2) 
/* mii1_rxd2.rgmii1_rd2 */
+   AM33XX_IOPAD(0x93c, PIN_INPUT_PULLDOWN | MUX_MODE2) 
/* mii1_rxd1.rgmii1_rd1 */
+   AM33XX_IOPAD(0x940, PIN_INPUT_PULLDOWN | MUX_MODE2) 
/* mii1_rxd0.rgmii1_rd0 */
+   >;
+   };
+
+   cpsw_sleep: cpsw_sleep {
+   pinctrl-single,pins = <
+   /* Slave 1 reset value */
+   AM33XX_IOPAD(0x914, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x918, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x91c, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x920, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x924, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x928, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x92c, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x930, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x934, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x938, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x93c, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x940, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   >;
+   };
+
+   davinci_mdio_default: davinci_mdio_default {
+   pinctrl-single,pins = <
+   /* MDIO */
+   AM33XX_IOPAD(0x948, PIN_INPUT_PULLUP | SLEWCTRL_FAST | 
MUX_MODE0)   /* mdio_data.mdio_data */
+   AM33XX_IOPAD(0x94c, PIN_OUTPUT_PULLUP | MUX_MODE0)  
/* mdio_clk.mdio_clk */
+   >;
+  

[PATCH v4 1/2] dt-bindings: Add vendor prefix for Sancloud

2018-07-12 Thread Koen Kooi
Add vendor prefix for Sancloud Ltd.

Signed-off-by: Koen Kooi 
Acked-by: Rob Herring 
---

v4: Add Acked-by
v3: No changes
v2: No changes
v1: Initial submission

 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt 
b/Documentation/devicetree/bindings/vendor-prefixes.txt
index 7cad066..c7aaa1f 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -314,6 +314,7 @@ rohm	ROHM Semiconductor Co., Ltd
 roofull	Shenzhen Roofull Technology Co, Ltd
 samsung	Samsung Semiconductor
 samtec	Samtec/Softing company
+sancloud	Sancloud Ltd
 sandisk	Sandisk Corporation
 sbs	Smart Battery System
 schindler	Schindler
-- 
2.0.1



[PATCH v4 0/2] ARM: dts: am3355: add support for the Sancloud Beaglebone Enhanced

2018-07-12 Thread Koen Kooi
The "Beaglebone Enhanced" by Sancloud is based on the Beaglebone Black,
but with the following differences:

 * Gigabit capable PHY
 * Extra USB hub, optional i2c control
 * lps3331ap barometer connected over i2c
 * MPU6050 6 axis MEMS accelerometer/gyro connected over i2c
 * 1GiB DDR3 RAM
 * RTL8723 Wifi/Bluetooth connected over USB

This series adds the Sancloud vendor prefix as well as the actual dts.

v4: Add Robs Acked-by for 1/2
v3: Drop 1GHz Opp tweak
v2: * Add missing #include 
* Fix barometer compatible string
v1: Initial submission, not the dts actually tested :/

Also double checked if the kbuild error has been fixed:

koen@beast:/build/pkg/linux-torvalds$ git describe
v4.18-rc4-71-gd69088d
koen@beast:/build/pkg/linux-torvalds$ ARCH=arm 
CROSS_COMPILE=arm-angstrom-linux-gnueabi- make am335x-sancloud-bbe.dtb
  DTC arch/arm/boot/dts/am335x-sancloud-bbe.dtb
koen@beast:/build/pkg/linux-torvalds$ 

Same successful result on tmlind/for-next (which has v2 already) and 
robh/for-next

Koen Kooi (2):
  dt-bindings: Add vendor prefix for Sancloud
  ARM: dts: am335x: add am335x-sancloud-bbe board support

 .../devicetree/bindings/vendor-prefixes.txt|   1 +
 arch/arm/boot/dts/Makefile |   1 +
 arch/arm/boot/dts/am335x-sancloud-bbe.dts  | 146 +
 3 files changed, 148 insertions(+)
 create mode 100644 arch/arm/boot/dts/am335x-sancloud-bbe.dts

-- 
2.0.1



[PATCH v3 2/2] ARM: dts: am335x: add am335x-sancloud-bbe board support

2018-07-12 Thread Koen Kooi
The "Beaglebone Enhanced" by Sancloud is based on the Beaglebone Black,
but with the following differences:

 * Gigabit capable PHY
 * Extra USB hub, optional i2c control
 * lps3331ap barometer connected over i2c
 * MPU6050 6 axis MEMS accelerometer/gyro connected over i2c
 * 1GiB DDR3 RAM
 * RTL8723 Wifi/Bluetooth connected over USB

Tested on a revision G board.

Signed-off-by: Koen Kooi 
---
v3: Drop oppnitro-10, not needed on the versions Sancloud is using
v2: * Add missing #include 
* Fix Barometer compatible string
v1: Initial submission

 arch/arm/boot/dts/Makefile|   1 +
 arch/arm/boot/dts/am335x-sancloud-bbe.dts | 136 ++
 2 files changed, 137 insertions(+)
 create mode 100644 arch/arm/boot/dts/am335x-sancloud-bbe.dts

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 37a3de7..83a4d61 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -695,6 +695,7 @@ dtb-$(CONFIG_SOC_AM33XX) += \
am335x-pepper.dtb \
am335x-phycore-rdk.dtb \
am335x-pocketbeagle.dtb \
+   am335x-sancloud-bbe.dtb \
am335x-shc.dtb \
am335x-sbc-t335.dtb \
am335x-sl50.dtb \
diff --git a/arch/arm/boot/dts/am335x-sancloud-bbe.dts 
b/arch/arm/boot/dts/am335x-sancloud-bbe.dts
new file mode 100644
index 000..ba5f4bd
--- /dev/null
+++ b/arch/arm/boot/dts/am335x-sancloud-bbe.dts
@@ -0,0 +1,136 @@
+/*
+ * Copyright (C) 2012 Texas Instruments Incorporated - http://www.ti.com/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+/dts-v1/;
+
+#include "am33xx.dtsi"
+#include "am335x-bone-common.dtsi"
+#include "am335x-boneblack-common.dtsi"
+#include 
+
+/ {
+   model = "SanCloud BeagleBone Enhanced";
+   compatible = "sancloud,am335x-boneenhanced", "ti,am335x-bone-black", 
"ti,am335x-bone", "ti,am33xx";
+};
+
+&am33xx_pinmux {
+   pinctrl-names = "default";
+
+   cpsw_default: cpsw_default {
+   pinctrl-single,pins = <
+   /* Slave 1 */
+   AM33XX_IOPAD(0x914, PIN_OUTPUT_PULLDOWN | MUX_MODE2)
/* mii1_txen.rgmii1_tctl */
+   AM33XX_IOPAD(0x918, PIN_INPUT_PULLDOWN | MUX_MODE2) 
/* mii1_rxdv.rgmii1_rctl */
+   AM33XX_IOPAD(0x91c, PIN_OUTPUT_PULLDOWN | MUX_MODE2)
/* mii1_txd3.rgmii1_td3 */
+   AM33XX_IOPAD(0x920, PIN_OUTPUT_PULLDOWN | MUX_MODE2)
/* mii1_txd2.rgmii1_td2 */
+   AM33XX_IOPAD(0x924, PIN_OUTPUT_PULLDOWN | MUX_MODE2)
/* mii1_txd1.rgmii1_td1 */
+   AM33XX_IOPAD(0x928, PIN_OUTPUT_PULLDOWN | MUX_MODE2)
/* mii1_txd0.rgmii1_td0 */
+   AM33XX_IOPAD(0x92c, PIN_OUTPUT_PULLDOWN | MUX_MODE2)
/* mii1_txclk.rgmii1_tclk */
+   AM33XX_IOPAD(0x930, PIN_INPUT_PULLDOWN | MUX_MODE2) 
/* mii1_rxclk.rgmii1_rclk */
+   AM33XX_IOPAD(0x934, PIN_INPUT_PULLDOWN | MUX_MODE2) 
/* mii1_rxd3.rgmii1_rd3 */
+   AM33XX_IOPAD(0x938, PIN_INPUT_PULLDOWN | MUX_MODE2) 
/* mii1_rxd2.rgmii1_rd2 */
+   AM33XX_IOPAD(0x93c, PIN_INPUT_PULLDOWN | MUX_MODE2) 
/* mii1_rxd1.rgmii1_rd1 */
+   AM33XX_IOPAD(0x940, PIN_INPUT_PULLDOWN | MUX_MODE2) 
/* mii1_rxd0.rgmii1_rd0 */
+   >;
+   };
+
+   cpsw_sleep: cpsw_sleep {
+   pinctrl-single,pins = <
+   /* Slave 1 reset value */
+   AM33XX_IOPAD(0x914, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x918, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x91c, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x920, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x924, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x928, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x92c, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x930, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x934, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x938, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x93c, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   AM33XX_IOPAD(0x940, PIN_INPUT_PULLDOWN | MUX_MODE7)
+   >;
+   };
+
+   davinci_mdio_default: davinci_mdio_default {
+   pinctrl-single,pins = <
+   /* MDIO */
+   AM33XX_IOPAD(0x948, PIN_INPUT_PULLUP | SLEWCTRL_FAST | 
MUX_MODE0)   /* mdio_data.mdio_data */
+   AM33XX_IOPAD(0x94c, PIN_OUTPUT_PULLUP | MUX_MODE0)  
/* mdio_clk.mdio_clk */
+   >;
+   };
+
+   d

[PATCH v3 1/2] dt-bindings: Add vendor prefix for Sancloud

2018-07-12 Thread Koen Kooi
Add vendor prefix for Sancloud Ltd.

Signed-off-by: Koen Kooi 
---

v3: No changes
v2: No changes
v1: Initial submission

 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt 
b/Documentation/devicetree/bindings/vendor-prefixes.txt
index 7cad066..c7aaa1f 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -314,6 +314,7 @@ rohm	ROHM Semiconductor Co., Ltd
 roofull	Shenzhen Roofull Technology Co, Ltd
 samsung	Samsung Semiconductor
 samtec	Samtec/Softing company
+sancloud	Sancloud Ltd
 sandisk	Sandisk Corporation
 sbs	Smart Battery System
 schindler	Schindler
-- 
2.0.1



[PATCH v3 0/2] ARM: dts: am3355: add support for the Sancloud Beaglebone Enhanced

2018-07-12 Thread Koen Kooi
The "Beaglebone Enhanced" by Sancloud is based on the Beaglebone Black,
but with the following differences:

 * Gigabit capable PHY
 * Extra USB hub, optional i2c control
 * lps3331ap barometer connected over i2c
 * MPU6050 6 axis MEMS accelerometer/gyro connected over i2c
 * 1GiB DDR3 RAM
 * RTL8723 Wifi/Bluetooth connected over USB

This series adds the Sancloud vendor prefix as well as the actual dts.

v3: Drop 1GHz Opp tweak
v2: * Add missing #include 
* Fix barometer compatible string
v1: Initial submission, not the dts actually tested :/

Also double checked if the kbuild error has been fixed:

koen@beast:/build/pkg/linux-torvalds$ git describe
v4.18-rc4-71-gd69088d
koen@beast:/build/pkg/linux-torvalds$ ARCH=arm 
CROSS_COMPILE=arm-angstrom-linux-gnueabi- make am335x-sancloud-bbe.dtb
  DTC arch/arm/boot/dts/am335x-sancloud-bbe.dtb
koen@beast:/build/pkg/linux-torvalds$ 

Same successful result on tmlind/for-next (which has v2 already) and 
robh/for-next

Koen Kooi (2):
  dt-bindings: Add vendor prefix for Sancloud
  ARM: dts: am335x: add am335x-sancloud-bbe board support

 .../devicetree/bindings/vendor-prefixes.txt|   1 +
 arch/arm/boot/dts/Makefile |   1 +
 arch/arm/boot/dts/am335x-sancloud-bbe.dts  | 146 +
 3 files changed, 148 insertions(+)
 create mode 100644 arch/arm/boot/dts/am335x-sancloud-bbe.dts

-- 
2.0.1



[PATCH v4 1/2] leds: core: Introduce generic pattern interface

2018-07-12 Thread Baolin Wang
From: Bjorn Andersson 

Some LED controllers have support for autonomously controlling
brightness over time, according to some preprogrammed pattern or
function.

This adds a new optional operator that LED class drivers can implement
if they support such functionality, as well as a new device attribute to
configure the pattern for a given LED.

[Baolin Wang did some improvements.]
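A hedged userspace usage sketch (the LED name and tuple values below are hypothetical;
the format follows the ABI text added further down in this patch):

#include <stdio.h>

int main(void)
{
	/* Four (brightness, duration-in-ms) tuples: off, full, full, off,
	 * 500 ms each.  "sc27xx:red" is a made-up LED name. */
	FILE *f = fopen("/sys/class/leds/sc27xx:red/pattern", "w");

	if (!f) {
		perror("pattern");
		return 1;
	}
	fprintf(f, "0 500 255 500 255 500 0 500\n");
	fclose(f);

	/* Writing an empty string to the same file disables the pattern. */
	return 0;
}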

Signed-off-by: Bjorn Andersson 
Signed-off-by: Baolin Wang 
---
Changes from v3:
 - Move the check in pattern_show() to of_led_classdev_register().
 - Add more documentation to explain how to set/clear one pattern.

Changes from v2:
 - Change kernel version to 4.19.
 - Force user to return error pointer if failed to issue pattern_get().
 - Use strstrip() to trim trailing newline.
 - Other optimization.

Changes from v1:
 - Add some comments suggested by Pavel.
 - Change 'delta_t' can be 0.

Note: I removed the pattern repeat check and will get the repeat number by adding
one extra file named 'pattern_repeat' according to previous discussion.
---
 Documentation/ABI/testing/sysfs-class-led |   20 +
 drivers/leds/led-class.c  |  118 +
 include/linux/leds.h  |   19 +
 3 files changed, 157 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-class-led 
b/Documentation/ABI/testing/sysfs-class-led
index 5f67f7a..f4b73ad 100644
--- a/Documentation/ABI/testing/sysfs-class-led
+++ b/Documentation/ABI/testing/sysfs-class-led
@@ -61,3 +61,23 @@ Description:
gpio and backlight triggers. In case of the backlight trigger,
it is useful when driving a LED which is intended to indicate
a device in a standby like state.
+
+What: /sys/class/leds//pattern
+Date:  July 2018
+KernelVersion: 4.19
+Description:
+   Specify a pattern for the LED, for LED hardware that support
+   altering the brightness as a function of time.
+
+   The pattern is given by a series of tuples, of brightness and
+   duration (ms). The LED is expected to traverse the series and
+   each brightness value for the specified duration. Duration of
+   0 means brightness should immediately change to new value.
+
+   As LED hardware might have different capabilities and precision
+   the requested pattern might be slightly adjusted by the driver
+   and the resulting pattern of such operation should be returned
+   when this file is read.
+
+   Writing a non-empty string to this file will activate the pattern,
+   and empty string will disable the pattern.
diff --git a/drivers/leds/led-class.c b/drivers/leds/led-class.c
index 3c7e348..0992a0e 100644
--- a/drivers/leds/led-class.c
+++ b/drivers/leds/led-class.c
@@ -74,6 +74,119 @@ static ssize_t max_brightness_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(max_brightness);
 
+static ssize_t pattern_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   struct led_classdev *led_cdev = dev_get_drvdata(dev);
+   struct led_pattern *pattern;
+   size_t offset = 0;
+   int count, n, i;
+
+   pattern = led_cdev->pattern_get(led_cdev, &count);
+   if (IS_ERR(pattern))
+   return PTR_ERR(pattern);
+
+   for (i = 0; i < count; i++) {
+   n = snprintf(buf + offset, PAGE_SIZE - offset, "%d %d ",
+pattern[i].brightness, pattern[i].delta_t);
+
+   if (offset + n >= PAGE_SIZE)
+   goto err_nospc;
+
+   offset += n;
+   }
+
+   buf[offset - 1] = '\n';
+
+   kfree(pattern);
+   return offset;
+
+err_nospc:
+   kfree(pattern);
+   return -ENOSPC;
+}
+
+static ssize_t pattern_store(struct device *dev,
+struct device_attribute *attr,
+const char *buf, size_t size)
+{
+   struct led_classdev *led_cdev = dev_get_drvdata(dev);
+   struct led_pattern *pattern = NULL;
+   char *sbegin, *elem, *s;
+   unsigned long val;
+   int ret = 0, len = 0;
+   bool odd = true;
+
+   sbegin = kstrndup(buf, size, GFP_KERNEL);
+   if (!sbegin)
+   return -ENOMEM;
+
+   /*
+* Trim trailing newline, if the remaining string is empty,
+* clear the pattern.
+*/
+   s = strstrip(sbegin);
+   if (!*s) {
+   if (led_cdev->pattern_clear)
+   ret = led_cdev->pattern_clear(led_cdev);
+   goto out;
+   }
+
+   pattern = kcalloc(size, sizeof(*pattern), GFP_KERNEL);
+   if (!pattern) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   /* Parse out the brightness & delta_t tuples */
+   while ((elem = strsep(&s, " ")) != NULL) {
+   ret = kstrtoul(elem, 10, &val);
+   if (ret)
+

[PATCH v4 2/2] leds: sc27xx: Add pattern_set/get/clear interfaces for LED controller

2018-07-12 Thread Baolin Wang
This patch implements the 'pattern_set', 'pattern_get' and 'pattern_clear'
interfaces to support SC27XX LED breathing mode.

Signed-off-by: Baolin Wang 
---
Changes from v3:
 - None.

Changes from v2:
 - No updates.

Changes from v1:
 - No updates.
---
 drivers/leds/leds-sc27xx-bltc.c |  160 +++
 1 file changed, 160 insertions(+)

diff --git a/drivers/leds/leds-sc27xx-bltc.c b/drivers/leds/leds-sc27xx-bltc.c
index 9d9b7aa..898f92d 100644
--- a/drivers/leds/leds-sc27xx-bltc.c
+++ b/drivers/leds/leds-sc27xx-bltc.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* PMIC global control register definition */
@@ -32,8 +33,13 @@
 #define SC27XX_DUTY_MASK	GENMASK(15, 0)
 #define SC27XX_MOD_MASK		GENMASK(7, 0)
 
+#define SC27XX_CURVE_SHIFT	8
+#define SC27XX_CURVE_L_MASK	GENMASK(7, 0)
+#define SC27XX_CURVE_H_MASK	GENMASK(15, 8)
+
 #define SC27XX_LEDS_OFFSET	0x10
 #define SC27XX_LEDS_MAX		3
+#define SC27XX_LEDS_PATTERN_CNT	4
 
 struct sc27xx_led {
char name[LED_MAX_NAME_SIZE];
@@ -122,6 +128,157 @@ static int sc27xx_led_set(struct led_classdev *ldev, enum 
led_brightness value)
return err;
 }
 
+static int sc27xx_led_pattern_clear(struct led_classdev *ldev)
+{
+   struct sc27xx_led *leds = to_sc27xx_led(ldev);
+   struct regmap *regmap = leds->priv->regmap;
+   u32 base = sc27xx_led_get_offset(leds);
+   u32 ctrl_base = leds->priv->base + SC27XX_LEDS_CTRL;
+   u8 ctrl_shift = SC27XX_CTRL_SHIFT * leds->line;
+   int err;
+
+   mutex_lock(&leds->priv->lock);
+
+   /* Reset the rise, high, fall and low time to zero. */
+   regmap_write(regmap, base + SC27XX_LEDS_CURVE0, 0);
+   regmap_write(regmap, base + SC27XX_LEDS_CURVE1, 0);
+
+   err = regmap_update_bits(regmap, ctrl_base,
+   (SC27XX_LED_RUN | SC27XX_LED_TYPE) << ctrl_shift, 0);
+
+   mutex_unlock(&leds->priv->lock);
+
+   return err;
+}
+
+static int sc27xx_led_pattern_set(struct led_classdev *ldev,
+ struct led_pattern *pattern,
+ int len)
+{
+   struct sc27xx_led *leds = to_sc27xx_led(ldev);
+   u32 base = sc27xx_led_get_offset(leds);
+   u32 ctrl_base = leds->priv->base + SC27XX_LEDS_CTRL;
+   u8 ctrl_shift = SC27XX_CTRL_SHIFT * leds->line;
+   struct regmap *regmap = leds->priv->regmap;
+   int err;
+
+   /*
+* Must contain 4 patterns to configure the rise time, high time, fall
+* time and low time to enable the breathing mode.
+*/
+   if (len != SC27XX_LEDS_PATTERN_CNT)
+   return -EINVAL;
+
+   mutex_lock(&leds->priv->lock);
+
+   err = regmap_update_bits(regmap, base + SC27XX_LEDS_CURVE0,
+SC27XX_CURVE_L_MASK, pattern[0].delta_t);
+   if (err)
+   goto out;
+
+   err = regmap_update_bits(regmap, base + SC27XX_LEDS_CURVE1,
+SC27XX_CURVE_L_MASK, pattern[1].delta_t);
+   if (err)
+   goto out;
+
+   err = regmap_update_bits(regmap, base + SC27XX_LEDS_CURVE0,
+SC27XX_CURVE_H_MASK,
+pattern[2].delta_t << SC27XX_CURVE_SHIFT);
+   if (err)
+   goto out;
+
+
+   err = regmap_update_bits(regmap, base + SC27XX_LEDS_CURVE1,
+SC27XX_CURVE_H_MASK,
+pattern[3].delta_t << SC27XX_CURVE_SHIFT);
+   if (err)
+   goto out;
+
+
+   err = regmap_update_bits(regmap, base + SC27XX_LEDS_DUTY,
+SC27XX_DUTY_MASK,
+(pattern[0].brightness << SC27XX_DUTY_SHIFT) |
+SC27XX_MOD_MASK);
+   if (err)
+   goto out;
+
+   /* Enable the LED breathing mode */
+   err = regmap_update_bits(regmap, ctrl_base,
+SC27XX_LED_RUN << ctrl_shift,
+SC27XX_LED_RUN << ctrl_shift);
+
+out:
+   mutex_unlock(&leds->priv->lock);
+
+   return err;
+}
+
+static struct led_pattern *sc27xx_led_pattern_get(struct led_classdev *ldev,
+ int *len)
+{
+   struct sc27xx_led *leds = to_sc27xx_led(ldev);
+   u32 base = sc27xx_led_get_offset(leds);
+   struct regmap *regmap = leds->priv->regmap;
+   struct led_pattern *pattern;
+   int i, err;
+   u32 val;
+
+   /*
+* Must allocate 4 patterns to show the rise time, high time, fall time
+* and low time.
+*/
+   pattern = kcalloc(SC27XX_LEDS_PATTERN_CNT, sizeof(*pattern),
+ GFP_KERNEL);
+   if (!pattern)
+   return ERR_PTR(-ENOMEM);
+
+   mutex_lock(&leds->priv->lock);
+
+   err = regmap_read(regmap, base + SC27XX_LEDS_CURVE0, &val)

Re: [patch -mm] mm, oom: remove oom_lock from exit_mmap

2018-07-12 Thread Tetsuo Handa
What a simplified description of oom_lock...

Positive effects

(1) Serialize "setting TIF_MEMDIE and calling __thaw_task()/atomic_inc() from
mark_oom_victim()" and "setting oom_killer_disabled = true from
oom_killer_disable()".

(2) Serialize all printk() messages from out_of_memory().

(3) Prevent selecting a new OOM victim when there is an !MMF_OOM_SKIP mm
which the current thread should wait for.

(4) Serialize blocking_notifier_call_chain() from out_of_memory() because some of
the callbacks might not be thread-safe and/or a serialized call might release
more memory than needed.

Negative effects

(A) Threads which called mutex_lock(&oom_lock) before calling out_of_memory()
are blocked waiting for "__oom_reap_task_mm() from exit_mmap()" and/or
"__oom_reap_task_mm() from oom_reap_task_mm()".

(B) Threads which do not call out_of_memory() because mutex_trylock(&oom_lock)
failed continue consuming CPU resources pointlessly.

Regarding (A), we can reduce the range oom_lock serializes from
"__oom_reap_task_mm()" to "setting MMF_OOM_SKIP", since oom_lock is useful for (3).
Therefore, we can apply the change below on top of your patch. But I don't like
sharing MMF_UNSTABLE for two purposes (the reason is explained below).

Regarding (B), we can do direct OOM reaping (like my proposal does).

---
 kernel/fork.c |  5 +
 mm/mmap.c | 21 +
 mm/oom_kill.c | 57 ++---
 3 files changed, 36 insertions(+), 47 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 6747298..f37d481 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -984,6 +984,11 @@ static inline void __mmput(struct mm_struct *mm)
}
if (mm->binfmt)
module_put(mm->binfmt->module);
+   if (unlikely(mm_is_oom_victim(mm))) {
+   mutex_lock(&oom_lock);
+   set_bit(MMF_OOM_SKIP, &mm->flags);
+   mutex_unlock(&oom_lock);
+   }
mmdrop(mm);
 }
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 7f918eb..203061f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3075,19 +3075,17 @@ void exit_mmap(struct mm_struct *mm)
__oom_reap_task_mm(mm);
 
/*
-* Now, set MMF_UNSTABLE to avoid racing with the oom reaper.
-* This needs to be done before calling munlock_vma_pages_all(),
-* which clears VM_LOCKED, otherwise the oom reaper cannot
-* reliably test for it.  If the oom reaper races with
-* munlock_vma_pages_all(), this can result in a kernel oops if
-* a pmd is zapped, for example, after follow_page_mask() has
-* checked pmd_none().
+* Wait for the oom reaper to complete. This needs to be done
+* before calling munlock_vma_pages_all(), which clears
+* VM_LOCKED, otherwise the oom reaper cannot reliably test for
+* it. If the oom reaper races with munlock_vma_pages_all(),
+* this can result in a kernel oops if a pmd is zapped, for
+* example, after follow_page_mask() has checked pmd_none().
 *
-* Taking mm->mmap_sem for write after setting MMF_UNSTABLE will
-* guarantee that the oom reaper will not run on this mm again
-* after mmap_sem is dropped.
+* Taking mm->mmap_sem for write will guarantee that the oom
+* reaper will not run on this mm again after mmap_sem is
+* dropped.
 */
-   set_bit(MMF_UNSTABLE, &mm->flags);
down_write(&mm->mmap_sem);
up_write(&mm->mmap_sem);
}
@@ -3115,7 +3113,6 @@ void exit_mmap(struct mm_struct *mm)
unmap_vmas(&tlb, vma, 0, -1);
free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb, 0, -1);
-   set_bit(MMF_OOM_SKIP, &mm->flags);
 
/*
 * Walk the list again, actually closing and freeing it,
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index e6328ce..7ed4ed0 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -488,11 +488,9 @@ void __oom_reap_task_mm(struct mm_struct *mm)
 * Tell all users of get_user/copy_from_user etc... that the content
 * is no longer stable. No barriers really needed because unmapping
 * should imply barriers already and the reader would hit a page fault
-* if it stumbled over a reaped memory. If MMF_UNSTABLE is already set,
-* reaping as already occurred so nothing left to do.
+* if it stumbled over a reaped memory.
 */
-   if (test_and_set_bit(MMF_UNSTABLE, &mm->flags))
-   return;
+   set_bit(MMF_UNSTABLE, &mm->flags);
 
for (vma = mm->mmap ; vma; vma = vma->vm_next) {
if (!can_madv_dontneed_vma(vma))
@@ -524,25 +522,9 @@ void __oom_reap_task_mm(struct mm_struct *mm)
 
 s

Re: [PATCH] perf tools: Synthesize GROUP_DESC feature in pipe mode

2018-07-12 Thread Stephane Eranian
Hi Jiri,
On Thu, Jul 12, 2018 at 9:49 AM Jiri Olsa  wrote:
>
> On Thu, Jul 12, 2018 at 09:34:45AM -0700, Stephane Eranian wrote:
> > Hi Jiri,
> > On Thu, Jul 12, 2018 at 6:52 AM Jiri Olsa  wrote:
> > >
> > > Stephane reported that pipe mode does not carry the group
> > > information and thus the piped report won't display the
> > > grouped output for the following command:
> > >
> > Thanks for fixing this quickly.
>
> could I have your tested/acked by?
>
Acked-by: Stephane Eranian 

> > I think we should have more testing on the pipe mode, in general.
>
> yea, we should
>
> jirka
>
> >
> > >   # perf record -e '{cycles,instructions,branches}' -a sleep 4 | perf 
> > > report
> > >
> > > It has no idea about the group setup, so it will display
> > > events separately:
> > >
> > >   # Overhead  Command  Shared Object ...
> > >   #   ...  ...
> > >   #
> > >6.71%  swapper  [kernel.kallsyms]
> > >2.28%  offlineimap  libpython2.7.so.1.0
> > >0.78%  perf [kernel.kallsyms]
> > >   ...
> > >
> > > Fixing GROUP_DESC feature record to be synthesized in pipe mode,
> > > so the report output is grouped if there's group defined in record:
> > >
> > >   # Overhead  Command  Shared...
> > >   #   ...  ...
> > >   #
> > >7.57%   0.16%   0.30%  swapper  [kernel
> > >1.87%   3.15%   2.46%  offlineimap  libpyth
> > >1.33%   0.00%   0.00%  perf [kernel
> > >   ...
> > >
> > > Cc: David Carrillo-Cisneros 
> > > Reported-by: Stephane Eranian 
> > > Link: http://lkml.kernel.org/n/tip-ybqyh8ac4g173iy3xt4px...@git.kernel.org
> > > Signed-off-by: Jiri Olsa 
> > > ---
> > >  tools/perf/util/header.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> > > index 59fcc790c865..af9aaf28f976 100644
> > > --- a/tools/perf/util/header.c
> > > +++ b/tools/perf/util/header.c
> > > @@ -2587,7 +2587,7 @@ static const struct feature_ops 
> > > feat_ops[HEADER_LAST_FEATURE] = {
> > > FEAT_OPR(NUMA_TOPOLOGY, numa_topology,  true),
> > > FEAT_OPN(BRANCH_STACK,  branch_stack,   false),
> > > FEAT_OPR(PMU_MAPPINGS,  pmu_mappings,   false),
> > > -   FEAT_OPN(GROUP_DESC,group_desc, false),
> > > +   FEAT_OPR(GROUP_DESC,group_desc, false),
> > > FEAT_OPN(AUXTRACE,  auxtrace,   false),
> > > FEAT_OPN(STAT,  stat,   false),
> > > FEAT_OPN(CACHE, cache,  true),
> > > --
> > > 2.17.1
> > >


Re: [PATCH v2 2/3] dmaengine: imx-sdma: add memcpy interface

2018-07-12 Thread Sascha Hauer
On Fri, Jul 13, 2018 at 09:08:46PM +0800, Robin Gong wrote:
> Add MEMCPY capability for imx-sdma driver.
> 
> Signed-off-by: Robin Gong 
> ---
>  drivers/dma/imx-sdma.c | 95 
> --
>  1 file changed, 92 insertions(+), 3 deletions(-)
> 
> @@ -1318,6 +1347,63 @@ static struct sdma_desc *sdma_transfer_init(struct 
> sdma_channel *sdmac,
>   return NULL;
>  }
>  
> +static struct dma_async_tx_descriptor *sdma_prep_memcpy(
> + struct dma_chan *chan, dma_addr_t dma_dst,
> + dma_addr_t dma_src, size_t len, unsigned long flags)
> +{
> + struct sdma_channel *sdmac = to_sdma_chan(chan);
> + struct sdma_engine *sdma = sdmac->sdma;
> + int channel = sdmac->channel;
> + size_t count;
> + int i = 0, param;
> + struct sdma_buffer_descriptor *bd;
> + struct sdma_desc *desc;
> +
> + if (!chan || !len)
> + return NULL;
> +
> + dev_dbg(sdma->dev, "memcpy: %pad->%pad, len=%zu, channel=%d.\n",
> + &dma_src, &dma_dst, len, channel);
> +
> + desc = sdma_transfer_init(sdmac, DMA_MEM_TO_MEM,
> + len / SDMA_BD_MAX_CNT + 1);
> + if (!desc)
> + return NULL;
> +
> + do {
> + count = min_t(size_t, len, SDMA_BD_MAX_CNT);
> + bd = &desc->bd[i];
> + bd->buffer_addr = dma_src;
> + bd->ext_buffer_addr = dma_dst;
> + bd->mode.count = count;
> + desc->chn_count += count;
> + /* align with sdma->dma_device.copy_align: 4bytes */
> + bd->mode.command = 0;
> +
> + dma_src += count;
> + dma_dst += count;
> + len -= count;
> + i++;

NACK.

Please actually look at your code and find out where you do unaligned DMA
accesses. Hint: What happens when this loop body is executed more than once?
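For readers following along, one possible shape of the fix (a sketch, assuming
SDMA_BD_MAX_CNT is 0xffff as introduced in patch 1/3 of this series): clamp each buffer
descriptor to a multiple of the advertised 4-byte copy_align so dma_src/dma_dst stay
aligned when the loop runs more than once, e.g.:

		/*
		 * Sketch only: 0xffff is not a multiple of 4, so after the
		 * first BD the addresses lose their alignment.  Capping each
		 * BD at an aligned count avoids that (the number of BDs
		 * requested from sdma_transfer_init() would need the same
		 * adjustment).
		 */
		count = min_t(size_t, len, ALIGN_DOWN(SDMA_BD_MAX_CNT, 4));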

Sascha

> +
> + param = BD_DONE | BD_EXTD | BD_CONT;
> + /* last bd */
> + if (!len) {
> + param |= BD_INTR;
> + param |= BD_LAST;
> + param &= ~BD_CONT;
> + }
> +
> + dev_dbg(sdma->dev, "entry %d: count: %zd dma: 0x%x %s%s\n",
> + i, count, bd->buffer_addr,
> + param & BD_WRAP ? "wrap" : "",
> + param & BD_INTR ? " intr" : "");
> +
> + bd->mode.status = param;
> + } while (len);
> +

-- 
Pengutronix e.K.   | |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |


Re: [PATCH 1/2] ARM: dts: imx51-zii-scu3-esb: Add switch IRQ line pinumx config

2018-07-12 Thread Andrey Smirnov
On Thu, Jul 12, 2018 at 6:37 AM Fabio Estevam  wrote:
>
> Hi Andrey,
>
> On Wed, Jul 11, 2018 at 11:33 PM, Andrey Smirnov
>  wrote:
>
> > +   pinctrl_switch: switchgrp {
> > +   fsl,pins = <
> > +   MX51_PAD_AUD3_BB_CK__GPIO4_20   0xc5
>
> The i.MX51 Reference Manual states that 0xa5 is the default reset
> value for the register IOMUXC_SW_PAD_CTL_PAD_AUD3_BB_CK.
>
> By reading your commit log I had the impression you wanted to provide
> the default value explicitly.
>
> Please clarify.

I wanted to avoid relying on defaults, be it register reset values or
settings that the bootloader left us with. The default value of 0xa5 works,
but, given that the pin is IRQ_TYPE_LEVEL_HIGH, I thought it would be
better to configure it with a pulldown. Do you want me to add that
to the commit log?

Thanks,
Andrey Smirnov


Re: [PATCH] PCI/AER: Enable SERR# forwarding in non ACPI flow

2018-07-12 Thread poza

On 2018-07-12 20:15, Bharat Kumar Gogada wrote:

Currently PCI_BRIDGE_CTL_SERR is being enabled only in the
ACPI flow.
This bit is required for forwarding errors reported
by EP devices to the upstream device.
This patch enables SERR# for Type-1 PCI devices.

Signed-off-by: Bharat Kumar Gogada 
---
 drivers/pci/pcie/aer.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a2e8838..943e084 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -343,6 +343,19 @@ int pci_enable_pcie_error_reporting(struct pci_dev 
*dev)

if (!dev->aer_cap)
return -EIO;

+   if (!IS_ENABLED(CONFIG_ACPI) &&
+   dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   u16 control;
+
+   /*
+* A Type-1 PCI bridge will not forward ERR_ messages coming
+* from an endpoint if SERR# forwarding is not enabled.
+*/
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &control);
+   control |= PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+
 	return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, 
PCI_EXP_AER_FLAGS);

 }
 EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
@@ -352,6 +365,16 @@ int pci_disable_pcie_error_reporting(struct 
pci_dev *dev)

if (pcie_aer_get_firmware_first(dev))
return -EIO;

+   if (!IS_ENABLED(CONFIG_ACPI) &&
+   dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   u16 control;
+
+   /* Clear SERR Forwarding */
+   pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &control);
+   control &= ~PCI_BRIDGE_CTL_SERR;
+   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+   }
+
return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
  PCI_EXP_AER_FLAGS);
 }



Should this configuration not be set by firmware? Why should Linux
dictate it?


Regards,
Oza.




[PATCH v2 3/3] ARM: configs: imx_v6_v7_defconfig: add DMATEST support

2018-07-12 Thread Robin Gong
Add DMATEST support and remove stale options: CONFIG_BT_HCIUART_H4 is
enabled by default, and CONFIG_SND_SOC_IMX_WM8962 is out of date and no
longer appears in any config file. Please refer to
Documentation/driver-api/dmaengine/dmatest.rst to test the MEMCPY feature
of imx-sdma.

Signed-off-by: Robin Gong 
---
 arch/arm/configs/imx_v6_v7_defconfig | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/configs/imx_v6_v7_defconfig 
b/arch/arm/configs/imx_v6_v7_defconfig
index e381d05..f28d4d9 100644
--- a/arch/arm/configs/imx_v6_v7_defconfig
+++ b/arch/arm/configs/imx_v6_v7_defconfig
@@ -81,7 +81,6 @@ CONFIG_CAN=y
 CONFIG_CAN_FLEXCAN=y
 CONFIG_BT=y
 CONFIG_BT_HCIUART=y
-CONFIG_BT_HCIUART_H4=y
 CONFIG_BT_HCIUART_LL=y
 CONFIG_CFG80211=y
 CONFIG_CFG80211_WEXT=y
@@ -282,7 +281,6 @@ CONFIG_SND_SOC_FSL_ASRC=y
 CONFIG_SND_IMX_SOC=y
 CONFIG_SND_SOC_PHYCORE_AC97=y
 CONFIG_SND_SOC_EUKREA_TLV320=y
-CONFIG_SND_SOC_IMX_WM8962=y
 CONFIG_SND_SOC_IMX_ES8328=y
 CONFIG_SND_SOC_IMX_SGTL5000=y
 CONFIG_SND_SOC_IMX_SPDIF=y
@@ -371,6 +369,7 @@ CONFIG_DMADEVICES=y
 CONFIG_FSL_EDMA=y
 CONFIG_IMX_SDMA=y
 CONFIG_MXS_DMA=y
+CONFIG_DMATEST=m
 CONFIG_STAGING=y
 CONFIG_STAGING_MEDIA=y
 CONFIG_VIDEO_IMX_MEDIA=y
-- 
2.7.4



[PATCH v2 2/3] dmaengine: imx-sdma: add memcpy interface

2018-07-12 Thread Robin Gong
Add MEMCPY capability for imx-sdma driver.

Signed-off-by: Robin Gong 
---
 drivers/dma/imx-sdma.c | 95 --
 1 file changed, 92 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index e3d5e73..ef50f2c 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -342,6 +342,7 @@ struct sdma_desc {
  * @pc_from_device: script address for those device_2_memory
  * @pc_to_device:  script address for those memory_2_device
  * @device_to_device:  script address for those device_2_device
+ * @pc_to_pc:  script address for those memory_2_memory
  * @flags: loop mode or not
  * @per_address:   peripheral source or destination address in common case
  *  destination address in p_2_p case
@@ -367,6 +368,7 @@ struct sdma_channel {
enum dma_slave_buswidth word_size;
 	unsigned int		pc_from_device, pc_to_device;
 	unsigned int		device_to_device;
+	unsigned int		pc_to_pc;
unsigned long   flags;
dma_addr_t  per_address, per_address2;
unsigned long   event_mask[2];
@@ -869,14 +871,16 @@ static void sdma_get_pc(struct sdma_channel *sdmac,
 * These are needed once we start to support transfers between
 * two peripherals or memory-to-memory transfers
 */
-   int per_2_per = 0;
+   int per_2_per = 0, emi_2_emi = 0;
 
sdmac->pc_from_device = 0;
sdmac->pc_to_device = 0;
sdmac->device_to_device = 0;
+   sdmac->pc_to_pc = 0;
 
switch (peripheral_type) {
case IMX_DMATYPE_MEMORY:
+   emi_2_emi = sdma->script_addrs->ap_2_ap_addr;
break;
case IMX_DMATYPE_DSP:
emi_2_per = sdma->script_addrs->bp_2_ap_addr;
@@ -949,6 +953,7 @@ static void sdma_get_pc(struct sdma_channel *sdmac,
sdmac->pc_from_device = per_2_emi;
sdmac->pc_to_device = emi_2_per;
sdmac->device_to_device = per_2_per;
+   sdmac->pc_to_pc = emi_2_emi;
 }
 
 static int sdma_load_context(struct sdma_channel *sdmac)
@@ -965,6 +970,8 @@ static int sdma_load_context(struct sdma_channel *sdmac)
load_address = sdmac->pc_from_device;
else if (sdmac->direction == DMA_DEV_TO_DEV)
load_address = sdmac->device_to_device;
+   else if (sdmac->direction == DMA_MEM_TO_MEM)
+   load_address = sdmac->pc_to_pc;
else
load_address = sdmac->pc_to_device;
 
@@ -1214,10 +1221,28 @@ static int sdma_alloc_chan_resources(struct dma_chan 
*chan)
 {
struct sdma_channel *sdmac = to_sdma_chan(chan);
struct imx_dma_data *data = chan->private;
+   struct imx_dma_data mem_data;
int prio, ret;
 
-   if (!data)
-   return -EINVAL;
+   /*
+* MEMCPY may never setup chan->private by filter function such as
+* dmatest, thus create 'struct imx_dma_data mem_data' for this case.
+* Please note in any other slave case, you have to setup chan->private
+* with 'struct imx_dma_data' in your own filter function if you want to
+* request dma channel by dma_request_channel() rather than
+* dma_request_slave_channel(). Otherwise, 'MEMCPY in case?' will appear
+* to warn you to correct your filter function.
+*/
+   if (!data) {
+   dev_dbg(sdmac->sdma->dev, "MEMCPY in case?\n");
+   mem_data.priority = 2;
+   mem_data.peripheral_type = IMX_DMATYPE_MEMORY;
+   mem_data.dma_request = 0;
+   mem_data.dma_request2 = 0;
+   data = &mem_data;
+
+   sdma_get_pc(sdmac, IMX_DMATYPE_MEMORY);
+   }
 
switch (data->priority) {
case DMA_PRIO_HIGH:
@@ -1307,6 +1332,10 @@ static struct sdma_desc *sdma_transfer_init(struct 
sdma_channel *sdmac,
if (sdma_alloc_bd(desc))
goto err_desc_out;
 
+   /* No slave_config called in MEMCPY case, so do here */
+   if (direction == DMA_MEM_TO_MEM)
+   sdma_config_ownership(sdmac, false, true, false);
+
if (sdma_load_context(sdmac))
goto err_desc_out;
 
@@ -1318,6 +1347,63 @@ static struct sdma_desc *sdma_transfer_init(struct 
sdma_channel *sdmac,
return NULL;
 }
 
+static struct dma_async_tx_descriptor *sdma_prep_memcpy(
+   struct dma_chan *chan, dma_addr_t dma_dst,
+   dma_addr_t dma_src, size_t len, unsigned long flags)
+{
+   struct sdma_channel *sdmac = to_sdma_chan(chan);
+   struct sdma_engine *sdma = sdmac->sdma;
+   int channel = sdmac->channel;
+   size_t count;
+   int i = 0, param;
+   struct sdma_buffer_descriptor *bd;
+   struct sdma_desc *desc;
+
+   if (!chan || !len)
+   return NULL;
+
+   dev

[lkp-robot] [xarray] f0b90e702f: BUG:soft_lockup-CPU##stuck_for#s

2018-07-12 Thread kernel test robot

FYI, we noticed the following commit (built with gcc-7):

commit: f0b90e702fe74fa575b7382ec3474d341098d5b1 ("xarray: Add XArray 
unconditional store operations")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: boot

on test machine: qemu-system-i386 -enable-kvm -cpu Haswell,+smep,+smap -m 360M

caused below changes (please refer to attached dmesg/kmsg for entire 
log/backtrace):


++++
|| 3d730c4294 | f0b90e702f |
++++
| boot_successes | 0  | 0  |
| boot_failures  | 14 | 25 |
| WARNING:at_mm/slab_common.c:#kmalloc_slab  | 14 | 25 |
| EIP:kmalloc_slab   | 14 | 25 |
| Mem-Info   | 14 | 25 |
| INFO:trying_to_register_non-static_key | 14 | 25 |
| BUG:unable_to_handle_kernel| 14 ||
| Oops:#[##] | 14 ||
| EIP:__pci_epf_register_driver  | 14 ||
| Kernel_panic-not_syncing:Fatal_exception   | 14 ||
| BUG:soft_lockup-CPU##stuck_for#s   | 0  | 25 |
| EIP:xa_entry   | 0  | 5  |
| Kernel_panic-not_syncing:softlockup:hung_tasks | 0  | 25 |
| EIP:xa_is_node | 0  | 8  |
| EIP:xas_load   | 0  | 2  |
| EIP:debug_lockdep_rcu_enabled  | 0  | 1  |
| EIP:xa_load| 0  | 3  |
| EIP:xas_descend| 0  | 2  |
| EIP:xa_head| 0  | 1  |
| EIP:xas_start  | 0  | 3  |
++++



[   44.03] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:1]
[   44.03] irq event stamp: 1072387
[   44.03] hardirqs last  enabled at (1072387): [<4106ebde>] 
console_unlock+0x3f3/0x42d
[   44.03] hardirqs last disabled at (1072386): [<4106e84f>] 
console_unlock+0x64/0x42d
[   44.03] softirqs last  enabled at (1072364): [<417ecbeb>] 
__do_softirq+0x183/0x1b3
[   44.03] softirqs last disabled at (1072357): [<41007967>] 
do_softirq_own_stack+0x1d/0x23
[   44.03] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 
4.18.0-rc3-00012-gf0b90e7 #169
[   44.03] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[   44.03] EIP: xa_is_node+0x0/0x1a
[   44.03] Code: 89 73 08 89 7b 0c eb 0b 39 43 14 72 0c 8b 75 ec 8b 7d f0 
89 73 10 89 7b 14 8d 4d ec 89 d8 e8 88 fe ff ff 5a 59 5b 5e 5f 5d c3 <89> c2 55 
83 e2 03 83 fa 02 89 e5 0f 94 c2 3d 00 10 00 00 0f 97 c0 
[   44.03] EAX: 4c93caf2 EBX: 5442fec0 ECX: 4c93caf2 EDX: 0001
[   44.03] ESI:  EDI:  EBP: 5442feb4 ESP: 5442feac
[   44.03] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00200293
[   44.03] CR0: 80050033 CR2:  CR3: 01d27000 CR4: 000406b0
[   44.03] Call Trace:
[   44.03]  ? xas_load+0x26/0x2f
[   44.03]  ? xa_load+0x35/0x52
[   44.03]  ? xarray_checks+0x8c2/0x984
[   44.03]  ? check_xa_tag_1+0x308/0x308
[   44.03]  ? do_one_initcall+0x6a/0x13c
[   44.03]  ? parse_args+0xd9/0x1e3
[   44.03]  ? kernel_init_freeable+0xe1/0x172
[   44.03]  ? rest_init+0xaf/0xaf
[   44.03]  ? kernel_init+0x8/0xd0
[   44.03]  ? ret_from_fork+0x19/0x24
[   44.03] Kernel panic - not syncing: softlockup: hung tasks
[   44.03] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GWL
4.18.0-rc3-00012-gf0b90e7 #169
[   44.03] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[   44.03] Call Trace:
[   44.03]  ? dump_stack+0x79/0xab
[   44.03]  ? panic+0x99/0x1d8
[   44.03]  ? watchdog_timer_fn+0x1ac/0x1d3
[   44.03]  ? __hrtimer_run_queues+0xa0/0x114
[   44.03]  ? watchdog+0x16/0x16
[   44.03]  ? hrtimer_run_queues+0xd2/0xe5
[   44.03]  ? run_local_timers+0x15/0x39
[   44.03]  ? update_process_times+0x18/0x39
[   44.03]  ? tick_nohz_handler+0xba/0xfb
[   44.03]  ? smp_apic_timer_interrupt+0x54/0x67
[   44.03]  ? apic_timer_interrupt+0x41/0x48
[   44.03]  ? siphash_2u64+0x54f/0x7de
[   44.03]  ? minmax_running_min+0x6f/0x6f
[   44.03]  ? xas_load+0x26/0x2f
[   44.03]  ? xa_load+0x35/0x52
[   44.03]  ? xarray_checks+0x8c2/0x984
[   44.03]  ? check_xa_tag_1+0x308/0x3

[PATCH v2 1/3] dmaengine: imx-sdma: add SDMA_BD_MAX_CNT to replace '0xffff'

2018-07-12 Thread Robin Gong
Add macro SDMA_BD_MAX_CNT to replace '0xffff'.

Signed-off-by: Robin Gong 
---
 drivers/dma/imx-sdma.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index 3b622d6..e3d5e73 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -185,6 +185,7 @@
  * Mode/Count of data node descriptors - IPCv2
  */
 struct sdma_mode_count {
+#define SDMA_BD_MAX_CNT	0xffff
u32 count   : 16; /* size of the buffer pointed by this BD */
u32 status  :  8; /* E,R,I,C,W,D status bits stored here */
u32 command :  8; /* command mostly used for channel 0 */
@@ -1344,9 +1345,9 @@ static struct dma_async_tx_descriptor *sdma_prep_slave_sg(
 
count = sg_dma_len(sg);
 
-   if (count > 0xffff) {
+   if (count > SDMA_BD_MAX_CNT) {
dev_err(sdma->dev, "SDMA channel %d: maximum bytes for 
sg entry exceeded: %d > %d\n",
-   channel, count, 0xffff);
+   channel, count, SDMA_BD_MAX_CNT);
goto err_bd_out;
}
 
@@ -1421,9 +1422,9 @@ static struct dma_async_tx_descriptor 
*sdma_prep_dma_cyclic(
 
sdmac->flags |= IMX_DMA_SG_LOOP;
 
-   if (period_len > 0xffff) {
+   if (period_len > SDMA_BD_MAX_CNT) {
dev_err(sdma->dev, "SDMA channel %d: maximum period size 
exceeded: %zu > %d\n",
-   channel, period_len, 0xffff);
+   channel, period_len, SDMA_BD_MAX_CNT);
goto err_bd_out;
}
 
@@ -1970,7 +1971,7 @@ static int sdma_probe(struct platform_device *pdev)
sdma->dma_device.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT;
sdma->dma_device.device_issue_pending = sdma_issue_pending;
sdma->dma_device.dev->dma_parms = &sdma->dma_parms;
-   dma_set_max_seg_size(sdma->dma_device.dev, 65535);
+   dma_set_max_seg_size(sdma->dma_device.dev, SDMA_BD_MAX_CNT);
 
platform_set_drvdata(pdev, sdma);
 
-- 
2.7.4



[PATCH v2 0/3] add memcpy support for sdma

2018-07-12 Thread Robin Gong
This patchset adds a memcpy interface for imx-sdma and, in addition, adds
DMATEST support to the defconfig by default, so that DMA can be tested
easily without any other device support such as uart/audio/spi...
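For anyone wanting to exercise the new path by hand rather than via dmatest,
a minimal sketch of generic dmaengine memcpy usage looks roughly like this
(not code from this series; dst_dma/src_dma/len are assumed to be an already
mapped destination, source and length):

        dma_cap_mask_t mask;
        struct dma_chan *chan;
        struct dma_async_tx_descriptor *tx;

        dma_cap_zero(mask);
        dma_cap_set(DMA_MEMCPY, mask);
        /* No filter function, so chan->private stays NULL (the memcpy case). */
        chan = dma_request_channel(mask, NULL, NULL);
        if (!chan)
                return -ENODEV;

        tx = dmaengine_prep_dma_memcpy(chan, dst_dma, src_dma, len, DMA_CTRL_ACK);
        if (!tx) {
                dma_release_channel(chan);
                return -EIO;
        }
        dmaengine_submit(tx);
        dma_async_issue_pending(chan);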

Change from v1:
  1. remove the bus_width check for memcpy since only the max bus width is
 needed in the memcpy case, to speed up the copy.
  2. remove DMATEST support patch, since DMATEST is a common memcpy case.
  3. split the SDMA_BD_MAX_CNT change (replacing '0xffff') into its own patch
  4. move sdma_config_ownership() from alloc_chan into sdma_prep_memcpy.
  5. address some minor review comments.

Robin Gong (3):
  dmaengine: imx-sdma: add SDMA_BD_MAX_CNT to replace '0xffff'
  dmaengine: imx-sdma: add memcpy interface
  ARM: configs: imx_v6_v7_defconfig: add DMATEST support

 arch/arm/configs/imx_v6_v7_defconfig |   3 +-
 drivers/dma/imx-sdma.c   | 106 ---
 2 files changed, 99 insertions(+), 10 deletions(-)

-- 
2.7.4



Re: [PATCH] vfio-pci: Disable binding to PFs with SR-IOV enabled

2018-07-12 Thread Peter Xu
On Thu, Jul 12, 2018 at 04:33:04PM -0600, Alex Williamson wrote:
> We expect to receive PFs with SR-IOV disabled, however some host
> drivers leave SR-IOV enabled at unbind.  This puts us in a state where
> we can potentially assign both the PF and the VF, leading to both
> functionality as well as security concerns due to lack of managing the
> SR-IOV state as well as vendor dependent isolation from the PF to VF.
> If we were to attempt to actively disable SR-IOV on driver probe, we
> risk VF bound drivers blocking, potentially risking live lock
> scenarios.  Therefore simply refuse to bind to PFs with SR-IOV enabled
> with a warning message indicating the issue.  Users can resolve this
> by re-binding to the host driver and disabling SR-IOV before
> attempting to use the device with vfio-pci.
> 
> Signed-off-by: Alex Williamson 
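
The patch body isn't quoted above, but the probe-time check being described
amounts to something like the following sketch (pci_num_vf()/pci_warn() are
existing PCI helpers; the message text here is only illustrative):

        /* In vfio_pci_probe(): refuse PFs that still have VFs enabled. */
        if (pci_num_vf(pdev)) {
                pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n");
                return -EBUSY;
        }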

Reviewed-by: Peter Xu 

-- 
Peter Xu


[RFC PATCH] vfio/pci: map prefetchble bars as writecombine

2018-07-12 Thread Srinath Mannam
By default all BARs map with VMA access permissions
as pgprot_noncached.

On ARM64, pgprot_noncached is MT_DEVICE_nGnRnE, which
is strongly ordered and only allows aligned accesses.
This type of mapping works for NON-PREFETCHABLE bars
containing EP controller registers.
But it restricts PREFETCHABLE bars from doing
unaligned access.

In CMB NVMe drives PREFETCHABLE bars are required to
map as MT_NORMAL_NC to do unaligned access.

Signed-off-by: Srinath Mannam 
Reviewed-by: Ray Jui 
Reviewed-by: Vikram Prakash 
---
 drivers/vfio/pci/vfio_pci.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index b423a30..eff6b65 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1142,7 +1142,10 @@ static int vfio_pci_mmap(void *device_data, struct 
vm_area_struct *vma)
}
 
vma->vm_private_data = vdev;
-   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+   if (pci_resource_flags(pdev, index) & IORESOURCE_PREFETCH)
+   vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+   else
+   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
 
return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
-- 
2.7.4




Re: [PATCH v2] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire

2018-07-12 Thread Paul E. McKenney
On Thu, Jul 12, 2018 at 07:05:39PM -0700, Daniel Lustig wrote:
> On 7/12/2018 11:10 AM, Linus Torvalds wrote:
> > On Thu, Jul 12, 2018 at 11:05 AM Peter Zijlstra  
> > wrote:
> >>
> >> The locking pattern is fairly simple and shows where RCpc comes apart
> >> from expectation real nice.
> > 
> > So who does RCpc right now for the unlock-lock sequence? Somebody
> > mentioned powerpc. Anybody else?
> > 
> > How nasty would be be to make powerpc conform? I will always advocate
> > tighter locking and ordering rules over looser ones..
> > 
> > Linus
> 
> RISC-V probably would have been RCpc if we weren't having this discussion.
> Depending on how we map atomics/acquire/release/unlock/lock, we can end up
> producing RCpc, "RCtso" (feel free to find a better name here...), or RCsc
> behaviors, and we're trying to figure out which we actually need.
> 
> I think the debate is this:
> 
> Obviously programmers would prefer just to have RCsc and not have to figure 
> out
> all the complexity of the other options.  On x86 or architectures with native
> RCsc operations (like ARMv8), that's generally easy enough to get.
> 
> For weakly-ordered architectures that use fences for ordering (including
> PowerPC and sometimes RISC-V, see below), though, it takes extra fences to go
> from RCpc to either "RCtso" or RCsc.  People using these architectures are
> concerned about whether there's a negative performance impact from those extra
> fences.
> 
> However, some scheduler code, some RCU code, and probably some other examples
> already implicitly or explicitly assume unlock()/lock() provides stronger
> ordering than RCpc.

Just to be clear, the RCU code uses smp_mb__after_unlock_lock() to get
the ordering that it needs out of spinlocks.  Maybe that is what you
meant by "explicitly assume", but I figured I should clarify.

Thanx, Paul



Re: Consolidating RCU-bh, RCU-preempt, and RCU-sched

2018-07-12 Thread Paul E. McKenney
On Fri, Jul 13, 2018 at 11:47:18AM +0800, Lai Jiangshan wrote:
> On Fri, Jul 13, 2018 at 8:02 AM, Paul E. McKenney
>  wrote:
> > Hello!
> >
> > I now have a semi-reasonable prototype of changes consolidating the
> > RCU-bh, RCU-preempt, and RCU-sched update-side APIs in my -rcu tree.
> > There are likely still bugs to be fixed and probably other issues as well,
> > but a prototype does exist.
> >
> > Assuming continued good rcutorture results and no objections, I am
> > thinking in terms of this timeline:
> >
> > o   Preparatory work and cleanups are slated for the v4.19 merge window.
> >
> > o   The actual consolidation and post-consolidation cleanup is slated
> > for the merge window after v4.19 (v5.0?).  These cleanups include
> > the replacements called out below within the RCU implementation
> > itself (but excluding kernel/rcu/sync.c, see question below).
> >
> > o   Replacement of now-obsolete update APIs is slated for the second
> > merge window after v4.19 (v5.1?).  The replacements are currently
> > expected to be as follows:
> >
> > synchronize_rcu_bh() -> synchronize_rcu()
> > synchronize_rcu_bh_expedited() -> synchronize_rcu_expedited()
> > call_rcu_bh() -> call_rcu()
> > rcu_barrier_bh() -> rcu_barrier()
> > synchronize_sched() -> synchronize_rcu()
> > synchronize_sched_expedited() -> synchronize_rcu_expedited()
> > call_rcu_sched() -> call_rcu()
> > rcu_barrier_sched() -> rcu_barrier()
> > get_state_synchronize_sched() -> get_state_synchronize_rcu()
> > cond_synchronize_sched() -> cond_synchronize_rcu()
> > synchronize_rcu_mult() -> synchronize_rcu()
> >
> > I have done light testing of these replacements with good results.
> >
> > Any objections to this timeline?
> >
> > I also have some questions on the ultimate end point.  I have default
> > choices, which I will likely take if there is no discussion.
> >
> > o
> > Currently, I am thinking in terms of keeping the per-flavor
> > read-side functions.  For example, rcu_read_lock_bh() would
> > continue to disable softirq, and would also continue to tell
> > lockdep about the RCU-bh read-side critical section.  However,
> > synchronize_rcu() will wait for all flavors of read-side critical
> > sections, including those introduced by (say) preempt_disable(),
> > so there will no longer be any possibility of mismatching (say)
> > RCU-bh readers with RCU-sched updaters.
> >
> > I could imagine other ways of handling this, including:
> >
> > a.  Eliminate rcu_read_lock_bh() in favor of
> > local_bh_disable() and so on.  Rely on lockdep
> > instrumentation of these other functions to identify RCU
> > readers, introducing such instrumentation as needed.  I am
> > not a fan of this approach because of the large number of
> > places in the Linux kernel where interrupts, preemption,
> > and softirqs are enabled or disabled "behind the scenes".
> >
> > b.  Eliminate rcu_read_lock_bh() in favor of rcu_read_lock(),
> > and required callers to also disable softirqs, preemption,
> > or whatever as needed.  I am not a fan of this approach
> > because it seems a lot less convenient to users of RCU-bh
> > and RCU-sched.
> >
> > At the moment, I therefore favor keeping the RCU-bh and RCU-sched
> > read-side APIs.  But are there better approaches?
> 
> Hello, Paul
> 
> Since local_bh_disable() will be guaranteed to be protected by RCU
> and more general. I'm afraid it will be preferred over
> rcu_read_lock_bh() which will be gradually being phased out.
> 
> In other words, keeping the RCU-bh read-side APIs will be a slower
> version of the option A. So will the same approach for the RCU-sched.
> But it'll still be better than the hurrying option A, IMHO.

I am OK with the read-side RCU-bh and RCU-sched interfaces going away,
it is just that I am not willing to put all that much effort into
it myself.  ;-)

Unless there is a good reason for me to hurry it along, of course.
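
As a rough illustration of the end state (not code from the series; gp,
gp_lock, ->rh and free_old() are placeholders), a reader can keep the
flavor-specific API while its updater moves to the consolidated one:

        /* Reader: unchanged, still disables softirq and still tells lockdep
         * about an RCU-bh read-side critical section. */
        rcu_read_lock_bh();
        p = rcu_dereference(gp);
        if (p)
                do_something_with(p);
        rcu_read_unlock_bh();

        /* Updater: the old call_rcu_bh() user simply calls call_rcu(), which
         * after consolidation waits for bh/sched/preempt readers alike. */
        old = rcu_dereference_protected(gp, lockdep_is_held(&gp_lock));
        rcu_assign_pointer(gp, new);
        if (old)
                call_rcu(&old->rh, free_old);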

Thanx, Paul

> Thanks,
> Lai
> 
> >
> > o   How should kernel/rcu/sync.c be handled?  Here are some
> > possibilities:
> >
> > a.  Leave the full gp_ops[] array and simply translate
> > the obsolete update-side functions to their RCU
> > equivalents.
> >
> > b.  Leave the current gp_ops[] array, but only have
> > the RCU_SYNC entry.  The __INIT_HELD field would
> > be set to a function that was OK with being in an
> > RCU read-side critical section, an interrupt-disabled
> > section, etc.
> >
> > 

Re: Consolidating RCU-bh, RCU-preempt, and RCU-sched

2018-07-12 Thread Lai Jiangshan
On Fri, Jul 13, 2018 at 8:02 AM, Paul E. McKenney
 wrote:
> Hello!
>
> I now have a semi-reasonable prototype of changes consolidating the
> RCU-bh, RCU-preempt, and RCU-sched update-side APIs in my -rcu tree.
> There are likely still bugs to be fixed and probably other issues as well,
> but a prototype does exist.
>
> Assuming continued good rcutorture results and no objections, I am
> thinking in terms of this timeline:
>
> o   Preparatory work and cleanups are slated for the v4.19 merge window.
>
> o   The actual consolidation and post-consolidation cleanup is slated
> for the merge window after v4.19 (v5.0?).  These cleanups include
> the replacements called out below within the RCU implementation
> itself (but excluding kernel/rcu/sync.c, see question below).
>
> o   Replacement of now-obsolete update APIs is slated for the second
> merge window after v4.19 (v5.1?).  The replacements are currently
> expected to be as follows:
>
> synchronize_rcu_bh() -> synchronize_rcu()
> synchronize_rcu_bh_expedited() -> synchronize_rcu_expedited()
> call_rcu_bh() -> call_rcu()
> rcu_barrier_bh() -> rcu_barrier()
> synchronize_sched() -> synchronize_rcu()
> synchronize_sched_expedited() -> synchronize_rcu_expedited()
> call_rcu_sched() -> call_rcu()
> rcu_barrier_sched() -> rcu_barrier()
> get_state_synchronize_sched() -> get_state_synchronize_rcu()
> cond_synchronize_sched() -> cond_synchronize_rcu()
> synchronize_rcu_mult() -> synchronize_rcu()
>
> I have done light testing of these replacements with good results.
>
> Any objections to this timeline?
>
> I also have some questions on the ultimate end point.  I have default
> choices, which I will likely take if there is no discussion.
>
> o
> Currently, I am thinking in terms of keeping the per-flavor
> read-side functions.  For example, rcu_read_lock_bh() would
> continue to disable softirq, and would also continue to tell
> lockdep about the RCU-bh read-side critical section.  However,
> synchronize_rcu() will wait for all flavors of read-side critical
> sections, including those introduced by (say) preempt_disable(),
> so there will no longer be any possibility of mismatching (say)
> RCU-bh readers with RCU-sched updaters.
>
> I could imagine other ways of handling this, including:
>
> a.  Eliminate rcu_read_lock_bh() in favor of
> local_bh_disable() and so on.  Rely on lockdep
> instrumentation of these other functions to identify RCU
> readers, introducing such instrumentation as needed.  I am
> not a fan of this approach because of the large number of
> places in the Linux kernel where interrupts, preemption,
> and softirqs are enabled or disabled "behind the scenes".
>
> b.  Eliminate rcu_read_lock_bh() in favor of rcu_read_lock(),
> and required callers to also disable softirqs, preemption,
> or whatever as needed.  I am not a fan of this approach
> because it seems a lot less convenient to users of RCU-bh
> and RCU-sched.
>
> At the moment, I therefore favor keeping the RCU-bh and RCU-sched
> read-side APIs.  But are there better approaches?

Hello, Paul

Since local_bh_disable() will be guaranteed to be protected by RCU
and more general. I'm afraid it will be preferred over
rcu_read_lock_bh() which will be gradually being phased out.

In other words, keeping the RCU-bh read-side APIs will be a slower
version of the option A. So will the same approach for the RCU-sched.
But it'll still be better than the hurrying option A, IMHO.

Thanks,
Lai

>
> o   How should kernel/rcu/sync.c be handled?  Here are some
> possibilities:
>
> a.  Leave the full gp_ops[] array and simply translate
> the obsolete update-side functions to their RCU
> equivalents.
>
> b.  Leave the current gp_ops[] array, but only have
> the RCU_SYNC entry.  The __INIT_HELD field would
> be set to a function that was OK with being in an
> RCU read-side critical section, an interrupt-disabled
> section, etc.
>
> This allows for possible addition of SRCU functionality.
> It is also a trivial change.  Note that the sole user
> of sync.c uses RCU_SCHED_SYNC, and this would need to
> be changed to RCU_SYNC.
>
> But is it likely that we will ever add SRCU?
>
> c.  Eliminate that gp_ops[] array, hard-coding the function
> pointers into their call sites.
>
> I don't really have a preference.  Left to myself, I will be lazy
> and take opt

[PATCH v1 1/2] mm: fix race on soft-offlining free huge pages

2018-07-12 Thread Naoya Horiguchi
There's a race condition between soft offline and hugetlb_fault which
causes unexpected process killing and/or hugetlb allocation failure.

The process killing is caused by the following flow:

  CPU 0   CPU 1  CPU 2

  soft offline
get_any_page
// find the hugetlb is free
  mmap a hugetlb file
  page fault
...
  hugetlb_fault
hugetlb_no_page
  alloc_huge_page
  // succeed
  soft_offline_free_page
  // set hwpoison flag
 mmap the hugetlb file
 page fault
   ...
 hugetlb_fault
   hugetlb_no_page
 find_lock_page
   return VM_FAULT_HWPOISON
   mm_fault_error
 do_sigbus
 // kill the process


The hugetlb allocation failure comes from the following flow:

  CPU 0  CPU 1

 mmap a hugetlb file
 // reserve all free page but don't fault-in
  soft offline
get_any_page
// find the hugetlb is free
  soft_offline_free_page
  // set hwpoison flag
dissolve_free_huge_page
// fail because all free hugepages are reserved
 page fault
   ...
 hugetlb_fault
   hugetlb_no_page
 alloc_huge_page
   ...
 dequeue_huge_page_node_exact
 // ignore hwpoisoned hugepage
 // and finally fail due to no-mem

The root cause of this is that the current soft-offline code is written
on the assumption that the PageHWPoison flag should be set first to
avoid accessing the corrupted data.  This makes sense for memory_failure()
or hard offline, but not for soft offline, because soft offline is
about corrected (not uncorrected) errors and is safe from data loss.
This patch changes the soft offline semantics so that the PageHWPoison flag
is set only after containment of the error page completes successfully.

Reported-by: Xishi Qiu 
Suggested-by: Xishi Qiu 
Signed-off-by: Naoya Horiguchi 
---
 mm/hugetlb.c| 11 +--
 mm/memory-failure.c | 22 --
 mm/migrate.c|  2 --
 3 files changed, 21 insertions(+), 14 deletions(-)

diff --git v4.18-rc4-mmotm-2018-07-10-16-50/mm/hugetlb.c 
v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/hugetlb.c
index 430be42..937c142 100644
--- v4.18-rc4-mmotm-2018-07-10-16-50/mm/hugetlb.c
+++ v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/hugetlb.c
@@ -1479,22 +1479,20 @@ static int free_pool_huge_page(struct hstate *h, 
nodemask_t *nodes_allowed,
 /*
  * Dissolve a given free hugepage into free buddy pages. This function does
  * nothing for in-use (including surplus) hugepages. Returns -EBUSY if the
- * number of free hugepages would be reduced below the number of reserved
- * hugepages.
+ * dissolution fails because a given page is not a free hugepage, or because
+ * free hugepages are fully reserved.
  */
 int dissolve_free_huge_page(struct page *page)
 {
-   int rc = 0;
+   int rc = -EBUSY;
 
spin_lock(&hugetlb_lock);
if (PageHuge(page) && !page_count(page)) {
struct page *head = compound_head(page);
struct hstate *h = page_hstate(head);
int nid = page_to_nid(head);
-   if (h->free_huge_pages - h->resv_huge_pages == 0) {
-   rc = -EBUSY;
+   if (h->free_huge_pages - h->resv_huge_pages == 0)
goto out;
-   }
/*
 * Move PageHWPoison flag from head page to the raw error page,
 * which makes any subpages rather than the error page reusable.
@@ -1508,6 +1506,7 @@ int dissolve_free_huge_page(struct page *page)
h->free_huge_pages_node[nid]--;
h->max_huge_pages--;
update_and_free_page(h, head);
+   rc = 0;
}
 out:
spin_unlock(&hugetlb_lock);
diff --git v4.18-rc4-mmotm-2018-07-10-16-50/mm/memory-failure.c 
v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/memory-failure.c
index 9d142b9..c63d982 100644
--- v4.18-rc4-mmotm-2018-07-10-16-50/mm/memory-failure.c
+++ v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/memory-failure.c
@@ -1598,8 +1598,18 @@ static int

[PATCH v1 0/2] mm: soft-offline: fix race against page allocation

2018-07-12 Thread Naoya Horiguchi
Xishi recently reported the issue about race on reusing the target pages
of soft offlining.
Discussion and analysis showed that we need to make sure that setting PG_hwpoison
is done in the right place, under zone->lock, for soft offline.
1/2 handles the free hugepage case, and 2/2 handles the free buddy page case.
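
The gist of 2/2, in a deliberately simplified sketch (the real patch has to
walk the buddy structures; this only shows where zone->lock and the flag
setting sit relative to each other):

        bool set_hwpoison_free_buddy_page(struct page *page)
        {
                struct zone *zone = page_zone(page);
                unsigned long flags;
                bool hwpoisoned = false;

                spin_lock_irqsave(&zone->lock, flags);
                /* Only poison a page we can see sitting on a free list. */
                if (PageBuddy(page) && !TestSetPageHWPoison(page))
                        hwpoisoned = true;
                spin_unlock_irqrestore(&zone->lock, flags);

                return hwpoisoned;
        }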

Thanks,
Naoya Horiguchi
---
Summary:

Naoya Horiguchi (2):
  mm: fix race on soft-offlining free huge pages
  mm: soft-offline: close the race against page allocation

 include/linux/page-flags.h |  5 +
 include/linux/swapops.h| 10 --
 mm/hugetlb.c   | 11 +--
 mm/memory-failure.c| 44 +++-
 mm/migrate.c   |  4 +---
 mm/page_alloc.c| 29 +
 6 files changed, 75 insertions(+), 28 deletions(-)


[PATCH v1 2/2] mm: soft-offline: close the race against page allocation

2018-07-12 Thread Naoya Horiguchi
A process can be killed with SIGBUS(BUS_MCEERR_AR) when it tries to
allocate a page that was just freed in the course of soft-offline.
This is undesirable because soft-offline (which is about corrected errors)
is less aggressive than hard-offline (which is about uncorrected errors),
and we can make soft-offline fail and keep using the page for a good reason
like "the system is busy."

Two main changes of this patch are:

- setting the migrate type of the target page to MIGRATE_ISOLATE. As done
  in free_unref_page_commit(), this makes the kernel bypass the pcplist when
  freeing the page, so we can assume that the page is on the free list just
  after put_page() returns,

- setting PG_hwpoison on free page under zone->lock which protects
  freelists, so this allows us to avoid setting PG_hwpoison on a page
  that is decided to be allocated soon.

Reported-by: Xishi Qiu 
Signed-off-by: Naoya Horiguchi 
---
 include/linux/page-flags.h |  5 +
 include/linux/swapops.h| 10 --
 mm/memory-failure.c| 26 +-
 mm/migrate.c   |  2 +-
 mm/page_alloc.c| 29 +
 5 files changed, 56 insertions(+), 16 deletions(-)

diff --git v4.18-rc4-mmotm-2018-07-10-16-50/include/linux/page-flags.h 
v4.18-rc4-mmotm-2018-07-10-16-50_patched/include/linux/page-flags.h
index 901943e..74bee8c 100644
--- v4.18-rc4-mmotm-2018-07-10-16-50/include/linux/page-flags.h
+++ v4.18-rc4-mmotm-2018-07-10-16-50_patched/include/linux/page-flags.h
@@ -369,8 +369,13 @@ PAGEFLAG_FALSE(Uncached)
 PAGEFLAG(HWPoison, hwpoison, PF_ANY)
 TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
 #define __PG_HWPOISON (1UL << PG_hwpoison)
+extern bool set_hwpoison_free_buddy_page(struct page *page);
 #else
 PAGEFLAG_FALSE(HWPoison)
+static inline bool set_hwpoison_free_buddy_page(struct page *page)
+{
+   return 0;
+}
 #define __PG_HWPOISON 0
 #endif
 
diff --git v4.18-rc4-mmotm-2018-07-10-16-50/include/linux/swapops.h 
v4.18-rc4-mmotm-2018-07-10-16-50_patched/include/linux/swapops.h
index 9c0eb4d..fe8e08b 100644
--- v4.18-rc4-mmotm-2018-07-10-16-50/include/linux/swapops.h
+++ v4.18-rc4-mmotm-2018-07-10-16-50_patched/include/linux/swapops.h
@@ -335,11 +335,6 @@ static inline int is_hwpoison_entry(swp_entry_t entry)
return swp_type(entry) == SWP_HWPOISON;
 }
 
-static inline bool test_set_page_hwpoison(struct page *page)
-{
-   return TestSetPageHWPoison(page);
-}
-
 static inline void num_poisoned_pages_inc(void)
 {
atomic_long_inc(&num_poisoned_pages);
@@ -362,11 +357,6 @@ static inline int is_hwpoison_entry(swp_entry_t swp)
return 0;
 }
 
-static inline bool test_set_page_hwpoison(struct page *page)
-{
-   return false;
-}
-
 static inline void num_poisoned_pages_inc(void)
 {
 }
diff --git v4.18-rc4-mmotm-2018-07-10-16-50/mm/memory-failure.c 
v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/memory-failure.c
index c63d982..794687a 100644
--- v4.18-rc4-mmotm-2018-07-10-16-50/mm/memory-failure.c
+++ v4.18-rc4-mmotm-2018-07-10-16-50_patched/mm/memory-failure.c
@@ -57,6 +57,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 #include "ras/ras_event.h"
 
@@ -1697,6 +1698,7 @@ static int __soft_offline_page(struct page *page, int 
flags)
 static int soft_offline_in_use_page(struct page *page, int flags)
 {
int ret;
+   int mt;
struct page *hpage = compound_head(page);
 
if (!PageHuge(page) && PageTransHuge(hpage)) {
@@ -1715,23 +1717,37 @@ static int soft_offline_in_use_page(struct page *page, 
int flags)
put_hwpoison_page(hpage);
}
 
+   /*
+* Setting MIGRATE_ISOLATE here ensures that the page will be linked
+* to free list immediately (not via pcplist) when released after
+* successful page migration. Otherwise we can't guarantee that the
+* page is really free after put_page() returns, so
+* set_hwpoison_free_buddy_page() highly likely fails.
+*/
+   mt = get_pageblock_migratetype(page);
+   set_pageblock_migratetype(page, MIGRATE_ISOLATE);
if (PageHuge(page))
ret = soft_offline_huge_page(page, flags);
else
ret = __soft_offline_page(page, flags);
-
+   set_pageblock_migratetype(page, mt);
return ret;
 }
 
-static void soft_offline_free_page(struct page *page)
+static int soft_offline_free_page(struct page *page)
 {
int rc = 0;
struct page *head = compound_head(page);
 
if (PageHuge(head))
rc = dissolve_free_huge_page(page);
-   if (!rc && !TestSetPageHWPoison(page))
-   num_poisoned_pages_inc();
+   if (!rc) {
+   if (set_hwpoison_free_buddy_page(page))
+   num_poisoned_pages_inc();
+   else
+   rc = -EBUSY;
+   }
+   return rc;
 }
 
 /**
@@ -1775,7 +1791,7 @@ int soft_offline_page(struct page *page, int flags)
if (ret > 0)
ret = soft_offline

Re: [PATCH 5/5] f2fs: do not __punch_discard_cmd in lfs mode

2018-07-12 Thread Chao Yu
On 2018/7/12 23:09, Yunlong Song wrote:
> In lfs mode, it is better to submit and wait for discard of the
> new_blkaddr's overall section, rather than punch it which makes
> more small discards and is not friendly with flash alignment. And
> f2fs does not have to wait discard of each new_blkaddr except for the
> start_block of each section with this patch.

For a non-zoned block device, unaligned discard can be allowed; and if synchronous
discard is very slow, it will block the block allocator here. Rather than that, I
prefer to just punch the 4k lba out of the discard entry, for performance.

If you don't want to encounter this condition, I suggest issuing large
discards more quickly.

Thanks,

> 
> Signed-off-by: Yunlong Song 
> ---
>  fs/f2fs/segment.c | 76 
> ++-
>  fs/f2fs/segment.h |  7 -
>  2 files changed, 75 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index f6c20e0..bce321a 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -893,7 +893,19 @@ static void __remove_discard_cmd(struct f2fs_sb_info 
> *sbi,
>  static void f2fs_submit_discard_endio(struct bio *bio)
>  {
>   struct discard_cmd *dc = (struct discard_cmd *)bio->bi_private;
> + struct f2fs_sb_info *sbi = F2FS_SB(dc->bdev->bd_super);
>  
> + if (test_opt(sbi, LFS)) {
> + unsigned int segno = GET_SEGNO(sbi, dc->lstart);
> + unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> + int cnt = (dc->len >> sbi->log_blocks_per_seg) /
> + sbi->segs_per_sec;
> +
> + while (cnt--) {
> + set_bit(secno, FREE_I(sbi)->discard_secmap);
> + secno++;
> + }
> + }
>   dc->error = blk_status_to_errno(bio->bi_status);
>   dc->state = D_DONE;
>   complete_all(&dc->wait);
> @@ -1349,8 +1361,15 @@ static void f2fs_wait_discard_bio(struct f2fs_sb_info 
> *sbi, block_t blkaddr)
>   dc = (struct discard_cmd *)f2fs_lookup_rb_tree(&dcc->root,
>   NULL, blkaddr);
>   if (dc) {
> - if (dc->state == D_PREP) {
> + if (dc->state == D_PREP && !test_opt(sbi, LFS))
>   __punch_discard_cmd(sbi, dc, blkaddr);
> + else if (dc->state == D_PREP && test_opt(sbi, LFS)) {
> + struct discard_policy dpolicy;
> +
> + __init_discard_policy(sbi, &dpolicy, DPOLICY_FORCE, 1);
> + __submit_discard_cmd(sbi, &dpolicy, dc);
> + dc->ref++;
> + need_wait = true;
>   } else {
>   dc->ref++;
>   need_wait = true;
> @@ -2071,9 +2090,10 @@ static void get_new_segment(struct f2fs_sb_info *sbi,
>   unsigned int hint = GET_SEC_FROM_SEG(sbi, *newseg);
>   unsigned int old_zoneno = GET_ZONE_FROM_SEG(sbi, *newseg);
>   unsigned int left_start = hint;
> - bool init = true;
> + bool init = true, check_discard = test_opt(sbi, LFS) ? true : false;
>   int go_left = 0;
>   int i;
> + unsigned long *free_secmap;
>  
>   spin_lock(&free_i->segmap_lock);
>  
> @@ -2084,11 +2104,25 @@ static void get_new_segment(struct f2fs_sb_info *sbi,
>   goto got_it;
>   }
>  find_other_zone:
> - secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint);
> + if (check_discard) {
> + int entries = f2fs_bitmap_size(MAIN_SECS(sbi)) / 
> sizeof(unsigned long);
> +
> + free_secmap = free_i->tmp_secmap;
> + for (i = 0; i < entries; i++)
> + free_secmap[i] = (!(free_i->free_secmap[i] ^
> + free_i->discard_secmap[i])) | 
> free_i->free_secmap[i];
> + } else
> + free_secmap = free_i->free_secmap;
> +
> + secno = find_next_zero_bit(free_secmap, MAIN_SECS(sbi), hint);
>   if (secno >= MAIN_SECS(sbi)) {
>   if (dir == ALLOC_RIGHT) {
> - secno = find_next_zero_bit(free_i->free_secmap,
> + secno = find_next_zero_bit(free_secmap,
>   MAIN_SECS(sbi), 0);
> + if (secno >= MAIN_SECS(sbi) && check_discard) {
> + check_discard = false;
> + goto find_other_zone;
> + }
>   f2fs_bug_on(sbi, secno >= MAIN_SECS(sbi));
>   } else {
>   go_left = 1;
> @@ -2098,13 +2132,17 @@ static void get_new_segment(struct f2fs_sb_info *sbi,
>   if (go_left == 0)
>   goto skip_left;
>  
> - while (test_bit(left_start, free_i->free_secmap)) {
> + while (test_bit(left_start, free_secmap)) {
>   if (left_start > 0) {
>   left_start--;
>   continue;
>

Re: [PATCH v2 1/3] dt-bindings: thermal: Add binding document for SR thermal

2018-07-12 Thread Srinath Mannam
Hi Rob,

I have provided my inputs on the purpose of having multiple nodes.
Please get back if you have any comments or suggestions.

Regards,
Srinath.

On Tue, Jul 3, 2018 at 4:15 PM, Srinath Mannam
 wrote:
> Hi Rob,
>
> Kindly provide your feedback.
>
> Regards,
> Srinath.
>
> On Fri, Jun 22, 2018 at 11:21 AM, Srinath Mannam
>  wrote:
>> Hi Rob,
>>
>> Please find my comments for the reason to have multiple DT nodes.
>>
>> On Thu, Jun 21, 2018 at 1:22 AM, Rob Herring  wrote:
>>> On Mon, Jun 18, 2018 at 02:01:17PM +0530, Srinath Mannam wrote:
 From: Pramod Kumar 

 Add binding document for supported thermal implementation
 in Stingray.

 Signed-off-by: Pramod Kumar 
 Reviewed-by: Ray Jui 
 Reviewed-by: Scott Branden 
 Reviewed-by: Srinath Mannam 
 ---
  .../bindings/thermal/brcm,sr-thermal.txt   | 45 
 ++
  1 file changed, 45 insertions(+)
  create mode 100644 
 Documentation/devicetree/bindings/thermal/brcm,sr-thermal.txt

 diff --git a/Documentation/devicetree/bindings/thermal/brcm,sr-thermal.txt 
 b/Documentation/devicetree/bindings/thermal/brcm,sr-thermal.txt
 new file mode 100644
 index 000..33f9e11
 --- /dev/null
 +++ b/Documentation/devicetree/bindings/thermal/brcm,sr-thermal.txt
 @@ -0,0 +1,45 @@
 +* Broadcom Stingray Thermal
 +
 +This binding describes thermal sensors that is part of Stingray SoCs.
 +
 +Required properties:
 +- compatible : Must be "brcm,sr-thermal"
 +- reg : memory where tmon data will be available.
 +
 +Example:
 + tmons {
 + compatible = "simple-bus";
 + #address-cells = <1>;
 + #size-cells = <1>;
 + ranges;
 +
 + tmon_ihost0: thermal@8f10 {
 + compatible = "brcm,sr-thermal";
 + reg = <0x8f10 0x4>;
 + };
>>>
>>> You still haven't given me a compelling reason why you need a node per
>>> register.
>>>
>>> You have a single range of registers. Make this 1 node.
>>>
>>
>> We have two reasons to have multiple nodes:
>> 1. Our chip has multiple functional blocks. Each functional block has
>> its own thermal zone.
>> Functional blocks and their thermal zones are enabled/disabled based on the
>> end product.
>> A few functional blocks need to be disabled for some products, so their
>> thermal zones also need to be disabled.
>> In that case, the nodes of the specific thermal zones are removed from the
>> DTS file of the corresponding product.
>>
>> 2. The thermal framework provides a sysfs interface to configure thermal
>> zones and read the temperature of each thermal zone.
>> To configure an individual thermal zone, we need to have a separate DT node.
>> The same applies to reading the temperature of an individual thermal zone.
>> Ex: To read temperature of thermal zone 0.
>>  cat /sys/class/thermal/thermal_zone0/temp
>>  To configure trip temperature of thermal zone 0.
>>   echo 11 > /sys/class/thermal/thermal_zone0/trip_point_0_temp
>>
>> Also, to avoid driver source changes for the multiple products, it is
>> cleaner to have multiple DT nodes.
>>
>>> Rob


general protection fault in propagate_entity_cfs_rq

2018-07-12 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:    6fd066604123 Merge branch 'bpf-arm-jit-improvements'
git tree:   bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=11e9267840
kernel config:  https://syzkaller.appspot.com/x/.config?x=a501a01deaf0fe9
dashboard link: https://syzkaller.appspot.com/bug?extid=2e37f794f31be5667a88
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1014db9440
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11f81e7840

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+2e37f794f31be5667...@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_UP): team0: link is not ready
8021q: adding VLAN 0 to HW filter on device team0
IPv6: ADDRCONF(NETDEV_CHANGE): team0: link becomes ready
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#1] SMP KASAN
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #51
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:propagate_entity_cfs_rq.isra.70+0x199/0x20c0  
kernel/sched/fair.c:10039
Code: 0d 02 00 00 48 c7 c0 60 70 2a 89 48 89 f9 48 c1 e8 03 48 01 d8 48 89  
85 28 fb ff ff 4c 8d a9 58 01 00 00 4c 89 e8 48 c1 e8 03 <80> 3c 18 00 0f  
85 5e 11 00 00 4c 8b a1 58 01 00 00 0f 1f 44 00 00

RSP: 0018:8801daf06c90 EFLAGS: 00010003
RAX: 03fffe20074fc1d0 RBX: dc00 RCX: 11003a7e0d2c
RDX: 11003a7e0d2a RSI: 11003b5e0e7f RDI: 11003a7e0d2c
RBP: 8801daf071a0 R08: 8801dae2cbc0 R09: 111a25cc
R10: 019d6e0b R11:  R12: 11003b5e0e3b
R13: 11003a7e0e84 R14: 8801d3f06800 R15: 
FS:  () GS:8801daf0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7fb1b24d7e78 CR3: 0001ab04b000 CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 
 detach_entity_cfs_rq+0x6e3/0xf50 kernel/sched/fair.c:10059
 migrate_task_rq_fair+0xba/0x290 kernel/sched/fair.c:6709
 set_task_cpu+0x131/0x770 kernel/sched/core.c:1194
 detach_task.isra.89+0xdb/0x150 kernel/sched/fair.c:7438
 detach_tasks kernel/sched/fair.c:7525 [inline]
 load_balance+0xf0b/0x3640 kernel/sched/fair.c:8884
 rebalance_domains+0x82a/0xd90 kernel/sched/fair.c:9262
 run_rebalance_domains+0x365/0x4c0 kernel/sched/fair.c:9884
 __do_softirq+0x2e8/0xb17 kernel/softirq.c:288
 invoke_softirq kernel/softirq.c:368 [inline]
 irq_exit+0x1d1/0x200 kernel/softirq.c:408
 exiting_irq arch/x86/include/asm/apic.h:527 [inline]
 smp_apic_timer_interrupt+0x186/0x730 arch/x86/kernel/apic/apic.c:1052
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
 
RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54
Code: c7 48 89 45 d8 e8 5a 04 24 fa 48 8b 45 d8 e9 d2 fe ff ff 48 89 df e8  
49 04 24 fa eb 8a 90 90 90 90 90 90 90 55 48 89 e5 fb f4 <5d> c3 0f 1f 84  
00 00 00 00 00 55 48 89 e5 f4 5d c3 90 90 90 90 90

RSP: 0018:8801d9af7c38 EFLAGS: 0286 ORIG_RAX: ff13
RAX: dc00 RBX: 11003b35ef8a RCX: 81667982
RDX: 111e3610 RSI: 0004 RDI: 88f1b080
RBP: 8801d9af7c38 R08: ed003b5e46d7 R09: ed003b5e46d6
R10: ed003b5e46d6 R11: 8801daf236b3 R12: 0001
R13: 8801d9af7cf0 R14: 899edd20 R15: 
 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
 default_idle+0xc7/0x450 arch/x86/kernel/process.c:500
 arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:491
 default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
 cpuidle_idle_call kernel/sched/idle.c:153 [inline]
 do_idle+0x3aa/0x570 kernel/sched/idle.c:262
 cpu_startup_entry+0x10c/0x120 kernel/sched/idle.c:368
 start_secondary+0x433/0x5d0 arch/x86/kernel/smpboot.c:265
 secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242
Modules linked in:
Dumping ftrace buffer:
   (ftrace buffer empty)
---[ end trace cb0cd83b57bb4bba ]---
RIP: 0010:propagate_entity_cfs_rq.isra.70+0x199/0x20c0  
kernel/sched/fair.c:10039
Code: 0d 02 00 00 48 c7 c0 60 70 2a 89 48 89 f9 48 c1 e8 03 48 01 d8 48 89  
85 28 fb ff ff 4c 8d a9 58 01 00 00 4c 89 e8 48 c1 e8 03 <80> 3c 18 00 0f  
85 5e 11 00 00 4c 8b a1 58 01 00 00 0f 1f 44 00 00

RSP: 0018:8801daf06c90 EFLAGS: 00010003
RAX: 03fffe20074fc1d0 RBX: dc00 RCX: 11003a7e0d2c
RDX: 11003a7e0d2a RSI: 11003b5e0e7f RDI: 11003a7e0d2c
RBP: 8801daf071a0 R08: 8801dae2cbc0 R09: 111a25cc
R10: 019d6e0b R11:  R12: 11003b5e0e3b
R13: 11003a7e0e84 R14: 8801d3f06800 R15: 
FS:  () GS:8801daf0() knlGS:
CS:  0010 DS:  ES:  CR0: 800500

Re: [PATCH 1/2] tracing: kprobes: Prohibit probing on notrace functions

2018-07-12 Thread Masami Hiramatsu
On Thu, 12 Jul 2018 13:54:12 -0400
Francis Deslauriers  wrote:

> From: Masami Hiramatsu 
> 
> Prohibit kprobe-events probing on notrace functions,
> since probing a notrace function can cause recursive
> event calls. In most cases those are just skipped, but
> in some cases it falls into an infinite recursive call.

BTW, I'm considering adding an option to allow putting
kprobes on notrace functions - just for debugging
ftrace with kprobes. That is a "developer only" option,
so generally it should be disabled, but for debugging
ftrace we still need it. Or should I introduce
another kprobes module for debugging it?
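
If it does become an option, one plausible shape (names made up here, not
from the posted patches) is a debug-only knob gating the new check, e.g.:

        /* Hypothetical escape hatch; the default stays "prohibit". */
        static bool __read_mostly allow_probe_notrace;
        module_param(allow_probe_notrace, bool, 0644);

        static bool target_is_notrace(unsigned long addr, unsigned long size)
        {
                /* A notrace function contains no ftrace call site. */
                return !ftrace_location_range(addr, addr + size - 1);
        }

        /* in the probe registration path: */
        if (!allow_probe_notrace && target_is_notrace(addr, size))
                return -EINVAL;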

Thank you,


-- 
Masami Hiramatsu 


Re: [V9fs-developer] [PATCH v2 3/6] 9p: Replace the fidlist with an IDR

2018-07-12 Thread Matthew Wilcox
On Fri, Jul 13, 2018 at 10:05:50AM +0800, jiangyiwen wrote:
> > @@ -908,30 +908,29 @@ static struct p9_fid *p9_fid_create(struct p9_client 
> > *clnt)
> >  {
> > int ret;
> > struct p9_fid *fid;
> > -   unsigned long flags;
> >  
> > p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
> > fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);
> > if (!fid)
> > return NULL;
> >  
> > -   ret = p9_idpool_get(clnt->fidpool);
> > -   if (ret < 0)
> > -   goto error;
> > -   fid->fid = ret;
> > -
> > memset(&fid->qid, 0, sizeof(struct p9_qid));
> > fid->mode = -1;
> > fid->uid = current_fsuid();
> > fid->clnt = clnt;
> > fid->rdir = NULL;
> > -   spin_lock_irqsave(&clnt->lock, flags);
> > -   list_add(&fid->flist, &clnt->fidlist);
> > -   spin_unlock_irqrestore(&clnt->lock, flags);
> > +   fid->fid = 0;
> >  
> > -   return fid;
> > +   idr_preload(GFP_KERNEL);
> 
> It is best to use GFP_NOFS instead, or else it may cause some
> unpredictable problem, because when out of memory it will
> reclaim memory from v9fs.

Earlier in this function, fid was allocated with GFP_KERNEL:

> > fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);


> > +   spin_lock_irq(&clnt->lock);
> > +   ret = idr_alloc_u32(&clnt->fids, fid, &fid->fid, P9_NOFID - 1,
> > +   GFP_NOWAIT);
> > +   spin_unlock_irq(&clnt->lock);
> 
> use spin_lock instead, clnt->lock is not used in irq context.

I don't think that's right.  What about p9_fid_destroy?  It was already
using spin_lock_irqsave(), so I just assumed that whoever wrote that
code at least considered that it might be called from interrupt context.

Also consider p9_free_req() which shares the same lock.  We could get
rid of clnt->lock altogether as there's a lock embedded in each IDR,
but that'll introduce an unwanted dependence on the RDMA tree in this
merge window.

> > @@ -1095,14 +1086,11 @@ void p9_client_destroy(struct p9_client *clnt)
> >  
> > v9fs_put_trans(clnt->trans_mod);
> >  
> > -   list_for_each_entry_safe(fid, fidptr, &clnt->fidlist, flist) {
> > +   idr_for_each_entry(&clnt->fids, fid, id) {
> > pr_info("Found fid %d not clunked\n", fid->fid);
> > p9_fid_destroy(fid);
> > }
> >  
> > -   if (clnt->fidpool)
> > -   p9_idpool_destroy(clnt->fidpool);
> > -
> 
> I suggest add idr_destroy in the end.

Why?  p9_fid_destroy calls idr_remove() for each fid, so it'll already
be empty.

Thanks for all the review, to everyone who's submitted review.  This is
a really healthy community.


Re: [PATCH 24/32] vfs: syscall: Add fsopen() to prepare for superblock creation [ver #9]

2018-07-12 Thread Theodore Y. Ts'o
On Thu, Jul 12, 2018 at 11:54:41PM +0100, David Howells wrote:
> 
> Would that mean then that doing:
> 
>   mount /dev/sda3 /a
>   mount /dev/sda3 /b
> 
> would then fail on the second command because /dev/sda3 is already open
> exclusively?

Good point.  One workaround would be to require an open with O_PATH instead.
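i.e. something along the lines of (illustrative only):

        /* Name the block device without claiming any exclusive access to it. */
        int fd = open("/dev/sda3", O_PATH | O_CLOEXEC);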

- Ted


Re: [PATCH 0/2] scsi: arcmsr: fix error of resuming from hibernation

2018-07-12 Thread Martin K. Petersen


Ching,

> This patch series is against mkp's 4.19/scsi-queue.
>
> 1. Fix error of resuming from hibernation for adapter type E.
> 2. Update driver version to v1.40.00.09-20180709

Applied to 4.19/scsi-queue, thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v2] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire

2018-07-12 Thread Daniel Lustig
On 7/12/2018 2:45 AM, Will Deacon wrote:
> On Thu, Jul 12, 2018 at 11:34:32AM +0200, Peter Zijlstra wrote:
>> On Thu, Jul 12, 2018 at 09:40:40AM +0200, Peter Zijlstra wrote:
>>> And I think if we raise atomic*_acquire() to require TSO (but ideally
>>> raise it to RCsc) we're there.
>>
>> To clarify, just the RmW-acquire. Things like atomic_read_acquire() can
>> stay smp_load_acquire() and be RCpc.
> 
> I don't have strong opinions about strengthening RmW atomics to TSO, so
> if it helps to unblock Alan's patch (which doesn't go near this!) then I'll
> go with it. The important part is that we continue to allow roach motel
> into the RmW for other accesses in the non-fully-ordered cases.
> 
> Daniel -- your AMO instructions are cool with this, right? It's just the
> fence-based implementations that will need help?
> 
> Will

Right, let me pull this part out of the overly-long response I just gave
on the thread with Linus :)

if we pair AMOs with AMOs, we get RCsc, and everything is fine.  If we
start mixing in fences (mostly because we don't currently have native
load-acquire or store-release opcodes), then that's when all the rest of the
complexity comes in.

Dan


RE: [PATCH V2] ARM: dts: make pfuze switch always-on for imx platforms

2018-07-12 Thread Anson Huang
Hi, Shawn
Although the commit 5fe156f1cab4 ("regulator: pfuze100: add
enable/disable for switch") was reverted to avoid the boot failure on some i.MX
platforms, adding the "regulator-always-on" property for those pfuze
critical switches is still the right thing to do and makes sense no matter how the
pfuze regulator's switch ON/OFF function ends up being implemented, so the below
patches should be applicable anyway:

ARM: dts: imx6sll-evk: make pfuze100 sw4 always on
ARM: dts: make pfuze switch always-on for imx platforms
ARM: dts: imx6sl-evk: keep sw4 always on

Let me know your thoughts, thanks!

Anson Huang
Best Regards!


> -Original Message-
> From: Anson Huang
> Sent: Wednesday, June 27, 2018 9:31 AM
> To: shawn...@kernel.org; s.ha...@pengutronix.de; ker...@pengutronix.de;
> Fabio Estevam ; robh...@kernel.org;
> mark.rutl...@arm.com
> Cc: dl-linux-imx ; linux-arm-ker...@lists.infradead.org;
> devicet...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH V2] ARM: dts: make pfuze switch always-on for imx platforms
> 
> commit 5fe156f1cab4 ("regulator: pfuze100: add enable/disable for switch")
> will cause those unreferenced switches being turned off if
> "regulator-always-on" is NOT present, as pfuze switches are normally used by
> critical modules which must be always ON or shared by many peripherals which
> do NOT implement power domain control, so just make sure all switches
> always ON to avoid any system issue caused by unexpectedly turning off
> switches.
> 
> Fixes: 5fe156f1cab4 ("regulator: pfuze100: add enable/disable for switch")
> Signed-off-by: Anson Huang 
> Reviewed-by: Fabio Estevam 
> ---
> changes since V1:
>   improve the way of referencing commit, and add fix tag.
>  arch/arm/boot/dts/imx6q-display5.dtsi  | 1 +
>  arch/arm/boot/dts/imx6q-mccmon6.dts| 1 +
>  arch/arm/boot/dts/imx6q-novena.dts | 1 +
>  arch/arm/boot/dts/imx6q-pistachio.dts  | 1 +
> arch/arm/boot/dts/imx6qdl-gw54xx.dtsi  | 1 +
> arch/arm/boot/dts/imx6qdl-sabresd.dtsi | 1 +
> arch/arm/boot/dts/imx6sx-sdb-reva.dts  | 1 +
>  7 files changed, 7 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/imx6q-display5.dtsi
> b/arch/arm/boot/dts/imx6q-display5.dtsi
> index 85232c7..33d266f 100644
> --- a/arch/arm/boot/dts/imx6q-display5.dtsi
> +++ b/arch/arm/boot/dts/imx6q-display5.dtsi
> @@ -326,6 +326,7 @@
>   sw4_reg: sw4 {
>   regulator-min-microvolt = <80>;
>   regulator-max-microvolt = <330>;
> + regulator-always-on;
>   };
> 
>   swbst_reg: swbst {
> diff --git a/arch/arm/boot/dts/imx6q-mccmon6.dts
> b/arch/arm/boot/dts/imx6q-mccmon6.dts
> index b7e9f38..e6429c5 100644
> --- a/arch/arm/boot/dts/imx6q-mccmon6.dts
> +++ b/arch/arm/boot/dts/imx6q-mccmon6.dts
> @@ -166,6 +166,7 @@
>   sw4_reg: sw4 {
>   regulator-min-microvolt = <80>;
>   regulator-max-microvolt = <330>;
> + regulator-always-on;
>   };
> 
>   swbst_reg: swbst {
> diff --git a/arch/arm/boot/dts/imx6q-novena.dts
> b/arch/arm/boot/dts/imx6q-novena.dts
> index fcd824d..0b3c651 100644
> --- a/arch/arm/boot/dts/imx6q-novena.dts
> +++ b/arch/arm/boot/dts/imx6q-novena.dts
> @@ -341,6 +341,7 @@
>   reg_sw4: sw4 {
>   regulator-min-microvolt = <80>;
>   regulator-max-microvolt = <330>;
> + regulator-always-on;
>   };
> 
>   reg_swbst: swbst {
> diff --git a/arch/arm/boot/dts/imx6q-pistachio.dts
> b/arch/arm/boot/dts/imx6q-pistachio.dts
> index a31e83c..6ea09f9 100644
> --- a/arch/arm/boot/dts/imx6q-pistachio.dts
> +++ b/arch/arm/boot/dts/imx6q-pistachio.dts
> @@ -253,6 +253,7 @@
>   sw4_reg: sw4 {
>   regulator-min-microvolt = <80>;
>   regulator-max-microvolt = <330>;
> + regulator-always-on;
>   };
> 
>   swbst_reg: swbst {
> diff --git a/arch/arm/boot/dts/imx6qdl-gw54xx.dtsi
> b/arch/arm/boot/dts/imx6qdl-gw54xx.dtsi
> index a1a6fb5..281cae5 100644
> --- a/arch/arm/boot/dts/imx6qdl-gw54xx.dtsi
> +++ b/arch/arm/boot/dts/imx6qdl-gw54xx.dtsi
> @@ -268,6 +268,7 @@
>   sw4_reg: sw4 {
>   regulator-min-microvolt = <80>;
>   regulator-max-microvolt = <330>;
> + regulator-always-on;
>   };
> 
>   swbst_reg: swbst {
> diff --git a/arch/arm/boot/dts/imx6qdl-sabresd.dtsi
> b/arch/arm/boot/dts/imx6qdl-sabresd.dtsi
> index 15744ad..6e46a19 100644
> --- a/arch/arm/boot/dts/imx6qdl-sabresd.dtsi
> +++ b/arch

[PATCH] mm, swap: Make CONFIG_THP_SWAP depends on CONFIG_SWAP

2018-07-12 Thread Huang, Ying
From: Huang Ying 

CONFIG_THP_SWAP should depend on CONFIG_SWAP, because it's
unreasonable to optimize swapping for THP (Transparent Huge Page)
without basic swapping support.

In the original code, when CONFIG_SWAP=n and CONFIG_THP_SWAP=y,
split_swap_cluster() will not be built because it is in swapfile.c,
but it will be called in huge_memory.c.  This doesn't trigger a build
error in practice because the call site is enclosed by
PageSwapCache(), which is defined to be constant 0 when CONFIG_SWAP=n.
But this is fragile and should be fixed.
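
(The call site in question is roughly the following, paraphrased from
huge_memory.c rather than quoted:)

        if (PageSwapCache(head)) {
                swp_entry_t entry = { .val = page_private(head) };

                /* not built when CONFIG_SWAP=n, but the branch is dead then */
                split_swap_cluster(entry);
        }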

The comments are fixed too to reflect the latest progress.

Fixes: 38d8b4e6bdc8 ("mm, THP, swap: delay splitting THP during swap out")
Signed-off-by: "Huang, Ying" 
Reviewed-by: Dan Williams 
Reviewed-by: Naoya Horiguchi 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Dave Hansen 
Cc: Zi Yan 
Cc: Daniel Jordan 
---
 mm/Kconfig | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index b78e7cd4e9fe..97114c94239c 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -419,10 +419,11 @@ config ARCH_WANTS_THP_SWAP
 
 config THP_SWAP
def_bool y
-   depends on TRANSPARENT_HUGEPAGE && ARCH_WANTS_THP_SWAP
+   depends on TRANSPARENT_HUGEPAGE && ARCH_WANTS_THP_SWAP && SWAP
help
  Swap transparent huge pages in one piece, without splitting.
- XXX: For now this only does clustered swap space allocation.
+ XXX: For now, swap cluster backing transparent huge page
+ will be split after swapout.
 
  For selection by architectures with reasonable THP sizes.
 
-- 
2.16.4



Re: [V9fs-developer] [PATCH v2 3/6] 9p: Replace the fidlist with an IDR

2018-07-12 Thread jiangyiwen
On 2018/7/12 5:02, Matthew Wilcox wrote:
> The p9_idpool being used to allocate the IDs uses an IDR to allocate
> the IDs ... which we then keep in a doubly-linked list, rather than in
> the IDR which allocated them.  We can use an IDR directly which saves
> two pointers per p9_fid, and a tiny memory allocation per p9_client.
> 
> Signed-off-by: Matthew Wilcox 
> ---
>  include/net/9p/client.h |  9 +++--
>  net/9p/client.c | 44 +++--
>  2 files changed, 19 insertions(+), 34 deletions(-)
> 
> diff --git a/include/net/9p/client.h b/include/net/9p/client.h
> index 7af9d769b97d..e405729cd1c7 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -27,6 +27,7 @@
>  #define NET_9P_CLIENT_H
>  
>  #include 
> +#include 
>  
>  /* Number of requests per row */
>  #define P9_ROW_MAXTAG 255
> @@ -128,8 +129,7 @@ struct p9_req_t {
>   * @proto_version: 9P protocol version to use
>   * @trans_mod: module API instantiated with this client
>   * @trans: tranport instance state and API
> - * @fidpool: fid handle accounting for session
> - * @fidlist: List of active fid handles
> + * @fids: All active FID handles
>   * @tagpool - transaction id accounting for session
>   * @reqs - 2D array of requests
>   * @max_tag - current maximum tag id allocated
> @@ -169,8 +169,7 @@ struct p9_client {
>   } tcp;
>   } trans_opts;
>  
> - struct p9_idpool *fidpool;
> - struct list_head fidlist;
> + struct idr fids;
>  
>   struct p9_idpool *tagpool;
>   struct p9_req_t *reqs[P9_ROW_MAXTAG];
> @@ -188,7 +187,6 @@ struct p9_client {
>   * @iounit: the server reported maximum transaction size for this file
>   * @uid: the numeric uid of the local user who owns this handle
>   * @rdir: readdir accounting structure (allocated on demand)
> - * @flist: per-client-instance fid tracking
>   * @dlist: per-dentry fid tracking
>   *
>   * TODO: This needs lots of explanation.
> @@ -204,7 +202,6 @@ struct p9_fid {
>  
>   void *rdir;
>  
> - struct list_head flist;
>   struct hlist_node dlist;/* list of all fids attached to a 
> dentry */
>  };
>  
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 389a2904b7b3..b89c7298267c 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -908,30 +908,29 @@ static struct p9_fid *p9_fid_create(struct p9_client 
> *clnt)
>  {
>   int ret;
>   struct p9_fid *fid;
> - unsigned long flags;
>  
>   p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
>   fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);
>   if (!fid)
>   return NULL;
>  
> - ret = p9_idpool_get(clnt->fidpool);
> - if (ret < 0)
> - goto error;
> - fid->fid = ret;
> -
>   memset(&fid->qid, 0, sizeof(struct p9_qid));
>   fid->mode = -1;
>   fid->uid = current_fsuid();
>   fid->clnt = clnt;
>   fid->rdir = NULL;
> - spin_lock_irqsave(&clnt->lock, flags);
> - list_add(&fid->flist, &clnt->fidlist);
> - spin_unlock_irqrestore(&clnt->lock, flags);
> + fid->fid = 0;
>  
> - return fid;
> + idr_preload(GFP_KERNEL);

It is best to use GFP_NOFS instead, or else it may cause some
unpredictable problem, because when out of memory it will
reclaim memory from v9fs.

> + spin_lock_irq(&clnt->lock);
> + ret = idr_alloc_u32(&clnt->fids, fid, &fid->fid, P9_NOFID - 1,
> + GFP_NOWAIT);
> + spin_unlock_irq(&clnt->lock);

use spin_lock instead, clnt->lock is not used in irq context.

> + idr_preload_end();
> +
> + if (!ret)
> + return fid;
>  
> -error:
>   kfree(fid);
>   return NULL;
>  }
> @@ -943,9 +942,8 @@ static void p9_fid_destroy(struct p9_fid *fid)
>  
>   p9_debug(P9_DEBUG_FID, "fid %d\n", fid->fid);
>   clnt = fid->clnt;
> - p9_idpool_put(fid->fid, clnt->fidpool);
>   spin_lock_irqsave(&clnt->lock, flags);
> - list_del(&fid->flist);
> + idr_remove(&clnt->fids, fid->fid);
>   spin_unlock_irqrestore(&clnt->lock, flags);
>   kfree(fid->rdir);
>   kfree(fid);
> @@ -1028,7 +1026,7 @@ struct p9_client *p9_client_create(const char 
> *dev_name, char *options)
>   memcpy(clnt->name, client_id, strlen(client_id) + 1);
>  
>   spin_lock_init(&clnt->lock);
> - INIT_LIST_HEAD(&clnt->fidlist);
> + idr_init(&clnt->fids);
>  
>   err = p9_tag_init(clnt);
>   if (err < 0)
> @@ -1048,18 +1046,12 @@ struct p9_client *p9_client_create(const char 
> *dev_name, char *options)
>   goto destroy_tagpool;
>   }
>  
> - clnt->fidpool = p9_idpool_create();
> - if (IS_ERR(clnt->fidpool)) {
> - err = PTR_ERR(clnt->fidpool);
> - goto put_trans;
> - }
> -
>   p9_debug(P9_DEBUG_MUX, "clnt %p trans %p msize %d protocol %d\n",
>clnt, clnt->trans_mod, clnt->msize, clnt->proto_version);
>  
>   err = clnt->trans_mod->create(clnt, dev_name, options);
>   if (err)
> -  

Re: [PATCH v2] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire

2018-07-12 Thread Daniel Lustig
On 7/12/2018 11:10 AM, Linus Torvalds wrote:
> On Thu, Jul 12, 2018 at 11:05 AM Peter Zijlstra  wrote:
>>
>> The locking pattern is fairly simple and shows where RCpc comes apart
>> from expectation real nice.
> 
> So who does RCpc right now for the unlock-lock sequence? Somebody
> mentioned powerpc. Anybody else?
> 
> How nasty would it be to make powerpc conform? I will always advocate
> tighter locking and ordering rules over looser ones..
> 
> Linus

RISC-V probably would have been RCpc if we weren't having this discussion.
Depending on how we map atomics/acquire/release/unlock/lock, we can end up
producing RCpc, "RCtso" (feel free to find a better name here...), or RCsc
behaviors, and we're trying to figure out which we actually need.

I think the debate is this:

Obviously programmers would prefer just to have RCsc and not have to figure out
all the complexity of the other options.  On x86 or architectures with native
RCsc operations (like ARMv8), that's generally easy enough to get.

For weakly-ordered architectures that use fences for ordering (including
PowerPC and sometimes RISC-V, see below), though, it takes extra fences to go
from RCpc to either "RCtso" or RCsc.  People using these architectures are
concerned about whether there's a negative performance impact from those extra
fences.

However, some scheduler code, some RCU code, and probably some other examples
already implicitly or explicitly assume unlock()/lock() provides stronger
ordering than RCpc.  So, we have to decide whether to:
1) define unlock()/lock() to enforce "RCtso" or RCsc, insert more fences on
PowerPC and RISC-V accordingly, and probably negatively regress PowerPC
2) leave unlock()/lock() as enforcing only RCpc, fix any code that currently
assumes something stronger than RCpc is being provided, and hope people don't
get it wrong in the future
3) some mixture like having unlock()/lock() be "RCtso" but smp_store_release()/
smp_cond_load_acquire() be only RCpc

Also, FWIW, if other weakly-ordered architectures come along in the future and
also use any kind of lightweight fence rather than native RCsc operations,
they'll likely be in the same boat as RISC-V and Power here, in the sense of
not providing RCsc by default either.

Is that a fair assessment everyone?



I can also not-so-briefly summarize RISC-V's status here, since I think there's
been a bunch of confusion about where we're coming from:

First of all, I promise we're not trying to start a fight about all this :)
We're trying to understand the LKMM requirements so we know what instructions
to use.

With that, the easy case: RISC-V is RCsc if we use AMOs or load-reserved/
store-conditional, all of which have RCsc .aq and .rl bits:

  (a) ...
  amoswap.w.rl x0, x0, [lock]  // unlock()
  ...
loop:
  amoswap.w.aq a0, t1, [lock]  // lock()
  bnez a0, loop// lock()
  (b) ...

(a) is ordered before (b) here, regardless of what (a) and (b) are.  Likewise
for our load-reserved/store-conditional instructions, which also have .aq and
.rl.  That's similar to how ARM behaves, and is no problem.  We're happy with
that too.

Unfortunately, we don't (currently?) have plain load-acquire or store-release
opcodes in the ISA.  (That's a different discussion...)  For those, we need
fences instead.  And that's where it gets messier.

RISC-V *would* end up providing only RCpc if we use what I'd argue is the most
"natural" fence-based mapping for store-release operations, and then pair that
with LR/SC:

  (a) ...
  fence rw,w // unlock()
  sw x0, [lock]  // unlock()
  ...
loop:
  lr.w.aq a0, [lock]  // lock()
  sc.w t1, [lock] // lock()
  bnez loop   // lock()
  (b) ...

However, if (a) and (b) are loads to different addresses, then (a) is not
ordered before (b) here.  One unpaired RCsc operation is not a full fence.
Clearly "fence rw,w" is not sufficient if the scheduler, RCU, and elsewhere
depend on "RCtso" or RCsc.

RISC-V can get back to "RCtso", matching PowerPC, by using a stronger fence:

  (a) ...
  fence.tso  // unlock(), fence.tso == fence rw,w + fence r,r
  sw x0, [lock]  // unlock()
  ...
loop:
  lr.w.aq a0, [lock]  // lock()
  sc.w t1, [lock] // lock()
  bnez loop   // lock()
  (b) ...

(a) is ordered before (b), unless (a) is a store and (b) is a load to a
different address.

(Modeling note: this example is why I asked for Alan's v3 patch over the v2
patch, which I believe would only have worked if the fence.tso were at the end)

To get full RCsc here, we'd need a fence rw,rw in between the unlock store and
the lock load, much like PowerPC would I believe need a heavyweight sync:

  (a) ...
  fence rw,w // unlock()
  sw x0, [lock]  // unlock()
  ...
  fence rw,rw// can attach either to lock() or to unlock()
  ...
loop:
  lr.w.aq a0, [lock]  // lock()
  sc.w t1, [lock] // lock()
  bnez loop   // lock()
  (b) ...

In general, RISC-V's fence.tso will suffice wherever PowerPC's lwsync does, and
RISC-V's fence r

Re: [PATCH v13 06/18] x86/xen/time: initialize pv xen time in init_hypervisor_platform

2018-07-12 Thread Pavel Tatashin
> -void __ref xen_init_time_ops(void)
> +void __init xen_init_time_ops(void)
>  {
> pv_time_ops = xen_time_ops;
>
> @@ -542,17 +542,11 @@ void __init xen_hvm_init_time_ops(void)
> return;
>
> if (!xen_feature(XENFEAT_hvm_safe_pvclock)) {
> -   printk(KERN_INFO "Xen doesn't support pvclock on HVM,"
> -   "disable pv timer\n");
> +   pr_info("Xen doesn't support pvclock on HVM, disable pv 
> timer");
> return;
> }
> -
> -   pv_time_ops = xen_time_ops;
> +   xen_init_time_ops();
> x86_init.timers.setup_percpu_clockev = xen_time_init;
> x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents;

Boris reported a bug on HVM which causes a panic in
x86_late_time_init(). It is introduced here: xen_init_time_ops() sets
x86_init.timers.timer_init = xen_time_init, which on HVM used to be
hpet_time_init(). However, we might not even need the HPET here, so
adding x86_init.timers.timer_init = x86_init_noop; at the end of
xen_hvm_init_time_ops() should be sufficient.
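
In code, the idea is roughly this at the tail of xen_hvm_init_time_ops()
(an untested sketch on top of this patch):

	if (!xen_feature(XENFEAT_hvm_safe_pvclock)) {
		pr_info("Xen doesn't support pvclock on HVM, disable pv timer");
		return;
	}
	xen_init_time_ops();
	x86_init.timers.setup_percpu_clockev = xen_time_init;
	x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents;

	/*
	 * xen_init_time_ops() overrides x86_init.timers.timer_init with
	 * xen_time_init(), which on HVM used to be hpet_time_init().  We
	 * probably don't need the HPET here at all, so stub it out.
	 */
	x86_init.timers.timer_init = x86_init_noop;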

Thank you,
Pavel


Re: [PATCH v3 1/2] leds: core: Introduce generic pattern interface

2018-07-12 Thread Baolin Wang
Hi Jacek,

On 13 July 2018 at 05:41, Jacek Anaszewski  wrote:
> Hi Baolin,
>
>
> On 07/12/2018 02:24 PM, Baolin Wang wrote:
>>
>> Hi Jacek,
>>
>> On 12 July 2018 at 05:10, Jacek Anaszewski 
>> wrote:
>>>
>>> Hi Baolin.
>>>
>>>
>>> On 07/11/2018 01:02 PM, Baolin Wang wrote:


 Hi Jacek and Pavel,

 On 29 June 2018 at 13:03, Baolin Wang  wrote:
>
>
> From: Bjorn Andersson 
>
> Some LED controllers have support for autonomously controlling
> brightness over time, according to some preprogrammed pattern or
> function.
>
> This adds a new optional operator that LED class drivers can implement
> if they support such functionality as well as a new device attribute to
> configure the pattern for a given LED.
>
> [Baolin Wang did some minor improvements.]
>
> Signed-off-by: Bjorn Andersson 
> Signed-off-by: Baolin Wang 
> ---
> Changes from v2:
>- Change kernel version to 4.19.
>- Force user to return error pointer if failed to issue
> pattern_get().
>- Use strstrip() to trim trailing newline.
>- Other optimization.
>
> Changes from v1:
>- Add some comments suggested by Pavel.
>- Change 'delta_t' can be 0.
>
> Note: I removed the pattern repeat check and will get the repeat number
> by adding
> one extra file named 'pattern_repeat' according to previous discussion.
> ---



 Do you have any comments for this version patch set? Thanks.
>>>
>>>
>>>
>>> I tried modifying the uleds.c driver to support pattern ops, but
>>> I'm getting a segfault when doing "cat pattern". I didn't give
>>> it serious testing and analysis - will do that at the weekend.
>>>
>>> It also drew my attention to the issue of the desired pattern sysfs
>>> interface semantics for an uninitialized pattern. In your implementation
>>> the user seems to be unable to determine whether the pattern is activated
>>> or not. We should define the semantics for this use case and
>>> describe it in the documentation. Possibly pattern could
>>> return a lone newline character in that case.
>>
>>
>> I am not sure I understand your point correctly. If the user writes values
>> to the pattern interface, that means the pattern is activated.
>> If I am wrong, could you elaborate on the issue you are concerned about? Thanks.
>
>
> Now I see that writing an empty string disables the pattern, right?
> It should be explicitly stated in the pattern file documentation.

Yes, you are right. OK, I will add some documentation for this. Thanks.

>>> This is the code snippet I've used for testing pattern interface:
>>>
>>> static struct led_pattern ptrn[10];
>>> static int ptrn_len;
>>>
>>> static int uled_pattern_clear(struct led_classdev *ldev)
>>> {
>>>  return 0;
>>> }
>>>
>>> static int uled_pattern_set(struct led_classdev *ldev,
>>>struct led_pattern *pattern,
>>>int len)
>>> {
>>>  int i;
>>>
>>>  for (i = 0; i < len; i++) {
>>>  ptrn[i].brightness = pattern[i].brightness;
>>>  ptrn[i].delta_t = pattern[i].delta_t;
>>>  }
>>>
>>>  ptrn_len = len;
>>>
>>>  return 0;
>>> }
>>>
>>> static struct led_pattern *uled_pattern_get(struct led_classdev *ldev,
>>>int *len)
>>> {
>>>  int i;
>>>
>>>  for (i = 0; i < ptrn_len; i++) {
>>>  ptrn[i].brightness = 3;
>>>  ptrn[i].delta_t = 5;
>>>  }
>>>
>>>  *len = ptrn_len;
>>>
>>>  return ptrn;
>>>
>>> }
>>
>>
>> The reason you hit a segfault when doing "cat pattern" is that you should
>> not return a static pattern array: pattern_show() frees the pattern memory
>> after use. Could you change it to return a dynamically allocated pattern
>> pointer, as in my patch 2?
>
>
> Thanks for pointing this out.
>
>
>Documentation/ABI/testing/sysfs-class-led |   17 +
>drivers/leds/led-class.c  |  119
> +
>include/linux/leds.h  |   19 +
>3 files changed, 155 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-class-led
> b/Documentation/ABI/testing/sysfs-class-led
> index 5f67f7a..e01ac55 100644
> --- a/Documentation/ABI/testing/sysfs-class-led
> +++ b/Documentation/ABI/testing/sysfs-class-led
> @@ -61,3 +61,20 @@ Description:
>   gpio and backlight triggers. In case of the backlight
> trigger,
>   it is useful when driving a LED which is intended to
> indicate
>   a device in a standby like state.
> +
> +What: /sys/class/leds//pattern
> +Date: June 2018
> +KernelVersion: 4.19
> +Description:
> +   Specify a pattern for the LED, for LED hardware that support
> +   altering the brightness as a functio

Re: [PATCH v2 1/3] clk: meson: add DT documentation for emmc clock controller

2018-07-12 Thread Yixun Lan
Hi Rob, Jerome, Kevin

see my comments

On 07/13/18 08:15, Rob Herring wrote:
> On Thu, Jul 12, 2018 at 5:29 PM Yixun Lan  wrote:
>>
>> HI Rob
>>
>> see my comments
>>
>> On 07/12/2018 10:17 PM, Rob Herring wrote:
>>> On Wed, Jul 11, 2018 at 8:47 PM Yixun Lan  wrote:

 Hi Rob

 see my comments

 On 07/12/18 03:43, Rob Herring wrote:
> On Tue, Jul 10, 2018 at 04:36:56PM +, Yixun Lan wrote:
>> Document the MMC sub clock controller driver, the potential consumer
>> of this driver is MMC or NAND.
>
> So you all have decided to properly model this now?
>
 Yes, ;-)

>>
>> Signed-off-by: Yixun Lan 
>> ---
>>  .../bindings/clock/amlogic,mmc-clkc.txt   | 31 +++
>>  1 file changed, 31 insertions(+)
>>  create mode 100644 
>> Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt
>>
>> diff --git 
>> a/Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt 
>> b/Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt
>> new file mode 100644
>> index ..ff6b4bf3ecf9
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt
>> @@ -0,0 +1,31 @@
>> +* Amlogic MMC Sub Clock Controller Driver
>> +
>> +The Amlogic MMC clock controller generates and supplies clock to support
>> +MMC and NAND controller
>> +
>> +Required Properties:
>> +
>> +- compatible: should be:
>> +"amlogic,meson-gx-mmc-clkc"
>> +"amlogic,meson-axg-mmc-clkc"
>> +
>> +- #clock-cells: should be 1.
>> +- clocks: phandles to clocks corresponding to the clock-names property
>> +- clock-names: list of parent clock names
>> +- "clkin0", "clkin1"
>> +
>> +Parent node should have the following properties :
>> +- compatible: "syscon", "simple-mfd, and "amlogic,meson-axg-mmc-clkc"
>
> You don't need "simple-mfd" and probably not syscon either. The order is
> wrong too. Most specific first.
>
 Ok, I will drop "simple-mfd"..

 but the syscon is a must, since this mmc clock model access registers
 via the regmap interface
>>>
>>> A syscon compatible should not be the only way to get a regmap.
>> do you have any suggestion for another function that I can use? Is
>> devm_regmap_init_mmio() feasible?
>>
>>> Removing lines 56/57 of drivers/mfd/syscon.c should be sufficient.
>>>
>> I'm not sure what the point of removing the 'syscon' compatible in
>> drivers/mfd/syscon.c would be; it sounds like this would break a lot of DTs
>> or require fixes. Will you propose a patch for this? Then I can certainly
>> adjust here.
> 
> Removing the 2 lines will simply allow any node to be a syscon. If
> there's a specific driver for a node, then that makes sense to allow
> that.
> 
>>
>>> Why do you need a regmap in the first place? What else needs to access
>>> this register directly?
>> Yes, the SD_EMMC_CLOCK register contain several bits which not fit well
>> into common clock model, and they need to be access in the NAND or eMMC
>> driver itself, Martin had explained this in early thread[1]
>>
>> In this register
>> Bit[31] select NAND or eMMC function
>> Bit[25] enable SDIO IRQ
>> Bit[24] Clock always on
>> Bit[15:14] SRAM Power down
>>
>> [1]
>> https://lkml.kernel.org/r/CAFBinCBeyXf6LNaZzAw6WnsxzDAv8E=yp2eem0xcpwmeui6...@mail.gmail.com
>>
>>> Don't you need a patch removing the clock code
>>> from within the emmc driver? It's not even using regmap, so using
>>> regmap here doesn't help.
>>>
>> No, and current eMMC driver still use iomap to access the register
> 
> Which means a read-modify-write can corrupt the register value if both
> users don't access thru regmap. Changes are probably infrequent enough
> that you get lucky...
> 
What you say here is true,
and we try to guarantee that only one of NAND or eMMC is enabled, so there is
no race condition. As an example of the use cases:

1) for enabling NAND driver, we do
   a) enable both mmc-clkc, and NAND driver in DT, they can access
register by using regmap interface
   b) disable eMMC DT node

2) for enabling eMMC driver, we do
   a) enable eMMC node, access register by using iomap (for now)
   b) disable both mmc-clkc and NAND in DT


>> I think we would probably like to take a two-step approach.
>> First, from the hardware perspective, the NAND and eMMC (port C) drivers
>> can't exist at the same time, since they share the pins, clock and internal
>> RAM, so we have to enable only one of NAND or eMMC in DT, not enable
>> both of them.
> 
> Yes, of course.
> 
>> Second, we might like to convert eMMC driver to also use mmc-clkc model.
> 
> IMO, this should be done as part of merging this series. Otherwise, we
> have duplicated code for the same thing.

IMO, I'd leave this out of this series, since the patch series is quite
complete by itself. The downside, though, is code duplication.

Still, I need to hear Jerome's or Kevin's opinion, to see


Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-12 Thread Chao Fan
On Fri, Jul 13, 2018 at 07:52:40AM +0800, Baoquan He wrote:
>Hi Michal,
>
>On 07/12/18 at 02:32pm, Michal Hocko wrote:
>> On Thu 12-07-18 14:01:15, Chao Fan wrote:
>> > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote:
>> > >Hi Baoquan,
>> > >
>> > >At 07/11/2018 08:40 PM, Baoquan He wrote:
>> > >> Please try this v3 patch:
>> > >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
>> > >> From: Baoquan He 
>> > >> Date: Wed, 11 Jul 2018 20:31:51 +0800
>> > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text
>> > >> 
>> > >> In find_zone_movable_pfns_for_nodes(), when try to find the starting
>> > >> PFN movable zone begins in each node, kernel text position is not
>> > >> considered. KASLR may put kernel after which movable zone begins.
>> > >> 
>> > >> Fix it by finding movable zone after kernel text on that node.
>> > >> 
>> > >> Signed-off-by: Baoquan He 
>> > >
>> > >
>> > >You fix this on the _zone_init side_. This may make 'kernelcore=' or
>> > >'movablecore=' fail if KASLR puts the kernel at the tail of the
>> > >last node, or beyond.
>> > 
>> > I think it may not fail.
>> > There is a 'restart' to do another pass.
>> > 
>> > >
>> > >Since we have fixed the mirror memory on the KASLR side, and Chao is trying
>> > >to fix 'movable_node' on the KASLR side, have you had a chance to fix
>> > >this on the KASLR side?
>> > >
>> > 
>> > I think it's better to fix it here, not on the KASLR side,
>> > because much more code would need to change if we did it on the KASLR side.
>> > We don't parse 'kernelcore' in the compressed code, and you can see that
>> > distributing ZONE_MOVABLE needs a lot of code, so we should not do that
>> > much work on the KASLR side. Here, several lines will be enough.
>> 
>> I am not able to find the beginning of the email thread right now. Could
>> you summarize what is the actual problem please?
>
>The bug is found on x86 now. 
>
>When "kernelcore=" or "movablecore=" is added to the kernel command line,
>kernel memory is spread evenly among nodes. However, this is only right when
>KASLR is not enabled, in which case the kernel sits at 16M on x86.
>With KASLR enabled, it can be put anywhere from 16M to 64T at random.
> 
>Consider a scenario: we have 10 nodes, each node has 20G of memory, and
>we specify "kernelcore=50%", meaning each node will take 10G for
>kernelcore and 10G for the movable area. But this doesn't take the kernel
>position into consideration. E.g. if the kernel is put at 15G of the 2nd
>node, namely node1, then we assume node1 has 10G for kernelcore and 10G
>for movable, while in fact only 5G is available for movable, just after
>the kernel.
>
>I made a v4 patch which possibly can fix it.
>
>
>From dbcac3631863aed556dc2c4ff1839772dfd02d18 Mon Sep 17 00:00:00 2001
>From: Baoquan He 
>Date: Fri, 13 Jul 2018 07:49:29 +0800
>Subject: [PATCH v4] mm, page_alloc: find movable zone after kernel text
>
>In find_zone_movable_pfns_for_nodes(), when trying to find the starting
>PFN at which the movable zone begins in each node, the kernel text
>position is not considered. KASLR may put the kernel after the point
>where the movable zone begins.
>
>Fix it by finding movable zone after kernel text on that node.
>
>Signed-off-by: Baoquan He 

You can post it as a standalone PATCH, then I will test it next week.

Thanks,
Chao Fan

>---
> mm/page_alloc.c | 15 +--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>index 1521100f1e63..5bc1a47dafda 100644
>--- a/mm/page_alloc.c
>+++ b/mm/page_alloc.c
>@@ -6547,7 +6547,7 @@ static unsigned long __init 
>early_calculate_totalpages(void)
> static void __init find_zone_movable_pfns_for_nodes(void)
> {
>   int i, nid;
>-  unsigned long usable_startpfn;
>+  unsigned long usable_startpfn, kernel_endpfn, arch_startpfn;
>   unsigned long kernelcore_node, kernelcore_remaining;
>   /* save the state before borrow the nodemask */
>   nodemask_t saved_node_state = node_states[N_MEMORY];
>@@ -6649,8 +6649,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
>   if (!required_kernelcore || required_kernelcore >= totalpages)
>   goto out;
> 
>+  kernel_endpfn = PFN_UP(__pa_symbol(_end));
>   /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
>-  usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
>+  arch_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
> 
> restart:
>   /* Spread kernelcore memory as evenly as possible throughout nodes */
>@@ -6659,6 +6660,16 @@ static void __init 
>find_zone_movable_pfns_for_nodes(void)
>   unsigned long start_pfn, end_pfn;
> 
>   /*
>+   * KASLR may put kernel near tail of node memory,
>+   * start after kernel on that node to find PFN
>+   * at which zone begins.
>+   */
>+  if (pfn_to_nid(kernel_endpfn) == nid)
>+  usable_startpfn = max(arch_startpfn, kernel_endpfn);
>


REGRESSION: [PATCH] mmc: tegra: Use sdhci_pltfm_clk_get_max_clock

2018-07-12 Thread Marcel Ziswiler
On Mon, 2018-07-02 at 15:16 +0200, Ulf Hansson wrote:
> On 4 June 2018 at 17:35, Aapo Vienamo  wrote:
> > The sdhci get_max_clock callback is set to
> > sdhci_pltfm_clk_get_max_clock
> > and tegra_sdhci_get_max_clock is removed. It appears that the
> > shdci-tegra specific callback was originally introduced due to the
> > requirement that the host clock has to be twice the bus clock on
> > DDR50
> > mode. As far as I can tell the only effect the removal has on DDR50
> > mode
> > is in cases where the parent clock is unable to supply the
> > requested
> > clock rate, causing the DDR50 mode to run at a lower frequency.
> > Currently the DDR50 mode isn't enabled on any of the SoCs and would
> > also
> > require configuring the SDHCI clock divider register to function
> > properly.
> > 
> > The problem with tegra_sdhci_get_max_clock is that it divides the
> > clock
> > rate by two and thus artificially limits the maximum frequency of
> > faster
> > signaling modes which don't have the host-bus frequency ratio
> > requirement
> > of DDR50 such as SDR104 and HS200. Furthermore, the call to
> > clk_round_rate() may return an error which isn't handled by
> > tegra_sdhci_get_max_clock.
> > 
> > Signed-off-by: Aapo Vienamo 
> 
> Thanks, applied for next!
> 
> Kind regards
> Uffe
> 
> > ---
> >  drivers/mmc/host/sdhci-tegra.c | 15 ++-
> >  1 file changed, 2 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/mmc/host/sdhci-tegra.c
> > b/drivers/mmc/host/sdhci-tegra.c
> > index 970d38f6..c8745b5 100644
> > --- a/drivers/mmc/host/sdhci-tegra.c
> > +++ b/drivers/mmc/host/sdhci-tegra.c
> > @@ -234,17 +234,6 @@ static void
> > tegra_sdhci_set_uhs_signaling(struct sdhci_host *host,
> > sdhci_set_uhs_signaling(host, timing);
> >  }
> > 
> > -static unsigned int tegra_sdhci_get_max_clock(struct sdhci_host
> > *host)
> > -{
> > -   struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> > -
> > -   /*
> > -* DDR modes require the host to run at double the card
> > frequency, so
> > -* the maximum rate we can support is half of the module
> > input clock.
> > -*/
> > -   return clk_round_rate(pltfm_host->clk, UINT_MAX) / 2;
> > -}
> > -
> >  static void tegra_sdhci_set_tap(struct sdhci_host *host, unsigned
> > int tap)
> >  {
> > u32 reg;
> > @@ -309,7 +298,7 @@ static const struct sdhci_ops tegra_sdhci_ops =
> > {
> > .platform_execute_tuning = tegra_sdhci_execute_tuning,
> > .set_uhs_signaling = tegra_sdhci_set_uhs_signaling,
> > .voltage_switch = tegra_sdhci_voltage_switch,
> > -   .get_max_clock = tegra_sdhci_get_max_clock,
> > +   .get_max_clock = sdhci_pltfm_clk_get_max_clock,
> >  };
> > 
> >  static const struct sdhci_pltfm_data sdhci_tegra20_pdata = {
> > @@ -357,7 +346,7 @@ static const struct sdhci_ops
> > tegra114_sdhci_ops = {
> > .platform_execute_tuning = tegra_sdhci_execute_tuning,
> > .set_uhs_signaling = tegra_sdhci_set_uhs_signaling,
> > .voltage_switch = tegra_sdhci_voltage_switch,
> > -   .get_max_clock = tegra_sdhci_get_max_clock,
> > +   .get_max_clock = sdhci_pltfm_clk_get_max_clock,
> >  };
> > 
> >  static const struct sdhci_pltfm_data sdhci_tegra114_pdata = {
> > --
> > 2.7.4

Hm, for us this definitely breaks stuff. With Stefan's patch set
[1] we can not only run eMMC at DDR52, even SD cards run stably at
SDR104. With this patch, however, the clock always gets crippled to 45.33
resp. 48 MHz. This is observed on Apalis/Colibri T30 as well as
Apalis TK1.

Current next-20180712 just with Stefan's 3 patches:

root@apalis-t30:~# cat /sys/kernel/debug/mmc1/ios 
clock:  4800 Hz
actual clock:   4533 Hz
vdd:21 (3.3 ~ 3.4 V)
bus mode:   2 (push-pull)
chip select:0 (don't care)
power mode: 2 (on)
bus width:  3 (8 bits)
timing spec:8 (mmc DDR52)
signal voltage: 1 (1.80 V)
driver type:0 (driver type B)
root@apalis-t30:~# hdparm -t /dev/mmcblk1

/dev/mmcblk1:
 Timing buffered disk reads: 218 MB in  3.03 seconds =  71.95 MB/sec

root@apalis-t30:~# cat /sys/kernel/debug/mmc2/ios 
clock:  4800 Hz
actual clock:   4800 Hz
vdd:21 (3.3 ~ 3.4 V)
bus mode:   2 (push-pull)
chip select:0 (don't care)
power mode: 2 (on)
bus width:  2 (4 bits)
timing spec:6 (sd uhs SDR104)
signal voltage: 1 (1.80 V)
driver type:0 (driver type B)
root@apali

Re: [PATCH 22/32] vfs: Provide documentation for new mount API [ver #9]

2018-07-12 Thread Randy Dunlap
On 07/10/2018 03:43 PM, David Howells wrote:
> Provide documentation for the new mount API.
> 
> Signed-off-by: David Howells 
> ---
> 
>  Documentation/filesystems/mount_api.txt |  439 
> +++
>  1 file changed, 439 insertions(+)
>  create mode 100644 Documentation/filesystems/mount_api.txt

Hi,

I would review this but it sounds like I should just wait for the
next version.

-- 
~Randy


Re: [PATCH v8 2/2] regulator: add QCOM RPMh regulator driver

2018-07-12 Thread David Collins
On 07/12/2018 09:54 AM, Mark Brown wrote:
> On Mon, Jul 09, 2018 at 04:44:14PM -0700, David Collins wrote:
>> On 07/02/2018 03:28 AM, Mark Brown wrote:
>>> On Fri, Jun 22, 2018 at 05:46:14PM -0700, David Collins wrote:
 +static unsigned int rpmh_regulator_pmic4_ldo_of_map_mode(unsigned int 
 mode)
 +{
 +  static const unsigned int of_mode_map[RPMH_REGULATOR_MODE_COUNT] = {
 +  [RPMH_REGULATOR_MODE_RET]  = REGULATOR_MODE_STANDBY,
 +  [RPMH_REGULATOR_MODE_LPM]  = REGULATOR_MODE_IDLE,
 +  [RPMH_REGULATOR_MODE_AUTO] = REGULATOR_MODE_INVALID,
 +  [RPMH_REGULATOR_MODE_HPM]  = REGULATOR_MODE_FAST,
 +  };
> 
>>> Same here, based on that it looks like auto mode is a good map for
>>> normal.
> 
>> LDO type regulators physically do not support AUTO mode.  That is why I
>> specified REGULATOR_MODE_INVALID in the mapping.
> 
> The other question here is why this is even in the table if it's not
> valid (I'm not seeing a need for the MODE_COUNT define)?

I thought that having a table would be more concise and easier to follow.
I can change this to a switch case statement.
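Roughly like this (a sketch only; same mappings as the v8 table, with AUTO
and anything unrecognized falling through to REGULATOR_MODE_INVALID, which
also removes the need for the RPMH_REGULATOR_MODE_COUNT define):

static unsigned int rpmh_regulator_pmic4_ldo_of_map_mode(unsigned int mode)
{
	switch (mode) {
	case RPMH_REGULATOR_MODE_RET:
		return REGULATOR_MODE_STANDBY;
	case RPMH_REGULATOR_MODE_LPM:
		return REGULATOR_MODE_IDLE;
	case RPMH_REGULATOR_MODE_HPM:
		return REGULATOR_MODE_FAST;
	default:
		return REGULATOR_MODE_INVALID;
	}
}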

Take care,
David

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH V2 00/19] C-SKY(csky) Linux Kernel Port

2018-07-12 Thread Guo Ren
On Thu, Jul 12, 2018 at 10:04:10AM -0600, Sandra Loosemore wrote:
> On 07/12/2018 06:51 AM, Guo Ren wrote:
> >On Wed, Jul 11, 2018 at 10:51:33AM +0100, David Howells wrote:
> >>Can you say what the --target tuple should be so that I can add the arch to 
> >>my
> >>collection of Fedora cross-binutils and cross-gcc tools built from upstream
> >>binutils and gcc sources?
Mentor Graphics are helping us upstream gcc and binutils.
> >
> >@Sandra,
> >
Could you help me reply to the question?
> 
> Neither binutils nor gcc support for C-SKY are in the upstream repositories
> yet.  We should be resubmitting the binutils port soon (with bug fixes to
> address the test failures that caused it to be rejected the last time), and
> the gcc port will follow that shortly.
> 
> The target triplets we have been testing are csky-elf, csky-linux-gnu, and
> csky-linux-uclibc.  Note that the gcc port will only support v2
> processors/ABI so that is the default ABI for these triplets.
> 
> I'm not familiar with the Fedora tools, but to build a complete toolchain
> you'll need library support as well and I'm not sure what the submission
> status/plans for that are.  E.g. Mentor did a newlib/libgloss port for local
> testing of the ELF toolchain and provided it to C-SKY, but pushing that to
> the upstream repository ourselves is not on our todo list.
> 
> -Sandra

Thank you, Sandra.

 Guo Ren


[PATCH 16/18] tools/accounting: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the
first patch of the series) for the motivation behind this patch

 tools/accounting/getdelays.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/accounting/getdelays.c b/tools/accounting/getdelays.c
index 9f420d98b5fb..66817a7a4fce 100644
--- a/tools/accounting/getdelays.c
+++ b/tools/accounting/getdelays.c
@@ -314,8 +314,7 @@ int main(int argc, char *argv[])
err(1, "Invalid rcv buf size\n");
break;
case 'm':
-   strncpy(cpumask, optarg, sizeof(cpumask));
-   cpumask[sizeof(cpumask) - 1] = '\0';
+   strlcpy(cpumask, optarg, sizeof(cpumask));
maskset = 1;
printf("cpumask %s maskset %d\n", cpumask, maskset);
break;
-- 
2.17.1



[PATCH 18/18] cpupower: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the
first patch of the series) for the motivation behind this patch

 tools/power/cpupower/bench/parse.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/power/cpupower/bench/parse.c 
b/tools/power/cpupower/bench/parse.c
index 9ba8a44ad2a7..1566b89989b2 100644
--- a/tools/power/cpupower/bench/parse.c
+++ b/tools/power/cpupower/bench/parse.c
@@ -221,9 +221,8 @@ int prepare_config(const char *path, struct config *config)
sscanf(val, "%u", &config->cpu);
 
else if (strcmp("governor", opt) == 0) {
-   strncpy(config->governor, val,
+   strlcpy(config->governor, val,
sizeof(config->governor));
-   config->governor[sizeof(config->governor) - 1] = '\0';
}
 
else if (strcmp("priority", opt) == 0) {
-- 
2.17.1



[PATCH 17/18] perf: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the
first patch of the series) for the motivation behind this patch

 tools/perf/util/bpf-loader.h | 3 +--
 tools/perf/util/util.c   | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 5d3aefd6fae7..8d08a1fc97a0 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -143,10 +143,9 @@ __bpf_strerror(char *buf, size_t size)
 {
if (!size)
return 0;
-   strncpy(buf,
+   strlcpy(buf,
"ERROR: eBPF object loading is disabled during compiling.\n",
size);
-   buf[size - 1] = '\0';
return 0;
 }
 
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index eac5b858a371..8b9e3aa7aad3 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -459,8 +459,7 @@ fetch_kernel_version(unsigned int *puint, char *str,
return -1;
 
if (str && str_size) {
-   strncpy(str, utsname.release, str_size);
-   str[str_size - 1] = '\0';
+   strlcpy(str, utsname.release, str_size);
}
 
if (!puint || int_ver_ready)
-- 
2.17.1



[PATCH 13/18] ibmvscsi: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the
first patch of the series) for the motivation behind this patch

 drivers/scsi/ibmvscsi/ibmvscsi.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 17df76f0be3c..79eb8af03a19 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -1274,14 +1274,12 @@ static void send_mad_capabilities(struct 
ibmvscsi_host_data *hostdata)
if (hostdata->client_migrated)
hostdata->caps.flags |= cpu_to_be32(CLIENT_MIGRATED);
 
-   strncpy(hostdata->caps.name, dev_name(&hostdata->host->shost_gendev),
+   strlcpy(hostdata->caps.name, dev_name(&hostdata->host->shost_gendev),
sizeof(hostdata->caps.name));
-   hostdata->caps.name[sizeof(hostdata->caps.name) - 1] = '\0';
 
location = of_get_property(of_node, "ibm,loc-code", NULL);
location = location ? location : dev_name(hostdata->dev);
-   strncpy(hostdata->caps.loc, location, sizeof(hostdata->caps.loc));
-   hostdata->caps.loc[sizeof(hostdata->caps.loc) - 1] = '\0';
+   strlcpy(hostdata->caps.loc, location, sizeof(hostdata->caps.loc));
 
req->common.type = cpu_to_be32(VIOSRP_CAPABILITIES_TYPE);
req->buffer = cpu_to_be64(hostdata->caps_addr);
-- 
2.17.1



[PATCH 15/18] blktrace: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Using strlcpy fixes this new gcc warning:
kernel/trace/blktrace.c: In function ‘do_blk_trace_setup’:
kernel/trace/blktrace.c:497:2: warning: ‘strncpy’ specified bound 32 equals 
destination size [-Wstringop-truncation]
  strncpy(buts->name, name, BLKTRACE_BDEV_SIZE);
  ^

Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the
first patch of the series) for the motivation behind this patch

 kernel/trace/blktrace.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 987d9a9ae283..2478d9838eab 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -494,8 +494,7 @@ static int do_blk_trace_setup(struct request_queue *q, char 
*name, dev_t dev,
if (!buts->buf_size || !buts->buf_nr)
return -EINVAL;
 
-   strncpy(buts->name, name, BLKTRACE_BDEV_SIZE);
-   buts->name[BLKTRACE_BDEV_SIZE - 1] = '\0';
+   strlcpy(buts->name, name, BLKTRACE_BDEV_SIZE);
 
/*
 * some device names have larger paths - convert the slashes
-- 
2.17.1



[PATCH 12/18] test_power: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the
first patch of the series) for the motivation behind this patch

 drivers/power/supply/test_power.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/power/supply/test_power.c 
b/drivers/power/supply/test_power.c
index 57246cdbd042..64adf630f64f 100644
--- a/drivers/power/supply/test_power.c
+++ b/drivers/power/supply/test_power.c
@@ -297,8 +297,7 @@ static int map_get_value(struct battery_property_map *map, 
const char *key,
char buf[MAX_KEYLENGTH];
int cr;
 
-   strncpy(buf, key, MAX_KEYLENGTH);
-   buf[MAX_KEYLENGTH-1] = '\0';
+   strlcpy(buf, key, MAX_KEYLENGTH);
 
cr = strnlen(buf, MAX_KEYLENGTH) - 1;
if (cr < 0)
-- 
2.17.1



[PATCH 14/18] kdb_support: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the
first patch of the series) for the motivation behind this patch

 kernel/debug/kdb/kdb_support.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/debug/kdb/kdb_support.c b/kernel/debug/kdb/kdb_support.c
index 990b3cc526c8..1f6a4b6bde0b 100644
--- a/kernel/debug/kdb/kdb_support.c
+++ b/kernel/debug/kdb/kdb_support.c
@@ -119,8 +119,7 @@ int kdbnearsym(unsigned long addr, kdb_symtab_t *symtab)
 * What was Rusty smoking when he wrote that code?
 */
if (symtab->sym_name != knt1) {
-   strncpy(knt1, symtab->sym_name, knt1_size);
-   knt1[knt1_size-1] = '\0';
+   strlcpy(knt1, symtab->sym_name, knt1_size);
}
for (i = 0; i < ARRAY_SIZE(kdb_name_table); ++i) {
if (kdb_name_table[i] &&
-- 
2.17.1



[PATCH 05/18] iio: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the
first patch of the series) for the motivation behind this patch

 drivers/iio/common/st_sensors/st_sensors_core.c | 3 +--
 drivers/iio/pressure/st_pressure_i2c.c  | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/iio/common/st_sensors/st_sensors_core.c 
b/drivers/iio/common/st_sensors/st_sensors_core.c
index 57db19182e95..26fbd1bd9413 100644
--- a/drivers/iio/common/st_sensors/st_sensors_core.c
+++ b/drivers/iio/common/st_sensors/st_sensors_core.c
@@ -380,8 +380,7 @@ void st_sensors_of_name_probe(struct device *dev,
return;
 
/* The name from the OF match takes precedence if present */
-   strncpy(name, of_id->data, len);
-   name[len - 1] = '\0';
+   strlcpy(name, of_id->data, len);
 }
 EXPORT_SYMBOL(st_sensors_of_name_probe);
 #else
diff --git a/drivers/iio/pressure/st_pressure_i2c.c 
b/drivers/iio/pressure/st_pressure_i2c.c
index fbb59059e942..2026a1012012 100644
--- a/drivers/iio/pressure/st_pressure_i2c.c
+++ b/drivers/iio/pressure/st_pressure_i2c.c
@@ -94,9 +94,8 @@ static int st_press_i2c_probe(struct i2c_client *client,
if ((ret < 0) || (ret >= ST_PRESS_MAX))
return -ENODEV;
 
-   strncpy(client->name, st_press_id_table[ret].name,
+   strlcpy(client->name, st_press_id_table[ret].name,
sizeof(client->name));
-   client->name[sizeof(client->name) - 1] = '\0';
} else if (!id)
return -ENODEV;
 
-- 
2.17.1



[PATCH 08/18] myricom: change strncpy+truncation to strlcpy

2018-07-12 Thread Dominique Martinet
Generated by scripts/coccinelle/misc/strncpy_truncation.cocci

Signed-off-by: Dominique Martinet 
---

Please see https://marc.info/?l=linux-kernel&m=153144450722324&w=2 (the
first patch of the series) for the motivation behind this patch

 drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c 
b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
index b2d2ec8c11e2..f7178cdb6bd8 100644
--- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
+++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
@@ -553,8 +553,7 @@ myri10ge_validate_firmware(struct myri10ge_priv *mgp,
}
 
/* save firmware version for ethtool */
-   strncpy(mgp->fw_version, hdr->version, sizeof(mgp->fw_version));
-   mgp->fw_version[sizeof(mgp->fw_version) - 1] = '\0';
+   strlcpy(mgp->fw_version, hdr->version, sizeof(mgp->fw_version));
 
sscanf(mgp->fw_version, "%d.%d.%d", &mgp->fw_ver_major,
   &mgp->fw_ver_minor, &mgp->fw_ver_tiny);
-- 
2.17.1



Re: [bug] kpti, perf_event, bts: sporadic truncated trace

2018-07-12 Thread Hugh Dickins
On Thu, 12 Jul 2018, Metzger, Markus T wrote:

> Hello,
> 
> Starting with 4.15 I noticed that BTS is sporadically missing the tail
> of the trace in the perf_event data buffer.  It shows as
> 
> [decode error (1): instruction overflow]
> 
> in GDB.  Chances to see this are higher the longer the debuggee is
> running.  With this [1] tiny patch to one of GDB's tests, I am able to
> reproduce it reliably on my box.  To run the test, use:
> 
> $ make -s check RUNTESTFLAGS="gdb.btrace/exception.exp"
> 
> from the gdb/ sub-directory in the GDB build directory.
> 
> The issue remains when I use 'nopti' on the kernel command-line.
> 
> 
> Bisecting yielded commit
> 
> c1961a4 x86/events/intel/ds: Map debug buffers in cpu_entry_area
> 
> I reverted the commit on top of v4.17 [2] and the issue disappears
> when I use 'nopti' on the kernel command-line.
> 
> regards,
> markus.
> 
> 
> [1]
> diff --git a/gdb/testsuite/gdb.btrace/exception.exp 
> b/gdb/testsuite/gdb.btrace/exception.exp
> index 9408d61..a24ddd3 100755
> --- a/gdb/testsuite/gdb.btrace/exception.exp
> +++ b/gdb/testsuite/gdb.btrace/exception.exp
> @@ -36,16 +36,12 @@ if ![runto_main] {
>  gdb_test_no_output "set record function-call-history-size 0"
>  
>  # set bp
> -set bp_1 [gdb_get_line_number "bp.1" $srcfile]
>  set bp_2 [gdb_get_line_number "bp.2" $srcfile]
> -gdb_breakpoint $bp_1
>  gdb_breakpoint $bp_2
>  
> -# trace the code between the two breakpoints
> -gdb_continue_to_breakpoint "cont to bp.1" ".*$srcfile:$bp_1\r\n.*"
>  # increase the BTS buffer size - the trace can be quite big
> -gdb_test_no_output "set record btrace bts buffer-size 128000"
> -gdb_test_no_output "record btrace"
> +gdb_test_no_output "set record btrace bts buffer-size 1024000"
> +gdb_test_no_output "record btrace bts"
>  gdb_continue_to_breakpoint "cont to bp.2" ".*$srcfile:$bp_2\r\n.*"
>  
>  # show the flat branch trace
> 
> 
> [2]
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
[ snipped the revert ]

Although my name was kept on that commit as a generous courtesy, it
did change a lot after leaving my fingers - and I was never the best
person to be making perf changes in the first place!

I'm sorry to hear that it's breaking you. I've spent a little while
looking through its final state; most of it looks fine to me, but I
noticed one discrepancy whose effect I cannot predict at all, though
there's a chance that it has something to do with what you're seeing.

A little "optimization" crept into alloc_bts_buffer() along the way,
which now places bts_interrupt_threshold not on a record boundary.
And Stephane has shown me the sentence in Vol 3B, 17.4.9, which says
"This address must point to an offset from the BTS buffer base that
is a multiple of the BTS record size."
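
To put numbers on it (assuming the ds.c values I have in mind, a
BTS_RECORD_SIZE of 24 bytes and a 64KB/65536-byte BTS buffer, so 2730
records fit):

	old:	max       = 24 * (65536 / 24)          = 65520
		threshold = base + 65520 - 65520 / 16  = base + 61425	/* 61425 % 24 = 9 */
	new:	max       = 65536 / 24                 = 2730
		threshold = base + 2730 * 24 - (2730 / 16) * 24
			  = base + 65520 - 170 * 24    = base + 61440	/* 61440 % 24 = 0 */

so the old threshold lands mid-record, while the patched one lands on a
record boundary.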

Please give the patch below a try, and let us know if it helps (if it
does not, then I think we'll need perfier expertise than I can give).

Hugh

--- 4.18-rc4/arch/x86/events/intel/ds.c 2018-06-03 14:15:21.0 -0700
+++ linux/arch/x86/events/intel/ds.c2018-07-12 17:38:28.471378616 -0700
@@ -408,9 +408,11 @@ static int alloc_bts_buffer(int cpu)
ds->bts_buffer_base = (unsigned long) cea;
ds_update_cea(cea, buffer, BTS_BUFFER_SIZE, PAGE_KERNEL);
ds->bts_index = ds->bts_buffer_base;
-   max = BTS_RECORD_SIZE * (BTS_BUFFER_SIZE / BTS_RECORD_SIZE);
-   ds->bts_absolute_maximum = ds->bts_buffer_base + max;
-   ds->bts_interrupt_threshold = ds->bts_absolute_maximum - (max / 16);
+   max = BTS_BUFFER_SIZE / BTS_RECORD_SIZE;
+   ds->bts_absolute_maximum = ds->bts_buffer_base +
+   max * BTS_RECORD_SIZE;
+   ds->bts_interrupt_threshold = ds->bts_absolute_maximum -
+   (max / 16) * BTS_RECORD_SIZE;
return 0;
 }
 


Re: [V9fs-developer] [PATCH v2 2/6] 9p: Change p9_fid_create calling convention

2018-07-12 Thread jiangyiwen
On 2018/7/12 5:02, Matthew Wilcox wrote:
> Return NULL instead of ERR_PTR when we can't allocate a FID.  The ENOSPC
> return value was getting all the way back to userspace, and that's
> confusing for a userspace program which isn't expecting read() to tell it
> there's no space left on the filesystem.  The best error we can return to
> indicate a temporary failure caused by lack of client resources is ENOMEM.
> 
> Maybe it would be better to sleep until a FID is available, but that's
> not a change I'm comfortable making.
> 
> Signed-off-by: Matthew Wilcox 

Reviewed-by: Yiwen Jiang 

> ---
>  net/9p/client.c | 23 +--
>  1 file changed, 9 insertions(+), 14 deletions(-)
> 
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 999eceb8af98..389a2904b7b3 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -913,13 +913,11 @@ static struct p9_fid *p9_fid_create(struct p9_client 
> *clnt)
>   p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
>   fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);
>   if (!fid)
> - return ERR_PTR(-ENOMEM);
> + return NULL;
>  
>   ret = p9_idpool_get(clnt->fidpool);
> - if (ret < 0) {
> - ret = -ENOSPC;
> + if (ret < 0)
>   goto error;
> - }
>   fid->fid = ret;
>  
>   memset(&fid->qid, 0, sizeof(struct p9_qid));
> @@ -935,7 +933,7 @@ static struct p9_fid *p9_fid_create(struct p9_client 
> *clnt)
>  
>  error:
>   kfree(fid);
> - return ERR_PTR(ret);
> + return NULL;
>  }
>  
>  static void p9_fid_destroy(struct p9_fid *fid)
> @@ -1137,9 +1135,8 @@ struct p9_fid *p9_client_attach(struct p9_client *clnt, 
> struct p9_fid *afid,
>   p9_debug(P9_DEBUG_9P, ">>> TATTACH afid %d uname %s aname %s\n",
>afid ? afid->fid : -1, uname, aname);
>   fid = p9_fid_create(clnt);
> - if (IS_ERR(fid)) {
> - err = PTR_ERR(fid);
> - fid = NULL;
> + if (!fid) {
> + err = -ENOMEM;
>   goto error;
>   }
>   fid->uid = n_uname;
> @@ -1188,9 +1185,8 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, 
> uint16_t nwname,
>   clnt = oldfid->clnt;
>   if (clone) {
>   fid = p9_fid_create(clnt);
> - if (IS_ERR(fid)) {
> - err = PTR_ERR(fid);
> - fid = NULL;
> + if (!fid) {
> + err = -ENOMEM;
>   goto error;
>   }
>  
> @@ -2018,9 +2014,8 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid 
> *file_fid,
>   err = 0;
>   clnt = file_fid->clnt;
>   attr_fid = p9_fid_create(clnt);
> - if (IS_ERR(attr_fid)) {
> - err = PTR_ERR(attr_fid);
> - attr_fid = NULL;
> + if (!attr_fid) {
> + err = -ENOMEM;
>   goto error;
>   }
>   p9_debug(P9_DEBUG_9P,
> 




Re: [RFC PATCH v2 1/4] dt-bindings: misc: Add bindings for misc. BMC control fields

2018-07-12 Thread Benjamin Herrenschmidt
On Thu, 2018-07-12 at 09:11 -0600, Rob Herring wrote:
> On Wed, Jul 11, 2018 at 6:54 PM Andrew Jeffery  wrote:
> > 
> > Hi Rob,
> > 
> > Thanks for the response.
> > 
> > On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote:
> > > On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote:
> > > > Baseboard Management Controllers (BMCs) are embedded SoCs that exist to
> > > > provide remote management of (primarily) server platforms. BMCs are
> > > > often tightly coupled to the platform in terms of behaviour and provide
> > > > many hardware features integral to booting and running the host system.
> > > > 
> > > > Some of these hardware features are simple, for example scratch
> > > > registers provided by the BMC that are exposed to both the host and the
> > > > BMC. In other cases there's a single bit switch to enable or disable
> > > > some of the provided functionality.
> > > > 
> > > > The documentation defines bindings for fields in registers that do not
> > > > integrate well into other driver models yet must be described to allow
> > > > the BMC kernel to assume control of these features.
> > > 
> > > So we'll get a new binding when that happens? That will break
> > > compatibility.
> > 
> > Can you please expand on this? I'm not following.
> 
> If we have a subsystem in the future, then there would likely be an
> associated binding which would be different. So if you update the DT,
> then old kernels won't work with it.

What kind of "subsystem"? There is almost no way there could be one
for this sort of BMC tunable. We've looked at several BMC chips out
there and at requirements from several vendors, BIOS and system
manufacturers, and it's all over the place.

> > I feel like this is an argument of tradition. Maybe people have
> > been dissuaded from doing so when they don't have a reasonable use-
> > case? I'm not saying that what I'm proposing is unquestionably
> > reasonable, but I don't want to dismiss it out of hand.
> 
> One of experience. The one that stands out is clock bindings.
> Initially we were doing a node per clock modelling which could end up
> being 100s of nodes and is difficult to get right (with DT being an
> ABI).
> 
> It comes up with system controller type blocks too that just have a
> bunch of random registers. Those change in every SoC and not in any
> controlled or ordered way that would make describing the individual
> sub-functions in DT worthwhile.

So what's the alternative ? Because without something like what we
propose, what's going to happen is /dev/mem ... that's what people do
today.

> > > A node per register bit doesn't scale.
> > 
> > It isn't meant to scale in terms of a single system. Using it
> > extensively is very likely wrong. Separately, register-bit-led does
> > pretty much the same thing. Doesn't the scale argument apply there?
> > Who is to stop me from attaching an insane number of LEDs to a
> > system?
> 
> Review.
> 
> If you look, register-bit-led is rarely used outside of some ARM, Ltd.
> boards. It's simply quite rare to have MMIO register bits that have a
> fixed function of LED control.

Well, same here, we hope to review what goes upstream to make it
reasonable. Otherwise it doesn't matter. If a random vendor, let's say
IBM, chose to ship a system where they put an insane amount of cruft in
there, it would only affect those systems' BMC and the userspace stack
on it.

Thankfully that stack is OpenBMC and IBM is aiming at having their
device trees upstream, thus reviewed, so it won't happen.

*Anything* can be abused. The point here is that we have a number,
thankfully rather small, maybe a dozen or two, of tunables that are
quite specific to a combination (system vendor, bmc vendor, system
model) which control a few HW features that essentially do *NOT* fit in
a subsystem.

For everything that does, we have created proper drivers (and are doing
more).


> > Obviously if there are lots of systems using it sparingly and
> > legitimately then maybe there's a scale issue, but isn't that just
> > a reality of different hardware designs? Whoever is implementing
> > support for the system is going to have to describe the hardware
> > one way or another.
> > 
> > > 
> > > Maybe this should be modelled using GPIO binding? There's a line there
> > > too as whether the signals are "general purpose" or not.
> > 
> > I don't think so, mainly because some of the things it is intended to be 
> > used for are not GPIOs. For instance, take the DAC mux I've described in 
> > the patch. It doesn't directly influence anything external to the SoC (i.e. 
> > it's certainly not a traditional GPIO in any sense). However, it does 
> > *indirectly* influence the SoC's behaviour by muxing the DAC internally 
> > between:
> > 
> > 0. VGA device exposed on the host PCIe bus
> > 1. The "Graphics CRT" controller
> > 2. VGA port A
> > 3. VGA port B
> 
> And this mux control is fixed in the SoC design?

This specific family of SoCs (Aspeed) supports those 4 configurations.

[PATCH -next] fsi: sbefifo: Fix missing unlock on error in sbefifo_dump_ffdc()

2018-07-12 Thread Wei Yongjun
Add the missing unlock before return from function sbefifo_dump_ffdc()
in the error handling case.

Fixes: 9f4a8a2d7f9d ("fsi/sbefifo: Add driver for the SBE FIFO")
Signed-off-by: Wei Yongjun 
---
 drivers/fsi/fsi-sbefifo.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/fsi/fsi-sbefifo.c b/drivers/fsi/fsi-sbefifo.c
index 6b31cc24..35f2749 100644
--- a/drivers/fsi/fsi-sbefifo.c
+++ b/drivers/fsi/fsi-sbefifo.c
@@ -150,6 +150,7 @@ static void sbefifo_dump_ffdc(struct device *dev, const 
__be32 *ffdc,
u32 w0, w1, w2, i;
if (ffdc_sz < 3) {
dev_err(dev, "SBE invalid FFDC package size %zd\n", 
ffdc_sz);
+   mutex_unlock(&sbefifo_ffdc_mutex);
return;
}
w0 = be32_to_cpu(*(ffdc++));



Re: [PATCH 24/32] vfs: syscall: Add fsopen() to prepare for superblock creation [ver #9]

2018-07-12 Thread Andy Lutomirski



> On Jul 12, 2018, at 5:03 PM, David Howells  wrote:
> 
> Andy Lutomirski  wrote:
> 
 I tend to think that this *should* fail using the new API.  The semantics
 of the second mount request are bizarre at best.
>>> 
>>> You still have to support existing behaviour lest you break userspace.
>>> 
>> 
>> I assume the existing behavior is that a bind mount is created?  If so, the
>> new mount(8) tool could do it in user code.
> 
> You have a race there.
> 
> Also you can't currently directly create a bind mount from userspace as you
> can only bind from another path point - which you may not be able to access
> (either by permission failure or because it's not in your mount namespace).
> 

Are you trying to preserve the magic bind semantics with the new API?  If you 
are, I think it should be by explicit opt in only. Otherwise you risk having 
your shiny new way to specify fs options get ignored when the magic bind mount 
happens. 

Re: [PATCH v5 2/2] cpufreq: qcom-hw: Add support for QCOM cpufreq HW driver

2018-07-12 Thread Matthias Kaehlcke
Hi,

On Thu, Jul 12, 2018 at 11:35:45PM +0530, Taniya Das wrote:
> The CPUfreq HW present in some QCOM chipsets offloads the steps necessary
> for changing the frequency of CPUs. The driver implements the cpufreq
> driver interface for this hardware engine.
> 
> Signed-off-by: Saravana Kannan 
> Signed-off-by: Taniya Das 
> ---
>  drivers/cpufreq/Kconfig.arm   |  10 ++
>  drivers/cpufreq/Makefile  |   1 +
>  drivers/cpufreq/qcom-cpufreq-hw.c | 344 
> ++
>  3 files changed, 355 insertions(+)
>  create mode 100644 drivers/cpufreq/qcom-cpufreq-hw.c
> 
> diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
> index 52f5f1a..141ec3e 100644
> --- a/drivers/cpufreq/Kconfig.arm
> +++ b/drivers/cpufreq/Kconfig.arm
> @@ -312,3 +312,13 @@ config ARM_PXA2xx_CPUFREQ
> This add the CPUFreq driver support for Intel PXA2xx SOCs.
> 
> If in doubt, say N.
> +
> +config ARM_QCOM_CPUFREQ_HW
> + bool "QCOM CPUFreq HW driver"
> + help
> +  Support for the CPUFreq HW driver.
> +  Some QCOM chipsets have a HW engine to offload the steps
> +  necessary for changing the frequency of the CPUs. Firmware loaded
> +  in this engine exposes a programming interface to the High-level OS.
> +  The driver implements the cpufreq driver interface for this HW engine.
> +  Say Y if you want to support CPUFreq HW.
> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> index fb4a2ec..1226a3e 100644
> --- a/drivers/cpufreq/Makefile
> +++ b/drivers/cpufreq/Makefile
> @@ -86,6 +86,7 @@ obj-$(CONFIG_ARM_TEGRA124_CPUFREQ)  += tegra124-cpufreq.o
>  obj-$(CONFIG_ARM_TEGRA186_CPUFREQ)   += tegra186-cpufreq.o
>  obj-$(CONFIG_ARM_TI_CPUFREQ) += ti-cpufreq.o
>  obj-$(CONFIG_ARM_VEXPRESS_SPC_CPUFREQ)   += vexpress-spc-cpufreq.o
> +obj-$(CONFIG_ARM_QCOM_CPUFREQ_HW)+= qcom-cpufreq-hw.o
> 
> 
>  
> ##
> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c 
> b/drivers/cpufreq/qcom-cpufreq-hw.c
> new file mode 100644
> index 000..fa25a95
> --- /dev/null
> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> @@ -0,0 +1,344 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2018, The Linux Foundation. All rights reserved.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define INIT_RATE3UL
> +#define XO_RATE  1920UL
> +#define LUT_MAX_ENTRIES  40U
> +#define CORE_COUNT_VAL(val)  (((val) & (GENMASK(18, 16))) >> 16)
> +#define LUT_ROW_SIZE 32
> +
> +enum {
> + REG_ENABLE,
> + REG_LUT_TABLE,
> + REG_PERF_STATE,
> +
> + REG_ARRAY_SIZE,
> +};
> +
> +struct cpufreq_qcom {
> + struct cpufreq_frequency_table *table;
> + struct device *dev;
> + const u16 *reg_offset;
> + void __iomem *base;
> + cpumask_t related_cpus;
> + unsigned int max_cores;

Same comment as on v4:

Why *max*_cores? This seems to be the number of CPUs in a cluster and
qcom_read_lut() expects the core count read from the LUT to match
exactly. Maybe it's the name from the datasheet? Should it still be
'num_cores' or similar?

> +static struct cpufreq_qcom *qcom_freq_domain_map[NR_CPUS];

It would be an option to limit this to the number of CPU clusters and
allocate it dynamically when the driver is initialized (key = first
core in the cluster). Probably not worth the hassle with the limited
number of cores though.

> +static int qcom_read_lut(struct platform_device *pdev,
> +  struct cpufreq_qcom *c)
> +{
> + struct device *dev = &pdev->dev;
> + unsigned int offset;
> + u32 data, src, lval, i, core_count, prev_cc, prev_freq, cur_freq;
> +
> + c->table = devm_kcalloc(dev, LUT_MAX_ENTRIES + 1,
> + sizeof(*c->table), GFP_KERNEL);
> + if (!c->table)
> + return -ENOMEM;
> +
> + offset = c->reg_offset[REG_LUT_TABLE];
> +
> + for (i = 0; i < LUT_MAX_ENTRIES; i++) {
> + data = readl_relaxed(c->base + offset + i * LUT_ROW_SIZE);
> + src = ((data & GENMASK(31, 30)) >> 30);
> + lval = (data & GENMASK(7, 0));
> + core_count = CORE_COUNT_VAL(data);
> +
> + if (src == 0)
> + c->table[i].frequency = INIT_RATE / 1000;
> + else
> + c->table[i].frequency = XO_RATE * lval / 1000;

You changed the condition from '!src' to 'src == 0'. My suggestion on
v4 was in part about a negative condition, but also about the
order. If it doesn't obstruct the code otherwise I think for an if-else
branch it is good practice to handle the more common case first and
then the 'exception'. I would expect most entries to have an actual
rate. Just a nit in any case, feel free to ignore if you prefer as is.
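
I.e. just flip the branches (same behaviour, only reordered):

	if (src)
		c->table[i].frequency = XO_RATE * lval / 1000;
	else
		c->table[i].frequency = INIT_RATE / 1000;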

> +static int qcom_cpu_resource

Re: [PATCH v2 1/3] clk: meson: add DT documentation for emmc clock controller

2018-07-12 Thread Rob Herring
On Thu, Jul 12, 2018 at 5:29 PM Yixun Lan  wrote:
>
> HI Rob
>
> see my comments
>
> On 07/12/2018 10:17 PM, Rob Herring wrote:
> > On Wed, Jul 11, 2018 at 8:47 PM Yixun Lan  wrote:
> >>
> >> Hi Rob
> >>
> >> see my comments
> >>
> >> On 07/12/18 03:43, Rob Herring wrote:
> >>> On Tue, Jul 10, 2018 at 04:36:56PM +, Yixun Lan wrote:
>  Document the MMC sub clock controller driver, the potential consumer
>  of this driver is MMC or NAND.
> >>>
> >>> So you all have decided to properly model this now?
> >>>
> >> Yes, ;-)
> >>
> 
>  Signed-off-by: Yixun Lan 
>  ---
>   .../bindings/clock/amlogic,mmc-clkc.txt   | 31 +++
>   1 file changed, 31 insertions(+)
>   create mode 100644 
>  Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt
> 
>  diff --git 
>  a/Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt 
>  b/Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt
>  new file mode 100644
>  index ..ff6b4bf3ecf9
>  --- /dev/null
>  +++ b/Documentation/devicetree/bindings/clock/amlogic,mmc-clkc.txt
>  @@ -0,0 +1,31 @@
>  +* Amlogic MMC Sub Clock Controller Driver
>  +
>  +The Amlogic MMC clock controller generates and supplies clock to support
>  +MMC and NAND controller
>  +
>  +Required Properties:
>  +
>  +- compatible: should be:
>  +"amlogic,meson-gx-mmc-clkc"
>  +"amlogic,meson-axg-mmc-clkc"
>  +
>  +- #clock-cells: should be 1.
>  +- clocks: phandles to clocks corresponding to the clock-names property
>  +- clock-names: list of parent clock names
>  +- "clkin0", "clkin1"
>  +
>  +Parent node should have the following properties :
>  +- compatible: "syscon", "simple-mfd, and "amlogic,meson-axg-mmc-clkc"
> >>>
> >>> You don't need "simple-mfd" and probably not syscon either. The order is
> >>> wrong too. Most specific first.
> >>>
> >> Ok, I will drop "simple-mfd"..
> >>
> >> but the syscon is a must, since this mmc clock model access registers
> >> via the regmap interface
> >
> > A syscon compatible should not be the only way to get a regmap.
> do you have any suggestion about other function that I can use? is
> devm_regmap_init_mmio() feasible
>
> > Removing lines 56/57 of drivers/mfd/syscon.c should be sufficient.
> >
> I'm not sure what's the valid point of removing compatible 'syscon' in
> driver/mfd/syscon.c, sounds this will break a lot DT/or need to fix?
> will you propose a patch for this? then I can certainly adjust here

Removing the 2 lines will simply allow any node to be a syscon. If
there's a specific driver for a node, then that makes sense to allow
that.
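
And to the question above: devm_regmap_init_mmio() is indeed feasible if
you don't want to go through syscon at all. An untested sketch (the
regmap_config values and function name are made up, and the usual
platform_device/io/regmap includes are assumed):

static const struct regmap_config mmc_clkc_regmap_config = {
	.reg_bits	= 32,
	.val_bits	= 32,
	.reg_stride	= 4,
};

static int mmc_clkc_probe(struct platform_device *pdev)
{
	struct resource *res;
	void __iomem *base;
	struct regmap *map;

	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	base = devm_ioremap_resource(&pdev->dev, res);
	if (IS_ERR(base))
		return PTR_ERR(base);

	map = devm_regmap_init_mmio(&pdev->dev, base,
				    &mmc_clkc_regmap_config);
	if (IS_ERR(map))
		return PTR_ERR(map);

	/* register the clocks using 'map' ... */
	return 0;
}

That way the clock controller and anything else touching SD_EMMC_CLOCK
can share one regmap and get serialized read-modify-write.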

>
> > Why do you need a regmap in the first place? What else needs to access
> > this register directly?
> Yes, the SD_EMMC_CLOCK register contain several bits which not fit well
> into common clock model, and they need to be access in the NAND or eMMC
> driver itself, Martin had explained this in early thread[1]
>
> In this register
> Bit[31] select NAND or eMMC function
> Bit[25] enable SDIO IRQ
> Bit[24] Clock always on
> Bit[15:14] SRAM Power down
>
> [1]
> https://lkml.kernel.org/r/CAFBinCBeyXf6LNaZzAw6WnsxzDAv8E=yp2eem0xcpwmeui6...@mail.gmail.com
>
> > Don't you need a patch removing the clock code
> > from within the emmc driver? It's not even using regmap, so using
> > regmap here doesn't help.
> >
> No, and current eMMC driver still use iomap to access the register

Which means a read-modify-write can corrupt the register value if both
users don't access thru regmap. Changes are probably infrequent enough
that you get lucky...

> I think we probably would like to take two steps approach.
> first, from the hardware perspective, the NAND and eMMC(port C) driver
> can't exist at same time, since they share the pins, clock, internal
> ram, So we have to only enable one of NAND or eMMC in DT, not enable
> both of them.

Yes, of course.

> Second, we might like to convert eMMC driver to also use mmc-clkc model.

IMO, this should be done as part of merging this series. Otherwise, we
have duplicated code for the same thing.

Rob


Re: [PATCH] vfio-pci: Disable binding to PFs with SR-IOV enabled

2018-07-12 Thread David Gibson
On Thu, Jul 12, 2018 at 04:33:04PM -0600, Alex Williamson wrote:
> We expect to receive PFs with SR-IOV disabled, however some host
> drivers leave SR-IOV enabled at unbind.  This puts us in a state where
> we can potentially assign both the PF and the VF, leading to both
> functionality as well as security concerns due to lack of managing the
> SR-IOV state as well as vendor dependent isolation from the PF to VF.
> If we were to attempt to actively disable SR-IOV on driver probe, we
> risk VF bound drivers blocking, potentially risking live lock
> scenarios.  Therefore simply refuse to bind to PFs with SR-IOV enabled
> with a warning message indicating the issue.  Users can resolve this
> by re-binding to the host driver and disabling SR-IOV before
> attempting to use the device with vfio-pci.
> 
> Signed-off-by: Alex Williamson 

Reviewed-by: David Gibson 

> ---
>  drivers/vfio/pci/vfio_pci.c |   13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index b423a309a6e0..f372f209c5c2 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1189,6 +1189,19 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
> struct pci_device_id *id)
>   if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
>   return -EINVAL;
>  
> + /*
> +  * Prevent binding to PFs with VFs enabled, this too easily allows
> +  * userspace instance with VFs and PFs from the same device, which
> +  * cannot work.  Disabling SR-IOV here would initiate removing the
> +  * VFs, which would unbind the driver, which is prone to blocking
> +  * if that VF is also in use by vfio-pci.  Just reject these PFs
> +  * and let the user sort it out.
> +  */
> + if (pci_num_vf(pdev)) {
> + pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n");
> + return -EBUSY;
> + }
> +
>   group = vfio_iommu_group_get(&pdev->dev);
>   if (!group)
>   return -EINVAL;
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




[PATCH v2 2/7] proc/kcore: replace kclist_lock rwlock with rwsem

2018-07-12 Thread Omar Sandoval
From: Omar Sandoval 

Now we only need kclist_lock from user context and at fs init time, and
the following changes need to sleep while holding the kclist_lock.

Signed-off-by: Omar Sandoval 
---
 fs/proc/kcore.c | 32 +++-
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index ddeeb3a5a015..def92fccb167 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -59,8 +59,8 @@ struct memelfnote
 };
 
 static LIST_HEAD(kclist_head);
-static DEFINE_RWLOCK(kclist_lock);
-static int kcore_need_update = 1;
+static DECLARE_RWSEM(kclist_lock);
+static atomic_t kcore_need_update = ATOMIC_INIT(1);
 
 /* This doesn't grab kclist_lock, so it should only be used at init time. */
 void
@@ -117,8 +117,8 @@ static void __kcore_update_ram(struct list_head *list)
struct kcore_list *tmp, *pos;
LIST_HEAD(garbage);
 
-   write_lock(&kclist_lock);
-   if (kcore_need_update) {
+   down_write(&kclist_lock);
+   if (atomic_cmpxchg(&kcore_need_update, 1, 0)) {
list_for_each_entry_safe(pos, tmp, &kclist_head, list) {
if (pos->type == KCORE_RAM
|| pos->type == KCORE_VMEMMAP)
@@ -127,9 +127,8 @@ static void __kcore_update_ram(struct list_head *list)
list_splice_tail(list, &kclist_head);
} else
list_splice(list, &garbage);
-   kcore_need_update = 0;
proc_root_kcore->size = get_kcore_size(&nphdr, &size);
-   write_unlock(&kclist_lock);
+   up_write(&kclist_lock);
 
free_kclist_ents(&garbage);
 }
@@ -452,11 +451,11 @@ read_kcore(struct file *file, char __user *buffer, size_t 
buflen, loff_t *fpos)
int nphdr;
unsigned long start;
 
-   read_lock(&kclist_lock);
+   down_read(&kclist_lock);
size = get_kcore_size(&nphdr, &elf_buflen);
 
if (buflen == 0 || *fpos >= size) {
-   read_unlock(&kclist_lock);
+   up_read(&kclist_lock);
return 0;
}
 
@@ -473,11 +472,11 @@ read_kcore(struct file *file, char __user *buffer, size_t 
buflen, loff_t *fpos)
tsz = buflen;
elf_buf = kzalloc(elf_buflen, GFP_ATOMIC);
if (!elf_buf) {
-   read_unlock(&kclist_lock);
+   up_read(&kclist_lock);
return -ENOMEM;
}
elf_kcore_store_hdr(elf_buf, nphdr, elf_buflen);
-   read_unlock(&kclist_lock);
+   up_read(&kclist_lock);
if (copy_to_user(buffer, elf_buf + *fpos, tsz)) {
kfree(elf_buf);
return -EFAULT;
@@ -492,7 +491,7 @@ read_kcore(struct file *file, char __user *buffer, size_t 
buflen, loff_t *fpos)
if (buflen == 0)
return acc;
} else
-   read_unlock(&kclist_lock);
+   up_read(&kclist_lock);
 
/*
 * Check to see if our file offset matches with any of
@@ -505,12 +504,12 @@ read_kcore(struct file *file, char __user *buffer, size_t 
buflen, loff_t *fpos)
while (buflen) {
struct kcore_list *m;
 
-   read_lock(&kclist_lock);
+   down_read(&kclist_lock);
list_for_each_entry(m, &kclist_head, list) {
if (start >= m->addr && start < (m->addr+m->size))
break;
}
-   read_unlock(&kclist_lock);
+   up_read(&kclist_lock);
 
if (&m->list == &kclist_head) {
if (clear_user(buffer, tsz))
@@ -563,7 +562,7 @@ static int open_kcore(struct inode *inode, struct file 
*filp)
if (!filp->private_data)
return -ENOMEM;
 
-   if (kcore_need_update)
+   if (atomic_read(&kcore_need_update))
kcore_update_ram();
if (i_size_read(inode) != proc_root_kcore->size) {
inode_lock(inode);
@@ -593,9 +592,8 @@ static int __meminit kcore_callback(struct notifier_block 
*self,
switch (action) {
case MEM_ONLINE:
case MEM_OFFLINE:
-   write_lock(&kclist_lock);
-   kcore_need_update = 1;
-   write_unlock(&kclist_lock);
+   atomic_set(&kcore_need_update, 1);
+   break;
}
return NOTIFY_OK;
 }
-- 
2.18.0



[PATCH v2 5/7] proc/kcore: clean up ELF header generation

2018-07-12 Thread Omar Sandoval
From: Omar Sandoval 

Currently, the ELF file header, program headers, and note segment are
allocated all at once, in some icky code dating back to 2.3. Programs
tend to read the file header, then the program headers, then the note
segment, all separately, so this is a waste of effort. It's cleaner and
more efficient to handle the three separately.

Signed-off-by: Omar Sandoval 
---
 fs/proc/kcore.c | 350 +++-
 1 file changed, 141 insertions(+), 209 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index f1ae848c7bcc..a7e730b40154 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -49,15 +49,6 @@ static struct proc_dir_entry *proc_root_kcore;
 #definekc_offset_to_vaddr(o) ((o) + PAGE_OFFSET)
 #endif
 
-/* An ELF note in memory */
-struct memelfnote
-{
-   const char *name;
-   int type;
-   unsigned int datasz;
-   void *data;
-};
-
 static LIST_HEAD(kclist_head);
 static DECLARE_RWSEM(kclist_lock);
 static atomic_t kcore_need_update = ATOMIC_INIT(1);
@@ -73,7 +64,8 @@ kclist_add(struct kcore_list *new, void *addr, size_t size, 
int type)
list_add_tail(&new->list, &kclist_head);
 }
 
-static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
+static size_t get_kcore_size(int *nphdr, size_t *phdrs_len, size_t *notes_len,
+size_t *data_offset)
 {
size_t try, size;
struct kcore_list *m;
@@ -87,15 +79,15 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
size = try;
*nphdr = *nphdr + 1;
}
-   *elf_buflen =   sizeof(struct elfhdr) + 
-   (*nphdr + 2)*sizeof(struct elf_phdr) + 
-   3 * ((sizeof(struct elf_note)) +
-roundup(sizeof(CORE_STR), 4)) +
-   roundup(sizeof(struct elf_prstatus), 4) +
-   roundup(sizeof(struct elf_prpsinfo), 4) +
-   roundup(arch_task_struct_size, 4);
-   *elf_buflen = PAGE_ALIGN(*elf_buflen);
-   return size + *elf_buflen;
+
+   *phdrs_len = *nphdr * sizeof(struct elf_phdr);
+   *notes_len = (3 * (sizeof(struct elf_note) + ALIGN(sizeof(CORE_STR), 
4)) +
+ ALIGN(sizeof(struct elf_prstatus), 4) +
+ ALIGN(sizeof(struct elf_prpsinfo), 4) +
+ ALIGN(arch_task_struct_size, 4));
+   *data_offset = PAGE_ALIGN(sizeof(struct elfhdr) + *phdrs_len +
+ *notes_len);
+   return *data_offset + size;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -241,7 +233,7 @@ static int kcore_update_ram(void)
LIST_HEAD(list);
LIST_HEAD(garbage);
int nphdr;
-   size_t size;
+   size_t phdrs_len, notes_len, data_offset;
struct kcore_list *tmp, *pos;
int ret = 0;
 
@@ -263,7 +255,8 @@ static int kcore_update_ram(void)
}
list_splice_tail(&list, &kclist_head);
 
-   proc_root_kcore->size = get_kcore_size(&nphdr, &size);
+   proc_root_kcore->size = get_kcore_size(&nphdr, &phdrs_len, ¬es_len,
+  &data_offset);
 
 out:
up_write(&kclist_lock);
@@ -274,228 +267,168 @@ static int kcore_update_ram(void)
return ret;
 }
 
-/*/
-/*
- * determine size of ELF note
- */
-static int notesize(struct memelfnote *en)
+static void append_kcore_note(char *notes, size_t *i, const char *name,
+ unsigned int type, const void *desc,
+ size_t descsz)
 {
-   int sz;
-
-   sz = sizeof(struct elf_note);
-   sz += roundup((strlen(en->name) + 1), 4);
-   sz += roundup(en->datasz, 4);
-
-   return sz;
-} /* end notesize() */
-
-/*/
-/*
- * store a note in the header buffer
- */
-static char *storenote(struct memelfnote *men, char *bufp)
-{
-   struct elf_note en;
-
-#define DUMP_WRITE(addr,nr) do { memcpy(bufp,addr,nr); bufp += nr; } while(0)
-
-   en.n_namesz = strlen(men->name) + 1;
-   en.n_descsz = men->datasz;
-   en.n_type = men->type;
-
-   DUMP_WRITE(&en, sizeof(en));
-   DUMP_WRITE(men->name, en.n_namesz);
-
-   /* XXX - cast from long long to long to avoid need for libgcc.a */
-   bufp = (char*) roundup((unsigned long)bufp,4);
-   DUMP_WRITE(men->data, men->datasz);
-   bufp = (char*) roundup((unsigned long)bufp,4);
-
-#undef DUMP_WRITE
-
-   return bufp;
-} /* end storenote() */
-
-/*
- * store an ELF coredump header in the supplied buffer
- * nphdr is the number of elf_phdr to insert
- */
-static void elf_kcore_store_hdr(char *bufp, int nphdr, int dataoff)
-{
-   struct elf_prstatus prstatus;   /* NT_PRSTATUS */
-   struct elf_prpsinfo prpsinfo;   /* NT_PRPSINFO */
-   struct elf_phdr *nhdr, *phdr;
-   stru

[PATCH v2 4/7] proc/kcore: hold lock during read

2018-07-12 Thread Omar Sandoval
From: Omar Sandoval 

Now that we're using an rwsem, we can hold it during the entirety of
read_kcore() and have a common return path. This is preparation for the
next change.

Signed-off-by: Omar Sandoval 
---
 fs/proc/kcore.c | 70 -
 1 file changed, 40 insertions(+), 30 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 33667db6e370..f1ae848c7bcc 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -440,19 +440,18 @@ static ssize_t
 read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 {
char *buf = file->private_data;
-   ssize_t acc = 0;
size_t size, tsz;
size_t elf_buflen;
int nphdr;
unsigned long start;
+   size_t orig_buflen = buflen;
+   int ret = 0;
 
down_read(&kclist_lock);
size = get_kcore_size(&nphdr, &elf_buflen);
 
-   if (buflen == 0 || *fpos >= size) {
-   up_read(&kclist_lock);
-   return 0;
-   }
+   if (buflen == 0 || *fpos >= size)
+   goto out;
 
/* trim buflen to not go beyond EOF */
if (buflen > size - *fpos)
@@ -465,28 +464,26 @@ read_kcore(struct file *file, char __user *buffer, size_t 
buflen, loff_t *fpos)
tsz = elf_buflen - *fpos;
if (buflen < tsz)
tsz = buflen;
-   elf_buf = kzalloc(elf_buflen, GFP_ATOMIC);
+   elf_buf = kzalloc(elf_buflen, GFP_KERNEL);
if (!elf_buf) {
-   up_read(&kclist_lock);
-   return -ENOMEM;
+   ret = -ENOMEM;
+   goto out;
}
elf_kcore_store_hdr(elf_buf, nphdr, elf_buflen);
-   up_read(&kclist_lock);
if (copy_to_user(buffer, elf_buf + *fpos, tsz)) {
kfree(elf_buf);
-   return -EFAULT;
+   ret = -EFAULT;
+   goto out;
}
kfree(elf_buf);
buflen -= tsz;
*fpos += tsz;
buffer += tsz;
-   acc += tsz;
 
/* leave now if filled buffer already */
if (buflen == 0)
-   return acc;
-   } else
-   up_read(&kclist_lock);
+   goto out;
+   }
 
/*
 * Check to see if our file offset matches with any of
@@ -499,25 +496,29 @@ read_kcore(struct file *file, char __user *buffer, size_t 
buflen, loff_t *fpos)
while (buflen) {
struct kcore_list *m;
 
-   down_read(&kclist_lock);
list_for_each_entry(m, &kclist_head, list) {
if (start >= m->addr && start < (m->addr+m->size))
break;
}
-   up_read(&kclist_lock);
 
if (&m->list == &kclist_head) {
-   if (clear_user(buffer, tsz))
-   return -EFAULT;
+   if (clear_user(buffer, tsz)) {
+   ret = -EFAULT;
+   goto out;
+   }
} else if (m->type == KCORE_VMALLOC) {
vread(buf, (char *)start, tsz);
/* we have to zero-fill user buffer even if no read */
-   if (copy_to_user(buffer, buf, tsz))
-   return -EFAULT;
+   if (copy_to_user(buffer, buf, tsz)) {
+   ret = -EFAULT;
+   goto out;
+   }
} else if (m->type == KCORE_USER) {
/* User page is handled prior to normal kernel page: */
-   if (copy_to_user(buffer, (char *)start, tsz))
-   return -EFAULT;
+   if (copy_to_user(buffer, (char *)start, tsz)) {
+   ret = -EFAULT;
+   goto out;
+   }
} else {
if (kern_addr_valid(start)) {
/*
@@ -525,26 +526,35 @@ read_kcore(struct file *file, char __user *buffer, size_t 
buflen, loff_t *fpos)
 * hardened user copy kernel text checks.
 */
if (probe_kernel_read(buf, (void *) start, 
tsz)) {
-   if (clear_user(buffer, tsz))
-   return -EFAULT;
+   if (clear_user(buffer, tsz)) {
+   ret = -EFAULT;
+   goto out;
+   }
} el

[PATCH v2 1/7] proc/kcore: don't grab lock for kclist_add()

2018-07-12 Thread Omar Sandoval
From: Omar Sandoval 

kclist_add() is only called at init time, so there's no point in
grabbing any locks. We're also going to replace the rwlock with a rwsem,
which we don't want to try grabbing during early boot.

Signed-off-by: Omar Sandoval 
---
 fs/proc/kcore.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 66c373230e60..ddeeb3a5a015 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -62,6 +62,7 @@ static LIST_HEAD(kclist_head);
 static DEFINE_RWLOCK(kclist_lock);
 static int kcore_need_update = 1;
 
+/* This doesn't grab kclist_lock, so it should only be used at init time. */
 void
 kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
 {
@@ -69,9 +70,7 @@ kclist_add(struct kcore_list *new, void *addr, size_t size, 
int type)
new->size = size;
new->type = type;
 
-   write_lock(&kclist_lock);
list_add_tail(&new->list, &kclist_head);
-   write_unlock(&kclist_lock);
 }
 
 static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
-- 
2.18.0



[PATCH v2 6/7] proc/kcore: optimize multiple page reads

2018-07-12 Thread Omar Sandoval
From: Omar Sandoval 

The current code does a full search of the segment list every time for
every page. This is wasteful, since it's almost certain that the next
page will be in the same segment. Instead, check if the previous segment
covers the current page before doing the list search.

Signed-off-by: Omar Sandoval 
---
 fs/proc/kcore.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index a7e730b40154..d1b875afc359 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -428,10 +428,18 @@ read_kcore(struct file *file, char __user *buffer, size_t 
buflen, loff_t *fpos)
if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
tsz = buflen;
 
+   m = NULL;
while (buflen) {
-   list_for_each_entry(m, &kclist_head, list) {
-   if (start >= m->addr && start < (m->addr+m->size))
-   break;
+   /*
+* If this is the first iteration or the address is not within
+* the previous entry, search for a matching entry.
+*/
+   if (!m || start < m->addr || start >= m->addr + m->size) {
+   list_for_each_entry(m, &kclist_head, list) {
+   if (start >= m->addr &&
+   start < m->addr + m->size)
+   break;
+   }
}
 
if (&m->list == &kclist_head) {
-- 
2.18.0



[PATCH v2 7/7] proc/kcore: add vmcoreinfo note to /proc/kcore

2018-07-12 Thread Omar Sandoval
From: Omar Sandoval 

The vmcoreinfo information is useful for runtime debugging tools, not
just for crash dumps. A lot of this information can be determined by
other means, but this is much more convenient.

Signed-off-by: Omar Sandoval 
---
 fs/proc/Kconfig|  1 +
 fs/proc/kcore.c| 18 --
 include/linux/crash_core.h |  2 ++
 kernel/crash_core.c|  4 ++--
 4 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
index 0eaeb41453f5..817c02b13b1d 100644
--- a/fs/proc/Kconfig
+++ b/fs/proc/Kconfig
@@ -31,6 +31,7 @@ config PROC_FS
 config PROC_KCORE
bool "/proc/kcore support" if !ARM
depends on PROC_FS && MMU
+   select CRASH_CORE
help
  Provides a virtual ELF core file of the live kernel.  This can
  be read with gdb and other ELF tools.  No modifications can be
diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index d1b875afc359..bef78923b387 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -10,6 +10,7 @@
  * Safe accesses to vmalloc/direct-mapped discontiguous areas, Kanoj 
Sarcar 
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -81,10 +82,13 @@ static size_t get_kcore_size(int *nphdr, size_t *phdrs_len, 
size_t *notes_len,
}
 
*phdrs_len = *nphdr * sizeof(struct elf_phdr);
-   *notes_len = (3 * (sizeof(struct elf_note) + ALIGN(sizeof(CORE_STR), 
4)) +
+   *notes_len = (4 * sizeof(struct elf_note) +
+ 3 * ALIGN(sizeof(CORE_STR), 4) +
+ VMCOREINFO_NOTE_NAME_BYTES +
  ALIGN(sizeof(struct elf_prstatus), 4) +
  ALIGN(sizeof(struct elf_prpsinfo), 4) +
- ALIGN(arch_task_struct_size, 4));
+ ALIGN(arch_task_struct_size, 4) +
+ ALIGN(vmcoreinfo_size, 4));
*data_offset = PAGE_ALIGN(sizeof(struct elfhdr) + *phdrs_len +
  *notes_len);
return *data_offset + size;
@@ -406,6 +410,16 @@ read_kcore(struct file *file, char __user *buffer, size_t 
buflen, loff_t *fpos)
  sizeof(prpsinfo));
append_kcore_note(notes, &i, CORE_STR, NT_TASKSTRUCT, current,
  arch_task_struct_size);
+   /*
+* vmcoreinfo_size is mostly constant after init time, but it
+* can be changed by crash_save_vmcoreinfo(). Racing here with a
+* panic on another CPU before the machine goes down is insanely
+* unlikely, but it's better to not leave potential buffer
+* overflows lying around, regardless.
+*/
+   append_kcore_note(notes, &i, VMCOREINFO_NOTE_NAME, 0,
+ vmcoreinfo_data,
+ min(vmcoreinfo_size, notes_len - i));
 
tsz = min_t(size_t, buflen, notes_offset + notes_len - *fpos);
if (copy_to_user(buffer, notes + *fpos - notes_offset, tsz)) {
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index b511f6d24b42..525510a9f965 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -60,6 +60,8 @@ phys_addr_t paddr_vmcoreinfo_note(void);
 #define VMCOREINFO_CONFIG(name) \
vmcoreinfo_append_str("CONFIG_%s=y\n", #name)
 
+extern unsigned char *vmcoreinfo_data;
+extern size_t vmcoreinfo_size;
 extern u32 *vmcoreinfo_note;
 
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index b66aced5e8c2..d02c58b94460 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -14,8 +14,8 @@
 #include 
 
 /* vmcoreinfo stuff */
-static unsigned char *vmcoreinfo_data;
-static size_t vmcoreinfo_size;
+unsigned char *vmcoreinfo_data;
+size_t vmcoreinfo_size;
 u32 *vmcoreinfo_note;
 
 /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
-- 
2.18.0



[PATCH v2 3/7] proc/kcore: fix memory hotplug vs multiple opens race

2018-07-12 Thread Omar Sandoval
From: Omar Sandoval 

There's a theoretical race condition that will cause /proc/kcore to miss
a memory hotplug event:

CPU0  CPU1
// hotplug event 1
kcore_need_update = 1

open_kcore()  open_kcore()
kcore_update_ram()kcore_update_ram()
// Walk RAM   // Walk RAM
__kcore_update_ram()  __kcore_update_ram()
kcore_need_update = 0

// hotplug event 2
kcore_need_update = 1
  kcore_need_update = 0

Note that CPU1 set up the RAM kcore entries with the state after hotplug
event 1 but cleared the flag for hotplug event 2. The RAM entries will
therefore be stale until there is another hotplug event.

This is an extremely unlikely sequence of events, but the fix makes the
synchronization saner, anyways: we serialize the entire update sequence,
which means that whoever clears the flag will always succeed in
replacing the kcore list.

Signed-off-by: Omar Sandoval 
---
 fs/proc/kcore.c | 93 +++--
 1 file changed, 44 insertions(+), 49 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index def92fccb167..33667db6e370 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -98,53 +98,15 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
return size + *elf_buflen;
 }
 
-static void free_kclist_ents(struct list_head *head)
-{
-   struct kcore_list *tmp, *pos;
-
-   list_for_each_entry_safe(pos, tmp, head, list) {
-   list_del(&pos->list);
-   kfree(pos);
-   }
-}
-/*
- * Replace all KCORE_RAM/KCORE_VMEMMAP information with passed list.
- */
-static void __kcore_update_ram(struct list_head *list)
-{
-   int nphdr;
-   size_t size;
-   struct kcore_list *tmp, *pos;
-   LIST_HEAD(garbage);
-
-   down_write(&kclist_lock);
-   if (atomic_cmpxchg(&kcore_need_update, 1, 0)) {
-   list_for_each_entry_safe(pos, tmp, &kclist_head, list) {
-   if (pos->type == KCORE_RAM
-   || pos->type == KCORE_VMEMMAP)
-   list_move(&pos->list, &garbage);
-   }
-   list_splice_tail(list, &kclist_head);
-   } else
-   list_splice(list, &garbage);
-   proc_root_kcore->size = get_kcore_size(&nphdr, &size);
-   up_write(&kclist_lock);
-
-   free_kclist_ents(&garbage);
-}
-
-
 #ifdef CONFIG_HIGHMEM
 /*
  * If no highmem, we can assume [0...max_low_pfn) continuous range of memory
  * because memory hole is not as big as !HIGHMEM case.
  * (HIGHMEM is special because part of memory is _invisible_ from the kernel.)
  */
-static int kcore_update_ram(void)
+static int kcore_ram_list(struct list_head *head)
 {
-   LIST_HEAD(head);
struct kcore_list *ent;
-   int ret = 0;
 
ent = kmalloc(sizeof(*ent), GFP_KERNEL);
if (!ent)
@@ -152,9 +114,8 @@ static int kcore_update_ram(void)
ent->addr = (unsigned long)__va(0);
ent->size = max_low_pfn << PAGE_SHIFT;
ent->type = KCORE_RAM;
-   list_add(&ent->list, &head);
-   __kcore_update_ram(&head);
-   return ret;
+   list_add(&ent->list, head);
+   return 0;
 }
 
 #else /* !CONFIG_HIGHMEM */
@@ -253,11 +214,10 @@ kclist_add_private(unsigned long pfn, unsigned long 
nr_pages, void *arg)
return 1;
 }
 
-static int kcore_update_ram(void)
+static int kcore_ram_list(struct list_head *list)
 {
int nid, ret;
unsigned long end_pfn;
-   LIST_HEAD(head);
 
/* Not inializedupdate now */
/* find out "max pfn" */
@@ -269,15 +229,50 @@ static int kcore_update_ram(void)
end_pfn = node_end;
}
/* scan 0 to max_pfn */
-   ret = walk_system_ram_range(0, end_pfn, &head, kclist_add_private);
-   if (ret) {
-   free_kclist_ents(&head);
+   ret = walk_system_ram_range(0, end_pfn, list, kclist_add_private);
+   if (ret)
return -ENOMEM;
+   return 0;
+}
+#endif /* CONFIG_HIGHMEM */
+
+static int kcore_update_ram(void)
+{
+   LIST_HEAD(list);
+   LIST_HEAD(garbage);
+   int nphdr;
+   size_t size;
+   struct kcore_list *tmp, *pos;
+   int ret = 0;
+
+   down_write(&kclist_lock);
+   if (!atomic_cmpxchg(&kcore_need_update, 1, 0))
+   goto out;
+
+   ret = kcore_ram_list(&list);
+   if (ret) {
+   /* Couldn't get the RAM list, try again next time. */
+   atomic_set(&kcore_need_update, 1);
+   list_splice_tail(&list, &garbage);
+   goto out;
+   }
+
+   list_for_each_entry_safe(pos, tmp, &kclist_head, list) {
+   if (pos->type == KCORE_RAM || pos->type == KCORE_VMEMMAP)
+   list_move(&pos->list, &garbage);
+   }
+   list_splice_tail(&list, &kclist_head);
+
+ 

[PATCH v2 0/7] /proc/kcore improvements

2018-07-12 Thread Omar Sandoval
From: Omar Sandoval 

Hi,

This series makes a few improvements to /proc/kcore. Patches 1 and 2 are
prep patches. Patch 3 is a fix/cleanup. Patch 4 is another prep patch.
Patches 5 and 6 are optimizations to ->read(). Patch 7 adds vmcoreinfo
to /proc/kcore (apparently I'm not the only one who wants this, see
https://www.spinics.net/lists/arm-kernel/msg665103.html).

I tested that the crash utility still works with this applied, and
readelf is happy with it, as well.

Andrew, since this didn't get any traction on the fsdevel side, and
you're already carrying James' patch, could you take this through -mm?

Thanks!

Changes from v1:

- Rebased onto v4.18-rc4 + James' patch
  (https://patchwork.kernel.org/patch/10519739/) in the mm tree
- Fix spurious sparse warning (see the report and response in
  https://patchwork.kernel.org/patch/10512431/)

Omar Sandoval (7):
  proc/kcore: don't grab lock for kclist_add()
  proc/kcore: replace kclist_lock rwlock with rwsem
  proc/kcore: fix memory hotplug vs multiple opens race
  proc/kcore: hold lock during read
  proc/kcore: clean up ELF header generation
  proc/kcore: optimize multiple page reads
  proc/kcore: add vmcoreinfo note to /proc/kcore

 fs/proc/Kconfig|   1 +
 fs/proc/kcore.c| 536 +
 include/linux/crash_core.h |   2 +
 kernel/crash_core.c|   4 +-
 4 files changed, 251 insertions(+), 292 deletions(-)

-- 
2.18.0



Proposal

2018-07-12 Thread Miss Victoria Mehmet
Hello



I have a business proposal of mutual benefits i would like to discuss with
you i asked before and i still await your positive response thanks


Re: [PATCH 24/32] vfs: syscall: Add fsopen() to prepare for superblock creation [ver #9]

2018-07-12 Thread David Howells
Andy Lutomirski  wrote:

> >> I tend to think that this *should* fail using the new API.  The semantics
> >> of the second mount request are bizarre at best.
> > 
> > You still have to support existing behaviour lest you break userspace.
> > 
> 
> I assume the existing behavior is that a bind mount is created?  If so, the
> new mount(8) tool could do it in user code.

You have a race there.

Also you can't currently directly create a bind mount from userspace as you
can only bind from another path point - which you may not be able to access
(either by permission failure or because it's not in your mount namespace).

David


Re: [PATCH] x86: vdso: Fix leaky vdso link with CC=clang

2018-07-12 Thread Alistair Strachan
On Thu, Jul 12, 2018 at 4:20 PM Andy Lutomirski  wrote:
>
> > On Jul 12, 2018, at 3:06 PM, H. Peter Anvin  wrote:
> >
> >> On 07/12/18 13:37, Alistair Strachan wrote:
> >>> On Thu, Jul 12, 2018 at 1:25 PM H. Peter Anvin  wrote:
>  On 07/12/18 13:10, Alistair Strachan wrote:
>  The vdso{32,64}.so can fail to link with CC=clang when clang tries to
>  find a suitable GCC toolchain to link these libraries with.
> 
>  /usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
>   access beyond end of merged section (782)
> 
>  This happens because the host environment leaked into the cross
>  compiler environment due to the way clang searches for suitable GCC
>  toolchains.
> 
> >>>
> >>> Is this another clang bug that you want a workaround for in the kernel?
> >>
> >> Clang is a retargetable compiler (specified with --target=)
> >> and so it has a mechanism for searching for suitable binutils (from
> >> another "GCC toolchain") to perform assembly and linkage. This
> >> mechanism relies on both --target and --gcc-toolchain when
> >> cross-compiling, otherwise it will fall back to searching /usr.
> >>
> >> The --target and --gcc-toolchain flags are already specified correctly
> >> in the top level Makefile, but the vdso Makefile rolls its own linker
> >> flags and doesn't use KBUILD_CFLAGS. Therefore, these flags get
> >> incorrectly dropped from the vdso $CC link command line, and an
> >> inconsistency is created between the "GCC toolchain" used to generate
> >> the objects for the vdso, and the linker used to link them.
> >>
> >
> > It sounds like there needs to be a more fundamental symbol than
> > KBUILD_CFLAGS to contain these kinds of things.
>
> How about $(CC)?

I'm guessing, but I think this wasn't done originally because CC is
something the user might reasonably specify on the command line (the
other bit comes from CROSS_COMPILE), so doing this via CC would
require us to override the CC passed in on the command line. Not sure
how to do that, since the vdso makefile is executed with a submake, so
the usual "override CC := $(CC) something else" followed by "export
CC" doesn't work.


Consolidating RCU-bh, RCU-preempt, and RCU-sched

2018-07-12 Thread Paul E. McKenney
Hello!

I now have a semi-reasonable prototype of changes consolidating the
RCU-bh, RCU-preempt, and RCU-sched update-side APIs in my -rcu tree.
There are likely still bugs to be fixed and probably other issues as well,
but a prototype does exist.

Assuming continued good rcutorture results and no objections, I am
thinking in terms of this timeline:

o   Preparatory work and cleanups are slated for the v4.19 merge window.

o   The actual consolidation and post-consolidation cleanup is slated
for the merge window after v4.19 (v5.0?).  These cleanups include
the replacements called out below within the RCU implementation
itself (but excluding kernel/rcu/sync.c, see question below).

o   Replacement of now-obsolete update APIs is slated for the second
merge window after v4.19 (v5.1?).  The replacements are currently
expected to be as follows:

synchronize_rcu_bh() -> synchronize_rcu()
synchronize_rcu_bh_expedited() -> synchronize_rcu_expedited()
call_rcu_bh() -> call_rcu()
rcu_barrier_bh() -> rcu_barrier()
synchronize_sched() -> synchronize_rcu()
synchronize_sched_expedited() -> synchronize_rcu_expedited()
call_rcu_sched() -> call_rcu()
rcu_barrier_sched() -> rcu_barrier()
get_state_synchronize_sched() -> get_state_synchronize_rcu()
cond_synchronize_sched() -> cond_synchronize_rcu()
synchronize_rcu_mult() -> synchronize_rcu()

I have done light testing of these replacements with good results.

Any objections to this timeline?

I also have some questions on the ultimate end point.  I have default
choices, which I will likely take if there is no discussion.

o   Currently, I am thinking in terms of keeping the per-flavor
read-side functions.  For example, rcu_read_lock_bh() would
continue to disable softirq, and would also continue to tell
lockdep about the RCU-bh read-side critical section.  However,
synchronize_rcu() will wait for all flavors of read-side critical
sections, including those introduced by (say) preempt_disable(),
so there will no longer be any possibility of mismatching (say)
RCU-bh readers with RCU-sched updaters.

I could imagine other ways of handling this, including:

a.  Eliminate rcu_read_lock_bh() in favor of
local_bh_disable() and so on.  Rely on lockdep
instrumentation of these other functions to identify RCU
readers, introducing such instrumentation as needed.  I am
not a fan of this approach because of the large number of
places in the Linux kernel where interrupts, preemption,
and softirqs are enabled or disabled "behind the scenes".

b.  Eliminate rcu_read_lock_bh() in favor of rcu_read_lock(),
and required callers to also disable softirqs, preemption,
or whatever as needed.  I am not a fan of this approach
because it seems a lot less convenient to users of RCU-bh
and RCU-sched.

At the moment, I therefore favor keeping the RCU-bh and RCU-sched
read-side APIs.  But are there better approaches?

o   How should kernel/rcu/sync.c be handled?  Here are some
possibilities:

a.  Leave the full gp_ops[] array and simply translate
the obsolete update-side functions to their RCU
equivalents.

b.  Leave the current gp_ops[] array, but only have
the RCU_SYNC entry.  The __INIT_HELD field would
be set to a function that was OK with being in an
RCU read-side critical section, an interrupt-disabled
section, etc.

This allows for possible addition of SRCU functionality.
It is also a trivial change.  Note that the sole user
of sync.c uses RCU_SCHED_SYNC, and this would need to
be changed to RCU_SYNC.

But is it likely that we will ever add SRCU?

c.  Eliminate that gp_ops[] array, hard-coding the function
pointers into their call sites.

I don't really have a preference.  Left to myself, I will be lazy
and take option #a.  Are there better approaches?

o   Currently, if a lock related to the scheduler's rq or pi locks is
held across rcu_read_unlock(), that lock must be held across the
entire read-side critical section in order to avoid deadlock.
Now that the end of the RCU read-side critical section is
deferred until sometime after interrupts are re-enabled, this
requirement could be lifted.  However, because the end of the RCU
read-side critical section is detected sometime after interrupts
are re-enabled, this means that a low-priority RCU reader might
remain p

Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-12 Thread Baoquan He
Hi Michal,

On 07/12/18 at 02:32pm, Michal Hocko wrote:
> On Thu 12-07-18 14:01:15, Chao Fan wrote:
> > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote:
> > >Hi Baoquan,
> > >
> > >At 07/11/2018 08:40 PM, Baoquan He wrote:
> > >> Please try this v3 patch:
> > >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
> > >> From: Baoquan He 
> > >> Date: Wed, 11 Jul 2018 20:31:51 +0800
> > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text
> > >> 
> > >> In find_zone_movable_pfns_for_nodes(), when try to find the starting
> > >> PFN movable zone begins in each node, kernel text position is not
> > >> considered. KASLR may put kernel after which movable zone begins.
> > >> 
> > >> Fix it by finding movable zone after kernel text on that node.
> > >> 
> > >> Signed-off-by: Baoquan He 
> > >
> > >
> > >You fix this in the _zone_init side_. This may make the 'kernelcore=' or
> > >'movablecore=' failed if the KASLR puts the kernel back the tail of the
> > >last node, or more.
> > 
> > I think it may not fail.
> > There is a 'restart' to do another pass.
> > 
> > >
> > >Due to we have fix the mirror memory in KASLR side, and Chao is trying
> > >to fix the 'movable_node' in KASLR side. Have you had a chance to fix
> > >this in the KASLR side.
> > >
> > 
> > I think it's better to fix here, but not KASLR side.
> > Cause much more code will be change if doing it in KASLR side.
> > Since we didn't parse 'kernelcore' in compressed code, and you can see
> > the distribution of ZONE_MOVABLE need so much code, so we do not need
> > to do so much job in KASLR side. But here, several lines will be OK.
> 
> I am not able to find the beginning of the email thread right now. Could
> you summarize what is the actual problem please?

The bug is currently seen on x86.

When "kernelcore=" or "movablecore=" is added to the kernel command line,
kernel memory is spread evenly among nodes. However, this only holds when
KASLR is not enabled, in which case the kernel sits at 16M on x86. With
KASLR enabled, the kernel can be placed anywhere from 16M to 64T.

Consider a scenario with 10 nodes, each with 20G of memory, and
"kernelcore=50%" specified: each node is expected to provide 10G for
kernelcore and 10G for the movable area. But this doesn't take the kernel
position into consideration. E.g. if the kernel is placed at 15G of the
2nd node (node1), we still assume node1 has 10G for kernelcore and 10G
for movable, when in fact only 5G is available for movable, just after
the kernel.

I made a v4 patch which possibly can fix it.


>From dbcac3631863aed556dc2c4ff1839772dfd02d18 Mon Sep 17 00:00:00 2001
From: Baoquan He 
Date: Fri, 13 Jul 2018 07:49:29 +0800
Subject: [PATCH v4] mm, page_alloc: find movable zone after kernel text

In find_zone_movable_pfns_for_nodes(), when trying to find the starting
PFN at which the movable zone begins in each node, the kernel text
position is not considered. KASLR may place the kernel after the point
where the movable zone would begin.

Fix it by making the movable zone start after the kernel text on that
node.

Signed-off-by: Baoquan He 
---
 mm/page_alloc.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100f1e63..5bc1a47dafda 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6547,7 +6547,7 @@ static unsigned long __init 
early_calculate_totalpages(void)
 static void __init find_zone_movable_pfns_for_nodes(void)
 {
int i, nid;
-   unsigned long usable_startpfn;
+   unsigned long usable_startpfn, kernel_endpfn, arch_startpfn;
unsigned long kernelcore_node, kernelcore_remaining;
/* save the state before borrow the nodemask */
nodemask_t saved_node_state = node_states[N_MEMORY];
@@ -6649,8 +6649,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
if (!required_kernelcore || required_kernelcore >= totalpages)
goto out;
 
+   kernel_endpfn = PFN_UP(__pa_symbol(_end));
/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
-   usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
+   arch_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
 
 restart:
/* Spread kernelcore memory as evenly as possible throughout nodes */
@@ -6659,6 +6660,16 @@ static void __init find_zone_movable_pfns_for_nodes(void)
unsigned long start_pfn, end_pfn;
 
/*
+* KASLR may put kernel near tail of node memory,
+* start after kernel on that node to find PFN
+* at which zone begins.
+*/
+   if (pfn_to_nid(kernel_endpfn) == nid)
+   usable_startpfn = max(arch_startpfn, kernel_endpfn);
+   else
+   usable_startpfn = arch_startpfn;
+
+   /*
 * Recalculate kernelcore_node if the division per node
 * now exceeds what is necessary to satisfy the req

Re: [PATCH v13 16/18] sched: move sched clock initialization and merge with generic clock

2018-07-12 Thread kbuild test robot
Hi Pavel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.18-rc4 next-20180712]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Pavel-Tatashin/Early-boot-time-stamps/20180712-200238
config: microblaze-mmu_defconfig (attached as .config)
compiler: microblaze-linux-gcc (GCC) 8.1.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=8.1.0 make.cross ARCH=microblaze 

All errors (new ones prefixed by >>):

   kernel/sched/clock.c: In function 'sched_clock_init':
>> kernel/sched/clock.c:440:2: error: implicit declaration of function 
>> 'generic_sched_clock_init'; did you mean 'sched_clock_init'? 
>> [-Werror=implicit-function-declaration]
 generic_sched_clock_init();
 ^~~~
 sched_clock_init
   cc1: some warnings being treated as errors

vim +440 kernel/sched/clock.c

   436  
   437  void __init sched_clock_init(void)
   438  {
   439  sched_clock_running = 1;
 > 440  generic_sched_clock_init();
   441  }
   442  
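
A stub along these lines in include/linux/sched_clock.h would avoid the
error on !CONFIG_GENERIC_SCHED_CLOCK configs such as this one (sketch
only, not verified against this series):

#ifdef CONFIG_GENERIC_SCHED_CLOCK
extern void generic_sched_clock_init(void);
#else
static inline void generic_sched_clock_init(void) { }
#endif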

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [PATCH 24/32] vfs: syscall: Add fsopen() to prepare for superblock creation [ver #9]

2018-07-12 Thread Andy Lutomirski



> On Jul 12, 2018, at 4:35 PM, David Howells  wrote:
> 
> Andy Lutomirski  wrote:
> 
>> I tend to think that this *should* fail using the new API.  The semantics of
>> the second mount request are bizarre at best.
> 
> You still have to support existing behaviour lest you break userspace.
> 

I assume the existing behavior is that a bind mount is created?  If so, the new 
mount(8) tool could do it in user code.

Re: [PATCH v5 1/2] dt-bindings: cpufreq: Introduce QCOM CPUFREQ Firmware bindings

2018-07-12 Thread Stephen Boyd
Quoting Taniya Das (2018-07-12 11:05:44)
[..]
> +   compatible = "qcom,kryo385";
> +   reg = <0x0 0x600>;
> +   enable-method = "psci";
> +   next-level-cache = <&L2_600>;
> +   qcom,freq-domain = <&freq_domain_table1>;
> +   L2_600: l2-cache {
> +   compatible = "cache";
> +   next-level-cache = <&L3_0>;
> +   };
> +   };
> +
> +   CPU7: cpu@700 {
> +   device_type = "cpu";
> +   compatible = "qcom,kryo385";
> +   reg = <0x0 0x700>;
> +   enable-method = "psci";
> +   next-level-cache = <&L2_700>;
> +   qcom,freq-domain = <&freq_domain_table1>;
> +   L2_700: l2-cache {
> +   compatible = "cache";
> +   next-level-cache = <&L3_0>;
> +   };
> +   };
> +   };
> +
> +   qcom,cpufreq-hw {
> +   compatible = "qcom,cpufreq-hw";
> +   #address-cells = <2>;
> +   #size-cells = <2>;
> +   ranges;
> +   freq_domain_table0: freq_table0 {
> +   reg = <0 0x17d43000 0 0x1400>;
> +   };
> +
> +   freq_domain_table1: freq_table1 {
> +   reg = <0 0x17d45800 0 0x1400>;
> +   };

It seems that we need to map the CPUs in the cpus node to the frequency
domains in the cpufreq-hw node. Wouldn't that be better served via a
#foo-cells and <&phandle foo-cell> property in the CPU node? It's
annoying that the cpufreq-hw node doesn't have a reg property, when it
really should have one that goes over the whole register space (or is
split across the frequency domains so that there are two reg properties
here).



Re: [RFC PATCH 10/10] psi: aggregate ongoing stall events when somebody reads pressure

2018-07-12 Thread Andrew Morton
On Thu, 12 Jul 2018 13:29:42 -0400 Johannes Weiner  wrote:

> Right now, psi reports pressure and stall times of already concluded
> stall events. For most use cases this is current enough, but certain
> highly latency-sensitive applications, like the Android OOM killer,
> might want to know about and react to stall states before they have
> even concluded (e.g. a prolonged reclaim cycle).
> 
> This patches the procfs/cgroupfs interface such that when the pressure
> metrics are read, the current per-cpu states, if any, are taken into
> account as well.
> 
> Any ongoing states are concluded, their time snapshotted, and then
> restarted. This requires holding the rq lock to avoid corruption. It
> could use some form of rq lock ratelimiting or avoidance.
> 
> Requested-by: Suren Baghdasaryan 
> Not-yet-signed-off-by: Johannes Weiner 

What-does-that-mean:?


Re: [RFC v4 0/3] mm: zap pages with read mmap_sem in munmap for large mapping

2018-07-12 Thread Yang Shi




On 7/12/18 1:04 AM, Michal Hocko wrote:

On Wed 11-07-18 10:04:48, Yang Shi wrote:
[...]

One approach is to save all the vmas on a separate list, then zap_page_range
does unmap with this list.

Just detached unmapped vma chain from mm. You can keep the existing
vm_next chain and reuse it.


Yes. Other than this, we still need to do:

  * Tell zap_page_range() not to update vm_flags, as I did in v4, of
course without VM_DEAD this time.

  * Extract the page table freeing code, then do it after
zap_page_range(). I think I can just call free_pgd_range() directly.








Re: [PATCH 0/10] psi: pressure stall information for CPU, memory, and IO v2

2018-07-12 Thread Andrew Morton
On Thu, 12 Jul 2018 13:29:32 -0400 Johannes Weiner  wrote:

>
> ...
>
> The io file is similar to memory. Because the block layer doesn't have
> a concept of hardware contention right now (how much longer is my IO
> request taking due to other tasks?), it reports CPU potential lost on
> all IO delays, not just the potential lost due to competition.

Probably dumb question: disks aren't the only form of IO.  Does it make
sense to accumulate PSI for other forms of IO?  Networking comes to
mind...



Re: [PATCH v5 2/2] cpufreq: qcom-hw: Add support for QCOM cpufreq HW driver

2018-07-12 Thread Stephen Boyd
Quoting Taniya Das (2018-07-12 11:05:45)
> The CPUfreq HW present in some QCOM chipsets offloads the steps necessary
> for changing the frequency of CPUs. The driver implements the cpufreq
> driver interface for this hardware engine.
> 
> Signed-off-by: Saravana Kannan 
> Signed-off-by: Taniya Das 
> diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
> index 52f5f1a..141ec3e 100644
> --- a/drivers/cpufreq/Kconfig.arm
> +++ b/drivers/cpufreq/Kconfig.arm
> @@ -312,3 +312,13 @@ config ARM_PXA2xx_CPUFREQ
>   This add the CPUFreq driver support for Intel PXA2xx SOCs.
> 
>   If in doubt, say N.
> +
> +config ARM_QCOM_CPUFREQ_HW
> +   bool "QCOM CPUFreq HW driver"

Why can't it be a module?

> +   help
> +Support for the CPUFreq HW driver.
> +Some QCOM chipsets have a HW engine to offload the steps
> +necessary for changing the frequency of the CPUs. Firmware loaded
> +in this engine exposes a programming interafce to the High-level OS.

typo on interface. Why is High capitalized? Just say OS?

> +The driver implements the cpufreq driver interface for this HW 
> engine.

So much 'driver'.

> +Say Y if you want to support CPUFreq HW.
> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c 
> b/drivers/cpufreq/qcom-cpufreq-hw.c
> new file mode 100644
> index 000..fa25a95
> --- /dev/null
> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> @@ -0,0 +1,344 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2018, The Linux Foundation. All rights reserved.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define INIT_RATE  3UL

This doesn't need to be configured from DT? Or more likely be specified
as some sort of PLL that is part of the clocks property so we know what
the 'safe' or 'default' frequency is?

> +#define XO_RATE1920UL

This should come from DT via some clocks property.
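
Roughly, in the probe (untested sketch, assuming an "xo" clock in DT and
<linux/clk.h> included):

	struct clk *xo_clk;
	unsigned long xo_rate;

	xo_clk = devm_clk_get(&pdev->dev, "xo");
	if (IS_ERR(xo_clk))
		return PTR_ERR(xo_clk);
	xo_rate = clk_get_rate(xo_clk);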

> +#define LUT_MAX_ENTRIES40U
> +#define CORE_COUNT_VAL(val)(((val) & (GENMASK(18, 16))) >> 16)
> +#define LUT_ROW_SIZE   32
> +
> +enum {
> +   REG_ENABLE,
> +   REG_LUT_TABLE,
> +   REG_PERF_STATE,
> +
> +   REG_ARRAY_SIZE,
> +};
> +
> +struct cpufreq_qcom {
> +   struct cpufreq_frequency_table *table;
> +   struct device *dev;
> +   const u16 *reg_offset;
> +   void __iomem *base;
> +   cpumask_t related_cpus;
> +   unsigned int max_cores;
> +};
> +
> +static u16 cpufreq_qcom_std_offsets[REG_ARRAY_SIZE] = {

const?

> +   [REG_ENABLE]= 0x0,
> +   [REG_LUT_TABLE] = 0x110,
> +   [REG_PERF_STATE]= 0x920,

Is the register map going to change again for the next device? It may be
better to precalculate the offset for the fast switch so that the
addition isn't in the hotpath.
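
Something like this, i.e. compute the address once at probe time (sketch,
field name made up):

	/* at init, once per frequency domain */
	c->perf_state_reg = c->base + c->reg_offset[REG_PERF_STATE];

	/* then in target_index()/fast_switch() */
	writel_relaxed(index, c->perf_state_reg);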

> +};
> +
> +static struct cpufreq_qcom *qcom_freq_domain_map[NR_CPUS];
> +
> +static int
> +qcom_cpufreq_hw_target_index(struct cpufreq_policy *policy,
> +unsigned int index)
> +{
> +   struct cpufreq_qcom *c = policy->driver_data;
> +   unsigned int offset = c->reg_offset[REG_PERF_STATE];
> +
> +   writel_relaxed(index, c->base + offset);
> +
> +   return 0;
> +}
> +
> +static unsigned int qcom_cpufreq_hw_get(unsigned int cpu)
> +{
> +   struct cpufreq_qcom *c;
> +   struct cpufreq_policy *policy;
> +   unsigned int index, offset;
> +
> +   policy = cpufreq_cpu_get_raw(cpu);
> +   if (!policy)
> +   return 0;
> +
> +   c = policy->driver_data;
> +   offset = c->reg_offset[REG_PERF_STATE];
> +
> +   index = readl_relaxed(c->base + offset);
> +   index = min(index, LUT_MAX_ENTRIES - 1);
> +
> +   return policy->freq_table[index].frequency;
> +}
> +
> +static unsigned int
> +qcom_cpufreq_hw_fast_switch(struct cpufreq_policy *policy,
> +   unsigned int target_freq)
> +{
> +   struct cpufreq_qcom *c = policy->driver_data;
> +   unsigned int offset;
> +   int index;
> +
> +   index = cpufreq_table_find_index_l(policy, target_freq);

It's unfortunate that we have to search the table in software again.
Why can't we use policy->cached_resolved_idx to avoid this search twice?
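
I.e. something like this (sketch; it relies on the governor having just
called cpufreq_driver_resolve_freq() for this frequency, which is how
schedutil drives fast switching):

static unsigned int
qcom_cpufreq_hw_fast_switch(struct cpufreq_policy *policy,
			    unsigned int target_freq)
{
	struct cpufreq_qcom *c = policy->driver_data;
	unsigned int index = policy->cached_resolved_idx;

	writel_relaxed(index, c->base + c->reg_offset[REG_PERF_STATE]);

	return policy->freq_table[index].frequency;
}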

> +   if (index < 0)
> +   return 0;
> +
> +   offset = c->reg_offset[REG_PERF_STATE];
> +
> +   writel_relaxed(index, c->base + offset);
> +
> +   return policy->freq_table[index].frequency;
> +}
> +
> +static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy)
> +{
> +   struct cpufreq_qcom *c;
> +
> +   c = qcom_freq_domain_map[policy->cpu];
> +   if (!c) {
> +   pr_err("No scaling support for CPU%d\n", policy->cpu);
> +   return -ENODEV;
> +   }
> +
> +   cpumask_copy(policy->cpus, &c->related_cpus);
> +
> +   policy->fast_switch_possible = true;
> +

[PATCH 5/6] swap: Add __swap_entry_free_locked()

2018-07-12 Thread Huang, Ying
From: Huang Ying 

The part of __swap_entry_free() that runs with the lock held is
separated into a new function, __swap_entry_free_locked(), because we
want to reuse that piece of code in some other places.

This is just mechanical code refactoring; there is no functional change
in this function.

Signed-off-by: "Huang, Ying" 
Cc: Dave Hansen 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Daniel Jordan 
Cc: Dan Williams 
---
 mm/swapfile.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index e0df8d22ac92..bc488bf36c86 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1180,16 +1180,13 @@ struct swap_info_struct *get_swap_device(swp_entry_t 
entry)
return NULL;
 }
 
-static unsigned char __swap_entry_free(struct swap_info_struct *p,
-  swp_entry_t entry, unsigned char usage)
+static unsigned char __swap_entry_free_locked(struct swap_info_struct *p,
+ unsigned long offset,
+ unsigned char usage)
 {
-   struct swap_cluster_info *ci;
-   unsigned long offset = swp_offset(entry);
unsigned char count;
unsigned char has_cache;
 
-   ci = lock_cluster_or_swap_info(p, offset);
-
count = p->swap_map[offset];
 
has_cache = count & SWAP_HAS_CACHE;
@@ -1217,6 +1214,17 @@ static unsigned char __swap_entry_free(struct 
swap_info_struct *p,
usage = count | has_cache;
p->swap_map[offset] = usage ? : SWAP_HAS_CACHE;
 
+   return usage;
+}
+
+static unsigned char __swap_entry_free(struct swap_info_struct *p,
+  swp_entry_t entry, unsigned char usage)
+{
+   struct swap_cluster_info *ci;
+   unsigned long offset = swp_offset(entry);
+
+   ci = lock_cluster_or_swap_info(p, offset);
+   usage = __swap_entry_free_locked(p, offset, usage);
unlock_cluster_or_swap_info(p, ci);
 
return usage;
-- 
2.16.4



[PATCH 2/6] mm/swapfile.c: Replace some #ifdef with IS_ENABLED()

2018-07-12 Thread Huang, Ying
From: Huang Ying 

In mm/swapfile.c, THP (Transparent Huge Page) swap specific code is
enclosed by #ifdef CONFIG_THP_SWAP/#endif to avoid code bloat when THP
isn't enabled.  But #ifdef/#endif in a .c file hurts code readability,
so Dave suggested using IS_ENABLED(CONFIG_THP_SWAP) instead and letting
the compiler do the dirty job for us.  This has the potential to remove
some duplicated code too.  From the output of `size`:

               text   data  bss    dec   hex  filename
THP=y:        26269   2076  340  28685  700d  mm/swapfile.o
ifdef/endif:  24115   2028  340  26483  6773  mm/swapfile.o
IS_ENABLED:   24179   2028  340  26547  67b3  mm/swapfile.o

The IS_ENABLED() based solution works quite well, almost as good as the
#ifdef/#endif one.  And from the diffstat, more lines are removed than
added.

One #ifdef for split_swap_cluster() is kept, because it is a public
function with a stub implementation for CONFIG_THP_SWAP=n in swap.h.

Signed-off-by: "Huang, Ying" 
Suggested-by: Dave Hansen 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Daniel Jordan 
Cc: Dan Williams 
---
 mm/swapfile.c | 56 
 1 file changed, 16 insertions(+), 40 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index e31aa601d9c0..75c84aa763a3 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -870,7 +870,6 @@ static int scan_swap_map_slots(struct swap_info_struct *si,
return n_ret;
 }
 
-#ifdef CONFIG_THP_SWAP
 static int swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot)
 {
unsigned long idx;
@@ -878,6 +877,11 @@ static int swap_alloc_cluster(struct swap_info_struct *si, 
swp_entry_t *slot)
unsigned long offset, i;
unsigned char *map;
 
+   if (!IS_ENABLED(CONFIG_THP_SWAP)) {
+   VM_WARN_ON_ONCE(1);
+   return 0;
+   }
+
if (cluster_list_empty(&si->free_clusters))
return 0;
 
@@ -908,13 +912,6 @@ static void swap_free_cluster(struct swap_info_struct *si, 
unsigned long idx)
unlock_cluster(ci);
swap_range_free(si, offset, SWAPFILE_CLUSTER);
 }
-#else
-static int swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot)
-{
-   VM_WARN_ON_ONCE(1);
-   return 0;
-}
-#endif /* CONFIG_THP_SWAP */
 
 static unsigned long scan_swap_map(struct swap_info_struct *si,
   unsigned char usage)
@@ -1260,7 +1257,6 @@ static void swapcache_free(swp_entry_t entry)
}
 }
 
-#ifdef CONFIG_THP_SWAP
 static void swapcache_free_cluster(swp_entry_t entry)
 {
unsigned long offset = swp_offset(entry);
@@ -1271,6 +1267,9 @@ static void swapcache_free_cluster(swp_entry_t entry)
unsigned int i, free_entries = 0;
unsigned char val;
 
+   if (!IS_ENABLED(CONFIG_THP_SWAP))
+   return;
+
si = _swap_info_get(entry);
if (!si)
return;
@@ -1306,6 +1305,7 @@ static void swapcache_free_cluster(swp_entry_t entry)
}
 }
 
+#ifdef CONFIG_THP_SWAP
 int split_swap_cluster(swp_entry_t entry)
 {
struct swap_info_struct *si;
@@ -1320,11 +1320,7 @@ int split_swap_cluster(swp_entry_t entry)
unlock_cluster(ci);
return 0;
 }
-#else
-static inline void swapcache_free_cluster(swp_entry_t entry)
-{
-}
-#endif /* CONFIG_THP_SWAP */
+#endif
 
 void put_swap_page(struct page *page, swp_entry_t entry)
 {
@@ -1483,7 +1479,6 @@ int swp_swapcount(swp_entry_t entry)
return count;
 }
 
-#ifdef CONFIG_THP_SWAP
 static bool swap_page_trans_huge_swapped(struct swap_info_struct *si,
 swp_entry_t entry)
 {
@@ -1494,6 +1489,9 @@ static bool swap_page_trans_huge_swapped(struct 
swap_info_struct *si,
int i;
bool ret = false;
 
+   if (!IS_ENABLED(CONFIG_THP_SWAP))
+   return swap_swapcount(si, entry) != 0;
+
ci = lock_cluster_or_swap_info(si, offset);
if (!ci || !cluster_is_huge(ci)) {
if (map[roffset] != SWAP_HAS_CACHE)
@@ -1516,7 +1514,7 @@ static bool page_swapped(struct page *page)
swp_entry_t entry;
struct swap_info_struct *si;
 
-   if (likely(!PageTransCompound(page)))
+   if (!IS_ENABLED(CONFIG_THP_SWAP) || likely(!PageTransCompound(page)))
return page_swapcount(page) != 0;
 
page = compound_head(page);
@@ -1540,10 +1538,8 @@ static int page_trans_huge_map_swapcount(struct page 
*page, int *total_mapcount,
/* hugetlbfs shouldn't call it */
VM_BUG_ON_PAGE(PageHuge(page), page);
 
-   if (likely(!PageTransCompound(page))) {
-   mapcount = atomic_read(&page->_mapcount) + 1;
-   if (total_mapcount)
-   *total_mapcount = mapcount;
+   if (!IS_ENABLED(CONFIG_THP_SWAP) || likely(!PageTransCompound(page))) {
+   mapcount = page_t

[PATCH 1/6] swap: Add comments to lock_cluster_or_swap_info()

2018-07-12 Thread Huang, Ying
From: Huang Ying 

To improve the code readability.

Signed-off-by: "Huang, Ying" 
Suggested-by: Dave Hansen 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Shaohua Li 
Cc: Hugh Dickins 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Daniel Jordan 
Cc: Dan Williams 
---
 mm/swapfile.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index d8fddfb000ec..e31aa601d9c0 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -297,6 +297,12 @@ static inline void unlock_cluster(struct swap_cluster_info 
*ci)
spin_unlock(&ci->lock);
 }
 
+/*
+ * At most times, the fine-grained cluster lock is sufficient to protect
+ * the operations on sis->swap_map, and there is no need to acquire the
+ * coarse-grained sis->lock.  But clusters and the cluster lock aren't
+ * available for HDDs, so sis->lock is used instead for them.
+ */
 static inline struct swap_cluster_info *lock_cluster_or_swap_info(
struct swap_info_struct *si,
unsigned long offset)
-- 
2.16.4


