Re: [PATCH] net: stmmac: add sanity check to device_property_read_u32_array call
On 19/06/2019 06:13, Martin Blumenstingl wrote:
> Hi Colin,
>
>> Currently the call to device_property_read_u32_array is not error checked
>> leading to potential garbage values in the delays array that are then used
>> in msleep delays. Add a sanity check to the property fetching.
>>
>> Addresses-Coverity: ("Uninitialized scalar variable")
>> Signed-off-by: Colin Ian King
>
> I have also sent a patch [0] to initialize the array.
> Can you please look at my patch so we can work out which one to use?
>
> My concern is that the "snps,reset-delays-us" property is optional, even
> though the current dt-bindings documentation states that it's a required
> property. In reality it isn't: there are boards (two examples are
> mentioned in my patch: [0]) without it.
>
> So I believe that the resulting behavior has to be:
> 1. don't delay if this property is missing (instead of delaying for ms)
> 2. don't error out if this property is missing
>
> Your patch covers #1; can you please check whether #2 is also covered?
> I tested case #2 when submitting my patch and it worked fine (even
> though I could not reproduce the garbage values which are being read
> on some boards)
>
> Thank you!
> Martin
>
> [0] https://lkml.org/lkml/2019/4/19/638

Is that the correct link?

Colin
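The combined behavior Martin asks for — zero delays when the optional property is absent, and no error returned — can be sketched in userspace C. `read_u32_array_stub()` and `get_reset_delays()` are hypothetical stand-ins for `device_property_read_u32_array()` and the driver's fetch path, not the actual stmmac code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for device_property_read_u32_array(): returns 0 on
 * success, -EINVAL when the (optional) property is absent. */
static int read_u32_array_stub(const unsigned int *prop, unsigned int *out,
			       size_t n)
{
	if (!prop)
		return -EINVAL;
	memcpy(out, prop, n * sizeof(*out));
	return 0;
}

/* Pre-zero the delays so a missing "snps,reset-delays-us" leads to no sleep
 * at all instead of msleep() on garbage values (requirement #1), and ignore
 * the lookup error so an absent optional property is not fatal
 * (requirement #2). */
static int get_reset_delays(const unsigned int *prop, unsigned int delays[3])
{
	memset(delays, 0, 3 * sizeof(*delays));
	read_u32_array_stub(prop, delays, 3);	/* -EINVAL deliberately ignored */
	return 0;
}
```

This is the shape both competing patches converge on: initialization handles the garbage, and treating the lookup as best-effort handles the optional property.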
Re: [PATCH] staging: kpc2000: simplify error handling in kp2000_pcie_probe
On Wed, Jun 19, 2019 at 08:36:07AM +0200, Simon Sandström wrote:
> We can get rid of a few iounmaps in the middle of the function by
> re-ordering the error handling labels and adding two new labels.
>
> Signed-off-by: Simon Sandström
> ---
>
> This change has not been tested besides by compiling. It might be good
> to take an extra look to make sure that I got everything right.

You have the right instincts that when something looks really complicated
that's probably for a reason. That attitude will serve you well in the
future! But in this case it's staging code, so the original code is just
strange.

Reviewed-by: Dan Carpenter

> Also, this change was proposed by Dan Carpenter. Should I add anything
> in the commit message to show this?

There is a Suggested-by: tag for this, but don't resend because I don't
care and I've already reviewed this version, so I don't want to review
the patch again.

regards,
dan carpenter
Re: [PATCH 5/5] Powerpc/Watchpoint: Fix length calculation for unaligned target
On 6/18/19 12:16 PM, Christophe Leroy wrote:
>> +/* Maximum len for DABR is 8 bytes and DAWR is 512 bytes */
>> +static int hw_breakpoint_validate_len(struct arch_hw_breakpoint *hw)
>> +{
>> +	u16 length_max = 8;
>> +	u16 final_len;
>
> You should be more consistent in naming. If one is called final_len, the
> other one should be called max_len.

Copy/paste :). Will change it.

>> +	unsigned long start_addr, end_addr;
>> +
>> +	final_len = hw_breakpoint_get_final_len(hw, &start_addr, &end_addr);
>> +
>> +	if (dawr_enabled()) {
>> +		length_max = 512;
>> +		/* DAWR region can't cross 512 bytes boundary */
>> +		if ((start_addr >> 9) != (end_addr >> 9))
>> +			return -EINVAL;
>> +	}
>> +
>> +	if (final_len > length_max)
>> +		return -EINVAL;
>> +
>> +	return 0;
>> +}
>> +
>
> In many places, we have those numeric 512 and 9 shift. Could we replace
> them by some symbol, for instance DAWR_SIZE and DAWR_SHIFT ?

I don't see any other place where we check for boundary limit.

[...]

>> +u16 hw_breakpoint_get_final_len(struct arch_hw_breakpoint *brk,
>> +				unsigned long *start_addr,
>> +				unsigned long *end_addr)
>> +{
>> +	*start_addr = brk->address & ~HW_BREAKPOINT_ALIGN;
>> +	*end_addr = (brk->address + brk->len - 1) | HW_BREAKPOINT_ALIGN;
>> +	return *end_addr - *start_addr + 1;
>> +}
>
> This function gives horrible code (a couple of unneeded store/re-read and
> read/re-read).
>
> 06bc <hw_breakpoint_get_final_len>:
>  6bc:	81 23 00 00 	lwz     r9,0(r3)
>  6c0:	55 29 00 38 	rlwinm  r9,r9,0,0,28
>  6c4:	91 24 00 00 	stw     r9,0(r4)
>  6c8:	81 43 00 00 	lwz     r10,0(r3)
>  6cc:	a1 23 00 06 	lhz     r9,6(r3)
>  6d0:	38 6a ff ff 	addi    r3,r10,-1
>  6d4:	7c 63 4a 14 	add     r3,r3,r9
>  6d8:	60 63 00 07 	ori     r3,r3,7
>  6dc:	90 65 00 00 	stw     r3,0(r5)
>  6e0:	38 63 00 01 	addi    r3,r3,1
>  6e4:	81 24 00 00 	lwz     r9,0(r4)
>  6e8:	7c 69 18 50 	subf    r3,r9,r3
>  6ec:	54 63 04 3e 	clrlwi  r3,r3,16
>  6f0:	4e 80 00 20 	blr
>
> Below code gives something better:
>
> u16 hw_breakpoint_get_final_len(struct arch_hw_breakpoint *brk,
> 				unsigned long *start_addr,
> 				unsigned long *end_addr)
> {
> 	unsigned long address = brk->address;
> 	unsigned long len = brk->len;
> 	unsigned long start = address & ~HW_BREAKPOINT_ALIGN;
> 	unsigned long end = (address + len - 1) | HW_BREAKPOINT_ALIGN;
>
> 	*start_addr = start;
> 	*end_addr = end;
> 	return end - start + 1;
> }
>
> 06bc <hw_breakpoint_get_final_len>:
>  6bc:	81 43 00 00 	lwz     r10,0(r3)
>  6c0:	a1 03 00 06 	lhz     r8,6(r3)
>  6c4:	39 2a ff ff 	addi    r9,r10,-1
>  6c8:	7d 28 4a 14 	add     r9,r8,r9
>  6cc:	55 4a 00 38 	rlwinm  r10,r10,0,0,28
>  6d0:	61 29 00 07 	ori     r9,r9,7
>  6d4:	91 44 00 00 	stw     r10,0(r4)
>  6d8:	20 6a 00 01 	subfic  r3,r10,1
>  6dc:	91 25 00 00 	stw     r9,0(r5)
>  6e0:	7c 63 4a 14 	add     r3,r3,r9
>  6e4:	54 63 04 3e 	clrlwi  r3,r3,16
>  6e8:	4e 80 00 20 	blr
>
> And regardless, it's a pity to have this function using pointers which are
> from local variables in the callers, as we lose the benefit of registers.
> Couldn't this function go in the .h as a static inline ? I'm sure the
> result would be worth it.

This is obviously a bit of optimization, but I like Mikey's idea of storing
start_addr and end_addr in arch_hw_breakpoint. That way we don't have to
recalculate the length every time in set_dawr.
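The local-variable version Christophe proposes can be checked in userspace: with an 8-byte alignment mask (`HW_BREAKPOINT_ALIGN` is assumed to be `0x7` here, matching the `ori r3,r3,7` in the disassembly) the function widens the requested range out to aligned 8-byte boundaries. `struct arch_hw_breakpoint_stub` is a minimal stand-in for the real powerpc struct:

```c
#include <assert.h>

#define HW_BREAKPOINT_ALIGN 0x7UL	/* assumed 8-byte alignment mask */

struct arch_hw_breakpoint_stub {
	unsigned long address;
	unsigned short len;
};

/* The reviewer's version, written as a static inline so callers can keep
 * start/end in registers: compute into locals first, store through the
 * pointers once at the end. */
static inline unsigned short
hw_breakpoint_get_final_len(const struct arch_hw_breakpoint_stub *brk,
			    unsigned long *start_addr, unsigned long *end_addr)
{
	unsigned long start = brk->address & ~HW_BREAKPOINT_ALIGN;
	unsigned long end = (brk->address + brk->len - 1) | HW_BREAKPOINT_ALIGN;

	*start_addr = start;
	*end_addr = end;
	return end - start + 1;
}
```

For example, a 2-byte watch at 0x1003 rounds out to the aligned range [0x1000, 0x1007], final length 8 — the quantity then compared against the 8-byte DABR / 512-byte DAWR maximum.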
Re: [PATCH v3 0/6] Enable THP for text section of non-shmem files
[Cc fsdevel and lkml]

On Tue 18-06-19 23:24:18, Song Liu wrote:
> Changes v2 => v3:
> 1. Removed the limitation (cannot write to file with THP) by truncating
>    whole file during sys_open (see 6/6);
> 2. Fixed a VM_BUG_ON_PAGE() in filemap_fault() (see 2/6);
> 3. Split function rename to a separate patch (Rik);
> 4. Updated condition in hugepage_vma_check() (Rik).
>
> Changes v1 => v2:
> 1. Fixed a missing mem_cgroup_commit_charge() for non-shmem case.
>
> This set follows up discussion at LSF/MM 2019. The motivation is to put
> the text section of an application in THP, and thus reduce iTLB miss rate
> and improve performance. Both Facebook and Oracle showed strong interest
> in this feature.
>
> To make reviews easier, this set aims at a minimal valid product. The
> current version of the work does not have any changes to file system
> specific code. This comes with some limitations (discussed later).
>
> This set enables an application to "hugify" its text section by simply
> running something like:
>
>     madvise(0x60, 0x8, MADV_HUGEPAGE);
>
> Before this call, /proc/<pid>/maps looks like:
>
>     0040-074d r-xp 00:27 2006927 app
>
> After this call, part of the text section is split out and mapped to
> THP:
>
>     0040-00425000 r-xp 00:27 2006927 app
>     0060-00e0 r-xp 0020 00:27 2006927 app <<< on THP
>     00e0-074d r-xp 00a0 00:27 2006927 app
>
> Limitations:
>
> 1. This only works for the text section (vma with VM_DENYWRITE).
> 2. Original limitation #2 is removed in v3.
>
> We gated this feature with an experimental config, READ_ONLY_THP_FOR_FS.
> Once we get better support on the write path, we can remove the config
> and enable it by default.
>
> Tested cases:
> 1. Tested with btrfs and ext4.
> 2. Tested with a real-world application (memcache-like caching service).
> 3. Tested with "THP aware uprobe":
>    https://patchwork.kernel.org/project/linux-mm/list/?series=131339
>
> Please share your comments and suggestions on this.
>
> Thanks!
>
> Song Liu (6):
>   filemap: check compound_head(page)->mapping in filemap_fault()
>   filemap: update offset check in filemap_fault()
>   mm,thp: stats for file backed THP
>   khugepaged: rename collapse_shmem() and khugepaged_scan_shmem()
>   mm,thp: add read-only THP support for (non-shmem) FS
>   mm,thp: handle writes to file with THP in pagecache
>
>  fs/inode.c             |   3 ++
>  fs/proc/meminfo.c      |   4 ++
>  include/linux/fs.h     |  31
>  include/linux/mmzone.h |   2 +
>  mm/Kconfig             |  11 +
>  mm/filemap.c           |   9 ++--
>  mm/khugepaged.c        | 104 +
>  mm/rmap.c              |  12 +++--
>  mm/truncate.c          |   7 ++-
>  mm/vmstat.c            |   2 +
>  10 files changed, 156 insertions(+), 29 deletions(-)
>
> --
> 2.17.1

--
Michal Hocko
SUSE Labs
Re: [PATCH] net: mvpp2: cls: Add pmap to fs dump
Hello Nathan,

On Tue, 18 Jun 2019 09:09:10 -0700 Nathan Huckleberry wrote:
> There was an unused variable 'mvpp2_dbgfs_prs_pmap_fops'.
> Added a usage consistent with other fops to dump the pmap
> to userspace.

Thanks for sending a fix. Besides the typo preventing your patch from
compiling, you should also prefix the patch with "net: mvpp2: debugfs:"
rather than "cls", which is used for classifier patches.

Thanks,

Maxime
Re: [PATCH 1/1] udf: Fix incorrect final NOT_ALLOCATED (hole) extent length
Hi Steve!

On Sun 16-06-19 11:28:46, Steve Magnani wrote:
> On 6/4/19 7:31 AM, Steve Magnani wrote:
>
>> In some cases, using the 'truncate' command to extend a UDF file results
>> in a mismatch between the length of the file's extents (specifically, due
>> to incorrect length of the final NOT_ALLOCATED extent) and the
>> information (file) length. The discrepancy can prevent other operating
>> systems (i.e., Windows 10) from opening the file.
>>
>> Two particular errors have been observed when extending a file:
>>
>> 1. The final extent is larger than it should be, having been rounded up
>>    to a multiple of the block size.
>>
>> 2. The final extent is shorter than it should be, due to not having
>>    been updated when the file's information length was increased.
>
> Wondering if you've seen this, or if something got lost in a spam folder.

Sorry for not getting to you earlier. I've seen the patches and they look
reasonable to me. I just wanted to have one more closer look, but the last
weeks were rather busy so I didn't get to it. I'll look into it this week.
Thanks a lot for debugging the problem and sending the fixes!

Honza
--
Jan Kara
SUSE Labs, CR
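The operation that exposes the bug — extending a short file with truncate so the tail becomes a hole — is just ftruncate(2) under the hood. The sketch below performs that sequence; the path is an arbitrary choice for illustration, and showing the actual on-disk extent mismatch would of course require running it on a UDF-formatted volume:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write a few bytes, then extend the file well past them with ftruncate(),
 * creating a NOT_ALLOCATED (hole) tail extent. Returns the resulting
 * information length as seen by stat, or -1 on error. On UDF, the final
 * extent recorded on disk must agree with this value. */
static long extend_with_hole(const char *path, off_t new_size)
{
	struct stat st;
	int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	if (write(fd, "hello", 5) != 5 || ftruncate(fd, new_size) != 0 ||
	    fstat(fd, &st) != 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return (long)st.st_size;
}
```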
[PATCH 0/2] perf thread-stack: Fix thread stack return from kernel for kernel-only case
Hi

Here is one non-urgent fix and a subsequent tidy-up.

Adrian Hunter (2):
  perf thread-stack: Fix thread stack return from kernel for kernel-only case
  perf thread-stack: Eliminate code duplicating thread_stack__pop_ks()

 tools/perf/util/thread-stack.c | 48 ++
 1 file changed, 35 insertions(+), 13 deletions(-)

Regards
Adrian
[PATCH 2/2] perf thread-stack: Eliminate code duplicating thread_stack__pop_ks()
Use new function thread_stack__pop_ks() in place of equivalent code.

Signed-off-by: Adrian Hunter
---
 tools/perf/util/thread-stack.c | 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index f91c00dfe23b..b20c9b867fce 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -673,12 +673,9 @@ static int thread_stack__no_call_return(struct thread *thread,
 
 	if (ip >= ks && addr < ks) {
 		/* Return to userspace, so pop all kernel addresses */
-		while (thread_stack__in_kernel(ts)) {
-			err = thread_stack__call_return(thread, ts, --ts->cnt,
-							tm, ref, true);
-			if (err)
-				return err;
-		}
+		err = thread_stack__pop_ks(thread, ts, sample, ref);
+		if (err)
+			return err;
 
 		/* If the stack is empty, push the userspace address */
 		if (!ts->cnt) {
@@ -688,12 +685,9 @@ static int thread_stack__no_call_return(struct thread *thread,
 	} else if (thread_stack__in_kernel(ts) && ip < ks) {
 		/* Return to userspace, so pop all kernel addresses */
-		while (thread_stack__in_kernel(ts)) {
-			err = thread_stack__call_return(thread, ts, --ts->cnt,
-							tm, ref, true);
-			if (err)
-				return err;
-		}
+		err = thread_stack__pop_ks(thread, ts, sample, ref);
+		if (err)
+			return err;
 	}
 
 	if (ts->cnt)
-- 
2.17.1
[PATCH 1/2] perf thread-stack: Fix thread stack return from kernel for kernel-only case
Commit f08046cb3082 ("perf thread-stack: Represent jmps to the start of a different symbol") had the side-effect of introducing more stack entries before return from kernel space. When user space is also traced, those entries are popped before entry to user space, but when user space is not traced, they get stuck at the bottom of the stack, making the stack grow progressively larger. Fix by detecting a return-from-kernel branch type, and popping kernel addresses from the stack then. Note, the problem and fix affect the exported Call Graph / Tree but not the callindent option used by "perf script --call-trace". Example: perf-with-kcore record example -e intel_pt//k -- ls perf-with-kcore script --itrace=bep -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py example.db branches calls ~/libexec/perf-core/scripts/python/exported-sql-viewer.py example.db Menu option: Reports -> Context-Sensitive Call Graph Before: (showing Call Path column only) Call Path ▶ perf ▼ ls ▼ 12111:12111 ▶ setup_new_exec ▶ __task_pid_nr_ns ▶ perf_event_pid_type ▶ perf_event_comm_output ▶ perf_iterate_ctx ▶ perf_iterate_sb ▶ perf_event_comm ▶ __set_task_comm ▶ load_elf_binary ▶ search_binary_handler ▶ __do_execve_file.isra.41 ▶ __x64_sys_execve ▶ do_syscall_64 ▼ entry_SYSCALL_64_after_hwframe ▼ swapgs_restore_regs_and_return_to_usermode ▼ native_iret ▶ error_entry ▶ do_page_fault ▼ error_exit ▼ retint_user ▶ prepare_exit_to_usermode ▼ native_iret ▶ error_entry ▶ do_page_fault ▼ error_exit ▼ retint_user ▶ prepare_exit_to_usermode ▼ native_iret ▶ error_entry ▶ do_page_fault ▼ error_exit ▼ retint_user ▶ prepare_exit_to_usermode ▶ native_iret After: (showing Call Path column only) Call Path ▶ perf ▼ ls ▼ 12111:12111 ▶ setup_new_exec ▶ __task_pid_nr_ns ▶ perf_event_pid_type ▶ perf_event_comm_output ▶ perf_iterate_ctx ▶ perf_iterate_sb ▶ perf_event_comm ▶ __set_task_comm ▶ load_elf_binary ▶ search_binary_handler ▶ __do_execve_file.isra.41 ▶ __x64_sys_execve ▶ do_syscall_64 ▶ 
entry_SYSCALL_64_after_hwframe ▶ page_fault ▼ entry_SYSCALL_64 ▼ do_syscall_64 ▶ __x64_sys_brk ▶ __x64_sys_access ▶ __x64_sys_openat ▶ __x64_sys_newfstat ▶ __x64_sys_mmap ▶ __x64_sys_close ▶ __x64_sys_read ▶ __x64_sys_mprotect ▶ __x64_sys_arch_prctl ▶ __x64_sys_munmap ▶ exit_to_usermode_loop ▶ __x64_sys_set_tid_address ▶ __x64_sys_set_robust_list ▶ __x64_sys_rt_sigaction ▶ __x64_sys_rt_sigprocmask ▶ __x64_sys_prlimit64 ▶ __x64_sys_statfs ▶ __x64_sys_ioctl ▶ __x64_sys_getdents64 ▶ __x64_sys_write ▶ __x64_sys_exit_group Signed-off-by: Adrian Hunter Fixes: f08046cb3082 ("perf thread-stack: Represent jmps to the start of a different symbol") Cc: sta...@vger.kernel.org --- tools/perf/util/thread-stack.c | 30 +- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c index 8e390f78486f..f91c00dfe23b 100644 --- a/tools/perf/util/thread-stack.c +++ b/tools/perf/util/thread-stack.c @@ -637,6 +637,23 @@ static int thread_stack__bottom(struct thread_stack *ts, true, false); } +static int thread_stack__pop_ks(struct thread *thread, struct thread_stack *ts, + struct perf_sample *sample, u64 ref) +{ + u64 tm = sample->time; + int err; + + /* Return to userspace, so pop all kernel addresses */ + while (thread_stack__in_kernel(ts)) { + err = thread_stack__call_return(thread, ts, --ts->cnt, + tm, ref, true); + if (err) + return err; + } + + return 0; +} + static int thread_stack__no_call_return(struct thread *thread, struct thread_stack *ts, struct perf_sample *sample, @@ -919,7 +936,18 @@ int thread_stack__process(struct thread *thread, struct comm *comm, ts->rstate = X86_RETPOLINE_DETECTED; } else if (sample->flags
linux-next: build failure after merge of the usb tree
Hi all,

After merging the usb tree, today's linux-next build (x86_64 allmodconfig)
failed like this:

In file included from usr/include/linux/usbdevice_fs.hdrtest.c:1:
./usr/include/linux/usbdevice_fs.h:88:2: error: unknown type name 'u8'
  u8 num_ports;		/* Number of ports the device is connected */
  ^~
./usr/include/linux/usbdevice_fs.h:92:2: error: unknown type name 'u8'
  u8 ports[7];		/* List of ports on the way from the root */
  ^~

Caused by commit

  6d101f24f1dd ("USB: add usbfs ioctl to retrieve the connection parameters")

Presumably exposed by commit

  b91976b7c0e3 ("kbuild: compile-test UAPI headers to ensure they are self-contained")

from the kbuild tree.

I have added this patch for now:

From: Stephen Rothwell
Date: Wed, 19 Jun 2019 16:36:16 +1000
Subject: [PATCH] USB: fix types in uapi include

Signed-off-by: Stephen Rothwell
---
 include/uapi/linux/usbdevice_fs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/usbdevice_fs.h b/include/uapi/linux/usbdevice_fs.h
index 4b267fe3776e..78efe870c2b7 100644
--- a/include/uapi/linux/usbdevice_fs.h
+++ b/include/uapi/linux/usbdevice_fs.h
@@ -85,11 +85,11 @@ struct usbdevfs_conninfo_ex {
 				/* kernel, the device is connected to.      */
 	__u32 devnum;		/* Device address on the bus.               */
 	__u32 speed;		/* USB_SPEED_* constants from ch9.h          */
-	u8 num_ports;		/* Number of ports the device is connected   */
+	__u8 num_ports;		/* Number of ports the device is connected   */
 				/* to on the way to the root hub. It may    */
 				/* be bigger than size of 'ports' array so  */
 				/* userspace can detect overflows.          */
-	u8 ports[7];		/* List of ports on the way from the root    */
+	__u8 ports[7];		/* List of ports on the way from the root    */
 				/* hub to the device. Current limit in      */
 				/* USB specification is 7 tiers (root hub,  */
 				/* 5 intermediate hubs, device), which      */
-- 
2.20.1

-- 
Cheers,
Stephen Rothwell
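The underlying rule is that UAPI headers are compiled by userspace, where the kernel-internal `u8`/`u32` typedefs do not exist; only the `__u8`/`__u32` family from `<linux/types.h>` is exported. A partial sketch of the fixed struct (not the full `usbdevfs_conninfo_ex` layout) shows that these types compile cleanly outside the kernel:

```c
#include <linux/types.h>

/* Partial, illustrative layout using only exported fixed-width types; the
 * real struct usbdevfs_conninfo_ex has additional fields. */
struct conninfo_ex_sketch {
	__u32 size;
	__u32 busnum;
	__u32 devnum;
	__u32 speed;
	__u8  num_ports;
	__u8  ports[7];
};
```

Had the sketch used `u8`, this translation unit would fail exactly as the hdrtest build above does.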
[net v1] net: stmmac: set IC bit when transmitting frames with HW timestamp
From: Roland Hii

When transmitting certain PTP frames, e.g. SYNC and DELAY_REQ, the PTP
daemon, e.g. ptp4l, polls the driver for the frame transmit hardware
timestamp. The polling will most likely time out if tx coalescing is
enabled, because the Interrupt-on-Completion (IC) bit is not set in the
tx descriptor for those frames.

This patch ignores the tx coalesce parameter and sets the IC bit when
transmitting PTP frames which need to report the frame transmit hardware
timestamp to user space.

Fixes: f748be531d70 ("net: stmmac: Rework coalesce timer and fix multi-queue races")
Signed-off-by: Roland Hii
Signed-off-by: Ong Boon Leong
Signed-off-by: Voon Weifeng
---
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 06dd51f47cfd..06358fe5b245 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2947,12 +2947,15 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	/* Manage tx mitigation */
 	tx_q->tx_count_frames += nfrags + 1;
-	if (priv->tx_coal_frames <= tx_q->tx_count_frames) {
+	if (likely(priv->tx_coal_frames > tx_q->tx_count_frames) &&
+	    !(priv->synopsys_id >= DWMAC_CORE_4_00 &&
+	      (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) &&
+	      priv->hwts_tx_en)) {
+		stmmac_tx_timer_arm(priv, queue);
+	} else {
+		tx_q->tx_count_frames = 0;
 		stmmac_set_tx_ic(priv, desc);
 		priv->xstats.tx_set_ic_bit++;
-		tx_q->tx_count_frames = 0;
-	} else {
-		stmmac_tx_timer_arm(priv, queue);
 	}
 
 	skb_tx_timestamp(skb);
@@ -3166,12 +3169,15 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 	 * element in case of no SG.
 	 */
 	tx_q->tx_count_frames += nfrags + 1;
-	if (priv->tx_coal_frames <= tx_q->tx_count_frames) {
+	if (likely(priv->tx_coal_frames > tx_q->tx_count_frames) &&
+	    !(priv->synopsys_id >= DWMAC_CORE_4_00 &&
+	      (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) &&
+	      priv->hwts_tx_en)) {
+		stmmac_tx_timer_arm(priv, queue);
+	} else {
+		tx_q->tx_count_frames = 0;
 		stmmac_set_tx_ic(priv, desc);
 		priv->xstats.tx_set_ic_bit++;
-		tx_q->tx_count_frames = 0;
-	} else {
-		stmmac_tx_timer_arm(priv, queue);
 	}
 
 	skb_tx_timestamp(skb);
-- 
1.9.1
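The condition the patch adds (identically in both xmit paths) can be factored into a pure predicate for clarity. This is a userspace restatement with invented names — `DWMAC_CORE_4_00` is given an arbitrary placeholder value, and the flag fields are passed in as booleans rather than read from driver structs:

```c
#include <stdbool.h>

#define DWMAC_CORE_4_00 0x40	/* placeholder value for the sketch */

/* Mirror of the patched decision: set the IC bit immediately when either
 * the coalesce frame threshold is reached, or the frame is a PTP frame
 * whose TX hardware timestamp userspace will poll for; otherwise arm the
 * coalesce timer. */
static bool needs_ic_bit(unsigned int coal_frames, unsigned int count_frames,
			 unsigned int synopsys_id, bool skb_hw_tstamp,
			 bool hwts_tx_en)
{
	bool ptp_tx_ts = synopsys_id >= DWMAC_CORE_4_00 &&
			 skb_hw_tstamp && hwts_tx_en;

	return coal_frames <= count_frames || ptp_tx_ts;
}
```

The key behavioral change is the `ptp_tx_ts` disjunct: before the patch, only the frame-count threshold could force an IC bit, so a timestamped PTP frame under light load would sit waiting on the coalesce timer while ptp4l's poll timed out.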
Re: [PATCH] [v2] ipsec: select crypto ciphers for xfrm_algo
On Tue, Jun 18, 2019 at 01:22:13PM +0200, Arnd Bergmann wrote: > kernelci.org reports failed builds on arc because of what looks > like an old missed 'select' statement: > > net/xfrm/xfrm_algo.o: In function `xfrm_probe_algs': > xfrm_algo.c:(.text+0x1e8): undefined reference to `crypto_has_ahash' > > I don't see this in randconfig builds on other architectures, but > it's fairly clear we want to select the hash code for it, like we > do for all its other users. As Herbert points out, CRYPTO_BLKCIPHER > is also required even though it has not popped up in build tests. > > Fixes: 17bc19702221 ("ipsec: Use skcipher and ahash when probing algorithms") > Signed-off-by: Arnd Bergmann > --- > net/xfrm/Kconfig | 2 ++ > 1 file changed, 2 insertions(+) Acked-by: Herbert Xu -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH V3 4/5] cpufreq: Register notifiers with the PM QoS framework
On 19-06-19, 00:23, Rafael J. Wysocki wrote: > In patch [3/5] you could point notifiers for both min and max freq to the same > notifier head. Both of your notifiers end up calling cpufreq_update_policy() > anyway. I tried it and the changes in qos.c file look fine. But I don't like at all how cpufreq.c looks now. We only register for min-freq notifier now and that takes care of max as well. What could have been better is if we could have registered a freq-notifier instead of min/max, which isn't possible as well because of how qos framework works. Honestly, the cpufreq changes look hacky to me :( What do you say. -- viresh --- drivers/base/power/qos.c | 15 --- drivers/cpufreq/cpufreq.c | 38 -- include/linux/cpufreq.h | 3 +-- 3 files changed, 17 insertions(+), 39 deletions(-) diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c index cde2692b97f9..9bbf2d2a3376 100644 --- a/drivers/base/power/qos.c +++ b/drivers/base/power/qos.c @@ -202,20 +202,20 @@ static int dev_pm_qos_constraints_allocate(struct device *dev) if (!qos) return -ENOMEM; - n = kzalloc(3 * sizeof(*n), GFP_KERNEL); + n = kzalloc(2 * sizeof(*n), GFP_KERNEL); if (!n) { kfree(qos); return -ENOMEM; } + BLOCKING_INIT_NOTIFIER_HEAD(n); c = &qos->resume_latency; plist_head_init(&c->list); c->target_value = PM_QOS_RESUME_LATENCY_DEFAULT_VALUE; c->default_value = PM_QOS_RESUME_LATENCY_DEFAULT_VALUE; c->no_constraint_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT; c->type = PM_QOS_MIN; - c->notifiers = n; - BLOCKING_INIT_NOTIFIER_HEAD(n); + c->notifiers = n++; c = &qos->latency_tolerance; plist_head_init(&c->list); @@ -224,14 +224,16 @@ static int dev_pm_qos_constraints_allocate(struct device *dev) c->no_constraint_value = PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT; c->type = PM_QOS_MIN; + /* Same notifier head is used for both min/max frequency */ + BLOCKING_INIT_NOTIFIER_HEAD(n); + c = &qos->min_frequency; plist_head_init(&c->list); c->target_value = PM_QOS_MIN_FREQUENCY_DEFAULT_VALUE; c->default_value = 
PM_QOS_MIN_FREQUENCY_DEFAULT_VALUE; c->no_constraint_value = PM_QOS_MIN_FREQUENCY_DEFAULT_VALUE; c->type = PM_QOS_MAX; - c->notifiers = ++n; - BLOCKING_INIT_NOTIFIER_HEAD(n); + c->notifiers = n; c = &qos->max_frequency; plist_head_init(&c->list); @@ -239,8 +241,7 @@ static int dev_pm_qos_constraints_allocate(struct device *dev) c->default_value = PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE; c->no_constraint_value = PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE; c->type = PM_QOS_MIN; - c->notifiers = ++n; - BLOCKING_INIT_NOTIFIER_HEAD(n); + c->notifiers = n; INIT_LIST_HEAD(&qos->flags.list); diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 1344e1b1307f..1605dba1327e 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1139,19 +1139,10 @@ static int cpufreq_update_freq(struct cpufreq_policy *policy) return 0; } -static int cpufreq_notifier_min(struct notifier_block *nb, unsigned long freq, +static int cpufreq_notifier_qos(struct notifier_block *nb, unsigned long freq, void *data) { - struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, nb_min); - - return cpufreq_update_freq(policy); -} - -static int cpufreq_notifier_max(struct notifier_block *nb, unsigned long freq, - void *data) -{ - struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, nb_max); + struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, nb_qos); return cpufreq_update_freq(policy); @@ -1214,10 +1205,10 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) goto err_free_real_cpus; } - policy->nb_min.notifier_call = cpufreq_notifier_min; - policy->nb_max.notifier_call = cpufreq_notifier_max; + policy->nb_qos.notifier_call = cpufreq_notifier_qos; - ret = dev_pm_qos_add_notifier(dev, &policy->nb_min, + /* Notifier for min frequency also takes care of max frequency notifier */ + ret = dev_pm_qos_add_notifier(dev, &policy->nb_qos, DEV_PM_QOS_MIN_FREQUENCY); if (ret) { dev_err(dev, "Failed to register MIN QoS 
notifier: %d (%*pbl)\n", @@ -1225,18 +1216,10 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) goto err_kobj_remove; } - ret = dev_pm_qos_add_notifier(dev, &policy->nb_max, - DEV_PM_QOS_MAX_FREQUENCY); - if (ret) { - dev_err(dev, "Fail
Re: WARNING in fanotify_handle_event
On Tue, Jun 18, 2019 at 11:27 PM Amir Goldstein wrote:
>
> On Tue, Jun 18, 2019 at 8:07 PM syzbot wrote:
> >
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    963172d9 Merge branch 'x86-urgent-for-linus' of git://git...
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=17c090eaa0
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=fa9f7e1b6a8bb586
> > dashboard link: https://syzkaller.appspot.com/bug?extid=c277e8e2f46414645508
> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15a32f46a0
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=13a7dc9ea0
> >
> > The bug was bisected to:
> >
> > commit 77115225acc67d9ac4b15f04dd138006b9cd1ef2
> > Author: Amir Goldstein
> > Date:   Thu Jan 10 17:04:37 2019 +
> >
> >     fanotify: cache fsid in fsnotify_mark_connector
> >
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=12bfcb66a0
> > final crash:    https://syzkaller.appspot.com/x/report.txt?x=11bfcb66a0
> > console output: https://syzkaller.appspot.com/x/log.txt?x=16bfcb66a0
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+c277e8e2f46414645...@syzkaller.appspotmail.com
> > Fixes: 77115225acc6 ("fanotify: cache fsid in fsnotify_mark_connector")
> >
> > WARNING: CPU: 0 PID: 8994 at fs/notify/fanotify/fanotify.c:359
> > fanotify_get_fsid fs/notify/fanotify/fanotify.c:359 [inline]
>
> Oops, we forgot to update conn->fsid when the first mark added for an
> inode has no fsid (e.g. inotify) and the second mark has an fid, which
> is more or less the only thing the repro does.
> And if we are going to update conn->fsid, we do not have the cmpxchg to
> guarantee setting fsid atomically.
>
> I am thinking of a set-once flag on the connector, FSNOTIFY_CONN_HAS_FSID,
> checked before smp_rmb() in fanotify_get_fsid().
> If the flag is not set then call vfs_get_fsid() instead of using the fsid
> cache.

Actually, we don't need to call vfs_get_fsid(); in that race we can just
drop the event.

> conn->fsid can be updated in fsnotify_add_mark_list() under conn->lock,
> and the flag set after smp_wmb().
>
> Does that sound correct?

Something like this:

#syz test: https://github.com/amir73il/linux.git fsnotify-fix-fsid-cache

It passed my modified ltp test:
https://github.com/amir73il/ltp/commits/fanotify_dirent

Thanks,
Amir.
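The set-once publish pattern Amir describes can be sketched in userspace with C11 acquire/release ordering standing in for the kernel's smp_wmb()/smp_rmb() pair. All names here (`connector_stub`, `conn_set_fsid`, `conn_get_fsid`) are invented for the sketch, not the actual fsnotify API:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for fsnotify_mark_connector: the fsid fields are written first,
 * then a set-once flag (mimicking FSNOTIFY_CONN_HAS_FSID) is published with
 * release semantics. */
struct connector_stub {
	unsigned int fsid[2];
	atomic_bool has_fsid;
};

/* Writer side: in the kernel this runs in fsnotify_add_mark_list() under
 * conn->lock; the release store plays the role of smp_wmb(). */
static void conn_set_fsid(struct connector_stub *c,
			  unsigned int val0, unsigned int val1)
{
	c->fsid[0] = val0;
	c->fsid[1] = val1;
	atomic_store_explicit(&c->has_fsid, true, memory_order_release);
}

/* Reader side (fanotify_get_fsid() analogue): if the flag is not yet
 * observed, return false and let the caller drop the event -- no
 * vfs_get_fsid() fallback needed. The acquire load plays smp_rmb(). */
static bool conn_get_fsid(struct connector_stub *c, unsigned int out[2])
{
	if (!atomic_load_explicit(&c->has_fsid, memory_order_acquire))
		return false;
	out[0] = c->fsid[0];
	out[1] = c->fsid[1];
	return true;
}
```

Because the flag is only ever set after the fsid is fully written, any reader that observes the flag is guaranteed a coherent fsid, and the only race window resolves to the benign "drop the event" case.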
Re: [PATCH] ecryptfs: use print_hex_dump_bytes for hexdump
On 2019-05-17 12:45:15, Sascha Hauer wrote:
> The kernel has nice hexdump facilities; use them rather than a homebrew
> hexdump function.
>
> Signed-off-by: Sascha Hauer

Thanks! This is much nicer. I've pushed the commit to the eCryptfs next
branch.

Tyler

> ---
>  fs/ecryptfs/debug.c | 22 +++---
>  1 file changed, 3 insertions(+), 19 deletions(-)
>
> diff --git a/fs/ecryptfs/debug.c b/fs/ecryptfs/debug.c
> index 3d2bdf546ec6..ee9d8ac4a809 100644
> --- a/fs/ecryptfs/debug.c
> +++ b/fs/ecryptfs/debug.c
> @@ -97,25 +97,9 @@ void ecryptfs_dump_auth_tok(struct ecryptfs_auth_tok *auth_tok)
>   */
>  void ecryptfs_dump_hex(char *data, int bytes)
>  {
> -	int i = 0;
> -	int add_newline = 1;
> -
>  	if (ecryptfs_verbosity < 1)
>  		return;
> -	if (bytes != 0) {
> -		printk(KERN_DEBUG "0x%.2x.", (unsigned char)data[i]);
> -		i++;
> -	}
> -	while (i < bytes) {
> -		printk("0x%.2x.", (unsigned char)data[i]);
> -		i++;
> -		if (i % 16 == 0) {
> -			printk("\n");
> -			add_newline = 0;
> -		} else
> -			add_newline = 1;
> -	}
> -	if (add_newline)
> -		printk("\n");
> -}
>
> +	print_hex_dump(KERN_DEBUG, "ecryptfs: ", DUMP_PREFIX_OFFSET, 16, 1,
> +		       data, bytes, false);
> +}
> --
> 2.20.1
[PATCH] staging: kpc2000: simplify error handling in kp2000_pcie_probe
We can get rid of a few iounmaps in the middle of the function by re-ordering the error handling labels and adding two new labels. Signed-off-by: Simon Sandström --- This change has not been tested besides by compiling. It might be good took take an extra look to make sure that I got everything right. Also, this change was proposed by Dan Carpenter. Should I add anything in the commit message to show this? - Simon drivers/staging/kpc2000/kpc2000/core.c | 22 ++ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/drivers/staging/kpc2000/kpc2000/core.c b/drivers/staging/kpc2000/kpc2000/core.c index 610ea549d240..cb05cca687e1 100644 --- a/drivers/staging/kpc2000/kpc2000/core.c +++ b/drivers/staging/kpc2000/kpc2000/core.c @@ -351,12 +351,11 @@ static int kp2000_pcie_probe(struct pci_dev *pdev, err = pci_request_region(pcard->pdev, REG_BAR, KP_DRIVER_NAME_KP2000); if (err) { - iounmap(pcard->regs_bar_base); dev_err(&pcard->pdev->dev, "probe: failed to acquire PCI region (%d)\n", err); err = -ENODEV; - goto err_disable_device; + goto err_unmap_regs; } pcard->regs_base_resource.start = reg_bar_phys_addr; @@ -374,7 +373,7 @@ static int kp2000_pcie_probe(struct pci_dev *pdev, dev_err(&pcard->pdev->dev, "probe: DMA_BAR could not remap memory to virtual space\n"); err = -ENODEV; - goto err_unmap_regs; + goto err_release_regs; } dev_dbg(&pcard->pdev->dev, "probe: DMA_BAR virt hardware address start [%p]\n", @@ -384,11 +383,10 @@ static int kp2000_pcie_probe(struct pci_dev *pdev, err = pci_request_region(pcard->pdev, DMA_BAR, "kp2000_pcie"); if (err) { - iounmap(pcard->dma_bar_base); dev_err(&pcard->pdev->dev, "probe: failed to acquire PCI region (%d)\n", err); err = -ENODEV; - goto err_unmap_regs; + goto err_unmap_dma; } pcard->dma_base_resource.start = dma_bar_phys_addr; @@ -400,7 +398,7 @@ static int kp2000_pcie_probe(struct pci_dev *pdev, pcard->sysinfo_regs_base = pcard->regs_bar_base; err = read_system_regs(pcard); if (err) - goto err_unmap_dma; + goto 
err_release_dma; // Disable all "user" interrupts because they're not used yet. writeq(0x, @@ -438,14 +436,14 @@ static int kp2000_pcie_probe(struct pci_dev *pdev, if (err) { dev_err(&pcard->pdev->dev, "CANNOT use DMA mask %0llx\n", DMA_BIT_MASK(64)); - goto err_unmap_dma; + goto err_release_dma; } dev_dbg(&pcard->pdev->dev, "Using DMA mask %0llx\n", dma_get_mask(PCARD_TO_DEV(pcard))); err = pci_enable_msi(pcard->pdev); if (err < 0) - goto err_unmap_dma; + goto err_release_dma; rv = request_irq(pcard->pdev->irq, kp2000_irq_handler, IRQF_SHARED, pcard->name, pcard); @@ -478,14 +476,14 @@ static int kp2000_pcie_probe(struct pci_dev *pdev, free_irq(pcard->pdev->irq, pcard); err_disable_msi: pci_disable_msi(pcard->pdev); +err_release_dma: + pci_release_region(pdev, DMA_BAR); err_unmap_dma: iounmap(pcard->dma_bar_base); - pci_release_region(pdev, DMA_BAR); - pcard->dma_bar_base = NULL; +err_release_regs: + pci_release_region(pdev, REG_BAR); err_unmap_regs: iounmap(pcard->regs_bar_base); - pci_release_region(pdev, REG_BAR); - pcard->regs_bar_base = NULL; err_disable_device: pci_disable_device(pcard->pdev); err_remove_ida: -- 2.20.1
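The label ordering the patch establishes follows the standard goto-unwind idiom: release resources in strict reverse order of acquisition, with one label per acquired resource, so no unwind step is ever duplicated mid-function. A minimal self-contained illustration (all names invented; the ints stand in for ioremap/pci_request_region resources):

```c
/* probe_stub acquires A then B; failure after each acquisition jumps to
 * the label that unwinds exactly what has been acquired so far.
 * fail_at: 0 = success, 1 = fail after acquiring A, 2 = fail after B. */
static int probe_stub(int fail_at, int *a, int *b)
{
	int err = -1;

	*a = *b = 0;

	*a = 1;				/* acquire A (e.g. ioremap)         */
	if (fail_at == 1)
		goto err_release_a;

	*b = 1;				/* acquire B (e.g. request_region)  */
	if (fail_at == 2)
		goto err_release_b;

	return 0;

err_release_b:
	*b = 0;				/* unwind in reverse order */
err_release_a:
	*a = 0;
	return err;
}
```

The cleanup labels also fall through top-to-bottom, which is what lets kp2000_pcie_probe drop the ad-hoc iounmap calls scattered before each goto.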
Re: [PATCH] scsi: scsi_sysfs.c: Hide wwid sdev attr if VPD is not supported
On 6/19/19 5:35 AM, Martin K. Petersen wrote:
>
> Marcos,
>
>> WWID is composed from VPD data from the device, specifically page 0x83.
>> So, when a device does not have VPD support, for example USB storage
>> devices where VPD is specifically disabled, a read into the
>> <device>/device/wwid file will always return ENXIO. To avoid this,
>> change the scsi_sdev_attr_is_visible function to hide the wwid sysfs
>> file when the device does not support VPD.
>
> Not a big fan of attribute files that come and go.
>
> Why not just return an empty string? Hannes?
>
Actually, the intention of the 'wwid' attribute was to have a common place where one could look up the global id. As such it actually serves a dual purpose, namely indicating that there _is_ a global ID _and_ that this kernel (version) has support for the 'wwid' attribute.

This is to resolve one big issue we have with udev nowadays, which is figuring out if a specific sysfs attribute is actually supported on this particular kernel. Dynamic attributes are 'nicer' on a conceptual level, but make the above test nearly impossible, as we now have _two_ possibilities why a specific attribute is not present.

So making 'wwid' conditional would actually defeat its very purpose, and we should leave it blank if not supported.

Cheers,

Hannes
--
Dr. Hannes Reinecke		zSeries & Storage
h...@suse.com			+49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Re: [PATCH -next] ecryptfs: remove unnecessary null check in ecryptfs_keyring_auth_tok_for_sig
On 2019-05-27 21:28:14, YueHaibing wrote:
> request_key and ecryptfs_get_encrypted_key never
> return a NULL pointer, so no need to do a null check.
>
> Signed-off-by: YueHaibing

This change looks good to me. I've pushed it to the eCryptfs next branch.

Tyler

> ---
>  fs/ecryptfs/keystore.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ecryptfs/keystore.c b/fs/ecryptfs/keystore.c
> index 95662fd46b1d..a1afb162b9d2 100644
> --- a/fs/ecryptfs/keystore.c
> +++ b/fs/ecryptfs/keystore.c
> @@ -1626,9 +1626,9 @@ int ecryptfs_keyring_auth_tok_for_sig(struct key **auth_tok_key,
>  	int rc = 0;
>
>  	(*auth_tok_key) = request_key(&key_type_user, sig, NULL);
> -	if (!(*auth_tok_key) || IS_ERR(*auth_tok_key)) {
> +	if (IS_ERR(*auth_tok_key)) {
>  		(*auth_tok_key) = ecryptfs_get_encrypted_key(sig);
> -		if (!(*auth_tok_key) || IS_ERR(*auth_tok_key)) {
> +		if (IS_ERR(*auth_tok_key)) {
>  			printk(KERN_ERR "Could not find key with description: [%s]\n",
>  			       sig);
>  			rc = process_request_key_err(PTR_ERR(*auth_tok_key));
> --
> 2.17.1
>
>
Re: [PATCH 04/25] vfs: Implement parameter value retrieval with fsinfo() [ver #13]
On Wed, Jun 19, 2019 at 12:34 AM David Howells wrote: > > Same goes for vfs_parse_sb_flag() btw. It should be moved into each > > filesystem's ->parse_param() and not be a mandatory thing. > > I disagree. Every filesystem *must* be able to accept these standard flags, > even if it then ignores them. "posixacl" is not a standard flag. It never was accepted by mount(8) so I don't see where you got that from. Can you explain why you think "mand", "sync", "dirsync", "lazytime" should be accepted by a filesystem such as proc? The argument that it breaks userspace is BS, because this is a new interface, hence by definition we cannot break old userspace. If mount(8) wants to use the new API and there really is breakage if these options are rejected (which I doubt) then it can easily work around that by ignoring them itself. Also why should "rw" not be rejected for filesystems which are read-only by definition, such as iso9660? Thanks, Miklos
Re: [PATCH -next] ecryptfs: Make ecryptfs_xattr_handler static
On 2019-06-14 23:51:17, YueHaibing wrote:
> Fix sparse warning:
>
> fs/ecryptfs/inode.c:1138:28: warning:
>  symbol 'ecryptfs_xattr_handler' was not declared. Should it be static?
>
> Reported-by: Hulk Robot
> Signed-off-by: YueHaibing

Thanks for the cleanup! I've pushed this to the eCryptfs next branch.

Tyler

> ---
>  fs/ecryptfs/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index 1e994d7..18426f4 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -1121,7 +1121,7 @@ static int ecryptfs_xattr_set(const struct xattr_handler *handler,
>  	}
>  }
>
> -const struct xattr_handler ecryptfs_xattr_handler = {
> +static const struct xattr_handler ecryptfs_xattr_handler = {
>  	.prefix = "",	/* match anything */
>  	.get = ecryptfs_xattr_get,
>  	.set = ecryptfs_xattr_set,
> --
> 2.7.4
>
>
Re: [PATCH v4 1/3] KVM: x86: add support for user wait instructions
On 6/19/2019 2:09 PM, Tao Xu wrote:

UMONITOR, UMWAIT and TPAUSE are a set of user wait instructions. This patch adds support for user wait instructions in KVM. Availability of the user wait instructions is indicated by the presence of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5]. User wait instructions may be executed at any privilege level, and use the IA32_UMWAIT_CONTROL MSR to set the maximum time.

The behavior of user wait instructions in VMX non-root operation is determined first by the setting of the "enable user wait and pause" secondary processor-based VM-execution control bit 26. If the VM-execution control is 0, UMONITOR/UMWAIT/TPAUSE cause an invalid-opcode exception (#UD). If the VM-execution control is 1, treatment is based on the setting of the "RDTSC exiting" VM-execution control. Because KVM never enables RDTSC exiting, if the instruction causes a delay, the amount of time delayed is called here the physical delay. The physical delay is first computed by determining the virtual delay. If IA32_UMWAIT_CONTROL[31:2] is zero, the virtual delay is the value in EDX:EAX minus the value that RDTSC would return; if IA32_UMWAIT_CONTROL[31:2] is not zero, the virtual delay is the minimum of that difference and AND(IA32_UMWAIT_CONTROL, FFFFFFFCH).

Because UMWAIT and TPAUSE can put a (physical) CPU into a power-saving state, by default we don't expose them to KVM and enable them only when the guest CPUID has them.

Detailed information about user wait instructions can be found in the latest Intel 64 and IA-32 Architectures Software Developer's Manual.

Co-developed-by: Jingqi Liu
Signed-off-by: Jingqi Liu
Signed-off-by: Tao Xu
---
no changes in v4.
--- arch/x86/include/asm/vmx.h | 1 + arch/x86/kvm/cpuid.c| 2 +- arch/x86/kvm/vmx/capabilities.h | 6 ++ arch/x86/kvm/vmx/vmx.c | 4 4 files changed, 12 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index a39136b0d509..8f00882664d3 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -69,6 +69,7 @@ #define SECONDARY_EXEC_PT_USE_GPA 0x0100 #define SECONDARY_EXEC_MODE_BASED_EPT_EXEC0x0040 #define SECONDARY_EXEC_TSC_SCALING 0x0200 +#define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE 0x0400 #define PIN_BASED_EXT_INTR_MASK 0x0001 #define PIN_BASED_NMI_EXITING 0x0008 diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index e18a9f9f65b5..48bd851a6ae5 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -405,7 +405,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) | F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) | - F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B); + F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/; /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index d6664ee3d127..fd77e17651b4 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -253,6 +253,12 @@ static inline bool cpu_has_vmx_tsc_scaling(void) SECONDARY_EXEC_TSC_SCALING; } +static inline bool vmx_waitpkg_supported(void) +{ + return vmcs_config.cpu_based_2nd_exec_ctrl & + SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE; Shouldn't it be return vmx->secondary_exec_control & SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE; ? 
+} + static inline bool cpu_has_vmx_apicv(void) { return cpu_has_vmx_apic_register_virt() && diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index b93e36ddee5e..b35bfac30a34 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2250,6 +2250,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, SECONDARY_EXEC_RDRAND_EXITING | SECONDARY_EXEC_ENABLE_PML | SECONDARY_EXEC_TSC_SCALING | + SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE | SECONDARY_EXEC_PT_USE_GPA | SECONDARY_EXEC_PT_CONCEAL_VMX | SECONDARY_EXEC_ENABLE_VMFUNC | @@ -3987,6 +3988,9 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx) } } + if (!guest_cpuid_has(vcpu, X86_FEATURE_WAITPKG)) + exec_control &= ~SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE; + vmx->secondary_exec_control = exec_control; }
Re: [PATCH 4/5] Powerpc/hw-breakpoint: Optimize disable path
On 6/18/19 12:01 PM, Christophe Leroy wrote:
>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>> index f002d286..265fac9fb3a4 100644
>> --- a/arch/powerpc/kernel/process.c
>> +++ b/arch/powerpc/kernel/process.c
>> @@ -793,10 +793,22 @@ static inline int set_dabr(struct arch_hw_breakpoint *brk)
>>  	return __set_dabr(dabr, dabrx);
>>  }
>>
>> +static int disable_dawr(void)
>> +{
>> +	if (ppc_md.set_dawr)
>> +		return ppc_md.set_dawr(0, 0);
>> +
>> +	mtspr(SPRN_DAWRX, 0);
>
> And SPRN_DAWR ?

Setting DAWRX to 0 should be enough to disable the breakpoint.
[net v1] net: stmmac: fixed new system time seconds value calculation
From: Roland Hii

When the ADDSUB bit is set, the system time seconds field is calculated
as the complement of the seconds part of the update value.

For example, if 3.000000001 seconds need to be subtracted from the
system time, this field is calculated as
  2^32 - 3 = 4294967296 - 3 = 0x100000000 - 3 = 0xFFFFFFFD

Previously, the 0x100000000 is mistakenly written as 100000000. This is
further simplified from
  sec = (0x100000000ULL - sec);
to
  sec = -sec;

Fixes: ba1ffd74df74 ("stmmac: fix PTP support for GMAC4")
Signed-off-by: Roland Hii
Signed-off-by: Ong Boon Leong
Signed-off-by: Voon Weifeng

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c
index 2dcdf761d525..020159622559 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c
@@ -112,7 +112,7 @@ static int adjust_systime(void __iomem *ioaddr, u32 sec, u32 nsec,
 	 * programmed with (2^32 – )
 	 */
 	if (gmac4)
-		sec = (100000000ULL - sec);
+		sec = -sec;
 
 	value = readl(ioaddr + PTP_TCR);
 	if (value & PTP_TCR_TSCTRLSSR)
--
1.9.1
[PATCH v4 0/3] KVM: x86: Enable user wait instructions
UMONITOR, UMWAIT and TPAUSE are a set of user wait instructions.

UMONITOR arms address monitoring hardware using an address. A store to an address within the specified address range triggers the monitoring hardware to wake up the processor waiting in UMWAIT.

UMWAIT instructs the processor to enter an implementation-dependent optimized state while monitoring a range of addresses. The optimized state may be either a light-weight power/performance optimized state (C0.1 state) or an improved power/performance optimized state (C0.2 state).

TPAUSE instructs the processor to enter an implementation-dependent optimized state (C0.1 or C0.2) and wake up when the time-stamp counter reaches the specified timeout.

Availability of the user wait instructions is indicated by the presence of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5].

The patches enable the UMONITOR, UMWAIT and TPAUSE features in KVM. Because UMWAIT and TPAUSE can put a (physical) CPU into a power-saving state, by default we don't expose them to KVM and enable them only when the guest CPUID has them. If the instruction causes a delay, the amount of time delayed is called here the physical delay. The physical delay is first computed by determining the virtual delay (the time to delay relative to the VM's timestamp counter).
The reference document is linked below:
Intel 64 and IA-32 Architectures Software Developer's Manual,
https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf

This patch series has a dependency on https://lkml.org/lkml/2019/6/7/1206

Changelog:
v4:
  Set msr of IA32_UMWAIT_CONTROL can be 0 and add the check of
    reserved bit 1 (Radim and Xiaoyao)
  Use umwait_control_cached directly and add IA32_UMWAIT_CONTROL in
    msrs_to_save[] to support migration (Xiaoyao)
v3:
  Simplify the patches, expose user wait instructions when the guest
    has CPUID (Paolo)
  Use mwait_control_cached to avoid frequent rdmsr of
    IA32_UMWAIT_CONTROL (Paolo and Xiaoyao)
  Handle vm-exit for UMWAIT and TPAUSE as "never happen" (Paolo)
v2:
  Separated from the series https://lkml.org/lkml/2018/7/10/160
  Provide a capability to enable UMONITOR, UMWAIT and TPAUSE
v1:
  Sent out with MOVDIRI/MOVDIR64B instructions patches

Tao Xu (3):
  KVM: x86: add support for user wait instructions
  KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL
  KVM: vmx: handle vm-exit for UMWAIT and TPAUSE

 arch/x86/include/asm/vmx.h      |  1 +
 arch/x86/include/uapi/asm/vmx.h |  6 +++-
 arch/x86/kvm/cpuid.c            |  2 +-
 arch/x86/kvm/vmx/capabilities.h |  6 ++++
 arch/x86/kvm/vmx/vmx.c          | 53 ++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h          |  3 ++
 arch/x86/kvm/x86.c              |  1 +
 arch/x86/power/umwait.c         |  3 +-
 8 files changed, 72 insertions(+), 3 deletions(-)

--
2.20.1
[PATCH v4 1/3] KVM: x86: add support for user wait instructions
UMONITOR, UMWAIT and TPAUSE are a set of user wait instructions. This patch adds support for user wait instructions in KVM. Availability of the user wait instructions is indicated by the presence of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5]. User wait instructions may be executed at any privilege level, and use the IA32_UMWAIT_CONTROL MSR to set the maximum time.

The behavior of user wait instructions in VMX non-root operation is determined first by the setting of the "enable user wait and pause" secondary processor-based VM-execution control bit 26. If the VM-execution control is 0, UMONITOR/UMWAIT/TPAUSE cause an invalid-opcode exception (#UD). If the VM-execution control is 1, treatment is based on the setting of the "RDTSC exiting" VM-execution control. Because KVM never enables RDTSC exiting, if the instruction causes a delay, the amount of time delayed is called here the physical delay. The physical delay is first computed by determining the virtual delay. If IA32_UMWAIT_CONTROL[31:2] is zero, the virtual delay is the value in EDX:EAX minus the value that RDTSC would return; if IA32_UMWAIT_CONTROL[31:2] is not zero, the virtual delay is the minimum of that difference and AND(IA32_UMWAIT_CONTROL, FFFFFFFCH).

Because UMWAIT and TPAUSE can put a (physical) CPU into a power-saving state, by default we don't expose them to KVM and enable them only when the guest CPUID has them.

Detailed information about user wait instructions can be found in the latest Intel 64 and IA-32 Architectures Software Developer's Manual.

Co-developed-by: Jingqi Liu
Signed-off-by: Jingqi Liu
Signed-off-by: Tao Xu
---
no changes in v4.
--- arch/x86/include/asm/vmx.h | 1 + arch/x86/kvm/cpuid.c| 2 +- arch/x86/kvm/vmx/capabilities.h | 6 ++ arch/x86/kvm/vmx/vmx.c | 4 4 files changed, 12 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index a39136b0d509..8f00882664d3 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -69,6 +69,7 @@ #define SECONDARY_EXEC_PT_USE_GPA 0x0100 #define SECONDARY_EXEC_MODE_BASED_EPT_EXEC 0x0040 #define SECONDARY_EXEC_TSC_SCALING 0x0200 +#define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE 0x0400 #define PIN_BASED_EXT_INTR_MASK 0x0001 #define PIN_BASED_NMI_EXITING 0x0008 diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index e18a9f9f65b5..48bd851a6ae5 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -405,7 +405,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) | F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) | - F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B); + F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/; /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index d6664ee3d127..fd77e17651b4 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -253,6 +253,12 @@ static inline bool cpu_has_vmx_tsc_scaling(void) SECONDARY_EXEC_TSC_SCALING; } +static inline bool vmx_waitpkg_supported(void) +{ + return vmcs_config.cpu_based_2nd_exec_ctrl & + SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE; +} + static inline bool cpu_has_vmx_apicv(void) { return cpu_has_vmx_apic_register_virt() && diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index b93e36ddee5e..b35bfac30a34 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2250,6 +2250,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, 
SECONDARY_EXEC_RDRAND_EXITING | SECONDARY_EXEC_ENABLE_PML | SECONDARY_EXEC_TSC_SCALING | + SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE | SECONDARY_EXEC_PT_USE_GPA | SECONDARY_EXEC_PT_CONCEAL_VMX | SECONDARY_EXEC_ENABLE_VMFUNC | @@ -3987,6 +3988,9 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx) } } + if (!guest_cpuid_has(vcpu, X86_FEATURE_WAITPKG)) + exec_control &= ~SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE; + vmx->secondary_exec_control = exec_control; } -- 2.20.1
[PATCH v4 3/3] KVM: vmx: handle vm-exit for UMWAIT and TPAUSE
As the latest Intel 64 and IA-32 Architectures Software Developer's Manual, UMWAIT and TPAUSE instructions cause a VM exit if the RDTSC exiting and enable user wait and pause VM-execution controls are both 1. This patch is to handle the vm-exit for UMWAIT and TPAUSE as this should never happen. Co-developed-by: Jingqi Liu Signed-off-by: Jingqi Liu Signed-off-by: Tao Xu --- no changes in v4 --- arch/x86/include/uapi/asm/vmx.h | 6 +- arch/x86/kvm/vmx/vmx.c | 16 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h index d213ec5c3766..d88d7a68849b 100644 --- a/arch/x86/include/uapi/asm/vmx.h +++ b/arch/x86/include/uapi/asm/vmx.h @@ -85,6 +85,8 @@ #define EXIT_REASON_PML_FULL62 #define EXIT_REASON_XSAVES 63 #define EXIT_REASON_XRSTORS 64 +#define EXIT_REASON_UMWAIT 67 +#define EXIT_REASON_TPAUSE 68 #define VMX_EXIT_REASONS \ { EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \ @@ -142,7 +144,9 @@ { EXIT_REASON_RDSEED,"RDSEED" }, \ { EXIT_REASON_PML_FULL, "PML_FULL" }, \ { EXIT_REASON_XSAVES,"XSAVES" }, \ - { EXIT_REASON_XRSTORS, "XRSTORS" } + { EXIT_REASON_XRSTORS, "XRSTORS" }, \ + { EXIT_REASON_UMWAIT,"UMWAIT" }, \ + { EXIT_REASON_TPAUSE,"TPAUSE" } #define VMX_ABORT_SAVE_GUEST_MSR_FAIL1 #define VMX_ABORT_LOAD_HOST_PDPTE_FAIL 2 diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index eb13ff9759d3..46125553b180 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -5336,6 +5336,20 @@ static int handle_monitor(struct kvm_vcpu *vcpu) return handle_nop(vcpu); } +static int handle_umwait(struct kvm_vcpu *vcpu) +{ + kvm_skip_emulated_instruction(vcpu); + WARN(1, "this should never happen\n"); + return 1; +} + +static int handle_tpause(struct kvm_vcpu *vcpu) +{ + kvm_skip_emulated_instruction(vcpu); + WARN(1, "this should never happen\n"); + return 1; +} + static int handle_invpcid(struct kvm_vcpu *vcpu) { u32 vmx_instruction_info; @@ -5546,6 +5560,8 @@ static int 
(*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = { [EXIT_REASON_VMFUNC] = handle_vmx_instruction, [EXIT_REASON_PREEMPTION_TIMER]= handle_preemption_timer, [EXIT_REASON_ENCLS] = handle_encls, + [EXIT_REASON_UMWAIT] = handle_umwait, + [EXIT_REASON_TPAUSE] = handle_tpause, }; static const int kvm_vmx_max_exit_handlers = -- 2.20.1
[PATCH v4 2/3] KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL
UMWAIT and TPAUSE instructions use IA32_UMWAIT_CONTROL at MSR index E1H to determines the maximum time in TSC-quanta that the processor can reside in either C0.1 or C0.2. This patch emulates MSR IA32_UMWAIT_CONTROL in guest and differentiate IA32_UMWAIT_CONTROL between host and guest. The variable mwait_control_cached in arch/x86/power/umwait.c caches the MSR value, so this patch uses it to avoid frequently rdmsr of IA32_UMWAIT_CONTROL. Co-developed-by: Jingqi Liu Signed-off-by: Jingqi Liu Signed-off-by: Tao Xu --- Changes in v4: Set msr of IA32_UMWAIT_CONTROL can be 0 and add the check of reserved bit 1 (Radim and Xiaoyao) Use umwait_control_cached directly and add the IA32_UMWAIT_CONTROL in msrs_to_save[] to support migration (Xiaoyao) --- arch/x86/kvm/vmx/vmx.c | 33 + arch/x86/kvm/vmx/vmx.h | 3 +++ arch/x86/kvm/x86.c | 1 + arch/x86/power/umwait.c | 3 ++- 4 files changed, 39 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index b35bfac30a34..eb13ff9759d3 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -1679,6 +1679,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) #endif case MSR_EFER: return kvm_get_msr_common(vcpu, msr_info); + case MSR_IA32_UMWAIT_CONTROL: + if (!vmx_waitpkg_supported()) + return 1; + + msr_info->data = vmx->msr_ia32_umwait_control; + break; case MSR_IA32_SPEC_CTRL: if (!msr_info->host_initiated && !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL)) @@ -1841,6 +1847,16 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) return 1; vmcs_write64(GUEST_BNDCFGS, data); break; + case MSR_IA32_UMWAIT_CONTROL: + if (!vmx_waitpkg_supported()) + return 1; + + /* The reserved bit IA32_UMWAIT_CONTROL[1] should be zero */ + if (data & BIT_ULL(1)) + return 1; + + vmx->msr_ia32_umwait_control = data; + break; case MSR_IA32_SPEC_CTRL: if (!msr_info->host_initiated && !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL)) @@ -4126,6 +4142,8 @@ static void 
vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) vmx->rmode.vm86_active = 0; vmx->spec_ctrl = 0; + vmx->msr_ia32_umwait_control = 0; + vcpu->arch.microcode_version = 0x1ULL; vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val(); kvm_set_cr8(vcpu, 0); @@ -6339,6 +6357,19 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx) msrs[i].host, false); } +static void atomic_switch_ia32_umwait_control(struct vcpu_vmx *vmx) +{ + if (!vmx_waitpkg_supported()) + return; + + if (vmx->msr_ia32_umwait_control != umwait_control_cached) + add_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL, + vmx->msr_ia32_umwait_control, + umwait_control_cached, false); + else + clear_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL); +} + static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val) { vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val); @@ -6447,6 +6478,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu) atomic_switch_perf_msrs(vmx); + atomic_switch_ia32_umwait_control(vmx); + vmx_update_hv_timer(vcpu); /* diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 61128b48c503..8485bec7c38a 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -14,6 +14,8 @@ extern const u32 vmx_msr_index[]; extern u64 host_efer; +extern u32 umwait_control_cached; + #define MSR_TYPE_R 1 #define MSR_TYPE_W 2 #define MSR_TYPE_RW3 @@ -194,6 +196,7 @@ struct vcpu_vmx { #endif u64 spec_ctrl; + u64 msr_ia32_umwait_control; u32 vm_entry_controls_shadow; u32 vm_exit_controls_shadow; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 83aefd759846..4480de459bf4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1138,6 +1138,7 @@ static u32 msrs_to_save[] = { MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B, MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B, MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B, + MSR_IA32_UMWAIT_CONTROL, }; static unsigned num_msrs_to_save; diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c index 7fa381e3fd4e..2e6ce4cbccb3 100644 --- 
a/arch/x86/power/umwait.c +++ b/arch/x86/power/umwait.c @@ -9,7 +9,8 @@ * MSR value. By default, umwait max time is 10 in TSC-quanta and C0.2 * is enabled */ -sta
RE: [PATCH v2 6/6] net: macb: parameter added to cadence ethernet controller DT binding
Hi Florian,

> Please don't resubmit individual patches as replies to your previous
> ones, re-submitting the entire patch series, see this netdev-FAQ section
> for details:

I will resubmit the entire patch series separately.

> >> +- serdes-rate	External serdes rate. Mandatory for USXGMII mode.
> >> +		5 - 5G
> >> +		10 - 10G
> >
> > There should be a unit specifier in that property, something like:
> > serdes-rate-gbps
> > can't we somehow automatically detect that?

OK, sure. I will add a unit specifier to the property name. No, currently the HW doesn't have a way to auto-detect the external serdes rate.

Regards,
Parshuram Thombare
Re: nouveau: DRM: GPU lockup - switching to software fbcon
On Wed, Jun 19, 2019 at 1:48 AM Sergey Senozhatsky wrote:
>
> On (06/19/19 01:20), Ilia Mirkin wrote:
> > On Wed, Jun 19, 2019 at 1:08 AM Sergey Senozhatsky wrote:
> > >
> > > On (06/14/19 11:50), Sergey Senozhatsky wrote:
> > > > dmesg
> > > >
> > > > nouveau 0000:01:00.0: DRM: GPU lockup - switching to software fbcon
> > > > nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
> > > > nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
> > > > nouveau 0000:01:00.0: fifo: channel 5: killed
> > > > nouveau 0000:01:00.0: fifo: engine 6: scheduled for recovery
> > > > nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
> > > > nouveau 0000:01:00.0: firefox[476]: channel 5 killed!
> > > > nouveau 0000:01:00.0: firefox[476]: failed to idle channel 5 [firefox[476]]
> > > >
> > > > It locks up several times a day. Twice in just one hour today.
> > > > Can we fix this?
> > >
> > > Unusable
> >
> > Are you using a GTX 660 by any chance? You've provided rather minimal
> > system info.
>
> 01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 730] (rev a1)

Quite literally the same GPU I have plugged in...

02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208B [GeForce GT 730] [10de:1287] (rev a1)

Works great here! Only other thing I can think of is that I avoid applications with the letters "G" and "K" in their names, and I'm using the xf86-video-nouveau ddx, whereas you might be using the "modeset" ddx with glamor.

If all else fails, just remove nouveau_dri.so and/or boot with nouveau.noaccel=1 -- should be perfect.

Cheers,

  -ilia
[PATCH v10 12/13] libnvdimm/pfn: Fix fsdax-mode namespace info-block zero-fields
At namespace creation time there is the potential for the "expected to be zero" fields of a 'pfn' info-block to be filled with indeterminate data. While the kernel buffer is zeroed on allocation, it is immediately overwritten by nd_pfn_validate() filling it with the current contents of the on-media info-block location. For fields like 'flags' and the 'padding' it potentially means that future implementations can not rely on those fields being zero.

In preparation to stop using the 'start_pad' and 'end_trunc' fields for section alignment, arrange for fields that are not explicitly initialized to be guaranteed zero. Bump the minor version to indicate it is safe to assume the 'padding' and 'flags' are zero. Otherwise, this corruption is expected to be benign since all other critical fields are explicitly initialized.

Note: the cc: stable is about spreading this new policy to as many kernels as possible, not fixing an issue in those kernels. It is not until the change titled "libnvdimm/pfn: Stop padding pmem namespaces to section alignment" that this improper initialization becomes a problem. So if someone decides to backport "libnvdimm/pfn: Stop padding pmem namespaces to section alignment" (which is not tagged for stable), make sure this pre-requisite is flagged.
Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem") Cc: Signed-off-by: Dan Williams --- drivers/nvdimm/dax_devs.c |2 +- drivers/nvdimm/pfn.h |1 + drivers/nvdimm/pfn_devs.c | 18 +++--- 3 files changed, 17 insertions(+), 4 deletions(-) diff --git a/drivers/nvdimm/dax_devs.c b/drivers/nvdimm/dax_devs.c index 49fc18ee0565..6d22b0f83b3b 100644 --- a/drivers/nvdimm/dax_devs.c +++ b/drivers/nvdimm/dax_devs.c @@ -118,7 +118,7 @@ int nd_dax_probe(struct device *dev, struct nd_namespace_common *ndns) nvdimm_bus_unlock(&ndns->dev); if (!dax_dev) return -ENOMEM; - pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL); + pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL); nd_pfn->pfn_sb = pfn_sb; rc = nd_pfn_validate(nd_pfn, DAX_SIG); dev_dbg(dev, "dax: %s\n", rc == 0 ? dev_name(dax_dev) : ""); diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h index f58b849e455b..dfb2bcda8f5a 100644 --- a/drivers/nvdimm/pfn.h +++ b/drivers/nvdimm/pfn.h @@ -28,6 +28,7 @@ struct nd_pfn_sb { __le32 end_trunc; /* minor-version-2 record the base alignment of the mapping */ __le32 align; + /* minor-version-3 guarantee the padding and flags are zero */ u8 padding[4000]; __le64 checksum; }; diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index 0f81fc56bbfd..4977424693b0 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -412,6 +412,15 @@ static int nd_pfn_clear_memmap_errors(struct nd_pfn *nd_pfn) return 0; } +/** + * nd_pfn_validate - read and validate info-block + * @nd_pfn: fsdax namespace runtime state / properties + * @sig: 'devdax' or 'fsdax' signature + * + * Upon return the info-block buffer contents (->pfn_sb) are + * indeterminate when validation fails, and a coherent info-block + * otherwise. 
+ */ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) { u64 checksum, offset; @@ -557,7 +566,7 @@ int nd_pfn_probe(struct device *dev, struct nd_namespace_common *ndns) nvdimm_bus_unlock(&ndns->dev); if (!pfn_dev) return -ENOMEM; - pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL); + pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL); nd_pfn = to_nd_pfn(pfn_dev); nd_pfn->pfn_sb = pfn_sb; rc = nd_pfn_validate(nd_pfn, PFN_SIG); @@ -694,7 +703,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) u64 checksum; int rc; - pfn_sb = devm_kzalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL); + pfn_sb = devm_kmalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL); if (!pfn_sb) return -ENOMEM; @@ -703,11 +712,14 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) sig = DAX_SIG; else sig = PFN_SIG; + rc = nd_pfn_validate(nd_pfn, sig); if (rc != -ENODEV) return rc; /* no info block, do init */; + memset(pfn_sb, 0, sizeof(*pfn_sb)); + nd_region = to_nd_region(nd_pfn->dev.parent); if (nd_region->ro) { dev_info(&nd_pfn->dev, @@ -760,7 +772,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) memcpy(pfn_sb->uuid, nd_pfn->uuid, 16); memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16); pfn_sb->version_major = cpu_to_le16(1); - pfn_sb->version_minor = cpu_to_le16(2); + pfn_sb->version_minor = cpu_to_le16(3); pfn_sb->start_pad = cpu_to_le32(start_pad); pfn_sb->end_trunc = cpu_to_le32(end_trunc); pfn_sb->align = cpu_to_le32(nd_pfn->align);
[PATCH v10 09/13] mm/sparsemem: Support sub-section hotplug
The libnvdimm sub-system has suffered a series of hacks and broken workarounds for the memory-hotplug implementation's awkward section-aligned (128MB) granularity. For example, the following backtrace is emitted when attempting arch_add_memory() with physical address ranges that intersect 'System RAM' (RAM) with 'Persistent Memory' (PMEM) within a given section:

 # cat /proc/iomem | grep -A1 -B1 Persistent\ Memory
 1-1 : System RAM
 2-303ff : Persistent Memory (legacy)
 30400-43fff : System RAM
 44000-23 : Persistent Memory
 24-43bfff : Persistent Memory
   24-43bfff : namespace2.0

 WARNING: CPU: 38 PID: 928 at arch/x86/mm/init_64.c:850 add_pages+0x5c/0x60
 [..]
 RIP: 0010:add_pages+0x5c/0x60
 [..]
 Call Trace:
  devm_memremap_pages+0x460/0x6e0
  pmem_attach_disk+0x29e/0x680 [nd_pmem]
  ? nd_dax_probe+0xfc/0x120 [libnvdimm]
  nvdimm_bus_probe+0x66/0x160 [libnvdimm]

It was discovered that the problem goes beyond RAM vs PMEM collisions as some platforms produce PMEM vs PMEM collisions within a given section. The libnvdimm workaround for that case revealed that the libnvdimm section-alignment-padding implementation has been broken for a long while. A fix for that long-standing breakage introduces as many problems as it solves as it would require a backward-incompatible change to the namespace metadata interpretation. Instead of that dubious route [1], address the root problem in the memory-hotplug implementation.

Note that EEXIST is no longer treated as success as that is how sparse_add_section() reports subsection collisions; it was also obviated by recent changes to perform the request_region() for 'System RAM' before arch_add_memory() in the add_memory() sequence.
[1]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.st...@dwillia2-desk3.amr.corp.intel.com Cc: Michal Hocko Cc: Vlastimil Babka Cc: Logan Gunthorpe Cc: Oscar Salvador Cc: Pavel Tatashin Signed-off-by: Dan Williams --- include/linux/memory_hotplug.h |2 mm/memory_hotplug.c| 27 + mm/page_alloc.c|2 mm/sparse.c| 205 ++-- 4 files changed, 140 insertions(+), 96 deletions(-) diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 3ab0282b4fe5..0b8a5e5ef2da 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -350,7 +350,7 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, extern bool is_memblock_offlined(struct memory_block *mem); extern int sparse_add_section(int nid, unsigned long pfn, unsigned long nr_pages, struct vmem_altmap *altmap); -extern void sparse_remove_one_section(struct mem_section *ms, +extern void sparse_remove_section(struct mem_section *ms, unsigned long pfn, unsigned long nr_pages, unsigned long map_offset, struct vmem_altmap *altmap); extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map, diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 399bf78bccc5..4e8e65954f31 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -252,18 +252,6 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat) } #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */ -static int __meminit __add_section(int nid, unsigned long pfn, - unsigned long nr_pages, struct vmem_altmap *altmap) -{ - int ret; - - if (pfn_valid(pfn)) - return -EEXIST; - - ret = sparse_add_section(nid, pfn, nr_pages, altmap); - return ret < 0 ? 
ret : 0; -} - static int check_pfn_span(unsigned long pfn, unsigned long nr_pages, const char *reason) { @@ -327,18 +315,11 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages, pfns = min(nr_pages, PAGES_PER_SECTION - (pfn & ~PAGE_SECTION_MASK)); - err = __add_section(nid, pfn, pfns, altmap); + err = sparse_add_section(nid, pfn, pfns, altmap); + if (err) + break; pfn += pfns; nr_pages -= pfns; - - /* -* EEXIST is finally dealt with by ioresource collision -* check. see add_memory() => register_memory_resource() -* Warning will be printed if there is collision. -*/ - if (err && (err != -EEXIST)) - break; - err = 0; cond_resched(); } vmemmap_populate_print_last(); @@ -541,7 +522,7 @@ static void __remove_section(struct zone *zone, unsigned long pfn, return; __remove_zone(zone, pfn, nr_pages); - sparse_remove_one_section(ms, pfn, nr_pages, map_offset, altmap); + sparse_remove_section(m
[PATCH v10 06/13] mm/hotplug: Kill is_dev_zone() usage in __remove_pages()
The zone type check was a leftover from the cleanup that plumbed altmap through the memory hotplug path, i.e. commit da024512a1fa "mm: pass the vmem_altmap to arch_remove_memory and __remove_pages".

Cc: Michal Hocko
Cc: Logan Gunthorpe
Cc: Pavel Tatashin
Reviewed-by: David Hildenbrand
Reviewed-by: Oscar Salvador
Signed-off-by: Dan Williams
---
 mm/memory_hotplug.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 647859a1d119..4b882c57781a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -535,11 +535,8 @@ void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 	unsigned long map_offset = 0;
 	int sections_to_remove;
 
-	/* In the ZONE_DEVICE case device driver owns the memory region */
-	if (is_dev_zone(zone)) {
-		if (altmap)
-			map_offset = vmem_altmap_offset(altmap);
-	}
+	if (altmap)
+		map_offset = vmem_altmap_offset(altmap);
 
 	clear_zone_contiguous(zone);
[PATCH v10 08/13] mm/sparsemem: Prepare for sub-section ranges
Prepare the memory hot-{add,remove} paths for handling sub-section ranges by plumbing the starting page frame and number of pages being handled through arch_{add,remove}_memory() to sparse_{add,remove}_one_section(). This is simply plumbing, small cleanups, and some identifier renames. No intended functional changes. Cc: Michal Hocko Cc: Vlastimil Babka Cc: Logan Gunthorpe Cc: Oscar Salvador Reviewed-by: Pavel Tatashin Signed-off-by: Dan Williams --- include/linux/memory_hotplug.h |5 +- mm/memory_hotplug.c| 114 +--- mm/sparse.c| 16 ++ 3 files changed, 81 insertions(+), 54 deletions(-) diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 79e0add6a597..3ab0282b4fe5 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -348,9 +348,10 @@ extern int add_memory_resource(int nid, struct resource *resource); extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages, struct vmem_altmap *altmap); extern bool is_memblock_offlined(struct memory_block *mem); -extern int sparse_add_one_section(int nid, unsigned long start_pfn, - struct vmem_altmap *altmap); +extern int sparse_add_section(int nid, unsigned long pfn, + unsigned long nr_pages, struct vmem_altmap *altmap); extern void sparse_remove_one_section(struct mem_section *ms, + unsigned long pfn, unsigned long nr_pages, unsigned long map_offset, struct vmem_altmap *altmap); extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 4b882c57781a..399bf78bccc5 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -252,51 +252,84 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat) } #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */ -static int __meminit __add_section(int nid, unsigned long phys_start_pfn, - struct vmem_altmap *altmap) +static int __meminit __add_section(int nid, unsigned long pfn, + unsigned 
long nr_pages, struct vmem_altmap *altmap) { int ret; - if (pfn_valid(phys_start_pfn)) + if (pfn_valid(pfn)) return -EEXIST; - ret = sparse_add_one_section(nid, phys_start_pfn, altmap); + ret = sparse_add_section(nid, pfn, nr_pages, altmap); return ret < 0 ? ret : 0; } +static int check_pfn_span(unsigned long pfn, unsigned long nr_pages, + const char *reason) +{ + /* +* Disallow all operations smaller than a sub-section and only +* allow operations smaller than a section for +* SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range() +* enforces a larger memory_block_size_bytes() granularity for +* memory that will be marked online, so this check should only +* fire for direct arch_{add,remove}_memory() users outside of +* add_memory_resource(). +*/ + unsigned long min_align; + + if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP)) + min_align = PAGES_PER_SUBSECTION; + else + min_align = PAGES_PER_SECTION; + if (!IS_ALIGNED(pfn, min_align) + || !IS_ALIGNED(nr_pages, min_align)) { + WARN(1, "Misaligned __%s_pages start: %#lx end: #%lx\n", + reason, pfn, pfn + nr_pages - 1); + return -EINVAL; + } + return 0; +} + /* * Reasonably generic function for adding memory. It is * expected that archs that support memory hotplug will * call this function after deciding the zone to which to * add the new pages. 
*/ -int __ref __add_pages(int nid, unsigned long phys_start_pfn, - unsigned long nr_pages, struct mhp_restrictions *restrictions) +int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages, + struct mhp_restrictions *restrictions) { unsigned long i; - int err = 0; - int start_sec, end_sec; + int start_sec, end_sec, err; struct vmem_altmap *altmap = restrictions->altmap; - /* during initialize mem_map, align hot-added range to section */ - start_sec = pfn_to_section_nr(phys_start_pfn); - end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1); - if (altmap) { /* * Validate altmap is within bounds of the total request */ - if (altmap->base_pfn != phys_start_pfn + if (altmap->base_pfn != pfn || vmem_altmap_offset(altmap) > nr_pages) { pr_warn_once("memory add fail, invalid altmap\n"); - err = -EINVAL; - goto o
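The check_pfn_span() helper introduced in this patch encodes the new alignment rule in one place. A small Python model of that logic (page counts assume x86_64 with 4 KiB pages: 32768 pages per 128 MiB section, 512 pages per 2 MiB sub-section; -22 stands in for -EINVAL):

```python
# Model of check_pfn_span(): hotplug requests must be sub-section
# aligned with SPARSEMEM_VMEMMAP, or full-section aligned otherwise.
PAGES_PER_SECTION = 1 << 15       # 128 MiB / 4 KiB
PAGES_PER_SUBSECTION = 1 << 9     # 2 MiB / 4 KiB

def check_pfn_span(pfn, nr_pages, vmemmap=True):
    """Return 0 if the span is hotplug-capable, -EINVAL (-22) otherwise."""
    min_align = PAGES_PER_SUBSECTION if vmemmap else PAGES_PER_SECTION
    if pfn % min_align or nr_pages % min_align:
        return -22
    return 0

assert check_pfn_span(0, PAGES_PER_SECTION) == 0
assert check_pfn_span(PAGES_PER_SUBSECTION, PAGES_PER_SUBSECTION) == 0
assert check_pfn_span(1, PAGES_PER_SUBSECTION) == -22   # misaligned start
assert check_pfn_span(0, 100) == -22                    # misaligned length
# Classic SPARSEMEM still requires full-section alignment:
assert check_pfn_span(PAGES_PER_SUBSECTION, PAGES_PER_SECTION,
                      vmemmap=False) == -22
```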
[PATCH v10 10/13] mm: Document ZONE_DEVICE memory-model implications
Explain the general mechanisms of 'ZONE_DEVICE' pages and list the users of 'devm_memremap_pages()'. Cc: Jonathan Corbet Reported-by: Mike Rapoport Signed-off-by: Dan Williams --- Documentation/vm/memory-model.rst | 39 + 1 file changed, 39 insertions(+) diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst index 382f72ace1fc..e0af47e02e78 100644 --- a/Documentation/vm/memory-model.rst +++ b/Documentation/vm/memory-model.rst @@ -181,3 +181,42 @@ that is eventually passed to vmemmap_populate() through a long chain of function calls. The vmemmap_populate() implementation may use the `vmem_altmap` along with :c:func:`altmap_alloc_block_buf` helper to allocate memory map on the persistent memory device. + +ZONE_DEVICE +=== +The `ZONE_DEVICE` facility builds upon `SPARSEMEM_VMEMMAP` to offer +`struct page` `mem_map` services for device driver identified physical +address ranges. The "device" aspect of `ZONE_DEVICE` relates to the fact +that the page objects for these address ranges are never marked online, +and that a reference must be taken against the device, not just the page +to keep the memory pinned for active use. `ZONE_DEVICE`, via +:c:func:`devm_memremap_pages`, performs just enough memory hotplug to +turn on :c:func:`pfn_to_page`, :c:func:`page_to_pfn`, and +:c:func:`get_user_pages` service for the given range of pfns. Since the +page reference count never drops below 1 the page is never tracked as +free memory and the page's `struct list_head lru` space is repurposed +for back referencing to the host device / driver that mapped the memory. + +While `SPARSEMEM` presents memory as a collection of sections, +optionally collected into memory blocks, `ZONE_DEVICE` users have a need +for smaller granularity of populating the `mem_map`. Given that +`ZONE_DEVICE` memory is never marked online it is subsequently never +subject to its memory ranges being exposed through the sysfs memory +hotplug api on memory block boundaries. 
The implementation relies on +this lack of user-api constraint to allow sub-section sized memory +ranges to be specified to :c:func:`arch_add_memory`, the top-half of +memory hotplug. Sub-section support allows for `PMD_SIZE` as the minimum +alignment granularity for :c:func:`devm_memremap_pages`. + +The users of `ZONE_DEVICE` are: +* pmem: Map platform persistent memory to be used as a direct-I/O target + via DAX mappings. + +* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->page_free()` + event callbacks to allow a device-driver to coordinate memory management + events related to device-memory, typically GPU memory. See + Documentation/vm/hmm.rst. + +* p2pdma: Create `struct page` objects to allow peer devices in a + PCI/-E topology to coordinate direct-DMA operations between themselves, + i.e. bypass host memory.
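The documentation above states that sub-section support reduces the alignment granularity for devm_memremap_pages() from a section to PMD_SIZE. A quick sketch of the padding consequence (sizes assume x86_64: 128 MiB sections, 2 MiB PMD; the base address is a made-up example):

```python
# Padding a namespace needs under section vs PMD alignment.
SECTION_SIZE = 128 << 20
PMD_SIZE = 2 << 20

def pad_to(base, align):
    """Bytes of padding needed to round base down to 'align'."""
    return base - (base // align) * align

base = 0x244000000   # PMD-aligned, but 64 MiB past a section boundary
assert pad_to(base, PMD_SIZE) == 0             # no padding with sub-sections
assert pad_to(base, SECTION_SIZE) == 64 << 20  # 64 MiB wasted otherwise
```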
[PATCH v10 13/13] libnvdimm/pfn: Stop padding pmem namespaces to section alignment
Now that the mm core supports section-unaligned hotplug of ZONE_DEVICE memory, we no longer need to add padding at pfn/dax device creation time. The kernel will still honor padding established by older kernels. Reported-by: Jeff Moyer Signed-off-by: Dan Williams --- drivers/nvdimm/pfn.h | 14 drivers/nvdimm/pfn_devs.c | 77 - include/linux/mmzone.h|3 ++ 3 files changed, 16 insertions(+), 78 deletions(-) diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h index dfb2bcda8f5a..7381673b7b70 100644 --- a/drivers/nvdimm/pfn.h +++ b/drivers/nvdimm/pfn.h @@ -33,18 +33,4 @@ struct nd_pfn_sb { __le64 checksum; }; -#ifdef CONFIG_SPARSEMEM -#define PFN_SECTION_ALIGN_DOWN(x) SECTION_ALIGN_DOWN(x) -#define PFN_SECTION_ALIGN_UP(x) SECTION_ALIGN_UP(x) -#else -/* - * In this case ZONE_DEVICE=n and we will disable 'pfn' device support, - * but we still want pmem to compile. - */ -#define PFN_SECTION_ALIGN_DOWN(x) (x) -#define PFN_SECTION_ALIGN_UP(x) (x) -#endif - -#define PHYS_SECTION_ALIGN_DOWN(x) PFN_PHYS(PFN_SECTION_ALIGN_DOWN(PHYS_PFN(x))) -#define PHYS_SECTION_ALIGN_UP(x) PFN_PHYS(PFN_SECTION_ALIGN_UP(PHYS_PFN(x))) #endif /* __NVDIMM_PFN_H */ diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index 4977424693b0..2537aa338bd0 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -587,14 +587,14 @@ static u32 info_block_reserve(void) } /* - * We hotplug memory at section granularity, pad the reserved area from - * the previous section base to the namespace base address. + * We hotplug memory at sub-section granularity, pad the reserved area + * from the previous section base to the namespace base address. 
*/ static unsigned long init_altmap_base(resource_size_t base) { unsigned long base_pfn = PHYS_PFN(base); - return PFN_SECTION_ALIGN_DOWN(base_pfn); + return SUBSECTION_ALIGN_DOWN(base_pfn); } static unsigned long init_altmap_reserve(resource_size_t base) @@ -602,7 +602,7 @@ static unsigned long init_altmap_reserve(resource_size_t base) unsigned long reserve = info_block_reserve() >> PAGE_SHIFT; unsigned long base_pfn = PHYS_PFN(base); - reserve += base_pfn - PFN_SECTION_ALIGN_DOWN(base_pfn); + reserve += base_pfn - SUBSECTION_ALIGN_DOWN(base_pfn); return reserve; } @@ -633,8 +633,7 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap) nd_pfn->npfns = le64_to_cpu(pfn_sb->npfns); pgmap->altmap_valid = false; } else if (nd_pfn->mode == PFN_MODE_PMEM) { - nd_pfn->npfns = PFN_SECTION_ALIGN_UP((resource_size(res) - - offset) / PAGE_SIZE); + nd_pfn->npfns = PHYS_PFN((resource_size(res) - offset)); if (le64_to_cpu(nd_pfn->pfn_sb->npfns) > nd_pfn->npfns) dev_info(&nd_pfn->dev, "number of pfns truncated from %lld to %ld\n", @@ -650,54 +649,14 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap) return 0; } -static u64 phys_pmem_align_down(struct nd_pfn *nd_pfn, u64 phys) -{ - return min_t(u64, PHYS_SECTION_ALIGN_DOWN(phys), - ALIGN_DOWN(phys, nd_pfn->align)); -} - -/* - * Check if pmem collides with 'System RAM', or other regions when - * section aligned. Trim it accordingly. 
- */ -static void trim_pfn_device(struct nd_pfn *nd_pfn, u32 *start_pad, u32 *end_trunc) -{ - struct nd_namespace_common *ndns = nd_pfn->ndns; - struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev); - struct nd_region *nd_region = to_nd_region(nd_pfn->dev.parent); - const resource_size_t start = nsio->res.start; - const resource_size_t end = start + resource_size(&nsio->res); - resource_size_t adjust, size; - - *start_pad = 0; - *end_trunc = 0; - - adjust = start - PHYS_SECTION_ALIGN_DOWN(start); - size = resource_size(&nsio->res) + adjust; - if (region_intersects(start - adjust, size, IORESOURCE_SYSTEM_RAM, - IORES_DESC_NONE) == REGION_MIXED - || nd_region_conflict(nd_region, start - adjust, size)) - *start_pad = PHYS_SECTION_ALIGN_UP(start) - start; - - /* Now check that end of the range does not collide. */ - adjust = PHYS_SECTION_ALIGN_UP(end) - end; - size = resource_size(&nsio->res) + adjust; - if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM, - IORES_DESC_NONE) == REGION_MIXED - || !IS_ALIGNED(end, nd_pfn->align) - || nd_region_conflict(nd_region, start, size)) - *end_trunc = end - phys_pmem_align_down(nd_pfn, end); -} - static int nd_pfn_init(struct nd_pfn *nd_pfn) { struct nd_namespace_common *ndn
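The SUBSECTION_ALIGN_DOWN() conversion in init_altmap_base()/init_altmap_reserve() above shrinks the altmap reserve from a section-sized gap to a sub-section-sized one. A Python model of that arithmetic (PAGE_SHIFT and the 2 MiB sub-section match x86_64; the 8 KiB info-block reserve is an assumption for illustration, not taken from this patch):

```python
# Altmap reserve = info block plus the gap from the previous
# sub-section boundary down to the namespace base.
PAGE_SHIFT = 12
SUBSECTION_PFNS = (2 << 20) >> PAGE_SHIFT   # 512 pfns per 2 MiB

def subsection_align_down(pfn):
    return pfn & ~(SUBSECTION_PFNS - 1)

def init_altmap_reserve(base, info_block_reserve=8192):
    reserve = info_block_reserve >> PAGE_SHIFT
    base_pfn = base >> PAGE_SHIFT
    return reserve + (base_pfn - subsection_align_down(base_pfn))

# A base 1 MiB past a sub-section boundary reserves the 256
# intervening pfns plus 2 pfns of info block.
assert init_altmap_reserve((2 << 20) + (1 << 20)) == 2 + 256
# A sub-section-aligned base reserves only the info block.
assert init_altmap_reserve(2 << 20) == 2
```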
[PATCH v10 11/13] mm/devm_memremap_pages: Enable sub-section remap
Teach devm_memremap_pages() about the new sub-section capabilities of arch_{add,remove}_memory(). Effectively, just replace all usage of align_start, align_end, and align_size with res->start, res->end, and resource_size(res). The existing sanity check will still make sure that the two separate remap attempts do not collide within a sub-section (2MB on x86). Cc: Michal Hocko Cc: Toshi Kani Cc: Jérôme Glisse Cc: Logan Gunthorpe Cc: Oscar Salvador Cc: Pavel Tatashin Signed-off-by: Dan Williams --- kernel/memremap.c | 61 + 1 file changed, 24 insertions(+), 37 deletions(-) diff --git a/kernel/memremap.c b/kernel/memremap.c index 57980ed4e571..a0e5f6b91b04 100644 --- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -58,7 +58,7 @@ static unsigned long pfn_first(struct dev_pagemap *pgmap) struct vmem_altmap *altmap = &pgmap->altmap; unsigned long pfn; - pfn = res->start >> PAGE_SHIFT; + pfn = PHYS_PFN(res->start); if (pgmap->altmap_valid) pfn += vmem_altmap_offset(altmap); return pfn; @@ -86,7 +86,6 @@ static void devm_memremap_pages_release(void *data) struct dev_pagemap *pgmap = data; struct device *dev = pgmap->dev; struct resource *res = &pgmap->res; - resource_size_t align_start, align_size; unsigned long pfn; int nid; @@ -96,25 +95,21 @@ static void devm_memremap_pages_release(void *data) pgmap->cleanup(pgmap->ref); /* pages are dead and unused, undo the arch mapping */ - align_start = res->start & ~(PA_SECTION_SIZE - 1); - align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE) - - align_start; - - nid = page_to_nid(pfn_to_page(align_start >> PAGE_SHIFT)); + nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start))); mem_hotplug_begin(); if (pgmap->type == MEMORY_DEVICE_PRIVATE) { - pfn = align_start >> PAGE_SHIFT; + pfn = PHYS_PFN(res->start); __remove_pages(page_zone(pfn_to_page(pfn)), pfn, - align_size >> PAGE_SHIFT, NULL); + PHYS_PFN(resource_size(res)), NULL); } else { - arch_remove_memory(nid, align_start, align_size, + arch_remove_memory(nid, 
res->start, resource_size(res), pgmap->altmap_valid ? &pgmap->altmap : NULL); - kasan_remove_zero_shadow(__va(align_start), align_size); + kasan_remove_zero_shadow(__va(res->start), resource_size(res)); } mem_hotplug_done(); - untrack_pfn(NULL, PHYS_PFN(align_start), align_size); + untrack_pfn(NULL, PHYS_PFN(res->start), resource_size(res)); pgmap_array_delete(res); dev_WARN_ONCE(dev, pgmap->altmap.alloc, "%s: failed to free all reserved pages\n", __func__); @@ -141,16 +136,13 @@ static void devm_memremap_pages_release(void *data) */ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) { - resource_size_t align_start, align_size, align_end; - struct vmem_altmap *altmap = pgmap->altmap_valid ? - &pgmap->altmap : NULL; struct resource *res = &pgmap->res; struct dev_pagemap *conflict_pgmap; struct mhp_restrictions restrictions = { /* * We do not want any optional features only our own memmap */ - .altmap = altmap, + .altmap = pgmap->altmap_valid ? &pgmap->altmap : NULL, }; pgprot_t pgprot = PAGE_KERNEL; int error, nid, is_ram; @@ -160,12 +152,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) return ERR_PTR(-EINVAL); } - align_start = res->start & ~(PA_SECTION_SIZE - 1); - align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE) - - align_start; - align_end = align_start + align_size - 1; - - conflict_pgmap = get_dev_pagemap(PHYS_PFN(align_start), NULL); + conflict_pgmap = get_dev_pagemap(PHYS_PFN(res->start), NULL); if (conflict_pgmap) { dev_WARN(dev, "Conflicting mapping in same section\n"); put_dev_pagemap(conflict_pgmap); @@ -173,7 +160,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) goto err_array; } - conflict_pgmap = get_dev_pagemap(PHYS_PFN(align_end), NULL); + conflict_pgmap = get_dev_pagemap(PHYS_PFN(res->end), NULL); if (conflict_pgmap) { dev_WARN(dev, "Conflicting mapping in same section\n"); put_dev_pagemap(conflict_pgmap); @@ -181,7 +168,7 @@ void 
*devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) goto err_array; } - is_ram = region_intersects(align_start,
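The pfn_first() change at the top of this patch keeps the altmap-offset behavior while dropping the open-coded shift. A Python model of the resulting calculation (PAGE_SHIFT assumes 4 KiB pages; addresses are illustrative):

```python
# pfn_first(): resource start as a pfn, plus the altmap offset when an
# altmap reserves the leading pfns for the memmap itself.
PAGE_SHIFT = 12

def phys_pfn(addr):
    return addr >> PAGE_SHIFT

def pfn_first(res_start, altmap_offset=0, altmap_valid=False):
    pfn = phys_pfn(res_start)
    if altmap_valid:
        pfn += altmap_offset
    return pfn

assert pfn_first(0x244000000) == 0x244000
assert pfn_first(0x244000000, altmap_offset=512,
                 altmap_valid=True) == 0x244200
```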
[PATCH v10 07/13] mm: Kill is_dev_zone() helper
Given there are no more usages of is_dev_zone() outside of 'ifdef CONFIG_ZONE_DEVICE' protection, kill off the compilation helper.

Cc: Michal Hocko
Cc: Logan Gunthorpe
Acked-by: David Hildenbrand
Reviewed-by: Oscar Salvador
Reviewed-by: Pavel Tatashin
Reviewed-by: Wei Yang
Signed-off-by: Dan Williams
---
 include/linux/mmzone.h | 12
 mm/page_alloc.c        |  2 +-
 2 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index c4e8843e283c..e976faf57292 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -855,18 +855,6 @@ static inline int local_memory_node(int node_id) { return node_id; };
  */
 #define zone_idx(zone)		((zone) - (zone)->zone_pgdat->node_zones)
 
-#ifdef CONFIG_ZONE_DEVICE
-static inline bool is_dev_zone(const struct zone *zone)
-{
-	return zone_idx(zone) == ZONE_DEVICE;
-}
-#else
-static inline bool is_dev_zone(const struct zone *zone)
-{
-	return false;
-}
-#endif
-
 /*
  * Returns true if a zone has pages managed by the buddy allocator.
  * All the reclaim decisions have to use this function rather than
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8e7215fb6976..12b2afd3a529 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5881,7 +5881,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 	unsigned long start = jiffies;
 	int nid = pgdat->node_id;
 
-	if (WARN_ON_ONCE(!pgmap || !is_dev_zone(zone)))
+	if (WARN_ON_ONCE(!pgmap || zone_idx(zone) != ZONE_DEVICE))
 		return;
 
 	/*
[PATCH v10 04/13] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal
Sub-section hotplug support reduces the unit of operation of hotplug from section-sized-units (PAGES_PER_SECTION) to sub-section-sized units (PAGES_PER_SUBSECTION). Teach shrink_{zone,pgdat}_span() to consider PAGES_PER_SUBSECTION boundaries as the points where pfn_valid(), not valid_section(), can toggle. Cc: Michal Hocko Cc: Vlastimil Babka Cc: Logan Gunthorpe Reviewed-by: Pavel Tatashin Reviewed-by: Oscar Salvador Signed-off-by: Dan Williams --- mm/memory_hotplug.c | 29 - 1 file changed, 8 insertions(+), 21 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 7b963c2d3a0d..647859a1d119 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -318,12 +318,8 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone, unsigned long start_pfn, unsigned long end_pfn) { - struct mem_section *ms; - - for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) { - ms = __pfn_to_section(start_pfn); - - if (unlikely(!valid_section(ms))) + for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUBSECTION) { + if (unlikely(!pfn_valid(start_pfn))) continue; if (unlikely(pfn_to_nid(start_pfn) != nid)) @@ -343,15 +339,12 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone, unsigned long start_pfn, unsigned long end_pfn) { - struct mem_section *ms; unsigned long pfn; /* pfn is the end pfn of a memory section. 
*/ pfn = end_pfn - 1; - for (; pfn >= start_pfn; pfn -= PAGES_PER_SECTION) { - ms = __pfn_to_section(pfn); - - if (unlikely(!valid_section(ms))) + for (; pfn >= start_pfn; pfn -= PAGES_PER_SUBSECTION) { + if (unlikely(!pfn_valid(pfn))) continue; if (unlikely(pfn_to_nid(pfn) != nid)) @@ -373,7 +366,6 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, unsigned long z = zone_end_pfn(zone); /* zone_end_pfn namespace clash */ unsigned long zone_end_pfn = z; unsigned long pfn; - struct mem_section *ms; int nid = zone_to_nid(zone); zone_span_writelock(zone); @@ -410,10 +402,8 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, * it check the zone has only hole or not. */ pfn = zone_start_pfn; - for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) { - ms = __pfn_to_section(pfn); - - if (unlikely(!valid_section(ms))) + for (; pfn < zone_end_pfn; pfn += PAGES_PER_SUBSECTION) { + if (unlikely(!pfn_valid(pfn))) continue; if (page_zone(pfn_to_page(pfn)) != zone) @@ -441,7 +431,6 @@ static void shrink_pgdat_span(struct pglist_data *pgdat, unsigned long p = pgdat_end_pfn(pgdat); /* pgdat_end_pfn namespace clash */ unsigned long pgdat_end_pfn = p; unsigned long pfn; - struct mem_section *ms; int nid = pgdat->node_id; if (pgdat_start_pfn == start_pfn) { @@ -478,10 +467,8 @@ static void shrink_pgdat_span(struct pglist_data *pgdat, * has only hole or not. */ pfn = pgdat_start_pfn; - for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SECTION) { - ms = __pfn_to_section(pfn); - - if (unlikely(!valid_section(ms))) + for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SUBSECTION) { + if (unlikely(!pfn_valid(pfn))) continue; if (pfn_to_nid(pfn) != nid)
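The loop change above is the whole point of this patch: instead of stepping a section at a time and testing valid_section(), the shrink helpers now step one sub-section at a time and test pfn_valid(). A toy Python model (a set of populated sub-section indices stands in for the real pfn_valid(); the real helpers also check nid and zone, which this sketch omits):

```python
# Sub-section-granularity scan, as in find_smallest_section_pfn()
# after this patch. 512 pfns per 2 MiB sub-section on x86_64.
PAGES_PER_SUBSECTION = 512

def find_smallest_valid_pfn(populated, start_pfn, end_pfn):
    """First pfn in [start_pfn, end_pfn) whose sub-section is populated."""
    pfn = start_pfn
    while pfn < end_pfn:
        if pfn // PAGES_PER_SUBSECTION in populated:
            return pfn
        pfn += PAGES_PER_SUBSECTION
    return None

# Sub-sections 0 and 1 were removed; 2 is still populated, so the zone
# span can shrink to pfn 1024 rather than a full-section boundary.
assert find_smallest_valid_pfn({2}, 0, 4 * 512) == 2 * 512
assert find_smallest_valid_pfn(set(), 0, 4 * 512) is None
```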
[PATCH v10 05/13] mm/sparsemem: Convert kmalloc_section_memmap() to populate_section_memmap()
Allow sub-section sized ranges to be added to the memmap. populate_section_memmap() takes an explict pfn range rather than assuming a full section, and those parameters are plumbed all the way through to vmmemap_populate(). There should be no sub-section usage in current deployments. New warnings are added to clarify which memmap allocation paths are sub-section capable. Cc: Michal Hocko Cc: David Hildenbrand Cc: Logan Gunthorpe Cc: Oscar Salvador Reviewed-by: Pavel Tatashin Signed-off-by: Dan Williams --- arch/x86/mm/init_64.c |4 +++- include/linux/mm.h|4 ++-- mm/sparse-vmemmap.c | 21 ++--- mm/sparse.c | 50 ++--- 4 files changed, 46 insertions(+), 33 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 8335ac6e1112..688fb0687e55 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1520,7 +1520,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, { int err; - if (boot_cpu_has(X86_FEATURE_PSE)) + if (end - start < PAGES_PER_SECTION * sizeof(struct page)) + err = vmemmap_populate_basepages(start, end, node); + else if (boot_cpu_has(X86_FEATURE_PSE)) err = vmemmap_populate_hugepages(start, end, node, altmap); else if (altmap) { pr_err_once("%s: no cpu support for altmap allocations\n", diff --git a/include/linux/mm.h b/include/linux/mm.h index c6ae9eba645d..f7616518124e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2752,8 +2752,8 @@ const char * arch_vma_name(struct vm_area_struct *vma); void print_vma_addr(char *prefix, unsigned long rip); void *sparse_buffer_alloc(unsigned long size); -struct page *sparse_mem_map_populate(unsigned long pnum, int nid, - struct vmem_altmap *altmap); +struct page * __populate_section_memmap(unsigned long pfn, + unsigned long nr_pages, int nid, struct vmem_altmap *altmap); pgd_t *vmemmap_pgd_populate(unsigned long addr, int node); p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node); pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned 
long addr, int node); diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index 7fec05796796..200aef686722 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -245,19 +245,26 @@ int __meminit vmemmap_populate_basepages(unsigned long start, return 0; } -struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid, - struct vmem_altmap *altmap) +struct page * __meminit __populate_section_memmap(unsigned long pfn, + unsigned long nr_pages, int nid, struct vmem_altmap *altmap) { unsigned long start; unsigned long end; - struct page *map; - map = pfn_to_page(pnum * PAGES_PER_SECTION); - start = (unsigned long)map; - end = (unsigned long)(map + PAGES_PER_SECTION); + /* +* The minimum granularity of memmap extensions is +* PAGES_PER_SUBSECTION as allocations are tracked in the +* 'subsection_map' bitmap of the section. +*/ + end = ALIGN(pfn + nr_pages, PAGES_PER_SUBSECTION); + pfn &= PAGE_SUBSECTION_MASK; + nr_pages = end - pfn; + + start = (unsigned long) pfn_to_page(pfn); + end = start + nr_pages * sizeof(struct page); if (vmemmap_populate(start, end, nid, altmap)) return NULL; - return map; + return pfn_to_page(pfn); } diff --git a/mm/sparse.c b/mm/sparse.c index e9fec3c2f7ec..49f0c03d15a3 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -439,8 +439,8 @@ static unsigned long __init section_map_size(void) return PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION); } -struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid, - struct vmem_altmap *altmap) +struct page __init *__populate_section_memmap(unsigned long pfn, + unsigned long nr_pages, int nid, struct vmem_altmap *altmap) { unsigned long size = section_map_size(); struct page *map = sparse_buffer_alloc(size); @@ -521,10 +521,13 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin, } sparse_buffer_init(map_count * section_map_size(), nid); for_each_present_section_nr(pnum_begin, pnum) { + unsigned long pfn = section_nr_to_pfn(pnum); + if (pnum >= 
pnum_end) break; - map = sparse_mem_map_populate(pnum, nid, NULL); + map = __populate_section_memmap(pfn, PAGES_PER_SECTION, + nid, NULL); if (!map) { pr_err("%s: node[%d] memory map backing failed. Some memory will not be available.", __func__, nid); @@ -625,17 +628,17 @@ void offline_mem_sections(unsigned long start_pfn, unsign
[PATCH v10 02/13] mm/sparsemem: Introduce a SECTION_IS_EARLY flag
In preparation for sub-section hotplug, track whether a given section was created during early memory initialization, or later via memory hotplug. This distinction is needed to maintain the coarse expectation that pfn_valid() returns true for any pfn within a given section even if that section has pages that are reserved from the page allocator. For example, one of the goals of subsection hotplug is to support cases where the system physical memory layout collides System RAM and PMEM within a section. Several pfn_valid() users expect to just check if a section is valid, but they are not careful to check if the given pfn is within a "System RAM" boundary and instead expect pgdat information to further validate the pfn. Rather than unwind those paths to make their pfn_valid() queries more precise, a follow-on patch uses the SECTION_IS_EARLY flag to maintain the traditional expectation that pfn_valid() returns true for all early sections. Link: https://lore.kernel.org/lkml/1560366952-10660-1-git-send-email-...@lca.pw/ Reported-by: Qian Cai Cc: Michal Hocko Cc: Logan Gunthorpe Cc: David Hildenbrand Cc: Oscar Salvador Cc: Pavel Tatashin Signed-off-by: Dan Williams --- include/linux/mmzone.h |8 +++- mm/sparse.c| 20 +--- 2 files changed, 16 insertions(+), 12 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 179680c94262..d081c9a1d25d 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1261,7 +1261,8 @@ extern size_t mem_section_usage_size(void); #define SECTION_MARKED_PRESENT (1UL<<0) #define SECTION_HAS_MEM_MAP(1UL<<1) #define SECTION_IS_ONLINE (1UL<<2) -#define SECTION_MAP_LAST_BIT (1UL<<3) +#define SECTION_IS_EARLY (1UL<<3) +#define SECTION_MAP_LAST_BIT (1UL<<4) #define SECTION_MAP_MASK (~(SECTION_MAP_LAST_BIT-1)) #define SECTION_NID_SHIFT 3 @@ -1287,6 +1288,11 @@ static inline int valid_section(struct mem_section *section) return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP)); } +static inline int
early_section(struct mem_section *section) +{ + return (section && (section->section_mem_map & SECTION_IS_EARLY)); +} + static inline int valid_section_nr(unsigned long nr) { return valid_section(__nr_to_section(nr)); diff --git a/mm/sparse.c b/mm/sparse.c index 71da15cc7432..2031a0694f35 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -288,11 +288,11 @@ struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pn static void __meminit sparse_init_one_section(struct mem_section *ms, unsigned long pnum, struct page *mem_map, - struct mem_section_usage *usage) + struct mem_section_usage *usage, unsigned long flags) { ms->section_mem_map &= ~SECTION_MAP_MASK; - ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum) | - SECTION_HAS_MEM_MAP; + ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum) + | SECTION_HAS_MEM_MAP | flags; ms->usage = usage; } @@ -497,7 +497,8 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin, goto failed; } check_usemap_section_nr(nid, usage); - sparse_init_one_section(__nr_to_section(pnum), pnum, map, usage); + sparse_init_one_section(__nr_to_section(pnum), pnum, map, usage, + SECTION_IS_EARLY); usage = (void *) usage + mem_section_usage_size(); } sparse_buffer_fini(); @@ -731,7 +732,7 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn, page_init_poison(memmap, sizeof(struct page) * PAGES_PER_SECTION); section_mark_present(ms); - sparse_init_one_section(ms, section_nr, memmap, usage); + sparse_init_one_section(ms, section_nr, memmap, usage, 0); out: if (ret < 0) { @@ -771,19 +772,16 @@ static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages) } #endif -static void free_section_usage(struct page *memmap, +static void free_section_usage(struct mem_section *ms, struct page *memmap, struct mem_section_usage *usage, struct vmem_altmap *altmap) { - struct page *usage_page; - if (!usage) return; - usage_page = virt_to_page(usage); /* * Check to see if allocation 
came from hot-plug-add */ - if (PageSlab(usage_page) || PageCompound(usage_page)) { + if (!early_section(ms)) { kfree(usage); if (memmap) __kfree_section_memmap(memmap, altmap); @@ -815,6 +813,6 @@ void sparse_remove_one_section(struct mem_section *ms, unsigned long map_offset, clear_hwpoisoned_pages(memmap + map_offset, PAGES_PER_SECTION - map_offset);
[PATCH v10 00/13] mm: Sub-section memory hotplug support
Changes since v9 [1]:

- Fix multiple issues related to the fact that pfn_valid() has
  traditionally returned true for any pfn in an 'early' (onlined at
  boot) section regardless of whether that pfn represented 'System
  RAM'. Teach pfn_valid() to maintain its traditional behavior in the
  presence of subsections. Specifically, subsection precision for
  pfn_valid() is only considered for non-early / hot-plugged sections.
  (Qian)
- Related to the first item, introduce a SECTION_IS_EARLY
  (->section_mem_map flag) to remove the existing hacks for
  determining an early section by looking at whether the usemap was
  allocated from the slab.
- Kill off the EEXIST hackery in __add_pages(). It breaks
  (arch_add_memory() false-positive) the detection of subsection
  collisions reported by section_activate(). It is also obviated by
  David's recent reworks to move the 'System RAM' request_region()
  earlier in the add_memory() sequence.
- Switch to an arch-independent / static subsection size of 2MB.
  Otherwise, a per-arch subsection size is a roadblock on the path to
  persistent memory namespace compatibility across archs. (Jeff)
- Update the changelog for "libnvdimm/pfn: Fix fsdax-mode namespace
  info-block zero-fields" to clarify that the "Cc: stable" is only
  there as a safety measure for a distro that decides to backport
  "libnvdimm/pfn: Stop padding pmem namespaces to section alignment";
  otherwise there is no known bug exposure in older kernels. (Andrew)
- Drop some redundant subsection checks (Oscar)
- Collect some reviewed-bys

[1]: https://lore.kernel.org/lkml/155977186863.2443951.9036044808311959913.st...@dwillia2-desk3.amr.corp.intel.com/

---

The memory hotplug section is an arbitrary / convenient unit for memory hotplug. 'Section-size' units have bled into the user interface ('memblock' sysfs) and cannot be changed without breaking existing userspace.
The section-size constraint, while mostly benign for typical memory hotplug, has wreaked, and continues to wreak, havoc with 'device-memory' use cases, persistent memory (pmem) in particular. Recall that pmem uses devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a 'struct page' memmap for pmem. However, it does not use the 'bottom half' of memory hotplug, i.e. it never marks pmem pages online and never exposes the userspace memblock interface for pmem. This leaves an opening to redress the section-size constraint.

To date, the libnvdimm subsystem has attempted to inject padding to satisfy the internal constraints of arch_add_memory(). Beyond complicating the code, leading to bugs [2], wasting memory, and limiting configuration flexibility, the padding hack is broken when the platform changes the physical memory alignment of pmem from one boot to the next. Device failure (intermittent or permanent) and physical reconfiguration are events that can cause the platform firmware to change the physical placement of pmem on a subsequent boot, and device failure is an everyday event in a data-center.

It turns out that sections are only a hard requirement of the user-facing interface for memory hotplug, and with a bit more infrastructure sub-section arch_add_memory() support can be added for kernel-internal usages like devm_memremap_pages(). Here is an analysis of the design assumptions in the current code and how they are addressed in the new implementation:

Current design assumptions:

- Sections that describe boot memory (early sections) are never
  unplugged / removed.
- pfn_valid(), in the CONFIG_SPARSEMEM_VMEMMAP=y case, devolves to a
  valid_section() check
- __add_pages() and helper routines assume all operations occur in
  PAGES_PER_SECTION units.
- The memblock sysfs interface only comprehends full sections

New design assumptions:

- Sections are instrumented with a sub-section bitmask to track (on
  x86) individual 2MB sub-divisions of a 128MB section.
- Partially populated early sections can be extended with additional
  sub-sections, and those sub-sections can be removed with
  arch_remove_memory(). With this in place we no longer lose usable
  memory capacity to padding.
- pfn_valid() is updated to look deeper than valid_section() to also
  check the active-sub-section mask. This indication is in the same
  cacheline as the valid_section() so the performance impact is
  expected to be negligible. So far the lkp robot has not reported any
  regressions.
- Outside of the core vmemmap population routines which are replaced,
  other helper routines like shrink_{zone,pgdat}_span() are updated to
  handle the smaller granularity. Core memory hotplug routines that
  deal with online memory are not touched.
- The existing memblock sysfs user api guarantees / assumptions are
  not touched since this capability is limited to !online
  !memblock-sysfs-accessible sections.

Meanwhile the issue reports continue to roll in from users that do not understand when and how the 128MB constraint will bite them. The current im
[PATCH v10 03/13] mm/sparsemem: Add helpers track active portions of a section at boot
Prepare for hot{plug,remove} of sub-ranges of a section by tracking a sub-section active bitmask, each bit representing a PMD_SIZE span of the architecture's memory hotplug section size. The implication of a partially populated section is that pfn_valid() needs to go beyond a valid_section() check and either determine that the section is an "early section", or read the sub-section active ranges from the bitmask. The expectation is that the bitmask (subsection_map) fits in the same cacheline as the valid_section() / early_section() data, so the incremental performance overhead to pfn_valid() should be negligible. The rationale for using early_section() to short-circuit the subsection_map check is that there are legacy code paths that use pfn_valid() at section granularity before validating the pfn against pgdat data. So, the early_section() check allows those traditional assumptions to persist while also permitting subsection_map to tell the truth for purposes of populating the unused portions of early sections with PMEM and other ZONE_DEVICE mappings.
Cc: Michal Hocko Cc: Vlastimil Babka Cc: Logan Gunthorpe Cc: Oscar Salvador Cc: Pavel Tatashin Reported-by: Qian Cai Tested-by: Jane Chu Signed-off-by: Dan Williams --- include/linux/mmzone.h | 33 - mm/page_alloc.c| 10 -- mm/sparse.c| 35 +++ 3 files changed, 75 insertions(+), 3 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index d081c9a1d25d..c4e8843e283c 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1179,6 +1179,8 @@ struct mem_section_usage { unsigned long pageblock_flags[0]; }; +void subsection_map_init(unsigned long pfn, unsigned long nr_pages); + struct page; struct page_ext; struct mem_section { @@ -1322,12 +1324,40 @@ static inline struct mem_section *__pfn_to_section(unsigned long pfn) extern int __highest_present_section_nr; +static inline int subsection_map_index(unsigned long pfn) +{ + return (pfn & ~(PAGE_SECTION_MASK)) / PAGES_PER_SUBSECTION; +} + +#ifdef CONFIG_SPARSEMEM_VMEMMAP +static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn) +{ + int idx = subsection_map_index(pfn); + + return test_bit(idx, ms->usage->subsection_map); +} +#else +static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn) +{ + return 1; +} +#endif + #ifndef CONFIG_HAVE_ARCH_PFN_VALID static inline int pfn_valid(unsigned long pfn) { + struct mem_section *ms; + if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) return 0; - return valid_section(__nr_to_section(pfn_to_section_nr(pfn))); + ms = __nr_to_section(pfn_to_section_nr(pfn)); + if (!valid_section(ms)) + return 0; + /* +* Traditionally early sections always returned pfn_valid() for +* the entire section-sized span. 
+*/ + return early_section(ms) || pfn_section_valid(ms, pfn); } #endif @@ -1359,6 +1389,7 @@ void sparse_init(void); #define sparse_init() do {} while (0) #define sparse_index_init(_sec, _nid) do {} while (0) #define pfn_present pfn_valid +#define subsection_map_init(_pfn, _nr_pages) do {} while (0) #endif /* CONFIG_SPARSEMEM */ /* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8cc091e87200..8e7215fb6976 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -7306,12 +7306,18 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn) (u64)zone_movable_pfn[i] << PAGE_SHIFT); } - /* Print out the early node map */ + /* +* Print out the early node map, and initialize the +* subsection-map relative to active online memory ranges to +* enable future "sub-section" extensions of the memory map. +*/ pr_info("Early memory node ranges\n"); - for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) + for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { pr_info(" node %3d: [mem %#018Lx-%#018Lx]\n", nid, (u64)start_pfn << PAGE_SHIFT, ((u64)end_pfn << PAGE_SHIFT) - 1); + subsection_map_init(start_pfn, end_pfn - start_pfn); + } /* Initialise every node */ mminit_verify_pageflags_layout(); diff --git a/mm/sparse.c b/mm/sparse.c index 2031a0694f35..e9fec3c2f7ec 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -210,6 +210,41 @@ static inline unsigned long first_present_section_nr(void) return next_present_section_nr(-1); } +void subsection_mask_set(unsigned long *map, unsigned long pfn, + unsigned long nr_pages) +{ + int idx = subsection_map_index(pfn); + int end = subsection_map_index(pfn + nr_pages - 1); + + bitmap_set(map, idx, end - idx + 1); +} + +void __init subsection_map_init(uns
[PATCH v10 01/13] mm/sparsemem: Introduce struct mem_section_usage
Towards enabling memory hotplug to track partial population of a section, introduce 'struct mem_section_usage'. A pointer to a 'struct mem_section_usage' instance replaces the existing pointer to a 'pageblock_flags' bitmap. Effectively it adds one more 'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to house a new 'subsection_map' bitmap. The new bitmap enables the memory hot{plug,remove} implementation to act on incremental sub-divisions of a section. SUBSECTION_SHIFT is defined as a global constant instead of a per-architecture value like SECTION_SIZE_BITS in order to allow cross-arch compatibility of subsection users. Specifically, a common subsection size allows for the possibility that persistent memory namespace configurations can be made compatible across architectures. The primary motivation for this functionality is to support platforms that mix "System RAM" and "Persistent Memory" within a single section, or multiple PMEM ranges with different mapping lifetimes within a single section. The section restriction for hotplug has caused an ongoing saga of hacks and bugs for devm_memremap_pages() users. Beyond the fixups to teach existing paths how to retrieve the 'usemap' from a section, and updates to the usemap allocation path, there are no expected behavior changes.
Cc: Michal Hocko Cc: Vlastimil Babka Cc: Logan Gunthorpe Cc: Pavel Tatashin Reviewed-by: Oscar Salvador Reviewed-by: Wei Yang Signed-off-by: Dan Williams --- include/linux/mmzone.h | 28 +++-- mm/memory_hotplug.c| 18 ++- mm/page_alloc.c|2 + mm/sparse.c| 81 4 files changed, 76 insertions(+), 53 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 427b79c39b3c..179680c94262 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1161,6 +1161,24 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec) #define SECTION_ALIGN_UP(pfn) (((pfn) + PAGES_PER_SECTION - 1) & PAGE_SECTION_MASK) #define SECTION_ALIGN_DOWN(pfn)((pfn) & PAGE_SECTION_MASK) +#define SUBSECTION_SHIFT 21 + +#define PFN_SUBSECTION_SHIFT (SUBSECTION_SHIFT - PAGE_SHIFT) +#define PAGES_PER_SUBSECTION (1UL << PFN_SUBSECTION_SHIFT) +#define PAGE_SUBSECTION_MASK (~(PAGES_PER_SUBSECTION-1)) + +#if SUBSECTION_SHIFT > SECTION_SIZE_BITS +#error Subsection size exceeds section size +#else +#define SUBSECTIONS_PER_SECTION (1UL << (SECTION_SIZE_BITS - SUBSECTION_SHIFT)) +#endif + +struct mem_section_usage { + DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION); + /* See declaration of similar field in struct zone */ + unsigned long pageblock_flags[0]; +}; + struct page; struct page_ext; struct mem_section { @@ -1178,8 +1196,7 @@ struct mem_section { */ unsigned long section_mem_map; - /* See declaration of similar field in struct zone */ - unsigned long *pageblock_flags; + struct mem_section_usage *usage; #ifdef CONFIG_PAGE_EXTENSION /* * If SPARSEMEM, pgdat doesn't have page_ext pointer. 
We use @@ -1210,6 +1227,11 @@ extern struct mem_section **mem_section; extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]; #endif +static inline unsigned long *section_to_usemap(struct mem_section *ms) +{ + return ms->usage->pageblock_flags; +} + static inline struct mem_section *__nr_to_section(unsigned long nr) { #ifdef CONFIG_SPARSEMEM_EXTREME @@ -1221,7 +1243,7 @@ static inline struct mem_section *__nr_to_section(unsigned long nr) return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK]; } extern int __section_nr(struct mem_section* ms); -extern unsigned long usemap_size(void); +extern size_t mem_section_usage_size(void); /* * We use the lower bits of the mem_map pointer to store diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index a88c5f334e5a..7b963c2d3a0d 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -166,9 +166,10 @@ void put_page_bootmem(struct page *page) #ifndef CONFIG_SPARSEMEM_VMEMMAP static void register_page_bootmem_info_section(unsigned long start_pfn) { - unsigned long *usemap, mapsize, section_nr, i; + unsigned long mapsize, section_nr, i; struct mem_section *ms; struct page *page, *memmap; + struct mem_section_usage *usage; section_nr = pfn_to_section_nr(start_pfn); ms = __nr_to_section(section_nr); @@ -188,10 +189,10 @@ static void register_page_bootmem_info_section(unsigned long start_pfn) for (i = 0; i < mapsize; i++, page++) get_page_bootmem(section_nr, page, SECTION_INFO); - usemap = ms->pageblock_flags; - page = virt_to_page(usemap); + usage = ms->usage; + page = virt_to_page(usage); - mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT; + mapsize = PAGE_ALIGN(mem_section_usage_size()) >> PAGE_SHIFT;
Re: [RESEND v4 1/4] soc: qcom: geni: Add support for ACPI
On Wed, 19 Jun 2019, Andy Gross wrote:
> On Mon, Jun 17, 2019 at 01:51:02PM +0100, Lee Jones wrote:
> > When booting with ACPI as the active set of configuration tables,
> > all clocks, regulators, pin functions, etc. are expected to be at
> > their ideal values/levels/rates, thus the associated frameworks
> > are unavailable. Ensure calls to these APIs are shielded when
> > ACPI is enabled.
> >
> > Signed-off-by: Lee Jones
> > Acked-by: Ard Biesheuvel
>
> Applied.

Thanks Bjorn and Andy.

-- 
Lee Jones [李琼斯]
Linaro Services Technical Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog
Re: [PATCH 4/5] Powerpc/hw-breakpoint: Optimize disable path
On 6/18/19 11:45 AM, Michael Neuling wrote:
> On Tue, 2019-06-18 at 09:57 +0530, Ravi Bangoria wrote:
>> Directly setting dawr and dawrx with 0 should be enough to
>> disable watchpoint. No need to reset individual bits in
>> variable and then set in hw.
>
> This seems like a pointless optimisation to me.
>
> I'm all for adding more code/complexity if it buys us some performance, but I
> can't imagine this is a fast path (nor have you stated any performance
> benefits).

This gets called from sched_switch. I expected an improvement when we switch from a monitored process to a non-monitored one. With such a scenario, I tried to measure the difference in execution time of set_dawr but I don't see any improvement. So I'll drop the patch.
Re: [PATCH] NTB: test: remove a duplicate check
It's not a huge deal obviously, but your commit was a6bed7a54165 ("NTB: Introduce NTB MSI Test Client"), and you know that if I had sent a patch called ("NTB: remove a duplicate check") people would have correctly complained because the patch prefix is too vague. What I'm saying is we do this all the time:

[PATCH] NTB: add a new foobazle driver

But it should be:

[PATCH] NTB: foobazle: add a new foobazle driver

Then I can just copy and paste your patch prefix instead of trying to invent one.

regards,
dan carpenter
Re: [PATCH] mfd: stmfx: Fix an endian bug in stmfx_irq_handler()
On Tue, 18 Jun 2019, Linus Torvalds wrote: > On Tue, Jun 18, 2019 at 1:16 AM Lee Jones wrote: > > > > > Reported-by: Linus Torvalds > > > > Ideally we can get a review too. > > Looks fine to me, but obviously somebody should actually _test_ it too. Amelie, would you be so kind? -- Lee Jones [李琼斯] Linaro Services Technical Lead Linaro.org │ Open source software for ARM SoCs Follow Linaro: Facebook | Twitter | Blog
Re: [PATCH RESEND 1/8] s390: Start fallback of top-down mmap at mm->mmap_base
Really sorry about that, my connection is weird this morning, I'll retry tomorrow. Sorry again, Alex On 6/19/19 1:42 AM, Alexandre Ghiti wrote: In case of mmap failure in top-down mode, there is no need to go through the whole address space again for the bottom-up fallback: the goal of this fallback is to find, as a last resort, space between the top-down mmap base and the stack, which is the only place not covered by the top-down mmap. Signed-off-by: Alexandre Ghiti --- arch/s390/mm/mmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c index cbc718ba6d78..4a222969843b 100644 --- a/arch/s390/mm/mmap.c +++ b/arch/s390/mm/mmap.c @@ -166,7 +166,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, if (addr & ~PAGE_MASK) { VM_BUG_ON(addr != -ENOMEM); info.flags = 0; - info.low_limit = TASK_UNMAPPED_BASE; + info.low_limit = mm->mmap_base; info.high_limit = TASK_SIZE; addr = vm_unmapped_area(&info); if (addr & ~PAGE_MASK)
Re: [RESEND v4 1/4] soc: qcom: geni: Add support for ACPI
On Mon, Jun 17, 2019 at 01:51:02PM +0100, Lee Jones wrote:
> When booting with ACPI as the active set of configuration tables,
> all clocks, regulators, pin functions, etc. are expected to be at
> their ideal values/levels/rates, thus the associated frameworks
> are unavailable. Ensure calls to these APIs are shielded when
> ACPI is enabled.
>
> Signed-off-by: Lee Jones
> Acked-by: Ard Biesheuvel

Applied.

Thanks,
Andy
[PATCH V5 3/5] clk: imx: Add API for clk unregister when driver probe fail
From: Anson Huang

For the i.MX clock drivers' probe-fail case, clks should be unregistered in the return path. This patch adds a common API for i.MX clock drivers to unregister clocks on failure.

Signed-off-by: Anson Huang
---
New patch.
---
 drivers/clk/imx/clk.c | 8 ++++++++
 drivers/clk/imx/clk.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/drivers/clk/imx/clk.c b/drivers/clk/imx/clk.c
index f241189..8616967 100644
--- a/drivers/clk/imx/clk.c
+++ b/drivers/clk/imx/clk.c
@@ -13,6 +13,14 @@
 
 DEFINE_SPINLOCK(imx_ccm_lock);
 
+void imx_unregister_clocks(struct clk *clks[], unsigned int count)
+{
+	unsigned int i;
+
+	for (i = 0; i < count; i++)
+		clk_unregister(clks[i]);
+}
+
 void __init imx_mmdc_mask_handshake(void __iomem *ccm_base,
 				    unsigned int chn)
 {
diff --git a/drivers/clk/imx/clk.h b/drivers/clk/imx/clk.h
index 19d7b8b..bb4ec1b 100644
--- a/drivers/clk/imx/clk.h
+++ b/drivers/clk/imx/clk.h
@@ -12,6 +12,7 @@ void imx_check_clk_hws(struct clk_hw *clks[], unsigned int count);
 void imx_register_uart_clocks(struct clk ** const clks[]);
 void imx_register_uart_clocks_hws(struct clk_hw ** const hws[]);
 void imx_mmdc_mask_handshake(void __iomem *ccm_base, unsigned int chn);
+void imx_unregister_clocks(struct clk *clks[], unsigned int count);
 
 extern void imx_cscmr1_fixup(u32 *val);
-- 
2.7.4
[PATCH V5 4/5] clk: imx: Add support for i.MX8MN clock driver
From: Anson Huang This patch adds i.MX8MN clock driver support. Signed-off-by: Anson Huang --- Changes since V4: - use dev_err instead of pr_err; - unregister clocks when probe failed. --- drivers/clk/imx/Kconfig | 6 + drivers/clk/imx/Makefile | 1 + drivers/clk/imx/clk-imx8mn.c | 636 +++ 3 files changed, 643 insertions(+) create mode 100644 drivers/clk/imx/clk-imx8mn.c diff --git a/drivers/clk/imx/Kconfig b/drivers/clk/imx/Kconfig index 0eaf418..1ac0c79 100644 --- a/drivers/clk/imx/Kconfig +++ b/drivers/clk/imx/Kconfig @@ -14,6 +14,12 @@ config CLK_IMX8MM help Build the driver for i.MX8MM CCM Clock Driver +config CLK_IMX8MN + bool "IMX8MN CCM Clock Driver" + depends on ARCH_MXC && ARM64 + help + Build the driver for i.MX8MN CCM Clock Driver + config CLK_IMX8MQ bool "IMX8MQ CCM Clock Driver" depends on ARCH_MXC && ARM64 diff --git a/drivers/clk/imx/Makefile b/drivers/clk/imx/Makefile index 05641c6..77a3d71 100644 --- a/drivers/clk/imx/Makefile +++ b/drivers/clk/imx/Makefile @@ -26,6 +26,7 @@ obj-$(CONFIG_MXC_CLK_SCU) += \ clk-lpcg-scu.o obj-$(CONFIG_CLK_IMX8MM) += clk-imx8mm.o +obj-$(CONFIG_CLK_IMX8MN) += clk-imx8mn.o obj-$(CONFIG_CLK_IMX8MQ) += clk-imx8mq.o obj-$(CONFIG_CLK_IMX8QXP) += clk-imx8qxp.o clk-imx8qxp-lpcg.o diff --git a/drivers/clk/imx/clk-imx8mn.c b/drivers/clk/imx/clk-imx8mn.c new file mode 100644 index 000..07481a5 --- /dev/null +++ b/drivers/clk/imx/clk-imx8mn.c @@ -0,0 +1,636 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2018-2019 NXP. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "clk.h" + +static u32 share_count_sai2; +static u32 share_count_sai3; +static u32 share_count_sai5; +static u32 share_count_sai6; +static u32 share_count_sai7; +static u32 share_count_disp; +static u32 share_count_pdm; +static u32 share_count_nand; + +enum { + ARM_PLL, + GPU_PLL, + VPU_PLL, + SYS_PLL1, + SYS_PLL2, + SYS_PLL3, + DRAM_PLL, + AUDIO_PLL1, + AUDIO_PLL2, + VIDEO_PLL2, + NR_PLLS, +}; + +static const struct imx_pll14xx_rate_table imx8mn_pll1416x_tbl[] = { + PLL_1416X_RATE(18U, 225, 3, 0), + PLL_1416X_RATE(16U, 200, 3, 0), + PLL_1416X_RATE(12U, 300, 3, 1), + PLL_1416X_RATE(10U, 250, 3, 1), + PLL_1416X_RATE(8U, 200, 3, 1), + PLL_1416X_RATE(75000U, 250, 2, 2), + PLL_1416X_RATE(7U, 350, 3, 2), + PLL_1416X_RATE(6U, 300, 3, 2), +}; + +static const struct imx_pll14xx_rate_table imx8mn_audiopll_tbl[] = { + PLL_1443X_RATE(786432000U, 655, 5, 2, 23593), + PLL_1443X_RATE(722534400U, 301, 5, 1, 3670), +}; + +static const struct imx_pll14xx_rate_table imx8mn_videopll_tbl[] = { + PLL_1443X_RATE(65000U, 325, 3, 2, 0), + PLL_1443X_RATE(59400U, 198, 2, 2, 0), +}; + +static const struct imx_pll14xx_rate_table imx8mn_drampll_tbl[] = { + PLL_1443X_RATE(65000U, 325, 3, 2, 0), +}; + +static struct imx_pll14xx_clk imx8mn_audio_pll = { + .type = PLL_1443X, + .rate_table = imx8mn_audiopll_tbl, +}; + +static struct imx_pll14xx_clk imx8mn_video_pll = { + .type = PLL_1443X, + .rate_table = imx8mn_videopll_tbl, +}; + +static struct imx_pll14xx_clk imx8mn_dram_pll = { + .type = PLL_1443X, + .rate_table = imx8mn_drampll_tbl, +}; + +static struct imx_pll14xx_clk imx8mn_arm_pll = { + .type = PLL_1416X, + .rate_table = imx8mn_pll1416x_tbl, +}; + +static struct imx_pll14xx_clk imx8mn_gpu_pll = { + .type = PLL_1416X, + .rate_table = imx8mn_pll1416x_tbl, +}; + +static struct imx_pll14xx_clk imx8mn_vpu_pll = { + .type = PLL_1416X, + .rate_table = imx8mn_pll1416x_tbl, +}; + 
+static struct imx_pll14xx_clk imx8mn_sys_pll = { + .type = PLL_1416X, + .rate_table = imx8mn_pll1416x_tbl, +}; + +static const char * const pll_ref_sels[] = { "osc_24m", "dummy", "dummy", "dummy", }; +static const char * const audio_pll1_bypass_sels[] = {"audio_pll1", "audio_pll1_ref_sel", }; +static const char * const audio_pll2_bypass_sels[] = {"audio_pll2", "audio_pll2_ref_sel", }; +static const char * const video_pll1_bypass_sels[] = {"video_pll1", "video_pll1_ref_sel", }; +static const char * const dram_pll_bypass_sels[] = {"dram_pll", "dram_pll_ref_sel", }; +static const char * const gpu_pll_bypass_sels[] = {"gpu_pll", "gpu_pll_ref_sel", }; +static const char * const vpu_pll_bypass_sels[] = {"vpu_pll", "vpu_pll_ref_sel", }; +static const char * const arm_pll_bypass_sels[] = {"arm_pll", "arm_pll_ref_sel", }; +static const char * const sys_pll1_bypass_sels[] = {"sys_pll1", "sys_pll1_ref_sel", }; +
[PATCH V5 1/5] dt-bindings: imx: Add clock binding doc for i.MX8MN
From: Anson Huang Add the clock binding doc for i.MX8MN. Signed-off-by: Anson Huang Reviewed-by: Maxime Ripard --- No changes. --- .../devicetree/bindings/clock/imx8mn-clock.yaml| 112 +++ include/dt-bindings/clock/imx8mn-clock.h | 215 + 2 files changed, 327 insertions(+) create mode 100644 Documentation/devicetree/bindings/clock/imx8mn-clock.yaml create mode 100644 include/dt-bindings/clock/imx8mn-clock.h diff --git a/Documentation/devicetree/bindings/clock/imx8mn-clock.yaml b/Documentation/devicetree/bindings/clock/imx8mn-clock.yaml new file mode 100644 index 000..454c5b4 --- /dev/null +++ b/Documentation/devicetree/bindings/clock/imx8mn-clock.yaml @@ -0,0 +1,112 @@ +# SPDX-License-Identifier: GPL-2.0 +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/bindings/clock/imx8mn-clock.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: NXP i.MX8M Nano Clock Control Module Binding + +maintainers: + - Anson Huang + +description: | + NXP i.MX8M Nano clock control module is an integrated clock controller, which + generates and supplies to all modules. + +properties: + compatible: +const: fsl,imx8mn-ccm + + reg: +maxItems: 1 + + clocks: +items: + - description: 32k osc + - description: 24m osc + - description: ext1 clock input + - description: ext2 clock input + - description: ext3 clock input + - description: ext4 clock input + + clock-names: +items: + - const: osc_32k + - const: osc_24m + - const: clk_ext1 + - const: clk_ext2 + - const: clk_ext3 + - const: clk_ext4 + + '#clock-cells': +const: 1 +description: | + The clock consumer should specify the desired clock by having the clock + ID in its "clocks" phandle cell. See include/dt-bindings/clock/imx8mn-clock.h + for the full list of i.MX8M Nano clock IDs. 
+
+required:
+  - compatible
+  - reg
+  - clocks
+  - clock-names
+  - '#clock-cells'
+
+examples:
+  # Clock Control Module node:
+  - |
+    clk: clock-controller@30380000 {
+        compatible = "fsl,imx8mn-ccm";
+        reg = <0x0 0x30380000 0x0 0x10000>;
+        #clock-cells = <1>;
+        clocks = <&osc_32k>, <&osc_24m>, <&clk_ext1>,
+                 <&clk_ext2>, <&clk_ext3>, <&clk_ext4>;
+        clock-names = "osc_32k", "osc_24m", "clk_ext1",
+                      "clk_ext2", "clk_ext3", "clk_ext4";
+    };
+
+  # Required external clocks for Clock Control Module node:
+  - |
+    osc_32k: clock-osc-32k {
+        compatible = "fixed-clock";
+        #clock-cells = <0>;
+        clock-frequency = <32768>;
+        clock-output-names = "osc_32k";
+    };
+
+    osc_24m: clock-osc-24m {
+        compatible = "fixed-clock";
+        #clock-cells = <0>;
+        clock-frequency = <24000000>;
+        clock-output-names = "osc_24m";
+    };
+
+    clk_ext1: clock-ext1 {
+        compatible = "fixed-clock";
+        #clock-cells = <0>;
+        clock-frequency = <133000000>;
+        clock-output-names = "clk_ext1";
+    };
+
+    clk_ext2: clock-ext2 {
+        compatible = "fixed-clock";
+        #clock-cells = <0>;
+        clock-frequency = <133000000>;
+        clock-output-names = "clk_ext2";
+    };
+
+    clk_ext3: clock-ext3 {
+        compatible = "fixed-clock";
+        #clock-cells = <0>;
+        clock-frequency = <133000000>;
+        clock-output-names = "clk_ext3";
+    };
+
+    clk_ext4: clock-ext4 {
+        compatible = "fixed-clock";
+        #clock-cells = <0>;
+        clock-frequency = <133000000>;
+        clock-output-names = "clk_ext4";
+    };
+
+...
diff --git a/include/dt-bindings/clock/imx8mn-clock.h b/include/dt-bindings/clock/imx8mn-clock.h new file mode 100644 index 000..5255b1c --- /dev/null +++ b/include/dt-bindings/clock/imx8mn-clock.h @@ -0,0 +1,215 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright 2018-2019 NXP + */ + +#ifndef __DT_BINDINGS_CLOCK_IMX8MN_H +#define __DT_BINDINGS_CLOCK_IMX8MN_H + +#define IMX8MN_CLK_DUMMY 0 +#define IMX8MN_CLK_32K 1 +#define IMX8MN_CLK_24M 2 +#define IMX8MN_OSC_HDMI_CLK3 +#define IMX8MN_CLK_EXT14 +#define IMX8MN_CLK_EXT25 +#define IMX8MN_CLK_EXT36 +#define IMX8MN_CLK_EXT47 +#define IMX8MN_AUDIO_PLL1_REF_SEL 8 +#define IMX8MN_AUDIO_PLL2_REF_SEL 9 +#define IMX8MN_VIDEO_PLL1_REF_SEL 10 +#define IMX8MN_DRAM_PLL_REF_SEL11 +#define IMX8MN_GPU_PLL_REF_SEL 12 +#define IMX8MN_VPU_PLL_REF_SEL 13 +#define IMX8MN_ARM_PLL_REF_SEL 14 +#define IMX8MN_SYS_PLL1_REF_SEL15 +#define IMX8MN_SYS_PLL2_REF_SEL16 +#define IMX8MN_SYS_PLL3_REF_SEL17 +#define IMX8MN_AUDI
[PATCH V5 5/5] arm64: defconfig: Select CONFIG_CLK_IMX8MN by default
From: Anson Huang Enable CONFIG_CLK_IMX8MN to support i.MX8MN clock driver. Signed-off-by: Anson Huang --- No changes. --- arch/arm64/configs/defconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig index 7a21159..29f7768 100644 --- a/arch/arm64/configs/defconfig +++ b/arch/arm64/configs/defconfig @@ -659,6 +659,7 @@ CONFIG_COMMON_CLK_S2MPS11=y CONFIG_CLK_QORIQ=y CONFIG_COMMON_CLK_PWM=y CONFIG_CLK_IMX8MM=y +CONFIG_CLK_IMX8MN=y CONFIG_CLK_IMX8MQ=y CONFIG_CLK_IMX8QXP=y CONFIG_TI_SCI_CLK=y -- 2.7.4
[PATCH V5 2/5] clk: imx8mm: Make 1416X/1443X PLL macro definitions common for usage
From: Anson Huang 1416X/1443X PLL are used on i.MX8MM and i.MX8MN and maybe other i.MX8M series SoC later, the macro definitions of these PLLs' initialization should be common for usage. Signed-off-by: Anson Huang --- No changes. --- drivers/clk/imx/clk-imx8mm.c | 17 - drivers/clk/imx/clk.h| 17 + 2 files changed, 17 insertions(+), 17 deletions(-) diff --git a/drivers/clk/imx/clk-imx8mm.c b/drivers/clk/imx/clk-imx8mm.c index 6b8e75d..43fa9c3 100644 --- a/drivers/clk/imx/clk-imx8mm.c +++ b/drivers/clk/imx/clk-imx8mm.c @@ -26,23 +26,6 @@ static u32 share_count_dcss; static u32 share_count_pdm; static u32 share_count_nand; -#define PLL_1416X_RATE(_rate, _m, _p, _s) \ - { \ - .rate = (_rate),\ - .mdiv = (_m), \ - .pdiv = (_p), \ - .sdiv = (_s), \ - } - -#define PLL_1443X_RATE(_rate, _m, _p, _s, _k) \ - { \ - .rate = (_rate),\ - .mdiv = (_m), \ - .pdiv = (_p), \ - .sdiv = (_s), \ - .kdiv = (_k), \ - } - static const struct imx_pll14xx_rate_table imx8mm_pll1416x_tbl[] = { PLL_1416X_RATE(18U, 225, 3, 0), PLL_1416X_RATE(16U, 200, 3, 0), diff --git a/drivers/clk/imx/clk.h b/drivers/clk/imx/clk.h index d94d9cb..19d7b8b 100644 --- a/drivers/clk/imx/clk.h +++ b/drivers/clk/imx/clk.h @@ -153,6 +153,23 @@ enum imx_pllv3_type { struct clk_hw *imx_clk_hw_pllv3(enum imx_pllv3_type type, const char *name, const char *parent_name, void __iomem *base, u32 div_mask); +#define PLL_1416X_RATE(_rate, _m, _p, _s) \ + { \ + .rate = (_rate),\ + .mdiv = (_m), \ + .pdiv = (_p), \ + .sdiv = (_s), \ + } + +#define PLL_1443X_RATE(_rate, _m, _p, _s, _k) \ + { \ + .rate = (_rate),\ + .mdiv = (_m), \ + .pdiv = (_p), \ + .sdiv = (_s), \ + .kdiv = (_k), \ + } + struct clk_hw *imx_clk_pllv4(const char *name, const char *parent_name, void __iomem *base); -- 2.7.4
[PATCH RESEND 1/8] s390: Start fallback of top-down mmap at mm->mmap_base
In case of mmap failure in top-down mode, there is no need to go through the whole address space again for the bottom-up fallback: the goal of this fallback is to find, as a last resort, space between the top-down mmap base and the stack, which is the only place not covered by the top-down mmap.

Signed-off-by: Alexandre Ghiti
---
 arch/s390/mm/mmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
index cbc718ba6d78..4a222969843b 100644
--- a/arch/s390/mm/mmap.c
+++ b/arch/s390/mm/mmap.c
@@ -166,7 +166,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	if (addr & ~PAGE_MASK) {
 		VM_BUG_ON(addr != -ENOMEM);
 		info.flags = 0;
-		info.low_limit = TASK_UNMAPPED_BASE;
+		info.low_limit = mm->mmap_base;
 		info.high_limit = TASK_SIZE;
 		addr = vm_unmapped_area(&info);
 		if (addr & ~PAGE_MASK)
-- 
2.20.1
Re: [PATCH v4 3/6] soc: qcom: geni: Add support for ACPI
On Wed 12 Jun 07:26 PDT 2019, Lee Jones wrote: > When booting with ACPI as the active set of configuration tables, > all; clocks, regulators, pin functions ect are expected to be at > their ideal values/levels/rates, thus the associated frameworks > are unavailable. Ensure calls to these APIs are shielded when > ACPI is enabled. > Reviewed-by: Bjorn Andersson > Signed-off-by: Lee Jones > Acked-by: Ard Biesheuvel > --- > drivers/soc/qcom/qcom-geni-se.c | 21 +++-- > 1 file changed, 15 insertions(+), 6 deletions(-) > > diff --git a/drivers/soc/qcom/qcom-geni-se.c b/drivers/soc/qcom/qcom-geni-se.c > index 6b8ef01472e9..d5cf953b4337 100644 > --- a/drivers/soc/qcom/qcom-geni-se.c > +++ b/drivers/soc/qcom/qcom-geni-se.c > @@ -1,6 +1,7 @@ > // SPDX-License-Identifier: GPL-2.0 > // Copyright (c) 2017-2018, The Linux Foundation. All rights reserved. > > +#include > #include > #include > #include > @@ -450,6 +451,9 @@ int geni_se_resources_off(struct geni_se *se) > { > int ret; > > + if (has_acpi_companion(se->dev)) > + return 0; > + > ret = pinctrl_pm_select_sleep_state(se->dev); > if (ret) > return ret; > @@ -487,6 +491,9 @@ int geni_se_resources_on(struct geni_se *se) > { > int ret; > > + if (has_acpi_companion(se->dev)) > + return 0; > + > ret = geni_se_clks_on(se); > if (ret) > return ret; > @@ -724,12 +731,14 @@ static int geni_se_probe(struct platform_device *pdev) > if (IS_ERR(wrapper->base)) > return PTR_ERR(wrapper->base); > > - wrapper->ahb_clks[0].id = "m-ahb"; > - wrapper->ahb_clks[1].id = "s-ahb"; > - ret = devm_clk_bulk_get(dev, NUM_AHB_CLKS, wrapper->ahb_clks); > - if (ret) { > - dev_err(dev, "Err getting AHB clks %d\n", ret); > - return ret; > + if (!has_acpi_companion(&pdev->dev)) { > + wrapper->ahb_clks[0].id = "m-ahb"; > + wrapper->ahb_clks[1].id = "s-ahb"; > + ret = devm_clk_bulk_get(dev, NUM_AHB_CLKS, wrapper->ahb_clks); > + if (ret) { > + dev_err(dev, "Err getting AHB clks %d\n", ret); > + return ret; > + } > } > > dev_set_drvdata(dev, wrapper); > -- > 
2.17.1 >
Re: [PATCH 1/1] scsi: ufs-qcom: Add support for platforms booting ACPI
Ard, Martin, On Tue, 18 Jun 2019, Martin K. Petersen wrote: > > New Qualcomm AArch64 based laptops are now available which use UFS > > as their primary data storage medium. These devices are supplied > > with ACPI support out of the box. This patch ensures the Qualcomm > > UFS driver will be bound when the "QCOM24A5" H/W device is > > advertised as present. > > Applied to 5.3/scsi-queue. Thanks! Ideal. Thanks for your help. -- Lee Jones [李琼斯] Linaro Services Technical Lead Linaro.org │ Open source software for ARM SoCs Follow Linaro: Facebook | Twitter | Blog
[PATCH RESEND 0/8] Fix mmap base in bottom-up mmap
This series fixes the fallback of the top-down mmap: in case of failure, a bottom-up scheme can be tried as a last resort between the top-down mmap base and the stack, hoping for a large unused stack limit. Lots of architectures and even mm code start this fallback at TASK_UNMAPPED_BASE, which is useless since the top-down scheme already failed on the whole address space: instead, simply use mmap_base. Along the way, it allows us to get rid of mmap_legacy_base and mmap_compat_legacy_base from mm_struct. Note that arm and mips already implement this behaviour. Alexandre Ghiti (8): s390: Start fallback of top-down mmap at mm->mmap_base sh: Start fallback of top-down mmap at mm->mmap_base sparc: Start fallback of top-down mmap at mm->mmap_base x86, hugetlbpage: Start fallback of top-down mmap at mm->mmap_base mm: Start fallback top-down mmap at mm->mmap_base parisc: Use mmap_base, not mmap_legacy_base, as low_limit for bottom-up mmap x86: Use mmap_*base, not mmap_*legacy_base, as low_limit for bottom-up mmap mm: Remove mmap_legacy_base and mmap_compat_legacy_base fields from mm_struct arch/parisc/kernel/sys_parisc.c | 8 +++- arch/s390/mm/mmap.c | 2 +- arch/sh/mm/mmap.c| 2 +- arch/sparc/kernel/sys_sparc_64.c | 2 +- arch/sparc/mm/hugetlbpage.c | 2 +- arch/x86/include/asm/elf.h | 2 +- arch/x86/kernel/sys_x86_64.c | 4 ++-- arch/x86/mm/hugetlbpage.c| 7 --- arch/x86/mm/mmap.c | 20 +--- include/linux/mm_types.h | 2 -- mm/debug.c | 4 ++-- mm/mmap.c| 2 +- 12 files changed, 26 insertions(+), 31 deletions(-) -- 2.20.1
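To make the cover letter's argument concrete, here is a userspace sketch of a bottom-up gap search (all names, addresses, and the `-12`/`-ENOMEM` convention are stand-ins, not the kernel's actual `vm_unmapped_area()`). The series' point is that the fallback should be invoked with `low = mm->mmap_base`, since everything below mmap_base was already scanned by the failed top-down pass:

```c
struct range { unsigned long start, end; };   /* [start, end) mapped */

/* Minimal bottom-up gap search over sorted mappings: find `len` free
 * bytes in [low, high), scanning upward. Returns (unsigned long)-12
 * (a stand-in for -ENOMEM) on failure. */
static unsigned long find_bottom_up(const struct range *maps, int n,
                                    unsigned long low, unsigned long high,
                                    unsigned long len)
{
    unsigned long addr = low;

    for (int i = 0; i < n; i++) {
        if (maps[i].end <= addr)
            continue;                 /* mapping entirely below cursor */
        if (maps[i].start >= addr + len)
            break;                    /* the gap before it fits */
        addr = maps[i].end;           /* skip past this mapping */
    }
    return (addr + len <= high) ? addr : (unsigned long)-12;
}

/* One mapping fills [0x2000, 0x3000); with a hypothetical mmap_base of
 * 0x2000 and the stack limit at 0x10000, the fallback finds 0x3000. */
static unsigned long demo_fallback(void)
{
    const struct range maps[] = { { 0x2000UL, 0x3000UL } };

    return find_bottom_up(maps, 1, 0x2000UL, 0x10000UL, 0x1000UL);
}
```

Calling the search with `low = mmap_base` scans only the gap between the mmap base and the stack, which is exactly the one region the top-down pass never covered.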
Re: [PATCH] NTB: test: remove a duplicate check
On 2019-06-18 11:32 p.m., Dan Carpenter wrote: > We already verified that the "nm->isr_ctx" allocation succeeded so there > is no need to check again here. > > Fixes: a6bed7a54165 ("NTB: Introduce NTB MSI Test Client") > Signed-off-by: Dan Carpenter Hmm, yup, not sure how that slipped through, must have been a bad rebase or something. Thanks Dan! Reviewed-by: Logan Gunthorpe > --- > Hey Logan, can you pick a patch prefix when you're introducing a new module? > "[PATCH] NTB/test: Introduce NTB MSI Test Client" or whatever. I don't quite follow you there. NTB doesn't really have a good standard for prefixes. NTB/test might have made sense. Logan
[PATCH RESEND 0/8] Fix mmap base in bottom-up mmap
(Sorry for the previous interrupted series) This series fixes the fallback of the top-down mmap: in case of failure, a bottom-up scheme can be tried as a last resort between the top-down mmap base and the stack, hoping for a large unused stack limit. Lots of architectures and even mm code start this fallback at TASK_UNMAPPED_BASE, which is useless since the top-down scheme already failed on the whole address space: instead, simply use mmap_base. Along the way, it allows us to get rid of mmap_legacy_base and mmap_compat_legacy_base from mm_struct. Note that arm and mips already implement this behaviour. Alexandre Ghiti (8): s390: Start fallback of top-down mmap at mm->mmap_base sh: Start fallback of top-down mmap at mm->mmap_base sparc: Start fallback of top-down mmap at mm->mmap_base x86, hugetlbpage: Start fallback of top-down mmap at mm->mmap_base mm: Start fallback top-down mmap at mm->mmap_base parisc: Use mmap_base, not mmap_legacy_base, as low_limit for bottom-up mmap x86: Use mmap_*base, not mmap_*legacy_base, as low_limit for bottom-up mmap mm: Remove mmap_legacy_base and mmap_compat_legacy_base fields from mm_struct arch/parisc/kernel/sys_parisc.c | 8 +++- arch/s390/mm/mmap.c | 2 +- arch/sh/mm/mmap.c| 2 +- arch/sparc/kernel/sys_sparc_64.c | 2 +- arch/sparc/mm/hugetlbpage.c | 2 +- arch/x86/include/asm/elf.h | 2 +- arch/x86/kernel/sys_x86_64.c | 4 ++-- arch/x86/mm/hugetlbpage.c| 7 --- arch/x86/mm/mmap.c | 20 +--- include/linux/mm_types.h | 2 -- mm/debug.c | 4 ++-- mm/mmap.c| 2 +- 12 files changed, 26 insertions(+), 31 deletions(-) -- 2.20.1
Re: [PATCH RFC 2/3] fonts: Use BUILD_BUG_ON() for checking empty font table
On Wed, 19 Jun 2019 01:05:58 +0200, Randy Dunlap wrote: > > On 6/18/19 1:34 PM, Takashi Iwai wrote: > > We have a nice macro, and the check of emptiness of the font table can > > be done in a simpler way. > > > > Signed-off-by: Takashi Iwai > > Hi, > > Looks good to me. > Acked-by: Randy Dunlap > > Also, would you mind adding TER16x32 to Documentation/fb/fbcon.rst, here: > (AFAIK that would be appropriate.) > > 1. fbcon=font: > > Select the initial font to use. The value 'name' can be any of the > compiled-in fonts: 10x18, 6x10, 7x14, Acorn8x8, MINI4x6, > PEARL8x8, ProFont6x11, SUN12x22, SUN8x16, VGA8x16, VGA8x8. OK, will submit another patch. thanks, Takashi
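For context on the macro discussed above, here is a minimal userspace sketch of a BUILD_BUG_ON()-style compile-time check rejecting an empty table (the real kernel macro in include/linux/build_bug.h is more elaborate, and the font list below is a hypothetical stand-in for the compiled-in fonts):

```c
/* Simplified stand-in for the kernel's BUILD_BUG_ON(): when cond is
 * true, the array size becomes negative and compilation fails, so an
 * empty font table is caught at build time rather than at runtime. */
#define BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* hypothetical stand-in for the compiled-in font list */
static const char *const builtin_fonts[] = { "6x10", "SUN8x16", "VGA8x16" };

static int num_builtin_fonts(void)
{
    BUILD_BUG_ON(ARRAY_SIZE(builtin_fonts) == 0); /* emptiness check */
    return (int)ARRAY_SIZE(builtin_fonts);
}
```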
[PATCH] NTB: test: remove a duplicate check
We already verified that the "nm->isr_ctx" allocation succeeded so there is no need to check again here. Fixes: a6bed7a54165 ("NTB: Introduce NTB MSI Test Client") Signed-off-by: Dan Carpenter --- Hey Logan, can you pick a patch prefix when you're introducing a new module? "[PATCH] NTB/test: Introduce NTB MSI Test Client" or whatever. drivers/ntb/test/ntb_msi_test.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/ntb/test/ntb_msi_test.c b/drivers/ntb/test/ntb_msi_test.c index 99d826ed9c34..9ba3c3162cd0 100644 --- a/drivers/ntb/test/ntb_msi_test.c +++ b/drivers/ntb/test/ntb_msi_test.c @@ -372,9 +372,6 @@ static int ntb_msit_probe(struct ntb_client *client, struct ntb_dev *ntb) if (ret) goto remove_dbgfs; - if (!nm->isr_ctx) - goto remove_dbgfs; - ntb_link_enable(ntb, NTB_SPEED_AUTO, NTB_WIDTH_AUTO); return 0; -- 2.20.1
Re: [PATCH][next] platform/chrome: wilco_ec: fix null pointer dereference on failed kzalloc
On Tue, Jun 18, 2019 at 04:39:24PM +0100, Colin King wrote: > diff --git a/drivers/platform/chrome/wilco_ec/event.c > b/drivers/platform/chrome/wilco_ec/event.c > index c975b76e6255..e251a989b152 100644 > --- a/drivers/platform/chrome/wilco_ec/event.c > +++ b/drivers/platform/chrome/wilco_ec/event.c > @@ -112,8 +112,11 @@ module_param(queue_size, int, 0644); > static struct ec_event_queue *event_queue_new(int capacity) > { > size_t entries_size = sizeof(struct ec_event *) * capacity; > - struct ec_event_queue *q = kzalloc(sizeof(*q) + entries_size, > -GFP_KERNEL); > + struct ec_event_queue *q; > + > + q = kzalloc(sizeof(*q) + entries_size, GFP_KERNEL); > + if (!q) > + return NULL; We have a new struct_size() macro designed for these allocations. q = kzalloc(struct_size(q, entries, capacity), GFP_KERNEL); The advantage is that it checks for integer overflows. regards, dan carpenter
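As a hedged illustration of Dan's suggestion, the sketch below mimics in userspace what the kernel's struct_size() achieves for this allocation: the struct size plus a flexible array, saturating to SIZE_MAX on overflow so the allocator fails cleanly instead of returning a short buffer. The field names and the calloc() stand-in for kzalloc() are assumptions, not the actual wilco_ec code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <stddef.h>

struct ec_event;                     /* element type, opaque here */

struct ec_event_queue {
    int head, tail, capacity;        /* hypothetical bookkeeping fields */
    struct ec_event *entries[];      /* flexible array member */
};

/* Userspace stand-in for struct_size(q, entries, n): the struct plus n
 * trailing pointers, saturating to SIZE_MAX when n would overflow. */
static size_t queue_alloc_size(size_t n)
{
    size_t elem = sizeof(struct ec_event *);

    if (n > (SIZE_MAX - sizeof(struct ec_event_queue)) / elem)
        return SIZE_MAX;
    return sizeof(struct ec_event_queue) + n * elem;
}

static struct ec_event_queue *event_queue_new(int capacity)
{
    /* calloc() stands in for kzalloc(..., GFP_KERNEL) */
    struct ec_event_queue *q = calloc(1, queue_alloc_size((size_t)capacity));

    if (!q)
        return NULL;
    q->capacity = capacity;
    return q;
}
```

A SIZE_MAX request makes the allocation itself fail, which is the overflow behaviour struct_size() buys over a bare multiply-and-add.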
Re: [PATCHv5 10/20] PCI: mobiveil: Fix the INTx process errors
On Fri, Jun 14, 2019 at 4:14 PM Lorenzo Pieralisi wrote: > > On Fri, Jun 14, 2019 at 12:38:51PM +0530, Karthikeyan Mitran wrote: > > Hi Lorenzo and Hou Zhiqiang > > PAB_INTP_AMBA_MISC_STAT does have other status in the higher bits, it > > should have been masked before checking for the status > > You are the maintainer for this driver, so if there is something to be > changed you must post a patch to that extent, I do not understand what > the above means, write the code to fix it, I won't do it. > > I am getting a bit annoyed with this Mobiveil driver so either you guys > sort this out or I will have to remove it from the kernel. > > > Acked-by: Karthikeyan Mitran > > Ok I assume this means you tested it but according to what you > say above, are there still issues with this code path ? Should > we update the patch ? Tested-by: Karthikeyan Mitran This patch fixes the INTx status extraction and handling, I don't see any need to update this patch. > > Moreover: > > https://kernelnewbies.org/PatchCulture > > Please read it and never top-post. Thank you very much, for the information. > > Thanks, > Lorenzo > > > On Wed, Jun 12, 2019 at 8:38 PM Lorenzo Pieralisi > > wrote: > > > > > > On Fri, Apr 12, 2019 at 08:36:12AM +, Z.q. Hou wrote: > > > > From: Hou Zhiqiang > > > > > > > > In the loop block, there is not code to update the loop key, > > > > this patch updates the loop key by re-read the INTx status > > > > register. > > > > > > > > This patch also add the clearing of the handled INTx status. > > > > > > > > Note: Need MV to test this fix. > > > > > > This means INTX were never tested and current code handling them is, > > > AFAICS, an infinite loop which is very very bad. > > > > > > This is a gross bug and must be fixed as soon as possible. > > > > > > I want Karthikeyan ACK and Tested-by on this patch. 
> > > > > > Lorenzo > > > > > > > Fixes: 9af6bcb11e12 ("PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP > > > > driver") > > > > Signed-off-by: Hou Zhiqiang > > > > Reviewed-by: Minghuan Lian > > > > Reviewed-by: Subrahmanya Lingappa > > > > --- > > > > V5: > > > > - Corrected and retouched the subject and changelog. > > > > > > > > drivers/pci/controller/pcie-mobiveil.c | 13 + > > > > 1 file changed, 9 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/drivers/pci/controller/pcie-mobiveil.c > > > > b/drivers/pci/controller/pcie-mobiveil.c > > > > index 4ba458474e42..78e575e71f4d 100644 > > > > --- a/drivers/pci/controller/pcie-mobiveil.c > > > > +++ b/drivers/pci/controller/pcie-mobiveil.c > > > > @@ -361,6 +361,7 @@ static void mobiveil_pcie_isr(struct irq_desc *desc) > > > > /* Handle INTx */ > > > > if (intr_status & PAB_INTP_INTX_MASK) { > > > > shifted_status = csr_readl(pcie, PAB_INTP_AMBA_MISC_STAT); > > > > + shifted_status &= PAB_INTP_INTX_MASK; > > > > shifted_status >>= PAB_INTX_START; > > > > do { > > > > for_each_set_bit(bit, &shifted_status, > > > > PCI_NUM_INTX) { > > > > @@ -372,12 +373,16 @@ static void mobiveil_pcie_isr(struct irq_desc > > > > *desc) > > > > dev_err_ratelimited(dev, > > > > "unexpected IRQ, INT%d\n", > > > > bit); > > > > > > > > - /* clear interrupt */ > > > > - csr_writel(pcie, > > > > -shifted_status << > > > > PAB_INTX_START, > > > > + /* clear interrupt handled */ > > > > + csr_writel(pcie, 1 << (PAB_INTX_START + > > > > bit), > > > > PAB_INTP_AMBA_MISC_STAT); > > > > } > > > > - } while ((shifted_status >> PAB_INTX_START) != 0); > > > > + > > > > + shifted_status = csr_readl(pcie, > > > > + > > > > PAB_INTP_AMBA_MISC_STAT); > > > > + shifted_status &= PAB_INTP_INTX_MASK; > > > > + shifted_status >>= PAB_INTX_START; > > > > + } while (shifted_status != 0); > > > > } > > > > > > > > /* read extra MSI status register */ > > > > -- > > > > 2.17.1 > > > > > > > > > > > > -- Mobiveil INC., CONFIDENTIALITY NOTICE: This 
e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain proprietary confidential or privileged information or otherwise be protected by law. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please notify the sender and destroy all copies and the original message.
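The fix discussed in the thread above can be modelled in userspace. In the sketch below, the register layout and bit positions are hypothetical; only the control flow mirrors the patch: mask the status register to the INTx field before shifting, clear each handled line individually with a write-1-to-clear, and re-read the status every iteration so the loop can terminate.

```c
#include <stdint.h>

#define PAB_INTX_START     5u
#define PAB_INTP_INTX_MASK (0xfu << PAB_INTX_START)  /* 4 INTx lines */

static uint32_t misc_stat;                        /* fake MISC_STAT register */

static uint32_t csr_read(void)    { return misc_stat; }
static void csr_write(uint32_t v) { misc_stat &= ~v; } /* W1C semantics */

static int handle_intx(void)
{
    int handled = 0;
    uint32_t shifted = (csr_read() & PAB_INTP_INTX_MASK) >> PAB_INTX_START;

    while (shifted) {
        for (unsigned bit = 0; bit < 4; bit++) {
            if (shifted & (1u << bit)) {
                handled++;                               /* demux to virq */
                csr_write(1u << (PAB_INTX_START + bit)); /* clear just it */
            }
        }
        /* re-read so newly raised lines are seen and the loop ends */
        shifted = (csr_read() & PAB_INTP_INTX_MASK) >> PAB_INTX_START;
    }
    return handled;
}
```

Without the mask, unrelated high status bits would keep `shifted` non-zero forever, which is the infinite loop Lorenzo objects to.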
Re: [PATCH] mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind
On Tue 18-06-19 14:13:16, Yang Shi wrote: [...] > > > > > Change migrate_page_add() to check if the page is movable or not, if > > > > > it > > > > > is unmovable, just return -EIO. We don't have to check non-LRU > > > > > movable > > > > > pages since just zsmalloc and virtio-baloon support this. And, they > > > > > should be not able to reach here. > > > > You are not checking whether the page is movable, right? You only rely > > > > on PageLRU check which is not really an equivalent thing. There are > > > > movable pages which are not LRU and also pages might be off LRU > > > > temporarily for many reasons so this could lead to false positives. > > > I'm supposed non-LRU movable pages could not reach here. Since most of > > > them > > > are not mmapable, i.e. virtio-balloon, zsmalloc. zram device is mmapable, > > > but the page fault to that vma would end up allocating user space pages > > > which are on LRU. If I miss something please let me know. > > That might be true right now but it is a very subtle assumption that > > might break easily in the future. The point is still that even LRU pages > > might be isolated from the LRU list temporarily and you do not want this > > to cause the failure easily. > > I used to have !__PageMovable(page), but it was removed since the > aforementioned reason. I could add it back. > > For the temporary off LRU page, I did a quick search, it looks the most > paths have to acquire mmap_sem, so it can't race with us here. Page > reclaim/compaction looks like the only race. But, since the mapping should > be preserved even though the page is off LRU temporarily unless the page is > reclaimed, so we should be able to exclude temporary off LRU pages by > calling page_mapping() and page_anon_vma(). > > So, the fix may look like: > > if (!PageLRU(head) && !__PageMovable(page)) { > if (!(page_mapping(page) || page_anon_vma(page))) > return -EIO; This is getting even more muddy TBH. 
Is there any reason that we have to handle this problem during the isolation phase rather the migration? -- Michal Hocko SUSE Labs
[PATCH 5/8] mm: Start fallback top-down mmap at mm->mmap_base
In case of mmap failure in top-down mode, there is no need to go through the whole address space again for the bottom-up fallback: the goal of this fallback is to find, as a last resort, space between the top-down mmap base and the stack, which is the only place not covered by the top-down mmap. Signed-off-by: Alexandre Ghiti --- mm/mmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/mmap.c b/mm/mmap.c index dedae10cb6e2..e563145c1ff4 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2185,7 +2185,7 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr, if (offset_in_page(addr)) { VM_BUG_ON(addr != -ENOMEM); info.flags = 0; - info.low_limit = TASK_UNMAPPED_BASE; + info.low_limit = arch_get_mmap_base(addr, mm->mmap_base); info.high_limit = mmap_end; addr = vm_unmapped_area(&info); } -- 2.20.1
Re: [PATCH 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel
Hi Naveen, Sorry I meant to reply to this earlier .. :/ "Naveen N. Rao" writes: > With -mprofile-kernel, gcc emits 'mflr r0', followed by 'bl _mcount' to > enable function tracing and profiling. So far, with dynamic ftrace, we > used to only patch out the branch to _mcount(). However, mflr is > executed by the branch unit that can only execute one per cycle on > POWER9 and shared with branches, so it would be nice to avoid it where > possible. > > We cannot simply nop out the mflr either. When enabling function > tracing, there can be a race if tracing is enabled when some thread was > interrupted after executing a nop'ed out mflr. In this case, the thread > would execute the now-patched-in branch to _mcount() without having > executed the preceding mflr. > > To solve this, we now enable function tracing in 2 steps: patch in the > mflr instruction, use synchronize_rcu_tasks() to ensure all existing > threads make progress, and then patch in the branch to _mcount(). We > override ftrace_replace_code() with a powerpc64 variant for this > purpose. According to the ISA we're not allowed to patch mflr at runtime. See the section on "CMODX". I'm also not convinced the ordering between the two patches is guaranteed by the ISA, given that there's possibly no isync on the other CPU. But I haven't had time to dig into it sorry, hopefully later in the week? 
cheers > diff --git a/arch/powerpc/kernel/trace/ftrace.c > b/arch/powerpc/kernel/trace/ftrace.c > index 517662a56bdc..5e2b29808af1 100644 > --- a/arch/powerpc/kernel/trace/ftrace.c > +++ b/arch/powerpc/kernel/trace/ftrace.c > @@ -125,7 +125,7 @@ __ftrace_make_nop(struct module *mod, > { > unsigned long entry, ptr, tramp; > unsigned long ip = rec->ip; > - unsigned int op, pop; > + unsigned int op; > > /* read where this goes */ > if (probe_kernel_read(&op, (void *)ip, sizeof(int))) { > @@ -160,8 +160,6 @@ __ftrace_make_nop(struct module *mod, > > #ifdef CONFIG_MPROFILE_KERNEL > /* When using -mkernel_profile there is no load to jump over */ > - pop = PPC_INST_NOP; > - > if (probe_kernel_read(&op, (void *)(ip - 4), 4)) { > pr_err("Fetching instruction at %lx failed.\n", ip - 4); > return -EFAULT; > @@ -169,26 +167,23 @@ __ftrace_make_nop(struct module *mod, > > /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */ > if (op != PPC_INST_MFLR && op != PPC_INST_STD_LR) { > - pr_err("Unexpected instruction %08x around bl _mcount\n", op); > + pr_err("Unexpected instruction %08x before bl _mcount\n", op); > return -EINVAL; > } > -#else > - /* > - * Our original call site looks like: > - * > - * bl > - * ld r2,XX(r1) > - * > - * Milton Miller pointed out that we can not simply nop the branch. > - * If a task was preempted when calling a trace function, the nops > - * will remove the way to restore the TOC in r2 and the r2 TOC will > - * get corrupted. > - * > - * Use a b +8 to jump over the load. 
> - */ > > - pop = PPC_INST_BRANCH | 8; /* b +8 */ > + /* We should patch out the bl to _mcount first */ > + if (patch_instruction((unsigned int *)ip, PPC_INST_NOP)) { > + pr_err("Patching NOP failed.\n"); > + return -EPERM; > + } > > + /* then, nop out the preceding 'mflr r0' as an optimization */ > + if (op == PPC_INST_MFLR && > + patch_instruction((unsigned int *)(ip - 4), PPC_INST_NOP)) { > + pr_err("Patching NOP failed.\n"); > + return -EPERM; > + } > +#else > /* >* Check what is in the next instruction. We can see ld r2,40(r1), but >* on first pass after boot we will see mflr r0. > @@ -202,12 +197,25 @@ __ftrace_make_nop(struct module *mod, > pr_err("Expected %08x found %08x\n", PPC_INST_LD_TOC, op); > return -EINVAL; > } > -#endif /* CONFIG_MPROFILE_KERNEL */ > > - if (patch_instruction((unsigned int *)ip, pop)) { > + /* > + * Our original call site looks like: > + * > + * bl > + * ld r2,XX(r1) > + * > + * Milton Miller pointed out that we can not simply nop the branch. > + * If a task was preempted when calling a trace function, the nops > + * will remove the way to restore the TOC in r2 and the r2 TOC will > + * get corrupted. > + * > + * Use a b +8 to jump over the load. > + */ > + if (patch_instruction((unsigned int *)ip, PPC_INST_BRANCH | 8)) { > pr_err("Patching NOP failed.\n"); > return -EPERM; > } > +#endif /* CONFIG_MPROFILE_KERNEL */ > > return 0; > } > @@ -421,6 +429,26 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace > *rec, unsigned long addr) > return -EPERM; > } > > +#ifdef CONFIG_MPROFILE_KERNEL > + /* Nop out the preceding 'mflr r0' as an optim
[PATCH 4/8] x86, hugetlbpage: Start fallback of top-down mmap at mm->mmap_base
In case of mmap failure in top-down mode, there is no need to go through the whole address space again for the bottom-up fallback: the goal of this fallback is to find, as a last resort, space between the top-down mmap base and the stack, which is the only place not covered by the top-down mmap. Signed-off-by: Alexandre Ghiti --- arch/x86/mm/hugetlbpage.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c index fab095362c50..4b90339aef50 100644 --- a/arch/x86/mm/hugetlbpage.c +++ b/arch/x86/mm/hugetlbpage.c @@ -106,11 +106,12 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file, { struct hstate *h = hstate_file(file); struct vm_unmapped_area_info info; + unsigned long mmap_base = get_mmap_base(0); info.flags = VM_UNMAPPED_AREA_TOPDOWN; info.length = len; info.low_limit = PAGE_SIZE; - info.high_limit = get_mmap_base(0); + info.high_limit = mmap_base; /* * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area @@ -132,7 +133,7 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file, if (addr & ~PAGE_MASK) { VM_BUG_ON(addr != -ENOMEM); info.flags = 0; - info.low_limit = TASK_UNMAPPED_BASE; + info.low_limit = mmap_base; info.high_limit = TASK_SIZE_LOW; addr = vm_unmapped_area(&info); } -- 2.20.1
RE: [PATCH] net: stmmac: add sanity check to device_property_read_u32_array call
Hi Colin, > Currently the call to device_property_read_u32_array is not error checked > leading to potential garbage values in the delays array that are then used > in msleep delays. Add a sanity check to the property fetching. > > Addresses-Coverity: ("Uninitialized scalar variable") > Signed-off-by: Colin Ian King I have also sent a patch [0] to initialize the array. can you please look at my patch so we can work out which one to use? my concern is that the "snps,reset-delays-us" property is optional, while the current dt-bindings documentation states that it's a required property. in reality it isn't, there are boards (two examples are mentioned in my patch: [0]) without it. so I believe that the resulting behavior has to be: 1. don't delay if this property is missing (instead of delaying for ms) 2. don't error out if this property is missing your patch covers #1, can you please check whether #2 is also covered? I tested case #2 when submitting my patch and it worked fine (even though I could not reproduce the garbage values which are being read on some boards) Thank you! Martin [0] https://lkml.org/lkml/2019/4/19/638
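Martin's two requirements can be sketched as follows (the function names are hypothetical stand-ins, not the actual stmmac or device-property API): on a failed read of the optional property, zero the delays so no msleep() happens (#1), and swallow the error instead of propagating it (#2).

```c
#include <string.h>

/* fake property reader: rc < 0 simulates "property missing" */
static int fake_read_u32_array(int rc, unsigned int *out, int n)
{
    if (rc < 0)
        return rc;
    for (int i = 0; i < n; i++)
        out[i] = 100;            /* pretend the DT supplied 100 us */
    return 0;
}

static int setup_reset_delays(int property_rc, unsigned int delays[3])
{
    if (fake_read_u32_array(property_rc, delays, 3)) {
        /* #1: treat a missing optional property as "no delay" */
        memset(delays, 0, 3 * sizeof(delays[0]));
        /* #2: fall through, do not return the error */
    }
    return 0;
}
```

Probing then succeeds on boards that omit the property, and zero-length delays make the subsequent sleeps no-ops instead of sleeping on garbage values.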
[PATCH 3/8] sparc: Start fallback of top-down mmap at mm->mmap_base
In case of mmap failure in top-down mode, there is no need to go through the whole address space again for the bottom-up fallback: the goal of this fallback is to find, as a last resort, space between the top-down mmap base and the stack, which is the only place not covered by the top-down mmap. Signed-off-by: Alexandre Ghiti --- arch/sparc/kernel/sys_sparc_64.c | 2 +- arch/sparc/mm/hugetlbpage.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c index ccc88926bc00..ea1de1e5fa8d 100644 --- a/arch/sparc/kernel/sys_sparc_64.c +++ b/arch/sparc/kernel/sys_sparc_64.c @@ -206,7 +206,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, if (addr & ~PAGE_MASK) { VM_BUG_ON(addr != -ENOMEM); info.flags = 0; - info.low_limit = TASK_UNMAPPED_BASE; + info.low_limit = mm->mmap_base; info.high_limit = STACK_TOP32; addr = vm_unmapped_area(&info); } diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c index f78793a06bbd..9c67f805abc8 100644 --- a/arch/sparc/mm/hugetlbpage.c +++ b/arch/sparc/mm/hugetlbpage.c @@ -86,7 +86,7 @@ hugetlb_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, if (addr & ~PAGE_MASK) { VM_BUG_ON(addr != -ENOMEM); info.flags = 0; - info.low_limit = TASK_UNMAPPED_BASE; + info.low_limit = mm->mmap_base; info.high_limit = STACK_TOP32; addr = vm_unmapped_area(&info); } -- 2.20.1
Re: [PATCH V4 1/2] PCI: dwc: Add API support to de-initialize host
Hi Lorenzo, On 18/06/19 7:58 PM, Lorenzo Pieralisi wrote: > On Tue, Jun 18, 2019 at 04:21:17PM +0530, Vidya Sagar wrote: > > [...] > >>> 2) It is not related to this patch but I fail to see the reasoning >>> behind the __ in __dw_pci_read_dbi(), there is no no-underscore >>> equivalent so its definition is somewhat questionable, maybe >>> we should clean-it up (for dbi2 alike). >> Separate no-underscore versions are present in pcie-designware.h for >> each width (i.e. l/w/b) as inline and are calling __ versions passing >> size as argument. > > I understand - the __ prologue was added in b50b2db266d8 maybe > Kishon can help us understand the __ rationale. > > I am happy to merge it as is, I was just curious about the > __ annotation (not related to this patch). In commit b50b2db266d8a8c303e8d88590 ("PCI: dwc: all: Modify dbi accessors to take dbi_base as argument"), dbi accessors was modified to take dbi_base as argument (since we wanted to write to dbics2 address space). We didn't want to change all the drivers invoking dbi accessors to pass the dbi_base. So we added "__" variant to take dbi_base as argument and the drivers continued to invoke existing dbi accessors which in-turn invoked "__" version with dbi_base as argument. I agree there could be some cleanup since in commit a509d7d9af5ebf86ffbefa98e49761d ("PCI: dwc: all: Modify dbi accessors to access data of 4/2/1 bytes"), we modified __dw_pcie_readl_dbi() to __dw_pcie_write_dbi() when it could have been directly modified to dw_pcie_write_dbi(). Thanks Kishon
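To illustrate the layering Kishon describes, here is a userspace sketch of the pattern (names are shortened and the register space is faked with a byte buffer; the real accessors use MMIO and take a struct dw_pcie): the sized no-underscore helpers stay as thin inline wrappers, while the double-underscore worker takes the base address and access width as explicit arguments.

```c
#include <stdint.h>
#include <stddef.h>

static uint8_t dbi_space[16];   /* fake little-endian dbi register space */

/* Single worker parameterized by base and width, in the spirit of the
 * "__" variants discussed above (a sketch, not the kernel code). */
static uint32_t __read_dbi(const uint8_t *base, uint32_t reg, size_t size)
{
    uint32_t val = 0;

    for (size_t i = 0; i < size; i++)
        val |= (uint32_t)base[reg + i] << (8 * i);  /* 1/2/4-byte access */
    return val;
}

/* Sized helpers keep the original call signatures and pass the width. */
static inline uint32_t readl_dbi(uint32_t reg)
{
    return __read_dbi(dbi_space, reg, 0x4);
}

static inline uint16_t readw_dbi(uint32_t reg)
{
    return (uint16_t)__read_dbi(dbi_space, reg, 0x2);
}

static inline uint8_t readb_dbi(uint32_t reg)
{
    return (uint8_t)__read_dbi(dbi_space, reg, 0x1);
}
```

The design point from the thread: callers never changed, because only the worker grew the extra base/size parameters.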
[PATCH 2/8] sh: Start fallback of top-down mmap at mm->mmap_base
In case of mmap failure in top-down mode, there is no need to go through the whole address space again for the bottom-up fallback: the goal of this fallback is to find, as a last resort, space between the top-down mmap base and the stack, which is the only place not covered by the top-down mmap. Signed-off-by: Alexandre Ghiti --- arch/sh/mm/mmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/sh/mm/mmap.c b/arch/sh/mm/mmap.c index 6a1a1297baae..4c7da92473dd 100644 --- a/arch/sh/mm/mmap.c +++ b/arch/sh/mm/mmap.c @@ -135,7 +135,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, if (addr & ~PAGE_MASK) { VM_BUG_ON(addr != -ENOMEM); info.flags = 0; - info.low_limit = TASK_UNMAPPED_BASE; + info.low_limit = mm->mmap_base; info.high_limit = TASK_SIZE; addr = vm_unmapped_area(&info); } -- 2.20.1
[PATCH 1/8] s390: Start fallback of top-down mmap at mm->mmap_base
In case of mmap failure in top-down mode, there is no need to go through the whole address space again for the bottom-up fallback: the goal of this fallback is to find, as a last resort, space between the top-down mmap base and the stack, which is the only place not covered by the top-down mmap. Signed-off-by: Alexandre Ghiti --- arch/s390/mm/mmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c index cbc718ba6d78..4a222969843b 100644 --- a/arch/s390/mm/mmap.c +++ b/arch/s390/mm/mmap.c @@ -166,7 +166,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, if (addr & ~PAGE_MASK) { VM_BUG_ON(addr != -ENOMEM); info.flags = 0; - info.low_limit = TASK_UNMAPPED_BASE; + info.low_limit = mm->mmap_base; info.high_limit = TASK_SIZE; addr = vm_unmapped_area(&info); if (addr & ~PAGE_MASK) -- 2.20.1
[PATCH 0/8] Fix mmap base in bottom-up mmap
This series fixes the fallback of the top-down mmap: in case of failure, a bottom-up scheme can be tried as a last resort between the top-down mmap base and the stack, hoping for a large unused stack limit. Lots of architectures and even mm code start this fallback at TASK_UNMAPPED_BASE, which is useless since the top-down scheme already failed on the whole address space: instead, simply use mmap_base. Along the way, it allows us to get rid of mmap_legacy_base and mmap_compat_legacy_base from mm_struct. Note that arm and mips already implement this behaviour. Alexandre Ghiti (8): s390: Start fallback of top-down mmap at mm->mmap_base sh: Start fallback of top-down mmap at mm->mmap_base sparc: Start fallback of top-down mmap at mm->mmap_base x86, hugetlbpage: Start fallback of top-down mmap at mm->mmap_base mm: Start fallback top-down mmap at mm->mmap_base parisc: Use mmap_base, not mmap_legacy_base, as low_limit for bottom-up mmap x86: Use mmap_*base, not mmap_*legacy_base, as low_limit for bottom-up mmap mm: Remove mmap_legacy_base and mmap_compat_legacy_base fields from mm_struct arch/parisc/kernel/sys_parisc.c | 8 +++- arch/s390/mm/mmap.c | 2 +- arch/sh/mm/mmap.c| 2 +- arch/sparc/kernel/sys_sparc_64.c | 2 +- arch/sparc/mm/hugetlbpage.c | 2 +- arch/x86/include/asm/elf.h | 2 +- arch/x86/kernel/sys_x86_64.c | 4 ++-- arch/x86/mm/hugetlbpage.c| 7 --- arch/x86/mm/mmap.c | 20 +--- include/linux/mm_types.h | 2 -- mm/debug.c | 4 ++-- mm/mmap.c| 2 +- 12 files changed, 26 insertions(+), 31 deletions(-) -- 2.20.1
Re: [LINUX PATCH v12 3/3] mtd: rawnand: arasan: Add support for Arasan NAND Flash Controller
On Mon, Jan 28, 2019 at 10:27:39AM +0100, Miquel Raynal wrote: Hi Miquel, > Hi Naga, > > Naga Sureshkumar Relli wrote on Mon, 28 Jan 2019 > 06:04:53 +: > > > Hi Boris & Miquel, > > > > Could you please provide your thoughts on this driver to support HW-ECC? > > As I said previously, there is no way to detect errors beyond N bit. > > I am ok to update the driver based on your inputs. > > We won't support the ECC engine. It simply cannot be used reliably. > > I am working on a generic ECC engine object. It's gonna take a few > months until it gets merged but after that you could update the > controller driver to drop any ECC-related function. Although the ECC Could you please let me know when we can expect the generic ECC engine update in mtd NAND? Based on that, I will plan to update the ARASAN NAND driver along with your comments mentioned above, as you know there is a limitation in the ARASAN NAND controller in detecting ECC errors. I am following this series https://patchwork.kernel.org/patch/10838705/ Thanks, Naga Sureshkumar Relli > engine part is blocking, raw access still look wrong and the driver > still needs changes. > > Thanks, > Miquèl > > __ > Linux MTD discussion mailing list > http://lists.infradead.org/mailman/listinfo/linux-mtd/
[GIT PULL] ARM: TI SOC updates for v5.3
The following changes since commit cd6c84d8f0cdc911df435bb075ba22ce3c605b07: Linux 5.2-rc2 (2019-05-26 16:49:19 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone.git tags/drivers_soc_for_5.3 for you to fetch changes up to 4c960505df44b94001178575a505dd8315086edc: firmware: ti_sci: Fix gcc unused-but-set-variable warning (2019-06-18 21:32:25 -0700) SOC: TI SCI updates for v5.3 - Couple of fixes to handle resource ranges and requesting response always from firmware; - Add processor control - Add support APIs for DMA - Fix the SPDX license plate - Unused variable warning fix Andrew F. Davis (1): firmware: ti_sci: Always request response from firmware Nishad Kamdar (1): firmware: ti_sci: Use the correct style for SPDX License Identifier Peter Ujfalusi (2): firmware: ti_sci: Add resource management APIs for ringacc, psi-l and udma firmware: ti_sci: Parse all resource ranges even if some is not available Suman Anna (1): firmware: ti_sci: Add support for processor control YueHaibing (1): firmware: ti_sci: Fix gcc unused-but-set-variable warning drivers/firmware/ti_sci.c | 1143 +++- drivers/firmware/ti_sci.h | 812 ++- include/linux/soc/ti/ti_sci_protocol.h | 246 +++ 3 files changed, 2051 insertions(+), 150 deletions(-)
Re: [PATCH 5.1 000/115] 5.1.12-stable review
On Tue, 18 Jun 2019 at 19:05, Greg Kroah-Hartman wrote:
>
> On Tue, Jun 18, 2019 at 06:04:25PM +0530, Naresh Kamboju wrote:
> > On Tue, 18 Jun 2019 at 02:50, Greg Kroah-Hartman wrote:
> > >
> > > This is the start of the stable review cycle for the 5.1.12 release.
> > > There are 115 patches in this series, all will be posted as a response
> > > to this one. If anyone has any issues with these being applied, please
> > > let me know.
> > >
> > > Responses should be made by Wed 19 Jun 2019 09:06:21 PM UTC.
> > > Anything received after that time might be too late.
> > >
> > > The whole patch series can be found in one patch at:
> > >     https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.1.12-rc1.gz
> > > or in the git tree and branch at:
> > >     git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.1.y
> > > and the diffstat can be found below.
> > >
> > > thanks,
> > >
> > > greg k-h
> >
> > Results from Linaro’s test farm.
> > No regressions on arm64, arm, x86_64, and i386.
> >
> > NOTE:
> > kernel/workqueue.c:3030 __flush_work+0x2c2/0x2d0
> > Kernel warning has been fixed by the patch below.
> >
> > John Fastabend
> >     bpf: sockmap, only stop/flush strp if it was enabled at some point
>
> What is the git commit id for this patch?

Upstream commit 014894360ec95abe868e94416b3dd6569f6e2c0c

- Naresh
Re: [PATCH v3 -next] firmware: ti_sci: Fix gcc unused-but-set-variable warning
On 6/17/19 11:41 AM, Suman Anna wrote:
> On 6/15/19 7:50 AM, YueHaibing wrote:
> > Fixes gcc '-Wunused-but-set-variable' warning:
> >
> > drivers/firmware/ti_sci.c: In function ti_sci_cmd_ring_config:
> > drivers/firmware/ti_sci.c:2035:17: warning: variable dev set but not used [-Wunused-but-set-variable]
> > drivers/firmware/ti_sci.c: In function ti_sci_cmd_ring_get_config:
> > drivers/firmware/ti_sci.c:2104:17: warning: variable dev set but not used [-Wunused-but-set-variable]
> > drivers/firmware/ti_sci.c: In function ti_sci_cmd_rm_udmap_tx_ch_cfg:
> > drivers/firmware/ti_sci.c:2287:17: warning: variable dev set but not used [-Wunused-but-set-variable]
> > drivers/firmware/ti_sci.c: In function ti_sci_cmd_rm_udmap_rx_ch_cfg:
> > drivers/firmware/ti_sci.c:2357:17: warning: variable dev set but not used [-Wunused-but-set-variable]
> >
> > Use the 'dev' variable instead of info->dev to fix this.
> >
> > Reported-by: Hulk Robot
> > Signed-off-by: YueHaibing
> > Acked-by: Suman Anna
>
> Hi Santosh,
>
> Can you pick up this patch? It goes on top of your for_5.3/driver-soc branch.

Applied.
Re: [PATCH] firmware: ti_sci: Use the correct style for SPDX License Identifier
On 6/14/19 6:57 AM, Nishad Kamdar wrote: This patch corrects the SPDX License Identifier style in header file related to Firmware Drivers for Texas Instruments SCI Protocol. For C header files Documentation/process/license-rules.rst mandates C-like comments (opposed to C source files where C++ style should be used) Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46 Suggested-by: Joe Perches Signed-off-by: Nishad Kamdar --- Applied
Re: [PATCH v2 1/1] cpuidle-powernv : forced wakeup for stop states
Abhishek Goel's on June 17, 2019 7:56 pm: > Currently, the cpuidle governors determine what idle state a idling CPU > should enter into based on heuristics that depend on the idle history on > that CPU. Given that no predictive heuristic is perfect, there are cases > where the governor predicts a shallow idle state, hoping that the CPU will > be busy soon. However, if no new workload is scheduled on that CPU in the > near future, the CPU may end up in the shallow state. > > This is problematic, when the predicted state in the aforementioned > scenario is a shallow stop state on a tickless system. As we might get > stuck into shallow states for hours, in absence of ticks or interrupts. > > To address this, We forcefully wakeup the cpu by setting the > decrementer. The decrementer is set to a value that corresponds with the > residency of the next available state. Thus firing up a timer that will > forcefully wakeup the cpu. Few such iterations will essentially train the > governor to select a deeper state for that cpu, as the timer here > corresponds to the next available cpuidle state residency. Thus, cpu will > eventually end up in the deepest possible state. > > Signed-off-by: Abhishek Goel > --- > > Auto-promotion > v1 : started as auto promotion logic for cpuidle states in generic > driver > v2 : Removed timeout_needed and rebased the code to upstream kernel > Forced-wakeup > v1 : New patch with name of forced wakeup started > v2 : Extending the forced wakeup logic for all states. Setting the > decrementer instead of queuing up a hrtimer to implement the logic. 
> > drivers/cpuidle/cpuidle-powernv.c | 38 +++ > 1 file changed, 38 insertions(+) > > diff --git a/drivers/cpuidle/cpuidle-powernv.c > b/drivers/cpuidle/cpuidle-powernv.c > index 84b1ebe212b3..bc9ca18ae7e3 100644 > --- a/drivers/cpuidle/cpuidle-powernv.c > +++ b/drivers/cpuidle/cpuidle-powernv.c > @@ -46,6 +46,26 @@ static struct stop_psscr_table > stop_psscr_table[CPUIDLE_STATE_MAX] __read_mostly > static u64 default_snooze_timeout __read_mostly; > static bool snooze_timeout_en __read_mostly; > > +static u64 forced_wakeup_timeout(struct cpuidle_device *dev, > + struct cpuidle_driver *drv, > + int index) > +{ > + int i; > + > + for (i = index + 1; i < drv->state_count; i++) { > + struct cpuidle_state *s = &drv->states[i]; > + struct cpuidle_state_usage *su = &dev->states_usage[i]; > + > + if (s->disabled || su->disable) > + continue; > + > + return (s->target_residency + 2 * s->exit_latency) * > + tb_ticks_per_usec; > + } > + > + return 0; > +} It would be nice to not have this kind of loop iteration in the idle fast path. Can we add a flag or something to the idle state? > + > static u64 get_snooze_timeout(struct cpuidle_device *dev, > struct cpuidle_driver *drv, > int index) > @@ -144,8 +164,26 @@ static int stop_loop(struct cpuidle_device *dev, >struct cpuidle_driver *drv, >int index) > { > + u64 dec_expiry_tb, dec, timeout_tb, forced_wakeup; > + > + dec = mfspr(SPRN_DEC); > + timeout_tb = forced_wakeup_timeout(dev, drv, index); > + forced_wakeup = 0; > + > + if (timeout_tb && timeout_tb < dec) { > + forced_wakeup = 1; > + dec_expiry_tb = mftb() + dec; > + } The compiler probably can't optimise away the SPR manipulations so try to avoid them if possible. > + > + if (forced_wakeup) > + mtspr(SPRN_DEC, timeout_tb); This should just be put in the above 'if'. 
> +
>  	power9_idle_type(stop_psscr_table[index].val,
>  			 stop_psscr_table[index].mask);
> +
> +	if (forced_wakeup)
> +		mtspr(SPRN_DEC, dec_expiry_tb - mftb());

This will sometimes go negative and result in another timer interrupt. It
also breaks irq work (which can be set here by machine check, I believe).
May need to implement some timer code to do this for you:

static void reset_dec_after_idle(void)
{
	u64 now;
	u64 *next_tb;

	if (test_irq_work_pending())
		return;
	now = mftb();
	next_tb = this_cpu_ptr(&decrementers_next_tb);
	if (now >= *next_tb)
		return;
	set_dec(*next_tb - now);
	if (test_irq_work_pending())
		set_dec(1);
}

Something vaguely like that. See timer_interrupt().

Thanks,
Nick
[PATCH 0/1] One cleanup patch for FPGA
Hi Greg, please take this cleanup patch. It's been on the list but somehow fell through the cracks. Thanks, Moritz Enrico Weigelt (1): drivers: fpga: Kconfig: pedantic cleanups drivers/fpga/Kconfig | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- 2.22.0
[PATCH 1/1] drivers: fpga: Kconfig: pedantic cleanups
From: Enrico Weigelt Formatting of Kconfig files doesn't look so pretty, so just take damp cloth and clean it up. Signed-off-by: Enrico Weigelt Signed-off-by: Moritz Fischer --- drivers/fpga/Kconfig | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig index 8072c195d831..474f304ec109 100644 --- a/drivers/fpga/Kconfig +++ b/drivers/fpga/Kconfig @@ -26,9 +26,9 @@ config FPGA_MGR_SOCFPGA_A10 FPGA manager driver support for Altera Arria10 SoCFPGA. config ALTERA_PR_IP_CORE -tristate "Altera Partial Reconfiguration IP Core" -help - Core driver support for Altera Partial Reconfiguration IP component + tristate "Altera Partial Reconfiguration IP Core" + help + Core driver support for Altera Partial Reconfiguration IP component config ALTERA_PR_IP_CORE_PLAT tristate "Platform support of Altera Partial Reconfiguration IP Core" -- 2.22.0
[PATCH V6 1/3] mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory()
Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs entries between memory block and node. It first checks pfn validity with pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config (arm64 has this enabled) pfn_valid_within() calls pfn_valid(). pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID) which scans all mapped memblock regions with memblock_is_map_memory(). This creates a problem in memory hot remove path which has already removed given memory range from memory block with memblock_[remove|free] before arriving at unregister_mem_sect_under_nodes(). Hence get_nid_for_pfn() returns -1 skipping subsequent sysfs_remove_link() calls leaving node <-> memory block sysfs entries as is. Subsequent memory add operation hits BUG_ON() because of existing sysfs entries. [ 62.007176] NUMA: Unknown node for memory at 0x68000, assuming node 0 [ 62.052517] [ cut here ] [ 62.053211] kernel BUG at mm/memory_hotplug.c:1143! [ 62.053868] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 62.054589] Modules linked in: [ 62.054999] CPU: 19 PID: 3275 Comm: bash Not tainted 5.1.0-rc2-4-g28cea40b2683 #41 [ 62.056274] Hardware name: linux,dummy-virt (DT) [ 62.057166] pstate: 4045 (nZcv daif +PAN -UAO) [ 62.058083] pc : add_memory_resource+0x1cc/0x1d8 [ 62.058961] lr : add_memory_resource+0x10c/0x1d8 [ 62.059842] sp : 168b3ce0 [ 62.060477] x29: 168b3ce0 x28: 8005db546c00 [ 62.061501] x27: x26: [ 62.062509] x25: 111ef000 x24: 111ef5d0 [ 62.063520] x23: x22: 0006bfff [ 62.064540] x21: ffef x20: 006c [ 62.065558] x19: 0068 x18: 0024 [ 62.066566] x17: x16: [ 62.067579] x15: x14: 8005e412e890 [ 62.068588] x13: 8005d6b105d8 x12: [ 62.069610] x11: 8005d6b10490 x10: 0040 [ 62.070615] x9 : 8005e412e898 x8 : 8005e412e890 [ 62.071631] x7 : 8005d6b105d8 x6 : 8005db546c00 [ 62.072640] x5 : 0001 x4 : 0002 [ 62.073654] x3 : 8005d7049480 x2 : 0002 [ 62.074666] x1 : 0003 x0 : ffef [ 62.075685] Process bash (pid: 3275, stack limit = 0xd754280f) 
[ 62.076930] Call trace: [ 62.077411] add_memory_resource+0x1cc/0x1d8 [ 62.078227] __add_memory+0x70/0xa8 [ 62.078901] probe_store+0xa4/0xc8 [ 62.079561] dev_attr_store+0x18/0x28 [ 62.080270] sysfs_kf_write+0x40/0x58 [ 62.080992] kernfs_fop_write+0xcc/0x1d8 [ 62.081744] __vfs_write+0x18/0x40 [ 62.082400] vfs_write+0xa4/0x1b0 [ 62.083037] ksys_write+0x5c/0xc0 [ 62.083681] __arm64_sys_write+0x18/0x20 [ 62.084432] el0_svc_handler+0x88/0x100 [ 62.085177] el0_svc+0x8/0xc Re-ordering memblock_[free|remove]() with arch_remove_memory() solves the problem on arm64 as pfn_valid() behaves correctly and returns positive as memblock for the address range still exists. arch_remove_memory() removes applicable memory sections from zone with __remove_pages() and tears down kernel linear mapping. Removing memblock regions afterwards is safe because there is no other memblock (bootmem) allocator user that late. So nobody is going to allocate from the removed range just to blow up later. Also nobody should be using the bootmem allocated range else we wouldn't allow to remove it. So reordering is indeed safe. Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Acked-by: Mark Rutland Acked-by: Michal Hocko Signed-off-by: Anshuman Khandual --- mm/memory_hotplug.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index a88c5f3..cfa5fac 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1831,13 +1831,13 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size) /* remove memmap entry */ firmware_map_remove(start, start + size, "System RAM"); - memblock_free(start, size); - memblock_remove(start, size); /* remove memory block devices before removing memory */ remove_memory_block_devices(start, size); arch_remove_memory(nid, start, size, NULL); + memblock_free(start, size); + memblock_remove(start, size); __release_memory_resource(start, size); try_offline_node(nid); -- 2.7.4
[PATCH V6 3/3] arm64/mm: Enable memory hot remove
The arch code for hot-remove must tear down portions of the linear map and vmemmap corresponding to memory being removed. In both cases the page tables mapping these regions must be freed, and when sparse vmemmap is in use the memory backing the vmemmap must also be freed. This patch adds a new remove_pagetable() helper which can be used to tear down either region, and calls it from vmemmap_free() and ___remove_pgd_mapping(). The sparse_vmap argument determines whether the backing memory will be freed. remove_pagetable() makes two distinct passes over the kernel page table. In the first pass it unmaps, invalidates applicable TLB cache and frees backing memory if required (vmemmap) for each mapped leaf entry. In the second pass it looks for empty page table sections whose page table page can be unmapped, TLB invalidated and freed. While freeing intermediate level page table pages bail out if any of its entries are still valid. This can happen for partially filled kernel page table either from a previously attempted failed memory hot add or while removing an address range which does not span the entire page table page range. The vmemmap region may share levels of table with the vmalloc region. There can be conflicts between hot remove freeing page table pages with a concurrent vmalloc() walking the kernel page table. This conflict can not just be solved by taking the init_mm ptl because of existing locking scheme in vmalloc(). Hence unlike linear mapping, skip freeing page table pages while tearing down vmemmap mapping. While here update arch_add_memory() to handle __add_pages() failures by just unmapping recently added kernel linear mapping. Now enable memory hot remove on arm64 platforms by default with ARCH_ENABLE_MEMORY_HOTREMOVE. This implementation is overall inspired from kernel page table tear down procedure on X86 architecture. 
Acked-by: David Hildenbrand Signed-off-by: Anshuman Khandual --- arch/arm64/Kconfig | 3 + arch/arm64/mm/mmu.c | 290 ++-- 2 files changed, 284 insertions(+), 9 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 6426f48..9375f26 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -270,6 +270,9 @@ config HAVE_GENERIC_GUP config ARCH_ENABLE_MEMORY_HOTPLUG def_bool y +config ARCH_ENABLE_MEMORY_HOTREMOVE + def_bool y + config SMP def_bool y diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 93ed0df..9e80a94 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -733,6 +733,250 @@ int kern_addr_valid(unsigned long addr) return pfn_valid(pte_pfn(pte)); } + +#ifdef CONFIG_MEMORY_HOTPLUG +static void free_hotplug_page_range(struct page *page, size_t size) +{ + WARN_ON(!page || PageReserved(page)); + free_pages((unsigned long)page_address(page), get_order(size)); +} + +static void free_hotplug_pgtable_page(struct page *page) +{ + free_hotplug_page_range(page, PAGE_SIZE); +} + +static void free_pte_table(pmd_t *pmdp, unsigned long addr) +{ + struct page *page; + pte_t *ptep; + int i; + + ptep = pte_offset_kernel(pmdp, 0UL); + for (i = 0; i < PTRS_PER_PTE; i++) { + if (!pte_none(READ_ONCE(ptep[i]))) + return; + } + + page = pmd_page(READ_ONCE(*pmdp)); + pmd_clear(pmdp); + __flush_tlb_kernel_pgtable(addr); + free_hotplug_pgtable_page(page); +} + +static void free_pmd_table(pud_t *pudp, unsigned long addr) +{ + struct page *page; + pmd_t *pmdp; + int i; + + if (CONFIG_PGTABLE_LEVELS <= 2) + return; + + pmdp = pmd_offset(pudp, 0UL); + for (i = 0; i < PTRS_PER_PMD; i++) { + if (!pmd_none(READ_ONCE(pmdp[i]))) + return; + } + + page = pud_page(READ_ONCE(*pudp)); + pud_clear(pudp); + __flush_tlb_kernel_pgtable(addr); + free_hotplug_pgtable_page(page); +} + +static void free_pud_table(pgd_t *pgdp, unsigned long addr) +{ + struct page *page; + pud_t *pudp; + int i; + + if (CONFIG_PGTABLE_LEVELS <= 3) + return; + + pudp = pud_offset(pgdp, 
0UL); + for (i = 0; i < PTRS_PER_PUD; i++) { + if (!pud_none(READ_ONCE(pudp[i]))) + return; + } + + page = pgd_page(READ_ONCE(*pgdp)); + pgd_clear(pgdp); + __flush_tlb_kernel_pgtable(addr); + free_hotplug_pgtable_page(page); +} + +static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr, + unsigned long end, bool sparse_vmap) +{ + struct page *page; + pte_t *ptep, pte; + + do { + ptep = pte_offset_kernel(pmdp, addr); + pte = READ_ONCE(*ptep); + if (pte_none(pte)) + continue; + + WARN_ON(!pte_present(pte)); + page = sparse_vmap
[PATCH V6 2/3] arm64/mm: Hold memory hotplug lock while walking for kernel page table dump
The arm64 page table dump code can race with concurrent modification of
the kernel page tables. When leaf entries are modified concurrently, the
dump code may log stale or inconsistent information for a VA range, but
this is otherwise not harmful. When intermediate levels of table are
freed, the dump code will continue to use memory which has been freed
and potentially reallocated for another purpose. In such cases, the dump
code may dereference bogus addresses, leading to a number of potential
problems.

Intermediate levels of table may be freed during memory hot-remove,
which will be enabled by a subsequent patch. To avoid racing with this,
take the memory hotplug lock when walking the kernel page table.

Acked-by: David Hildenbrand
Acked-by: Mark Rutland
Signed-off-by: Anshuman Khandual
---
 arch/arm64/mm/ptdump_debugfs.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/mm/ptdump_debugfs.c b/arch/arm64/mm/ptdump_debugfs.c
index 064163f..b5eebc8 100644
--- a/arch/arm64/mm/ptdump_debugfs.c
+++ b/arch/arm64/mm/ptdump_debugfs.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include
+#include
 #include
 #include
@@ -7,7 +8,10 @@
 static int ptdump_show(struct seq_file *m, void *v)
 {
 	struct ptdump_info *info = m->private;
+
+	get_online_mems();
 	ptdump_walk_pgd(m, info);
+	put_online_mems();
 	return 0;
 }
 DEFINE_SHOW_ATTRIBUTE(ptdump);
-- 
2.7.4
[PATCH V6 0/3] arm64/mm: Enable memory hot remove
This series enables memory hot remove on arm64 after fixing a memblock removal ordering problem in generic try_remove_memory() and a possible arm64 platform specific kernel page table race condition. This series is based on linux-next (next-20190613).

Concurrent vmalloc() and hot-remove conflict:

As pointed out earlier on the v5 thread [2], there can be a potential conflict between a concurrent vmalloc() and a memory hot-remove operation. This can be solved, or at least avoided, with some possible methods. The problem here is caused by inadequate locking in vmalloc(), which protects installation of a page table page but not the walk or the leaf entry modification.

Option 1: Making locking in vmalloc() adequate

The current locking scheme protects installation of page table pages but not the page table walk or leaf entry creation, which can conflict with hot-remove. This scheme is sufficient for now as vmalloc() works on mutually exclusive ranges which can proceed concurrently only if their shared page table pages can be created while inside the lock. It achieves a performance improvement which will be compromised if the entire vmalloc() operation (even with some optimization) has to be completed under a lock.

Option 2: Making sure hot-remove does not happen during vmalloc()

Take mem_hotplug_lock in read mode through [get|put]_online_mems() constructs for the entire duration of vmalloc(). It protects from a concurrent memory hot remove operation and does not add any significant overhead to other concurrent vmalloc() threads. It solves the problem in the right way, unless we do not want to extend the usage of mem_hotplug_lock in generic MM.

Option 3: Memory hot-remove does not free (conflicting) page table pages

Do not free page table pages (if any) for vmemmap mappings after unmapping their virtual range. The only downside here is that some page table pages might remain empty and unused until the next memory hot-add operation of the same memory range.
Option 4: Don't let vmalloc and vmemmap share intermediate page table pages

The conflict does not arise if the vmalloc and vmemmap ranges do not share kernel page table pages to start with. If such placement can be ensured in the platform kernel virtual address layout, this problem can be successfully avoided.

There are two generic solutions (Options 1 and 2) and two platform specific solutions (Options 3 and 4). This series has decided to go with (Option 3), which requires minimum changes while remaining self-contained inside the functionality.

Testing:

Memory hot remove has been tested on arm64 for 4K, 16K, 64K page config options with all possible CONFIG_ARM64_VA_BITS and CONFIG_PGTABLE_LEVELS combinations. It is only build tested on non-arm64 platforms.

Changes in V6:

- Implemented most of the suggestions from Mark Rutland
- Added in ptdump
- remove_pagetable() now has two distinct passes over the kernel page table
- First pass unmap_hotplug_range() removes leaf level entries at all levels
- Second pass free_empty_tables() removes empty page table pages
- Kernel page table lock has been dropped completely
- vmemmap_free() does not call free_empty_tables() to avoid conflict with vmalloc()
- All address range scanning is converted to do {} while() loops
- Added 'unsigned long end' in __remove_pgd_mapping()
- Callers need not provide starting pointer argument to free_[pte|pmd|pud]_table()
- Drop the starting pointer argument from free_[pte|pmd|pud]_table() functions
- Fetching pxxp[i] in free_[pte|pmd|pud]_table() is wrapped around in READ_ONCE()
- free_[pte|pmd|pud]_table() now computes starting pointer inside the function
- Fixed TLB handling while freeing huge page section mappings at PMD or PUD level
- Added WARN_ON(!page) in free_hotplug_page_range()
- Added WARN_ON(![pmd|pud]_table(pmd|pud)) when there is no section mapping
- [PATCH 1/3] mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory()
- Requested earlier for separate merge
(https://patchwork.kernel.org/patch/10986599/) - s/__remove_memory/try_remove_memory in the subject line - s/arch_remove_memory/memblock_[free|remove] in the subject line - A small change in the commit message as re-order happens now for memblock remove functions not for arch_remove_memory() Changes in V5: (https://lkml.org/lkml/2019/5/29/218) - Have some agreement [1] over using memory_hotplug_lock for arm64 ptdump - Change 7ba36eccb3f8 ("arm64/mm: Inhibit huge-vmap with ptdump") already merged - Dropped the above patch from this series - Fixed indentation problem in arch_[add|remove]_memory() as per David - Collected all new Acked-by tags Changes in V4: (https://lkml.org/lkml/2019/5/20/19) - Implemented most of the suggestions from Mark Rutland - Interchanged patch [PATCH 2/4] <---> [PATCH 3/4] and updated commit message - Moved CONFIG_PGTABLE_LEVELS inside free_[pud|pmd]_table() - Used READ_ONCE() in missing instances while accessing page table entries - s/p???_present()/p???_none() for checking valid kernel page table entries -
Re: linux-next: build failure after merge of the net-next tree
On Wed, Jun 19, 2019 at 1:02 PM Masahiro Yamada wrote: > > Hi. > > > On Wed, Jun 19, 2019 at 12:23 PM Stephen Rothwell > wrote: > > > > Hi all, > > > > After merging the net-next tree, today's linux-next build (x86_64 > > allmodconfig) failed like this: > > > > In file included from usr/include/linux/tc_act/tc_ctinfo.hdrtest.c:1: > > ./usr/include/linux/tc_act/tc_ctinfo.h:30:21: error: implicit declaration > > of function 'BIT' [-Werror=implicit-function-declaration] > > CTINFO_MODE_DSCP = BIT(0), > > ^~~ > > ./usr/include/linux/tc_act/tc_ctinfo.h:30:2: error: enumerator value for > > 'CTINFO_MODE_DSCP' is not an integer constant > > CTINFO_MODE_DSCP = BIT(0), > > ^~~~ > > ./usr/include/linux/tc_act/tc_ctinfo.h:32:1: error: enumerator value for > > 'CTINFO_MODE_CPMARK' is not an integer constant > > }; > > ^ > > > > Caused by commit > > > > 24ec483cec98 ("net: sched: Introduce act_ctinfo action") > > > > Presumably exposed by commit > > > > b91976b7c0e3 ("kbuild: compile-test UAPI headers to ensure they are > > self-contained") > > > > from the kbuild tree. > > > My commit correctly blocked the broken UAPI header, Hooray! > > People export more and more headers that > are never able to compile in user-space. > > We must block new breakages from coming in. > > > BIT() is not exported to user-space > since it is not prefixed with underscore. > > > You can use _BITUL() in user-space, > which is available in include/uapi/linux/const.h > > I just took a look at include/uapi/linux/tc_act/tc_ctinfo.h I just wondered why the following can be compiled: struct tc_ctinfo { tc_gen; }; Then, I found 'tc_gen' is a macro. #define tc_gen \ __u32 index; \ __u32 capab; \ int action; \ int refcnt; \ int bindcnt What a hell. -- Best Regards Masahiro Yamada
Re: linux-next: build failure after merge of the net-next tree
Hi. On Wed, Jun 19, 2019 at 12:23 PM Stephen Rothwell wrote: > > Hi all, > > After merging the net-next tree, today's linux-next build (x86_64 > allmodconfig) failed like this: > > In file included from usr/include/linux/tc_act/tc_ctinfo.hdrtest.c:1: > ./usr/include/linux/tc_act/tc_ctinfo.h:30:21: error: implicit declaration of > function 'BIT' [-Werror=implicit-function-declaration] > CTINFO_MODE_DSCP = BIT(0), > ^~~ > ./usr/include/linux/tc_act/tc_ctinfo.h:30:2: error: enumerator value for > 'CTINFO_MODE_DSCP' is not an integer constant > CTINFO_MODE_DSCP = BIT(0), > ^~~~ > ./usr/include/linux/tc_act/tc_ctinfo.h:32:1: error: enumerator value for > 'CTINFO_MODE_CPMARK' is not an integer constant > }; > ^ > > Caused by commit > > 24ec483cec98 ("net: sched: Introduce act_ctinfo action") > > Presumably exposed by commit > > b91976b7c0e3 ("kbuild: compile-test UAPI headers to ensure they are > self-contained") > > from the kbuild tree. My commit correctly blocked the broken UAPI header, Hooray! People export more and more headers that are never able to compile in user-space. We must block new breakages from coming in. BIT() is not exported to user-space since it is not prefixed with underscore. You can use _BITUL() in user-space, which is available in include/uapi/linux/const.h Thanks. > I have applied the following (obvious) patch for today. 
> > From: Stephen Rothwell > Date: Wed, 19 Jun 2019 13:15:22 +1000 > Subject: [PATCH] net: sched: don't use BIT() in uapi headers > > Signed-off-by: Stephen Rothwell > --- > include/uapi/linux/tc_act/tc_ctinfo.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/include/uapi/linux/tc_act/tc_ctinfo.h > b/include/uapi/linux/tc_act/tc_ctinfo.h > index da803e05a89b..6166c62dd7dd 100644 > --- a/include/uapi/linux/tc_act/tc_ctinfo.h > +++ b/include/uapi/linux/tc_act/tc_ctinfo.h > @@ -27,8 +27,8 @@ enum { > #define TCA_CTINFO_MAX (__TCA_CTINFO_MAX - 1) > > enum { > - CTINFO_MODE_DSCP= BIT(0), > - CTINFO_MODE_CPMARK = BIT(1) > + CTINFO_MODE_DSCP= (1UL << 0), > + CTINFO_MODE_CPMARK = (1UL << 1) > }; > > #endif > -- > 2.20.1 > > -- > Cheers, > Stephen Rothwell -- Best Regards Masahiro Yamada
Re: [PATCH v9 03/12] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal
On Mon, Jun 17, 2019 at 6:42 PM Wei Yang wrote: > > On Wed, Jun 05, 2019 at 02:58:04PM -0700, Dan Williams wrote: > >Sub-section hotplug support reduces the unit of operation of hotplug > >from section-sized-units (PAGES_PER_SECTION) to sub-section-sized units > >(PAGES_PER_SUBSECTION). Teach shrink_{zone,pgdat}_span() to consider > >PAGES_PER_SUBSECTION boundaries as the points where pfn_valid(), not > >valid_section(), can toggle. > > > >Cc: Michal Hocko > >Cc: Vlastimil Babka > >Cc: Logan Gunthorpe > >Reviewed-by: Pavel Tatashin > >Reviewed-by: Oscar Salvador > >Signed-off-by: Dan Williams > >--- > > mm/memory_hotplug.c | 29 - > > 1 file changed, 8 insertions(+), 21 deletions(-) > > > >diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > >index 7b963c2d3a0d..647859a1d119 100644 > >--- a/mm/memory_hotplug.c > >+++ b/mm/memory_hotplug.c > >@@ -318,12 +318,8 @@ static unsigned long find_smallest_section_pfn(int nid, > >struct zone *zone, > >unsigned long start_pfn, > >unsigned long end_pfn) > > { > >- struct mem_section *ms; > >- > >- for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) { > >- ms = __pfn_to_section(start_pfn); > >- > >- if (unlikely(!valid_section(ms))) > >+ for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUBSECTION) { > >+ if (unlikely(!pfn_valid(start_pfn))) > > continue; > > Hmm, we change the granularity of valid section from SECTION to SUBSECTION. > But we didn't change the granularity of node id and zone information. > > For example, we found the node id of a pfn mismatch, we can skip the whole > section instead of a subsection. > > Maybe this is not a big deal. I don't see a problem.
Re: [PATCH] scsi: scsi_sysfs.c: Hide wwid sdev attr if VPD is not supported
Marcos,

> WWID is composed from VPD data from the device, specifically page 0x83.
> So, when a device does not have VPD support, for example USB storage
> devices where VPD is specifically disabled, a read of the device's
> wwid sysfs file will always return ENXIO. To avoid this, change the
> scsi_sdev_attr_is_visible function to hide the wwid sysfs file when
> the device does not support VPD.

Not a big fan of attribute files that come and go. Why not just return an empty string?

Hannes?

-- 
Martin K. Petersen	Oracle Linux Engineering
linux-next: build failure after merge of the net-next tree
Hi all, After merging the net-next tree, today's linux-next build (x86_64 allmodconfig) failed like this: In file included from usr/include/linux/tc_act/tc_ctinfo.hdrtest.c:1: ./usr/include/linux/tc_act/tc_ctinfo.h:30:21: error: implicit declaration of function 'BIT' [-Werror=implicit-function-declaration] CTINFO_MODE_DSCP = BIT(0), ^~~ ./usr/include/linux/tc_act/tc_ctinfo.h:30:2: error: enumerator value for 'CTINFO_MODE_DSCP' is not an integer constant CTINFO_MODE_DSCP = BIT(0), ^~~~ ./usr/include/linux/tc_act/tc_ctinfo.h:32:1: error: enumerator value for 'CTINFO_MODE_CPMARK' is not an integer constant }; ^ Caused by commit 24ec483cec98 ("net: sched: Introduce act_ctinfo action") Presumably exposed by commit b91976b7c0e3 ("kbuild: compile-test UAPI headers to ensure they are self-contained") from the kbuild tree. I have applied the following (obvious) patch for today. From: Stephen Rothwell Date: Wed, 19 Jun 2019 13:15:22 +1000 Subject: [PATCH] net: sched: don't use BIT() in uapi headers Signed-off-by: Stephen Rothwell --- include/uapi/linux/tc_act/tc_ctinfo.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/tc_act/tc_ctinfo.h b/include/uapi/linux/tc_act/tc_ctinfo.h index da803e05a89b..6166c62dd7dd 100644 --- a/include/uapi/linux/tc_act/tc_ctinfo.h +++ b/include/uapi/linux/tc_act/tc_ctinfo.h @@ -27,8 +27,8 @@ enum { #define TCA_CTINFO_MAX (__TCA_CTINFO_MAX - 1) enum { - CTINFO_MODE_DSCP= BIT(0), - CTINFO_MODE_CPMARK = BIT(1) + CTINFO_MODE_DSCP= (1UL << 0), + CTINFO_MODE_CPMARK = (1UL << 1) }; #endif -- 2.20.1 -- Cheers, Stephen Rothwell pgp3rq6rExwui.pgp Description: OpenPGP digital signature
Re: [PATCH 1/2] scsi: devinfo: BLIST_TRY_VPD_PAGES for SanDisk Cruzer Blade
Marcos, > Currently, all USB devices skip VPD pages, even when the device > supports them (SPC-3 and later), but some of them support VPD, like > Cruzer Blade. What's your confidence level wrt. all Cruzer Blades handling this correctly? How many devices have you tested this change with? -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v9 02/12] mm/sparsemem: Add helpers track active portions of a section at boot
On Mon, Jun 17, 2019 at 3:32 PM Dan Williams wrote: > > On Mon, Jun 17, 2019 at 3:22 PM Wei Yang wrote: > > > > On Wed, Jun 05, 2019 at 02:57:59PM -0700, Dan Williams wrote: > > >Prepare for hot{plug,remove} of sub-ranges of a section by tracking a > > >sub-section active bitmask, each bit representing a PMD_SIZE span of the > > >architecture's memory hotplug section size. > > > > > >The implications of a partially populated section is that pfn_valid() > > >needs to go beyond a valid_section() check and read the sub-section > > >active ranges from the bitmask. The expectation is that the bitmask > > >(subsection_map) fits in the same cacheline as the valid_section() data, > > >so the incremental performance overhead to pfn_valid() should be > > >negligible. > > > > > >Cc: Michal Hocko > > >Cc: Vlastimil Babka > > >Cc: Logan Gunthorpe > > >Cc: Oscar Salvador > > >Cc: Pavel Tatashin > > >Tested-by: Jane Chu > > >Signed-off-by: Dan Williams > > >--- > > > include/linux/mmzone.h | 29 - > > > mm/page_alloc.c|4 +++- > > > mm/sparse.c| 35 +++ > > > 3 files changed, 66 insertions(+), 2 deletions(-) > > > > > >diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > > >index ac163f2f274f..6dd52d544857 100644 > > >--- a/include/linux/mmzone.h > > >+++ b/include/linux/mmzone.h > > >@@ -1199,6 +1199,8 @@ struct mem_section_usage { > > > unsigned long pageblock_flags[0]; > > > }; > > > > > >+void subsection_map_init(unsigned long pfn, unsigned long nr_pages); > > >+ > > > struct page; > > > struct page_ext; > > > struct mem_section { > > >@@ -1336,12 +1338,36 @@ static inline struct mem_section > > >*__pfn_to_section(unsigned long pfn) > > > > > > extern int __highest_present_section_nr; > > > > > >+static inline int subsection_map_index(unsigned long pfn) > > >+{ > > >+ return (pfn & ~(PAGE_SECTION_MASK)) / PAGES_PER_SUBSECTION; > > >+} > > >+ > > >+#ifdef CONFIG_SPARSEMEM_VMEMMAP > > >+static inline int pfn_section_valid(struct mem_section *ms, unsigned long > > >pfn) > 
> >+{ > > >+ int idx = subsection_map_index(pfn); > > >+ > > >+ return test_bit(idx, ms->usage->subsection_map); > > >+} > > >+#else > > >+static inline int pfn_section_valid(struct mem_section *ms, unsigned long > > >pfn) > > >+{ > > >+ return 1; > > >+} > > >+#endif > > >+ > > > #ifndef CONFIG_HAVE_ARCH_PFN_VALID > > > static inline int pfn_valid(unsigned long pfn) > > > { > > >+ struct mem_section *ms; > > >+ > > > if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) > > > return 0; > > >- return valid_section(__nr_to_section(pfn_to_section_nr(pfn))); > > >+ ms = __nr_to_section(pfn_to_section_nr(pfn)); > > >+ if (!valid_section(ms)) > > >+ return 0; > > >+ return pfn_section_valid(ms, pfn); > > > } > > > #endif > > > > > >@@ -1373,6 +1399,7 @@ void sparse_init(void); > > > #define sparse_init() do {} while (0) > > > #define sparse_index_init(_sec, _nid) do {} while (0) > > > #define pfn_present pfn_valid > > >+#define subsection_map_init(_pfn, _nr_pages) do {} while (0) > > > #endif /* CONFIG_SPARSEMEM */ > > > > > > /* > > >diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > >index c6d8224d792e..bd773efe5b82 100644 > > >--- a/mm/page_alloc.c > > >+++ b/mm/page_alloc.c > > >@@ -7292,10 +7292,12 @@ void __init free_area_init_nodes(unsigned long > > >*max_zone_pfn) > > > > > > /* Print out the early node map */ > > > pr_info("Early memory node ranges\n"); > > >- for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) > > >+ for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) > > >{ > > > pr_info(" node %3d: [mem %#018Lx-%#018Lx]\n", nid, > > > (u64)start_pfn << PAGE_SHIFT, > > > ((u64)end_pfn << PAGE_SHIFT) - 1); > > >+ subsection_map_init(start_pfn, end_pfn - start_pfn); > > >+ } > > > > Just curious about why we set subsection here? > > > > Function free_area_init_nodes() mostly handles pgdat, if I am correct. Setup > > subsection here looks like touching some lower level system data structure. 
> > Correct, I'm not sure how it ended up there, but it was the source of > a bug that was fixed with this change: > > https://lore.kernel.org/lkml/capcyv4hjvbpdykpp2gns3-cc2aq0avs1nlk-k3fwxeruvvz...@mail.gmail.com/ On second thought I'm going to keep subsection_map_init() in free_area_init_nodes(), but instead teach pfn_valid() to return true for all "early" sections. There are code paths that use pfn_valid() as a coarse check before validating against pgdat for real validity of online memory. It is sufficient and safe for those to assume that all early sections are fully pfn_valid, while ZONE_DEVICE hotplug can see the more precise subsection_map.
Re: [PATCH] scsi: fdomain: fix building pcmcia front-end
Arnd, > Move the common support outside of the SCSI_LOWLEVEL section. > Alternatively, we could move all of SCSI_LOWLEVEL_PCMCIA into > SCSI_LOWLEVEL. This would be more sensible, but might cause surprises > for users that have SCSI_LOWLEVEL disabled. It seems messy to me that PCMCIA lives outside of the LOWLEVEL section. Given that the number of users who rely on PCMCIA for their system disk is probably pretty low, I think I'm leaning towards cleaning things up instead of introducing a nonsensical top-level option. Or even better: get rid of SCSI_FDOMAIN as a user-visible option and select it if any of the PCI/ISA/PCMCIA drivers is enabled. -- Martin K. Petersen Oracle Linux Engineering
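Martin's "select it" alternative would look roughly like this in Kconfig — a sketch only, with the front-end symbol names (SCSI_FDOMAIN_PCI, SCSI_FDOMAIN_ISA) assumed from the fdomain split rather than verified against the tree:

```
# SCSI_FDOMAIN becomes a hidden library symbol with no prompt...
config SCSI_FDOMAIN
	tristate

# ...and each front-end pulls it in via select, so users never see it
config SCSI_FDOMAIN_PCI
	tristate "Future Domain TMC-3260/AHA-2920A PCI SCSI support"
	depends on PCI && SCSI
	select SCSI_FDOMAIN

config SCSI_FDOMAIN_ISA
	tristate "Future Domain 16xx ISA SCSI support"
	depends on ISA && SCSI
	select SCSI_FDOMAIN
```

Because select forces the selected symbol on regardless of which menu section it sits in, this sidesteps the SCSI_LOWLEVEL vs. SCSI_LOWLEVEL_PCMCIA placement question entirely.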
Re: "mm: reparent slab memory on cgroup removal" series triggers SLUB_DEBUG errors
On Tue, Jun 18, 2019 at 05:43:04PM -0400, Qian Cai wrote: > Booting linux-next on both arm64 and powerpc triggers SLUB_DEBUG errors > below. Reverted the whole series “mm: reparent slab memory on cgroup removal” > [1] fixed the issue. Hi Qian! Thank you for the report! Didn't you try to reproduce it on x86? All the code changed in this series isn't arch-specific, so if it can be seen only on ppc and arm64, that's interesting. I'm currently on PTO and have a very limited internet connection, so I won't be able to reproduce the issue up to Sunday, when I'll be back. If you can try reverting only the last patch from the series, I will appreciate it. Thanks! > > [1] https://lore.kernel.org/lkml/20190611231813.3148843-1-g...@fb.com/ > > [ 151.773224][ T1650] BUG kmem_cache (Tainted: GB W): Poison > overwritten > [ 151.780969][ T1650] > - > [ 151.780969][ T1650] > [ 151.792016][ T1650] INFO: 0x1fd6fdef-0x07f6bb36. First > byte 0x0 instead of 0x6b > [ 151.800726][ T1650] INFO: Allocated in create_cache+0x6c/0x1bc age=24301 > cpu=97 pid=1444 > [ 151.808821][ T1650]kmem_cache_alloc+0x514/0x568 > [ 151.813527][ T1650]create_cache+0x6c/0x1bc > [ 151.817800][ T1650]memcg_create_kmem_cache+0xfc/0x11c > [ 151.823028][ T1650]memcg_kmem_cache_create_func+0x40/0x170 > [ 151.828691][ T1650]process_one_work+0x4e0/0xa54 > [ 151.833398][ T1650]worker_thread+0x498/0x650 > [ 151.837843][ T1650]kthread+0x1b8/0x1d4 > [ 151.841770][ T1650]ret_from_fork+0x10/0x18 > [ 151.846046][ T1650] INFO: Freed in slab_kmem_cache_release+0x3c/0x48 > age=23341 cpu=28 pid=1480 > [ 151.854659][ T1650]slab_kmem_cache_release+0x3c/0x48 > [ 151.859799][ T1650]kmem_cache_release+0x1c/0x28 > [ 151.864507][ T1650]kobject_cleanup+0x134/0x288 > [ 151.869127][ T1650]kobject_put+0x5c/0x68 > [ 151.873226][ T1650]sysfs_slab_release+0x2c/0x38 > [ 151.877931][ T1650]shutdown_cache+0x198/0x23c > [ 151.882464][ T1650]kmemcg_cache_shutdown_fn+0x1c/0x34 > [ 151.887691][ T1650]kmemcg_workfn+0x44/0x68 > [ 151.891963][ 
T1650]process_one_work+0x4e0/0xa54 > [ 151.896668][ T1650]worker_thread+0x498/0x650 > [ 151.901113][ T1650]kthread+0x1b8/0x1d4 > [ 151.905037][ T1650]ret_from_fork+0x10/0x18 > [ 151.909324][ T1650] INFO: Slab 0x406d65a6 objects=64 used=64 > fp=0x4d988e71 flags=0x7ffc000200 > [ 151.919596][ T1650] INFO: Object 0x40f4b79e > @offset=15420325124116637824 fp=0xe038adbf > [ 151.919596][ T1650] > [ 151.931079][ T1650] Redzone fc4c04f0: bb bb bb bb bb bb bb bb bb > bb bb bb bb bb bb bb > [ 151.941168][ T1650] Redzone 9a25c019: bb bb bb bb bb bb bb bb bb > bb bb bb bb bb bb bb > [ 151.951256][ T1650] Redzone 0b05c7cc: bb bb bb bb bb bb bb bb bb > bb bb bb bb bb bb bb > [ 151.961345][ T1650] Redzone a08ae38b: bb bb bb bb bb bb bb bb bb > bb bb bb bb bb bb bb > [ 151.971433][ T1650] Redzone e0eccd41: bb bb bb bb bb bb bb bb bb > bb bb bb bb bb bb bb > [ 151.981520][ T1650] Redzone 16ee2661: bb bb bb bb bb bb bb bb bb > bb bb bb bb bb bb bb > [ 151.991608][ T1650] Redzone 9364e729: bb bb bb bb bb bb bb bb bb > bb bb bb bb bb bb bb > [ 152.001695][ T1650] Redzone f2202456: bb bb bb bb bb bb bb bb bb > bb bb bb bb bb bb bb > [ 152.011784][ T1650] Object 40f4b79e: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.021783][ T1650] Object 2df21fec: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.031779][ T1650] Object 41cf0887: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.041775][ T1650] Object bfb91e8f: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.051770][ T1650] Object da315b1c: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.061765][ T1650] Object b362de78: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.071761][ T1650] Object ad4f72bf: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.081756][ T1650] Object aa32d346: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.091751][ T1650] Object ad1cf22c: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.101746][ T1650] Object 1cee47e4: 6b 6b 6b 6b 
6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.111741][ T1650] Object 418720ed: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b
Re: [RFC net-next 1/5] net: stmmac: introduce IEEE 802.1Qbv configuration functionalities
On Wed, Jun 19, 2019 at 05:36:14AM +0800, Voon Weifeng wrote: Hi Voon > +static int est_poll_srwo(void *ioaddr) > +{ > + /* Poll until the EST GCL Control[SRWO] bit clears. > + * Total wait = 12 x 50ms ~= 0.6s. > + */ > + unsigned int retries = 12; > + unsigned int value; > + > + do { > + value = TSN_RD32(ioaddr + MTL_EST_GCL_CTRL); > + if (!(value & MTL_EST_GCL_CTRL_SRWO)) > + return 0; > + msleep(50); > + } while (--retries); > + > + return -ETIMEDOUT; Maybe use one of the readx_poll_timeout() macros? > +static int est_read_gce(void *ioaddr, unsigned int row, > + unsigned int *gates, unsigned int *ti_nsec, > + unsigned int dbgb, unsigned int dbgm) > +{ > + struct tsn_hw_cap *cap = &dw_tsn_hwcap; > + unsigned int ti_wid = cap->ti_wid; > + unsigned int gates_mask; > + unsigned int ti_mask; > + unsigned int value; > + int ret; > + > + gates_mask = (1 << cap->txqcnt) - 1; > + ti_mask = (1 << ti_wid) - 1; > + > + ret = est_read_gcl_config(ioaddr, &value, row, 0, dbgb, dbgm); > + if (ret) { > + TSN_ERR("Read GCE failed! row=%u\n", row); It is generally not a good idea to put wrappers around the kernel print functions. It would be better if all these functions took struct stmmac_priv *priv rather than ioaddr, so you could then do netdev_err(priv->dev, "Read GCE failed! row=%u\n", row); > + /* Ensure that HW is not in the midst of GCL transition */ > + value = TSN_RD32(ioaddr + MTL_EST_CTRL); Also, don't put wrappers around readl()/writel(). > + value &= ~MTL_EST_CTRL_SSWL; > + > + /* MTL_EST_CTRL value has been read earlier, if TILS value > + * differs, we update here.
> + */ > + if (tils != dw_tsn_hwtunable[TSN_HWTUNA_TX_EST_TILS]) { > + value &= ~MTL_EST_CTRL_TILS; > + value |= (tils << MTL_EST_CTRL_TILS_SHIFT); > + > + TSN_WR32(value, ioaddr + MTL_EST_CTRL); > + dw_tsn_hwtunable[TSN_HWTUNA_TX_EST_TILS] = tils; > + } > + > + return 0; > +} > + > +static int est_set_ov(void *ioaddr, > + const unsigned int *ptov, > + const unsigned int *ctov) > +{ > + unsigned int value; > + > + if (!dw_tsn_feat_en[TSN_FEAT_ID_EST]) > + return -ENOTSUPP; > + > + value = TSN_RD32(ioaddr + MTL_EST_CTRL); > + value &= ~MTL_EST_CTRL_SSWL; > + > + if (ptov) { > + if (*ptov > EST_PTOV_MAX) { > + TSN_WARN("EST: invalid PTOV(%u), max=%u\n", > + *ptov, EST_PTOV_MAX); It looks like most of the TSN_WARN should actually be netdev_dbg(). Andrew
Re: [RFC PATCH 16/16] xen/grant-table: host_addr fixup in mapping on xenhost_r0
On 6/17/19 3:55 AM, Juergen Gross wrote: On 09.05.19 19:25, Ankur Arora wrote: Xenhost type xenhost_r0 does not support standard GNTTABOP_map_grant_ref semantics (map a gref onto a specified host_addr). That's because, since the hypervisor is local (same address space as the caller of GNTTABOP_map_grant_ref), there is no external entity that could map an arbitrary page underneath an arbitrary address. To handle this, the GNTTABOP_map_grant_ref hypercall on xenhost_r0 treats the host_addr as an OUT parameter instead of IN and expects gnttab_map_refs() and similar to fix up any state that caches the value of host_addr from before the hypercall. Accordingly, gnttab_map_refs() gains two parameters, a fixup function and a pointer to cached maps to fix up: int gnttab_map_refs(xenhost_t *xh, struct gnttab_map_grant_ref *map_ops, struct gnttab_map_grant_ref *kmap_ops, - struct page **pages, unsigned int count) + struct page **pages, gnttab_map_fixup_t map_fixup_fn, + void **map_fixup[], unsigned int count) The reason we use a fixup function and not an additional mapping op in the xenhost_t is that, depending on the caller, what we are fixing might be different: blkback and netback, for instance, cache host_addr via a struct page *, while __xenbus_map_ring() caches a phys_addr. This patch fixes up the xen-blkback and xen-gntdev drivers. TODO: - also rewrite gnttab_batch_map() and __xenbus_map_ring(). - modify xen-netback, scsiback, pciback etc Co-developed-by: Joao Martins Signed-off-by: Ankur Arora Without seeing the __xenbus_map_ring() modification it is impossible to do a proper review of this patch. Will do in v2. Ankur Juergen
Re: [RFC PATCH 14/16] xen/blk: gnttab, evtchn, xenbus API changes
On 6/17/19 3:14 AM, Juergen Gross wrote: On 09.05.19 19:25, Ankur Arora wrote: For the most part, we now pass xenhost_t * as a parameter. Co-developed-by: Joao Martins Signed-off-by: Ankur Arora I don't see how this can be a patch on its own. Yes, the reason this was separate was that, given this was an RFC, I didn't want to pollute the logic patches with lots of mechanical changes. The only way to be able to use a patch for each driver would be to keep the original grant-, event- and xenbus-interfaces and add the new ones taking xenhost * with a new name. The original interfaces could then use xenhost_default and you can switch them to the new interfaces one by one. The last patch could then remove the old interfaces when there is no user left. Yes, this makes sense. Ankur Juergen
Re: [RFC PATCH 13/16] drivers/xen: gnttab, evtchn, xenbus API changes
On 6/17/19 3:07 AM, Juergen Gross wrote: On 09.05.19 19:25, Ankur Arora wrote: Mechanical changes, now most of these calls take xenhost_t * as parameter. Co-developed-by: Joao Martins Signed-off-by: Ankur Arora --- drivers/xen/cpu_hotplug.c | 14 ++--- drivers/xen/gntalloc.c | 13 drivers/xen/gntdev.c | 16 +++ drivers/xen/manage.c | 37 ++- drivers/xen/platform-pci.c | 12 +++- drivers/xen/sys-hypervisor.c | 12 drivers/xen/xen-balloon.c | 10 +++--- drivers/xen/xenfs/xenstored.c | 7 --- 8 files changed, 73 insertions(+), 48 deletions(-) diff --git a/drivers/xen/cpu_hotplug.c b/drivers/xen/cpu_hotplug.c index afeb94446d34..4a05bc028956 100644 --- a/drivers/xen/cpu_hotplug.c +++ b/drivers/xen/cpu_hotplug.c @@ -31,13 +31,13 @@ static void disable_hotplug_cpu(int cpu) unlock_device_hotplug(); } -static int vcpu_online(unsigned int cpu) +static int vcpu_online(xenhost_t *xh, unsigned int cpu) Do we really need xenhost for cpu on/offlinig? I was in two minds about this. We only need it for the xenbus interfaces which could very well have been just xh_default. However, the xenhost is part of the xenbus_watch state, so I thought it is easier to percolate that down instead of adding xh_default all over the place. diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c index 9a69d955dd5c..1655d0a039fd 100644 --- a/drivers/xen/manage.c +++ b/drivers/xen/manage.c @@ -227,14 +227,14 @@ static void shutdown_handler(struct xenbus_watch *watch, return; again: - err = xenbus_transaction_start(xh_default, &xbt); + err = xenbus_transaction_start(watch->xh, &xbt); if (err) return; - str = (char *)xenbus_read(xh_default, xbt, "control", "shutdown", NULL); + str = (char *)xenbus_read(watch->xh, xbt, "control", "shutdown", NULL); /* Ignore read errors and empty reads. 
*/ if (XENBUS_IS_ERR_READ(str)) { - xenbus_transaction_end(xh_default, xbt, 1); + xenbus_transaction_end(watch->xh, xbt, 1); return; } @@ -245,9 +245,9 @@ static void shutdown_handler(struct xenbus_watch *watch, /* Only acknowledge commands which we are prepared to handle. */ if (idx < ARRAY_SIZE(shutdown_handlers)) - xenbus_write(xh_default, xbt, "control", "shutdown", ""); + xenbus_write(watch->xh, xbt, "control", "shutdown", ""); - err = xenbus_transaction_end(xh_default, xbt, 0); + err = xenbus_transaction_end(watch->xh, xbt, 0); if (err == -EAGAIN) { kfree(str); goto again; @@ -272,10 +272,10 @@ static void sysrq_handler(struct xenbus_watch *watch, const char *path, int err; again: - err = xenbus_transaction_start(xh_default, &xbt); + err = xenbus_transaction_start(watch->xh, &xbt); if (err) return; - err = xenbus_scanf(xh_default, xbt, "control", "sysrq", "%c", &sysrq_key); + err = xenbus_scanf(watch->xh, xbt, "control", "sysrq", "%c", &sysrq_key); if (err < 0) { /* * The Xenstore watch fires directly after registering it and @@ -287,21 +287,21 @@ static void sysrq_handler(struct xenbus_watch *watch, const char *path, if (err != -ENOENT && err != -ERANGE) pr_err("Error %d reading sysrq code in control/sysrq\n", err); - xenbus_transaction_end(xh_default, xbt, 1); + xenbus_transaction_end(watch->xh, xbt, 1); return; } if (sysrq_key != '\0') { - err = xenbus_printf(xh_default, xbt, "control", "sysrq", "%c", '\0'); + err = xenbus_printf(watch->xh, xbt, "control", "sysrq", "%c", '\0'); if (err) { pr_err("%s: Error %d writing sysrq in control/sysrq\n", __func__, err); - xenbus_transaction_end(xh_default, xbt, 1); + xenbus_transaction_end(watch->xh, xbt, 1); return; } } - err = xenbus_transaction_end(xh_default, xbt, 0); + err = xenbus_transaction_end(watch->xh, xbt, 0); if (err == -EAGAIN) goto again; @@ -324,14 +324,14 @@ static struct notifier_block xen_reboot_nb = { .notifier_call = poweroff_nb, }; -static int setup_shutdown_watcher(void) +static int 
setup_shutdown_watcher(xenhost_t *xh) I think shutdown is purely local, too. Yes, I introduced xenhost for the same reason as above. I agree that neither of these cases (nor similar others) has any use for the concept of xenhost. Do you think it makes sense for these to pass NULL instead, with the underlying interface assuming xh_default? Ankur Juergen
Re: [PATCH 1/1] scsi: ufs-qcom: Add support for platforms booting ACPI
Lee, > New Qualcomm AArch64 based laptops are now available which use UFS > as their primary data storage medium. These devices are supplied > with ACPI support out of the box. This patch ensures the Qualcomm > UFS driver will be bound when the "QCOM24A5" H/W device is > advertised as present. Applied to 5.3/scsi-queue. Thanks! -- Martin K. Petersen Oracle Linux Engineering