Re: [PATCH] net: stmmac: add sanity check to device_property_read_u32_array call
On 19/06/2019 06:13, Martin Blumenstingl wrote:
> Hi Colin,
>
>> Currently the call to device_property_read_u32_array is not error checked
>> leading to potential garbage values in the delays array that are then used
>> in msleep delays. Add a sanity check to the property fetching.
>>
>> Addresses-Coverity: ("Uninitialized scalar variable")
>> Signed-off-by: Colin Ian King
>
> I have also sent a patch [0] to initialize the array.
> Can you please look at my patch so we can work out which one to use?
>
> My concern is that the "snps,reset-delays-us" property is optional, even
> though the current dt-bindings documentation states that it's a required
> property. In reality it isn't: there are boards (two examples are
> mentioned in my patch: [0]) without it.
>
> So I believe that the resulting behavior has to be:
> 1. don't delay if this property is missing (instead of delaying for ms)
> 2. don't error out if this property is missing
>
> Your patch covers #1; can you please check whether #2 is also covered?
> I tested case #2 when submitting my patch and it worked fine (even
> though I could not reproduce the garbage values which are being read
> on some boards)
>
> Thank you!
> Martin
>
> [0] https://lkml.org/lkml/2019/4/19/638

Is that the correct link?

Colin
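The combined behavior Martin asks for — zero delays when the optional property is absent, and no error returned — can be sketched in userspace C. `read_u32_array_stub()` and `get_reset_delays()` are hypothetical stand-ins for `device_property_read_u32_array()` and the driver's fetch path, not the actual stmmac code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for device_property_read_u32_array(): returns 0 on
 * success, -EINVAL when the (optional) property is absent. */
static int read_u32_array_stub(const unsigned int *prop, unsigned int *out,
			       size_t n)
{
	if (!prop)
		return -EINVAL;
	memcpy(out, prop, n * sizeof(*out));
	return 0;
}

/* Pre-zero the delays so a missing "snps,reset-delays-us" leads to no sleep
 * at all instead of msleep() on garbage values (requirement #1), and ignore
 * the lookup error so an absent optional property is not fatal
 * (requirement #2). */
static int get_reset_delays(const unsigned int *prop, unsigned int delays[3])
{
	memset(delays, 0, 3 * sizeof(*delays));
	read_u32_array_stub(prop, delays, 3);	/* -EINVAL deliberately ignored */
	return 0;
}
```

This is the shape both competing patches converge on: initialization handles the garbage, and treating the lookup as best-effort handles the optional property.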
Re: [PATCH] staging: kpc2000: simplify error handling in kp2000_pcie_probe
On Wed, Jun 19, 2019 at 08:36:07AM +0200, Simon Sandström wrote:
> We can get rid of a few iounmaps in the middle of the function by
> re-ordering the error handling labels and adding two new labels.
>
> Signed-off-by: Simon Sandström
> ---
>
> This change has not been tested besides by compiling. It might be good
> to take an extra look to make sure that I got everything right.

You have the right instincts that when something looks really complicated
that's probably for a reason. That attitude will serve you well in the
future! But in this case it's staging code, so the original code is just
strange.

Reviewed-by: Dan Carpenter

> Also, this change was proposed by Dan Carpenter. Should I add anything
> in the commit message to show this?

There is a Suggested-by: tag for this, but don't resend because I don't
care and I've already reviewed this version, so I don't want to review
the patch again.

regards,
dan carpenter
Re: [PATCH 5/5] Powerpc/Watchpoint: Fix length calculation for unaligned target
On 6/18/19 12:16 PM, Christophe Leroy wrote:
>> +/* Maximum len for DABR is 8 bytes and DAWR is 512 bytes */
>> +static int hw_breakpoint_validate_len(struct arch_hw_breakpoint *hw)
>> +{
>> +	u16 length_max = 8;
>> +	u16 final_len;
>
> You should be more consistent in naming. If one is called final_len, the
> other one should be called max_len.

Copy/paste :). Will change it.

>> +	unsigned long start_addr, end_addr;
>> +
>> +	final_len = hw_breakpoint_get_final_len(hw, &start_addr, &end_addr);
>> +
>> +	if (dawr_enabled()) {
>> +		length_max = 512;
>> +		/* DAWR region can't cross 512 bytes boundary */
>> +		if ((start_addr >> 9) != (end_addr >> 9))
>> +			return -EINVAL;
>> +	}
>> +
>> +	if (final_len > length_max)
>> +		return -EINVAL;
>> +
>> +	return 0;
>> +}
>> +
>
> In many places, we have those numeric 512 and 9 shift. Could we replace
> them by some symbol, for instance DAWR_SIZE and DAWR_SHIFT ?

I don't see any other place where we check for boundary limit.

[...]

>> +u16 hw_breakpoint_get_final_len(struct arch_hw_breakpoint *brk,
>> +				unsigned long *start_addr,
>> +				unsigned long *end_addr)
>> +{
>> +	*start_addr = brk->address & ~HW_BREAKPOINT_ALIGN;
>> +	*end_addr = (brk->address + brk->len - 1) | HW_BREAKPOINT_ALIGN;
>> +	return *end_addr - *start_addr + 1;
>> +}
>
> This function gives horrible code (a couple of unneeded store/re-read and
> read/re-read).
>
> 06bc <hw_breakpoint_get_final_len>:
>  6bc:	81 23 00 00 	lwz     r9,0(r3)
>  6c0:	55 29 00 38 	rlwinm  r9,r9,0,0,28
>  6c4:	91 24 00 00 	stw     r9,0(r4)
>  6c8:	81 43 00 00 	lwz     r10,0(r3)
>  6cc:	a1 23 00 06 	lhz     r9,6(r3)
>  6d0:	38 6a ff ff 	addi    r3,r10,-1
>  6d4:	7c 63 4a 14 	add     r3,r3,r9
>  6d8:	60 63 00 07 	ori     r3,r3,7
>  6dc:	90 65 00 00 	stw     r3,0(r5)
>  6e0:	38 63 00 01 	addi    r3,r3,1
>  6e4:	81 24 00 00 	lwz     r9,0(r4)
>  6e8:	7c 69 18 50 	subf    r3,r9,r3
>  6ec:	54 63 04 3e 	clrlwi  r3,r3,16
>  6f0:	4e 80 00 20 	blr
>
> Below code gives something better:
>
> u16 hw_breakpoint_get_final_len(struct arch_hw_breakpoint *brk,
> 				unsigned long *start_addr,
> 				unsigned long *end_addr)
> {
> 	unsigned long address = brk->address;
> 	unsigned long len = brk->len;
> 	unsigned long start = address & ~HW_BREAKPOINT_ALIGN;
> 	unsigned long end = (address + len - 1) | HW_BREAKPOINT_ALIGN;
>
> 	*start_addr = start;
> 	*end_addr = end;
> 	return end - start + 1;
> }
>
> 06bc <hw_breakpoint_get_final_len>:
>  6bc:	81 43 00 00 	lwz     r10,0(r3)
>  6c0:	a1 03 00 06 	lhz     r8,6(r3)
>  6c4:	39 2a ff ff 	addi    r9,r10,-1
>  6c8:	7d 28 4a 14 	add     r9,r8,r9
>  6cc:	55 4a 00 38 	rlwinm  r10,r10,0,0,28
>  6d0:	61 29 00 07 	ori     r9,r9,7
>  6d4:	91 44 00 00 	stw     r10,0(r4)
>  6d8:	20 6a 00 01 	subfic  r3,r10,1
>  6dc:	91 25 00 00 	stw     r9,0(r5)
>  6e0:	7c 63 4a 14 	add     r3,r3,r9
>  6e4:	54 63 04 3e 	clrlwi  r3,r3,16
>  6e8:	4e 80 00 20 	blr
>
> And regardless, it's a pity to have this function using pointers which are
> from local variables in the callers, as we lose the benefit of registers.
> Couldn't this function go in the .h as a static inline ? I'm sure the
> result would be worth it.

This is obviously a bit of optimization, but I like Mikey's idea of storing
start_addr and end_addr in arch_hw_breakpoint. That way we don't have to
recalculate the length every time in set_dawr.
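The local-variable version Christophe proposes can be checked in userspace: with an 8-byte alignment mask (`HW_BREAKPOINT_ALIGN` is assumed to be `0x7` here, matching the `ori r3,r3,7` in the disassembly) the function widens the requested range out to aligned 8-byte boundaries. `struct arch_hw_breakpoint_stub` is a minimal stand-in for the real powerpc struct:

```c
#include <assert.h>

#define HW_BREAKPOINT_ALIGN 0x7UL	/* assumed 8-byte alignment mask */

struct arch_hw_breakpoint_stub {
	unsigned long address;
	unsigned short len;
};

/* The reviewer's version, written as a static inline so callers can keep
 * start/end in registers: compute into locals first, store through the
 * pointers once at the end. */
static inline unsigned short
hw_breakpoint_get_final_len(const struct arch_hw_breakpoint_stub *brk,
			    unsigned long *start_addr, unsigned long *end_addr)
{
	unsigned long start = brk->address & ~HW_BREAKPOINT_ALIGN;
	unsigned long end = (brk->address + brk->len - 1) | HW_BREAKPOINT_ALIGN;

	*start_addr = start;
	*end_addr = end;
	return end - start + 1;
}
```

For example, a 2-byte watch at 0x1003 rounds out to the aligned range [0x1000, 0x1007], final length 8 — the quantity then compared against the 8-byte DABR / 512-byte DAWR maximum.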
Re: [PATCH v3 0/6] Enable THP for text section of non-shmem files
[Cc fsdevel and lkml]

On Tue 18-06-19 23:24:18, Song Liu wrote:
> Changes v2 => v3:
> 1. Removed the limitation (cannot write to file with THP) by truncating
>    whole file during sys_open (see 6/6);
> 2. Fixed a VM_BUG_ON_PAGE() in filemap_fault() (see 2/6);
> 3. Split function rename to a separate patch (Rik);
> 4. Updated condition in hugepage_vma_check() (Rik).
>
> Changes v1 => v2:
> 1. Fixed a missing mem_cgroup_commit_charge() for non-shmem case.
>
> This set follows up discussion at LSF/MM 2019. The motivation is to put
> the text section of an application in THP, and thus reduce iTLB miss rate
> and improve performance. Both Facebook and Oracle showed strong interest
> in this feature.
>
> To make reviews easier, this set aims at a minimal valid product. The
> current version of the work does not have any changes to file system
> specific code. This comes with some limitations (discussed later).
>
> This set enables an application to "hugify" its text section by simply
> running something like:
>
>     madvise(0x60, 0x8, MADV_HUGEPAGE);
>
> Before this call, /proc/<pid>/maps looks like:
>
>     0040-074d r-xp 00:27 2006927 app
>
> After this call, part of the text section is split out and mapped to
> THP:
>
>     0040-00425000 r-xp 00:27 2006927 app
>     0060-00e0 r-xp 0020 00:27 2006927 app <<< on THP
>     00e0-074d r-xp 00a0 00:27 2006927 app
>
> Limitations:
>
> 1. This only works for the text section (vma with VM_DENYWRITE).
> 2. Original limitation #2 is removed in v3.
>
> We gated this feature with an experimental config, READ_ONLY_THP_FOR_FS.
> Once we get better support on the write path, we can remove the config
> and enable it by default.
>
> Tested cases:
> 1. Tested with btrfs and ext4.
> 2. Tested with a real-world application (memcache-like caching service).
> 3. Tested with "THP aware uprobe":
>    https://patchwork.kernel.org/project/linux-mm/list/?series=131339
>
> Please share your comments and suggestions on this.
>
> Thanks!
>
> Song Liu (6):
>   filemap: check compound_head(page)->mapping in filemap_fault()
>   filemap: update offset check in filemap_fault()
>   mm,thp: stats for file backed THP
>   khugepaged: rename collapse_shmem() and khugepaged_scan_shmem()
>   mm,thp: add read-only THP support for (non-shmem) FS
>   mm,thp: handle writes to file with THP in pagecache
>
>  fs/inode.c             |   3 ++
>  fs/proc/meminfo.c      |   4 ++
>  include/linux/fs.h     |  31
>  include/linux/mmzone.h |   2 +
>  mm/Kconfig             |  11 +
>  mm/filemap.c           |   9 ++--
>  mm/khugepaged.c        | 104 +
>  mm/rmap.c              |  12 +++--
>  mm/truncate.c          |   7 ++-
>  mm/vmstat.c            |   2 +
>  10 files changed, 156 insertions(+), 29 deletions(-)
>
> --
> 2.17.1

--
Michal Hocko
SUSE Labs
Re: [PATCH] net: mvpp2: cls: Add pmap to fs dump
Hello Nathan,

On Tue, 18 Jun 2019 09:09:10 -0700 Nathan Huckleberry wrote:
> There was an unused variable 'mvpp2_dbgfs_prs_pmap_fops'.
> Added a usage consistent with other fops to dump the pmap
> to userspace.

Thanks for sending a fix. Besides the typo preventing your patch from
compiling, you should also prefix the patch with "net: mvpp2: debugfs:"
rather than "cls", which is used for classifier patches.

Thanks,

Maxime
Re: [PATCH 1/1] udf: Fix incorrect final NOT_ALLOCATED (hole) extent length
Hi Steve!

On Sun 16-06-19 11:28:46, Steve Magnani wrote:
> On 6/4/19 7:31 AM, Steve Magnani wrote:
>
>> In some cases, using the 'truncate' command to extend a UDF file results
>> in a mismatch between the length of the file's extents (specifically, due
>> to incorrect length of the final NOT_ALLOCATED extent) and the
>> information (file) length. The discrepancy can prevent other operating
>> systems (i.e., Windows 10) from opening the file.
>>
>> Two particular errors have been observed when extending a file:
>>
>> 1. The final extent is larger than it should be, having been rounded up
>>    to a multiple of the block size.
>>
>> 2. The final extent is shorter than it should be, due to not having
>>    been updated when the file's information length was increased.
>
> Wondering if you've seen this, or if something got lost in a spam folder.

Sorry for not getting to you earlier. I've seen the patches and they look
reasonable to me. I just wanted to have one more closer look, but the last
weeks were rather busy so I didn't get to it. I'll look into it this week.
Thanks a lot for debugging the problem and sending the fixes!

Honza
--
Jan Kara
SUSE Labs, CR
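The operation that exposes the bug — extending a short file with truncate so the tail becomes a hole — is just ftruncate(2) under the hood. The sketch below performs that sequence; the path is an arbitrary choice for illustration, and showing the actual on-disk extent mismatch would of course require running it on a UDF-formatted volume:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write a few bytes, then extend the file well past them with ftruncate(),
 * creating a NOT_ALLOCATED (hole) tail extent. Returns the resulting
 * information length as seen by stat, or -1 on error. On UDF, the final
 * extent recorded on disk must agree with this value. */
static long extend_with_hole(const char *path, off_t new_size)
{
	struct stat st;
	int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	if (write(fd, "hello", 5) != 5 || ftruncate(fd, new_size) != 0 ||
	    fstat(fd, &st) != 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return (long)st.st_size;
}
```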
[PATCH 0/2] perf thread-stack: Fix thread stack return from kernel for kernel-only case
Hi

Here is one non-urgent fix and a subsequent tidy-up.

Adrian Hunter (2):
  perf thread-stack: Fix thread stack return from kernel for kernel-only case
  perf thread-stack: Eliminate code duplicating thread_stack__pop_ks()

 tools/perf/util/thread-stack.c | 48 ++
 1 file changed, 35 insertions(+), 13 deletions(-)

Regards
Adrian
[PATCH 2/2] perf thread-stack: Eliminate code duplicating thread_stack__pop_ks()
Use new function thread_stack__pop_ks() in place of equivalent code.

Signed-off-by: Adrian Hunter
---
 tools/perf/util/thread-stack.c | 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index f91c00dfe23b..b20c9b867fce 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -673,12 +673,9 @@ static int thread_stack__no_call_return(struct thread *thread,
 
 	if (ip >= ks && addr < ks) {
 		/* Return to userspace, so pop all kernel addresses */
-		while (thread_stack__in_kernel(ts)) {
-			err = thread_stack__call_return(thread, ts, --ts->cnt,
-							tm, ref, true);
-			if (err)
-				return err;
-		}
+		err = thread_stack__pop_ks(thread, ts, sample, ref);
+		if (err)
+			return err;
 
 		/* If the stack is empty, push the userspace address */
 		if (!ts->cnt) {
@@ -688,12 +685,9 @@ static int thread_stack__no_call_return(struct thread *thread,
 	} else if (thread_stack__in_kernel(ts) && ip < ks) {
 		/* Return to userspace, so pop all kernel addresses */
-		while (thread_stack__in_kernel(ts)) {
-			err = thread_stack__call_return(thread, ts, --ts->cnt,
-							tm, ref, true);
-			if (err)
-				return err;
-		}
+		err = thread_stack__pop_ks(thread, ts, sample, ref);
+		if (err)
+			return err;
 	}
 
 	if (ts->cnt)
-- 
2.17.1
[PATCH 1/2] perf thread-stack: Fix thread stack return from kernel for kernel-only case
Commit f08046cb3082 ("perf thread-stack: Represent jmps to the start of a different symbol") had the side-effect of introducing more stack entries before return from kernel space. When user space is also traced, those entries are popped before entry to user space, but when user space is not traced, they get stuck at the bottom of the stack, making the stack grow progressively larger. Fix by detecting a return-from-kernel branch type, and popping kernel addresses from the stack then. Note, the problem and fix affect the exported Call Graph / Tree but not the callindent option used by "perf script --call-trace". Example: perf-with-kcore record example -e intel_pt//k -- ls perf-with-kcore script --itrace=bep -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py example.db branches calls ~/libexec/perf-core/scripts/python/exported-sql-viewer.py example.db Menu option: Reports -> Context-Sensitive Call Graph Before: (showing Call Path column only) Call Path ▶ perf ▼ ls ▼ 12111:12111 ▶ setup_new_exec ▶ __task_pid_nr_ns ▶ perf_event_pid_type ▶ perf_event_comm_output ▶ perf_iterate_ctx ▶ perf_iterate_sb ▶ perf_event_comm ▶ __set_task_comm ▶ load_elf_binary ▶ search_binary_handler ▶ __do_execve_file.isra.41 ▶ __x64_sys_execve ▶ do_syscall_64 ▼ entry_SYSCALL_64_after_hwframe ▼ swapgs_restore_regs_and_return_to_usermode ▼ native_iret ▶ error_entry ▶ do_page_fault ▼ error_exit ▼ retint_user ▶ prepare_exit_to_usermode ▼ native_iret ▶ error_entry ▶ do_page_fault ▼ error_exit ▼ retint_user ▶ prepare_exit_to_usermode ▼ native_iret ▶ error_entry ▶ do_page_fault ▼ error_exit ▼ retint_user ▶ prepare_exit_to_usermode ▶ native_iret After: (showing Call Path column only) Call Path ▶ perf ▼ ls ▼ 12111:12111 ▶ setup_new_exec ▶ __task_pid_nr_ns ▶ perf_event_pid_type ▶ perf_event_comm_output ▶ perf_iterate_ctx ▶ perf_iterate_sb ▶ perf_event_comm ▶ __set_task_comm ▶ load_elf_binary ▶ search_binary_handler ▶ __do_execve_file.isra.41 ▶ __x64_sys_execve ▶ do_syscall_64 ▶ 
entry_SYSCALL_64_after_hwframe ▶ page_fault ▼ entry_SYSCALL_64 ▼ do_syscall_64 ▶ __x64_sys_brk ▶ __x64_sys_access ▶ __x64_sys_openat ▶ __x64_sys_newfstat ▶ __x64_sys_mmap ▶ __x64_sys_close ▶ __x64_sys_read ▶ __x64_sys_mprotect ▶ __x64_sys_arch_prctl ▶ __x64_sys_munmap ▶ exit_to_usermode_loop ▶ __x64_sys_set_tid_address ▶ __x64_sys_set_robust_list ▶ __x64_sys_rt_sigaction ▶ __x64_sys_rt_sigprocmask ▶ __x64_sys_prlimit64 ▶ __x64_sys_statfs ▶ __x64_sys_ioctl ▶ __x64_sys_getdents64 ▶ __x64_sys_write ▶ __x64_sys_exit_group Signed-off-by: Adrian Hunter Fixes: f08046cb3082 ("perf thread-stack: Represent jmps to the start of a different symbol") Cc: sta...@vger.kernel.org --- tools/perf/util/thread-stack.c | 30 +- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c index 8e390f78486f..f91c00dfe23b 100644 --- a/tools/perf/util/thread-stack.c +++ b/tools/perf/util/thread-stack.c @@ -637,6 +637,23 @@ static int thread_stack__bottom(struct thread_stack *ts, true, false); } +static int thread_stack__pop_ks(struct thread *thread, struct thread_stack *ts, + struct perf_sample *sample, u64 ref) +{ + u64 tm = sample->time; + int err; + + /* Return to userspace, so pop all kernel addresses */ + while (thread_stack__in_kernel(ts)) { + err = thread_stack__call_return(thread, ts, --ts->cnt, + tm, ref, true); + if (err) + return err; + } + + return 0; +} + static int thread_stack__no_call_return(struct thread *thread, struct thread_stack *ts, struct perf_sample *sample, @@ -919,7 +936,18 @@ int thread_stack__process(struct thread *thread, struct comm *comm, ts->rstate = X86_RETPOLINE_DETECTED; } else if (sample->flags
linux-next: build failure after merge of the usb tree
Hi all,

After merging the usb tree, today's linux-next build (x86_64 allmodconfig)
failed like this:

In file included from usr/include/linux/usbdevice_fs.hdrtest.c:1:
./usr/include/linux/usbdevice_fs.h:88:2: error: unknown type name 'u8'
  u8 num_ports;		/* Number of ports the device is connected */
  ^~
./usr/include/linux/usbdevice_fs.h:92:2: error: unknown type name 'u8'
  u8 ports[7];		/* List of ports on the way from the root */
  ^~

Caused by commit

  6d101f24f1dd ("USB: add usbfs ioctl to retrieve the connection parameters")

Presumably exposed by commit

  b91976b7c0e3 ("kbuild: compile-test UAPI headers to ensure they are self-contained")

from the kbuild tree.

I have added this patch for now:

From: Stephen Rothwell
Date: Wed, 19 Jun 2019 16:36:16 +1000
Subject: [PATCH] USB: fix types in uapi include

Signed-off-by: Stephen Rothwell
---
 include/uapi/linux/usbdevice_fs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/usbdevice_fs.h b/include/uapi/linux/usbdevice_fs.h
index 4b267fe3776e..78efe870c2b7 100644
--- a/include/uapi/linux/usbdevice_fs.h
+++ b/include/uapi/linux/usbdevice_fs.h
@@ -85,11 +85,11 @@ struct usbdevfs_conninfo_ex {
 				/* kernel, the device is connected to.      */
 	__u32 devnum;		/* Device address on the bus.               */
 	__u32 speed;		/* USB_SPEED_* constants from ch9.h          */
-	u8 num_ports;		/* Number of ports the device is connected   */
+	__u8 num_ports;		/* Number of ports the device is connected   */
 				/* to on the way to the root hub. It may    */
 				/* be bigger than size of 'ports' array so  */
 				/* userspace can detect overflows.          */
-	u8 ports[7];		/* List of ports on the way from the root    */
+	__u8 ports[7];		/* List of ports on the way from the root    */
 				/* hub to the device. Current limit in      */
 				/* USB specification is 7 tiers (root hub,  */
 				/* 5 intermediate hubs, device), which      */
-- 
2.20.1

-- 
Cheers,
Stephen Rothwell
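The underlying rule is that UAPI headers are compiled by userspace, where the kernel-internal `u8`/`u32` typedefs do not exist; only the `__u8`/`__u32` family from `<linux/types.h>` is exported. A partial sketch of the fixed struct (not the full `usbdevfs_conninfo_ex` layout) shows that these types compile cleanly outside the kernel:

```c
#include <linux/types.h>

/* Partial, illustrative layout using only exported fixed-width types; the
 * real struct usbdevfs_conninfo_ex has additional fields. */
struct conninfo_ex_sketch {
	__u32 size;
	__u32 busnum;
	__u32 devnum;
	__u32 speed;
	__u8  num_ports;
	__u8  ports[7];
};
```

Had the sketch used `u8`, this translation unit would fail exactly as the hdrtest build above does.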
[net v1] net: stmmac: set IC bit when transmitting frames with HW timestamp
From: Roland Hii

When transmitting certain PTP frames, e.g. SYNC and DELAY_REQ, the PTP
daemon, e.g. ptp4l, polls the driver for the frame transmit hardware
timestamp. The polling will most likely time out if tx coalescing is
enabled, because the Interrupt-on-Completion (IC) bit is not set in the
tx descriptor for those frames.

This patch ignores the tx coalesce parameter and sets the IC bit when
transmitting PTP frames which need to report the frame transmit hardware
timestamp to user space.

Fixes: f748be531d70 ("net: stmmac: Rework coalesce timer and fix multi-queue races")
Signed-off-by: Roland Hii
Signed-off-by: Ong Boon Leong
Signed-off-by: Voon Weifeng
---
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 06dd51f47cfd..06358fe5b245 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2947,12 +2947,15 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	/* Manage tx mitigation */
 	tx_q->tx_count_frames += nfrags + 1;
-	if (priv->tx_coal_frames <= tx_q->tx_count_frames) {
+	if (likely(priv->tx_coal_frames > tx_q->tx_count_frames) &&
+	    !(priv->synopsys_id >= DWMAC_CORE_4_00 &&
+	      (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) &&
+	      priv->hwts_tx_en)) {
+		stmmac_tx_timer_arm(priv, queue);
+	} else {
+		tx_q->tx_count_frames = 0;
 		stmmac_set_tx_ic(priv, desc);
 		priv->xstats.tx_set_ic_bit++;
-		tx_q->tx_count_frames = 0;
-	} else {
-		stmmac_tx_timer_arm(priv, queue);
 	}
 
 	skb_tx_timestamp(skb);
@@ -3166,12 +3169,15 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 	 * element in case of no SG.
 	 */
 	tx_q->tx_count_frames += nfrags + 1;
-	if (priv->tx_coal_frames <= tx_q->tx_count_frames) {
+	if (likely(priv->tx_coal_frames > tx_q->tx_count_frames) &&
+	    !(priv->synopsys_id >= DWMAC_CORE_4_00 &&
+	      (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) &&
+	      priv->hwts_tx_en)) {
+		stmmac_tx_timer_arm(priv, queue);
+	} else {
+		tx_q->tx_count_frames = 0;
 		stmmac_set_tx_ic(priv, desc);
 		priv->xstats.tx_set_ic_bit++;
-		tx_q->tx_count_frames = 0;
-	} else {
-		stmmac_tx_timer_arm(priv, queue);
 	}
 
 	skb_tx_timestamp(skb);
-- 
1.9.1
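The condition the patch adds (identically in both xmit paths) can be factored into a pure predicate for clarity. This is a userspace restatement with invented names — `DWMAC_CORE_4_00` is given an arbitrary placeholder value, and the flag fields are passed in as booleans rather than read from driver structs:

```c
#include <stdbool.h>

#define DWMAC_CORE_4_00 0x40	/* placeholder value for the sketch */

/* Mirror of the patched decision: set the IC bit immediately when either
 * the coalesce frame threshold is reached, or the frame is a PTP frame
 * whose TX hardware timestamp userspace will poll for; otherwise arm the
 * coalesce timer. */
static bool needs_ic_bit(unsigned int coal_frames, unsigned int count_frames,
			 unsigned int synopsys_id, bool skb_hw_tstamp,
			 bool hwts_tx_en)
{
	bool ptp_tx_ts = synopsys_id >= DWMAC_CORE_4_00 &&
			 skb_hw_tstamp && hwts_tx_en;

	return coal_frames <= count_frames || ptp_tx_ts;
}
```

The key behavioral change is the `ptp_tx_ts` disjunct: before the patch, only the frame-count threshold could force an IC bit, so a timestamped PTP frame under light load would sit waiting on the coalesce timer while ptp4l's poll timed out.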
Re: [PATCH] [v2] ipsec: select crypto ciphers for xfrm_algo
On Tue, Jun 18, 2019 at 01:22:13PM +0200, Arnd Bergmann wrote: > kernelci.org reports failed builds on arc because of what looks > like an old missed 'select' statement: > > net/xfrm/xfrm_algo.o: In function `xfrm_probe_algs': > xfrm_algo.c:(.text+0x1e8): undefined reference to `crypto_has_ahash' > > I don't see this in randconfig builds on other architectures, but > it's fairly clear we want to select the hash code for it, like we > do for all its other users. As Herbert points out, CRYPTO_BLKCIPHER > is also required even though it has not popped up in build tests. > > Fixes: 17bc19702221 ("ipsec: Use skcipher and ahash when probing algorithms") > Signed-off-by: Arnd Bergmann > --- > net/xfrm/Kconfig | 2 ++ > 1 file changed, 2 insertions(+) Acked-by: Herbert Xu -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH V3 4/5] cpufreq: Register notifiers with the PM QoS framework
On 19-06-19, 00:23, Rafael J. Wysocki wrote: > In patch [3/5] you could point notifiers for both min and max freq to the same > notifier head. Both of your notifiers end up calling cpufreq_update_policy() > anyway. I tried it and the changes in qos.c file look fine. But I don't like at all how cpufreq.c looks now. We only register for min-freq notifier now and that takes care of max as well. What could have been better is if we could have registered a freq-notifier instead of min/max, which isn't possible as well because of how qos framework works. Honestly, the cpufreq changes look hacky to me :( What do you say. -- viresh --- drivers/base/power/qos.c | 15 --- drivers/cpufreq/cpufreq.c | 38 -- include/linux/cpufreq.h | 3 +-- 3 files changed, 17 insertions(+), 39 deletions(-) diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c index cde2692b97f9..9bbf2d2a3376 100644 --- a/drivers/base/power/qos.c +++ b/drivers/base/power/qos.c @@ -202,20 +202,20 @@ static int dev_pm_qos_constraints_allocate(struct device *dev) if (!qos) return -ENOMEM; - n = kzalloc(3 * sizeof(*n), GFP_KERNEL); + n = kzalloc(2 * sizeof(*n), GFP_KERNEL); if (!n) { kfree(qos); return -ENOMEM; } + BLOCKING_INIT_NOTIFIER_HEAD(n); c = &qos->resume_latency; plist_head_init(&c->list); c->target_value = PM_QOS_RESUME_LATENCY_DEFAULT_VALUE; c->default_value = PM_QOS_RESUME_LATENCY_DEFAULT_VALUE; c->no_constraint_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT; c->type = PM_QOS_MIN; - c->notifiers = n; - BLOCKING_INIT_NOTIFIER_HEAD(n); + c->notifiers = n++; c = &qos->latency_tolerance; plist_head_init(&c->list); @@ -224,14 +224,16 @@ static int dev_pm_qos_constraints_allocate(struct device *dev) c->no_constraint_value = PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT; c->type = PM_QOS_MIN; + /* Same notifier head is used for both min/max frequency */ + BLOCKING_INIT_NOTIFIER_HEAD(n); + c = &qos->min_frequency; plist_head_init(&c->list); c->target_value = PM_QOS_MIN_FREQUENCY_DEFAULT_VALUE; c->default_value = 
PM_QOS_MIN_FREQUENCY_DEFAULT_VALUE; c->no_constraint_value = PM_QOS_MIN_FREQUENCY_DEFAULT_VALUE; c->type = PM_QOS_MAX; - c->notifiers = ++n; - BLOCKING_INIT_NOTIFIER_HEAD(n); + c->notifiers = n; c = &qos->max_frequency; plist_head_init(&c->list); @@ -239,8 +241,7 @@ static int dev_pm_qos_constraints_allocate(struct device *dev) c->default_value = PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE; c->no_constraint_value = PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE; c->type = PM_QOS_MIN; - c->notifiers = ++n; - BLOCKING_INIT_NOTIFIER_HEAD(n); + c->notifiers = n; INIT_LIST_HEAD(&qos->flags.list); diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 1344e1b1307f..1605dba1327e 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1139,19 +1139,10 @@ static int cpufreq_update_freq(struct cpufreq_policy *policy) return 0; } -static int cpufreq_notifier_min(struct notifier_block *nb, unsigned long freq, +static int cpufreq_notifier_qos(struct notifier_block *nb, unsigned long freq, void *data) { - struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, nb_min); - - return cpufreq_update_freq(policy); -} - -static int cpufreq_notifier_max(struct notifier_block *nb, unsigned long freq, - void *data) -{ - struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, nb_max); + struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, nb_qos); return cpufreq_update_freq(policy); @@ -1214,10 +1205,10 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) goto err_free_real_cpus; } - policy->nb_min.notifier_call = cpufreq_notifier_min; - policy->nb_max.notifier_call = cpufreq_notifier_max; + policy->nb_qos.notifier_call = cpufreq_notifier_qos; - ret = dev_pm_qos_add_notifier(dev, &policy->nb_min, + /* Notifier for min frequency also takes care of max frequency notifier */ + ret = dev_pm_qos_add_notifier(dev, &policy->nb_qos, DEV_PM_QOS_MIN_FREQUENCY); if (ret) { dev_err(dev, "Failed to register MIN QoS 
notifier: %d (%*pbl)\n", @@ -1225,18 +1216,10 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) goto err_kobj_remove; } - ret = dev_pm_qos_add_notifier(dev, &policy->nb_max, - DEV_PM_QOS_MAX_FREQUENCY); - if (ret) { - dev_err(dev, "Fail
Re: WARNING in fanotify_handle_event
On Tue, Jun 18, 2019 at 11:27 PM Amir Goldstein wrote:
>
> On Tue, Jun 18, 2019 at 8:07 PM syzbot wrote:
> >
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    963172d9 Merge branch 'x86-urgent-for-linus' of git://git...
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=17c090eaa0
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=fa9f7e1b6a8bb586
> > dashboard link: https://syzkaller.appspot.com/bug?extid=c277e8e2f46414645508
> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15a32f46a0
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=13a7dc9ea0
> >
> > The bug was bisected to:
> >
> > commit 77115225acc67d9ac4b15f04dd138006b9cd1ef2
> > Author: Amir Goldstein
> > Date:   Thu Jan 10 17:04:37 2019 +
> >
> >     fanotify: cache fsid in fsnotify_mark_connector
> >
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=12bfcb66a0
> > final crash:    https://syzkaller.appspot.com/x/report.txt?x=11bfcb66a0
> > console output: https://syzkaller.appspot.com/x/log.txt?x=16bfcb66a0
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+c277e8e2f46414645...@syzkaller.appspotmail.com
> > Fixes: 77115225acc6 ("fanotify: cache fsid in fsnotify_mark_connector")
> >
> > WARNING: CPU: 0 PID: 8994 at fs/notify/fanotify/fanotify.c:359
> > fanotify_get_fsid fs/notify/fanotify/fanotify.c:359 [inline]
>
> Oops, we forgot to update conn->fsid when the first mark added for an
> inode has no fsid (e.g. inotify) and the second mark has an fid, which
> is more or less the only thing the repro does.
> And if we are going to update conn->fsid, we do not have the cmpxchg to
> guarantee setting fsid atomically.
>
> I am thinking of a set-once flag on the connector, FSNOTIFY_CONN_HAS_FSID,
> checked before smp_rmb() in fanotify_get_fsid().
> If the flag is not set then call vfs_get_fsid() instead of using the fsid
> cache.

Actually, we don't need to call vfs_get_fsid(); in that race we can just
drop the event.

> conn->fsid can be updated in fsnotify_add_mark_list() under conn->lock,
> and the flag set after smp_wmb().
>
> Does that sound correct?

Something like this:

#syz test: https://github.com/amir73il/linux.git fsnotify-fix-fsid-cache

It passed my modified ltp test:
https://github.com/amir73il/ltp/commits/fanotify_dirent

Thanks,
Amir.
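The set-once publish pattern Amir describes can be sketched in userspace with C11 acquire/release ordering standing in for the kernel's smp_wmb()/smp_rmb() pair. All names here (`connector_stub`, `conn_set_fsid`, `conn_get_fsid`) are invented for the sketch, not the actual fsnotify API:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for fsnotify_mark_connector: the fsid fields are written first,
 * then a set-once flag (mimicking FSNOTIFY_CONN_HAS_FSID) is published with
 * release semantics. */
struct connector_stub {
	unsigned int fsid[2];
	atomic_bool has_fsid;
};

/* Writer side: in the kernel this runs in fsnotify_add_mark_list() under
 * conn->lock; the release store plays the role of smp_wmb(). */
static void conn_set_fsid(struct connector_stub *c,
			  unsigned int val0, unsigned int val1)
{
	c->fsid[0] = val0;
	c->fsid[1] = val1;
	atomic_store_explicit(&c->has_fsid, true, memory_order_release);
}

/* Reader side (fanotify_get_fsid() analogue): if the flag is not yet
 * observed, return false and let the caller drop the event -- no
 * vfs_get_fsid() fallback needed. The acquire load plays smp_rmb(). */
static bool conn_get_fsid(struct connector_stub *c, unsigned int out[2])
{
	if (!atomic_load_explicit(&c->has_fsid, memory_order_acquire))
		return false;
	out[0] = c->fsid[0];
	out[1] = c->fsid[1];
	return true;
}
```

Because the flag is only ever set after the fsid is fully written, any reader that observes the flag is guaranteed a coherent fsid, and the only race window resolves to the benign "drop the event" case.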
Re: [PATCH] ecryptfs: use print_hex_dump_bytes for hexdump
On 2019-05-17 12:45:15, Sascha Hauer wrote:
> The kernel has nice hexdump facilities; use them rather than a homebrew
> hexdump function.
>
> Signed-off-by: Sascha Hauer

Thanks! This is much nicer. I've pushed the commit to the eCryptfs next
branch.

Tyler

> ---
>  fs/ecryptfs/debug.c | 22 +++---
>  1 file changed, 3 insertions(+), 19 deletions(-)
>
> diff --git a/fs/ecryptfs/debug.c b/fs/ecryptfs/debug.c
> index 3d2bdf546ec6..ee9d8ac4a809 100644
> --- a/fs/ecryptfs/debug.c
> +++ b/fs/ecryptfs/debug.c
> @@ -97,25 +97,9 @@ void ecryptfs_dump_auth_tok(struct ecryptfs_auth_tok *auth_tok)
>   */
>  void ecryptfs_dump_hex(char *data, int bytes)
>  {
> -	int i = 0;
> -	int add_newline = 1;
> -
>  	if (ecryptfs_verbosity < 1)
>  		return;
> -	if (bytes != 0) {
> -		printk(KERN_DEBUG "0x%.2x.", (unsigned char)data[i]);
> -		i++;
> -	}
> -	while (i < bytes) {
> -		printk("0x%.2x.", (unsigned char)data[i]);
> -		i++;
> -		if (i % 16 == 0) {
> -			printk("\n");
> -			add_newline = 0;
> -		} else
> -			add_newline = 1;
> -	}
> -	if (add_newline)
> -		printk("\n");
> -}
>
> +	print_hex_dump(KERN_DEBUG, "ecryptfs: ", DUMP_PREFIX_OFFSET, 16, 1,
> +		       data, bytes, false);
> +}
> --
> 2.20.1
[PATCH] staging: kpc2000: simplify error handling in kp2000_pcie_probe
We can get rid of a few iounmaps in the middle of the function by re-ordering the error handling labels and adding two new labels. Signed-off-by: Simon Sandström --- This change has not been tested besides by compiling. It might be good took take an extra look to make sure that I got everything right. Also, this change was proposed by Dan Carpenter. Should I add anything in the commit message to show this? - Simon drivers/staging/kpc2000/kpc2000/core.c | 22 ++ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/drivers/staging/kpc2000/kpc2000/core.c b/drivers/staging/kpc2000/kpc2000/core.c index 610ea549d240..cb05cca687e1 100644 --- a/drivers/staging/kpc2000/kpc2000/core.c +++ b/drivers/staging/kpc2000/kpc2000/core.c @@ -351,12 +351,11 @@ static int kp2000_pcie_probe(struct pci_dev *pdev, err = pci_request_region(pcard->pdev, REG_BAR, KP_DRIVER_NAME_KP2000); if (err) { - iounmap(pcard->regs_bar_base); dev_err(&pcard->pdev->dev, "probe: failed to acquire PCI region (%d)\n", err); err = -ENODEV; - goto err_disable_device; + goto err_unmap_regs; } pcard->regs_base_resource.start = reg_bar_phys_addr; @@ -374,7 +373,7 @@ static int kp2000_pcie_probe(struct pci_dev *pdev, dev_err(&pcard->pdev->dev, "probe: DMA_BAR could not remap memory to virtual space\n"); err = -ENODEV; - goto err_unmap_regs; + goto err_release_regs; } dev_dbg(&pcard->pdev->dev, "probe: DMA_BAR virt hardware address start [%p]\n", @@ -384,11 +383,10 @@ static int kp2000_pcie_probe(struct pci_dev *pdev, err = pci_request_region(pcard->pdev, DMA_BAR, "kp2000_pcie"); if (err) { - iounmap(pcard->dma_bar_base); dev_err(&pcard->pdev->dev, "probe: failed to acquire PCI region (%d)\n", err); err = -ENODEV; - goto err_unmap_regs; + goto err_unmap_dma; } pcard->dma_base_resource.start = dma_bar_phys_addr; @@ -400,7 +398,7 @@ static int kp2000_pcie_probe(struct pci_dev *pdev, pcard->sysinfo_regs_base = pcard->regs_bar_base; err = read_system_regs(pcard); if (err) - goto err_unmap_dma; + goto 
err_release_dma; // Disable all "user" interrupts because they're not used yet. writeq(0x, @@ -438,14 +436,14 @@ static int kp2000_pcie_probe(struct pci_dev *pdev, if (err) { dev_err(&pcard->pdev->dev, "CANNOT use DMA mask %0llx\n", DMA_BIT_MASK(64)); - goto err_unmap_dma; + goto err_release_dma; } dev_dbg(&pcard->pdev->dev, "Using DMA mask %0llx\n", dma_get_mask(PCARD_TO_DEV(pcard))); err = pci_enable_msi(pcard->pdev); if (err < 0) - goto err_unmap_dma; + goto err_release_dma; rv = request_irq(pcard->pdev->irq, kp2000_irq_handler, IRQF_SHARED, pcard->name, pcard); @@ -478,14 +476,14 @@ static int kp2000_pcie_probe(struct pci_dev *pdev, free_irq(pcard->pdev->irq, pcard); err_disable_msi: pci_disable_msi(pcard->pdev); +err_release_dma: + pci_release_region(pdev, DMA_BAR); err_unmap_dma: iounmap(pcard->dma_bar_base); - pci_release_region(pdev, DMA_BAR); - pcard->dma_bar_base = NULL; +err_release_regs: + pci_release_region(pdev, REG_BAR); err_unmap_regs: iounmap(pcard->regs_bar_base); - pci_release_region(pdev, REG_BAR); - pcard->regs_bar_base = NULL; err_disable_device: pci_disable_device(pcard->pdev); err_remove_ida: -- 2.20.1
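The label ordering the patch establishes follows the standard goto-unwind idiom: release resources in strict reverse order of acquisition, with one label per acquired resource, so no unwind step is ever duplicated mid-function. A minimal self-contained illustration (all names invented; the ints stand in for ioremap/pci_request_region resources):

```c
/* probe_stub acquires A then B; failure after each acquisition jumps to
 * the label that unwinds exactly what has been acquired so far.
 * fail_at: 0 = success, 1 = fail after acquiring A, 2 = fail after B. */
static int probe_stub(int fail_at, int *a, int *b)
{
	int err = -1;

	*a = *b = 0;

	*a = 1;				/* acquire A (e.g. ioremap)         */
	if (fail_at == 1)
		goto err_release_a;

	*b = 1;				/* acquire B (e.g. request_region)  */
	if (fail_at == 2)
		goto err_release_b;

	return 0;

err_release_b:
	*b = 0;				/* unwind in reverse order */
err_release_a:
	*a = 0;
	return err;
}
```

The cleanup labels also fall through top-to-bottom, which is what lets kp2000_pcie_probe drop the ad-hoc iounmap calls scattered before each goto.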
Re: [PATCH] scsi: scsi_sysfs.c: Hide wwid sdev attr if VPD is not supported
On 6/19/19 5:35 AM, Martin K. Petersen wrote:
>
> Marcos,
>
>> WWID is composed from VPD data from the device, specifically page 0x83.
>> So, when a device does not have VPD support, for example USB storage
>> devices where VPD is specifically disabled, a read into the
>> <device>/device/wwid file will always return ENXIO. To avoid this,
>> change the scsi_sdev_attr_is_visible function to hide the wwid sysfs
>> file when the device does not support VPD.
>
> Not a big fan of attribute files that come and go.
>
> Why not just return an empty string? Hannes?
>
Actually, the intention of the 'wwid' attribute was to have a common place where one could look up the global id. As such it actually serves a dual purpose, namely indicating that there _is_ a global ID _and_ that this kernel (version) has support for the 'wwid' attribute.

This is to resolve one big issue we have with udev nowadays, which is figuring out if a specific sysfs attribute is actually supported on this particular kernel. Dynamic attributes are 'nicer' on a conceptual level, but make the above test nearly impossible, as we now have _two_ possibilities why a specific attribute is not present.

So making 'wwid' conditional would actually defeat its very purpose, and we should leave it blank if not supported.

Cheers,

Hannes
--
Dr. Hannes Reinecke		zSeries & Storage
h...@suse.com			+49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Re: [PATCH -next] ecryptfs: remove unnecessary null check in ecryptfs_keyring_auth_tok_for_sig
On 2019-05-27 21:28:14, YueHaibing wrote:
> request_key and ecryptfs_get_encrypted_key never
> return a NULL pointer, so no need to do a null check.
>
> Signed-off-by: YueHaibing

This change looks good to me. I've pushed it to the eCryptfs next branch.

Tyler

> ---
>  fs/ecryptfs/keystore.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ecryptfs/keystore.c b/fs/ecryptfs/keystore.c
> index 95662fd46b1d..a1afb162b9d2 100644
> --- a/fs/ecryptfs/keystore.c
> +++ b/fs/ecryptfs/keystore.c
> @@ -1626,9 +1626,9 @@ int ecryptfs_keyring_auth_tok_for_sig(struct key **auth_tok_key,
>  	int rc = 0;
>
>  	(*auth_tok_key) = request_key(&key_type_user, sig, NULL);
> -	if (!(*auth_tok_key) || IS_ERR(*auth_tok_key)) {
> +	if (IS_ERR(*auth_tok_key)) {
>  		(*auth_tok_key) = ecryptfs_get_encrypted_key(sig);
> -		if (!(*auth_tok_key) || IS_ERR(*auth_tok_key)) {
> +		if (IS_ERR(*auth_tok_key)) {
>  			printk(KERN_ERR "Could not find key with description: [%s]\n",
>  			       sig);
>  			rc = process_request_key_err(PTR_ERR(*auth_tok_key));
> --
> 2.17.1
>
>
Re: [PATCH 04/25] vfs: Implement parameter value retrieval with fsinfo() [ver #13]
On Wed, Jun 19, 2019 at 12:34 AM David Howells wrote: > > Same goes for vfs_parse_sb_flag() btw. It should be moved into each > > filesystem's ->parse_param() and not be a mandatory thing. > > I disagree. Every filesystem *must* be able to accept these standard flags, > even if it then ignores them. "posixacl" is not a standard flag. It never was accepted by mount(8) so I don't see where you got that from. Can you explain why you think "mand", "sync", "dirsync", "lazytime" should be accepted by a filesystem such as proc? The argument that it breaks userspace is BS, because this is a new interface, hence by definition we cannot break old userspace. If mount(8) wants to use the new API and there really is breakage if these options are rejected (which I doubt) then it can easily work around that by ignoring them itself. Also why should "rw" not be rejected for filesystems which are read-only by definition, such as iso9660? Thanks, Miklos
Re: [PATCH -next] ecryptfs: Make ecryptfs_xattr_handler static
On 2019-06-14 23:51:17, YueHaibing wrote:
> Fix sparse warning:
>
> fs/ecryptfs/inode.c:1138:28: warning:
>  symbol 'ecryptfs_xattr_handler' was not declared. Should it be static?
>
> Reported-by: Hulk Robot
> Signed-off-by: YueHaibing

Thanks for the cleanup! I've pushed this to the eCryptfs next branch.

Tyler

> ---
>  fs/ecryptfs/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index 1e994d7..18426f4 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -1121,7 +1121,7 @@ static int ecryptfs_xattr_set(const struct xattr_handler *handler,
>  	}
>  }
>
> -const struct xattr_handler ecryptfs_xattr_handler = {
> +static const struct xattr_handler ecryptfs_xattr_handler = {
>  	.prefix = "",	/* match anything */
>  	.get = ecryptfs_xattr_get,
>  	.set = ecryptfs_xattr_set,
> --
> 2.7.4
>
>
Re: [PATCH v4 1/3] KVM: x86: add support for user wait instructions
On 6/19/2019 2:09 PM, Tao Xu wrote:

UMONITOR, UMWAIT and TPAUSE are a set of user wait instructions. This patch adds support for user wait instructions in KVM. Availability of the user wait instructions is indicated by the presence of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5]. User wait instructions may be executed at any privilege level, and use the IA32_UMWAIT_CONTROL MSR to set the maximum time.

The behavior of user wait instructions in VMX non-root operation is determined first by the setting of the "enable user wait and pause" secondary processor-based VM-execution control bit 26. If the VM-execution control is 0, UMONITOR/UMWAIT/TPAUSE cause an invalid-opcode exception (#UD). If the VM-execution control is 1, treatment is based on the setting of the "RDTSC exiting" VM-execution control. Because KVM never enables RDTSC exiting, if the instruction causes a delay, the amount of time delayed is called here the physical delay. The physical delay is first computed by determining the virtual delay. If IA32_UMWAIT_CONTROL[31:2] is zero, the virtual delay is the value in EDX:EAX minus the value that RDTSC would return; if IA32_UMWAIT_CONTROL[31:2] is not zero, the virtual delay is the minimum of that difference and AND(IA32_UMWAIT_CONTROL, FFFFFFFCH).

Because UMWAIT and TPAUSE can put a (physical) CPU into a power-saving state, by default we don't expose them to KVM and enable them only when the guest CPUID has them.

Detailed information about user wait instructions can be found in the latest Intel 64 and IA-32 Architectures Software Developer's Manual.

Co-developed-by: Jingqi Liu
Signed-off-by: Jingqi Liu
Signed-off-by: Tao Xu
---
no changes in v4.
--- arch/x86/include/asm/vmx.h | 1 + arch/x86/kvm/cpuid.c| 2 +- arch/x86/kvm/vmx/capabilities.h | 6 ++ arch/x86/kvm/vmx/vmx.c | 4 4 files changed, 12 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index a39136b0d509..8f00882664d3 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -69,6 +69,7 @@ #define SECONDARY_EXEC_PT_USE_GPA 0x0100 #define SECONDARY_EXEC_MODE_BASED_EPT_EXEC0x0040 #define SECONDARY_EXEC_TSC_SCALING 0x0200 +#define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE 0x0400 #define PIN_BASED_EXT_INTR_MASK 0x0001 #define PIN_BASED_NMI_EXITING 0x0008 diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index e18a9f9f65b5..48bd851a6ae5 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -405,7 +405,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) | F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) | - F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B); + F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/; /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index d6664ee3d127..fd77e17651b4 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -253,6 +253,12 @@ static inline bool cpu_has_vmx_tsc_scaling(void) SECONDARY_EXEC_TSC_SCALING; } +static inline bool vmx_waitpkg_supported(void) +{ + return vmcs_config.cpu_based_2nd_exec_ctrl & + SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE; Shouldn't it be return vmx->secondary_exec_control & SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE; ? 
+} + static inline bool cpu_has_vmx_apicv(void) { return cpu_has_vmx_apic_register_virt() && diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index b93e36ddee5e..b35bfac30a34 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2250,6 +2250,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, SECONDARY_EXEC_RDRAND_EXITING | SECONDARY_EXEC_ENABLE_PML | SECONDARY_EXEC_TSC_SCALING | + SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE | SECONDARY_EXEC_PT_USE_GPA | SECONDARY_EXEC_PT_CONCEAL_VMX | SECONDARY_EXEC_ENABLE_VMFUNC | @@ -3987,6 +3988,9 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx) } } + if (!guest_cpuid_has(vcpu, X86_FEATURE_WAITPKG)) + exec_control &= ~SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE; + vmx->secondary_exec_control = exec_control; }
Re: [PATCH 4/5] Powerpc/hw-breakpoint: Optimize disable path
On 6/18/19 12:01 PM, Christophe Leroy wrote:
>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>> index f002d286..265fac9fb3a4 100644
>> --- a/arch/powerpc/kernel/process.c
>> +++ b/arch/powerpc/kernel/process.c
>> @@ -793,10 +793,22 @@ static inline int set_dabr(struct arch_hw_breakpoint *brk)
>>  	return __set_dabr(dabr, dabrx);
>>  }
>>
>> +static int disable_dawr(void)
>> +{
>> +	if (ppc_md.set_dawr)
>> +		return ppc_md.set_dawr(0, 0);
>> +
>> +	mtspr(SPRN_DAWRX, 0);
>
> And SPRN_DAWR ?

Setting DAWRX to 0 should be enough to disable the breakpoint.
[net v1] net: stmmac: fixed new system time seconds value calculation
From: Roland Hii

When the ADDSUB bit is set, the system time seconds field is calculated
as the complement of the seconds part of the update value.

For example, if 3.000000001 seconds need to be subtracted from the
system time, this field is calculated as
  2^32 - 3 = 4294967296 - 3 = 0x100000000 - 3 = 0xFFFFFFFD

Previously, the 0x100000000 is mistakenly written as 100000000. This is
further simplified from
  sec = (0x100000000ULL - sec);
to
  sec = -sec;

Fixes: ba1ffd74df74 ("stmmac: fix PTP support for GMAC4")
Signed-off-by: Roland Hii
Signed-off-by: Ong Boon Leong
Signed-off-by: Voon Weifeng

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c
index 2dcdf761d525..020159622559 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c
@@ -112,7 +112,7 @@ static int adjust_systime(void __iomem *ioaddr, u32 sec, u32 nsec,
 	 * programmed with (2^32 – )
 	 */
 	if (gmac4)
-		sec = (100000000ULL - sec);
+		sec = -sec;
 
 	value = readl(ioaddr + PTP_TCR);
 	if (value & PTP_TCR_TSCTRLSSR)
--
1.9.1
[PATCH v4 0/3] KVM: x86: Enable user wait instructions
UMONITOR, UMWAIT and TPAUSE are a set of user wait instructions.

UMONITOR arms address monitoring hardware using an address. A store to an address within the specified address range triggers the monitoring hardware to wake up the processor waiting in UMWAIT.

UMWAIT instructs the processor to enter an implementation-dependent optimized state while monitoring a range of addresses. The optimized state may be either a light-weight power/performance optimized state (C0.1 state) or an improved power/performance optimized state (C0.2 state).

TPAUSE instructs the processor to enter an implementation-dependent optimized state (C0.1 or C0.2) and wake up when the time-stamp counter reaches the specified timeout.

Availability of the user wait instructions is indicated by the presence of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5].

The patches enable the UMONITOR, UMWAIT and TPAUSE features in KVM. Because UMWAIT and TPAUSE can put a (physical) CPU into a power-saving state, by default we don't expose them to KVM and enable them only when the guest CPUID has them. If the instruction causes a delay, the amount of time delayed is called here the physical delay. The physical delay is first computed by determining the virtual delay (the time to delay relative to the VM's timestamp counter).
The reference document is linked below:
Intel 64 and IA-32 Architectures Software Developer's Manual,
https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf

This patch series has a dependency on https://lkml.org/lkml/2019/6/7/1206

Changelog:
v4:
  Set msr of IA32_UMWAIT_CONTROL can be 0 and add the check of
    reserved bit 1 (Radim and Xiaoyao)
  Use umwait_control_cached directly and add IA32_UMWAIT_CONTROL in
    msrs_to_save[] to support migration (Xiaoyao)
v3:
  Simplify the patches, expose user wait instructions when the guest
    has CPUID (Paolo)
  Use mwait_control_cached to avoid frequent rdmsr of
    IA32_UMWAIT_CONTROL (Paolo and Xiaoyao)
  Handle vm-exit for UMWAIT and TPAUSE as "never happen" (Paolo)
v2:
  Separated from the series https://lkml.org/lkml/2018/7/10/160
  Provide a capability to enable UMONITOR, UMWAIT and TPAUSE
v1:
  Sent out with MOVDIRI/MOVDIR64B instructions patches

Tao Xu (3):
  KVM: x86: add support for user wait instructions
  KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL
  KVM: vmx: handle vm-exit for UMWAIT and TPAUSE

 arch/x86/include/asm/vmx.h      |  1 +
 arch/x86/include/uapi/asm/vmx.h |  6 +++-
 arch/x86/kvm/cpuid.c            |  2 +-
 arch/x86/kvm/vmx/capabilities.h |  6 ++++
 arch/x86/kvm/vmx/vmx.c          | 53 ++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h          |  3 ++
 arch/x86/kvm/x86.c              |  1 +
 arch/x86/power/umwait.c         |  3 +-
 8 files changed, 72 insertions(+), 3 deletions(-)

--
2.20.1
[PATCH v4 1/3] KVM: x86: add support for user wait instructions
UMONITOR, UMWAIT and TPAUSE are a set of user wait instructions. This patch adds support for user wait instructions in KVM. Availability of the user wait instructions is indicated by the presence of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5]. User wait instructions may be executed at any privilege level, and use the IA32_UMWAIT_CONTROL MSR to set the maximum time.

The behavior of user wait instructions in VMX non-root operation is determined first by the setting of the "enable user wait and pause" secondary processor-based VM-execution control bit 26. If the VM-execution control is 0, UMONITOR/UMWAIT/TPAUSE cause an invalid-opcode exception (#UD). If the VM-execution control is 1, treatment is based on the setting of the "RDTSC exiting" VM-execution control. Because KVM never enables RDTSC exiting, if the instruction causes a delay, the amount of time delayed is called here the physical delay. The physical delay is first computed by determining the virtual delay. If IA32_UMWAIT_CONTROL[31:2] is zero, the virtual delay is the value in EDX:EAX minus the value that RDTSC would return; if IA32_UMWAIT_CONTROL[31:2] is not zero, the virtual delay is the minimum of that difference and AND(IA32_UMWAIT_CONTROL, FFFFFFFCH).

Because UMWAIT and TPAUSE can put a (physical) CPU into a power-saving state, by default we don't expose them to KVM and enable them only when the guest CPUID has them.

Detailed information about user wait instructions can be found in the latest Intel 64 and IA-32 Architectures Software Developer's Manual.

Co-developed-by: Jingqi Liu
Signed-off-by: Jingqi Liu
Signed-off-by: Tao Xu
---
no changes in v4.
--- arch/x86/include/asm/vmx.h | 1 + arch/x86/kvm/cpuid.c| 2 +- arch/x86/kvm/vmx/capabilities.h | 6 ++ arch/x86/kvm/vmx/vmx.c | 4 4 files changed, 12 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index a39136b0d509..8f00882664d3 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -69,6 +69,7 @@ #define SECONDARY_EXEC_PT_USE_GPA 0x0100 #define SECONDARY_EXEC_MODE_BASED_EPT_EXEC 0x0040 #define SECONDARY_EXEC_TSC_SCALING 0x0200 +#define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE 0x0400 #define PIN_BASED_EXT_INTR_MASK 0x0001 #define PIN_BASED_NMI_EXITING 0x0008 diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index e18a9f9f65b5..48bd851a6ae5 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -405,7 +405,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) | F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) | - F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B); + F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/; /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index d6664ee3d127..fd77e17651b4 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -253,6 +253,12 @@ static inline bool cpu_has_vmx_tsc_scaling(void) SECONDARY_EXEC_TSC_SCALING; } +static inline bool vmx_waitpkg_supported(void) +{ + return vmcs_config.cpu_based_2nd_exec_ctrl & + SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE; +} + static inline bool cpu_has_vmx_apicv(void) { return cpu_has_vmx_apic_register_virt() && diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index b93e36ddee5e..b35bfac30a34 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2250,6 +2250,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, 
SECONDARY_EXEC_RDRAND_EXITING | SECONDARY_EXEC_ENABLE_PML | SECONDARY_EXEC_TSC_SCALING | + SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE | SECONDARY_EXEC_PT_USE_GPA | SECONDARY_EXEC_PT_CONCEAL_VMX | SECONDARY_EXEC_ENABLE_VMFUNC | @@ -3987,6 +3988,9 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx) } } + if (!guest_cpuid_has(vcpu, X86_FEATURE_WAITPKG)) + exec_control &= ~SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE; + vmx->secondary_exec_control = exec_control; } -- 2.20.1
[PATCH v4 3/3] KVM: vmx: handle vm-exit for UMWAIT and TPAUSE
As the latest Intel 64 and IA-32 Architectures Software Developer's Manual, UMWAIT and TPAUSE instructions cause a VM exit if the RDTSC exiting and enable user wait and pause VM-execution controls are both 1. This patch is to handle the vm-exit for UMWAIT and TPAUSE as this should never happen. Co-developed-by: Jingqi Liu Signed-off-by: Jingqi Liu Signed-off-by: Tao Xu --- no changes in v4 --- arch/x86/include/uapi/asm/vmx.h | 6 +- arch/x86/kvm/vmx/vmx.c | 16 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h index d213ec5c3766..d88d7a68849b 100644 --- a/arch/x86/include/uapi/asm/vmx.h +++ b/arch/x86/include/uapi/asm/vmx.h @@ -85,6 +85,8 @@ #define EXIT_REASON_PML_FULL62 #define EXIT_REASON_XSAVES 63 #define EXIT_REASON_XRSTORS 64 +#define EXIT_REASON_UMWAIT 67 +#define EXIT_REASON_TPAUSE 68 #define VMX_EXIT_REASONS \ { EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \ @@ -142,7 +144,9 @@ { EXIT_REASON_RDSEED,"RDSEED" }, \ { EXIT_REASON_PML_FULL, "PML_FULL" }, \ { EXIT_REASON_XSAVES,"XSAVES" }, \ - { EXIT_REASON_XRSTORS, "XRSTORS" } + { EXIT_REASON_XRSTORS, "XRSTORS" }, \ + { EXIT_REASON_UMWAIT,"UMWAIT" }, \ + { EXIT_REASON_TPAUSE,"TPAUSE" } #define VMX_ABORT_SAVE_GUEST_MSR_FAIL1 #define VMX_ABORT_LOAD_HOST_PDPTE_FAIL 2 diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index eb13ff9759d3..46125553b180 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -5336,6 +5336,20 @@ static int handle_monitor(struct kvm_vcpu *vcpu) return handle_nop(vcpu); } +static int handle_umwait(struct kvm_vcpu *vcpu) +{ + kvm_skip_emulated_instruction(vcpu); + WARN(1, "this should never happen\n"); + return 1; +} + +static int handle_tpause(struct kvm_vcpu *vcpu) +{ + kvm_skip_emulated_instruction(vcpu); + WARN(1, "this should never happen\n"); + return 1; +} + static int handle_invpcid(struct kvm_vcpu *vcpu) { u32 vmx_instruction_info; @@ -5546,6 +5560,8 @@ static int 
(*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = { [EXIT_REASON_VMFUNC] = handle_vmx_instruction, [EXIT_REASON_PREEMPTION_TIMER]= handle_preemption_timer, [EXIT_REASON_ENCLS] = handle_encls, + [EXIT_REASON_UMWAIT] = handle_umwait, + [EXIT_REASON_TPAUSE] = handle_tpause, }; static const int kvm_vmx_max_exit_handlers = -- 2.20.1
[PATCH v4 2/3] KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL
UMWAIT and TPAUSE instructions use IA32_UMWAIT_CONTROL at MSR index E1H to determines the maximum time in TSC-quanta that the processor can reside in either C0.1 or C0.2. This patch emulates MSR IA32_UMWAIT_CONTROL in guest and differentiate IA32_UMWAIT_CONTROL between host and guest. The variable mwait_control_cached in arch/x86/power/umwait.c caches the MSR value, so this patch uses it to avoid frequently rdmsr of IA32_UMWAIT_CONTROL. Co-developed-by: Jingqi Liu Signed-off-by: Jingqi Liu Signed-off-by: Tao Xu --- Changes in v4: Set msr of IA32_UMWAIT_CONTROL can be 0 and add the check of reserved bit 1 (Radim and Xiaoyao) Use umwait_control_cached directly and add the IA32_UMWAIT_CONTROL in msrs_to_save[] to support migration (Xiaoyao) --- arch/x86/kvm/vmx/vmx.c | 33 + arch/x86/kvm/vmx/vmx.h | 3 +++ arch/x86/kvm/x86.c | 1 + arch/x86/power/umwait.c | 3 ++- 4 files changed, 39 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index b35bfac30a34..eb13ff9759d3 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -1679,6 +1679,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) #endif case MSR_EFER: return kvm_get_msr_common(vcpu, msr_info); + case MSR_IA32_UMWAIT_CONTROL: + if (!vmx_waitpkg_supported()) + return 1; + + msr_info->data = vmx->msr_ia32_umwait_control; + break; case MSR_IA32_SPEC_CTRL: if (!msr_info->host_initiated && !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL)) @@ -1841,6 +1847,16 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) return 1; vmcs_write64(GUEST_BNDCFGS, data); break; + case MSR_IA32_UMWAIT_CONTROL: + if (!vmx_waitpkg_supported()) + return 1; + + /* The reserved bit IA32_UMWAIT_CONTROL[1] should be zero */ + if (data & BIT_ULL(1)) + return 1; + + vmx->msr_ia32_umwait_control = data; + break; case MSR_IA32_SPEC_CTRL: if (!msr_info->host_initiated && !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL)) @@ -4126,6 +4142,8 @@ static void 
vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) vmx->rmode.vm86_active = 0; vmx->spec_ctrl = 0; + vmx->msr_ia32_umwait_control = 0; + vcpu->arch.microcode_version = 0x1ULL; vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val(); kvm_set_cr8(vcpu, 0); @@ -6339,6 +6357,19 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx) msrs[i].host, false); } +static void atomic_switch_ia32_umwait_control(struct vcpu_vmx *vmx) +{ + if (!vmx_waitpkg_supported()) + return; + + if (vmx->msr_ia32_umwait_control != umwait_control_cached) + add_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL, + vmx->msr_ia32_umwait_control, + umwait_control_cached, false); + else + clear_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL); +} + static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val) { vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val); @@ -6447,6 +6478,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu) atomic_switch_perf_msrs(vmx); + atomic_switch_ia32_umwait_control(vmx); + vmx_update_hv_timer(vcpu); /* diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 61128b48c503..8485bec7c38a 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -14,6 +14,8 @@ extern const u32 vmx_msr_index[]; extern u64 host_efer; +extern u32 umwait_control_cached; + #define MSR_TYPE_R 1 #define MSR_TYPE_W 2 #define MSR_TYPE_RW3 @@ -194,6 +196,7 @@ struct vcpu_vmx { #endif u64 spec_ctrl; + u64 msr_ia32_umwait_control; u32 vm_entry_controls_shadow; u32 vm_exit_controls_shadow; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 83aefd759846..4480de459bf4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1138,6 +1138,7 @@ static u32 msrs_to_save[] = { MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B, MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B, MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B, + MSR_IA32_UMWAIT_CONTROL, }; static unsigned num_msrs_to_save; diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c index 7fa381e3fd4e..2e6ce4cbccb3 100644 --- 
a/arch/x86/power/umwait.c +++ b/arch/x86/power/umwait.c @@ -9,7 +9,8 @@ * MSR value. By default, umwait max time is 10 in TSC-quanta and C0.2 * is enabled */ -sta
RE: [PATCH v2 6/6] net: macb: parameter added to cadence ethernet controller DT binding
Hi Florian,

> Please don't resubmit individual patches as replies to your previous
> ones, re-submitting the entire patch series, see this netdev-FAQ section
> for details:

I will resubmit the entire patch series separately.

> >> +- serdes-rate	External serdes rate. Mandatory for USXGMII mode.
> >> +		5 - 5G
> >> +		10 - 10G
> >
> > There should be a unit specifier in that property, something like:
> > serdes-rate-gbps
> > can't we somehow automatically detect that?

OK, sure. I will add a unit specifier to the property name. No, currently the HW doesn't have a way to auto-detect the external serdes rate.

Regards,
Parshuram Thombare
Re: nouveau: DRM: GPU lockup - switching to software fbcon
On Wed, Jun 19, 2019 at 1:48 AM Sergey Senozhatsky wrote:
>
> On (06/19/19 01:20), Ilia Mirkin wrote:
> > On Wed, Jun 19, 2019 at 1:08 AM Sergey Senozhatsky wrote:
> > >
> > > On (06/14/19 11:50), Sergey Senozhatsky wrote:
> > > > dmesg
> > > >
> > > > nouveau 0000:01:00.0: DRM: GPU lockup - switching to software fbcon
> > > > nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
> > > > nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
> > > > nouveau 0000:01:00.0: fifo: channel 5: killed
> > > > nouveau 0000:01:00.0: fifo: engine 6: scheduled for recovery
> > > > nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
> > > > nouveau 0000:01:00.0: firefox[476]: channel 5 killed!
> > > > nouveau 0000:01:00.0: firefox[476]: failed to idle channel 5 [firefox[476]]
> > > >
> > > > It locks up several times a day. Twice in just one hour today.
> > > > Can we fix this?
> > >
> > > Unusable
> >
> > Are you using a GTX 660 by any chance? You've provided rather minimal
> > system info.
>
> 01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 730] (rev a1)

Quite literally the same GPU I have plugged in...

02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208B [GeForce GT 730] [10de:1287] (rev a1)

Works great here! Only other thing I can think of is that I avoid applications with the letters "G" and "K" in their names, and I'm using the xf86-video-nouveau ddx, whereas you might be using the "modeset" ddx with glamor.

If all else fails, just remove nouveau_dri.so and/or boot with nouveau.noaccel=1 -- should be perfect.

Cheers,

  -ilia
[PATCH v10 12/13] libnvdimm/pfn: Fix fsdax-mode namespace info-block zero-fields
At namespace creation time there is the potential for the "expected to be zero" fields of a 'pfn' info-block to be filled with indeterminate data. While the kernel buffer is zeroed on allocation, it is immediately overwritten by nd_pfn_validate() filling it with the current contents of the on-media info-block location. For fields like 'flags' and the 'padding' it potentially means that future implementations can not rely on those fields being zero.

In preparation to stop using the 'start_pad' and 'end_trunc' fields for section alignment, arrange for fields that are not explicitly initialized to be guaranteed zero. Bump the minor version to indicate it is safe to assume the 'padding' and 'flags' are zero. Otherwise, this corruption is expected to be benign since all other critical fields are explicitly initialized.

Note: the cc: stable is about spreading this new policy to as many kernels as possible, not fixing an issue in those kernels. It is not until the change titled "libnvdimm/pfn: Stop padding pmem namespaces to section alignment" that this improper initialization becomes a problem. So if someone decides to backport "libnvdimm/pfn: Stop padding pmem namespaces to section alignment" (which is not tagged for stable), make sure this pre-requisite is flagged.
Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem") Cc: Signed-off-by: Dan Williams --- drivers/nvdimm/dax_devs.c |2 +- drivers/nvdimm/pfn.h |1 + drivers/nvdimm/pfn_devs.c | 18 +++--- 3 files changed, 17 insertions(+), 4 deletions(-) diff --git a/drivers/nvdimm/dax_devs.c b/drivers/nvdimm/dax_devs.c index 49fc18ee0565..6d22b0f83b3b 100644 --- a/drivers/nvdimm/dax_devs.c +++ b/drivers/nvdimm/dax_devs.c @@ -118,7 +118,7 @@ int nd_dax_probe(struct device *dev, struct nd_namespace_common *ndns) nvdimm_bus_unlock(&ndns->dev); if (!dax_dev) return -ENOMEM; - pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL); + pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL); nd_pfn->pfn_sb = pfn_sb; rc = nd_pfn_validate(nd_pfn, DAX_SIG); dev_dbg(dev, "dax: %s\n", rc == 0 ? dev_name(dax_dev) : ""); diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h index f58b849e455b..dfb2bcda8f5a 100644 --- a/drivers/nvdimm/pfn.h +++ b/drivers/nvdimm/pfn.h @@ -28,6 +28,7 @@ struct nd_pfn_sb { __le32 end_trunc; /* minor-version-2 record the base alignment of the mapping */ __le32 align; + /* minor-version-3 guarantee the padding and flags are zero */ u8 padding[4000]; __le64 checksum; }; diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index 0f81fc56bbfd..4977424693b0 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -412,6 +412,15 @@ static int nd_pfn_clear_memmap_errors(struct nd_pfn *nd_pfn) return 0; } +/** + * nd_pfn_validate - read and validate info-block + * @nd_pfn: fsdax namespace runtime state / properties + * @sig: 'devdax' or 'fsdax' signature + * + * Upon return the info-block buffer contents (->pfn_sb) are + * indeterminate when validation fails, and a coherent info-block + * otherwise. 
+ */ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) { u64 checksum, offset; @@ -557,7 +566,7 @@ int nd_pfn_probe(struct device *dev, struct nd_namespace_common *ndns) nvdimm_bus_unlock(&ndns->dev); if (!pfn_dev) return -ENOMEM; - pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL); + pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL); nd_pfn = to_nd_pfn(pfn_dev); nd_pfn->pfn_sb = pfn_sb; rc = nd_pfn_validate(nd_pfn, PFN_SIG); @@ -694,7 +703,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) u64 checksum; int rc; - pfn_sb = devm_kzalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL); + pfn_sb = devm_kmalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL); if (!pfn_sb) return -ENOMEM; @@ -703,11 +712,14 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) sig = DAX_SIG; else sig = PFN_SIG; + rc = nd_pfn_validate(nd_pfn, sig); if (rc != -ENODEV) return rc; /* no info block, do init */; + memset(pfn_sb, 0, sizeof(*pfn_sb)); + nd_region = to_nd_region(nd_pfn->dev.parent); if (nd_region->ro) { dev_info(&nd_pfn->dev, @@ -760,7 +772,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) memcpy(pfn_sb->uuid, nd_pfn->uuid, 16); memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16); pfn_sb->version_major = cpu_to_le16(1); - pfn_sb->version_minor = cpu_to_le16(2); + pfn_sb->version_minor = cpu_to_le16(3); pfn_sb->start_pad = cpu_to_le32(start_pad); pfn_sb->end_trunc = cpu_to_le32(end_trunc); pfn_sb->align = cpu_to_le32(nd_pfn->align);
[PATCH v10 09/13] mm/sparsemem: Support sub-section hotplug
The libnvdimm sub-system has suffered a series of hacks and broken workarounds for the memory-hotplug implementation's awkward section-aligned (128MB) granularity. For example, the following backtrace is emitted when attempting arch_add_memory() with physical address ranges that intersect 'System RAM' (RAM) with 'Persistent Memory' (PMEM) within a given section:

 # cat /proc/iomem | grep -A1 -B1 Persistent\ Memory
 1-1 : System RAM
 2-303ff : Persistent Memory (legacy)
 30400-43fff : System RAM
 44000-23 : Persistent Memory
 24-43bfff : Persistent Memory
   24-43bfff : namespace2.0

 WARNING: CPU: 38 PID: 928 at arch/x86/mm/init_64.c:850 add_pages+0x5c/0x60
 [..]
 RIP: 0010:add_pages+0x5c/0x60
 [..]
 Call Trace:
  devm_memremap_pages+0x460/0x6e0
  pmem_attach_disk+0x29e/0x680 [nd_pmem]
  ? nd_dax_probe+0xfc/0x120 [libnvdimm]
  nvdimm_bus_probe+0x66/0x160 [libnvdimm]

It was discovered that the problem goes beyond RAM vs PMEM collisions as some platforms produce PMEM vs PMEM collisions within a given section. The libnvdimm workaround for that case revealed that the libnvdimm section-alignment-padding implementation has been broken for a long while. A fix for that long-standing breakage introduces as many problems as it solves as it would require a backward-incompatible change to the namespace metadata interpretation. Instead of that dubious route [1], address the root problem in the memory-hotplug implementation.

Note that EEXIST is no longer treated as success as that is how sparse_add_section() reports subsection collisions; it was also obviated by recent changes to perform the request_region() for 'System RAM' before arch_add_memory() in the add_memory() sequence.
[1]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.st...@dwillia2-desk3.amr.corp.intel.com Cc: Michal Hocko Cc: Vlastimil Babka Cc: Logan Gunthorpe Cc: Oscar Salvador Cc: Pavel Tatashin Signed-off-by: Dan Williams --- include/linux/memory_hotplug.h |2 mm/memory_hotplug.c| 27 + mm/page_alloc.c|2 mm/sparse.c| 205 ++-- 4 files changed, 140 insertions(+), 96 deletions(-) diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 3ab0282b4fe5..0b8a5e5ef2da 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -350,7 +350,7 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, extern bool is_memblock_offlined(struct memory_block *mem); extern int sparse_add_section(int nid, unsigned long pfn, unsigned long nr_pages, struct vmem_altmap *altmap); -extern void sparse_remove_one_section(struct mem_section *ms, +extern void sparse_remove_section(struct mem_section *ms, unsigned long pfn, unsigned long nr_pages, unsigned long map_offset, struct vmem_altmap *altmap); extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map, diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 399bf78bccc5..4e8e65954f31 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -252,18 +252,6 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat) } #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */ -static int __meminit __add_section(int nid, unsigned long pfn, - unsigned long nr_pages, struct vmem_altmap *altmap) -{ - int ret; - - if (pfn_valid(pfn)) - return -EEXIST; - - ret = sparse_add_section(nid, pfn, nr_pages, altmap); - return ret < 0 ? 
ret : 0; -} - static int check_pfn_span(unsigned long pfn, unsigned long nr_pages, const char *reason) { @@ -327,18 +315,11 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages, pfns = min(nr_pages, PAGES_PER_SECTION - (pfn & ~PAGE_SECTION_MASK)); - err = __add_section(nid, pfn, pfns, altmap); + err = sparse_add_section(nid, pfn, pfns, altmap); + if (err) + break; pfn += pfns; nr_pages -= pfns; - - /* -* EEXIST is finally dealt with by ioresource collision -* check. see add_memory() => register_memory_resource() -* Warning will be printed if there is collision. -*/ - if (err && (err != -EEXIST)) - break; - err = 0; cond_resched(); } vmemmap_populate_print_last(); @@ -541,7 +522,7 @@ static void __remove_section(struct zone *zone, unsigned long pfn, return; __remove_zone(zone, pfn, nr_pages); - sparse_remove_one_section(ms, pfn, nr_pages, map_offset, altmap); + sparse_remove_section(m
[PATCH v10 06/13] mm/hotplug: Kill is_dev_zone() usage in __remove_pages()
The zone type check was a leftover from the cleanup that plumbed altmap through the memory hotplug path, i.e. commit da024512a1fa "mm: pass the vmem_altmap to arch_remove_memory and __remove_pages".

Cc: Michal Hocko
Cc: Logan Gunthorpe
Cc: Pavel Tatashin
Reviewed-by: David Hildenbrand
Reviewed-by: Oscar Salvador
Signed-off-by: Dan Williams
---
 mm/memory_hotplug.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 647859a1d119..4b882c57781a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -535,11 +535,8 @@ void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 	unsigned long map_offset = 0;
 	int sections_to_remove;
 
-	/* In the ZONE_DEVICE case device driver owns the memory region */
-	if (is_dev_zone(zone)) {
-		if (altmap)
-			map_offset = vmem_altmap_offset(altmap);
-	}
+	if (altmap)
+		map_offset = vmem_altmap_offset(altmap);
 
 	clear_zone_contiguous(zone);
[PATCH v10 08/13] mm/sparsemem: Prepare for sub-section ranges
Prepare the memory hot-{add,remove} paths for handling sub-section ranges by plumbing the starting page frame and number of pages being handled through arch_{add,remove}_memory() to sparse_{add,remove}_one_section(). This is simply plumbing, small cleanups, and some identifier renames. No intended functional changes. Cc: Michal Hocko Cc: Vlastimil Babka Cc: Logan Gunthorpe Cc: Oscar Salvador Reviewed-by: Pavel Tatashin Signed-off-by: Dan Williams --- include/linux/memory_hotplug.h |5 +- mm/memory_hotplug.c| 114 +--- mm/sparse.c| 16 ++ 3 files changed, 81 insertions(+), 54 deletions(-) diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 79e0add6a597..3ab0282b4fe5 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -348,9 +348,10 @@ extern int add_memory_resource(int nid, struct resource *resource); extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages, struct vmem_altmap *altmap); extern bool is_memblock_offlined(struct memory_block *mem); -extern int sparse_add_one_section(int nid, unsigned long start_pfn, - struct vmem_altmap *altmap); +extern int sparse_add_section(int nid, unsigned long pfn, + unsigned long nr_pages, struct vmem_altmap *altmap); extern void sparse_remove_one_section(struct mem_section *ms, + unsigned long pfn, unsigned long nr_pages, unsigned long map_offset, struct vmem_altmap *altmap); extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 4b882c57781a..399bf78bccc5 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -252,51 +252,84 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat) } #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */ -static int __meminit __add_section(int nid, unsigned long phys_start_pfn, - struct vmem_altmap *altmap) +static int __meminit __add_section(int nid, unsigned long pfn, + unsigned 
long nr_pages, struct vmem_altmap *altmap) { int ret; - if (pfn_valid(phys_start_pfn)) + if (pfn_valid(pfn)) return -EEXIST; - ret = sparse_add_one_section(nid, phys_start_pfn, altmap); + ret = sparse_add_section(nid, pfn, nr_pages, altmap); return ret < 0 ? ret : 0; } +static int check_pfn_span(unsigned long pfn, unsigned long nr_pages, + const char *reason) +{ + /* +* Disallow all operations smaller than a sub-section and only +* allow operations smaller than a section for +* SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range() +* enforces a larger memory_block_size_bytes() granularity for +* memory that will be marked online, so this check should only +* fire for direct arch_{add,remove}_memory() users outside of +* add_memory_resource(). +*/ + unsigned long min_align; + + if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP)) + min_align = PAGES_PER_SUBSECTION; + else + min_align = PAGES_PER_SECTION; + if (!IS_ALIGNED(pfn, min_align) + || !IS_ALIGNED(nr_pages, min_align)) { + WARN(1, "Misaligned __%s_pages start: %#lx end: #%lx\n", + reason, pfn, pfn + nr_pages - 1); + return -EINVAL; + } + return 0; +} + /* * Reasonably generic function for adding memory. It is * expected that archs that support memory hotplug will * call this function after deciding the zone to which to * add the new pages. 
*/ -int __ref __add_pages(int nid, unsigned long phys_start_pfn, - unsigned long nr_pages, struct mhp_restrictions *restrictions) +int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages, + struct mhp_restrictions *restrictions) { unsigned long i; - int err = 0; - int start_sec, end_sec; + int start_sec, end_sec, err; struct vmem_altmap *altmap = restrictions->altmap; - /* during initialize mem_map, align hot-added range to section */ - start_sec = pfn_to_section_nr(phys_start_pfn); - end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1); - if (altmap) { /* * Validate altmap is within bounds of the total request */ - if (altmap->base_pfn != phys_start_pfn + if (altmap->base_pfn != pfn || vmem_altmap_offset(altmap) > nr_pages) { pr_warn_once("memory add fail, invalid altmap\n"); - err = -EINVAL; - goto o
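The check_pfn_span() helper introduced in this patch encodes the new alignment rule in one place. A small Python model of that logic (page counts assume x86_64 with 4 KiB pages: 32768 pages per 128 MiB section, 512 pages per 2 MiB sub-section; -22 stands in for -EINVAL):

```python
# Model of check_pfn_span(): hotplug requests must be sub-section
# aligned with SPARSEMEM_VMEMMAP, or full-section aligned otherwise.
PAGES_PER_SECTION = 1 << 15       # 128 MiB / 4 KiB
PAGES_PER_SUBSECTION = 1 << 9     # 2 MiB / 4 KiB

def check_pfn_span(pfn, nr_pages, vmemmap=True):
    """Return 0 if the span is hotplug-capable, -EINVAL (-22) otherwise."""
    min_align = PAGES_PER_SUBSECTION if vmemmap else PAGES_PER_SECTION
    if pfn % min_align or nr_pages % min_align:
        return -22
    return 0

assert check_pfn_span(0, PAGES_PER_SECTION) == 0
assert check_pfn_span(PAGES_PER_SUBSECTION, PAGES_PER_SUBSECTION) == 0
assert check_pfn_span(1, PAGES_PER_SUBSECTION) == -22   # misaligned start
assert check_pfn_span(0, 100) == -22                    # misaligned length
# Classic SPARSEMEM still requires full-section alignment:
assert check_pfn_span(PAGES_PER_SUBSECTION, PAGES_PER_SECTION,
                      vmemmap=False) == -22
```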
[PATCH v10 10/13] mm: Document ZONE_DEVICE memory-model implications
Explain the general mechanisms of 'ZONE_DEVICE' pages and list the users of 'devm_memremap_pages()'. Cc: Jonathan Corbet Reported-by: Mike Rapoport Signed-off-by: Dan Williams --- Documentation/vm/memory-model.rst | 39 + 1 file changed, 39 insertions(+) diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst index 382f72ace1fc..e0af47e02e78 100644 --- a/Documentation/vm/memory-model.rst +++ b/Documentation/vm/memory-model.rst @@ -181,3 +181,42 @@ that is eventually passed to vmemmap_populate() through a long chain of function calls. The vmemmap_populate() implementation may use the `vmem_altmap` along with :c:func:`altmap_alloc_block_buf` helper to allocate memory map on the persistent memory device. + +ZONE_DEVICE +=== +The `ZONE_DEVICE` facility builds upon `SPARSEMEM_VMEMMAP` to offer +`struct page` `mem_map` services for device driver identified physical +address ranges. The "device" aspect of `ZONE_DEVICE` relates to the fact +that the page objects for these address ranges are never marked online, +and that a reference must be taken against the device, not just the page +to keep the memory pinned for active use. `ZONE_DEVICE`, via +:c:func:`devm_memremap_pages`, performs just enough memory hotplug to +turn on :c:func:`pfn_to_page`, :c:func:`page_to_pfn`, and +:c:func:`get_user_pages` service for the given range of pfns. Since the +page reference count never drops below 1 the page is never tracked as +free memory and the page's `struct list_head lru` space is repurposed +for back referencing to the host device / driver that mapped the memory. + +While `SPARSEMEM` presents memory as a collection of sections, +optionally collected into memory blocks, `ZONE_DEVICE` users have a need +for smaller granularity of populating the `mem_map`. Given that +`ZONE_DEVICE` memory is never marked online it is subsequently never +subject to its memory ranges being exposed through the sysfs memory +hotplug api on memory block boundaries. 
The implementation relies on +this lack of user-api constraint to allow sub-section sized memory +ranges to be specified to :c:func:`arch_add_memory`, the top-half of +memory hotplug. Sub-section support allows for `PMD_SIZE` as the minimum +alignment granularity for :c:func:`devm_memremap_pages`. + +The users of `ZONE_DEVICE` are: +* pmem: Map platform persistent memory to be used as a direct-I/O target + via DAX mappings. + +* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->page_free()` + event callbacks to allow a device-driver to coordinate memory management + events related to device-memory, typically GPU memory. See + Documentation/vm/hmm.rst. + +* p2pdma: Create `struct page` objects to allow peer devices in a + PCI/-E topology to coordinate direct-DMA operations between themselves, + i.e. bypass host memory.
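The documentation above states that sub-section support reduces the alignment granularity for devm_memremap_pages() from a section to PMD_SIZE. A quick sketch of the padding consequence (sizes assume x86_64: 128 MiB sections, 2 MiB PMD; the base address is a made-up example):

```python
# Padding a namespace needs under section vs PMD alignment.
SECTION_SIZE = 128 << 20
PMD_SIZE = 2 << 20

def pad_to(base, align):
    """Bytes of padding needed to round base down to 'align'."""
    return base - (base // align) * align

base = 0x244000000   # PMD-aligned, but 64 MiB past a section boundary
assert pad_to(base, PMD_SIZE) == 0             # no padding with sub-sections
assert pad_to(base, SECTION_SIZE) == 64 << 20  # 64 MiB wasted otherwise
```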
[PATCH v10 13/13] libnvdimm/pfn: Stop padding pmem namespaces to section alignment
Now that the mm core supports section-unaligned hotplug of ZONE_DEVICE memory, we no longer need to add padding at pfn/dax device creation time. The kernel will still honor padding established by older kernels. Reported-by: Jeff Moyer Signed-off-by: Dan Williams --- drivers/nvdimm/pfn.h | 14 drivers/nvdimm/pfn_devs.c | 77 - include/linux/mmzone.h|3 ++ 3 files changed, 16 insertions(+), 78 deletions(-) diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h index dfb2bcda8f5a..7381673b7b70 100644 --- a/drivers/nvdimm/pfn.h +++ b/drivers/nvdimm/pfn.h @@ -33,18 +33,4 @@ struct nd_pfn_sb { __le64 checksum; }; -#ifdef CONFIG_SPARSEMEM -#define PFN_SECTION_ALIGN_DOWN(x) SECTION_ALIGN_DOWN(x) -#define PFN_SECTION_ALIGN_UP(x) SECTION_ALIGN_UP(x) -#else -/* - * In this case ZONE_DEVICE=n and we will disable 'pfn' device support, - * but we still want pmem to compile. - */ -#define PFN_SECTION_ALIGN_DOWN(x) (x) -#define PFN_SECTION_ALIGN_UP(x) (x) -#endif - -#define PHYS_SECTION_ALIGN_DOWN(x) PFN_PHYS(PFN_SECTION_ALIGN_DOWN(PHYS_PFN(x))) -#define PHYS_SECTION_ALIGN_UP(x) PFN_PHYS(PFN_SECTION_ALIGN_UP(PHYS_PFN(x))) #endif /* __NVDIMM_PFN_H */ diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index 4977424693b0..2537aa338bd0 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -587,14 +587,14 @@ static u32 info_block_reserve(void) } /* - * We hotplug memory at section granularity, pad the reserved area from - * the previous section base to the namespace base address. + * We hotplug memory at sub-section granularity, pad the reserved area + * from the previous section base to the namespace base address. 
*/ static unsigned long init_altmap_base(resource_size_t base) { unsigned long base_pfn = PHYS_PFN(base); - return PFN_SECTION_ALIGN_DOWN(base_pfn); + return SUBSECTION_ALIGN_DOWN(base_pfn); } static unsigned long init_altmap_reserve(resource_size_t base) @@ -602,7 +602,7 @@ static unsigned long init_altmap_reserve(resource_size_t base) unsigned long reserve = info_block_reserve() >> PAGE_SHIFT; unsigned long base_pfn = PHYS_PFN(base); - reserve += base_pfn - PFN_SECTION_ALIGN_DOWN(base_pfn); + reserve += base_pfn - SUBSECTION_ALIGN_DOWN(base_pfn); return reserve; } @@ -633,8 +633,7 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap) nd_pfn->npfns = le64_to_cpu(pfn_sb->npfns); pgmap->altmap_valid = false; } else if (nd_pfn->mode == PFN_MODE_PMEM) { - nd_pfn->npfns = PFN_SECTION_ALIGN_UP((resource_size(res) - - offset) / PAGE_SIZE); + nd_pfn->npfns = PHYS_PFN((resource_size(res) - offset)); if (le64_to_cpu(nd_pfn->pfn_sb->npfns) > nd_pfn->npfns) dev_info(&nd_pfn->dev, "number of pfns truncated from %lld to %ld\n", @@ -650,54 +649,14 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap) return 0; } -static u64 phys_pmem_align_down(struct nd_pfn *nd_pfn, u64 phys) -{ - return min_t(u64, PHYS_SECTION_ALIGN_DOWN(phys), - ALIGN_DOWN(phys, nd_pfn->align)); -} - -/* - * Check if pmem collides with 'System RAM', or other regions when - * section aligned. Trim it accordingly. 
- */ -static void trim_pfn_device(struct nd_pfn *nd_pfn, u32 *start_pad, u32 *end_trunc) -{ - struct nd_namespace_common *ndns = nd_pfn->ndns; - struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev); - struct nd_region *nd_region = to_nd_region(nd_pfn->dev.parent); - const resource_size_t start = nsio->res.start; - const resource_size_t end = start + resource_size(&nsio->res); - resource_size_t adjust, size; - - *start_pad = 0; - *end_trunc = 0; - - adjust = start - PHYS_SECTION_ALIGN_DOWN(start); - size = resource_size(&nsio->res) + adjust; - if (region_intersects(start - adjust, size, IORESOURCE_SYSTEM_RAM, - IORES_DESC_NONE) == REGION_MIXED - || nd_region_conflict(nd_region, start - adjust, size)) - *start_pad = PHYS_SECTION_ALIGN_UP(start) - start; - - /* Now check that end of the range does not collide. */ - adjust = PHYS_SECTION_ALIGN_UP(end) - end; - size = resource_size(&nsio->res) + adjust; - if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM, - IORES_DESC_NONE) == REGION_MIXED - || !IS_ALIGNED(end, nd_pfn->align) - || nd_region_conflict(nd_region, start, size)) - *end_trunc = end - phys_pmem_align_down(nd_pfn, end); -} - static int nd_pfn_init(struct nd_pfn *nd_pfn) { struct nd_namespace_common *ndn
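The SUBSECTION_ALIGN_DOWN() conversion in init_altmap_base()/init_altmap_reserve() above shrinks the altmap reserve from a section-sized gap to a sub-section-sized one. A Python model of that arithmetic (PAGE_SHIFT and the 2 MiB sub-section match x86_64; the 8 KiB info-block reserve is an assumption for illustration, not taken from this patch):

```python
# Altmap reserve = info block plus the gap from the previous
# sub-section boundary down to the namespace base.
PAGE_SHIFT = 12
SUBSECTION_PFNS = (2 << 20) >> PAGE_SHIFT   # 512 pfns per 2 MiB

def subsection_align_down(pfn):
    return pfn & ~(SUBSECTION_PFNS - 1)

def init_altmap_reserve(base, info_block_reserve=8192):
    reserve = info_block_reserve >> PAGE_SHIFT
    base_pfn = base >> PAGE_SHIFT
    return reserve + (base_pfn - subsection_align_down(base_pfn))

# A base 1 MiB past a sub-section boundary reserves the 256
# intervening pfns plus 2 pfns of info block.
assert init_altmap_reserve((2 << 20) + (1 << 20)) == 2 + 256
# A sub-section-aligned base reserves only the info block.
assert init_altmap_reserve(2 << 20) == 2
```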
[PATCH v10 11/13] mm/devm_memremap_pages: Enable sub-section remap
Teach devm_memremap_pages() about the new sub-section capabilities of arch_{add,remove}_memory(). Effectively, just replace all usage of align_start, align_end, and align_size with res->start, res->end, and resource_size(res). The existing sanity check will still make sure that the two separate remap attempts do not collide within a sub-section (2MB on x86). Cc: Michal Hocko Cc: Toshi Kani Cc: Jérôme Glisse Cc: Logan Gunthorpe Cc: Oscar Salvador Cc: Pavel Tatashin Signed-off-by: Dan Williams --- kernel/memremap.c | 61 + 1 file changed, 24 insertions(+), 37 deletions(-) diff --git a/kernel/memremap.c b/kernel/memremap.c index 57980ed4e571..a0e5f6b91b04 100644 --- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -58,7 +58,7 @@ static unsigned long pfn_first(struct dev_pagemap *pgmap) struct vmem_altmap *altmap = &pgmap->altmap; unsigned long pfn; - pfn = res->start >> PAGE_SHIFT; + pfn = PHYS_PFN(res->start); if (pgmap->altmap_valid) pfn += vmem_altmap_offset(altmap); return pfn; @@ -86,7 +86,6 @@ static void devm_memremap_pages_release(void *data) struct dev_pagemap *pgmap = data; struct device *dev = pgmap->dev; struct resource *res = &pgmap->res; - resource_size_t align_start, align_size; unsigned long pfn; int nid; @@ -96,25 +95,21 @@ static void devm_memremap_pages_release(void *data) pgmap->cleanup(pgmap->ref); /* pages are dead and unused, undo the arch mapping */ - align_start = res->start & ~(PA_SECTION_SIZE - 1); - align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE) - - align_start; - - nid = page_to_nid(pfn_to_page(align_start >> PAGE_SHIFT)); + nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start))); mem_hotplug_begin(); if (pgmap->type == MEMORY_DEVICE_PRIVATE) { - pfn = align_start >> PAGE_SHIFT; + pfn = PHYS_PFN(res->start); __remove_pages(page_zone(pfn_to_page(pfn)), pfn, - align_size >> PAGE_SHIFT, NULL); + PHYS_PFN(resource_size(res)), NULL); } else { - arch_remove_memory(nid, align_start, align_size, + arch_remove_memory(nid, 
res->start, resource_size(res), pgmap->altmap_valid ? &pgmap->altmap : NULL); - kasan_remove_zero_shadow(__va(align_start), align_size); + kasan_remove_zero_shadow(__va(res->start), resource_size(res)); } mem_hotplug_done(); - untrack_pfn(NULL, PHYS_PFN(align_start), align_size); + untrack_pfn(NULL, PHYS_PFN(res->start), resource_size(res)); pgmap_array_delete(res); dev_WARN_ONCE(dev, pgmap->altmap.alloc, "%s: failed to free all reserved pages\n", __func__); @@ -141,16 +136,13 @@ static void devm_memremap_pages_release(void *data) */ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) { - resource_size_t align_start, align_size, align_end; - struct vmem_altmap *altmap = pgmap->altmap_valid ? - &pgmap->altmap : NULL; struct resource *res = &pgmap->res; struct dev_pagemap *conflict_pgmap; struct mhp_restrictions restrictions = { /* * We do not want any optional features only our own memmap */ - .altmap = altmap, + .altmap = pgmap->altmap_valid ? &pgmap->altmap : NULL, }; pgprot_t pgprot = PAGE_KERNEL; int error, nid, is_ram; @@ -160,12 +152,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) return ERR_PTR(-EINVAL); } - align_start = res->start & ~(PA_SECTION_SIZE - 1); - align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE) - - align_start; - align_end = align_start + align_size - 1; - - conflict_pgmap = get_dev_pagemap(PHYS_PFN(align_start), NULL); + conflict_pgmap = get_dev_pagemap(PHYS_PFN(res->start), NULL); if (conflict_pgmap) { dev_WARN(dev, "Conflicting mapping in same section\n"); put_dev_pagemap(conflict_pgmap); @@ -173,7 +160,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) goto err_array; } - conflict_pgmap = get_dev_pagemap(PHYS_PFN(align_end), NULL); + conflict_pgmap = get_dev_pagemap(PHYS_PFN(res->end), NULL); if (conflict_pgmap) { dev_WARN(dev, "Conflicting mapping in same section\n"); put_dev_pagemap(conflict_pgmap); @@ -181,7 +168,7 @@ void 
*devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) goto err_array; } - is_ram = region_intersects(align_start,
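The pfn_first() change at the top of this patch keeps the altmap-offset behavior while dropping the open-coded shift. A Python model of the resulting calculation (PAGE_SHIFT assumes 4 KiB pages; addresses are illustrative):

```python
# pfn_first(): resource start as a pfn, plus the altmap offset when an
# altmap reserves the leading pfns for the memmap itself.
PAGE_SHIFT = 12

def phys_pfn(addr):
    return addr >> PAGE_SHIFT

def pfn_first(res_start, altmap_offset=0, altmap_valid=False):
    pfn = phys_pfn(res_start)
    if altmap_valid:
        pfn += altmap_offset
    return pfn

assert pfn_first(0x244000000) == 0x244000
assert pfn_first(0x244000000, altmap_offset=512,
                 altmap_valid=True) == 0x244200
```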
[PATCH v10 07/13] mm: Kill is_dev_zone() helper
Given there are no more usages of is_dev_zone() outside of 'ifdef CONFIG_ZONE_DEVICE' protection, kill off the compilation helper.

Cc: Michal Hocko
Cc: Logan Gunthorpe
Acked-by: David Hildenbrand
Reviewed-by: Oscar Salvador
Reviewed-by: Pavel Tatashin
Reviewed-by: Wei Yang
Signed-off-by: Dan Williams
---
 include/linux/mmzone.h | 12
 mm/page_alloc.c        |  2 +-
 2 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index c4e8843e283c..e976faf57292 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -855,18 +855,6 @@ static inline int local_memory_node(int node_id) { return node_id; };
  */
 #define zone_idx(zone)		((zone) - (zone)->zone_pgdat->node_zones)
 
-#ifdef CONFIG_ZONE_DEVICE
-static inline bool is_dev_zone(const struct zone *zone)
-{
-	return zone_idx(zone) == ZONE_DEVICE;
-}
-#else
-static inline bool is_dev_zone(const struct zone *zone)
-{
-	return false;
-}
-#endif
-
 /*
  * Returns true if a zone has pages managed by the buddy allocator.
  * All the reclaim decisions have to use this function rather than
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8e7215fb6976..12b2afd3a529 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5881,7 +5881,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 	unsigned long start = jiffies;
 	int nid = pgdat->node_id;
 
-	if (WARN_ON_ONCE(!pgmap || !is_dev_zone(zone)))
+	if (WARN_ON_ONCE(!pgmap || zone_idx(zone) != ZONE_DEVICE))
 		return;
 
 	/*
[PATCH v10 04/13] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal
Sub-section hotplug support reduces the unit of operation of hotplug from section-sized-units (PAGES_PER_SECTION) to sub-section-sized units (PAGES_PER_SUBSECTION). Teach shrink_{zone,pgdat}_span() to consider PAGES_PER_SUBSECTION boundaries as the points where pfn_valid(), not valid_section(), can toggle. Cc: Michal Hocko Cc: Vlastimil Babka Cc: Logan Gunthorpe Reviewed-by: Pavel Tatashin Reviewed-by: Oscar Salvador Signed-off-by: Dan Williams --- mm/memory_hotplug.c | 29 - 1 file changed, 8 insertions(+), 21 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 7b963c2d3a0d..647859a1d119 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -318,12 +318,8 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone, unsigned long start_pfn, unsigned long end_pfn) { - struct mem_section *ms; - - for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) { - ms = __pfn_to_section(start_pfn); - - if (unlikely(!valid_section(ms))) + for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUBSECTION) { + if (unlikely(!pfn_valid(start_pfn))) continue; if (unlikely(pfn_to_nid(start_pfn) != nid)) @@ -343,15 +339,12 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone, unsigned long start_pfn, unsigned long end_pfn) { - struct mem_section *ms; unsigned long pfn; /* pfn is the end pfn of a memory section. 
*/ pfn = end_pfn - 1; - for (; pfn >= start_pfn; pfn -= PAGES_PER_SECTION) { - ms = __pfn_to_section(pfn); - - if (unlikely(!valid_section(ms))) + for (; pfn >= start_pfn; pfn -= PAGES_PER_SUBSECTION) { + if (unlikely(!pfn_valid(pfn))) continue; if (unlikely(pfn_to_nid(pfn) != nid)) @@ -373,7 +366,6 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, unsigned long z = zone_end_pfn(zone); /* zone_end_pfn namespace clash */ unsigned long zone_end_pfn = z; unsigned long pfn; - struct mem_section *ms; int nid = zone_to_nid(zone); zone_span_writelock(zone); @@ -410,10 +402,8 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, * it check the zone has only hole or not. */ pfn = zone_start_pfn; - for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) { - ms = __pfn_to_section(pfn); - - if (unlikely(!valid_section(ms))) + for (; pfn < zone_end_pfn; pfn += PAGES_PER_SUBSECTION) { + if (unlikely(!pfn_valid(pfn))) continue; if (page_zone(pfn_to_page(pfn)) != zone) @@ -441,7 +431,6 @@ static void shrink_pgdat_span(struct pglist_data *pgdat, unsigned long p = pgdat_end_pfn(pgdat); /* pgdat_end_pfn namespace clash */ unsigned long pgdat_end_pfn = p; unsigned long pfn; - struct mem_section *ms; int nid = pgdat->node_id; if (pgdat_start_pfn == start_pfn) { @@ -478,10 +467,8 @@ static void shrink_pgdat_span(struct pglist_data *pgdat, * has only hole or not. */ pfn = pgdat_start_pfn; - for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SECTION) { - ms = __pfn_to_section(pfn); - - if (unlikely(!valid_section(ms))) + for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SUBSECTION) { + if (unlikely(!pfn_valid(pfn))) continue; if (pfn_to_nid(pfn) != nid)
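The loop change above is the whole point of this patch: instead of stepping a section at a time and testing valid_section(), the shrink helpers now step one sub-section at a time and test pfn_valid(). A toy Python model (a set of populated sub-section indices stands in for the real pfn_valid(); the real helpers also check nid and zone, which this sketch omits):

```python
# Sub-section-granularity scan, as in find_smallest_section_pfn()
# after this patch. 512 pfns per 2 MiB sub-section on x86_64.
PAGES_PER_SUBSECTION = 512

def find_smallest_valid_pfn(populated, start_pfn, end_pfn):
    """First pfn in [start_pfn, end_pfn) whose sub-section is populated."""
    pfn = start_pfn
    while pfn < end_pfn:
        if pfn // PAGES_PER_SUBSECTION in populated:
            return pfn
        pfn += PAGES_PER_SUBSECTION
    return None

# Sub-sections 0 and 1 were removed; 2 is still populated, so the zone
# span can shrink to pfn 1024 rather than a full-section boundary.
assert find_smallest_valid_pfn({2}, 0, 4 * 512) == 2 * 512
assert find_smallest_valid_pfn(set(), 0, 4 * 512) is None
```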
[PATCH v10 05/13] mm/sparsemem: Convert kmalloc_section_memmap() to populate_section_memmap()
Allow sub-section sized ranges to be added to the memmap. populate_section_memmap() takes an explict pfn range rather than assuming a full section, and those parameters are plumbed all the way through to vmmemap_populate(). There should be no sub-section usage in current deployments. New warnings are added to clarify which memmap allocation paths are sub-section capable. Cc: Michal Hocko Cc: David Hildenbrand Cc: Logan Gunthorpe Cc: Oscar Salvador Reviewed-by: Pavel Tatashin Signed-off-by: Dan Williams --- arch/x86/mm/init_64.c |4 +++- include/linux/mm.h|4 ++-- mm/sparse-vmemmap.c | 21 ++--- mm/sparse.c | 50 ++--- 4 files changed, 46 insertions(+), 33 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 8335ac6e1112..688fb0687e55 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1520,7 +1520,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, { int err; - if (boot_cpu_has(X86_FEATURE_PSE)) + if (end - start < PAGES_PER_SECTION * sizeof(struct page)) + err = vmemmap_populate_basepages(start, end, node); + else if (boot_cpu_has(X86_FEATURE_PSE)) err = vmemmap_populate_hugepages(start, end, node, altmap); else if (altmap) { pr_err_once("%s: no cpu support for altmap allocations\n", diff --git a/include/linux/mm.h b/include/linux/mm.h index c6ae9eba645d..f7616518124e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2752,8 +2752,8 @@ const char * arch_vma_name(struct vm_area_struct *vma); void print_vma_addr(char *prefix, unsigned long rip); void *sparse_buffer_alloc(unsigned long size); -struct page *sparse_mem_map_populate(unsigned long pnum, int nid, - struct vmem_altmap *altmap); +struct page * __populate_section_memmap(unsigned long pfn, + unsigned long nr_pages, int nid, struct vmem_altmap *altmap); pgd_t *vmemmap_pgd_populate(unsigned long addr, int node); p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node); pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned 
long addr, int node); diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index 7fec05796796..200aef686722 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -245,19 +245,26 @@ int __meminit vmemmap_populate_basepages(unsigned long start, return 0; } -struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid, - struct vmem_altmap *altmap) +struct page * __meminit __populate_section_memmap(unsigned long pfn, + unsigned long nr_pages, int nid, struct vmem_altmap *altmap) { unsigned long start; unsigned long end; - struct page *map; - map = pfn_to_page(pnum * PAGES_PER_SECTION); - start = (unsigned long)map; - end = (unsigned long)(map + PAGES_PER_SECTION); + /* +* The minimum granularity of memmap extensions is +* PAGES_PER_SUBSECTION as allocations are tracked in the +* 'subsection_map' bitmap of the section. +*/ + end = ALIGN(pfn + nr_pages, PAGES_PER_SUBSECTION); + pfn &= PAGE_SUBSECTION_MASK; + nr_pages = end - pfn; + + start = (unsigned long) pfn_to_page(pfn); + end = start + nr_pages * sizeof(struct page); if (vmemmap_populate(start, end, nid, altmap)) return NULL; - return map; + return pfn_to_page(pfn); } diff --git a/mm/sparse.c b/mm/sparse.c index e9fec3c2f7ec..49f0c03d15a3 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -439,8 +439,8 @@ static unsigned long __init section_map_size(void) return PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION); } -struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid, - struct vmem_altmap *altmap) +struct page __init *__populate_section_memmap(unsigned long pfn, + unsigned long nr_pages, int nid, struct vmem_altmap *altmap) { unsigned long size = section_map_size(); struct page *map = sparse_buffer_alloc(size); @@ -521,10 +521,13 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin, } sparse_buffer_init(map_count * section_map_size(), nid); for_each_present_section_nr(pnum_begin, pnum) { + unsigned long pfn = section_nr_to_pfn(pnum); + if (pnum >= 
pnum_end) break; - map = sparse_mem_map_populate(pnum, nid, NULL); + map = __populate_section_memmap(pfn, PAGES_PER_SECTION, + nid, NULL); if (!map) { pr_err("%s: node[%d] memory map backing failed. Some memory will not be available.", __func__, nid); @@ -625,17 +628,17 @@ void offline_mem_sections(unsigned long start_pfn, unsign
[PATCH v10 02/13] mm/sparsemem: Introduce a SECTION_IS_EARLY flag
In preparation for sub-section hotplug, track whether a given section was created during early memory initialization, or later via memory hotplug. This distinction is needed to maintain the coarse expectation that pfn_valid() returns true for any pfn within a given section even if that section has pages that are reserved from the page allocator. For example, one of the goals of subsection hotplug is to support cases where the system physical memory layout collides System RAM and PMEM within a section. Several pfn_valid() users expect to just check if a section is valid, but they are not careful to check if the given pfn is within a "System RAM" boundary and instead expect pgdat information to further validate the pfn. Rather than unwind those paths to make their pfn_valid() queries more precise, a follow-on patch uses the SECTION_IS_EARLY flag to maintain the traditional expectation that pfn_valid() returns true for all early sections. Link: https://lore.kernel.org/lkml/1560366952-10660-1-git-send-email-...@lca.pw/ Reported-by: Qian Cai Cc: Michal Hocko Cc: Logan Gunthorpe Cc: David Hildenbrand Cc: Oscar Salvador Cc: Pavel Tatashin Signed-off-by: Dan Williams --- include/linux/mmzone.h |8 +++- mm/sparse.c| 20 +--- 2 files changed, 16 insertions(+), 12 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 179680c94262..d081c9a1d25d 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1261,7 +1261,8 @@ extern size_t mem_section_usage_size(void); #define SECTION_MARKED_PRESENT (1UL<<0) #define SECTION_HAS_MEM_MAP(1UL<<1) #define SECTION_IS_ONLINE (1UL<<2) -#define SECTION_MAP_LAST_BIT (1UL<<3) +#define SECTION_IS_EARLY (1UL<<3) +#define SECTION_MAP_LAST_BIT (1UL<<4) #define SECTION_MAP_MASK (~(SECTION_MAP_LAST_BIT-1)) #define SECTION_NID_SHIFT 3 @@ -1287,6 +1288,11 @@ static inline int valid_section(struct mem_section *section) return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP)); } +static inline int
early_section(struct mem_section *section) +{ + return (section && (section->section_mem_map & SECTION_IS_EARLY)); +} + static inline int valid_section_nr(unsigned long nr) { return valid_section(__nr_to_section(nr)); diff --git a/mm/sparse.c b/mm/sparse.c index 71da15cc7432..2031a0694f35 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -288,11 +288,11 @@ struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pn static void __meminit sparse_init_one_section(struct mem_section *ms, unsigned long pnum, struct page *mem_map, - struct mem_section_usage *usage) + struct mem_section_usage *usage, unsigned long flags) { ms->section_mem_map &= ~SECTION_MAP_MASK; - ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum) | - SECTION_HAS_MEM_MAP; + ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum) + | SECTION_HAS_MEM_MAP | flags; ms->usage = usage; } @@ -497,7 +497,8 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin, goto failed; } check_usemap_section_nr(nid, usage); - sparse_init_one_section(__nr_to_section(pnum), pnum, map, usage); + sparse_init_one_section(__nr_to_section(pnum), pnum, map, usage, + SECTION_IS_EARLY); usage = (void *) usage + mem_section_usage_size(); } sparse_buffer_fini(); @@ -731,7 +732,7 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn, page_init_poison(memmap, sizeof(struct page) * PAGES_PER_SECTION); section_mark_present(ms); - sparse_init_one_section(ms, section_nr, memmap, usage); + sparse_init_one_section(ms, section_nr, memmap, usage, 0); out: if (ret < 0) { @@ -771,19 +772,16 @@ static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages) } #endif -static void free_section_usage(struct page *memmap, +static void free_section_usage(struct mem_section *ms, struct page *memmap, struct mem_section_usage *usage, struct vmem_altmap *altmap) { - struct page *usage_page; - if (!usage) return; - usage_page = virt_to_page(usage); /* * Check to see if allocation 
came from hot-plug-add */ - if (PageSlab(usage_page) || PageCompound(usage_page)) { + if (!early_section(ms)) { kfree(usage); if (memmap) __kfree_section_memmap(memmap, altmap); @@ -815,6 +813,6 @@ void sparse_remove_one_section(struct mem_section *ms, unsigned long map_offset, clear_hwpoisoned_pages(memmap + map_offset, PAGES_PER_SECTION - map_offset);
[PATCH v10 00/13] mm: Sub-section memory hotplug support
Changes since v9 [1]:

- Fix multiple issues related to the fact that pfn_valid() has
  traditionally returned true for any pfn in an 'early' (onlined at
  boot) section regardless of whether that pfn represented 'System
  RAM'. Teach pfn_valid() to maintain its traditional behavior in the
  presence of subsections. Specifically, subsection precision for
  pfn_valid() is only considered for non-early / hot-plugged sections.
  (Qian)
- Related to the first item, introduce a SECTION_IS_EARLY
  (->section_mem_map flag) to remove the existing hacks for
  determining an early section by looking at whether the usemap was
  allocated from the slab.
- Kill off the EEXIST hackery in __add_pages(). It breaks
  (arch_add_memory() false-positive) the detection of subsection
  collisions reported by section_activate(). It is also obviated by
  David's recent reworks to move the 'System RAM' request_region()
  earlier in the add_memory() sequence.
- Switch to an arch-independent / static subsection size of 2MB.
  Otherwise, a per-arch subsection size is a roadblock on the path to
  persistent memory namespace compatibility across archs. (Jeff)
- Update the changelog for "libnvdimm/pfn: Fix fsdax-mode namespace
  info-block zero-fields" to clarify that the "Cc: stable" is only
  there as a safety measure for a distro that decides to backport
  "libnvdimm/pfn: Stop padding pmem namespaces to section alignment";
  otherwise there is no known bug exposure in older kernels. (Andrew)
- Drop some redundant subsection checks (Oscar)
- Collect some reviewed-bys

[1]: https://lore.kernel.org/lkml/155977186863.2443951.9036044808311959913.st...@dwillia2-desk3.amr.corp.intel.com/

---

The memory hotplug section is an arbitrary / convenient unit for memory hotplug. 'Section-size' units have bled into the user interface ('memblock' sysfs) and cannot be changed without breaking existing userspace.
The section-size constraint, while mostly benign for typical memory hotplug, has wreaked, and continues to wreak, havoc with 'device-memory' use cases, persistent memory (pmem) in particular. Recall that pmem uses devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a 'struct page' memmap for pmem. However, it does not use the 'bottom half' of memory hotplug, i.e. it never marks pmem pages online and never exposes the userspace memblock interface for pmem. This leaves an opening to redress the section-size constraint.

To date, the libnvdimm subsystem has attempted to inject padding to satisfy the internal constraints of arch_add_memory(). Beyond complicating the code, leading to bugs [2], wasting memory, and limiting configuration flexibility, the padding hack is broken when the platform changes the physical memory alignment of pmem from one boot to the next. Device failure (intermittent or permanent) and physical reconfiguration are events that can cause the platform firmware to change the physical placement of pmem on a subsequent boot, and device failure is an everyday event in a data-center.

It turns out that sections are only a hard requirement of the user-facing interface for memory hotplug, and with a bit more infrastructure sub-section arch_add_memory() support can be added for kernel-internal usages like devm_memremap_pages(). Here is an analysis of the design assumptions in the current code and how they are addressed in the new implementation:

Current design assumptions:

- Sections that describe boot memory (early sections) are never
  unplugged / removed.
- pfn_valid(), in the CONFIG_SPARSEMEM_VMEMMAP=y case, devolves to a
  valid_section() check
- __add_pages() and helper routines assume all operations occur in
  PAGES_PER_SECTION units.
- The memblock sysfs interface only comprehends full sections

New design assumptions:

- Sections are instrumented with a sub-section bitmask to track (on
  x86) individual 2MB sub-divisions of a 128MB section.
- Partially populated early sections can be extended with additional
  sub-sections, and those sub-sections can be removed with
  arch_remove_memory(). With this in place we no longer lose usable
  memory capacity to padding.
- pfn_valid() is updated to look deeper than valid_section() to also
  check the active-sub-section mask. This indication is in the same
  cacheline as the valid_section() so the performance impact is
  expected to be negligible. So far the lkp robot has not reported any
  regressions.
- Outside of the core vmemmap population routines which are replaced,
  other helper routines like shrink_{zone,pgdat}_span() are updated to
  handle the smaller granularity. Core memory hotplug routines that
  deal with online memory are not touched.
- The existing memblock sysfs user api guarantees / assumptions are
  not touched since this capability is limited to !online
  !memblock-sysfs-accessible sections.

Meanwhile the issue reports continue to roll in from users that do not understand when and how the 128MB constraint will bite them. The current im
[PATCH v10 03/13] mm/sparsemem: Add helpers track active portions of a section at boot
Prepare for hot{plug,remove} of sub-ranges of a section by tracking a sub-section active bitmask, each bit representing a PMD_SIZE span of the architecture's memory hotplug section size. The implication of a partially populated section is that pfn_valid() needs to go beyond a valid_section() check and either determine that the section is an "early section", or read the sub-section active ranges from the bitmask. The expectation is that the bitmask (subsection_map) fits in the same cacheline as the valid_section() / early_section() data, so the incremental performance overhead to pfn_valid() should be negligible. The rationale for using early_section() to short-circuit the subsection_map check is that there are legacy code paths that use pfn_valid() at section granularity before validating the pfn against pgdat data. So, the early_section() check allows those traditional assumptions to persist while also permitting subsection_map to tell the truth for purposes of populating the unused portions of early sections with PMEM and other ZONE_DEVICE mappings.
Cc: Michal Hocko Cc: Vlastimil Babka Cc: Logan Gunthorpe Cc: Oscar Salvador Cc: Pavel Tatashin Reported-by: Qian Cai Tested-by: Jane Chu Signed-off-by: Dan Williams --- include/linux/mmzone.h | 33 - mm/page_alloc.c| 10 -- mm/sparse.c| 35 +++ 3 files changed, 75 insertions(+), 3 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index d081c9a1d25d..c4e8843e283c 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1179,6 +1179,8 @@ struct mem_section_usage { unsigned long pageblock_flags[0]; }; +void subsection_map_init(unsigned long pfn, unsigned long nr_pages); + struct page; struct page_ext; struct mem_section { @@ -1322,12 +1324,40 @@ static inline struct mem_section *__pfn_to_section(unsigned long pfn) extern int __highest_present_section_nr; +static inline int subsection_map_index(unsigned long pfn) +{ + return (pfn & ~(PAGE_SECTION_MASK)) / PAGES_PER_SUBSECTION; +} + +#ifdef CONFIG_SPARSEMEM_VMEMMAP +static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn) +{ + int idx = subsection_map_index(pfn); + + return test_bit(idx, ms->usage->subsection_map); +} +#else +static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn) +{ + return 1; +} +#endif + #ifndef CONFIG_HAVE_ARCH_PFN_VALID static inline int pfn_valid(unsigned long pfn) { + struct mem_section *ms; + if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) return 0; - return valid_section(__nr_to_section(pfn_to_section_nr(pfn))); + ms = __nr_to_section(pfn_to_section_nr(pfn)); + if (!valid_section(ms)) + return 0; + /* +* Traditionally early sections always returned pfn_valid() for +* the entire section-sized span. 
+*/ + return early_section(ms) || pfn_section_valid(ms, pfn); } #endif @@ -1359,6 +1389,7 @@ void sparse_init(void); #define sparse_init() do {} while (0) #define sparse_index_init(_sec, _nid) do {} while (0) #define pfn_present pfn_valid +#define subsection_map_init(_pfn, _nr_pages) do {} while (0) #endif /* CONFIG_SPARSEMEM */ /* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8cc091e87200..8e7215fb6976 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -7306,12 +7306,18 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn) (u64)zone_movable_pfn[i] << PAGE_SHIFT); } - /* Print out the early node map */ + /* +* Print out the early node map, and initialize the +* subsection-map relative to active online memory ranges to +* enable future "sub-section" extensions of the memory map. +*/ pr_info("Early memory node ranges\n"); - for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) + for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { pr_info(" node %3d: [mem %#018Lx-%#018Lx]\n", nid, (u64)start_pfn << PAGE_SHIFT, ((u64)end_pfn << PAGE_SHIFT) - 1); + subsection_map_init(start_pfn, end_pfn - start_pfn); + } /* Initialise every node */ mminit_verify_pageflags_layout(); diff --git a/mm/sparse.c b/mm/sparse.c index 2031a0694f35..e9fec3c2f7ec 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -210,6 +210,41 @@ static inline unsigned long first_present_section_nr(void) return next_present_section_nr(-1); } +void subsection_mask_set(unsigned long *map, unsigned long pfn, + unsigned long nr_pages) +{ + int idx = subsection_map_index(pfn); + int end = subsection_map_index(pfn + nr_pages - 1); + + bitmap_set(map, idx, end - idx + 1); +} + +void __init subsection_map_init(uns
[PATCH v10 01/13] mm/sparsemem: Introduce struct mem_section_usage
Towards enabling memory hotplug to track partial population of a section, introduce 'struct mem_section_usage'. A pointer to a 'struct mem_section_usage' instance replaces the existing pointer to a 'pageblock_flags' bitmap. Effectively it adds one more 'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to house a new 'subsection_map' bitmap. The new bitmap enables the memory hot{plug,remove} implementation to act on incremental sub-divisions of a section. SUBSECTION_SHIFT is defined as a global constant instead of a per-architecture value like SECTION_SIZE_BITS in order to allow cross-arch compatibility of subsection users. Specifically, a common subsection size allows for the possibility that persistent memory namespace configurations can be made compatible across architectures. The primary motivation for this functionality is to support platforms that mix "System RAM" and "Persistent Memory" within a single section, or multiple PMEM ranges with different mapping lifetimes within a single section. The section restriction for hotplug has caused an ongoing saga of hacks and bugs for devm_memremap_pages() users. Beyond the fixups to teach existing paths how to retrieve the 'usemap' from a section, and updates to the usemap allocation path, there are no expected behavior changes.
Cc: Michal Hocko Cc: Vlastimil Babka Cc: Logan Gunthorpe Cc: Pavel Tatashin Reviewed-by: Oscar Salvador Reviewed-by: Wei Yang Signed-off-by: Dan Williams --- include/linux/mmzone.h | 28 +++-- mm/memory_hotplug.c| 18 ++- mm/page_alloc.c|2 + mm/sparse.c| 81 4 files changed, 76 insertions(+), 53 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 427b79c39b3c..179680c94262 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1161,6 +1161,24 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec) #define SECTION_ALIGN_UP(pfn) (((pfn) + PAGES_PER_SECTION - 1) & PAGE_SECTION_MASK) #define SECTION_ALIGN_DOWN(pfn)((pfn) & PAGE_SECTION_MASK) +#define SUBSECTION_SHIFT 21 + +#define PFN_SUBSECTION_SHIFT (SUBSECTION_SHIFT - PAGE_SHIFT) +#define PAGES_PER_SUBSECTION (1UL << PFN_SUBSECTION_SHIFT) +#define PAGE_SUBSECTION_MASK (~(PAGES_PER_SUBSECTION-1)) + +#if SUBSECTION_SHIFT > SECTION_SIZE_BITS +#error Subsection size exceeds section size +#else +#define SUBSECTIONS_PER_SECTION (1UL << (SECTION_SIZE_BITS - SUBSECTION_SHIFT)) +#endif + +struct mem_section_usage { + DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION); + /* See declaration of similar field in struct zone */ + unsigned long pageblock_flags[0]; +}; + struct page; struct page_ext; struct mem_section { @@ -1178,8 +1196,7 @@ struct mem_section { */ unsigned long section_mem_map; - /* See declaration of similar field in struct zone */ - unsigned long *pageblock_flags; + struct mem_section_usage *usage; #ifdef CONFIG_PAGE_EXTENSION /* * If SPARSEMEM, pgdat doesn't have page_ext pointer. 
We use @@ -1210,6 +1227,11 @@ extern struct mem_section **mem_section; extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]; #endif +static inline unsigned long *section_to_usemap(struct mem_section *ms) +{ + return ms->usage->pageblock_flags; +} + static inline struct mem_section *__nr_to_section(unsigned long nr) { #ifdef CONFIG_SPARSEMEM_EXTREME @@ -1221,7 +1243,7 @@ static inline struct mem_section *__nr_to_section(unsigned long nr) return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK]; } extern int __section_nr(struct mem_section* ms); -extern unsigned long usemap_size(void); +extern size_t mem_section_usage_size(void); /* * We use the lower bits of the mem_map pointer to store diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index a88c5f334e5a..7b963c2d3a0d 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -166,9 +166,10 @@ void put_page_bootmem(struct page *page) #ifndef CONFIG_SPARSEMEM_VMEMMAP static void register_page_bootmem_info_section(unsigned long start_pfn) { - unsigned long *usemap, mapsize, section_nr, i; + unsigned long mapsize, section_nr, i; struct mem_section *ms; struct page *page, *memmap; + struct mem_section_usage *usage; section_nr = pfn_to_section_nr(start_pfn); ms = __nr_to_section(section_nr); @@ -188,10 +189,10 @@ static void register_page_bootmem_info_section(unsigned long start_pfn) for (i = 0; i < mapsize; i++, page++) get_page_bootmem(section_nr, page, SECTION_INFO); - usemap = ms->pageblock_flags; - page = virt_to_page(usemap); + usage = ms->usage; + page = virt_to_page(usage); - mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT; + mapsize = PAGE_ALIGN(mem_section_usage_size()) >> PAGE_SHIFT;
Re: [RESEND v4 1/4] soc: qcom: geni: Add support for ACPI
On Wed, 19 Jun 2019, Andy Gross wrote:
> On Mon, Jun 17, 2019 at 01:51:02PM +0100, Lee Jones wrote:
> > When booting with ACPI as the active set of configuration tables,
> > all clocks, regulators, pin functions, etc. are expected to be at
> > their ideal values/levels/rates, thus the associated frameworks
> > are unavailable. Ensure calls to these APIs are shielded when
> > ACPI is enabled.
> >
> > Signed-off-by: Lee Jones
> > Acked-by: Ard Biesheuvel
>
> Applied.

Thanks Bjorn and Andy.

-- 
Lee Jones [李琼斯]
Linaro Services Technical Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog
Re: [PATCH 4/5] Powerpc/hw-breakpoint: Optimize disable path
On 6/18/19 11:45 AM, Michael Neuling wrote:
> On Tue, 2019-06-18 at 09:57 +0530, Ravi Bangoria wrote:
>> Directly setting dawr and dawrx with 0 should be enough to
>> disable watchpoint. No need to reset individual bits in
>> variable and then set in hw.
>
> This seems like a pointless optimisation to me.
>
> I'm all for adding more code/complexity if it buys us some performance, but I
> can't imagine this is a fast path (nor have you stated any performance
> benefits).

This gets called from sched_switch. I expected an improvement when we switch from a monitored process to a non-monitored one. With such a scenario, I tried to measure the difference in execution time of set_dawr but I don't see any improvement. So I'll drop the patch.
Re: [PATCH] NTB: test: remove a duplicate check
It's not a huge deal obviously, but your commit was a6bed7a54165 ("NTB: Introduce NTB MSI Test Client"), and you know that if I had sent a patch called ("NTB: remove a duplicate check") people would have correctly complained because the patch prefix is too vague. What I'm saying is we do this all the time:

[PATCH] NTB: add a new foobazle driver

But it should be:

[PATCH] NTB: foobazle: add a new foobazle driver

Then I can just copy and paste your patch prefix instead of trying to invent one.

regards,
dan carpenter
Re: [PATCH] mfd: stmfx: Fix an endian bug in stmfx_irq_handler()
On Tue, 18 Jun 2019, Linus Torvalds wrote: > On Tue, Jun 18, 2019 at 1:16 AM Lee Jones wrote: > > > > > Reported-by: Linus Torvalds > > > > Ideally we can get a review too. > > Looks fine to me, but obviously somebody should actually _test_ it too. Amelie, would you be so kind? -- Lee Jones [李琼斯] Linaro Services Technical Lead Linaro.org │ Open source software for ARM SoCs Follow Linaro: Facebook | Twitter | Blog
Re: [PATCH RESEND 1/8] s390: Start fallback of top-down mmap at mm->mmap_base
Really sorry about that, my connection is weird this morning, I'll retry tomorrow. Sorry again, Alex On 6/19/19 1:42 AM, Alexandre Ghiti wrote: In case of mmap failure in top-down mode, there is no need to go through the whole address space again for the bottom-up fallback: the goal of this fallback is to find, as a last resort, space between the top-down mmap base and the stack, which is the only place not covered by the top-down mmap. Signed-off-by: Alexandre Ghiti --- arch/s390/mm/mmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c index cbc718ba6d78..4a222969843b 100644 --- a/arch/s390/mm/mmap.c +++ b/arch/s390/mm/mmap.c @@ -166,7 +166,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, if (addr & ~PAGE_MASK) { VM_BUG_ON(addr != -ENOMEM); info.flags = 0; - info.low_limit = TASK_UNMAPPED_BASE; + info.low_limit = mm->mmap_base; info.high_limit = TASK_SIZE; addr = vm_unmapped_area(&info); if (addr & ~PAGE_MASK)
Re: [RESEND v4 1/4] soc: qcom: geni: Add support for ACPI
On Mon, Jun 17, 2019 at 01:51:02PM +0100, Lee Jones wrote:
> When booting with ACPI as the active set of configuration tables,
> all clocks, regulators, pin functions, etc. are expected to be at
> their ideal values/levels/rates, thus the associated frameworks
> are unavailable. Ensure calls to these APIs are shielded when
> ACPI is enabled.
>
> Signed-off-by: Lee Jones
> Acked-by: Ard Biesheuvel

Applied.

Thanks,
Andy
[PATCH V5 3/5] clk: imx: Add API for clk unregister when driver probe fail
From: Anson Huang

For the i.MX clock drivers' probe-fail case, clks should be unregistered in the return path. This patch adds a common API for i.MX clock drivers to unregister clocks on failure.

Signed-off-by: Anson Huang
---
New patch.
---
 drivers/clk/imx/clk.c | 8 ++++++++
 drivers/clk/imx/clk.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/drivers/clk/imx/clk.c b/drivers/clk/imx/clk.c
index f241189..8616967 100644
--- a/drivers/clk/imx/clk.c
+++ b/drivers/clk/imx/clk.c
@@ -13,6 +13,14 @@
 
 DEFINE_SPINLOCK(imx_ccm_lock);
 
+void imx_unregister_clocks(struct clk *clks[], unsigned int count)
+{
+	unsigned int i;
+
+	for (i = 0; i < count; i++)
+		clk_unregister(clks[i]);
+}
+
 void __init imx_mmdc_mask_handshake(void __iomem *ccm_base,
 				    unsigned int chn)
 {
diff --git a/drivers/clk/imx/clk.h b/drivers/clk/imx/clk.h
index 19d7b8b..bb4ec1b 100644
--- a/drivers/clk/imx/clk.h
+++ b/drivers/clk/imx/clk.h
@@ -12,6 +12,7 @@ void imx_check_clk_hws(struct clk_hw *clks[], unsigned int count);
 void imx_register_uart_clocks(struct clk ** const clks[]);
 void imx_register_uart_clocks_hws(struct clk_hw ** const hws[]);
 void imx_mmdc_mask_handshake(void __iomem *ccm_base, unsigned int chn);
+void imx_unregister_clocks(struct clk *clks[], unsigned int count);
 
 extern void imx_cscmr1_fixup(u32 *val);
-- 
2.7.4
[PATCH V5 4/5] clk: imx: Add support for i.MX8MN clock driver
From: Anson Huang This patch adds i.MX8MN clock driver support. Signed-off-by: Anson Huang --- Changes since V4: - use dev_err instead of pr_err; - unregister clocks when probe failed. --- drivers/clk/imx/Kconfig | 6 + drivers/clk/imx/Makefile | 1 + drivers/clk/imx/clk-imx8mn.c | 636 +++ 3 files changed, 643 insertions(+) create mode 100644 drivers/clk/imx/clk-imx8mn.c diff --git a/drivers/clk/imx/Kconfig b/drivers/clk/imx/Kconfig index 0eaf418..1ac0c79 100644 --- a/drivers/clk/imx/Kconfig +++ b/drivers/clk/imx/Kconfig @@ -14,6 +14,12 @@ config CLK_IMX8MM help Build the driver for i.MX8MM CCM Clock Driver +config CLK_IMX8MN + bool "IMX8MN CCM Clock Driver" + depends on ARCH_MXC && ARM64 + help + Build the driver for i.MX8MN CCM Clock Driver + config CLK_IMX8MQ bool "IMX8MQ CCM Clock Driver" depends on ARCH_MXC && ARM64 diff --git a/drivers/clk/imx/Makefile b/drivers/clk/imx/Makefile index 05641c6..77a3d71 100644 --- a/drivers/clk/imx/Makefile +++ b/drivers/clk/imx/Makefile @@ -26,6 +26,7 @@ obj-$(CONFIG_MXC_CLK_SCU) += \ clk-lpcg-scu.o obj-$(CONFIG_CLK_IMX8MM) += clk-imx8mm.o +obj-$(CONFIG_CLK_IMX8MN) += clk-imx8mn.o obj-$(CONFIG_CLK_IMX8MQ) += clk-imx8mq.o obj-$(CONFIG_CLK_IMX8QXP) += clk-imx8qxp.o clk-imx8qxp-lpcg.o diff --git a/drivers/clk/imx/clk-imx8mn.c b/drivers/clk/imx/clk-imx8mn.c new file mode 100644 index 000..07481a5 --- /dev/null +++ b/drivers/clk/imx/clk-imx8mn.c @@ -0,0 +1,636 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2018-2019 NXP. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "clk.h" + +static u32 share_count_sai2; +static u32 share_count_sai3; +static u32 share_count_sai5; +static u32 share_count_sai6; +static u32 share_count_sai7; +static u32 share_count_disp; +static u32 share_count_pdm; +static u32 share_count_nand; + +enum { + ARM_PLL, + GPU_PLL, + VPU_PLL, + SYS_PLL1, + SYS_PLL2, + SYS_PLL3, + DRAM_PLL, + AUDIO_PLL1, + AUDIO_PLL2, + VIDEO_PLL2, + NR_PLLS, +}; + +static const struct imx_pll14xx_rate_table imx8mn_pll1416x_tbl[] = { + PLL_1416X_RATE(18U, 225, 3, 0), + PLL_1416X_RATE(16U, 200, 3, 0), + PLL_1416X_RATE(12U, 300, 3, 1), + PLL_1416X_RATE(10U, 250, 3, 1), + PLL_1416X_RATE(8U, 200, 3, 1), + PLL_1416X_RATE(75000U, 250, 2, 2), + PLL_1416X_RATE(7U, 350, 3, 2), + PLL_1416X_RATE(6U, 300, 3, 2), +}; + +static const struct imx_pll14xx_rate_table imx8mn_audiopll_tbl[] = { + PLL_1443X_RATE(786432000U, 655, 5, 2, 23593), + PLL_1443X_RATE(722534400U, 301, 5, 1, 3670), +}; + +static const struct imx_pll14xx_rate_table imx8mn_videopll_tbl[] = { + PLL_1443X_RATE(65000U, 325, 3, 2, 0), + PLL_1443X_RATE(59400U, 198, 2, 2, 0), +}; + +static const struct imx_pll14xx_rate_table imx8mn_drampll_tbl[] = { + PLL_1443X_RATE(65000U, 325, 3, 2, 0), +}; + +static struct imx_pll14xx_clk imx8mn_audio_pll = { + .type = PLL_1443X, + .rate_table = imx8mn_audiopll_tbl, +}; + +static struct imx_pll14xx_clk imx8mn_video_pll = { + .type = PLL_1443X, + .rate_table = imx8mn_videopll_tbl, +}; + +static struct imx_pll14xx_clk imx8mn_dram_pll = { + .type = PLL_1443X, + .rate_table = imx8mn_drampll_tbl, +}; + +static struct imx_pll14xx_clk imx8mn_arm_pll = { + .type = PLL_1416X, + .rate_table = imx8mn_pll1416x_tbl, +}; + +static struct imx_pll14xx_clk imx8mn_gpu_pll = { + .type = PLL_1416X, + .rate_table = imx8mn_pll1416x_tbl, +}; + +static struct imx_pll14xx_clk imx8mn_vpu_pll = { + .type = PLL_1416X, + .rate_table = imx8mn_pll1416x_tbl, +}; + 
+static struct imx_pll14xx_clk imx8mn_sys_pll = { + .type = PLL_1416X, + .rate_table = imx8mn_pll1416x_tbl, +}; + +static const char * const pll_ref_sels[] = { "osc_24m", "dummy", "dummy", "dummy", }; +static const char * const audio_pll1_bypass_sels[] = {"audio_pll1", "audio_pll1_ref_sel", }; +static const char * const audio_pll2_bypass_sels[] = {"audio_pll2", "audio_pll2_ref_sel", }; +static const char * const video_pll1_bypass_sels[] = {"video_pll1", "video_pll1_ref_sel", }; +static const char * const dram_pll_bypass_sels[] = {"dram_pll", "dram_pll_ref_sel", }; +static const char * const gpu_pll_bypass_sels[] = {"gpu_pll", "gpu_pll_ref_sel", }; +static const char * const vpu_pll_bypass_sels[] = {"vpu_pll", "vpu_pll_ref_sel", }; +static const char * const arm_pll_bypass_sels[] = {"arm_pll", "arm_pll_ref_sel", }; +static const char * const sys_pll1_bypass_sels[] = {"sys_pll1", "sys_pll1_ref_sel", }; +
[PATCH V5 1/5] dt-bindings: imx: Add clock binding doc for i.MX8MN
From: Anson Huang Add the clock binding doc for i.MX8MN. Signed-off-by: Anson Huang Reviewed-by: Maxime Ripard --- No changes. --- .../devicetree/bindings/clock/imx8mn-clock.yaml| 112 +++ include/dt-bindings/clock/imx8mn-clock.h | 215 + 2 files changed, 327 insertions(+) create mode 100644 Documentation/devicetree/bindings/clock/imx8mn-clock.yaml create mode 100644 include/dt-bindings/clock/imx8mn-clock.h diff --git a/Documentation/devicetree/bindings/clock/imx8mn-clock.yaml b/Documentation/devicetree/bindings/clock/imx8mn-clock.yaml new file mode 100644 index 000..454c5b4 --- /dev/null +++ b/Documentation/devicetree/bindings/clock/imx8mn-clock.yaml @@ -0,0 +1,112 @@ +# SPDX-License-Identifier: GPL-2.0 +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/bindings/clock/imx8mn-clock.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: NXP i.MX8M Nano Clock Control Module Binding + +maintainers: + - Anson Huang + +description: | + NXP i.MX8M Nano clock control module is an integrated clock controller, which + generates and supplies to all modules. + +properties: + compatible: +const: fsl,imx8mn-ccm + + reg: +maxItems: 1 + + clocks: +items: + - description: 32k osc + - description: 24m osc + - description: ext1 clock input + - description: ext2 clock input + - description: ext3 clock input + - description: ext4 clock input + + clock-names: +items: + - const: osc_32k + - const: osc_24m + - const: clk_ext1 + - const: clk_ext2 + - const: clk_ext3 + - const: clk_ext4 + + '#clock-cells': +const: 1 +description: | + The clock consumer should specify the desired clock by having the clock + ID in its "clocks" phandle cell. See include/dt-bindings/clock/imx8mn-clock.h + for the full list of i.MX8M Nano clock IDs. 
+
+required:
+  - compatible
+  - reg
+  - clocks
+  - clock-names
+  - '#clock-cells'
+
+examples:
+  # Clock Control Module node:
+  - |
+    clk: clock-controller@30380000 {
+        compatible = "fsl,imx8mn-ccm";
+        reg = <0x0 0x30380000 0x0 0x10000>;
+        #clock-cells = <1>;
+        clocks = <&osc_32k>, <&osc_24m>, <&clk_ext1>,
+                 <&clk_ext2>, <&clk_ext3>, <&clk_ext4>;
+        clock-names = "osc_32k", "osc_24m", "clk_ext1",
+                      "clk_ext2", "clk_ext3", "clk_ext4";
+    };
+
+  # Required external clocks for Clock Control Module node:
+  - |
+    osc_32k: clock-osc-32k {
+        compatible = "fixed-clock";
+        #clock-cells = <0>;
+        clock-frequency = <32768>;
+        clock-output-names = "osc_32k";
+    };
+
+    osc_24m: clock-osc-24m {
+        compatible = "fixed-clock";
+        #clock-cells = <0>;
+        clock-frequency = <24000000>;
+        clock-output-names = "osc_24m";
+    };
+
+    clk_ext1: clock-ext1 {
+        compatible = "fixed-clock";
+        #clock-cells = <0>;
+        clock-frequency = <133000000>;
+        clock-output-names = "clk_ext1";
+    };
+
+    clk_ext2: clock-ext2 {
+        compatible = "fixed-clock";
+        #clock-cells = <0>;
+        clock-frequency = <133000000>;
+        clock-output-names = "clk_ext2";
+    };
+
+    clk_ext3: clock-ext3 {
+        compatible = "fixed-clock";
+        #clock-cells = <0>;
+        clock-frequency = <133000000>;
+        clock-output-names = "clk_ext3";
+    };
+
+    clk_ext4: clock-ext4 {
+        compatible = "fixed-clock";
+        #clock-cells = <0>;
+        clock-frequency = <133000000>;
+        clock-output-names = "clk_ext4";
+    };
+
+...
diff --git a/include/dt-bindings/clock/imx8mn-clock.h b/include/dt-bindings/clock/imx8mn-clock.h new file mode 100644 index 000..5255b1c --- /dev/null +++ b/include/dt-bindings/clock/imx8mn-clock.h @@ -0,0 +1,215 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright 2018-2019 NXP + */ + +#ifndef __DT_BINDINGS_CLOCK_IMX8MN_H +#define __DT_BINDINGS_CLOCK_IMX8MN_H + +#define IMX8MN_CLK_DUMMY 0 +#define IMX8MN_CLK_32K 1 +#define IMX8MN_CLK_24M 2 +#define IMX8MN_OSC_HDMI_CLK3 +#define IMX8MN_CLK_EXT14 +#define IMX8MN_CLK_EXT25 +#define IMX8MN_CLK_EXT36 +#define IMX8MN_CLK_EXT47 +#define IMX8MN_AUDIO_PLL1_REF_SEL 8 +#define IMX8MN_AUDIO_PLL2_REF_SEL 9 +#define IMX8MN_VIDEO_PLL1_REF_SEL 10 +#define IMX8MN_DRAM_PLL_REF_SEL11 +#define IMX8MN_GPU_PLL_REF_SEL 12 +#define IMX8MN_VPU_PLL_REF_SEL 13 +#define IMX8MN_ARM_PLL_REF_SEL 14 +#define IMX8MN_SYS_PLL1_REF_SEL15 +#define IMX8MN_SYS_PLL2_REF_SEL16 +#define IMX8MN_SYS_PLL3_REF_SEL17 +#define IMX8MN_AUDI
[PATCH V5 5/5] arm64: defconfig: Select CONFIG_CLK_IMX8MN by default
From: Anson Huang Enable CONFIG_CLK_IMX8MN to support i.MX8MN clock driver. Signed-off-by: Anson Huang --- No changes. --- arch/arm64/configs/defconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig index 7a21159..29f7768 100644 --- a/arch/arm64/configs/defconfig +++ b/arch/arm64/configs/defconfig @@ -659,6 +659,7 @@ CONFIG_COMMON_CLK_S2MPS11=y CONFIG_CLK_QORIQ=y CONFIG_COMMON_CLK_PWM=y CONFIG_CLK_IMX8MM=y +CONFIG_CLK_IMX8MN=y CONFIG_CLK_IMX8MQ=y CONFIG_CLK_IMX8QXP=y CONFIG_TI_SCI_CLK=y -- 2.7.4
[PATCH V5 2/5] clk: imx8mm: Make 1416X/1443X PLL macro definitions common for usage
From: Anson Huang 1416X/1443X PLL are used on i.MX8MM and i.MX8MN and maybe other i.MX8M series SoC later, the macro definitions of these PLLs' initialization should be common for usage. Signed-off-by: Anson Huang --- No changes. --- drivers/clk/imx/clk-imx8mm.c | 17 - drivers/clk/imx/clk.h| 17 + 2 files changed, 17 insertions(+), 17 deletions(-) diff --git a/drivers/clk/imx/clk-imx8mm.c b/drivers/clk/imx/clk-imx8mm.c index 6b8e75d..43fa9c3 100644 --- a/drivers/clk/imx/clk-imx8mm.c +++ b/drivers/clk/imx/clk-imx8mm.c @@ -26,23 +26,6 @@ static u32 share_count_dcss; static u32 share_count_pdm; static u32 share_count_nand; -#define PLL_1416X_RATE(_rate, _m, _p, _s) \ - { \ - .rate = (_rate),\ - .mdiv = (_m), \ - .pdiv = (_p), \ - .sdiv = (_s), \ - } - -#define PLL_1443X_RATE(_rate, _m, _p, _s, _k) \ - { \ - .rate = (_rate),\ - .mdiv = (_m), \ - .pdiv = (_p), \ - .sdiv = (_s), \ - .kdiv = (_k), \ - } - static const struct imx_pll14xx_rate_table imx8mm_pll1416x_tbl[] = { PLL_1416X_RATE(18U, 225, 3, 0), PLL_1416X_RATE(16U, 200, 3, 0), diff --git a/drivers/clk/imx/clk.h b/drivers/clk/imx/clk.h index d94d9cb..19d7b8b 100644 --- a/drivers/clk/imx/clk.h +++ b/drivers/clk/imx/clk.h @@ -153,6 +153,23 @@ enum imx_pllv3_type { struct clk_hw *imx_clk_hw_pllv3(enum imx_pllv3_type type, const char *name, const char *parent_name, void __iomem *base, u32 div_mask); +#define PLL_1416X_RATE(_rate, _m, _p, _s) \ + { \ + .rate = (_rate),\ + .mdiv = (_m), \ + .pdiv = (_p), \ + .sdiv = (_s), \ + } + +#define PLL_1443X_RATE(_rate, _m, _p, _s, _k) \ + { \ + .rate = (_rate),\ + .mdiv = (_m), \ + .pdiv = (_p), \ + .sdiv = (_s), \ + .kdiv = (_k), \ + } + struct clk_hw *imx_clk_pllv4(const char *name, const char *parent_name, void __iomem *base); -- 2.7.4
[PATCH RESEND 1/8] s390: Start fallback of top-down mmap at mm->mmap_base
In case of mmap failure in top-down mode, there is no need to go through the whole address space again for the bottom-up fallback: the goal of this fallback is to find, as a last resort, space between the top-down mmap base and the stack, which is the only place not covered by the top-down mmap.

Signed-off-by: Alexandre Ghiti
---
 arch/s390/mm/mmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
index cbc718ba6d78..4a222969843b 100644
--- a/arch/s390/mm/mmap.c
+++ b/arch/s390/mm/mmap.c
@@ -166,7 +166,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	if (addr & ~PAGE_MASK) {
 		VM_BUG_ON(addr != -ENOMEM);
 		info.flags = 0;
-		info.low_limit = TASK_UNMAPPED_BASE;
+		info.low_limit = mm->mmap_base;
 		info.high_limit = TASK_SIZE;
 		addr = vm_unmapped_area(&info);
 		if (addr & ~PAGE_MASK)
-- 
2.20.1
Re: [PATCH v4 3/6] soc: qcom: geni: Add support for ACPI
On Wed 12 Jun 07:26 PDT 2019, Lee Jones wrote: > When booting with ACPI as the active set of configuration tables, > all; clocks, regulators, pin functions ect are expected to be at > their ideal values/levels/rates, thus the associated frameworks > are unavailable. Ensure calls to these APIs are shielded when > ACPI is enabled. > Reviewed-by: Bjorn Andersson > Signed-off-by: Lee Jones > Acked-by: Ard Biesheuvel > --- > drivers/soc/qcom/qcom-geni-se.c | 21 +++-- > 1 file changed, 15 insertions(+), 6 deletions(-) > > diff --git a/drivers/soc/qcom/qcom-geni-se.c b/drivers/soc/qcom/qcom-geni-se.c > index 6b8ef01472e9..d5cf953b4337 100644 > --- a/drivers/soc/qcom/qcom-geni-se.c > +++ b/drivers/soc/qcom/qcom-geni-se.c > @@ -1,6 +1,7 @@ > // SPDX-License-Identifier: GPL-2.0 > // Copyright (c) 2017-2018, The Linux Foundation. All rights reserved. > > +#include > #include > #include > #include > @@ -450,6 +451,9 @@ int geni_se_resources_off(struct geni_se *se) > { > int ret; > > + if (has_acpi_companion(se->dev)) > + return 0; > + > ret = pinctrl_pm_select_sleep_state(se->dev); > if (ret) > return ret; > @@ -487,6 +491,9 @@ int geni_se_resources_on(struct geni_se *se) > { > int ret; > > + if (has_acpi_companion(se->dev)) > + return 0; > + > ret = geni_se_clks_on(se); > if (ret) > return ret; > @@ -724,12 +731,14 @@ static int geni_se_probe(struct platform_device *pdev) > if (IS_ERR(wrapper->base)) > return PTR_ERR(wrapper->base); > > - wrapper->ahb_clks[0].id = "m-ahb"; > - wrapper->ahb_clks[1].id = "s-ahb"; > - ret = devm_clk_bulk_get(dev, NUM_AHB_CLKS, wrapper->ahb_clks); > - if (ret) { > - dev_err(dev, "Err getting AHB clks %d\n", ret); > - return ret; > + if (!has_acpi_companion(&pdev->dev)) { > + wrapper->ahb_clks[0].id = "m-ahb"; > + wrapper->ahb_clks[1].id = "s-ahb"; > + ret = devm_clk_bulk_get(dev, NUM_AHB_CLKS, wrapper->ahb_clks); > + if (ret) { > + dev_err(dev, "Err getting AHB clks %d\n", ret); > + return ret; > + } > } > > dev_set_drvdata(dev, wrapper); > -- > 
2.17.1 >
Re: [PATCH 1/1] scsi: ufs-qcom: Add support for platforms booting ACPI
Ard, Martin, On Tue, 18 Jun 2019, Martin K. Petersen wrote: > > New Qualcomm AArch64 based laptops are now available which use UFS > > as their primary data storage medium. These devices are supplied > > with ACPI support out of the box. This patch ensures the Qualcomm > > UFS driver will be bound when the "QCOM24A5" H/W device is > > advertised as present. > > Applied to 5.3/scsi-queue. Thanks! Ideal. Thanks for your help. -- Lee Jones [李琼斯] Linaro Services Technical Lead Linaro.org │ Open source software for ARM SoCs Follow Linaro: Facebook | Twitter | Blog
[PATCH RESEND 0/8] Fix mmap base in bottom-up mmap
This series fixes the fallback of the top-down mmap: in case of failure, a bottom-up scheme can be tried as a last resort between the top-down mmap base and the stack, hoping for a large unused stack limit. Lots of architectures and even mm code start this fallback at TASK_UNMAPPED_BASE, which is useless since the top-down scheme already failed on the whole address space: instead, simply use mmap_base. Along the way, it allows us to get rid of mmap_legacy_base and mmap_compat_legacy_base from mm_struct. Note that arm and mips already implement this behaviour. Alexandre Ghiti (8): s390: Start fallback of top-down mmap at mm->mmap_base sh: Start fallback of top-down mmap at mm->mmap_base sparc: Start fallback of top-down mmap at mm->mmap_base x86, hugetlbpage: Start fallback of top-down mmap at mm->mmap_base mm: Start fallback top-down mmap at mm->mmap_base parisc: Use mmap_base, not mmap_legacy_base, as low_limit for bottom-up mmap x86: Use mmap_*base, not mmap_*legacy_base, as low_limit for bottom-up mmap mm: Remove mmap_legacy_base and mmap_compat_legacy_base fields from mm_struct arch/parisc/kernel/sys_parisc.c | 8 +++- arch/s390/mm/mmap.c | 2 +- arch/sh/mm/mmap.c| 2 +- arch/sparc/kernel/sys_sparc_64.c | 2 +- arch/sparc/mm/hugetlbpage.c | 2 +- arch/x86/include/asm/elf.h | 2 +- arch/x86/kernel/sys_x86_64.c | 4 ++-- arch/x86/mm/hugetlbpage.c| 7 --- arch/x86/mm/mmap.c | 20 +--- include/linux/mm_types.h | 2 -- mm/debug.c | 4 ++-- mm/mmap.c| 2 +- 12 files changed, 26 insertions(+), 31 deletions(-) -- 2.20.1
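To make the cover letter's argument concrete, here is a userspace sketch of a bottom-up gap search (all names, addresses, and the `-12`/`-ENOMEM` convention are stand-ins, not the kernel's actual `vm_unmapped_area()`). The series' point is that the fallback should be invoked with `low = mm->mmap_base`, since everything below mmap_base was already scanned by the failed top-down pass:

```c
struct range { unsigned long start, end; };   /* [start, end) mapped */

/* Minimal bottom-up gap search over sorted mappings: find `len` free
 * bytes in [low, high), scanning upward. Returns (unsigned long)-12
 * (a stand-in for -ENOMEM) on failure. */
static unsigned long find_bottom_up(const struct range *maps, int n,
                                    unsigned long low, unsigned long high,
                                    unsigned long len)
{
    unsigned long addr = low;

    for (int i = 0; i < n; i++) {
        if (maps[i].end <= addr)
            continue;                 /* mapping entirely below cursor */
        if (maps[i].start >= addr + len)
            break;                    /* the gap before it fits */
        addr = maps[i].end;           /* skip past this mapping */
    }
    return (addr + len <= high) ? addr : (unsigned long)-12;
}

/* One mapping fills [0x2000, 0x3000); with a hypothetical mmap_base of
 * 0x2000 and the stack limit at 0x10000, the fallback finds 0x3000. */
static unsigned long demo_fallback(void)
{
    const struct range maps[] = { { 0x2000UL, 0x3000UL } };

    return find_bottom_up(maps, 1, 0x2000UL, 0x10000UL, 0x1000UL);
}
```

Calling the search with `low = mmap_base` scans only the gap between the mmap base and the stack, which is exactly the one region the top-down pass never covered.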
Re: [PATCH] NTB: test: remove a duplicate check
On 2019-06-18 11:32 p.m., Dan Carpenter wrote: > We already verified that the "nm->isr_ctx" allocation succeeded so there > is no need to check again here. > > Fixes: a6bed7a54165 ("NTB: Introduce NTB MSI Test Client") > Signed-off-by: Dan Carpenter Hmm, yup, not sure how that slipped through, must have been a bad rebase or something. Thanks Dan! Reviewed-by: Logan Gunthorpe > --- > Hey Logan, can you pick a patch prefix when you're introducing a new module? > "[PATCH] NTB/test: Introduce NTB MSI Test Client" or whatever. I don't quite follow you there. NTB doesn't really have a good standard for prefixes. NTB/test might have made sense. Logan
[PATCH RESEND 0/8] Fix mmap base in bottom-up mmap
(Sorry for the previous interrupted series) This series fixes the fallback of the top-down mmap: in case of failure, a bottom-up scheme can be tried as a last resort between the top-down mmap base and the stack, hoping for a large unused stack limit. Lots of architectures and even mm code start this fallback at TASK_UNMAPPED_BASE, which is useless since the top-down scheme already failed on the whole address space: instead, simply use mmap_base. Along the way, it allows us to get rid of mmap_legacy_base and mmap_compat_legacy_base from mm_struct. Note that arm and mips already implement this behaviour. Alexandre Ghiti (8): s390: Start fallback of top-down mmap at mm->mmap_base sh: Start fallback of top-down mmap at mm->mmap_base sparc: Start fallback of top-down mmap at mm->mmap_base x86, hugetlbpage: Start fallback of top-down mmap at mm->mmap_base mm: Start fallback top-down mmap at mm->mmap_base parisc: Use mmap_base, not mmap_legacy_base, as low_limit for bottom-up mmap x86: Use mmap_*base, not mmap_*legacy_base, as low_limit for bottom-up mmap mm: Remove mmap_legacy_base and mmap_compat_legacy_base fields from mm_struct arch/parisc/kernel/sys_parisc.c | 8 +++- arch/s390/mm/mmap.c | 2 +- arch/sh/mm/mmap.c| 2 +- arch/sparc/kernel/sys_sparc_64.c | 2 +- arch/sparc/mm/hugetlbpage.c | 2 +- arch/x86/include/asm/elf.h | 2 +- arch/x86/kernel/sys_x86_64.c | 4 ++-- arch/x86/mm/hugetlbpage.c| 7 --- arch/x86/mm/mmap.c | 20 +--- include/linux/mm_types.h | 2 -- mm/debug.c | 4 ++-- mm/mmap.c| 2 +- 12 files changed, 26 insertions(+), 31 deletions(-) -- 2.20.1
Re: [PATCH RFC 2/3] fonts: Use BUILD_BUG_ON() for checking empty font table
On Wed, 19 Jun 2019 01:05:58 +0200, Randy Dunlap wrote: > > On 6/18/19 1:34 PM, Takashi Iwai wrote: > > We have a nice macro, and the check of emptiness of the font table can > > be done in a simpler way. > > > > Signed-off-by: Takashi Iwai > > Hi, > > Looks good to me. > Acked-by: Randy Dunlap > > Also, would you mind adding TER16x32 to Documentation/fb/fbcon.rst, here: > (AFAIK that would be appropriate.) > > 1. fbcon=font: > > Select the initial font to use. The value 'name' can be any of the > compiled-in fonts: 10x18, 6x10, 7x14, Acorn8x8, MINI4x6, > PEARL8x8, ProFont6x11, SUN12x22, SUN8x16, VGA8x16, VGA8x8. OK, will submit another patch. thanks, Takashi
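For context on the macro discussed above, here is a minimal userspace sketch of a BUILD_BUG_ON()-style compile-time check rejecting an empty table (the real kernel macro in include/linux/build_bug.h is more elaborate, and the font list below is a hypothetical stand-in for the compiled-in fonts):

```c
/* Simplified stand-in for the kernel's BUILD_BUG_ON(): when cond is
 * true, the array size becomes negative and compilation fails, so an
 * empty font table is caught at build time rather than at runtime. */
#define BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* hypothetical stand-in for the compiled-in font list */
static const char *const builtin_fonts[] = { "6x10", "SUN8x16", "VGA8x16" };

static int num_builtin_fonts(void)
{
    BUILD_BUG_ON(ARRAY_SIZE(builtin_fonts) == 0); /* emptiness check */
    return (int)ARRAY_SIZE(builtin_fonts);
}
```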
[PATCH] NTB: test: remove a duplicate check
We already verified that the "nm->isr_ctx" allocation succeeded so there is no need to check again here. Fixes: a6bed7a54165 ("NTB: Introduce NTB MSI Test Client") Signed-off-by: Dan Carpenter --- Hey Logan, can you pick a patch prefix when you're introducing a new module? "[PATCH] NTB/test: Introduce NTB MSI Test Client" or whatever. drivers/ntb/test/ntb_msi_test.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/ntb/test/ntb_msi_test.c b/drivers/ntb/test/ntb_msi_test.c index 99d826ed9c34..9ba3c3162cd0 100644 --- a/drivers/ntb/test/ntb_msi_test.c +++ b/drivers/ntb/test/ntb_msi_test.c @@ -372,9 +372,6 @@ static int ntb_msit_probe(struct ntb_client *client, struct ntb_dev *ntb) if (ret) goto remove_dbgfs; - if (!nm->isr_ctx) - goto remove_dbgfs; - ntb_link_enable(ntb, NTB_SPEED_AUTO, NTB_WIDTH_AUTO); return 0; -- 2.20.1
Re: [PATCH][next] platform/chrome: wilco_ec: fix null pointer dereference on failed kzalloc
On Tue, Jun 18, 2019 at 04:39:24PM +0100, Colin King wrote: > diff --git a/drivers/platform/chrome/wilco_ec/event.c > b/drivers/platform/chrome/wilco_ec/event.c > index c975b76e6255..e251a989b152 100644 > --- a/drivers/platform/chrome/wilco_ec/event.c > +++ b/drivers/platform/chrome/wilco_ec/event.c > @@ -112,8 +112,11 @@ module_param(queue_size, int, 0644); > static struct ec_event_queue *event_queue_new(int capacity) > { > size_t entries_size = sizeof(struct ec_event *) * capacity; > - struct ec_event_queue *q = kzalloc(sizeof(*q) + entries_size, > -GFP_KERNEL); > + struct ec_event_queue *q; > + > + q = kzalloc(sizeof(*q) + entries_size, GFP_KERNEL); > + if (!q) > + return NULL; We have a new struct_size() macro designed for these allocations. q = kzalloc(struct_size(q, entries, capacity), GFP_KERNEL); The advantage is that it checks for integer overflows. regards, dan carpenter
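As a hedged illustration of Dan's suggestion, the sketch below mimics in userspace what the kernel's struct_size() achieves for this allocation: the struct size plus a flexible array, saturating to SIZE_MAX on overflow so the allocator fails cleanly instead of returning a short buffer. The field names and the calloc() stand-in for kzalloc() are assumptions, not the actual wilco_ec code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <stddef.h>

struct ec_event;                     /* element type, opaque here */

struct ec_event_queue {
    int head, tail, capacity;        /* hypothetical bookkeeping fields */
    struct ec_event *entries[];      /* flexible array member */
};

/* Userspace stand-in for struct_size(q, entries, n): the struct plus n
 * trailing pointers, saturating to SIZE_MAX when n would overflow. */
static size_t queue_alloc_size(size_t n)
{
    size_t elem = sizeof(struct ec_event *);

    if (n > (SIZE_MAX - sizeof(struct ec_event_queue)) / elem)
        return SIZE_MAX;
    return sizeof(struct ec_event_queue) + n * elem;
}

static struct ec_event_queue *event_queue_new(int capacity)
{
    /* calloc() stands in for kzalloc(..., GFP_KERNEL) */
    struct ec_event_queue *q = calloc(1, queue_alloc_size((size_t)capacity));

    if (!q)
        return NULL;
    q->capacity = capacity;
    return q;
}
```

A SIZE_MAX request makes the allocation itself fail, which is the overflow behaviour struct_size() buys over a bare multiply-and-add.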
Re: [PATCHv5 10/20] PCI: mobiveil: Fix the INTx process errors
On Fri, Jun 14, 2019 at 4:14 PM Lorenzo Pieralisi wrote: > > On Fri, Jun 14, 2019 at 12:38:51PM +0530, Karthikeyan Mitran wrote: > > Hi Lorenzo and Hou Zhiqiang > > PAB_INTP_AMBA_MISC_STAT does have other status in the higher bits, it > > should have been masked before checking for the status > > You are the maintainer for this driver, so if there is something to be > changed you must post a patch to that extent, I do not understand what > the above means, write the code to fix it, I won't do it. > > I am getting a bit annoyed with this Mobiveil driver so either you guys > sort this out or I will have to remove it from the kernel. > > > Acked-by: Karthikeyan Mitran > > Ok I assume this means you tested it but according to what you > say above, are there still issues with this code path ? Should > we update the patch ? Tested-by: Karthikeyan Mitran This patch fixes the INTx status extraction and handling, I don't see any need to update this patch. > > Moreover: > > https://kernelnewbies.org/PatchCulture > > Please read it and never top-post. Thank you very much, for the information. > > Thanks, > Lorenzo > > > On Wed, Jun 12, 2019 at 8:38 PM Lorenzo Pieralisi > > wrote: > > > > > > On Fri, Apr 12, 2019 at 08:36:12AM +, Z.q. Hou wrote: > > > > From: Hou Zhiqiang > > > > > > > > In the loop block, there is not code to update the loop key, > > > > this patch updates the loop key by re-read the INTx status > > > > register. > > > > > > > > This patch also add the clearing of the handled INTx status. > > > > > > > > Note: Need MV to test this fix. > > > > > > This means INTX were never tested and current code handling them is, > > > AFAICS, an infinite loop which is very very bad. > > > > > > This is a gross bug and must be fixed as soon as possible. > > > > > > I want Karthikeyan ACK and Tested-by on this patch. 
> > > > > > Lorenzo > > > > > > > Fixes: 9af6bcb11e12 ("PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP > > > > driver") > > > > Signed-off-by: Hou Zhiqiang > > > > Reviewed-by: Minghuan Lian > > > > Reviewed-by: Subrahmanya Lingappa > > > > --- > > > > V5: > > > > - Corrected and retouched the subject and changelog. > > > > > > > > drivers/pci/controller/pcie-mobiveil.c | 13 + > > > > 1 file changed, 9 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/drivers/pci/controller/pcie-mobiveil.c > > > > b/drivers/pci/controller/pcie-mobiveil.c > > > > index 4ba458474e42..78e575e71f4d 100644 > > > > --- a/drivers/pci/controller/pcie-mobiveil.c > > > > +++ b/drivers/pci/controller/pcie-mobiveil.c > > > > @@ -361,6 +361,7 @@ static void mobiveil_pcie_isr(struct irq_desc *desc) > > > > /* Handle INTx */ > > > > if (intr_status & PAB_INTP_INTX_MASK) { > > > > shifted_status = csr_readl(pcie, PAB_INTP_AMBA_MISC_STAT); > > > > + shifted_status &= PAB_INTP_INTX_MASK; > > > > shifted_status >>= PAB_INTX_START; > > > > do { > > > > for_each_set_bit(bit, &shifted_status, > > > > PCI_NUM_INTX) { > > > > @@ -372,12 +373,16 @@ static void mobiveil_pcie_isr(struct irq_desc > > > > *desc) > > > > dev_err_ratelimited(dev, > > > > "unexpected IRQ, INT%d\n", > > > > bit); > > > > > > > > - /* clear interrupt */ > > > > - csr_writel(pcie, > > > > -shifted_status << > > > > PAB_INTX_START, > > > > + /* clear interrupt handled */ > > > > + csr_writel(pcie, 1 << (PAB_INTX_START + > > > > bit), > > > > PAB_INTP_AMBA_MISC_STAT); > > > > } > > > > - } while ((shifted_status >> PAB_INTX_START) != 0); > > > > + > > > > + shifted_status = csr_readl(pcie, > > > > + > > > > PAB_INTP_AMBA_MISC_STAT); > > > > + shifted_status &= PAB_INTP_INTX_MASK; > > > > + shifted_status >>= PAB_INTX_START; > > > > + } while (shifted_status != 0); > > > > } > > > > > > > > /* read extra MSI status register */ > > > > -- > > > > 2.17.1 > > > > > > > > > > > > -- Mobiveil INC., CONFIDENTIALITY NOTICE: This 
e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain proprietary confidential or privileged information or otherwise be protected by law. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please notify the sender and destroy all copies and the original message.
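The fix discussed in the thread above can be modelled in userspace. In the sketch below, the register layout and bit positions are hypothetical; only the control flow mirrors the patch: mask the status register to the INTx field before shifting, clear each handled line individually with a write-1-to-clear, and re-read the status every iteration so the loop can terminate.

```c
#include <stdint.h>

#define PAB_INTX_START     5u
#define PAB_INTP_INTX_MASK (0xfu << PAB_INTX_START)  /* 4 INTx lines */

static uint32_t misc_stat;                        /* fake MISC_STAT register */

static uint32_t csr_read(void)    { return misc_stat; }
static void csr_write(uint32_t v) { misc_stat &= ~v; } /* W1C semantics */

static int handle_intx(void)
{
    int handled = 0;
    uint32_t shifted = (csr_read() & PAB_INTP_INTX_MASK) >> PAB_INTX_START;

    while (shifted) {
        for (unsigned bit = 0; bit < 4; bit++) {
            if (shifted & (1u << bit)) {
                handled++;                               /* demux to virq */
                csr_write(1u << (PAB_INTX_START + bit)); /* clear just it */
            }
        }
        /* re-read so newly raised lines are seen and the loop ends */
        shifted = (csr_read() & PAB_INTP_INTX_MASK) >> PAB_INTX_START;
    }
    return handled;
}
```

Without the mask, unrelated high status bits would keep `shifted` non-zero forever, which is the infinite loop Lorenzo objects to.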
Re: [PATCH] mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind
On Tue 18-06-19 14:13:16, Yang Shi wrote: [...] > > > > > Change migrate_page_add() to check if the page is movable or not, if > > > > > it > > > > > is unmovable, just return -EIO. We don't have to check non-LRU > > > > > movable > > > > > pages since just zsmalloc and virtio-baloon support this. And, they > > > > > should be not able to reach here. > > > > You are not checking whether the page is movable, right? You only rely > > > > on PageLRU check which is not really an equivalent thing. There are > > > > movable pages which are not LRU and also pages might be off LRU > > > > temporarily for many reasons so this could lead to false positives. > > > I'm supposed non-LRU movable pages could not reach here. Since most of > > > them > > > are not mmapable, i.e. virtio-balloon, zsmalloc. zram device is mmapable, > > > but the page fault to that vma would end up allocating user space pages > > > which are on LRU. If I miss something please let me know. > > That might be true right now but it is a very subtle assumption that > > might break easily in the future. The point is still that even LRU pages > > might be isolated from the LRU list temporarily and you do not want this > > to cause the failure easily. > > I used to have !__PageMovable(page), but it was removed since the > aforementioned reason. I could add it back. > > For the temporary off LRU page, I did a quick search, it looks the most > paths have to acquire mmap_sem, so it can't race with us here. Page > reclaim/compaction looks like the only race. But, since the mapping should > be preserved even though the page is off LRU temporarily unless the page is > reclaimed, so we should be able to exclude temporary off LRU pages by > calling page_mapping() and page_anon_vma(). > > So, the fix may look like: > > if (!PageLRU(head) && !__PageMovable(page)) { > if (!(page_mapping(page) || page_anon_vma(page))) > return -EIO; This is getting even more muddy TBH. 
Is there any reason that we have to handle this problem during the isolation phase rather the migration? -- Michal Hocko SUSE Labs
[PATCH 5/8] mm: Start fallback top-down mmap at mm->mmap_base
In case of mmap failure in top-down mode, there is no need to go through the whole address space again for the bottom-up fallback: the goal of this fallback is to find, as a last resort, space between the top-down mmap base and the stack, which is the only place not covered by the top-down mmap. Signed-off-by: Alexandre Ghiti --- mm/mmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/mmap.c b/mm/mmap.c index dedae10cb6e2..e563145c1ff4 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2185,7 +2185,7 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr, if (offset_in_page(addr)) { VM_BUG_ON(addr != -ENOMEM); info.flags = 0; - info.low_limit = TASK_UNMAPPED_BASE; + info.low_limit = arch_get_mmap_base(addr, mm->mmap_base); info.high_limit = mmap_end; addr = vm_unmapped_area(&info); } -- 2.20.1
Re: [PATCH 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel
Hi Naveen, Sorry I meant to reply to this earlier .. :/ "Naveen N. Rao" writes: > With -mprofile-kernel, gcc emits 'mflr r0', followed by 'bl _mcount' to > enable function tracing and profiling. So far, with dynamic ftrace, we > used to only patch out the branch to _mcount(). However, mflr is > executed by the branch unit that can only execute one per cycle on > POWER9 and shared with branches, so it would be nice to avoid it where > possible. > > We cannot simply nop out the mflr either. When enabling function > tracing, there can be a race if tracing is enabled when some thread was > interrupted after executing a nop'ed out mflr. In this case, the thread > would execute the now-patched-in branch to _mcount() without having > executed the preceding mflr. > > To solve this, we now enable function tracing in 2 steps: patch in the > mflr instruction, use synchronize_rcu_tasks() to ensure all existing > threads make progress, and then patch in the branch to _mcount(). We > override ftrace_replace_code() with a powerpc64 variant for this > purpose. According to the ISA we're not allowed to patch mflr at runtime. See the section on "CMODX". I'm also not convinced the ordering between the two patches is guaranteed by the ISA, given that there's possibly no isync on the other CPU. But I haven't had time to dig into it sorry, hopefully later in the week? 
cheers > diff --git a/arch/powerpc/kernel/trace/ftrace.c > b/arch/powerpc/kernel/trace/ftrace.c > index 517662a56bdc..5e2b29808af1 100644 > --- a/arch/powerpc/kernel/trace/ftrace.c > +++ b/arch/powerpc/kernel/trace/ftrace.c > @@ -125,7 +125,7 @@ __ftrace_make_nop(struct module *mod, > { > unsigned long entry, ptr, tramp; > unsigned long ip = rec->ip; > - unsigned int op, pop; > + unsigned int op; > > /* read where this goes */ > if (probe_kernel_read(&op, (void *)ip, sizeof(int))) { > @@ -160,8 +160,6 @@ __ftrace_make_nop(struct module *mod, > > #ifdef CONFIG_MPROFILE_KERNEL > /* When using -mkernel_profile there is no load to jump over */ > - pop = PPC_INST_NOP; > - > if (probe_kernel_read(&op, (void *)(ip - 4), 4)) { > pr_err("Fetching instruction at %lx failed.\n", ip - 4); > return -EFAULT; > @@ -169,26 +167,23 @@ __ftrace_make_nop(struct module *mod, > > /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */ > if (op != PPC_INST_MFLR && op != PPC_INST_STD_LR) { > - pr_err("Unexpected instruction %08x around bl _mcount\n", op); > + pr_err("Unexpected instruction %08x before bl _mcount\n", op); > return -EINVAL; > } > -#else > - /* > - * Our original call site looks like: > - * > - * bl > - * ld r2,XX(r1) > - * > - * Milton Miller pointed out that we can not simply nop the branch. > - * If a task was preempted when calling a trace function, the nops > - * will remove the way to restore the TOC in r2 and the r2 TOC will > - * get corrupted. > - * > - * Use a b +8 to jump over the load. 
> - */ > > - pop = PPC_INST_BRANCH | 8; /* b +8 */ > + /* We should patch out the bl to _mcount first */ > + if (patch_instruction((unsigned int *)ip, PPC_INST_NOP)) { > + pr_err("Patching NOP failed.\n"); > + return -EPERM; > + } > > + /* then, nop out the preceding 'mflr r0' as an optimization */ > + if (op == PPC_INST_MFLR && > + patch_instruction((unsigned int *)(ip - 4), PPC_INST_NOP)) { > + pr_err("Patching NOP failed.\n"); > + return -EPERM; > + } > +#else > /* >* Check what is in the next instruction. We can see ld r2,40(r1), but >* on first pass after boot we will see mflr r0. > @@ -202,12 +197,25 @@ __ftrace_make_nop(struct module *mod, > pr_err("Expected %08x found %08x\n", PPC_INST_LD_TOC, op); > return -EINVAL; > } > -#endif /* CONFIG_MPROFILE_KERNEL */ > > - if (patch_instruction((unsigned int *)ip, pop)) { > + /* > + * Our original call site looks like: > + * > + * bl > + * ld r2,XX(r1) > + * > + * Milton Miller pointed out that we can not simply nop the branch. > + * If a task was preempted when calling a trace function, the nops > + * will remove the way to restore the TOC in r2 and the r2 TOC will > + * get corrupted. > + * > + * Use a b +8 to jump over the load. > + */ > + if (patch_instruction((unsigned int *)ip, PPC_INST_BRANCH | 8)) { > pr_err("Patching NOP failed.\n"); > return -EPERM; > } > +#endif /* CONFIG_MPROFILE_KERNEL */ > > return 0; > } > @@ -421,6 +429,26 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace > *rec, unsigned long addr) > return -EPERM; > } > > +#ifdef CONFIG_MPROFILE_KERNEL > + /* Nop out the preceding 'mflr r0' as an optim
[PATCH 4/8] x86, hugetlbpage: Start fallback of top-down mmap at mm->mmap_base
In case of mmap failure in top-down mode, there is no need to go through the whole address space again for the bottom-up fallback: the goal of this fallback is to find, as a last resort, space between the top-down mmap base and the stack, which is the only place not covered by the top-down mmap. Signed-off-by: Alexandre Ghiti --- arch/x86/mm/hugetlbpage.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c index fab095362c50..4b90339aef50 100644 --- a/arch/x86/mm/hugetlbpage.c +++ b/arch/x86/mm/hugetlbpage.c @@ -106,11 +106,12 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file, { struct hstate *h = hstate_file(file); struct vm_unmapped_area_info info; + unsigned long mmap_base = get_mmap_base(0); info.flags = VM_UNMAPPED_AREA_TOPDOWN; info.length = len; info.low_limit = PAGE_SIZE; - info.high_limit = get_mmap_base(0); + info.high_limit = mmap_base; /* * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area @@ -132,7 +133,7 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file, if (addr & ~PAGE_MASK) { VM_BUG_ON(addr != -ENOMEM); info.flags = 0; - info.low_limit = TASK_UNMAPPED_BASE; + info.low_limit = mmap_base; info.high_limit = TASK_SIZE_LOW; addr = vm_unmapped_area(&info); } -- 2.20.1
RE: [PATCH] net: stmmac: add sanity check to device_property_read_u32_array call
Hi Colin, > Currently the call to device_property_read_u32_array is not error checked > leading to potential garbage values in the delays array that are then used > in msleep delays. Add a sanity check to the property fetching. > > Addresses-Coverity: ("Uninitialized scalar variable") > Signed-off-by: Colin Ian King I have also sent a patch [0] to initialize the array. can you please look at my patch so we can work out which one to use? my concern is that the "snps,reset-delays-us" property is optional, while the current dt-bindings documentation states that it's a required property. in reality it isn't, there are boards (two examples are mentioned in my patch: [0]) without it. so I believe that the resulting behavior has to be: 1. don't delay if this property is missing (instead of delaying for ms) 2. don't error out if this property is missing your patch covers #1, can you please check whether #2 is also covered? I tested case #2 when submitting my patch and it worked fine (even though I could not reproduce the garbage values which are being read on some boards) Thank you! Martin [0] https://lkml.org/lkml/2019/4/19/638
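Martin's two requirements can be sketched as follows (the function names are hypothetical stand-ins, not the actual stmmac or device-property API): on a failed read of the optional property, zero the delays so no msleep() happens (#1), and swallow the error instead of propagating it (#2).

```c
#include <string.h>

/* fake property reader: rc < 0 simulates "property missing" */
static int fake_read_u32_array(int rc, unsigned int *out, int n)
{
    if (rc < 0)
        return rc;
    for (int i = 0; i < n; i++)
        out[i] = 100;            /* pretend the DT supplied 100 us */
    return 0;
}

static int setup_reset_delays(int property_rc, unsigned int delays[3])
{
    if (fake_read_u32_array(property_rc, delays, 3)) {
        /* #1: treat a missing optional property as "no delay" */
        memset(delays, 0, 3 * sizeof(delays[0]));
        /* #2: fall through, do not return the error */
    }
    return 0;
}
```

Probing then succeeds on boards that omit the property, and zero-length delays make the subsequent sleeps no-ops instead of sleeping on garbage values.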
[PATCH 3/8] sparc: Start fallback of top-down mmap at mm->mmap_base
In case of mmap failure in top-down mode, there is no need to go through the whole address space again for the bottom-up fallback: the goal of this fallback is to find, as a last resort, space between the top-down mmap base and the stack, which is the only place not covered by the top-down mmap. Signed-off-by: Alexandre Ghiti --- arch/sparc/kernel/sys_sparc_64.c | 2 +- arch/sparc/mm/hugetlbpage.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c index ccc88926bc00..ea1de1e5fa8d 100644 --- a/arch/sparc/kernel/sys_sparc_64.c +++ b/arch/sparc/kernel/sys_sparc_64.c @@ -206,7 +206,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, if (addr & ~PAGE_MASK) { VM_BUG_ON(addr != -ENOMEM); info.flags = 0; - info.low_limit = TASK_UNMAPPED_BASE; + info.low_limit = mm->mmap_base; info.high_limit = STACK_TOP32; addr = vm_unmapped_area(&info); } diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c index f78793a06bbd..9c67f805abc8 100644 --- a/arch/sparc/mm/hugetlbpage.c +++ b/arch/sparc/mm/hugetlbpage.c @@ -86,7 +86,7 @@ hugetlb_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, if (addr & ~PAGE_MASK) { VM_BUG_ON(addr != -ENOMEM); info.flags = 0; - info.low_limit = TASK_UNMAPPED_BASE; + info.low_limit = mm->mmap_base; info.high_limit = STACK_TOP32; addr = vm_unmapped_area(&info); } -- 2.20.1
Re: [PATCH V4 1/2] PCI: dwc: Add API support to de-initialize host
Hi Lorenzo, On 18/06/19 7:58 PM, Lorenzo Pieralisi wrote: > On Tue, Jun 18, 2019 at 04:21:17PM +0530, Vidya Sagar wrote: > > [...] > >>> 2) It is not related to this patch but I fail to see the reasoning >>> behind the __ in __dw_pci_read_dbi(), there is no no-underscore >>> equivalent so its definition is somewhat questionable, maybe >>> we should clean-it up (for dbi2 alike). >> Separate no-underscore versions are present in pcie-designware.h for >> each width (i.e. l/w/b) as inline and are calling __ versions passing >> size as argument. > > I understand - the __ prologue was added in b50b2db266d8 maybe > Kishon can help us understand the __ rationale. > > I am happy to merge it as is, I was just curious about the > __ annotation (not related to this patch). In commit b50b2db266d8a8c303e8d88590 ("PCI: dwc: all: Modify dbi accessors to take dbi_base as argument"), dbi accessors was modified to take dbi_base as argument (since we wanted to write to dbics2 address space). We didn't want to change all the drivers invoking dbi accessors to pass the dbi_base. So we added "__" variant to take dbi_base as argument and the drivers continued to invoke existing dbi accessors which in-turn invoked "__" version with dbi_base as argument. I agree there could be some cleanup since in commit a509d7d9af5ebf86ffbefa98e49761d ("PCI: dwc: all: Modify dbi accessors to access data of 4/2/1 bytes"), we modified __dw_pcie_readl_dbi() to __dw_pcie_write_dbi() when it could have been directly modified to dw_pcie_write_dbi(). Thanks Kishon
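To illustrate the layering Kishon describes, here is a userspace sketch of the pattern (names are shortened and the register space is faked with a byte buffer; the real accessors use MMIO and take a struct dw_pcie): the sized no-underscore helpers stay as thin inline wrappers, while the double-underscore worker takes the base address and access width as explicit arguments.

```c
#include <stdint.h>
#include <stddef.h>

static uint8_t dbi_space[16];   /* fake little-endian dbi register space */

/* Single worker parameterized by base and width, in the spirit of the
 * "__" variants discussed above (a sketch, not the kernel code). */
static uint32_t __read_dbi(const uint8_t *base, uint32_t reg, size_t size)
{
    uint32_t val = 0;

    for (size_t i = 0; i < size; i++)
        val |= (uint32_t)base[reg + i] << (8 * i);  /* 1/2/4-byte access */
    return val;
}

/* Sized helpers keep the original call signatures and pass the width. */
static inline uint32_t readl_dbi(uint32_t reg)
{
    return __read_dbi(dbi_space, reg, 0x4);
}

static inline uint16_t readw_dbi(uint32_t reg)
{
    return (uint16_t)__read_dbi(dbi_space, reg, 0x2);
}

static inline uint8_t readb_dbi(uint32_t reg)
{
    return (uint8_t)__read_dbi(dbi_space, reg, 0x1);
}
```

The design point from the thread: callers never changed, because only the worker grew the extra base/size parameters.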
[PATCH 2/8] sh: Start fallback of top-down mmap at mm->mmap_base
In case of mmap failure in top-down mode, there is no need to go through the whole address space again for the bottom-up fallback: the goal of this fallback is to find, as a last resort, space between the top-down mmap base and the stack, which is the only place not covered by the top-down mmap. Signed-off-by: Alexandre Ghiti --- arch/sh/mm/mmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/sh/mm/mmap.c b/arch/sh/mm/mmap.c index 6a1a1297baae..4c7da92473dd 100644 --- a/arch/sh/mm/mmap.c +++ b/arch/sh/mm/mmap.c @@ -135,7 +135,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, if (addr & ~PAGE_MASK) { VM_BUG_ON(addr != -ENOMEM); info.flags = 0; - info.low_limit = TASK_UNMAPPED_BASE; + info.low_limit = mm->mmap_base; info.high_limit = TASK_SIZE; addr = vm_unmapped_area(&info); } -- 2.20.1
[PATCH 1/8] s390: Start fallback of top-down mmap at mm->mmap_base
In case of mmap failure in top-down mode, there is no need to go through the whole address space again for the bottom-up fallback: the goal of this fallback is to find, as a last resort, space between the top-down mmap base and the stack, which is the only place not covered by the top-down mmap. Signed-off-by: Alexandre Ghiti --- arch/s390/mm/mmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c index cbc718ba6d78..4a222969843b 100644 --- a/arch/s390/mm/mmap.c +++ b/arch/s390/mm/mmap.c @@ -166,7 +166,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, if (addr & ~PAGE_MASK) { VM_BUG_ON(addr != -ENOMEM); info.flags = 0; - info.low_limit = TASK_UNMAPPED_BASE; + info.low_limit = mm->mmap_base; info.high_limit = TASK_SIZE; addr = vm_unmapped_area(&info); if (addr & ~PAGE_MASK) -- 2.20.1
[PATCH 0/8] Fix mmap base in bottom-up mmap
This series fixes the fallback of the top-down mmap: in case of failure, a bottom-up scheme can be tried as a last resort between the top-down mmap base and the stack, hoping for a large unused stack limit. Lots of architectures and even mm code start this fallback at TASK_UNMAPPED_BASE, which is useless since the top-down scheme already failed on the whole address space: instead, simply use mmap_base. Along the way, it allows us to get rid of mmap_legacy_base and mmap_compat_legacy_base from mm_struct. Note that arm and mips already implement this behaviour. Alexandre Ghiti (8): s390: Start fallback of top-down mmap at mm->mmap_base sh: Start fallback of top-down mmap at mm->mmap_base sparc: Start fallback of top-down mmap at mm->mmap_base x86, hugetlbpage: Start fallback of top-down mmap at mm->mmap_base mm: Start fallback top-down mmap at mm->mmap_base parisc: Use mmap_base, not mmap_legacy_base, as low_limit for bottom-up mmap x86: Use mmap_*base, not mmap_*legacy_base, as low_limit for bottom-up mmap mm: Remove mmap_legacy_base and mmap_compat_legacy_base fields from mm_struct arch/parisc/kernel/sys_parisc.c | 8 +++- arch/s390/mm/mmap.c | 2 +- arch/sh/mm/mmap.c| 2 +- arch/sparc/kernel/sys_sparc_64.c | 2 +- arch/sparc/mm/hugetlbpage.c | 2 +- arch/x86/include/asm/elf.h | 2 +- arch/x86/kernel/sys_x86_64.c | 4 ++-- arch/x86/mm/hugetlbpage.c| 7 --- arch/x86/mm/mmap.c | 20 +--- include/linux/mm_types.h | 2 -- mm/debug.c | 4 ++-- mm/mmap.c| 2 +- 12 files changed, 26 insertions(+), 31 deletions(-) -- 2.20.1
Re: [LINUX PATCH v12 3/3] mtd: rawnand: arasan: Add support for Arasan NAND Flash Controller
On Mon, Jan 28, 2019 at 10:27:39AM +0100, Miquel Raynal wrote: Hi Miquel, > Hi Naga, > > Naga Sureshkumar Relli wrote on Mon, 28 Jan 2019 > 06:04:53 +: > > > Hi Boris & Miquel, > > > > Could you please provide your thoughts on this driver to support HW-ECC? > > As I said previously, there is no way to detect errors beyond N bit. > > I am ok to update the driver based on your inputs. > > We won't support the ECC engine. It simply cannot be used reliably. > > I am working on a generic ECC engine object. It's gonna take a few > months until it gets merged but after that you could update the > controller driver to drop any ECC-related function. Although the ECC Could you please let me know when we can expect the generic ECC engine update in mtd NAND? Based on that, I will plan to update the ARASAN NAND driver along with your comments mentioned above, as you know there is a limitation in the ARASAN NAND controller in detecting ECC errors. I am following this series https://patchwork.kernel.org/patch/10838705/ Thanks, Naga Sureshkumar Relli > engine part is blocking, raw access still look wrong and the driver > still needs changes. > > Thanks, > Miquèl > > __ > Linux MTD discussion mailing list > http://lists.infradead.org/mailman/listinfo/linux-mtd/
[GIT PULL] ARM: TI SOC updates for v5.3
The following changes since commit cd6c84d8f0cdc911df435bb075ba22ce3c605b07: Linux 5.2-rc2 (2019-05-26 16:49:19 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone.git tags/drivers_soc_for_5.3 for you to fetch changes up to 4c960505df44b94001178575a505dd8315086edc: firmware: ti_sci: Fix gcc unused-but-set-variable warning (2019-06-18 21:32:25 -0700) SOC: TI SCI updates for v5.3 - Couple of fixes to handle resource ranges and requesting response always from firmware; - Add processor control - Add support APIs for DMA - Fix the SPDX license plate - Unused variable warning fix Andrew F. Davis (1): firmware: ti_sci: Always request response from firmware Nishad Kamdar (1): firmware: ti_sci: Use the correct style for SPDX License Identifier Peter Ujfalusi (2): firmware: ti_sci: Add resource management APIs for ringacc, psi-l and udma firmware: ti_sci: Parse all resource ranges even if some is not available Suman Anna (1): firmware: ti_sci: Add support for processor control YueHaibing (1): firmware: ti_sci: Fix gcc unused-but-set-variable warning drivers/firmware/ti_sci.c | 1143 +++- drivers/firmware/ti_sci.h | 812 ++- include/linux/soc/ti/ti_sci_protocol.h | 246 +++ 3 files changed, 2051 insertions(+), 150 deletions(-)
Re: [PATCH 5.1 000/115] 5.1.12-stable review
On Tue, 18 Jun 2019 at 19:05, Greg Kroah-Hartman wrote:
>
> On Tue, Jun 18, 2019 at 06:04:25PM +0530, Naresh Kamboju wrote:
> > On Tue, 18 Jun 2019 at 02:50, Greg Kroah-Hartman wrote:
> > >
> > > This is the start of the stable review cycle for the 5.1.12 release.
> > > There are 115 patches in this series, all will be posted as a response
> > > to this one. If anyone has any issues with these being applied, please
> > > let me know.
> > >
> > > Responses should be made by Wed 19 Jun 2019 09:06:21 PM UTC.
> > > Anything received after that time might be too late.
> > >
> > > The whole patch series can be found in one patch at:
> > >     https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.1.12-rc1.gz
> > > or in the git tree and branch at:
> > >     git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.1.y
> > > and the diffstat can be found below.
> > >
> > > thanks,
> > >
> > > greg k-h
> >
> > Results from Linaro’s test farm.
> > No regressions on arm64, arm, x86_64, and i386.
> >
> > NOTE:
> > kernel/workqueue.c:3030 __flush_work+0x2c2/0x2d0
> > Kernel warning has been fixed by the patch below.
> >
> > John Fastabend
> >     bpf: sockmap, only stop/flush strp if it was enabled at some point
>
> What is the git commit id for this patch?

Upstream commit 014894360ec95abe868e94416b3dd6569f6e2c0c

- Naresh
Re: [PATCH v3 -next] firmware: ti_sci: Fix gcc unused-but-set-variable warning
On 6/17/19 11:41 AM, Suman Anna wrote:
> On 6/15/19 7:50 AM, YueHaibing wrote:
> > Fixes gcc '-Wunused-but-set-variable' warning:
> >
> > drivers/firmware/ti_sci.c: In function ti_sci_cmd_ring_config:
> > drivers/firmware/ti_sci.c:2035:17: warning: variable dev set but not used [-Wunused-but-set-variable]
> > drivers/firmware/ti_sci.c: In function ti_sci_cmd_ring_get_config:
> > drivers/firmware/ti_sci.c:2104:17: warning: variable dev set but not used [-Wunused-but-set-variable]
> > drivers/firmware/ti_sci.c: In function ti_sci_cmd_rm_udmap_tx_ch_cfg:
> > drivers/firmware/ti_sci.c:2287:17: warning: variable dev set but not used [-Wunused-but-set-variable]
> > drivers/firmware/ti_sci.c: In function ti_sci_cmd_rm_udmap_rx_ch_cfg:
> > drivers/firmware/ti_sci.c:2357:17: warning: variable dev set but not used [-Wunused-but-set-variable]
> >
> > Use the 'dev' variable instead of info->dev to fix this.
> >
> > Reported-by: Hulk Robot
> > Signed-off-by: YueHaibing
> > Acked-by: Suman Anna
>
> Hi Santosh,
>
> Can you pick up this patch? It goes on top of your for_5.3/driver-soc branch.

Applied.
Re: [PATCH] firmware: ti_sci: Use the correct style for SPDX License Identifier
On 6/14/19 6:57 AM, Nishad Kamdar wrote: This patch corrects the SPDX License Identifier style in header file related to Firmware Drivers for Texas Instruments SCI Protocol. For C header files Documentation/process/license-rules.rst mandates C-like comments (opposed to C source files where C++ style should be used) Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46 Suggested-by: Joe Perches Signed-off-by: Nishad Kamdar --- Applied
Re: [PATCH v2 1/1] cpuidle-powernv : forced wakeup for stop states
Abhishek Goel's on June 17, 2019 7:56 pm: > Currently, the cpuidle governors determine what idle state a idling CPU > should enter into based on heuristics that depend on the idle history on > that CPU. Given that no predictive heuristic is perfect, there are cases > where the governor predicts a shallow idle state, hoping that the CPU will > be busy soon. However, if no new workload is scheduled on that CPU in the > near future, the CPU may end up in the shallow state. > > This is problematic, when the predicted state in the aforementioned > scenario is a shallow stop state on a tickless system. As we might get > stuck into shallow states for hours, in absence of ticks or interrupts. > > To address this, We forcefully wakeup the cpu by setting the > decrementer. The decrementer is set to a value that corresponds with the > residency of the next available state. Thus firing up a timer that will > forcefully wakeup the cpu. Few such iterations will essentially train the > governor to select a deeper state for that cpu, as the timer here > corresponds to the next available cpuidle state residency. Thus, cpu will > eventually end up in the deepest possible state. > > Signed-off-by: Abhishek Goel > --- > > Auto-promotion > v1 : started as auto promotion logic for cpuidle states in generic > driver > v2 : Removed timeout_needed and rebased the code to upstream kernel > Forced-wakeup > v1 : New patch with name of forced wakeup started > v2 : Extending the forced wakeup logic for all states. Setting the > decrementer instead of queuing up a hrtimer to implement the logic. 
> > drivers/cpuidle/cpuidle-powernv.c | 38 +++ > 1 file changed, 38 insertions(+) > > diff --git a/drivers/cpuidle/cpuidle-powernv.c > b/drivers/cpuidle/cpuidle-powernv.c > index 84b1ebe212b3..bc9ca18ae7e3 100644 > --- a/drivers/cpuidle/cpuidle-powernv.c > +++ b/drivers/cpuidle/cpuidle-powernv.c > @@ -46,6 +46,26 @@ static struct stop_psscr_table > stop_psscr_table[CPUIDLE_STATE_MAX] __read_mostly > static u64 default_snooze_timeout __read_mostly; > static bool snooze_timeout_en __read_mostly; > > +static u64 forced_wakeup_timeout(struct cpuidle_device *dev, > + struct cpuidle_driver *drv, > + int index) > +{ > + int i; > + > + for (i = index + 1; i < drv->state_count; i++) { > + struct cpuidle_state *s = &drv->states[i]; > + struct cpuidle_state_usage *su = &dev->states_usage[i]; > + > + if (s->disabled || su->disable) > + continue; > + > + return (s->target_residency + 2 * s->exit_latency) * > + tb_ticks_per_usec; > + } > + > + return 0; > +} It would be nice to not have this kind of loop iteration in the idle fast path. Can we add a flag or something to the idle state? > + > static u64 get_snooze_timeout(struct cpuidle_device *dev, > struct cpuidle_driver *drv, > int index) > @@ -144,8 +164,26 @@ static int stop_loop(struct cpuidle_device *dev, >struct cpuidle_driver *drv, >int index) > { > + u64 dec_expiry_tb, dec, timeout_tb, forced_wakeup; > + > + dec = mfspr(SPRN_DEC); > + timeout_tb = forced_wakeup_timeout(dev, drv, index); > + forced_wakeup = 0; > + > + if (timeout_tb && timeout_tb < dec) { > + forced_wakeup = 1; > + dec_expiry_tb = mftb() + dec; > + } The compiler probably can't optimise away the SPR manipulations so try to avoid them if possible. > + > + if (forced_wakeup) > + mtspr(SPRN_DEC, timeout_tb); This should just be put in the above 'if'. 
> +
>  	power9_idle_type(stop_psscr_table[index].val,
>  			 stop_psscr_table[index].mask);
> +
> +	if (forced_wakeup)
> +		mtspr(SPRN_DEC, dec_expiry_tb - mftb());

This will sometimes go negative and result in another timer interrupt. It
also breaks irq work (which can be set here by machine check, I believe).
May need to implement some timer code to do this for you:

static void reset_dec_after_idle(void)
{
	u64 now;
	u64 *next_tb;

	if (test_irq_work_pending())
		return;
	now = mftb();
	next_tb = this_cpu_ptr(&decrementers_next_tb);
	if (now >= *next_tb)
		return;
	set_dec(*next_tb - now);
	if (test_irq_work_pending())
		set_dec(1);
}

Something vaguely like that. See timer_interrupt().

Thanks,
Nick
[PATCH 0/1] One cleanup patch for FPGA
Hi Greg, please take this cleanup patch. It's been on the list but somehow fell through the cracks. Thanks, Moritz Enrico Weigelt (1): drivers: fpga: Kconfig: pedantic cleanups drivers/fpga/Kconfig | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- 2.22.0
[PATCH 1/1] drivers: fpga: Kconfig: pedantic cleanups
From: Enrico Weigelt Formatting of Kconfig files doesn't look so pretty, so just take damp cloth and clean it up. Signed-off-by: Enrico Weigelt Signed-off-by: Moritz Fischer --- drivers/fpga/Kconfig | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig index 8072c195d831..474f304ec109 100644 --- a/drivers/fpga/Kconfig +++ b/drivers/fpga/Kconfig @@ -26,9 +26,9 @@ config FPGA_MGR_SOCFPGA_A10 FPGA manager driver support for Altera Arria10 SoCFPGA. config ALTERA_PR_IP_CORE -tristate "Altera Partial Reconfiguration IP Core" -help - Core driver support for Altera Partial Reconfiguration IP component + tristate "Altera Partial Reconfiguration IP Core" + help + Core driver support for Altera Partial Reconfiguration IP component config ALTERA_PR_IP_CORE_PLAT tristate "Platform support of Altera Partial Reconfiguration IP Core" -- 2.22.0
[PATCH V6 1/3] mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory()
Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs entries between memory block and node. It first checks pfn validity with pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config (arm64 has this enabled) pfn_valid_within() calls pfn_valid(). pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID) which scans all mapped memblock regions with memblock_is_map_memory(). This creates a problem in memory hot remove path which has already removed given memory range from memory block with memblock_[remove|free] before arriving at unregister_mem_sect_under_nodes(). Hence get_nid_for_pfn() returns -1 skipping subsequent sysfs_remove_link() calls leaving node <-> memory block sysfs entries as is. Subsequent memory add operation hits BUG_ON() because of existing sysfs entries. [ 62.007176] NUMA: Unknown node for memory at 0x68000, assuming node 0 [ 62.052517] [ cut here ] [ 62.053211] kernel BUG at mm/memory_hotplug.c:1143! [ 62.053868] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 62.054589] Modules linked in: [ 62.054999] CPU: 19 PID: 3275 Comm: bash Not tainted 5.1.0-rc2-4-g28cea40b2683 #41 [ 62.056274] Hardware name: linux,dummy-virt (DT) [ 62.057166] pstate: 4045 (nZcv daif +PAN -UAO) [ 62.058083] pc : add_memory_resource+0x1cc/0x1d8 [ 62.058961] lr : add_memory_resource+0x10c/0x1d8 [ 62.059842] sp : 168b3ce0 [ 62.060477] x29: 168b3ce0 x28: 8005db546c00 [ 62.061501] x27: x26: [ 62.062509] x25: 111ef000 x24: 111ef5d0 [ 62.063520] x23: x22: 0006bfff [ 62.064540] x21: ffef x20: 006c [ 62.065558] x19: 0068 x18: 0024 [ 62.066566] x17: x16: [ 62.067579] x15: x14: 8005e412e890 [ 62.068588] x13: 8005d6b105d8 x12: [ 62.069610] x11: 8005d6b10490 x10: 0040 [ 62.070615] x9 : 8005e412e898 x8 : 8005e412e890 [ 62.071631] x7 : 8005d6b105d8 x6 : 8005db546c00 [ 62.072640] x5 : 0001 x4 : 0002 [ 62.073654] x3 : 8005d7049480 x2 : 0002 [ 62.074666] x1 : 0003 x0 : ffef [ 62.075685] Process bash (pid: 3275, stack limit = 0xd754280f) 
[ 62.076930] Call trace: [ 62.077411] add_memory_resource+0x1cc/0x1d8 [ 62.078227] __add_memory+0x70/0xa8 [ 62.078901] probe_store+0xa4/0xc8 [ 62.079561] dev_attr_store+0x18/0x28 [ 62.080270] sysfs_kf_write+0x40/0x58 [ 62.080992] kernfs_fop_write+0xcc/0x1d8 [ 62.081744] __vfs_write+0x18/0x40 [ 62.082400] vfs_write+0xa4/0x1b0 [ 62.083037] ksys_write+0x5c/0xc0 [ 62.083681] __arm64_sys_write+0x18/0x20 [ 62.084432] el0_svc_handler+0x88/0x100 [ 62.085177] el0_svc+0x8/0xc Re-ordering memblock_[free|remove]() with arch_remove_memory() solves the problem on arm64 as pfn_valid() behaves correctly and returns positive as memblock for the address range still exists. arch_remove_memory() removes applicable memory sections from zone with __remove_pages() and tears down kernel linear mapping. Removing memblock regions afterwards is safe because there is no other memblock (bootmem) allocator user that late. So nobody is going to allocate from the removed range just to blow up later. Also nobody should be using the bootmem allocated range else we wouldn't allow to remove it. So reordering is indeed safe. Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Acked-by: Mark Rutland Acked-by: Michal Hocko Signed-off-by: Anshuman Khandual --- mm/memory_hotplug.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index a88c5f3..cfa5fac 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1831,13 +1831,13 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size) /* remove memmap entry */ firmware_map_remove(start, start + size, "System RAM"); - memblock_free(start, size); - memblock_remove(start, size); /* remove memory block devices before removing memory */ remove_memory_block_devices(start, size); arch_remove_memory(nid, start, size, NULL); + memblock_free(start, size); + memblock_remove(start, size); __release_memory_resource(start, size); try_offline_node(nid); -- 2.7.4
[PATCH V6 3/3] arm64/mm: Enable memory hot remove
The arch code for hot-remove must tear down portions of the linear map and vmemmap corresponding to memory being removed. In both cases the page tables mapping these regions must be freed, and when sparse vmemmap is in use the memory backing the vmemmap must also be freed. This patch adds a new remove_pagetable() helper which can be used to tear down either region, and calls it from vmemmap_free() and ___remove_pgd_mapping(). The sparse_vmap argument determines whether the backing memory will be freed. remove_pagetable() makes two distinct passes over the kernel page table. In the first pass it unmaps, invalidates applicable TLB cache and frees backing memory if required (vmemmap) for each mapped leaf entry. In the second pass it looks for empty page table sections whose page table page can be unmapped, TLB invalidated and freed. While freeing intermediate level page table pages bail out if any of its entries are still valid. This can happen for partially filled kernel page table either from a previously attempted failed memory hot add or while removing an address range which does not span the entire page table page range. The vmemmap region may share levels of table with the vmalloc region. There can be conflicts between hot remove freeing page table pages with a concurrent vmalloc() walking the kernel page table. This conflict can not just be solved by taking the init_mm ptl because of existing locking scheme in vmalloc(). Hence unlike linear mapping, skip freeing page table pages while tearing down vmemmap mapping. While here update arch_add_memory() to handle __add_pages() failures by just unmapping recently added kernel linear mapping. Now enable memory hot remove on arm64 platforms by default with ARCH_ENABLE_MEMORY_HOTREMOVE. This implementation is overall inspired from kernel page table tear down procedure on X86 architecture. 
Acked-by: David Hildenbrand Signed-off-by: Anshuman Khandual --- arch/arm64/Kconfig | 3 + arch/arm64/mm/mmu.c | 290 ++-- 2 files changed, 284 insertions(+), 9 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 6426f48..9375f26 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -270,6 +270,9 @@ config HAVE_GENERIC_GUP config ARCH_ENABLE_MEMORY_HOTPLUG def_bool y +config ARCH_ENABLE_MEMORY_HOTREMOVE + def_bool y + config SMP def_bool y diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 93ed0df..9e80a94 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -733,6 +733,250 @@ int kern_addr_valid(unsigned long addr) return pfn_valid(pte_pfn(pte)); } + +#ifdef CONFIG_MEMORY_HOTPLUG +static void free_hotplug_page_range(struct page *page, size_t size) +{ + WARN_ON(!page || PageReserved(page)); + free_pages((unsigned long)page_address(page), get_order(size)); +} + +static void free_hotplug_pgtable_page(struct page *page) +{ + free_hotplug_page_range(page, PAGE_SIZE); +} + +static void free_pte_table(pmd_t *pmdp, unsigned long addr) +{ + struct page *page; + pte_t *ptep; + int i; + + ptep = pte_offset_kernel(pmdp, 0UL); + for (i = 0; i < PTRS_PER_PTE; i++) { + if (!pte_none(READ_ONCE(ptep[i]))) + return; + } + + page = pmd_page(READ_ONCE(*pmdp)); + pmd_clear(pmdp); + __flush_tlb_kernel_pgtable(addr); + free_hotplug_pgtable_page(page); +} + +static void free_pmd_table(pud_t *pudp, unsigned long addr) +{ + struct page *page; + pmd_t *pmdp; + int i; + + if (CONFIG_PGTABLE_LEVELS <= 2) + return; + + pmdp = pmd_offset(pudp, 0UL); + for (i = 0; i < PTRS_PER_PMD; i++) { + if (!pmd_none(READ_ONCE(pmdp[i]))) + return; + } + + page = pud_page(READ_ONCE(*pudp)); + pud_clear(pudp); + __flush_tlb_kernel_pgtable(addr); + free_hotplug_pgtable_page(page); +} + +static void free_pud_table(pgd_t *pgdp, unsigned long addr) +{ + struct page *page; + pud_t *pudp; + int i; + + if (CONFIG_PGTABLE_LEVELS <= 3) + return; + + pudp = pud_offset(pgdp, 
0UL); + for (i = 0; i < PTRS_PER_PUD; i++) { + if (!pud_none(READ_ONCE(pudp[i]))) + return; + } + + page = pgd_page(READ_ONCE(*pgdp)); + pgd_clear(pgdp); + __flush_tlb_kernel_pgtable(addr); + free_hotplug_pgtable_page(page); +} + +static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr, + unsigned long end, bool sparse_vmap) +{ + struct page *page; + pte_t *ptep, pte; + + do { + ptep = pte_offset_kernel(pmdp, addr); + pte = READ_ONCE(*ptep); + if (pte_none(pte)) + continue; + + WARN_ON(!pte_present(pte)); + page = sparse_vmap
[PATCH V6 2/3] arm64/mm: Hold memory hotplug lock while walking for kernel page table dump
The arm64 page table dump code can race with concurrent modification of
the kernel page tables. When leaf entries are modified concurrently, the
dump code may log stale or inconsistent information for a VA range, but
this is otherwise not harmful. When intermediate levels of table are
freed, the dump code will continue to use memory which has been freed
and potentially reallocated for another purpose. In such cases, the dump
code may dereference bogus addresses, leading to a number of potential
problems.

Intermediate levels of table may be freed during memory hot-remove,
which will be enabled by a subsequent patch. To avoid racing with this,
take the memory hotplug lock when walking the kernel page table.

Acked-by: David Hildenbrand
Acked-by: Mark Rutland
Signed-off-by: Anshuman Khandual
---
 arch/arm64/mm/ptdump_debugfs.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/mm/ptdump_debugfs.c b/arch/arm64/mm/ptdump_debugfs.c
index 064163f..b5eebc8 100644
--- a/arch/arm64/mm/ptdump_debugfs.c
+++ b/arch/arm64/mm/ptdump_debugfs.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include
+#include
 #include
 #include
@@ -7,7 +8,10 @@
 static int ptdump_show(struct seq_file *m, void *v)
 {
 	struct ptdump_info *info = m->private;
+
+	get_online_mems();
 	ptdump_walk_pgd(m, info);
+	put_online_mems();
 	return 0;
 }
 DEFINE_SHOW_ATTRIBUTE(ptdump);
-- 
2.7.4
[PATCH V6 0/3] arm64/mm: Enable memory hot remove
This series enables memory hot remove on arm64 after fixing a memblock removal ordering problem in generic try_remove_memory() and a possible arm64 platform specific kernel page table race condition. This series is based on linux-next (next-20190613).

Concurrent vmalloc() and hot-remove conflict:

As pointed out earlier on the v5 thread [2], there can be a potential conflict between a concurrent vmalloc() and a memory hot-remove operation. This can be solved, or at least avoided, with some possible methods. The problem here is caused by inadequate locking in vmalloc(), which protects installation of a page table page but not the walk or the leaf entry modification.

Option 1: Making locking in vmalloc() adequate

The current locking scheme protects installation of page table pages but not the page table walk or leaf entry creation, which can conflict with hot-remove. This scheme is sufficient for now as vmalloc() works on mutually exclusive ranges which can proceed concurrently only if their shared page table pages can be created while inside the lock. It achieves a performance improvement which will be compromised if the entire vmalloc() operation (even with some optimization) has to be completed under a lock.

Option 2: Making sure hot-remove does not happen during vmalloc()

Take mem_hotplug_lock in read mode through [get|put]_online_mems() constructs for the entire duration of vmalloc(). It protects from a concurrent memory hot remove operation and does not add any significant overhead to other concurrent vmalloc() threads. It solves the problem in the right way, unless we do not want to extend the usage of mem_hotplug_lock in generic MM.

Option 3: Memory hot-remove does not free (conflicting) page table pages

Do not free page table pages (if any) for vmemmap mappings after unmapping their virtual range. The only downside here is that some page table pages might remain empty and unused until the next memory hot-add operation of the same memory range.
Option 4: Don't let vmalloc and vmemmap share intermediate page table pages

The conflict does not arise if the vmalloc and vmemmap ranges do not share kernel page table pages to start with. If such placement can be ensured in the platform kernel virtual address layout, this problem can be successfully avoided.

There are two generic solutions (Options 1 and 2) and two platform specific solutions (Options 3 and 4). This series has decided to go with (Option 3), which requires minimum changes while remaining self-contained inside the functionality.

Testing:

Memory hot remove has been tested on arm64 for 4K, 16K, 64K page config options with all possible CONFIG_ARM64_VA_BITS and CONFIG_PGTABLE_LEVELS combinations. It is only build tested on non-arm64 platforms.

Changes in V6:

- Implemented most of the suggestions from Mark Rutland
- Added in ptdump
- remove_pagetable() now has two distinct passes over the kernel page table
- First pass unmap_hotplug_range() removes leaf level entries at all levels
- Second pass free_empty_tables() removes empty page table pages
- Kernel page table lock has been dropped completely
- vmemmap_free() does not call free_empty_tables() to avoid conflict with vmalloc()
- All address range scanning is converted to do {} while() loops
- Added 'unsigned long end' in __remove_pgd_mapping()
- Callers need not provide starting pointer argument to free_[pte|pmd|pud]_table()
- Drop the starting pointer argument from free_[pte|pmd|pud]_table() functions
- Fetching pxxp[i] in free_[pte|pmd|pud]_table() is wrapped around in READ_ONCE()
- free_[pte|pmd|pud]_table() now computes starting pointer inside the function
- Fixed TLB handling while freeing huge page section mappings at PMD or PUD level
- Added WARN_ON(!page) in free_hotplug_page_range()
- Added WARN_ON(![pmd|pud]_table(pmd|pud)) when there is no section mapping
- [PATCH 1/3] mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory()
- Requested earlier for separate merge
(https://patchwork.kernel.org/patch/10986599/) - s/__remove_memory/try_remove_memory in the subject line - s/arch_remove_memory/memblock_[free|remove] in the subject line - A small change in the commit message as re-order happens now for memblock remove functions not for arch_remove_memory() Changes in V5: (https://lkml.org/lkml/2019/5/29/218) - Have some agreement [1] over using memory_hotplug_lock for arm64 ptdump - Change 7ba36eccb3f8 ("arm64/mm: Inhibit huge-vmap with ptdump") already merged - Dropped the above patch from this series - Fixed indentation problem in arch_[add|remove]_memory() as per David - Collected all new Acked-by tags Changes in V4: (https://lkml.org/lkml/2019/5/20/19) - Implemented most of the suggestions from Mark Rutland - Interchanged patch [PATCH 2/4] <---> [PATCH 3/4] and updated commit message - Moved CONFIG_PGTABLE_LEVELS inside free_[pud|pmd]_table() - Used READ_ONCE() in missing instances while accessing page table entries - s/p???_present()/p???_none() for checking valid kernel page table entries -
Re: linux-next: build failure after merge of the net-next tree
On Wed, Jun 19, 2019 at 1:02 PM Masahiro Yamada wrote: > > Hi. > > > On Wed, Jun 19, 2019 at 12:23 PM Stephen Rothwell > wrote: > > > > Hi all, > > > > After merging the net-next tree, today's linux-next build (x86_64 > > allmodconfig) failed like this: > > > > In file included from usr/include/linux/tc_act/tc_ctinfo.hdrtest.c:1: > > ./usr/include/linux/tc_act/tc_ctinfo.h:30:21: error: implicit declaration > > of function 'BIT' [-Werror=implicit-function-declaration] > > CTINFO_MODE_DSCP = BIT(0), > > ^~~ > > ./usr/include/linux/tc_act/tc_ctinfo.h:30:2: error: enumerator value for > > 'CTINFO_MODE_DSCP' is not an integer constant > > CTINFO_MODE_DSCP = BIT(0), > > ^~~~ > > ./usr/include/linux/tc_act/tc_ctinfo.h:32:1: error: enumerator value for > > 'CTINFO_MODE_CPMARK' is not an integer constant > > }; > > ^ > > > > Caused by commit > > > > 24ec483cec98 ("net: sched: Introduce act_ctinfo action") > > > > Presumably exposed by commit > > > > b91976b7c0e3 ("kbuild: compile-test UAPI headers to ensure they are > > self-contained") > > > > from the kbuild tree. > > > My commit correctly blocked the broken UAPI header, Hooray! > > People export more and more headers that > are never able to compile in user-space. > > We must block new breakages from coming in. > > > BIT() is not exported to user-space > since it is not prefixed with underscore. > > > You can use _BITUL() in user-space, > which is available in include/uapi/linux/const.h > > I just took a look at include/uapi/linux/tc_act/tc_ctinfo.h I just wondered why the following can be compiled: struct tc_ctinfo { tc_gen; }; Then, I found 'tc_gen' is a macro. #define tc_gen \ __u32 index; \ __u32 capab; \ int action; \ int refcnt; \ int bindcnt What a hell. -- Best Regards Masahiro Yamada
Re: linux-next: build failure after merge of the net-next tree
Hi. On Wed, Jun 19, 2019 at 12:23 PM Stephen Rothwell wrote: > > Hi all, > > After merging the net-next tree, today's linux-next build (x86_64 > allmodconfig) failed like this: > > In file included from usr/include/linux/tc_act/tc_ctinfo.hdrtest.c:1: > ./usr/include/linux/tc_act/tc_ctinfo.h:30:21: error: implicit declaration of > function 'BIT' [-Werror=implicit-function-declaration] > CTINFO_MODE_DSCP = BIT(0), > ^~~ > ./usr/include/linux/tc_act/tc_ctinfo.h:30:2: error: enumerator value for > 'CTINFO_MODE_DSCP' is not an integer constant > CTINFO_MODE_DSCP = BIT(0), > ^~~~ > ./usr/include/linux/tc_act/tc_ctinfo.h:32:1: error: enumerator value for > 'CTINFO_MODE_CPMARK' is not an integer constant > }; > ^ > > Caused by commit > > 24ec483cec98 ("net: sched: Introduce act_ctinfo action") > > Presumably exposed by commit > > b91976b7c0e3 ("kbuild: compile-test UAPI headers to ensure they are > self-contained") > > from the kbuild tree. My commit correctly blocked the broken UAPI header, Hooray! People export more and more headers that are never able to compile in user-space. We must block new breakages from coming in. BIT() is not exported to user-space since it is not prefixed with underscore. You can use _BITUL() in user-space, which is available in include/uapi/linux/const.h Thanks. > I have applied the following (obvious) patch for today. 
> > From: Stephen Rothwell > Date: Wed, 19 Jun 2019 13:15:22 +1000 > Subject: [PATCH] net: sched: don't use BIT() in uapi headers > > Signed-off-by: Stephen Rothwell > --- > include/uapi/linux/tc_act/tc_ctinfo.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/include/uapi/linux/tc_act/tc_ctinfo.h > b/include/uapi/linux/tc_act/tc_ctinfo.h > index da803e05a89b..6166c62dd7dd 100644 > --- a/include/uapi/linux/tc_act/tc_ctinfo.h > +++ b/include/uapi/linux/tc_act/tc_ctinfo.h > @@ -27,8 +27,8 @@ enum { > #define TCA_CTINFO_MAX (__TCA_CTINFO_MAX - 1) > > enum { > - CTINFO_MODE_DSCP= BIT(0), > - CTINFO_MODE_CPMARK = BIT(1) > + CTINFO_MODE_DSCP= (1UL << 0), > + CTINFO_MODE_CPMARK = (1UL << 1) > }; > > #endif > -- > 2.20.1 > > -- > Cheers, > Stephen Rothwell -- Best Regards Masahiro Yamada
Re: [PATCH v9 03/12] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal
On Mon, Jun 17, 2019 at 6:42 PM Wei Yang wrote: > > On Wed, Jun 05, 2019 at 02:58:04PM -0700, Dan Williams wrote: > >Sub-section hotplug support reduces the unit of operation of hotplug > >from section-sized-units (PAGES_PER_SECTION) to sub-section-sized units > >(PAGES_PER_SUBSECTION). Teach shrink_{zone,pgdat}_span() to consider > >PAGES_PER_SUBSECTION boundaries as the points where pfn_valid(), not > >valid_section(), can toggle. > > > >Cc: Michal Hocko > >Cc: Vlastimil Babka > >Cc: Logan Gunthorpe > >Reviewed-by: Pavel Tatashin > >Reviewed-by: Oscar Salvador > >Signed-off-by: Dan Williams > >--- > > mm/memory_hotplug.c | 29 - > > 1 file changed, 8 insertions(+), 21 deletions(-) > > > >diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > >index 7b963c2d3a0d..647859a1d119 100644 > >--- a/mm/memory_hotplug.c > >+++ b/mm/memory_hotplug.c > >@@ -318,12 +318,8 @@ static unsigned long find_smallest_section_pfn(int nid, > >struct zone *zone, > >unsigned long start_pfn, > >unsigned long end_pfn) > > { > >- struct mem_section *ms; > >- > >- for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) { > >- ms = __pfn_to_section(start_pfn); > >- > >- if (unlikely(!valid_section(ms))) > >+ for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUBSECTION) { > >+ if (unlikely(!pfn_valid(start_pfn))) > > continue; > > Hmm, we change the granularity of valid section from SECTION to SUBSECTION. > But we didn't change the granularity of node id and zone information. > > For example, we found the node id of a pfn mismatch, we can skip the whole > section instead of a subsection. > > Maybe this is not a big deal. I don't see a problem.
Re: [PATCH] scsi: scsi_sysfs.c: Hide wwid sdev attr if VPD is not supported
Marcos,

> WWID is composed from VPD data from the device, specifically page 0x83.
> So, when a device does not have VPD support, for example USB storage
> devices where VPD is specifically disabled, a read of the device's
> wwid sysfs file will always return ENXIO. To avoid this, change the
> scsi_sdev_attr_is_visible function to hide the wwid sysfs file when
> the device does not support VPD.

Not a big fan of attribute files that come and go. Why not just return an empty string?

Hannes?

-- 
Martin K. Petersen	Oracle Linux Engineering
linux-next: build failure after merge of the net-next tree
Hi all, After merging the net-next tree, today's linux-next build (x86_64 allmodconfig) failed like this: In file included from usr/include/linux/tc_act/tc_ctinfo.hdrtest.c:1: ./usr/include/linux/tc_act/tc_ctinfo.h:30:21: error: implicit declaration of function 'BIT' [-Werror=implicit-function-declaration] CTINFO_MODE_DSCP = BIT(0), ^~~ ./usr/include/linux/tc_act/tc_ctinfo.h:30:2: error: enumerator value for 'CTINFO_MODE_DSCP' is not an integer constant CTINFO_MODE_DSCP = BIT(0), ^~~~ ./usr/include/linux/tc_act/tc_ctinfo.h:32:1: error: enumerator value for 'CTINFO_MODE_CPMARK' is not an integer constant }; ^ Caused by commit 24ec483cec98 ("net: sched: Introduce act_ctinfo action") Presumably exposed by commit b91976b7c0e3 ("kbuild: compile-test UAPI headers to ensure they are self-contained") from the kbuild tree. I have applied the following (obvious) patch for today. From: Stephen Rothwell Date: Wed, 19 Jun 2019 13:15:22 +1000 Subject: [PATCH] net: sched: don't use BIT() in uapi headers Signed-off-by: Stephen Rothwell --- include/uapi/linux/tc_act/tc_ctinfo.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/tc_act/tc_ctinfo.h b/include/uapi/linux/tc_act/tc_ctinfo.h index da803e05a89b..6166c62dd7dd 100644 --- a/include/uapi/linux/tc_act/tc_ctinfo.h +++ b/include/uapi/linux/tc_act/tc_ctinfo.h @@ -27,8 +27,8 @@ enum { #define TCA_CTINFO_MAX (__TCA_CTINFO_MAX - 1) enum { - CTINFO_MODE_DSCP= BIT(0), - CTINFO_MODE_CPMARK = BIT(1) + CTINFO_MODE_DSCP= (1UL << 0), + CTINFO_MODE_CPMARK = (1UL << 1) }; #endif -- 2.20.1 -- Cheers, Stephen Rothwell pgp3rq6rExwui.pgp Description: OpenPGP digital signature
Re: [PATCH 1/2] scsi: devinfo: BLIST_TRY_VPD_PAGES for SanDisk Cruzer Blade
Marcos, > Currently, all USB devices skip VPD pages, even when the device > supports them (SPC-3 and later), but some of them support VPD, like > Cruzer Blade. What's your confidence level wrt. all Cruzer Blades handling this correctly? How many devices have you tested this change with? -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v9 02/12] mm/sparsemem: Add helpers track active portions of a section at boot
On Mon, Jun 17, 2019 at 3:32 PM Dan Williams wrote: > > On Mon, Jun 17, 2019 at 3:22 PM Wei Yang wrote: > > > > On Wed, Jun 05, 2019 at 02:57:59PM -0700, Dan Williams wrote: > > >Prepare for hot{plug,remove} of sub-ranges of a section by tracking a > > >sub-section active bitmask, each bit representing a PMD_SIZE span of the > > >architecture's memory hotplug section size. > > > > > >The implications of a partially populated section is that pfn_valid() > > >needs to go beyond a valid_section() check and read the sub-section > > >active ranges from the bitmask. The expectation is that the bitmask > > >(subsection_map) fits in the same cacheline as the valid_section() data, > > >so the incremental performance overhead to pfn_valid() should be > > >negligible. > > > > > >Cc: Michal Hocko > > >Cc: Vlastimil Babka > > >Cc: Logan Gunthorpe > > >Cc: Oscar Salvador > > >Cc: Pavel Tatashin > > >Tested-by: Jane Chu > > >Signed-off-by: Dan Williams > > >--- > > > include/linux/mmzone.h | 29 - > > > mm/page_alloc.c|4 +++- > > > mm/sparse.c| 35 +++ > > > 3 files changed, 66 insertions(+), 2 deletions(-) > > > > > >diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > > >index ac163f2f274f..6dd52d544857 100644 > > >--- a/include/linux/mmzone.h > > >+++ b/include/linux/mmzone.h > > >@@ -1199,6 +1199,8 @@ struct mem_section_usage { > > > unsigned long pageblock_flags[0]; > > > }; > > > > > >+void subsection_map_init(unsigned long pfn, unsigned long nr_pages); > > >+ > > > struct page; > > > struct page_ext; > > > struct mem_section { > > >@@ -1336,12 +1338,36 @@ static inline struct mem_section > > >*__pfn_to_section(unsigned long pfn) > > > > > > extern int __highest_present_section_nr; > > > > > >+static inline int subsection_map_index(unsigned long pfn) > > >+{ > > >+ return (pfn & ~(PAGE_SECTION_MASK)) / PAGES_PER_SUBSECTION; > > >+} > > >+ > > >+#ifdef CONFIG_SPARSEMEM_VMEMMAP > > >+static inline int pfn_section_valid(struct mem_section *ms, unsigned long > > >pfn) > 
> >+{ > > >+ int idx = subsection_map_index(pfn); > > >+ > > >+ return test_bit(idx, ms->usage->subsection_map); > > >+} > > >+#else > > >+static inline int pfn_section_valid(struct mem_section *ms, unsigned long > > >pfn) > > >+{ > > >+ return 1; > > >+} > > >+#endif > > >+ > > > #ifndef CONFIG_HAVE_ARCH_PFN_VALID > > > static inline int pfn_valid(unsigned long pfn) > > > { > > >+ struct mem_section *ms; > > >+ > > > if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) > > > return 0; > > >- return valid_section(__nr_to_section(pfn_to_section_nr(pfn))); > > >+ ms = __nr_to_section(pfn_to_section_nr(pfn)); > > >+ if (!valid_section(ms)) > > >+ return 0; > > >+ return pfn_section_valid(ms, pfn); > > > } > > > #endif > > > > > >@@ -1373,6 +1399,7 @@ void sparse_init(void); > > > #define sparse_init() do {} while (0) > > > #define sparse_index_init(_sec, _nid) do {} while (0) > > > #define pfn_present pfn_valid > > >+#define subsection_map_init(_pfn, _nr_pages) do {} while (0) > > > #endif /* CONFIG_SPARSEMEM */ > > > > > > /* > > >diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > >index c6d8224d792e..bd773efe5b82 100644 > > >--- a/mm/page_alloc.c > > >+++ b/mm/page_alloc.c > > >@@ -7292,10 +7292,12 @@ void __init free_area_init_nodes(unsigned long > > >*max_zone_pfn) > > > > > > /* Print out the early node map */ > > > pr_info("Early memory node ranges\n"); > > >- for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) > > >+ for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) > > >{ > > > pr_info(" node %3d: [mem %#018Lx-%#018Lx]\n", nid, > > > (u64)start_pfn << PAGE_SHIFT, > > > ((u64)end_pfn << PAGE_SHIFT) - 1); > > >+ subsection_map_init(start_pfn, end_pfn - start_pfn); > > >+ } > > > > Just curious about why we set subsection here? > > > > Function free_area_init_nodes() mostly handles pgdat, if I am correct. Setup > > subsection here looks like touching some lower level system data structure. 
> > Correct, I'm not sure how it ended up there, but it was the source of > a bug that was fixed with this change: > > https://lore.kernel.org/lkml/capcyv4hjvbpdykpp2gns3-cc2aq0avs1nlk-k3fwxeruvvz...@mail.gmail.com/ On second thought I'm going to keep subsection_map_init() in free_area_init_nodes(), but instead teach pfn_valid() to return true for all "early" sections. There are code paths that use pfn_valid() as a coarse check before validating against pgdat for real validity of online memory. It is sufficient and safe for those to assume that all early sections are fully pfn_valid, while ZONE_DEVICE hotplug can see the more precise subsection_map.
Re: [PATCH] scsi: fdomain: fix building pcmcia front-end
Arnd, > Move the common support outside of the SCSI_LOWLEVEL section. > Alternatively, we could move all of SCSI_LOWLEVEL_PCMCIA into > SCSI_LOWLEVEL. This would be more sensible, but might cause surprises > for users that have SCSI_LOWLEVEL disabled. It seems messy to me that PCMCIA lives outside of the LOWLEVEL section. Given that the number of users who rely on PCMCIA for their system disk is probably pretty low, I think I'm leaning towards cleaning things up instead of introducing a nonsensical top-level option. Or even better: get rid of SCSI_FDOMAIN as a user-visible option and select it if any of the PCI/ISA/PCMCIA drivers is enabled. -- Martin K. Petersen Oracle Linux Engineering
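Martin's "select it" alternative would look roughly like this in Kconfig — a sketch only, with the front-end symbol names (SCSI_FDOMAIN_PCI, SCSI_FDOMAIN_ISA) assumed from the fdomain split rather than verified against the tree:

```
# SCSI_FDOMAIN becomes a hidden library symbol with no prompt...
config SCSI_FDOMAIN
	tristate

# ...and each front-end pulls it in via select, so users never see it
config SCSI_FDOMAIN_PCI
	tristate "Future Domain TMC-3260/AHA-2920A PCI SCSI support"
	depends on PCI && SCSI
	select SCSI_FDOMAIN

config SCSI_FDOMAIN_ISA
	tristate "Future Domain 16xx ISA SCSI support"
	depends on ISA && SCSI
	select SCSI_FDOMAIN
```

Because select forces the selected symbol on regardless of which menu section it sits in, this sidesteps the SCSI_LOWLEVEL vs. SCSI_LOWLEVEL_PCMCIA placement question entirely.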
Re: "mm: reparent slab memory on cgroup removal" series triggers SLUB_DEBUG errors
On Tue, Jun 18, 2019 at 05:43:04PM -0400, Qian Cai wrote: > Booting linux-next on both arm64 and powerpc triggers SLUB_DEBUG errors > below. Reverted the whole series “mm: reparent slab memory on cgroup removal” > [1] fixed the issue. Hi Qian! Thank you for the report! Didn't you try to reproduce it on x86? All the code changed in this series isn't arch-specific, so if it can be seen only on ppc and arm64, that's interesting. I'm currently on PTO and have a very limited internet connection, so I won't be able to reproduce the issue up to Sunday, when I'll be back. If you can try reverting only the last patch from the series, I will appreciate it. Thanks! > > [1] https://lore.kernel.org/lkml/20190611231813.3148843-1-g...@fb.com/ > > [ 151.773224][ T1650] BUG kmem_cache (Tainted: GB W): Poison > overwritten > [ 151.780969][ T1650] > - > [ 151.780969][ T1650] > [ 151.792016][ T1650] INFO: 0x1fd6fdef-0x07f6bb36. First > byte 0x0 instead of 0x6b > [ 151.800726][ T1650] INFO: Allocated in create_cache+0x6c/0x1bc age=24301 > cpu=97 pid=1444 > [ 151.808821][ T1650]kmem_cache_alloc+0x514/0x568 > [ 151.813527][ T1650]create_cache+0x6c/0x1bc > [ 151.817800][ T1650]memcg_create_kmem_cache+0xfc/0x11c > [ 151.823028][ T1650]memcg_kmem_cache_create_func+0x40/0x170 > [ 151.828691][ T1650]process_one_work+0x4e0/0xa54 > [ 151.833398][ T1650]worker_thread+0x498/0x650 > [ 151.837843][ T1650]kthread+0x1b8/0x1d4 > [ 151.841770][ T1650]ret_from_fork+0x10/0x18 > [ 151.846046][ T1650] INFO: Freed in slab_kmem_cache_release+0x3c/0x48 > age=23341 cpu=28 pid=1480 > [ 151.854659][ T1650]slab_kmem_cache_release+0x3c/0x48 > [ 151.859799][ T1650]kmem_cache_release+0x1c/0x28 > [ 151.864507][ T1650]kobject_cleanup+0x134/0x288 > [ 151.869127][ T1650]kobject_put+0x5c/0x68 > [ 151.873226][ T1650]sysfs_slab_release+0x2c/0x38 > [ 151.877931][ T1650]shutdown_cache+0x198/0x23c > [ 151.882464][ T1650]kmemcg_cache_shutdown_fn+0x1c/0x34 > [ 151.887691][ T1650]kmemcg_workfn+0x44/0x68 > [ 151.891963][ 
T1650]process_one_work+0x4e0/0xa54 > [ 151.896668][ T1650]worker_thread+0x498/0x650 > [ 151.901113][ T1650]kthread+0x1b8/0x1d4 > [ 151.905037][ T1650]ret_from_fork+0x10/0x18 > [ 151.909324][ T1650] INFO: Slab 0x406d65a6 objects=64 used=64 > fp=0x4d988e71 flags=0x7ffc000200 > [ 151.919596][ T1650] INFO: Object 0x40f4b79e > @offset=15420325124116637824 fp=0xe038adbf > [ 151.919596][ T1650] > [ 151.931079][ T1650] Redzone fc4c04f0: bb bb bb bb bb bb bb bb bb > bb bb bb bb bb bb bb > [ 151.941168][ T1650] Redzone 9a25c019: bb bb bb bb bb bb bb bb bb > bb bb bb bb bb bb bb > [ 151.951256][ T1650] Redzone 0b05c7cc: bb bb bb bb bb bb bb bb bb > bb bb bb bb bb bb bb > [ 151.961345][ T1650] Redzone a08ae38b: bb bb bb bb bb bb bb bb bb > bb bb bb bb bb bb bb > [ 151.971433][ T1650] Redzone e0eccd41: bb bb bb bb bb bb bb bb bb > bb bb bb bb bb bb bb > [ 151.981520][ T1650] Redzone 16ee2661: bb bb bb bb bb bb bb bb bb > bb bb bb bb bb bb bb > [ 151.991608][ T1650] Redzone 9364e729: bb bb bb bb bb bb bb bb bb > bb bb bb bb bb bb bb > [ 152.001695][ T1650] Redzone f2202456: bb bb bb bb bb bb bb bb bb > bb bb bb bb bb bb bb > [ 152.011784][ T1650] Object 40f4b79e: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.021783][ T1650] Object 2df21fec: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.031779][ T1650] Object 41cf0887: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.041775][ T1650] Object bfb91e8f: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.051770][ T1650] Object da315b1c: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.061765][ T1650] Object b362de78: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.071761][ T1650] Object ad4f72bf: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.081756][ T1650] Object aa32d346: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.091751][ T1650] Object ad1cf22c: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.101746][ T1650] Object 1cee47e4: 6b 6b 6b 6b 
6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b > [ 152.111741][ T1650] Object 418720ed: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b
Re: [RFC net-next 1/5] net: stmmac: introduce IEEE 802.1Qbv configuration functionalities
On Wed, Jun 19, 2019 at 05:36:14AM +0800, Voon Weifeng wrote: Hi Voon > +static int est_poll_srwo(void *ioaddr) > +{ > + /* Poll until the EST GCL Control[SRWO] bit clears. > + * Total wait = 12 x 50ms ~= 0.6s. > + */ > + unsigned int retries = 12; > + unsigned int value; > + > + do { > + value = TSN_RD32(ioaddr + MTL_EST_GCL_CTRL); > + if (!(value & MTL_EST_GCL_CTRL_SRWO)) > + return 0; > + msleep(50); > + } while (--retries); > + > + return -ETIMEDOUT; Maybe use one of the readx_poll_timeout() macros? > +static int est_read_gce(void *ioaddr, unsigned int row, > + unsigned int *gates, unsigned int *ti_nsec, > + unsigned int dbgb, unsigned int dbgm) > +{ > + struct tsn_hw_cap *cap = &dw_tsn_hwcap; > + unsigned int ti_wid = cap->ti_wid; > + unsigned int gates_mask; > + unsigned int ti_mask; > + unsigned int value; > + int ret; > + > + gates_mask = (1 << cap->txqcnt) - 1; > + ti_mask = (1 << ti_wid) - 1; > + > + ret = est_read_gcl_config(ioaddr, &value, row, 0, dbgb, dbgm); > + if (ret) { > + TSN_ERR("Read GCE failed! row=%u\n", row); It is generally not a good idea to put wrappers around the kernel print functions. It would be better if all these functions took struct stmmac_priv *priv rather than ioaddr, so you could then do netdev_err(priv->dev, "Read GCE failed! row=%u\n", row); > + /* Ensure that HW is not in the midst of GCL transition */ > + value = TSN_RD32(ioaddr + MTL_EST_CTRL); Also, don't put wrappers around readl()/writel(). > + value &= ~MTL_EST_CTRL_SSWL; > + > + /* MTL_EST_CTRL value has been read earlier, if TILS value > + * differs, we update here.
> + */ > + if (tils != dw_tsn_hwtunable[TSN_HWTUNA_TX_EST_TILS]) { > + value &= ~MTL_EST_CTRL_TILS; > + value |= (tils << MTL_EST_CTRL_TILS_SHIFT); > + > + TSN_WR32(value, ioaddr + MTL_EST_CTRL); > + dw_tsn_hwtunable[TSN_HWTUNA_TX_EST_TILS] = tils; > + } > + > + return 0; > +} > + > +static int est_set_ov(void *ioaddr, > + const unsigned int *ptov, > + const unsigned int *ctov) > +{ > + unsigned int value; > + > + if (!dw_tsn_feat_en[TSN_FEAT_ID_EST]) > + return -ENOTSUPP; > + > + value = TSN_RD32(ioaddr + MTL_EST_CTRL); > + value &= ~MTL_EST_CTRL_SSWL; > + > + if (ptov) { > + if (*ptov > EST_PTOV_MAX) { > + TSN_WARN("EST: invalid PTOV(%u), max=%u\n", > + *ptov, EST_PTOV_MAX); It looks like most of the TSN_WARN should actually be netdev_dbg(). Andrew
Re: [RFC PATCH 16/16] xen/grant-table: host_addr fixup in mapping on xenhost_r0
On 6/17/19 3:55 AM, Juergen Gross wrote: On 09.05.19 19:25, Ankur Arora wrote: Xenhost type xenhost_r0 does not support standard GNTTABOP_map_grant_ref semantics (map a gref onto a specified host_addr). That's because, since the hypervisor is local (same address space as the caller of GNTTABOP_map_grant_ref), there is no external entity that could map an arbitrary page underneath an arbitrary address. To handle this, the GNTTABOP_map_grant_ref hypercall on xenhost_r0 treats the host_addr as an OUT parameter instead of IN and expects gnttab_map_refs() and similar to fix up any state that caches the value of host_addr from before the hypercall. Accordingly, gnttab_map_refs() gains two parameters, a fixup function and a pointer to cached maps to fix up: int gnttab_map_refs(xenhost_t *xh, struct gnttab_map_grant_ref *map_ops, struct gnttab_map_grant_ref *kmap_ops, - struct page **pages, unsigned int count) + struct page **pages, gnttab_map_fixup_t map_fixup_fn, + void **map_fixup[], unsigned int count) The reason we use a fixup function and not an additional mapping op in the xenhost_t is that, depending on the caller, what we are fixing might be different: blkback and netback, for instance, cache host_addr via a struct page *, while __xenbus_map_ring() caches a phys_addr. This patch fixes up the xen-blkback and xen-gntdev drivers. TODO: - also rewrite gnttab_batch_map() and __xenbus_map_ring(). - modify xen-netback, scsiback, pciback etc Co-developed-by: Joao Martins Signed-off-by: Ankur Arora Without seeing the __xenbus_map_ring() modification it is impossible to do a proper review of this patch. Will do in v2. Ankur Juergen
Re: [RFC PATCH 14/16] xen/blk: gnttab, evtchn, xenbus API changes
On 6/17/19 3:14 AM, Juergen Gross wrote: On 09.05.19 19:25, Ankur Arora wrote: For the most part, we now pass xenhost_t * as a parameter. Co-developed-by: Joao Martins Signed-off-by: Ankur Arora I don't see how this can be a patch on its own. Yes, the reason this was separate was that, given this was an RFC, I didn't want to pollute the logic patches with lots of mechanical changes. The only way to be able to use a patch for each driver would be to keep the original grant-, event- and xenbus-interfaces and add the new ones taking xenhost * with a new name. The original interfaces could then use xenhost_default and you can switch them to the new interfaces one by one. The last patch could then remove the old interfaces when there is no user left. Yes, this makes sense. Ankur Juergen
Re: [RFC PATCH 13/16] drivers/xen: gnttab, evtchn, xenbus API changes
On 6/17/19 3:07 AM, Juergen Gross wrote: On 09.05.19 19:25, Ankur Arora wrote: Mechanical changes, now most of these calls take xenhost_t * as parameter. Co-developed-by: Joao Martins Signed-off-by: Ankur Arora --- drivers/xen/cpu_hotplug.c | 14 ++--- drivers/xen/gntalloc.c | 13 drivers/xen/gntdev.c | 16 +++ drivers/xen/manage.c | 37 ++- drivers/xen/platform-pci.c | 12 +++- drivers/xen/sys-hypervisor.c | 12 drivers/xen/xen-balloon.c | 10 +++--- drivers/xen/xenfs/xenstored.c | 7 --- 8 files changed, 73 insertions(+), 48 deletions(-) diff --git a/drivers/xen/cpu_hotplug.c b/drivers/xen/cpu_hotplug.c index afeb94446d34..4a05bc028956 100644 --- a/drivers/xen/cpu_hotplug.c +++ b/drivers/xen/cpu_hotplug.c @@ -31,13 +31,13 @@ static void disable_hotplug_cpu(int cpu) unlock_device_hotplug(); } -static int vcpu_online(unsigned int cpu) +static int vcpu_online(xenhost_t *xh, unsigned int cpu) Do we really need xenhost for cpu on/offlinig? I was in two minds about this. We only need it for the xenbus interfaces which could very well have been just xh_default. However, the xenhost is part of the xenbus_watch state, so I thought it is easier to percolate that down instead of adding xh_default all over the place. diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c index 9a69d955dd5c..1655d0a039fd 100644 --- a/drivers/xen/manage.c +++ b/drivers/xen/manage.c @@ -227,14 +227,14 @@ static void shutdown_handler(struct xenbus_watch *watch, return; again: - err = xenbus_transaction_start(xh_default, &xbt); + err = xenbus_transaction_start(watch->xh, &xbt); if (err) return; - str = (char *)xenbus_read(xh_default, xbt, "control", "shutdown", NULL); + str = (char *)xenbus_read(watch->xh, xbt, "control", "shutdown", NULL); /* Ignore read errors and empty reads. 
*/ if (XENBUS_IS_ERR_READ(str)) { - xenbus_transaction_end(xh_default, xbt, 1); + xenbus_transaction_end(watch->xh, xbt, 1); return; } @@ -245,9 +245,9 @@ static void shutdown_handler(struct xenbus_watch *watch, /* Only acknowledge commands which we are prepared to handle. */ if (idx < ARRAY_SIZE(shutdown_handlers)) - xenbus_write(xh_default, xbt, "control", "shutdown", ""); + xenbus_write(watch->xh, xbt, "control", "shutdown", ""); - err = xenbus_transaction_end(xh_default, xbt, 0); + err = xenbus_transaction_end(watch->xh, xbt, 0); if (err == -EAGAIN) { kfree(str); goto again; @@ -272,10 +272,10 @@ static void sysrq_handler(struct xenbus_watch *watch, const char *path, int err; again: - err = xenbus_transaction_start(xh_default, &xbt); + err = xenbus_transaction_start(watch->xh, &xbt); if (err) return; - err = xenbus_scanf(xh_default, xbt, "control", "sysrq", "%c", &sysrq_key); + err = xenbus_scanf(watch->xh, xbt, "control", "sysrq", "%c", &sysrq_key); if (err < 0) { /* * The Xenstore watch fires directly after registering it and @@ -287,21 +287,21 @@ static void sysrq_handler(struct xenbus_watch *watch, const char *path, if (err != -ENOENT && err != -ERANGE) pr_err("Error %d reading sysrq code in control/sysrq\n", err); - xenbus_transaction_end(xh_default, xbt, 1); + xenbus_transaction_end(watch->xh, xbt, 1); return; } if (sysrq_key != '\0') { - err = xenbus_printf(xh_default, xbt, "control", "sysrq", "%c", '\0'); + err = xenbus_printf(watch->xh, xbt, "control", "sysrq", "%c", '\0'); if (err) { pr_err("%s: Error %d writing sysrq in control/sysrq\n", __func__, err); - xenbus_transaction_end(xh_default, xbt, 1); + xenbus_transaction_end(watch->xh, xbt, 1); return; } } - err = xenbus_transaction_end(xh_default, xbt, 0); + err = xenbus_transaction_end(watch->xh, xbt, 0); if (err == -EAGAIN) goto again; @@ -324,14 +324,14 @@ static struct notifier_block xen_reboot_nb = { .notifier_call = poweroff_nb, }; -static int setup_shutdown_watcher(void) +static int 
setup_shutdown_watcher(xenhost_t *xh) I think shutdown is purely local, too. Yes, I introduced xenhost for the same reason as above. I agree that neither of these cases (nor similar others) has any use for the concept of xenhost. Do you think it makes sense for these to pass NULL instead, with the underlying interface assuming xh_default? Ankur Juergen
Re: [PATCH 1/1] scsi: ufs-qcom: Add support for platforms booting ACPI
Lee, > New Qualcomm AArch64 based laptops are now available which use UFS > as their primary data storage medium. These devices are supplied > with ACPI support out of the box. This patch ensures the Qualcomm > UFS driver will be bound when the "QCOM24A5" H/W device is > advertised as present. Applied to 5.3/scsi-queue. Thanks! -- Martin K. Petersen Oracle Linux Engineering