INFO: rcu detected stall in kvm_vcpu_ioctl
Hello,

syzbot found the following crash on:

HEAD commit:    3d0e7a9e00fd Merge tag 'md/4.19-rc2' of git://git.kernel.o..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1666429e40
kernel config:  https://syzkaller.appspot.com/x/.config?x=8f59875069d721b6
dashboard link: https://syzkaller.appspot.com/bug?extid=e9b1e8f574404b6e4ed3
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+e9b1e8f574404b6e4...@syzkaller.appspotmail.com

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu:(detected by 0, t=10502 jiffies, g=45997, q=77)
rcu: All QSes seen, last rcu_preempt kthread activity 10502 (4294979638-4294969136), jiffies_till_next_fqs=1, root ->qsmask 0x0
syz-executor7 R running task22096 16667 5475 0x
Call Trace:
 sched_show_task.cold.83+0x2b6/0x30a kernel/sched/core.c:5296
 print_other_cpu_stall.cold.79+0xa83/0xba5 kernel/rcu/tree.c:1430
 check_cpu_stall kernel/rcu/tree.c:1557 [inline]
 __rcu_pending kernel/rcu/tree.c:3276 [inline]
 rcu_pending kernel/rcu/tree.c:3319 [inline]
 rcu_check_callbacks+0xafc/0x1990 kernel/rcu/tree.c:2665
 update_process_times+0x2d/0x70 kernel/time/timer.c:1636
 tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
 tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
 __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
 __hrtimer_run_queues+0x41c/0x10d0 kernel/time/hrtimer.c:1460
 hrtimer_interrupt+0x313/0x780 kernel/time/hrtimer.c:1518
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1029 [inline]
 smp_apic_timer_interrupt+0x1a1/0x760 arch/x86/kernel/apic/apic.c:1054
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
RIP: 0010:__sanitizer_cov_trace_const_cmp4+0x0/0x20 kernel/kcov.c:183
Code: a6 fe ff ff 5d c3 0f 1f 40 00 55 0f b7 d6 0f b7 f7 bf 03 00 00 00 48 89 e5 48 8b 4d 08 e8 88 fe ff ff 5d c3 66 0f 1f 44 00 00 <55> 89 f2 89 fe bf 05 00 00 00 48 89 e5 48 8b 4d 08 e8 6a fe ff ff
RSP: 0018:88019baf7858 EFLAGS: 0246 ORIG_RAX: ff13
RAX: RBX: 88019ef30700 RCX: c90001ed4000
RDX: 0004 RSI: RDI:
RBP: 88019baf78d8 R08: 8801bd9ea700 R09: 112b43cd
R10: 88019baf7860 R11: 8801dae23993 R12:
R13: 0007 R14: 0007 R15: dc00
 kvm_vcpu_ioctl+0x72b/0x1150 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2590
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:501 [inline]
 do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:685
 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:702
 __do_sys_ioctl fs/ioctl.c:709 [inline]
 __se_sys_ioctl fs/ioctl.c:707 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:707
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457099
Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:7f8361215c78 EFLAGS: 0246 ORIG_RAX: 0010
RAX: ffda RBX: 7f83612166d4 RCX: 00457099
RDX: RSI: ae80 RDI: 0006
RBP: 009300a0 R08: R09:
R10: R11: 0246 R12:
R13: 004cf730 R14: 004c59b9 R15:
rcu: rcu_preempt kthread starved for 10502 jiffies! g45997 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
rcu: RCU grace-period kthread stack dump:
rcu_preempt R running task2287210 2 0x8000
Call Trace:
 context_switch kernel/sched/core.c:2825 [inline]
 __schedule+0x86c/0x1ed0 kernel/sched/core.c:3473
 schedule+0xfe/0x460 kernel/sched/core.c:3517
 schedule_timeout+0x140/0x260 kernel/time/timer.c:1804
 rcu_gp_kthread+0x9d9/0x2310 kernel/rcu/tree.c:2194
 kthread+0x35a/0x420 kernel/kthread.c:246
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413
sched: RT throttling activated

---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
Re: [PATCHv3 2/6] tty/ldsem: Update waiter->task before waking up reader
On (09/11/18 14:04), Sergey Senozhatsky wrote:
> > 	for (;;) {
> > 		set_current_state(TASK_UNINTERRUPTIBLE);
>
> I think that set_current_state() also executes memory barrier. Just
> because it accesses task state.
>
> > -		if (!waiter.task)
> > +		if (!READ_ONCE(waiter.task))
> > 			break;
> > 		if (!timeout)
> > 			break;

This READ_ONCE(waiter.task) looks interesting. Maybe could be moved
to a loop condition

	while (!READ_ONCE(waiter.task)) {
		...
	}

	-ss
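As an illustration of the loop shapes discussed in this reply, here is a minimal user-space sketch. It is not the ldsem code: `struct waiter_sketch`, `wait_for_loop()` and `wait_while_loop()` are hypothetical names, a relaxed C11 atomic load stands in for READ_ONCE(), and a plain counter stands in for schedule_timeout(). It does not model set_current_state() or sleeping. Note that the quoted code breaks out when waiter.task is NULL, so an equivalent loop condition keeps looping while the pointer is non-NULL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the ldsem waiter; only the field discussed here. */
struct waiter_sketch {
	_Atomic(void *) task;	/* assumed to be cleared to NULL by the waker */
};

/* Shape of the quoted loop: infinite loop with explicit breaks. */
static long wait_for_loop(struct waiter_sketch *w, long timeout)
{
	assert(timeout >= 0);
	for (;;) {
		/* A relaxed atomic load plays the role of READ_ONCE(). */
		if (!atomic_load_explicit(&w->task, memory_order_relaxed))
			break;
		if (!timeout)
			break;
		timeout--;	/* stand-in for timeout = schedule_timeout(timeout) */
	}
	return timeout;
}

/* Same logic with the load moved into the loop condition. */
static long wait_while_loop(struct waiter_sketch *w, long timeout)
{
	assert(timeout >= 0);
	while (atomic_load_explicit(&w->task, memory_order_relaxed)) {
		if (!timeout)
			break;
		timeout--;
	}
	return timeout;
}
```

Both functions return the remaining timeout: unchanged if the waiter was already woken, zero if the wait timed out. In the kernel loop the state change precedes the check, so moving the check into the loop condition would also change that ordering; the sketch deliberately leaves that aspect out.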
[RFC PATCH 2/9] mm: introduce smp_list_del for concurrent list entry removals
From: Daniel Jordan

Now that the LRU lock is a RW lock, lay the groundwork for fine-grained
synchronization so that multiple threads holding the lock as reader can
safely remove pages from an LRU at the same time.

Add a thread-safe variant of list_del called smp_list_del that allows
multiple threads to delete nodes from a list, and wrap this new list API
in smp_del_page_from_lru to get the LRU statistics updates right.

For bisectability's sake, call the new function only when holding
lru_lock as writer. In the next patch, switch to taking it as reader.

The algorithm is explained in detail in the comments. Yosef Lev
conceived of the algorithm, and this patch is heavily based on an
earlier version from him. Thanks to Dave Dice for suggesting the
prefetch.

[aaronlu: only take list related code here]
Signed-off-by: Yosef Lev
Signed-off-by: Daniel Jordan
---
 include/linux/list.h |   2 +
 lib/Makefile         |   2 +-
 lib/list.c           | 158 +++
 3 files changed, 161 insertions(+), 1 deletion(-)
 create mode 100644 lib/list.c

diff --git a/include/linux/list.h b/include/linux/list.h
index de04cc5ed536..0fd9c87dd14b 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -47,6 +47,8 @@ static inline bool __list_del_entry_valid(struct list_head *entry)
 }
 #endif

+extern void smp_list_del(struct list_head *entry);
+
 /*
  * Insert a new entry between two known consecutive entries.
  *
diff --git a/lib/Makefile b/lib/Makefile
index ca3f7ebb900d..9527b7484653 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -38,7 +38,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
 	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o rhashtable.o reciprocal_div.o \
-	 once.o refcount.o usercopy.o errseq.o bucket_locks.o
+	 once.o refcount.o usercopy.o errseq.o bucket_locks.o list.o
 obj-$(CONFIG_STRING_SELFTEST) += test_string.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/list.c b/lib/list.c
new file mode 100644
index ..4d0949ea1a09
--- /dev/null
+++ b/lib/list.c
@@ -0,0 +1,158 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (c) 2017, 2018 Oracle and/or its affiliates. All rights reserved.
+ *
+ * Authors: Yosef Lev
+ *          Daniel Jordan
+ */
+
+#include
+#include
+
+/*
+ * smp_list_del is a variant of list_del that allows concurrent list removals
+ * under certain assumptions. The idea is to get away from overly coarse
+ * synchronization, such as using a lock to guard an entire list, which
+ * serializes all operations even though those operations might be happening on
+ * disjoint parts.
+ *
+ * If you want to use other functions from the list API concurrently,
+ * additional synchronization may be necessary. For example, you could use a
+ * rwlock as a two-mode lock, where readers use the lock in shared mode and are
+ * allowed to call smp_list_del concurrently, and writers use the lock in
+ * exclusive mode and are allowed to use all list operations.
+ */
+
+/**
+ * smp_list_del - concurrent variant of list_del
+ * @entry: entry to delete from the list
+ *
+ * Safely removes an entry from the list in the presence of other threads that
+ * may try to remove adjacent entries. Uses the entry's next field and the
+ * predecessor entry's next field as locks to accomplish this.
+ *
+ * Assumes that no two threads may try to delete the same entry. This
+ * assumption holds, for example, if the objects on the list are
+ * reference-counted so that an object is only removed when its refcount falls
+ * to 0.
+ *
+ * @entry's next and prev fields are poisoned on return just as with list_del.
+ */
+void smp_list_del(struct list_head *entry)
+{
+	struct list_head *succ, *pred, *pred_reread;
+
+	/*
+	 * The predecessor entry's cacheline is read before it's written, so to
+	 * avoid an unnecessary cacheline state transition, prefetch for
+	 * writing. In the common case, the predecessor won't change.
+	 */
+	prefetchw(entry->prev);
+
+	/*
+	 * Step 1: Lock @entry E by making its next field point to its
+	 * predecessor D. This prevents any thread from removing the
+	 * predecessor because that thread will loop in its step 4 while
+	 * E->next == D. This also prevents any thread from removing the
+	 * successor F because that thread will see that F->prev->next != F in
+	 * the cmpxchg in its step 3. Retry if the successor is being removed
+	 * and has already set this field to NULL in step 3.
+	 */
+	succ = READ_ONCE(entry->next);
+	pred = READ_ONCE(entry->prev);
+	while (succ == NULL || cmpxchg(&entry->next, succ, pred) != succ) {
+		/*
+		 * Reread @entry's successor because it may change until
+
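Step 1 of the algorithm described in the comment above can be sketched in user space as follows. This is only an illustration of that one step, under stated assumptions: `struct node_sketch` and `lock_entry_sketch()` are hypothetical names, C11 atomics stand in for READ_ONCE() and cmpxchg(), and the retry body (rereading both neighbours) is an assumption because the quoted patch is cut off at that point. Steps 2 onward are not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space node; the kernel code uses struct list_head. */
struct node_sketch {
	_Atomic(struct node_sketch *) next;
	_Atomic(struct node_sketch *) prev;
};

/*
 * Step 1 only: "lock" entry E by swinging E->next from its successor F
 * to its predecessor D. Returns the successor that was replaced.
 */
static struct node_sketch *lock_entry_sketch(struct node_sketch *entry)
{
	struct node_sketch *succ, *pred;

	assert(entry != NULL);
	succ = atomic_load(&entry->next);
	pred = atomic_load(&entry->prev);

	/* Retry while the successor is mid-removal (NULL) or the CAS loses. */
	while (succ == NULL ||
	       !atomic_compare_exchange_strong(&entry->next, &succ, pred)) {
		/* Assumed retry body: reread both neighbours and try again. */
		succ = atomic_load(&entry->next);
		pred = atomic_load(&entry->prev);
	}
	return succ;
}
```

On a three-node ring D, E, F, calling `lock_entry_sketch(&e)` returns F and leaves E's next pointer aimed at D, which is the "locked" state the comment describes.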
[RFC PATCH 8/9] mm: use smp_list_splice() on free path
With free path running concurrently, the cache bouncing on free list head
is severe since multiple threads can be freeing pages and each free will
need to add the page to free list head.

To improve performance on free path for order-0 pages, we can choose to
not add the merged pages to Buddy immediately after merge but keep them
on a local percpu list first and then after all pages are finished
merging, add these merged pages to Buddy with smp_list_splice() in one go.

This optimization caused a problem though: the page we hold on the local
percpu list can be a buddy of another page being freed, and we lose the
merge opportunity for them. With this patch, we will have mergable pages
unmerged in Buddy.

Due to this, I don't see much value in keeping the range lock, which is
used to avoid such a thing from happening, so the range lock is removed
in this patch.

Signed-off-by: Aaron Lu
---
 include/linux/mm.h     |   1 +
 include/linux/mmzone.h |   3 -
 init/main.c            |   1 +
 mm/page_alloc.c        | 151 +
 4 files changed, 95 insertions(+), 61 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a61ebe8ad4ca..a99ba2cb7a0d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2155,6 +2155,7 @@ extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
 extern void setup_per_zone_wmarks(void);
 extern int __meminit init_per_zone_wmark_min(void);
 extern void mem_init(void);
+extern void percpu_mergelist_init(void);
 extern void __init mmap_init(void);
 extern void show_mem(unsigned int flags, nodemask_t *nodemask);
 extern long si_mem_available(void);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 0ea52e9bb610..e66b8c63d5d1 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -467,9 +467,6 @@ struct zone {
 	/* Primarily protects free_area */
 	rwlock_t		lock;

-	/* Protects merge operation for a range of order=(MAX_ORDER-1) pages */
-	spinlock_t		*range_locks;
-
 	/* Write-intensive fields used by compaction and vmstats. */
 	ZONE_PADDING(_pad2_)
diff --git a/init/main.c b/init/main.c
index 18f8f0140fa0..68a428e1bf15 100644
--- a/init/main.c
+++ b/init/main.c
@@ -517,6 +517,7 @@ static void __init mm_init(void)
 	 * bigger than MAX_ORDER unless SPARSEMEM.
 	 */
 	page_ext_init_flatmem();
+	percpu_mergelist_init();
 	mem_init();
 	kmem_cache_init();
 	pgtable_init();
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5f5cc671bcf7..df38c3f2a1cc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -339,17 +339,6 @@ static inline bool update_defer_init(pg_data_t *pgdat,
 }
 #endif

-/* Return a pointer to the spinblock for a pageblock this page belongs to */
-static inline spinlock_t *get_range_lock(struct page *page)
-{
-	struct zone *zone = page_zone(page);
-	unsigned long zone_start_pfn = zone->zone_start_pfn;
-	unsigned long range = (page_to_pfn(page) - zone_start_pfn) >>
-				(MAX_ORDER - 1);
-
-	return &zone->range_locks[range];
-}
-
 /* Return a pointer to the bitmap storing bits affecting a block of pages */
 static inline unsigned long *get_pageblock_bitmap(struct page *page,
 							unsigned long pfn)
@@ -711,9 +700,15 @@ static inline void set_page_order(struct page *page, unsigned int order)
 static inline void add_to_buddy(struct page *page, struct zone *zone,
 				unsigned int order, int mt)
 {
+	/*
+	 * Adding page to free list before setting PageBuddy flag
+	 * or other thread doing merge can notice its PageBuddy flag
+	 * and attempt to merge with it, causing list corruption.
+	 */
+	smp_list_add(&page->lru, &zone->free_area[order].free_list[mt]);
+	smp_wmb();
 	set_page_order(page, order);
 	atomic_long_inc(&zone->free_area[order].nr_free);
-	smp_list_add(&page->lru, &zone->free_area[order].free_list[mt]);
 }

 static inline void rmv_page_order(struct page *page)
@@ -784,40 +779,17 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
 	return 0;
 }

-/*
- * Freeing function for a buddy system allocator.
- *
- * The concept of a buddy system is to maintain direct-mapped table
- * (containing bit values) for memory blocks of various "orders".
- * The bottom level table contains the map for the smallest allocatable
- * units of memory (here, pages), and each level above it describes
- * pairs of units from the levels below, hence, "buddies".
- * At a high level, all that happens here is marking the table entry
- * at the bottom level available, and propagating the changes upward
- * as necessary, plus some accounting needed to play nicely with other
- * parts of the VM system.
- * At each level, we keep a list of pages, which are heads of continuous
- *
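The ordering that the add_to_buddy() hunk introduces (link the page into the free list first, then a write barrier, then set the flag that makes it visible to mergers) can be sketched in user space as follows. This is an analogy only, with assumptions stated plainly: all names ending in `_sketch` are hypothetical, the list insert is a plain single-writer push rather than smp_list_add(), and a C11 release store is used as the nearest stand-in for smp_wmb() followed by the flag update (a release store is somewhat stronger than a write barrier).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page descriptor with just the two fields of interest. */
struct page_sketch {
	struct page_sketch *next;	/* free-list link */
	atomic_bool buddy;		/* stand-in for the PageBuddy flag */
};

struct free_list_sketch {
	struct page_sketch *head;
};

/* Publish order: list link first, flag last. */
static void add_to_buddy_sketch(struct page_sketch *page,
				struct free_list_sketch *list)
{
	assert(!atomic_load_explicit(&page->buddy, memory_order_relaxed));

	/* 1. Put the page on the free list (single writer assumed here). */
	page->next = list->head;
	list->head = page;

	/* 2. Only then advertise it; release orders the stores above first. */
	atomic_store_explicit(&page->buddy, true, memory_order_release);
}

/* A merger checks the flag before touching the page's list linkage. */
static bool can_merge_with_sketch(struct page_sketch *page)
{
	return atomic_load_explicit(&page->buddy, memory_order_acquire);
}
```

A thread that observes the flag via `can_merge_with_sketch()` is then guaranteed to also observe the list linkage, which is the property the comment in the hunk relies on to avoid list corruption.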
[RFC PATCH 1/9] mm: do not add anon pages to LRU
For testing purposes, do not add anon pages to the LRU, so that the LRU
lock is avoided and the zone lock can be tested exclusively.

Signed-off-by: Aaron Lu
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index c467102a5cbc..080641255b8b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3208,7 +3208,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
 	page_add_new_anon_rmap(page, vma, vmf->address, false);
 	mem_cgroup_commit_charge(page, memcg, false, false);
-	lru_cache_add_active_or_unevictable(page, vma);
+	//lru_cache_add_active_or_unevictable(page, vma);
 setpte:
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);

--
2.17.1
[RFC PATCH 9/9] mm: page_alloc: merge before sending pages to global pool
Now that we have mergable pages in Buddy unmerged, this is a step to
reduce such things from happening to some extent.

Suppose two buddy pages are on the list to be freed in
free_pcppages_bulk(). The first page goes to merge, but its buddy is not
in Buddy yet, so we hold it locally as an order-0 page; then its buddy
page goes to merge and cannot merge either, because we hold the first
page locally instead of having it in Buddy. The end result is that we
have two mergable buddy pages but failed to merge them.

So this patch attempts to merge these to-be-freed pages before acquiring
any lock. It could, to some extent, reduce the fragmentation caused by
the last patch.

With this change, the pcp_drain trace isn't easy to use, so I removed it.

Signed-off-by: Aaron Lu
---
 mm/page_alloc.c | 75 +++--
 1 file changed, 73 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index df38c3f2a1cc..d3eafe857713 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1098,6 +1098,72 @@ void __init percpu_mergelist_init(void)
 	}
 }

+static inline bool buddy_in_list(struct page *page, struct page *buddy,
+				 struct list_head *list)
+{
+	list_for_each_entry_continue(page, list, lru)
+		if (page == buddy)
+			return true;
+
+	return false;
+}
+
+static inline void merge_in_pcp(struct list_head *list)
+{
+	int order;
+	struct page *page;
+
+	/* Set order information to 0 initially since they are PCP pages */
+	list_for_each_entry(page, list, lru)
+		set_page_private(page, 0);
+
+	/*
+	 * Check for mergable pages for each order.
+	 *
+	 * For each order, check if their buddy is also in the list and
+	 * if so, do merge, then remove the merged buddy from the list.
+	 */
+	for (order = 0; order < MAX_ORDER - 1; order++) {
+		bool has_merge = false;
+
+		page = list_first_entry(list, struct page, lru);
+		while (&page->lru != list) {
+			unsigned long pfn, buddy_pfn, combined_pfn;
+			struct page *buddy, *n;
+
+			if (page_order(page) != order) {
+				page = list_next_entry(page, lru);
+				continue;
+			}
+
+			pfn = page_to_pfn(page);
+			buddy_pfn = __find_buddy_pfn(pfn, order);
+			buddy = page + (buddy_pfn - pfn);
+			if (!buddy_in_list(page, buddy, list) ||
+			    page_order(buddy) != order) {
+				page = list_next_entry(page, lru);
+				continue;
+			}
+
+			combined_pfn = pfn & buddy_pfn;
+			if (combined_pfn == pfn) {
+				set_page_private(page, order + 1);
+				list_del(&buddy->lru);
+				page = list_next_entry(page, lru);
+			} else {
+				set_page_private(buddy, order + 1);
+				n = list_next_entry(page, lru);
+				list_del(&page->lru);
+				page = n;
+			}
+			has_merge = true;
+		}
+
+		if (!has_merge)
+			break;
+	}
+}
+
 /*
  * Frees a number of pages from the PCP lists
  * Assumes all pages on list are in same zone, and of same order.
@@ -1165,6 +1231,12 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 		} while (--count && --batch_free && !list_empty(list));
 	}

+	/*
+	 * Before acquiring the possibly heavily contended zone lock, do merge
+	 * among these to-be-freed PCP pages before sending them to Buddy.
+	 */
+	merge_in_pcp(&head);
+
 	read_lock(&zone->lock);
 	isolated_pageblocks = has_isolate_pageblock(zone);

@@ -1182,10 +1254,9 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 		if (unlikely(isolated_pageblocks))
 			mt = get_pageblock_migratetype(page);

-		order = 0;
+		order = page_order(page);
 		merged_page = do_merge(page, page_to_pfn(page), zone, &order, mt);
 		list_add(&merged_page->lru, this_cpu_ptr(_lists[order][mt]));
-		trace_mm_page_pcpu_drain(page, 0, mt);
 	}

 	for_each_migratetype_order(order, migratetype) {
--
2.17.1
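The pfn arithmetic that merge_in_pcp() relies on can be checked in isolation. The sketch below reproduces the kernel's __find_buddy_pfn() calculation (flip bit `order` of the pfn) and the `pfn & buddy_pfn` step from the patch; the function names ending in `_sketch` are hypothetical, and the example assumes pfns aligned to the order in question.

```c
#include <assert.h>

/* Same arithmetic as the kernel's __find_buddy_pfn(): flip bit 'order'. */
static unsigned long find_buddy_pfn_sketch(unsigned long pfn, unsigned int order)
{
	assert(order < 8 * sizeof(unsigned long));
	return pfn ^ (1UL << order);
}

/* First pfn of the merged block: the lower of the two buddies. */
static unsigned long combined_pfn_sketch(unsigned long pfn, unsigned int order)
{
	return pfn & find_buddy_pfn_sketch(pfn, order);
}
```

For an order-2 block starting at pfn 12, the buddy starts at pfn 8 and the merged order-3 block also starts at pfn 8. This is why the patch compares combined_pfn with pfn: when they are equal, the current page is the head of the merged block and its buddy is the one removed from the list; otherwise the buddy becomes the head.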
[RFC PATCH 5/9] mm/page_alloc: use helper functions to add/remove a page to/from buddy
There are multiple places that add/remove a page into/from buddy, introduce helper functions for them. This also makes it easier to add code when a page is added/removed to/from buddy.

No functionality change.

Acked-by: Vlastimil Babka
Signed-off-by: Aaron Lu
---
 mm/page_alloc.c | 65 +
 1 file changed, 39 insertions(+), 26 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 38e39ccdd6d9..d0b954783f1d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -697,12 +697,41 @@ static inline void set_page_order(struct page *page, unsigned int order)
 	__SetPageBuddy(page);
 }
 
+static inline void add_to_buddy_common(struct page *page, struct zone *zone,
+					unsigned int order)
+{
+	set_page_order(page, order);
+	zone->free_area[order].nr_free++;
+}
+
+static inline void add_to_buddy_head(struct page *page, struct zone *zone,
+					unsigned int order, int mt)
+{
+	add_to_buddy_common(page, zone, order);
+	list_add(&page->lru, &zone->free_area[order].free_list[mt]);
+}
+
+static inline void add_to_buddy_tail(struct page *page, struct zone *zone,
+					unsigned int order, int mt)
+{
+	add_to_buddy_common(page, zone, order);
+	list_add_tail(&page->lru, &zone->free_area[order].free_list[mt]);
+}
+
 static inline void rmv_page_order(struct page *page)
 {
 	__ClearPageBuddy(page);
 	set_page_private(page, 0);
 }
 
+static inline void remove_from_buddy(struct page *page, struct zone *zone,
+					unsigned int order)
+{
+	list_del(&page->lru);
+	zone->free_area[order].nr_free--;
+	rmv_page_order(page);
+}
+
 /*
  * This function checks whether a page is free && is the buddy
  * we can coalesce a page and its buddy if
@@ -803,13 +832,10 @@ static inline void __free_one_page(struct page *page,
 		 * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page,
 		 * merge with it and move up one order.
 		 */
-		if (page_is_guard(buddy)) {
+		if (page_is_guard(buddy))
 			clear_page_guard(zone, buddy, order, migratetype);
-		} else {
-			list_del(&buddy->lru);
-			zone->free_area[order].nr_free--;
-			rmv_page_order(buddy);
-		}
+		else
+			remove_from_buddy(buddy, zone, order);
 		combined_pfn = buddy_pfn & pfn;
 		page = page + (combined_pfn - pfn);
 		pfn = combined_pfn;
@@ -841,8 +867,6 @@ static inline void __free_one_page(struct page *page,
 	}
 
 done_merging:
-	set_page_order(page, order);
-
 	/*
 	 * If this is not the largest possible page, check if the buddy
 	 * of the next-highest order is free. If it is, it's possible
@@ -859,15 +883,12 @@ static inline void __free_one_page(struct page *page,
 		higher_buddy = higher_page + (buddy_pfn - combined_pfn);
 		if (pfn_valid_within(buddy_pfn) &&
 		    page_is_buddy(higher_page, higher_buddy, order + 1)) {
-			list_add_tail(&page->lru,
-				&zone->free_area[order].free_list[migratetype]);
-			goto out;
+			add_to_buddy_tail(page, zone, order, migratetype);
+			return;
 		}
 	}
 
-	list_add(&page->lru, &zone->free_area[order].free_list[migratetype]);
-out:
-	zone->free_area[order].nr_free++;
+	add_to_buddy_head(page, zone, order, migratetype);
 }
 
 /*
@@ -1805,9 +1826,7 @@ static inline void expand(struct zone *zone, struct page *page,
 		if (set_page_guard(zone, &page[size], high, migratetype))
 			continue;
 
-		list_add(&page[size].lru, &area->free_list[migratetype]);
-		area->nr_free++;
-		set_page_order(&page[size], high);
+		add_to_buddy_head(&page[size], zone, high, migratetype);
 	}
 }
 
@@ -1951,9 +1970,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 							struct page, lru);
 		if (!page)
 			continue;
-		list_del(&page->lru);
-		rmv_page_order(page);
-		area->nr_free--;
+		remove_from_buddy(page, zone, current_order);
 		expand(zone, page, order, current_order, area, migratetype);
 		set_pcppage_migratetype(page, migratetype);
 		return page;
@@ -2871,9 +2888,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
 	}
 
 	/* Remove page from free list */
-	list_del(&page->lru);
-	zone->free_area[order].nr_free--;
-	rmv_page_order(page);
+	remove_from_buddy(page, zone, order);
 
 	/*
[RFC PATCH 0/9] Improve zone lock scalability using Daniel Jordan's list work
Daniel Jordan and others proposed an innovative technique at this year's MM summit[0] that lets multiple threads concurrently use list_del() at any position of a list and list_add() at the head of the list without taking a lock. People think this technique may be useful for improving zone lock scalability, so here is my try. This series is based on Daniel Jordan's most recent patchset[1]. To make this series self-contained, 2 of his patches are extracted here.

Scalability is best when multiple threads operate at different positions of the list. Since the free path accesses (buddy) pages randomly on the free list during merging, it is a good fit for this technique. This patchset makes the free path run concurrently.

Patch 1 is for testing purposes only; it removes the LRU lock from the picture so we can get a better understanding of how much improvement this patchset has on the zone lock.

Patches 2-3 are Daniel's work to realize concurrent list_del() and list_add(); these new APIs are called smp_list_del() and smp_list_splice().

Patches 4-7 make the free path run concurrently by converting the zone lock from a spinlock to an rwlock and having the free path take the zone lock in read mode. To avoid complexity and problems, all other code paths take the zone lock in write mode.

Patch 8 is an optimization that reduces free list head access to avoid severe cache bouncing. It also comes with a side effect: with this patch, there will be mergeable pages left unmerged in Buddy.

Patch 9 reduces the fragmentation introduced in patch 8 by doing pre-merges before pages are sent to merge under the zone lock.

This patchset is based on v4.19-rc2.
Performance on a 2-socket Intel Skylake server (56 cores/112 threads) using will-it-scale/page_fault1 in process mode (higher is better):

kernel      performance         zone lock contention
patch1       9219349            76.99%
patch7       2461133  -73.3%    54.46% (another 34.66% on smp_list_add())
patch8      11712766  +27.0%    68.14%
patch9      11386980  +23.5%    67.18%

Though lock contention dropped a lot with patch7, performance dropped considerably due to severe cache bouncing on the free list head among multiple threads freeing pages at the same time, because every page free needs to add the page to the free list head. Patch8 is meant to solve this cache bouncing problem and gives a good result, except for the above-mentioned side effect of leaving mergeable pages unmerged in Buddy. Patch9 reduces the fragmentation problem to some extent at the cost of a slight performance drop.

As a comparison to the no_merge+cluster_alloc approach I posted before[2]:

kernel                   performance         zone lock contention
patch1                    9219349            76.99%
no_merge                 11733153  +27.3%    69.18%
no_merge+cluster_alloc   12094893  +31.2%     0.73%

no_merge (skip merging for order0 pages on the free path) has similar performance and zone lock contention to patch8/9, while with cluster_alloc, which also improves the allocation side, zone lock contention for this workload is almost gone.
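The percentage column in these tables is the relative change against the patch1 baseline. As a quick check of the arithmetic, here is a minimal sketch; percent_change() and print_results() are illustrative names, not part of the series, and the numbers are copied from the tables above.

```c
#include <stdio.h>

/* Relative change of `value` against `baseline`, in percent. */
static double percent_change(double baseline, double value)
{
	return (value - baseline) / baseline * 100.0;
}

/* Print each kernel's change relative to the patch1 baseline. */
static void print_results(void)
{
	static const struct {
		const char *kernel;
		double performance;
	} rows[] = {
		{ "patch7",                  2461133.0 },
		{ "patch8",                 11712766.0 },
		{ "patch9",                 11386980.0 },
		{ "no_merge",               11733153.0 },
		{ "no_merge+cluster_alloc", 12094893.0 },
	};
	const double baseline = 9219349.0;	/* patch1 */

	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		printf("%-24s %+.1f%%\n", rows[i].kernel,
		       percent_change(baseline, rows[i].performance));
}
```

For example, patch7 works out to (2461133 - 9219349) / 9219349, or roughly -73.3%, which matches the table.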
To get an idea of how fragmentation is affected by patch8 and how much improvement patch9 brings, this is the result of /proc/buddyinfo after running will-it-scale/page_fault1 for 3 minutes:

With patch7:
Node 0, zone DMA 0 2 1 1 3 2 2 1 0 1 3
Node 0, zone DMA32 7 3 6 5 5 10 6 7 6 10410
Node 0, zone Normal 17820 16819 14645 12969 11367 9229 6365 3062 756 69 5646
Node 1, zone Normal 44789 60354 52331 37532 22071 9604 2750241 32 11 6378

With patch8:
Node 0, zone DMA 0 2 1 1 3 2 2 1 0 1 3
Node 0, zone DMA32 7 9 5 4 5 10 6 7 6 10410
Node 0, zone Normal 404917 119614 79446 58303 20679 3106222 89 28 9 5615
Node 1, zone Normal 507659 127355 64470 53549 14104 1288 30 4 1 1 6078

With patch9:
Node 0, zone DMA 0 3 0 1 3 0 1 0 1 1 3
Node 0, zone DMA32 11423621705726702 60 14 5 6296
Node 0, zone Normal 20407 21016 18731 16195 13697 10483 6873 3148 735 39 5637
Node 1, zone Normal 79738 76963 59313 35996 18626 9743 3947750 21 2 6080

A lot more pages stay in order0 with patch8 than with patch7. Consequently, there are fewer pages of order5 and above with patch8 than with patch7, suggesting that some pages are not properly merged into high order pages with patch8 applied. Patch9 leaves far fewer pages in order0 than patch8, which is a good sign, but it is still not as good as patch7.

As a comparison, this is the result of no_merge (think of it as a worst case result regarding fragmentation):

With no_merge:
Node 0, zone DMA 0 2 1
[RFC PATCH 3/9] mm: introduce smp_list_splice to prepare for concurrent LRU adds
From: Daniel Jordan

Now that we splice a local list onto the LRU, prepare for multiple tasks doing this concurrently by adding a variant of the kernel's list splicing API, list_splice, that's designed to work with multiple tasks.

Although there is naturally less parallelism to be gained from locking the LRU head this way, the main benefit of doing this is to allow removals to happen concurrently. The way lru_lock is today, an add needlessly blocks removal of any page but the first in the LRU.

For now, hold lru_lock as writer to serialize the adds to ensure the function is correct for a single thread at a time.

Yosef Lev came up with this algorithm.

[aaronlu: drop LRU related code, keep only list related code]
Suggested-by: Yosef Lev
Signed-off-by: Daniel Jordan
---
 include/linux/list.h | 1 +
 lib/list.c | 60 ++--
 2 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/include/linux/list.h b/include/linux/list.h
index 0fd9c87dd14b..5f203fb55939 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -48,6 +48,7 @@ static inline bool __list_del_entry_valid(struct list_head *entry)
 #endif
 
 extern void smp_list_del(struct list_head *entry);
+extern void smp_list_splice(struct list_head *list, struct list_head *head);
 
 /*
  * Insert a new entry between two known consecutive entries.
diff --git a/lib/list.c b/lib/list.c
index 4d0949ea1a09..104faa144abf 100644
--- a/lib/list.c
+++ b/lib/list.c
@@ -10,17 +10,18 @@
 #include
 
 /*
- * smp_list_del is a variant of list_del that allows concurrent list removals
- * under certain assumptions. The idea is to get away from overly coarse
- * synchronization, such as using a lock to guard an entire list, which
- * serializes all operations even though those operations might be happening on
- * disjoint parts.
+ * smp_list_del and smp_list_splice are variants of list_del and list_splice,
+ * respectively, that allow concurrent list operations under certain
+ * assumptions. The idea is to get away from overly coarse synchronization,
+ * such as using a lock to guard an entire list, which serializes all
+ * operations even though those operations might be happening on disjoint
+ * parts.
  *
  * If you want to use other functions from the list API concurrently,
  * additional synchronization may be necessary. For example, you could use a
  * rwlock as a two-mode lock, where readers use the lock in shared mode and are
- * allowed to call smp_list_del concurrently, and writers use the lock in
- * exclusive mode and are allowed to use all list operations.
+ * allowed to call smp_list_* functions concurrently, and writers use the lock
+ * in exclusive mode and are allowed to use all list operations.
  */
 
 /**
@@ -156,3 +157,48 @@ void smp_list_del(struct list_head *entry)
 	entry->next = LIST_POISON1;
 	entry->prev = LIST_POISON2;
 }
+
+/**
+ * smp_list_splice - thread-safe splice of two lists
+ * @list: the new list to add
+ * @head: the place to add it in the first list
+ *
+ * Safely handles concurrent smp_list_splice operations onto the same list head
+ * and concurrent smp_list_del operations of any list entry except @head.
+ * Assumes that @head cannot be removed.
+ */
+void smp_list_splice(struct list_head *list, struct list_head *head)
+{
+	struct list_head *first = list->next;
+	struct list_head *last = list->prev;
+	struct list_head *succ;
+
+	/*
+	 * Lock the front of @head by replacing its next pointer with NULL.
+	 * Should another thread be adding to the front, wait until it's done.
+	 */
+	succ = READ_ONCE(head->next);
+	while (succ == NULL || cmpxchg(&head->next, succ, NULL) != succ) {
+		cpu_relax();
+		succ = READ_ONCE(head->next);
+	}
+
+	first->prev = head;
+	last->next = succ;
+
+	/*
+	 * It is safe to write to succ, head's successor, because locking head
+	 * prevents succ from being removed in smp_list_del.
+	 */
+	succ->prev = last;
+
+	/*
+	 * Pairs with the implied full barrier before the cmpxchg above.
+	 * Ensures the write that unlocks the head is seen last to avoid list
+	 * corruption.
+	 */
+	smp_wmb();
+
+	/* Simultaneously complete the splice and unlock the head node. */
+	WRITE_ONCE(head->next, first);
+}
--
2.17.1
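The head-locking step of smp_list_splice() can be approximated in userspace with C11 atomics. This is only a sketch under stated assumptions: struct node, node_init(), add_tail() and splice_front() are hypothetical names, the empty-list check is an addition not present in the patch, a release store stands in for smp_wmb() followed by WRITE_ONCE(), and concurrent removal (smp_list_del) is not modeled at all. It shows how a NULL next pointer serves as a lock on the list head while the new entries are linked in.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct list_head. */
struct node {
	_Atomic(struct node *) next;
	struct node *prev;
};

static void node_init(struct node *n)
{
	atomic_store(&n->next, n);
	n->prev = n;
}

/* Plain, non-concurrent tail insert, used only to build test lists. */
static void add_tail(struct node *n, struct node *head)
{
	struct node *last = head->prev;

	atomic_store(&n->next, head);
	n->prev = last;
	atomic_store(&last->next, n);
	head->prev = n;
}

/* Splice the entries of @list onto the front of @head. */
static void splice_front(struct node *list, struct node *head)
{
	struct node *first = atomic_load(&list->next);
	struct node *last = list->prev;
	struct node *succ;

	if (first == list)	/* nothing to splice (check added here) */
		return;

	/*
	 * Lock the front of @head by swapping its next pointer for NULL.
	 * A NULL next means another splicer holds the head, so retry.
	 */
	for (;;) {
		succ = atomic_load(&head->next);
		if (succ != NULL &&
		    atomic_compare_exchange_strong(&head->next, &succ,
						   (struct node *)NULL))
			break;
	}

	first->prev = head;
	atomic_store_explicit(&last->next, succ, memory_order_relaxed);
	succ->prev = last;

	/* Publish the new entries and unlock the head in one release store. */
	atomic_store_explicit(&head->next, first, memory_order_release);
}
```

The design point carried over from the patch is that the unlocking store is the last write, so another thread that observes a non-NULL head->next also observes fully linked entries behind it.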
[RFC PATCH 4/9] mm: convert zone lock from spinlock to rwlock
This patch converts the zone lock from a spinlock to an rwlock and always takes the lock in write mode, so there is no functionality change. This is preparation for the free path to take the lock in read mode so that the free path can work concurrently.

compact_trylock and compact_unlock_should_abort are taken from Daniel Jordan's patch.

Signed-off-by: Aaron Lu
---
 include/linux/mmzone.h | 2 +-
 mm/compaction.c | 90 +-
 mm/hugetlb.c | 8 ++--
 mm/page_alloc.c | 52
 mm/page_isolation.c | 12 +++---
 mm/vmstat.c | 4 +-
 6 files changed, 85 insertions(+), 83 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 1e22d96734e0..84cfa56e2d19 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -465,7 +465,7 @@ struct zone {
 	unsigned long flags;
 
 	/* Primarily protects free_area */
-	spinlock_t lock;
+	rwlock_t lock;
 
 	/* Write-intensive fields used by compaction and vmstats. */
 	ZONE_PADDING(_pad2_)
diff --git a/mm/compaction.c b/mm/compaction.c
index faca45ebe62d..6ecf74d8e287 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -347,20 +347,20 @@ static inline void update_pageblock_skip(struct compact_control *cc,
  * Returns true if the lock is held
  * Returns false if the lock is not held and compaction should abort
  */
-static bool compact_trylock_irqsave(spinlock_t *lock, unsigned long *flags,
-						struct compact_control *cc)
-{
-	if (cc->mode == MIGRATE_ASYNC) {
-		if (!spin_trylock_irqsave(lock, *flags)) {
-			cc->contended = true;
-			return false;
-		}
-	} else {
-		spin_lock_irqsave(lock, *flags);
-	}
-
-	return true;
-}
+#define compact_trylock(lock, flags, cc, lockf, trylockf)	\
+({								\
+	bool __ret = true;					\
+	if ((cc)->mode == MIGRATE_ASYNC) {			\
+		if (!trylockf((lock), *(flags))) {		\
+			(cc)->contended = true;			\
+			__ret = false;				\
+		}						\
+	} else {						\
+		lockf((lock), *(flags));			\
+	}							\
+								\
+	__ret;							\
+})
 
 /*
  * Compaction requires the taking of some coarse locks that are potentially
@@ -377,29 +377,29 @@ static bool compact_trylock_irqsave(spinlock_t *lock, unsigned long *flags,
  * Returns false when compaction can continue (sync compaction might have
  * scheduled)
  */
-static bool compact_unlock_should_abort(spinlock_t *lock,
-		unsigned long flags, bool *locked, struct compact_control *cc)
-{
-	if (*locked) {
-		spin_unlock_irqrestore(lock, flags);
-		*locked = false;
-	}
-
-	if (fatal_signal_pending(current)) {
-		cc->contended = true;
-		return true;
-	}
-
-	if (need_resched()) {
-		if (cc->mode == MIGRATE_ASYNC) {
-			cc->contended = true;
-			return true;
-		}
-		cond_resched();
-	}
-
-	return false;
-}
+#define compact_unlock_should_abort(lock, flags, locked, cc, unlockf)	\
+({									\
+	bool __ret = false;						\
+									\
+	if (*(locked)) {						\
+		unlockf((lock), (flags));				\
+		*(locked) = false;					\
+	}								\
+									\
+	if (fatal_signal_pending(current)) {				\
+		(cc)->contended = true;					\
+		__ret = true;						\
+	} else if (need_resched()) {					\
+		if ((cc)->mode == MIGRATE_ASYNC) {
[RFC PATCH 6/9] use atomic for free_area[order].nr_free
Since we will make free path run concurrently, free_area[].nr_free has
to be atomic.

Signed-off-by: Aaron Lu
---
 include/linux/mmzone.h |  2 +-
 mm/page_alloc.c        | 12 ++--
 mm/vmstat.c            |  4 ++--
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 84cfa56e2d19..e66b8c63d5d1 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -95,7 +95,7 @@ extern int page_group_by_mobility_disabled;
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
-	unsigned long		nr_free;
+	atomic_long_t		nr_free;
 };
 struct pglist_data;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d0b954783f1d..dff3edc60d71 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -701,7 +701,7 @@ static inline void add_to_buddy_common(struct page *page, struct zone *zone,
 				unsigned int order)
 {
 	set_page_order(page, order);
-	zone->free_area[order].nr_free++;
+	atomic_long_inc(&zone->free_area[order].nr_free);
 }
 static inline void add_to_buddy_head(struct page *page, struct zone *zone,
@@ -728,7 +728,7 @@ static inline void remove_from_buddy(struct page *page, struct zone *zone,
 				unsigned int order)
 {
 	list_del(&page->lru);
-	zone->free_area[order].nr_free--;
+	atomic_long_dec(&zone->free_area[order].nr_free);
 	rmv_page_order(page);
 }
@@ -2225,7 +2225,7 @@ int find_suitable_fallback(struct free_area *area, unsigned int order,
 	int i;
 	int fallback_mt;
-	if (area->nr_free == 0)
+	if (atomic_long_read(&area->nr_free) == 0)
 		return -1;
 	*can_steal = false;
@@ -3178,7 +3178,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 		struct free_area *area = &z->free_area[o];
 		int mt;
-		if (!area->nr_free)
+		if (atomic_long_read(&area->nr_free) == 0)
 			continue;
 		for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
@@ -5029,7 +5029,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			struct free_area *area = &zone->free_area[order];
 			int type;
-			nr[order] = area->nr_free;
+			nr[order] = atomic_long_read(&area->nr_free);
 			total += nr[order] << order;
 			types[order] = 0;
@@ -5562,7 +5562,7 @@ static void __meminit zone_init_free_lists(struct zone *zone)
 	unsigned int order, t;
 	for_each_migratetype_order(order, t) {
 		INIT_LIST_HEAD(&zone->free_area[order].free_list[t]);
-		zone->free_area[order].nr_free = 0;
+		atomic_long_set(&zone->free_area[order].nr_free, 0);
 	}
 }
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 06d79271a8ae..c1985550bb9f 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1030,7 +1030,7 @@ static void fill_contig_page_info(struct zone *zone,
 		unsigned long blocks;
 		/* Count number of free blocks */
-		blocks = zone->free_area[order].nr_free;
+		blocks = atomic_long_read(&zone->free_area[order].nr_free);
 		info->free_blocks_total += blocks;
 		/* Count free base pages */
@@ -1353,7 +1353,7 @@ static void frag_show_print(struct seq_file *m, pg_data_t *pgdat,
 	seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
 	for (order = 0; order < MAX_ORDER; ++order)
-		seq_printf(m, "%6lu ", zone->free_area[order].nr_free);
+		seq_printf(m, "%6lu ", atomic_long_read(&zone->free_area[order].nr_free));
 	seq_putc(m, '\n');
 }
-- 
2.17.1
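The effect of switching nr_free to an atomic type can be sketched in userspace C. This is only an illustration, assuming C11 <stdatomic.h> as a stand-in for the kernel's atomic_long_t API; struct toy_free_area and the toy_area_*() helpers are hypothetical names, not kernel code. Relaxed ordering is used on the assumption that the counter only needs indivisible updates, not ordering against other memory accesses.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the counter in struct free_area. */
struct toy_free_area {
	atomic_long nr_free;	/* plays the role of atomic_long_t nr_free */
};

/* Corresponds to atomic_long_set(..., 0) in zone_init_free_lists(). */
static void toy_area_init(struct toy_free_area *area)
{
	atomic_init(&area->nr_free, 0);
}

/* Corresponds to atomic_long_inc(): one indivisible read-modify-write. */
static void toy_area_add(struct toy_free_area *area)
{
	atomic_fetch_add_explicit(&area->nr_free, 1, memory_order_relaxed);
}

/* Corresponds to atomic_long_dec(); the assert catches underflow in the toy. */
static void toy_area_remove(struct toy_free_area *area)
{
	long old = atomic_fetch_sub_explicit(&area->nr_free, 1,
					     memory_order_relaxed);
	assert(old > 0);
	(void)old;
}

/* Corresponds to atomic_long_read(). */
static long toy_area_count(struct toy_free_area *area)
{
	return atomic_load_explicit(&area->nr_free, memory_order_relaxed);
}
```

With a plain unsigned long, two threads doing nr_free++ at the same time can both read the same old value and one increment is lost; the fetch-add form above cannot lose an update that way.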
[RFC PATCH 7/9] mm: use read_lock for free path
Daniel Jordan's patch has made it possible for multiple threads to
operate on a global list with smp_list_del() at any position and
smp_list_add/splice() at head position concurrently without taking
any lock.

This patch makes use of this technique on free list.

To make this happen, add_to_buddy_tail() is removed since only adding
to list head is safe with smp_list_del() so only add_to_buddy() is used.

Once free path can run concurrently, it is possible for multiple threads
to free pages at the same time. If 2 pages being freed are buddy, they
can miss the opportunity to be merged.

For this reason, introduce range locks to protect merge operation that
makes sure inside one range, only one merge can happen and a page's
Buddy status is properly set inside the lock. The range is selected as
an order of (MAX_ORDER-1) pages since merge can't exceed that order.

Signed-off-by: Aaron Lu
---
 include/linux/list.h   |  1 +
 include/linux/mmzone.h |  3 ++
 lib/list.c             | 23 ++
 mm/page_alloc.c        | 95 +++---
 4 files changed, 78 insertions(+), 44 deletions(-)

diff --git a/include/linux/list.h b/include/linux/list.h
index 5f203fb55939..608e40f6489e 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -49,6 +49,7 @@ static inline bool __list_del_entry_valid(struct list_head *entry)
 extern void smp_list_del(struct list_head *entry);
 extern void smp_list_splice(struct list_head *list, struct list_head *head);
+extern void smp_list_add(struct list_head *entry, struct list_head *head);
 /*
  * Insert a new entry between two known consecutive entries.
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e66b8c63d5d1..0ea52e9bb610 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -467,6 +467,9 @@ struct zone {
 	/* Primarily protects free_area */
 	rwlock_t		lock;
+	/* Protects merge operation for a range of order=(MAX_ORDER-1) pages */
+	spinlock_t		*range_locks;
+
 	/* Write-intensive fields used by compaction and vmstats. */
 	ZONE_PADDING(_pad2_)
diff --git a/lib/list.c b/lib/list.c
index 104faa144abf..3ecf62b88c86 100644
--- a/lib/list.c
+++ b/lib/list.c
@@ -202,3 +202,26 @@ void smp_list_splice(struct list_head *list, struct list_head *head)
 	/* Simultaneously complete the splice and unlock the head node. */
 	WRITE_ONCE(head->next, first);
 }
+
+void smp_list_add(struct list_head *entry, struct list_head *head)
+{
+	struct list_head *succ;
+
+	/*
+	 * Lock the front of @head by replacing its next pointer with NULL.
+	 * Should another thread be adding to the front, wait until it's done.
+	 */
+	succ = READ_ONCE(head->next);
+	while (succ == NULL || cmpxchg(&head->next, succ, NULL) != succ) {
+		cpu_relax();
+		succ = READ_ONCE(head->next);
+	}
+
+	entry->next = succ;
+	entry->prev = head;
+	succ->prev = entry;
+
+	smp_wmb();
+
+	WRITE_ONCE(head->next, entry);
+}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dff3edc60d71..5f5cc671bcf7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -339,6 +339,17 @@ static inline bool update_defer_init(pg_data_t *pgdat,
 }
 #endif
+/* Return a pointer to the spinblock for a pageblock this page belongs to */
+static inline spinlock_t *get_range_lock(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long zone_start_pfn = zone->zone_start_pfn;
+	unsigned long range = (page_to_pfn(page) - zone_start_pfn) >>
+				(MAX_ORDER - 1);
+
+	return &zone->range_locks[range];
+}
+
 /* Return a pointer to the bitmap storing bits affecting a block of pages */
 static inline unsigned long *get_pageblock_bitmap(struct page *page,
 							unsigned long pfn)
@@ -697,25 +708,12 @@ static inline void set_page_order(struct page *page, unsigned int order)
 	__SetPageBuddy(page);
 }
-static inline void add_to_buddy_common(struct page *page, struct zone *zone,
-					unsigned int order)
+static inline void add_to_buddy(struct page *page, struct zone *zone,
+				unsigned int order, int mt)
 {
 	set_page_order(page, order);
 	atomic_long_inc(&zone->free_area[order].nr_free);
-}
-
-static inline void add_to_buddy_head(struct page *page, struct zone *zone,
-					unsigned int order, int mt)
-{
-	add_to_buddy_common(page, zone, order);
-	list_add(&page->lru, &zone->free_area[order].free_list[mt]);
-}
-
-static inline void add_to_buddy_tail(struct page *page, struct zone *zone,
-					unsigned int order, int mt)
-{
-	add_to_buddy_common(page, zone, order);
-	list_add_tail(&page->lru, &zone->free_area[order].free_list[mt]);
+
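The range-lock lookup described in the commit message is an index computation on the page frame number. A minimal userspace sketch follows, assuming MAX_ORDER is 11 (a common default, but configurable, so treat the value as an assumption); toy_range_index() and TOY_MAX_ORDER are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Assumption: 11 is a common default for MAX_ORDER; the real value is configurable. */
#define TOY_MAX_ORDER 11

/*
 * Index of the range lock covering @pfn in a zone that starts at
 * @zone_start_pfn. Each range spans 2^(TOY_MAX_ORDER - 1) pages,
 * the largest block a buddy merge can produce.
 */
static unsigned long toy_range_index(unsigned long pfn,
				     unsigned long zone_start_pfn)
{
	assert(pfn >= zone_start_pfn);
	return (pfn - zone_start_pfn) >> (TOY_MAX_ORDER - 1);
}
```

With these numbers a range covers 1024 pages. For a zone whose start is aligned to that size, a page and its buddy at any order below MAX_ORDER-1 map to the same index, which is what lets one lock serialize a whole merge chain; whether the kernel guarantees that alignment for every zone is not stated in the patch text shown here.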
[RFC PATCH 4/9] mm: convert zone lock from spinlock to rwlock
This patch converts zone lock from spinlock to rwlock and always takes
the lock in write mode so there is no functionality change.

This is a preparation for free path to take the lock in read mode to
make free path work concurrently.

compact_trylock and compact_unlock_should_abort are taken from
Daniel Jordan's patch.

Signed-off-by: Aaron Lu
---
 include/linux/mmzone.h |  2 +-
 mm/compaction.c        | 90 +-
 mm/hugetlb.c           |  8 ++--
 mm/page_alloc.c        | 52
 mm/page_isolation.c    | 12 +++---
 mm/vmstat.c            |  4 +-
 6 files changed, 85 insertions(+), 83 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 1e22d96734e0..84cfa56e2d19 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -465,7 +465,7 @@ struct zone {
 	unsigned long		flags;
 	/* Primarily protects free_area */
-	spinlock_t		lock;
+	rwlock_t		lock;
 	/* Write-intensive fields used by compaction and vmstats. */
 	ZONE_PADDING(_pad2_)
diff --git a/mm/compaction.c b/mm/compaction.c
index faca45ebe62d..6ecf74d8e287 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -347,20 +347,20 @@ static inline void update_pageblock_skip(struct compact_control *cc,
  * Returns true if the lock is held
  * Returns false if the lock is not held and compaction should abort
  */
-static bool compact_trylock_irqsave(spinlock_t *lock, unsigned long *flags,
-						struct compact_control *cc)
-{
-	if (cc->mode == MIGRATE_ASYNC) {
-		if (!spin_trylock_irqsave(lock, *flags)) {
-			cc->contended = true;
-			return false;
-		}
-	} else {
-		spin_lock_irqsave(lock, *flags);
-	}
-
-	return true;
-}
+#define compact_trylock(lock, flags, cc, lockf, trylockf)		\
+({									\
+	bool __ret = true;						\
+	if ((cc)->mode == MIGRATE_ASYNC) {				\
+		if (!trylockf((lock), *(flags))) {			\
+			(cc)->contended = true;				\
+			__ret = false;					\
+		}							\
+	} else {							\
+		lockf((lock), *(flags));				\
+	}								\
+									\
+	__ret;								\
+})
 /*
  * Compaction requires the taking of some coarse locks that are potentially
@@ -377,29 +377,29 @@ static bool compact_trylock_irqsave(spinlock_t *lock, unsigned long *flags,
  * Returns false when compaction can continue (sync compaction might have
  * scheduled)
  */
-static bool compact_unlock_should_abort(spinlock_t *lock,
-		unsigned long flags, bool *locked, struct compact_control *cc)
-{
-	if (*locked) {
-		spin_unlock_irqrestore(lock, flags);
-		*locked = false;
-	}
-
-	if (fatal_signal_pending(current)) {
-		cc->contended = true;
-		return true;
-	}
-
-	if (need_resched()) {
-		if (cc->mode == MIGRATE_ASYNC) {
-			cc->contended = true;
-			return true;
-		}
-		cond_resched();
-	}
-
-	return false;
-}
+#define compact_unlock_should_abort(lock, flags, locked, cc, unlockf)	\
+({									\
+	bool __ret = false;						\
+									\
+	if (*(locked)) {						\
+		unlockf((lock), (flags));				\
+		*(locked) = false;					\
+	}								\
+									\
+	if (fatal_signal_pending(current)) {				\
+		(cc)->contended = true;					\
+		__ret = true;						\
+	} else if (need_resched()) {					\
+		if ((cc)->mode == MIGRATE_ASYNC) {
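The move from a static function to a macro in the hunk above lets the caller pass the lock and trylock primitives by name, so one body can serve more than one lock type. A stripped-down userspace sketch of that pattern follows. Everything prefixed toy_ is hypothetical, the irq flags argument is dropped for brevity, and the toy lock is single-threaded; the only thing carried over from the patch is the macro's shape. It relies on GNU statement expressions, as the kernel macro does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy lock; the kernel macro is handed real lock primitives. */
struct toy_lock { bool held; };

static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* a real lock would wait here instead */
	l->held = true;
}

enum toy_mode { TOY_ASYNC, TOY_SYNC };
struct toy_control { enum toy_mode mode; bool contended; };

/*
 * Same shape as compact_trylock() in the patch: in async mode only try
 * the lock and record contention on failure, otherwise take it
 * unconditionally. lockf and trylockf are substituted textually.
 */
#define toy_compact_trylock(lock, cc, lockf, trylockf)	\
({							\
	bool __ret = true;				\
	if ((cc)->mode == TOY_ASYNC) {			\
		if (!trylockf(lock)) {			\
			(cc)->contended = true;		\
			__ret = false;			\
		}					\
	} else {					\
		lockf(lock);				\
	}						\
	__ret;						\
})
```

A caller would write toy_compact_trylock(&l, &cc, toy_lock_acquire, toy_trylock); a second lock type needs only a different pair of function names, not a second copy of the body.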
Re: [RFC PATCH v2 2/2] fscrypt: enable RCU-walk path for .d_revalidate
Hi Eric,

On 2018/9/11 7:20, Eric Biggers wrote:
> Hi Gao,
>
> On Mon, Sep 10, 2018 at 09:08:57PM +0800, Gao Xiang wrote:
>> This patch attempts to enable RCU-walk for fscrypt.
>> It looks harmless at glance and could have better
>> performance than do ref-walk only.
>>
>> Signed-off-by: Gao Xiang
>> ---
>> change log v2:
>>  - READ_ONCE(dir->d_parent) -> READ_ONCE(dentry->d_parent)
>>
>>  fs/crypto/crypto.c | 22 +-
>>  1 file changed, 13 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
>> index b38c574..9bd21c0 100644
>> --- a/fs/crypto/crypto.c
>> +++ b/fs/crypto/crypto.c
>> @@ -319,20 +319,24 @@ static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags)
>>  {
>>  	struct dentry *dir;
>>  	int dir_has_key, cached_with_key;
>> -
>> -	if (flags & LOOKUP_RCU)
>> -		return -ECHILD;
>> -
>> -	dir = dget_parent(dentry);
>> -	if (!IS_ENCRYPTED(d_inode(dir))) {
>> -		dput(dir);
>> +	struct inode *dir_inode;
>> +
>> +	rcu_read_lock();
>> +repeat:
>> +	dir = READ_ONCE(dentry->d_parent);
>> +	dir_inode = d_inode_rcu(dir);
>> +	if (!IS_ENCRYPTED(dir_inode)) {
>> +		rcu_read_unlock();
>>  		return 0;
>>  	}
>> +	dir_has_key = (dir_inode->i_crypt_info != NULL);
>> +	if (unlikely(READ_ONCE(dir->d_lockref.count) < 0 ||
>> +		READ_ONCE(dentry->d_parent) != dir))
>>
>>
>> +	rcu_read_unlock();
>>
>>  	cached_with_key = READ_ONCE(dentry->d_flags) &
>>  			DCACHE_ENCRYPTED_WITH_KEY;
>> -	dir_has_key = (d_inode(dir)->i_crypt_info != NULL);
>> -	dput(dir);
>>
>
> I think you're right that we don't have to drop out of RCU mode here, but can
> you please Cc linux-fsdevel so that people more knowledgeable about path lookup
> can review this too? This kind of stuff is very tricky. Please resend both
> patches.
>
> Also please indent properly:
>
> 	if (unlikely(READ_ONCE(dir->d_lockref.count) < 0 ||
> 		     READ_ONCE(dentry->d_parent) != dir))
> 		goto repeat;
>
> Why read d_lockref.count directly instead of using __lockref_is_dead()?

will be fixed in the next version, thanks.

>
> Also since there's no longer any reference to the parent dentry taken, how do
> you know it's still positive (non-NULL d_inode), i.e. that the directory hasn't
> been removed and turned into a negative dentry (NULL d_inode)?

I think you are right. I saw this fscrypt piece of code when I was locating
a problem related to fscrypt (I am still taking part in it since the problem
is urgent).

It seems that it could be turned into a negative dentry by d_delete() etc.
I will rethink this flow more, make the next patch later and Cc linux-devel
the next time.

>
> I'm also wondering whether the retry loop is actually needed. Can you explain
> your thoughts more? But if it is needed, in principle you'd actually need to
> wait until after the loop before taking any action based on dir_inode, right?
> That would mean the 'rcu_read_unlock(); return 0;' is in the wrong place.

What I thought was that I guess it needs to be more strict to claim the
dentry is still valid than other cases (therefore IS_ENCRYPTED is not so
strict, that is my personal thought tho.)

If the parent dentry just sampled is invalid, since the dentry and inode are
protected by rcu, so there is no way to READ_ONCE(dentry->d_parent) == dir.

Therefore I sampled (IS_ENCRYPTED, dir_has_key) and do a final basic validity
check at last --- currently dentry itself (maybe inode later), and I tend to
try again especially for ref-walk case (which not governed by d_seq) since it
is more lightweight (like a seqlock) than taking & releasing d_lock (or even
return 0 to do real lookup again) I think.

That is my personal thought, could not be accurate, and I am trying to learn
more about the fscrypt due to the urgent problem. If any error, please kindly
point out, thanks...

Thanks,
Gao Xiang

>
> Thanks,
>
> - Eric
>
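The ordering point raised in the review (if a retry loop is needed, act on the sampled values only after the loop has validated them) can be sketched in userspace C. This is a toy under strong assumptions and not the fscrypt code: struct toy_dir, struct toy_dentry and toy_sample_parent() are hypothetical, the fields of a toy_dir are assumed never to change once published, and object lifetime (which RCU provides in the kernel) and the negative-dentry question are deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; a toy_dir is assumed immutable once published. */
struct toy_dir { bool encrypted; bool has_key; };
struct toy_dentry { _Atomic(struct toy_dir *) parent; };

/*
 * Sample the parent pointer, copy out what is needed, then re-check that
 * the pointer is unchanged. Nothing is decided inside the loop; the
 * caller acts on *encrypted and *has_key only after this returns.
 * Returns the number of attempts, which is handy for testing.
 */
static int toy_sample_parent(struct toy_dentry *d, bool *encrypted,
			     bool *has_key)
{
	struct toy_dir *dir;
	int tries = 0;

	do {
		dir = atomic_load_explicit(&d->parent, memory_order_acquire);
		assert(dir != NULL);
		*encrypted = dir->encrypted;
		*has_key = dir->has_key;
		tries++;
	} while (atomic_load_explicit(&d->parent, memory_order_acquire) != dir);

	return tries;
}
```

In this shape an early return on !encrypted would sit after the call, not inside the loop, which is the placement the review asks about. Whether a loop is needed at all in the real code is the open question in the thread and is not settled by this sketch.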
RE: [LINUX PATCH v11 1/3] dt-bindings: memory: Add pl353 smc controller devicetree binding information
Hi,

[LINUX PATCH v11 1/3] dt-bindings: memory: Add pl353 smc controller devicetree binding information
[LINUX PATCH v11 2/3] memory: pl353: Add driver for arm pl353 static memory controller

Can somebody apply the above patches?
The above patches are already reviewed.

Thanks,
Naga Sureshkumar Relli.

> -Original Message-
> From: Linus Walleij [mailto:linus.wall...@linaro.org]
> Sent: Friday, July 13, 2018 1:07 PM
> To: Naga Sureshkumar Relli
> Cc: Boris Brezillon; Richard Weinberger; David Woodhouse; Brian Norris;
> Mark Vasut; Florian Fainelli; Markus Mayer; Roger Quadros; Ladislav Michl;
> a...@thorsis.co; honghui.zh...@mediatek.com; Miquèl Raynal;
> linux-m...@lists.infradead.org; linux-kernel@vger.kernel.org;
> nagasureshkumarre...@gmail.com; Michal Simek
> Subject: Re: [LINUX PATCH v11 1/3] dt-bindings: memory: Add pl353 smc controller
> devicetree binding information
>
> On Wed, Jul 11, 2018 at 9:37 AM Naga Sureshkumar Relli wrote:
>
> > Add pl353 static memory controller devicetree binding information.
> >
> > Signed-off-by: Naga Sureshkumar Relli
> > ---
> > Changes in v11:
>
> Reviewed-by: Linus Walleij

Thanks!

>
> Yours,
> Linus Walleij
[PATCH] scsi: qla2xxx: reduce time granularity of qla2x00_eh_wait_on_command
If the cmd has not been returned after being aborted by qla2x00_eh_abort,
we have to wait for it. However, the wait is currently at least 1000ms.
If a lot of cmds need to be aborted, the delay could be long enough to
lead to a panic (hung task, ocfs2 heartbeat, etc.) before scsi recovery
takes effect.

Change the granularity to 1ms. More context switches will be introduced,
but that should be ok as this is not a hot path.

Signed-off-by: Jianchao Wang
---
 drivers/scsi/qla2xxx/qla_os.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 42b8f0d..570d93b 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -1065,7 +1065,7 @@ qla2xxx_mqueuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd,
 static int
 qla2x00_eh_wait_on_command(struct scsi_cmnd *cmd)
 {
-#define ABORT_POLLING_PERIOD	1000
+#define ABORT_POLLING_PERIOD	1
 #define ABORT_WAIT_ITER		((2 * 1000) / (ABORT_POLLING_PERIOD))
 	unsigned long wait_iter = ABORT_WAIT_ITER;
 	scsi_qla_host_t *vha = shost_priv(cmd->device->host);
-- 
2.7.4
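The arithmetic behind the one-line change can be checked with a short userspace sketch. It assumes, as the macro names suggest, that the wait loop sleeps one polling period per iteration; toy_wait_iter() and TOY_TOTAL_WAIT_MS are hypothetical names that mirror the ABORT_WAIT_ITER expression from the patch.

```c
#include <assert.h>

/* Numerator of ABORT_WAIT_ITER in the patch: a 2000 ms budget. */
#define TOY_TOTAL_WAIT_MS (2 * 1000)

/* Number of polling iterations for a given period, as ABORT_WAIT_ITER computes it. */
static unsigned long toy_wait_iter(unsigned long period_ms)
{
	assert(period_ms > 0);
	return TOY_TOTAL_WAIT_MS / period_ms;
}

/* Upper bound on the nominal wait: iterations times period. */
static unsigned long toy_total_wait_ms(unsigned long period_ms)
{
	return toy_wait_iter(period_ms) * period_ms;
}
```

With a period of 1000 the loop runs 2 iterations, and with a period of 1 it runs 2000; the nominal budget is 2000 ms either way. What changes is the step size: a command that completes shortly after a sleep begins is noticed after roughly one period, so the old value costs about a second per aborted command where the new one costs about a millisecond under this assumption.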
RE: [LINUX PATCH v10 2/2] mtd: rawnand: arasan: Add support for Arasan NAND Flash Controller
Hi Boris,

> -Original Message-
> From: Boris Brezillon [mailto:boris.brezil...@bootlin.com]
> Sent: Monday, August 20, 2018 10:10 PM
> To: Naga Sureshkumar Relli
> Cc: miquel.ray...@bootlin.com; rich...@nod.at; dw...@infradead.org;
> computersforpe...@gmail.com; marek.va...@gmail.com; kyungmin.p...@samsung.com;
> abs...@codeaurora.org; peterpand...@micron.com; frieder.schre...@exceet.de;
> linux-m...@lists.infradead.org; linux-kernel@vger.kernel.org; Michal Simek;
> nagasureshkumarre...@gmail.com
> Subject: Re: [LINUX PATCH v10 2/2] mtd: rawnand: arasan: Add support for
> Arasan NAND Flash Controller
>
> Hi Naga,
>
> On Fri, 17 Aug 2018 18:49:24 +0530
> Naga Sureshkumar Relli wrote:
>
>
> I haven't finished reviewing the driver but there are still a bunch of
> things that look strange, for instance, your ->read/write_page()
> implementation looks suspicious. Let's discuss that before you send a
> new version.

Could you please review the remaining stuff?
I have the changes ready with me which will address all your comments
given to this series.

Thanks,
Naga Sureshkumar Relli
[LKP] [kernfs, sysfs, cgroup, intel_rdt] a8c7fe83d1: BUG:kernel_hang_in_test_stage
FYI, we noticed the following commit (built with gcc-6): commit: a8c7fe83d17109b77c7b27a23140e76d3753fa6a ("kernfs, sysfs, cgroup, intel_rdt: Support fs_context") https://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git btrfs-mount-api in testcase: trinity with following parameters: runtime: 300s test-description: Trinity is a linux system call fuzz tester. test-url: http://codemonkey.org.uk/projects/trinity/ on test machine: qemu-system-x86_64 -enable-kvm -cpu Haswell,+smep,+smap -smp 2 -m 2G caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): +--+++ | | 88ed9083f5 | a8c7fe83d1 | +--+++ | boot_successes | 2 | 0 | | boot_failures| 10 | 10 | | BUG:KASAN:null-ptr-deref_in_n| 10 || | BUG:unable_to_handle_kernel | 10 || | Oops:#[##] | 10 || | RIP:nfs_fs_mount | 10 || | Kernel_panic-not_syncing:Fatal_exception | 10 || | BUG:kernel_hang_in_test_stage| 0 | 8 | | invoked_oom-killer:gfp_mask=0x | 0 | 2 | | Mem-Info | 0 | 2 | | Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 0 | 2 | +--+++ [ 17.521153] random: get_random_bytes called from flow_hash_from_keys+0x3e9/0x480 with crng_init=0 [ 17.526197] random: get_random_bytes called from addrconf_dad_kick+0xf7/0x1a0 with crng_init=0 [ 55.227136] random: get_random_bytes called from __prandom_timer+0x57/0xc0 with crng_init=0 [ 442.854487] random: fast init done [ 880.187591] random: crng init done BUG: kernel hang in test stage Elapsed time: 2690 #!/bin/bash To reproduce: git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp qemu -k job-script # job-script is attached in this email Thanks, Rong, Chen # # Automatically generated file; DO NOT EDIT. 
# Linux/x86_64 4.19.0-rc1 Kernel Configuration
#

#
# Compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
#
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=60400
CONFIG_CLANG_VERSION=0
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
# CONFIG_SYSVIPC is not set
# CONFIG_POSIX_MQUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_USELIB=y
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_CHIP=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_GENERIC_IRQ_DEBUGFS=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
# CONFIG_HIGH_RES_TIMERS is not set
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set

#
# RCU Subsystem
#
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU=y
CONFIG_RCU_STALL_COMMON=y
Re: [PATCHv3 2/6] tty/ldsem: Update waiter->task before waking up reader
On (09/11/18 02:48), Dmitry Safonov wrote:
> There is a couple of reports about lockup in ldsem_down_read() without
> anyone holding write end of ldisc semaphore:
> lkml.kernel.org/r/<20171121132855.ajdv4k6swzhvk...@wfg-t540p.sh.intel.com>
> lkml.kernel.org/r/<20180907045041.GF1110@shao2-debian>
>
> They all looked like a missed wake up.
> I wasn't lucky enough to reproduce it, but it seems like reader on
> another CPU can miss waiter->task update and schedule again, resulting
> in indefinite (MAX_SCHEDULE_TIMEOUT) sleep.

Certainly, something suspicious is going on.

> @@ -118,6 +118,8 @@ static void __ldsem_wake_readers(struct ld_semaphore *sem)
>  		tsk = waiter->task;
>  		smp_mb();
>  		waiter->task = NULL;
> +		/* Make sure down_read_failed() will see !waiter->task update */
> +		smp_wmb();
>  		wake_up_process(tsk);

Hmm. I think wake_up_process() executes a full memory barrier, because it
accesses task state.

>  		put_task_struct(tsk);
>  	}
> @@ -217,7 +219,7 @@ down_read_failed(struct ld_semaphore *sem, long count, long timeout)
>  	for (;;) {
>  		set_current_state(TASK_UNINTERRUPTIBLE);

I think that set_current_state() also executes a memory barrier, simply
because it accesses task state.

> -		if (!waiter.task)
> +		if (!READ_ONCE(waiter.task))
>  			break;
>  		if (!timeout)
>  			break;

	-ss
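
For readers following the barrier argument, here is a minimal user-space
sketch of the handshake being discussed. It is not the kernel code: struct
waiter, wake_reader() and must_keep_waiting() are hypothetical names, and C11
release/acquire atomics only stand in for the kernel primitives (smp_mb(),
smp_wmb(), READ_ONCE(), set_current_state()), which do not map one-to-one
onto C11 memory orders.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's ldsem waiter. */
struct waiter {
	_Atomic(void *) task;	/* non-NULL while the reader still has to wait */
};

/* Waker side: clear the handshake field and return who should be woken. */
static void *wake_reader(struct waiter *w)
{
	void *tsk = atomic_load_explicit(&w->task, memory_order_relaxed);

	/*
	 * Release store: writes made before this point are visible to a
	 * reader that observes task == NULL through an acquire load.
	 */
	atomic_store_explicit(&w->task, NULL, memory_order_release);
	return tsk;	/* the kernel calls wake_up_process(tsk) at this point */
}

/* Waiter side: the check performed on each pass through the sleep loop. */
static bool must_keep_waiting(struct waiter *w)
{
	/* Acquire load, standing in for the READ_ONCE() in the patch. */
	return atomic_load_explicit(&w->task, memory_order_acquire) != NULL;
}
```

The point of the sketch is only the ordering contract: the waiter must re-read
the field on every loop iteration, and the waker must publish the NULL before
the wakeup can be observed. Whether the extra smp_wmb() is needed in the
kernel depends on the barriers already implied by wake_up_process() and
set_current_state(), which is exactly the question raised above.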
linux-next: Tree for Sep 11
Hi all,

Changes since 20180910:

Dropped trees: xarray, ida (temporarily)

The vfs tree lost a build failure, but I still disabled building some
samples.

The tty tree gained a build failure so I used the version from
next-20180910.

Non-merge commits (relative to Linus' tree): 2768
 3055 files changed, 91468 insertions(+), 62223 deletions(-)

I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 287 trees (counting Linus' and 66 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (11da3a7f84f1 Linux 4.19-rc3)
Merging fixes/master (72358c0b59b7 linux-next: build warnings from the build of Linus' tree)
Merging kbuild-current/fixes (11da3a7f84f1 Linux 4.19-rc3)
Merging arc-current/for-curr (00a99339f0a3 ARCv2: build: use mcpu=hs38 iso generic mcpu=archs)
Merging arm-current/fixes (afc9f65e01cd ARM: 8781/1: Fix Thumb-2 syscall return for binutils 2.29+)
Merging arm64-fixes/for-next/fixes (fac880c7d074 arm64: fix erroneous warnings in page freeing functions)
Merging m68k-current/for-linus (0986b16ab49b m68k/mac: Use correct PMU response format)
Merging powerpc-fixes/fixes (cca19f0b684f powerpc/64s/radix: Fix missing global invalidations when removing copro)
Merging sparc/master (df2def49c57b Merge tag 'acpi-4.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (7c5cca358854 qmi_wwan: Support dynamic config on Quectel EP06)
Merging bpf/master (28619527b8a7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging ipsec/master (782710e333a5 xfrm: reset crypto_done when iterating over multiple input xfrms)
Merging netfilter/master (7acfda539c0b netfilter: nf_tables: release chain in flushing set)
Merging ipvs/master (feb9f55c33e5 netfilter: nft_dynset: allow dynamic updates of non-anonymous set)
Merging wireless-drivers/master (5b394b2ddf03 Linux 4.19-rc1)
Merging mac80211/master (c42055105785 mac80211: fix TX status reporting for ieee80211s)
Merging rdma-fixes/for-rc (8f28b178f71c RDMA/mlx4: Ensure that maximal send/receive SGE less than supported by HW)
Merging sound-current/for-linus (36f3a6e02c14 ALSA: fireface: fix memory leak in ff400_switch_fetching_mode())
Merging sound-asoc-fixes/for-linus (2c1007fbee3b Merge branch 'asoc-4.19' into asoc-linus)
Merging regmap-fixes/for-linus (57361846b52b Linux 4.19-rc2)
Merging regulator-fixes/for-linus (cde11023609f Merge branch 'regulator-4.19' into regulator-linus)
Merging spi-fixes/for-linus (a3d6be06a30c Merge branch 'spi-4.19' into spi-linus)
Merging pci-current/for-linus (342227b42fe8 PCI: pciehp: Fix hot-add vs powerfault detection order)
Merging driver-core.current/driver-core-linus (11da3a7f84f1 Linux 4.19-rc3)
Merging tty.current/tty-linus (7f2bf7840b74 tty: hvc: hvc_write() fix break condition)
Merging usb.current/usb-linus (df3aa13c7bbb Revert "cdc-acm: implement put_char() and flush_chars()")
Merging usb-gadget-fixes/fixes (d9707490077b usb: dwc2: Fix call location of dwc2_check_core_endianness)
Merging usb-serial-fixes/usb-linus (5dfdd24eb3d3 USB: serial: ti_usb_3410_5052: fix array underflow in completion handler)
Merging usb-chipidea-fixes/ci-for-usb-stable (a930d8bd94d8 usb: chipidea: Always buil
Re: [PATCH v9 3/6] kernel/reboot.c: export pm_power_off_prepare
Hi Shawn,

On 11.09.2018 03:53, Shawn Guo wrote:
> On Mon, Sep 10, 2018 at 04:19:26PM +0100, Mark Brown wrote:
>> On Sun, Sep 09, 2018 at 10:00:23AM +0800, Shawn Guo wrote:
>>> On Thu, Sep 06, 2018 at 11:15:17AM +0100, Mark Brown wrote:
>>
>>>> I was expecting to get a pull request with the precursor patches in it -
>>>> the regulator driver seems to get a moderate amount of development so
>>>> there's a reasonable risk of conflicts.
>>
>>> What about you create a stable topic branch for regulator patches and I
>>> pull it into IMX tree?
>>
>> Sure, I can send a pull request back but the first two patches in the
>> series are ARM ones - are you OK with me just applying them and sending
>> them in the pull request or do you want to apply them first?
>
> I just took another look at the series.  It seems that there is no
> build-time dependency between regulator and platform patches.  So I
> think we can handle the series like:
>
>  - You apply patch #3, #4 and #5 on regulator tree;
>  - I apply the rest on IMX tree.
>
> There shouldn't be any build or run time regression on either tree, and
> the feature that the series adds will be available when both trees get
> merged together on -next or Linus tree.
>
> @Oleksij  Is my understanding above correct?

Yes.
Re: get_arg_page() && ptr_size accounting
On Mon, Sep 10, 2018 at 10:43 AM, Oleg Nesterov wrote:
> On 09/10, Oleg Nesterov wrote:
>>
>> On 09/10, Kees Cook wrote:
>> >
>> > On Mon, Sep 10, 2018 at 9:41 AM, Kees Cook wrote:
>> > > On Mon, Sep 10, 2018 at 5:29 AM, Oleg Nesterov wrote:
>> > >> Hi Kees,
>> > >>
>> > >> I was thinking about backporting the commit 98da7d08850fb8bde
>> > >> ("fs/exec.c: account for argv/envp pointers"), but I am not sure
>> > >> I understand it...
>> >
>> > BTW, if you backport that, please get the rest associated with the
>> > various Stack Clash related weaknesses:
>>
>> may be...
>>
>> > da029c11e6b1 exec: Limit arg stack to at most 75% of _STK_LIM
>>
>> and I have to admit that I do not understand this patch at all, the
>> changelog explains nothing.
>>
>> Could you explain what this patch actually prevents from? Especially
>> now that we have stack_guard_gap?
>
> forgot to mention...
>
> with this patch
>
>         #define MAX_ARG_STRINGS 0x7FFF
>
> doesn't match the reality. perhaps something like below makes sense just
> to make it clear, but this is cosmetic.

Part of the discussion from back then was basically "we don't have
hard-coded limits so programs need to check dynamically themselves".
I'd prefer to leave it all well enough alone since I don't want to
introduce regressions here in the face of the many many Stack Clash
style weaknesses.

-Kees

-- 
Kees Cook
Pixel Security
Re: get_arg_page() && ptr_size accounting
On Mon, Sep 10, 2018 at 10:21 AM, Oleg Nesterov wrote:
> On 09/10, Kees Cook wrote:
>>
>> On Mon, Sep 10, 2018 at 9:41 AM, Kees Cook wrote:
>> > On Mon, Sep 10, 2018 at 5:29 AM, Oleg Nesterov wrote:
>> >> Hi Kees,
>> >>
>> >> I was thinking about backporting the commit 98da7d08850fb8bde
>> >> ("fs/exec.c: account for argv/envp pointers"), but I am not sure
>> >> I understand it...
>>
>> BTW, if you backport that, please get the rest associated with the
>> various Stack Clash related weaknesses:
>
> may be...
>
>> da029c11e6b1 exec: Limit arg stack to at most 75% of _STK_LIM
>
> and I have to admit that I do not understand this patch at all, the
> changelog explains nothing.

The issue here is with keeping some stack space available for a
program to reasonably start execution without doing insane things. The
sizes were picked after discussion with Linus while examining the
various Stack Clash weaknesses.

> Could you explain what this patch actually prevents from? Especially
> now that we have stack_guard_gap?

One of the many Stack Clash abuses was that it was possible to jump
over the stack gap with outrageous environment variables that got
expanded in stupid ways by, IIRC, glibc or the dynamic linker. The
point here was to be defensive in the face of future weaknesses, and
try to be robust in the face of crazy execs but workable under normal
(but large) execs.

-Kees

-- 
Kees Cook
Pixel Security
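
To make the "75% of _STK_LIM" cap concrete, here is a small user-space sketch
of the arithmetic. It is written from memory of fs/exec.c rather than taken
from this thread: the 8 MiB value of _STK_LIM and the additional "one quarter
of RLIMIT_STACK" bound are assumptions to check against the tree you are
backporting to, and arg_stack_limit() is a hypothetical helper name.

```c
#include <assert.h>

/* Assumed value of _STK_LIM (include/uapi/linux/resource.h): 8 MiB. */
#define STK_LIM (8UL * 1024 * 1024)

/*
 * Upper bound, in bytes, on the argv+envp area for a given RLIMIT_STACK
 * soft limit: the smaller of 3/4 of _STK_LIM and 1/4 of the rlimit.
 */
static unsigned long arg_stack_limit(unsigned long rlim_stack_cur)
{
	unsigned long limit = STK_LIM / 4 * 3;		/* 6 MiB */
	unsigned long quarter = rlim_stack_cur / 4;

	return quarter < limit ? quarter : limit;
}
```

With the default 8 MiB stack rlimit the quarter rule dominates (2 MiB); the
6 MiB cap only starts to matter once RLIMIT_STACK is raised above 24 MiB,
which is the "crazy execs" case described above.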
Re: get_arg_page() && ptr_size accounting
On Mon, Sep 10, 2018 at 10:18 AM, Oleg Nesterov wrote:
> On 09/10, Kees Cook wrote:
>>
>> > So get_arg_page() does
>> >
>> >                 /*
>> >                  * Since the stack will hold pointers to the strings, we
>> >                  * must account for them as well.
>> >                  *
>> >                  * The size calculation is the entire vma while each arg page is
>> >                  * built, so each time we get here it's calculating how far it
>> >                  * is currently (rather than each call being just the newly
>> >                  * added size from the arg page).  As a result, we need to
>> >                  * always add the entire size of the pointers, so that on the
>> >                  * last call to get_arg_page() we'll actually have the entire
>> >                  * correct size.
>> >                  */
>> >                 ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
>> >                 if (ptr_size > ULONG_MAX - size)
>> >                         goto fail;
>> >                 size += ptr_size;
>> >
>> > OK, but
>> >                 acct_arg_size(bprm, size / PAGE_SIZE);
>> >
>> > after that doesn't look exactly right. This additional space will be used later
>> > when the process already uses bprm->mm, right? so it shouldn't be accounted by
>> > acct_arg_size().
>>
>> My understanding (based on the comment about acct_arg_size()) is that
>> before exec_mmap() happens, the memory used to build the new arguments
>> copy memory area gets accounted to the MM_ANONPAGES resource limit of
>> the execing process.
>
> Yes, because otherwise oom-killer can't account the memory populated by
> get_arg_page() in bprm->mm.
>
>> I couldn't find any place where the argc/envc
>> pointers were being included in the count,
>
> But why??? To clarify,
>
>         size += ptr_size;
>
> after acct_arg_size() is clear and correct, we are going to check rlim_stack
> and thus the size should include the pointers we will add in
> create_elf_tables().
>
> But acct_arg_size() should only account the pages we allocate for bprm->mm,
> nothing more. create_elf_tables() does not allocate the memory when it populates
> arg_start/arg_end/env_start/env_end. Plus at this time the process has already
> switched to bprm->mm.

I've looked more closely now. So, while I agree with you about
resource limits, there's a corner case that is better handled here:
once we've called flush_old_exec(), we can no longer send errors back
to the parent. We just segfault. So, I think it's better to give a
resource limit error early, since it is able to do the math early.

If we move acct_arg_size() earlier, then the "immediate" resource
utilization is checked, but it means it can just segfault later. If we
leave it as-is, we account for later memory allocations "too early",
but we'll still not be able to run: but we can tell the parent why.

I prefer to leave it as-is.

>> > Not to mention that ptr_size/PAGE_SIZE doesn't look right in any case...
>>
>> Hm? acct_arg_size() takes pages, not bytes. I think this is correct?
>> What doesn't look right to you?
>
> Please forget. I meant that _if_ we actually wanted to account this additional
> memory in bprm->pages, then we would probably need something like
> acct_arg_size(size/PAGE_SIZE + DIV_ROUND_UP(ptr_size, PAGE_SIZE)).

I'd need to study that more, but that change seems reasonable. :)

-Kees

-- 
Kees Cook
Pixel Security
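
The two page counts being compared above can be tried out in user space. This
is only a sketch of the arithmetic quoted from get_arg_page() and of Oleg's
DIV_ROUND_UP() suggestion; add_ptr_size(), pages_truncated() and
pages_rounded() are hypothetical helper names, and the 4 KiB page size is an
assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define ARG_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Add the space for the argv/envp pointers to the string size without
 * wrapping around. Returns false where the kernel code does "goto fail".
 */
static bool add_ptr_size(unsigned long size, unsigned long argc,
			 unsigned long envc, unsigned long *total)
{
	unsigned long ptr_size = (argc + envc) * sizeof(void *);

	if (ptr_size > ULONG_MAX - size)
		return false;
	*total = size + ptr_size;
	return true;
}

/* Truncating count, as in acct_arg_size(bprm, size / PAGE_SIZE). */
static unsigned long pages_truncated(unsigned long total)
{
	return total / ARG_PAGE_SIZE;
}

/* Suggested variant: round the pointer area up to whole pages. */
static unsigned long pages_rounded(unsigned long size, unsigned long ptr_size)
{
	return size / ARG_PAGE_SIZE + DIV_ROUND_UP(ptr_size, ARG_PAGE_SIZE);
}
```

For two pages of strings plus 40 bytes of pointers, the truncating form
reports 2 pages and the rounded form reports 3, which is the difference
behind the "ptr_size/PAGE_SIZE doesn't look right" remark.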
Re: [PATCH v2 3/5] irqchip: RISC-V Local Interrupt Controller Driver
On Tue, Sep 11, 2018 at 3:49 AM, Christoph Hellwig wrote:
> On Mon, Sep 10, 2018 at 09:37:59PM +0200, Thomas Gleixner wrote:
>> Processor local interrupts really should be architected and there are
>> really not that many of them.
>
> And that is what they are.
>
>> But well, RISC-V decided obviously not to learn from mistakes made by
>> others.
>
> I don't think that is the case. I think Anup misreads what reserved
> means - if you look at section 2.3 of the RISC-V privileged spec
> it clearly states that reserved fields are for future use and not
> for vendor specific use.

I think I understood what reserved means here. If reserved bits are not for vendor specific or implementation specific stuff, then that should be stated clearly, which is not the case.

The list of currently defined RISC-V local interrupts will definitely grow, based on my experience from the ARM/ARM64 world.

Like Thomas mentioned, we will definitely end up having a separate irqchip and irq_domain for RISC-V local interrupts for flexibility. Better to do it now with a separate RISC-V INTC driver.

Regards,
Anup
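As a rough illustration of what "architected local interrupts" versus "reserved" means in practice, the sketch below decodes an RV64 cause value in plain userspace C. It is not taken from the proposed INTC driver. The supervisor interrupt codes (1, 5, 9) and the top-bit interrupt flag are written from my recollection of the privileged spec and should be checked against it; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: RV64, where the top bit of the cause value flags an interrupt. */
#define CAUSE_INTERRUPT_FLAG (UINT64_C(1) << 63)

/* Supervisor-level local interrupt codes (assumed, see lead-in). */
enum {
	LOCAL_IRQ_S_SOFT  = 1,
	LOCAL_IRQ_S_TIMER = 5,
	LOCAL_IRQ_S_EXT   = 9,
};

static bool cause_is_interrupt(uint64_t cause)
{
	return (cause & CAUSE_INTERRUPT_FLAG) != 0;
}

/* Map a cause value to a name; codes not listed above are treated as reserved. */
static const char *local_irq_name(uint64_t cause)
{
	if (!cause_is_interrupt(cause))
		return "not an interrupt";

	switch (cause & ~CAUSE_INTERRUPT_FLAG) {
	case LOCAL_IRQ_S_SOFT:
		return "supervisor software";
	case LOCAL_IRQ_S_TIMER:
		return "supervisor timer";
	case LOCAL_IRQ_S_EXT:
		return "supervisor external";
	default:
		return "reserved";
	}
}
```

The disagreement in the thread is essentially about the default branch: whether those codes may only be claimed by future spec revisions, or whether implementations may use them, in which case a fixed switch like this would not be flexible enough.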
[PATCH v4] dt-binding: remoteproc: Add QTI ADSP PIL bindings
Add devicetree bindings documentation file for Qualcomm Technologies Inc
ADSP Peripheral Image Loader.

Signed-off-by: Rohit kumar
---
Changes since v3:
	Addressed comments given by Rob

 .../bindings/remoteproc/qcom,adsp-pil.txt | 126 +
 1 file changed, 126 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt

diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt b/Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt
new file mode 100644
index 000..06558de
--- /dev/null
+++ b/Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt
@@ -0,0 +1,126 @@
+Qualcomm Technology Inc. ADSP Peripheral Image Loader
+
+This document defines the binding for a component that loads and boots firmware
+on the Qualcomm Technology Inc. ADSP Hexagon core.
+
+- compatible:
+	Usage: required
+	Value type:
+	Definition: must be one of:
+		    "qcom,sdm845-adsp-pil"
+
+- reg:
+	Usage: required
+	Value type:
+	Definition: must specify the base address and size of the qdsp6ss register
+
+- interrupts-extended:
+	Usage: required
+	Value type:
+	Definition: must list the watchdog, fatal IRQs ready, handover and
+		    stop-ack IRQs
+
+- interrupt-names:
+	Usage: required
+	Value type:
+	Definition: must be "wdog", "fatal", "ready", "handover", "stop-ack"
+
+- clocks:
+	Usage: required
+	Value type:
+	Definition: List of 8 phandle and clock specifier pairs for the adsp.
+
+- clock-names:
+	Usage: required
+	Value type:
+	Definition: List of clock input name strings sorted in the same
+		    order as the clocks property. Definition must have
+		    "xo", "sway_cbcr", "lpass_aon", "lpass_ahbs_aon_cbcr",
+		    "lpass_ahbm_aon_cbcr", "qdsp6ss_xo", "qdsp6ss_sleep"
+		    and "qdsp6ss_core".
+
+- power-domains:
+	Usage: required
+	Value type:
+	Definition: reference to cx power domain node.
+
+- resets:
+	Usage: required
+	Value type:
+	Definition: reference to the list of 2 reset-controller for the adsp.
+
+- reset-names:
+	Usage: required
+	Value type:
+	Definition: must be "pdc_sync" and "cc_lpass"
+
+- qcom,halt-regs:
+	Usage: required
+	Value type:
+	Definition: a phandle reference to a syscon representing TCSR followed
+		    by the offset within syscon for lpass halt register.
+
+- memory-region:
+	Usage: required
+	Value type:
+	Definition: reference to the reserved-memory for the ADSP
+
+- qcom,smem-states:
+	Usage: required
+	Value type:
+	Definition: reference to the smem state for requesting the ADSP to
+		    shut down
+
+- qcom,smem-state-names:
+	Usage: required
+	Value type:
+	Definition: must be "stop"
+
+
+= SUBNODES
+The adsp node may have a subnode named "glink-edge" that describes the
+communication edge, channels and devices related to the ADSP.
+See ../soc/qcom/qcom,glink.txt for details on how to describe these.
+
+= EXAMPLE
+The following example describes the resources needed to boot control the
+ADSP, as it is found on SDM845 boards.
+	adsp-pil {
+		compatible = "qcom,sdm845-adsp-pil";
+
+		reg = <0x1730 0x40c>;
+
+		interrupts-extended = < 0 162 IRQ_TYPE_EDGE_RISING>,
+			<_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,
+			<_smp2p_in 1 IRQ_TYPE_EDGE_RISING>,
+			<_smp2p_in 2 IRQ_TYPE_EDGE_RISING>,
+			<_smp2p_in 3 IRQ_TYPE_EDGE_RISING>;
+		interrupt-names = "wdog", "fatal", "ready",
+			"handover", "stop-ack";
+
+		clocks = < RPMH_CXO_CLK>,
+			< GCC_LPASS_SWAY_CLK>,
+			< LPASS_AUDIO_WRAPPER_AON_CLK>,
+			< LPASS_Q6SS_AHBS_AON_CLK>,
+			< LPASS_Q6SS_AHBM_AON_CLK>,
+			< LPASS_QDSP6SS_XO_CLK>,
+			< LPASS_QDSP6SS_SLEEP_CLK>,
+			< LPASS_QDSP6SS_CORE_CLK>;
+		clock-names = "xo", "sway_cbcr", "lpass_aon",
+			"lpass_ahbs_aon_cbcr",
+			"lpass_ahbm_aon_cbcr", "qdsp6ss_xo",
+			"qdsp6ss_sleep", "qdsp6ss_core";
+
+		power-domains = < SDM845_CX>;
+
+		resets = <_reset PDC_AUDIO_SYNC_RESET>,
+			<_reset AOSS_CC_LPASS_RESTART>;
+		reset-names = "pdc_sync", "cc_lpass";
+
+		qcom,halt-regs = <_mutex_regs 0x22000>;
+
+		memory-region = <_adsp_mem>;
+
+		qcom,smem-states = <_smp2p_out 0>;
+		qcom,smem-state-names = "stop";
+	};
--
Qualcomm India Private Limited, on
linux-next: build warning after merge of the tip tree
Hi all,

After merging the tip tree, today's linux-next build (x86_64 allnoconfig) produced this warning:

arch/x86/kernel/cpu/common.c: In function 'syscall_init':
arch/x86/kernel/cpu/common.c:1534:6: warning: unused variable 'cpu' [-Wunused-variable]
  int cpu = smp_processor_id();
      ^~~

Introduced by commit

  86635715ee42 ("x86/pti/64: Remove the SYSCALL64 entry trampoline")

--
Cheers,
Stephen Rothwell
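For readers unfamiliar with this class of warning, the following is a generic sketch of one common way it arises and one way to avoid it. It is not the actual syscall_init() code, and I am not claiming this is how the warning was fixed upstream. HYPOTHETICAL_FEATURE, current_cpu_id() and setup_entry() are made-up names.

```c
/* Hypothetical stand-in for a per-CPU id lookup such as smp_processor_id(). */
int current_cpu_id(void)
{
	return 3;
}

/*
 * If 'cpu' were declared at the top of the function but only used inside
 * the #ifdef branch, builds without HYPOTHETICAL_FEATURE would warn about
 * an unused variable. Declaring it inside the conditional branch keeps
 * both configurations warning-free.
 */
int setup_entry(void)
{
#ifdef HYPOTHETICAL_FEATURE
	int cpu = current_cpu_id();

	return cpu;
#else
	return -1;
#endif
}
```

A minimal configuration such as allnoconfig tends to expose this pattern because most optional code paths, and therefore most uses of such locals, are compiled out.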
Re: [PATCH v3 2/2] remoteproc: qcom: Introduce Non-PAS ADSP PIL driver
Thanks Bjorn for reviewing.

On 9/11/2018 12:01 AM, Bjorn Andersson wrote:
> On Mon 03 Sep 04:52 PDT 2018, Rohit kumar wrote:
>> This adds Non PAS ADSP PIL driver for Qualcomm
>> Technologies Inc SoCs.
>> Added initial support for SDM845 with ADSP bootup and
>> shutdown operation handled from Application Processor
>> SubSystem(APSS).
>>
>> Signed-off-by: Rohit kumar
>
> Thanks for the changes Rohit, this looks good. Once we hear from DT
> maintainers that patch 1 can be applied I will update the name of the
> file and driver as I apply it to match the naming scheme I'm aiming for
> - no need for you to resend because of this.

Sure, I will just update the dt-bindings to address some comments given by Rob.

>> ---
>>  drivers/remoteproc/Kconfig         |  14 ++
>>  drivers/remoteproc/Makefile        |   1 +
>>  drivers/remoteproc/qcom_adsp_pil.c | 500 +
>>  3 files changed, 515 insertions(+)
>>  create mode 100644 drivers/remoteproc/qcom_adsp_pil.c
>>
>> diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
>> index c98c0b2..445de2d 100644
>> --- a/drivers/remoteproc/Kconfig
>> +++ b/drivers/remoteproc/Kconfig
>> @@ -139,6 +139,20 @@ config QCOM_Q6V5_WCSS
>>  	  Say y here to support the Qualcomm Peripheral Image Loader for the
>>  	  Hexagon V5 based WCSS remote processors.
>>
>> +config QCOM_ADSP_PIL
>
> I will make this QCOM_Q6V5_ADSP
>
> [..]
>> diff --git a/drivers/remoteproc/qcom_adsp_pil.c b/drivers/remoteproc/qcom_adsp_pil.c
>
> Make this qcom_q6v5_adsp.c
>
> [..]
>> +static struct platform_driver adsp_pil_driver = {
>> +	.probe = adsp_probe,
>> +	.remove = adsp_remove,
>> +	.driver = {
>> +		.name = "qcom_adsp_pil",
>
> and this "qcom_q6v5_adsp".
>
>> +		.of_match_table = adsp_of_match,
>> +	},
>> +};
>
> Please let me know if you have any objections to this.

Naming looks fine.

Thanks,
Rohit

> Regards,
> Bjorn
Re: [PATCH v3 1/2] dt-binding: remoteproc: Add QTI ADSP PIL bindings
Thanks Rob for reviewing.

On 9/11/2018 1:31 AM, Rob Herring wrote:
> On Mon, Sep 03, 2018 at 05:22:39PM +0530, Rohit kumar wrote:
>> Add devicetree bindings documentation file for Qualcomm Technologies Inc
>> ADSP Peripheral Image Loader.
>>
>> Signed-off-by: Rohit kumar
>> ---
>>  .../bindings/remoteproc/qcom,adsp-pil.txt | 123 +
>>  1 file changed, 123 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt
>>
>> diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt b/Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt
>> new file mode 100644
>> index 000..f1c215a
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt
>> @@ -0,0 +1,123 @@
>> +Qualcomm Technology Inc. ADSP Peripheral Image Loader
>> +
>> +This document defines the binding for a component that loads and boots firmware
>> +on the Qualcomm Technology Inc. ADSP Hexagon core.
>> +
>> +- compatible:
>> +	Usage: required
>> +	Value type:
>> +	Definition: must be one of:
>> +		    "qcom,sdm845-adsp-pil"
>> +
>> +- reg:
>> +	Usage: required
>> +	Value type:
>> +	Definition: must specify the base address and size of the qdsp6ss register
>> +
>> +- interrupts-extended:
>> +	Usage: required
>> +	Value type:
>> +	Definition: must list the watchdog, fatal IRQs ready, handover and
>> +		    stop-ack IRQs
>> +
>> +- interrupt-names:
>> +	Usage: required
>> +	Value type:
>> +	Definition: must be "wdog", "fatal", "ready", "handover", "stop-ack"
>> +
>> +- clocks:
>> +	Usage: required
>> +	Value type:
>> +	Definition: List of phandle and clock specifier pairs
>
> How many clocks?
>
>> +
>> +- clock-names:
>> +	Usage: required
>> +	Value type:
>> +	Definition: List of clock input name strings sorted in the same
>> +		    order as the clocks property.
>
> What are the names?

I will update these in next spin.

>> +
>> +- power-domains:
>> +	Usage: required
>> +	Value type:
>> +	Definition: reference to cx power domain node.
>> +
>> +- resets:
>> +	Usage: required
>> +	Value type:
>> +	Definition: reference to the reset-controller for the lpass
>
> How many?
>
>> +
>> +- reset-names:
>> +	Usage: required
>> +	Value type:
>> +	Definition: must be "pdc_sync" and "cc_lpass"
>> +
>> +- qcom,halt-regs:
>> +	Usage: required
>> +	Value type:
>> +	Definition: a phandle reference to a syscon representing TCSR followed
>> +		    by the offset within syscon for lpass halt register.
>> +
>> +- memory-region:
>> +	Usage: required
>> +	Value type:
>> +	Definition: reference to the reserved-memory for the ADSP
>> +
>> +- qcom,smem-states:
>> +	Usage: required
>> +	Value type:
>> +	Definition: reference to the smem state for requesting the ADSP to
>> +		    shut down
>> +
>> +- qcom,smem-state-names:
>> +	Usage: required
>> +	Value type:
>> +	Definition: must be "stop"
>> +
>> +
>> += SUBNODES
>> +The adsp node may have an subnode named "glink-edge" that describes the
>> +communication edge, channels and devices related to the ADSP.
>> +See ../soc/qcom/qcom,glink.txt for details on how to describe these.
>> +
>> += EXAMPLE
>> +The following example describes the resources needed to boot control the
>> +ADSP, as it is found on SDM845 boards.
>> +	adsp-pil {
>> +		compatible = "qcom,sdm845-adsp-pil";
>> +
>> +		reg = <0x1730 0x40c>;
>> +
>> +		interrupts-extended = < 0 162 IRQ_TYPE_EDGE_RISING>,
>> +			<_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,
>> +			<_smp2p_in 1 IRQ_TYPE_EDGE_RISING>,
>> +			<_smp2p_in 2 IRQ_TYPE_EDGE_RISING>,
>> +			<_smp2p_in 3 IRQ_TYPE_EDGE_RISING>;
>> +		interrupt-names = "wdog", "fatal", "ready",
>> +			"handover", "stop-ack";
>> +
>> +		clocks = < RPMH_CXO_CLK>,
>> +			< GCC_LPASS_SWAY_CLK>,
>> +			< LPASS_AUDIO_WRAPPER_AON_CLK>,
>> +			< LPASS_Q6SS_AHBS_AON_CLK>,
>> +			< LPASS_Q6SS_AHBM_AON_CLK>,
>> +			< LPASS_QDSP6SS_XO_CLK>,
>> +			< LPASS_QDSP6SS_SLEEP_CLK>,
>> +			< LPASS_QDSP6SS_CORE_CLK>;
>> +		clock-names = "xo", "sway_cbcr", "lpass_aon",
>> +			"lpass_ahbs_aon_cbcr",
>> +			"lpass_ahbm_aon_cbcr", "qdsp6ss_xo",
>> +			"qdsp6ss_sleep", "qdsp6ss_core";
>> +
>> +		power-domains = < SDM845_CX>;
>> +
>> +		resets = <_reset PDC_AUDIO_SYNC_RESET>,
>> +			<_reset AOSS_CC_LPASS_RESTART>;
>> +		reset-names = "pdc_sync", "cc_lpass";
>> +
>> +		qcom,halt-regs = <_mutex_regs 0x22000>;
>> +
>> +		memory-region = <_adsp_mem>;
>> +
>> +		qcom,smem-states = <_smp2p_out 0>;
>> +		qcom,smem-state-names = "stop";
>> +	};
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., is a
Re: [RFC PATCH 1/5] RISC-V: Make IPI triggering flexible
On Mon, Sep 10, 2018 at 7:04 PM, Christoph Hellwig wrote:
> On Thu, Sep 06, 2018 at 04:15:14PM +0530, Anup Patel wrote:
>> This patch is doing two things:
>> 1. Allow IRQCHIP driver to provide IPI trigger mechanism
>
> And the big question is why do we want that? The last thing we
> want is for people to "innovate" on how they deliver IPIs. RISC-V
> has defined an SBI interface for it to hide all the details, and
> we should not try to handle systems that are not SBI compliant.
>
> Eventually we might want to revisit the SBI to improve on shortcomings
> if there are any, but we should not allow random irqchip drivers to
> override this.

I have already dropped this part from the PATCH v2.

>
>> 2. Have more generic IPI handler in arch/riscv so that IRQCHIP driver
>> can call it
>
> And that is rather irrelevant without 1) above.

Nopes, this is required for the RISC-V INTC driver.

Regards,
Anup
Compiler flags for libapi and libtraceevent
I noticed that tools/lib/api/Makefile has these conditional assignments, similar to tools/perf/Makefile.config:

ifeq ($(DEBUG),0)
ifeq ($(CC_NO_CLANG), 0)
  CFLAGS += -O3
else
  CFLAGS += -O6
endif
endif

ifeq ($(DEBUG),0)
  CFLAGS += -D_FORTIFY_SOURCE
endif

But it doesn't set DEBUG to 0 by default, and nothing under tools/perf exports its value of CFLAGS or DEBUG.

tools/lib/traceevent/Makefile doesn't seem to have any logic to enable optimisation or Fortify.

Shouldn't these libraries both have optimisations and Fortify turned on by default, like perf itself?

Ben.

--
Ben Hutchings
Computers are not intelligent. They only think they are.
linux-next: build failure after merge of the tty tree
Hi Greg,

After merging the tty tree, today's linux-next build (arm multi_v7_defconfig) failed like this:

drivers/mfd/at91-usart.c:51:34: error: array type has incomplete element type 'struct of_device_id'
 static const struct of_device_id at91_usart_mode_of_match[] = {
                                  ^~~~
drivers/mfd/at91-usart.c:52:4: error: field name not in record or union initializer
  { .compatible = "atmel,at91rm9200-usart" },
    ^
drivers/mfd/at91-usart.c:52:4: note: (near initialization for 'at91_usart_mode_of_match')
drivers/mfd/at91-usart.c:53:4: error: field name not in record or union initializer
  { .compatible = "atmel,at91sam9260-usart" },
    ^
drivers/mfd/at91-usart.c:53:4: note: (near initialization for 'at91_usart_mode_of_match')
drivers/mfd/at91-usart.c:51:34: warning: 'at91_usart_mode_of_match' defined but not used [-Wunused-variable]
 static const struct of_device_id at91_usart_mode_of_match[] = {
                                  ^~~~

Caused by commit

  7d3aa342cef7 ("mfd: at91-usart: Add MFD driver for USART")

Forgot to include ?

I used the version of the tty tree from next-20180910 for today.

--
Cheers,
Stephen Rothwell
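The "incomplete element type" error above is what a C compiler reports when an array is declared for a struct whose full definition is not visible at that point, typically because the header that defines it was not included. The sketch below reproduces the working side of that situation in userspace. struct hypothetical_device_id and count_entries() are stand-ins invented for illustration, not the kernel's struct of_device_id or any kernel helper.

```c
#include <stddef.h>

/*
 * With only a forward declaration such as
 *
 *     struct hypothetical_device_id;
 *
 * in scope, declaring an array of that type fails with
 * "array type has incomplete element type", and the designated
 * initializers are rejected as well. Making the full definition visible,
 * normally by including the defining header, resolves both errors.
 */
struct hypothetical_device_id {
	char compatible[128];
};

static const struct hypothetical_device_id match_table[] = {
	{ .compatible = "atmel,at91rm9200-usart" },
	{ .compatible = "atmel,at91sam9260-usart" },
	{ .compatible = "" },	/* empty string terminates the table */
};

/* Count entries up to, but not including, the terminating empty entry. */
static size_t count_entries(const struct hypothetical_device_id *table)
{
	size_t n = 0;

	while (table[n].compatible[0] != '\0')
		n++;
	return n;
}
```

The trailing "defined but not used" warning in the report is a follow-on effect: once the table fails to compile as intended, nothing that references it is seen as a valid use.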
[PATCH v12 1/2] leds: core: Introduce LED pattern trigger
This patch adds one new led trigger that LED device can configure the software or hardware pattern and trigger it. Consumers can write 'pattern' file to enable the software pattern which alters the brightness for the specified duration with one software timer. Moreover consumers can write 'hw_pattern' file to enable the hardware pattern for some LED controllers which can autonomously control brightness over time, according to some preprogrammed hardware patterns. Signed-off-by: Raphael Teysseyre Signed-off-by: Baolin Wang --- Changes from v11: - Change -1 means repeat indefinitely. Changes from v10: - Change 'int' to 'u32' for delta_t field. Changes from v9: - None. Changes from v8: - None. Changes from v7: - Move the SC27XX hardware patterns description into its own ABI file. Changes from v6: - Improve commit message. - Optimize the description of the hw_pattern file. - Simplify some logics. Changes from v5: - Add one 'hw_pattern' file for hardware patterns. Changes from v4: - Change the repeat file to return the originally written number. - Improve comments. - Fix some build warnings. Changes from v3: - Reset pattern number to 0 if user provides incorrect pattern string. - Support one pattern. Changes from v2: - Remove hardware_pattern boolen. - Chnage the pattern string format. Changes from v1: - Use ATTRIBUTE_GROUPS() to define attributes. - Introduce hardware_pattern flag to determine if software pattern or hardware pattern. - Re-implement pattern_trig_store_pattern() function. - Remove pattern_get() interface. - Improve comments. - Other small optimization. 
---
 .../ABI/testing/sysfs-class-led-trigger-pattern |  39 +++
 drivers/leds/trigger/Kconfig                    |   7 +
 drivers/leds/trigger/Makefile                   |   1 +
 drivers/leds/trigger/ledtrig-pattern.c          | 344
 include/linux/leds.h                            |  15 +
 5 files changed, 406 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-led-trigger-pattern
 create mode 100644 drivers/leds/trigger/ledtrig-pattern.c

diff --git a/Documentation/ABI/testing/sysfs-class-led-trigger-pattern b/Documentation/ABI/testing/sysfs-class-led-trigger-pattern
new file mode 100644
index 000..afff9e3
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-led-trigger-pattern
@@ -0,0 +1,39 @@
+What:		/sys/class/leds//pattern
+Date:		September 2018
+KernelVersion:	4.20
+Description:
+		Specify a software pattern for the LED, that supports altering
+		the brightness for the specified duration with one software
+		timer.
+
+		The pattern is given by a series of tuples, of brightness and
+		duration (ms). The LED is expected to traverse the series and
+		each brightness value for the specified duration. Duration of
+		0 means brightness should immediately change to new value.
+
+		The format of the software pattern values should be:
+		"brightness_1 duration_1 brightness_2 duration_2 brightness_3
+		duration_3 ...".
+
+What:		/sys/class/leds//hw_pattern
+Date:		September 2018
+KernelVersion:	4.20
+Description:
+		Specify a hardware pattern for the LED, for LED hardware that
+		supports autonomously controlling brightness over time, according
+		to some preprogrammed hardware patterns.
+
+		Since different LED hardware can have different semantics of
+		hardware patterns, each driver is expected to provide its own
+		description for the hardware patterns in their ABI documentation
+		file.
+
+What:		/sys/class/leds//repeat
+Date:		September 2018
+KernelVersion:	4.20
+Description:
+		Specify a pattern repeat number. -1 means repeat indefinitely,
+		other negative numbers and number 0 are invalid.
+
+		This file will always return the originally written repeat
+		number.
diff --git a/drivers/leds/trigger/Kconfig b/drivers/leds/trigger/Kconfig
index 4018af7..b76fc3c 100644
--- a/drivers/leds/trigger/Kconfig
+++ b/drivers/leds/trigger/Kconfig
@@ -129,4 +129,11 @@ config LEDS_TRIGGER_NETDEV
 	  This allows LEDs to be controlled by network device activity.
 	  If unsure, say Y.
 
+config LEDS_TRIGGER_PATTERN
+	tristate "LED Pattern Trigger"
+	help
+	  This allows LEDs to be controlled by a software or hardware pattern
+	  which is a series of tuples, of brightness and duration (ms).
+	  If unsure, say N
+
 endif # LEDS_TRIGGERS
diff --git a/drivers/leds/trigger/Makefile b/drivers/leds/trigger/Makefile
index f3cfe19..9bcb64e 100644
--- a/drivers/leds/trigger/Makefile
+++ b/drivers/leds/trigger/Makefile
@@ -13,3 +13,4 @@ obj-$(CONFIG_LEDS_TRIGGER_TRANSIENT)	+= ledtrig-transient.o
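The software pattern format described in the ABI text above ("brightness_1 duration_1 brightness_2 duration_2 ...") can be illustrated with a small userspace sketch. This is not the trigger's pattern_trig_store_pattern() implementation, which is not shown in this excerpt; the struct pattern_tuple type and the parse_pattern() helper are hypothetical names introduced only for illustration, and the rule of rejecting an odd number of values is an assumption drawn from the tuple format.

```c
#include <stdlib.h>

/* Hypothetical userspace mirror of one (brightness, duration) tuple. */
struct pattern_tuple {
	unsigned int brightness;
	unsigned int delta_t_ms;	/* duration in milliseconds */
};

/*
 * Parse "b1 d1 b2 d2 ..." into at most max tuples.
 * Returns the number of tuples parsed, or -1 if the string is malformed:
 * a non-numeric or negative token, a brightness without a duration,
 * or more tuples than the caller has room for.
 */
int parse_pattern(const char *s, struct pattern_tuple *out, int max)
{
	int n = 0;
	char *end;

	for (;;) {
		long b, d;

		/* Skip separators; a clean end of string ends the pattern. */
		while (*s == ' ' || *s == '\n')
			s++;
		if (*s == '\0')
			return n;
		if (n == max)
			return -1;

		b = strtol(s, &end, 10);
		if (end == s || b < 0)
			return -1;
		s = end;

		d = strtol(s, &end, 10);
		if (end == s || d < 0)
			return -1;
		s = end;

		out[n].brightness = (unsigned int)b;
		out[n].delta_t_ms = (unsigned int)d;
		n++;
	}
}
```

With this sketch, the string "0 1000 255 2000" yields two tuples, while "255" alone is rejected because the brightness has no matching duration.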
[PATCH v12 2/2] leds: sc27xx: Add pattern_set/clear interfaces for LED controller
This patch implements the 'pattern_set' and 'pattern_clear' interfaces to
support SC27XX LED breathing mode.

Signed-off-by: Baolin Wang
Acked-by: Pavel Machek
---
Changes from v11:
 - None.

Changes from v10:
 - Add duration alignment function suggested by Jacek.
 - Add acked tag from Pavel.

Changes from v9:
 - Optimize the ABI documentation file.
 - Update the brightness value in hardware pattern mode.

Changes from v8:
 - Optimize the ABI documentation file.

Changes from v7:
 - Add its own ABI documentation file.

Changes from v6:
 - None.

Changes from v5:
 - None.

Changes from v4:
 - None.

Changes from v3:
 - None.

Changes from v2:
 - None.

Changes from v1:
 - Remove pattern_get interface.
---
 .../ABI/testing/sysfs-class-led-driver-sc27xx |  22
 drivers/leds/leds-sc27xx-bltc.c               | 121
 2 files changed, 143 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-led-driver-sc27xx

diff --git a/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx b/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx
new file mode 100644
index 000..45b1e60
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx
@@ -0,0 +1,22 @@
+What:		/sys/class/leds//hw_pattern
+Date:		September 2018
+KernelVersion:	4.20
+Description:
+		Specify a hardware pattern for the SC27XX LED. For the SC27XX
+		LED controller, it only supports 4 stages to make a single
+		hardware pattern, which is used to configure the rise time,
+		high time, fall time and low time for the breathing mode.
+
+		For the breathing mode, the SC27XX LED only expects one brightness
+		for the high stage. To be compatible with the hardware pattern
+		format, we should set brightness as 0 for rise stage, fall
+		stage and low stage.
+
+		Min stage duration: 125 ms
+		Max stage duration: 31875 ms
+
+		Since the stage duration step is 125 ms, the duration should be
+		a multiplier of 125, like 125ms, 250ms, 375ms, 500ms ... 31875ms.
+
+		Thus the format of the hardware pattern values should be:
+		"0 rise_duration brightness high_duration 0 fall_duration 0 low_duration".
diff --git a/drivers/leds/leds-sc27xx-bltc.c b/drivers/leds/leds-sc27xx-bltc.c
index 9d9b7aa..fecf27f 100644
--- a/drivers/leds/leds-sc27xx-bltc.c
+++ b/drivers/leds/leds-sc27xx-bltc.c
@@ -32,8 +32,18 @@
 #define SC27XX_DUTY_MASK	GENMASK(15, 0)
 #define SC27XX_MOD_MASK		GENMASK(7, 0)
 
+#define SC27XX_CURVE_SHIFT	8
+#define SC27XX_CURVE_L_MASK	GENMASK(7, 0)
+#define SC27XX_CURVE_H_MASK	GENMASK(15, 8)
+
 #define SC27XX_LEDS_OFFSET	0x10
 #define SC27XX_LEDS_MAX		3
+#define SC27XX_LEDS_PATTERN_CNT	4
+/* Stage duration step, in milliseconds */
+#define SC27XX_LEDS_STEP	125
+/* Minimum and maximum duration, in milliseconds */
+#define SC27XX_DELTA_T_MIN	SC27XX_LEDS_STEP
+#define SC27XX_DELTA_T_MAX	(SC27XX_LEDS_STEP * 255)
 
 struct sc27xx_led {
 	char name[LED_MAX_NAME_SIZE];
@@ -122,6 +132,113 @@ static int sc27xx_led_set(struct led_classdev *ldev, enum led_brightness value)
 	return err;
 }
 
+static void sc27xx_led_clamp_align_delta_t(u32 *delta_t)
+{
+	u32 v, offset, t = *delta_t;
+
+	v = t + SC27XX_LEDS_STEP / 2;
+	v = clamp_t(u32, v, SC27XX_DELTA_T_MIN, SC27XX_DELTA_T_MAX);
+	offset = v - SC27XX_DELTA_T_MIN;
+	offset = SC27XX_LEDS_STEP * (offset / SC27XX_LEDS_STEP);
+
+	*delta_t = SC27XX_DELTA_T_MIN + offset;
+}
+
+static int sc27xx_led_pattern_clear(struct led_classdev *ldev)
+{
+	struct sc27xx_led *leds = to_sc27xx_led(ldev);
+	struct regmap *regmap = leds->priv->regmap;
+	u32 base = sc27xx_led_get_offset(leds);
+	u32 ctrl_base = leds->priv->base + SC27XX_LEDS_CTRL;
+	u8 ctrl_shift = SC27XX_CTRL_SHIFT * leds->line;
+	int err;
+
+	mutex_lock(&leds->priv->lock);
+
+	/* Reset the rise, high, fall and low time to zero. */
+	regmap_write(regmap, base + SC27XX_LEDS_CURVE0, 0);
+	regmap_write(regmap, base + SC27XX_LEDS_CURVE1, 0);
+
+	err = regmap_update_bits(regmap, ctrl_base,
+			(SC27XX_LED_RUN | SC27XX_LED_TYPE) << ctrl_shift, 0);
+
+	ldev->brightness = LED_OFF;
+
+	mutex_unlock(&leds->priv->lock);
+
+	return err;
+}
+
+static int sc27xx_led_pattern_set(struct led_classdev *ldev,
+				  struct led_pattern *pattern,
+				  u32 len, int repeat)
+{
+	struct sc27xx_led *leds = to_sc27xx_led(ldev);
+	u32 base = sc27xx_led_get_offset(leds);
+	u32 ctrl_base = leds->priv->base + SC27XX_LEDS_CTRL;
+	u8 ctrl_shift = SC27XX_CTRL_SHIFT * leds->line;
+	struct regmap *regmap = leds->priv->regmap;
+	int err;
+
+	/*
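The arithmetic in sc27xx_led_clamp_align_delta_t() quoted above is easier to follow with concrete numbers. The following is a userspace sketch of the same steps, assuming plain uint32_t in place of the kernel's u32 and an open-coded clamp in place of clamp_t(); the names STEP_MS, DELTA_T_MIN, DELTA_T_MAX and clamp_align_delta_t() are local stand-ins, not driver identifiers.

```c
#include <stdint.h>

#define STEP_MS      125u		/* stage duration step, in ms */
#define DELTA_T_MIN  STEP_MS		/* 125 ms */
#define DELTA_T_MAX  (STEP_MS * 255u)	/* 31875 ms */

/*
 * Clamp a requested duration to [125, 31875] ms and align it to the
 * 125 ms grid. Adding half a step before the integer division makes
 * the truncation pick the nearest grid point instead of always
 * rounding down. Like the quoted code, the addition wraps for inputs
 * very close to UINT32_MAX.
 */
uint32_t clamp_align_delta_t(uint32_t t)
{
	uint32_t v = t + STEP_MS / 2;
	uint32_t offset;

	if (v < DELTA_T_MIN)
		v = DELTA_T_MIN;
	if (v > DELTA_T_MAX)
		v = DELTA_T_MAX;

	offset = v - DELTA_T_MIN;
	offset = STEP_MS * (offset / STEP_MS);

	return DELTA_T_MIN + offset;
}
```

Worked through by hand: a request of 130 ms becomes 125 ms, 188 ms becomes 250 ms, 1000 ms is already on the grid and stays 1000 ms, and anything above the maximum, such as 40000 ms, becomes 31875 ms.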
linux-next: build warning after merge of the usb tree
Hi Greg,

After merging the usb tree, today's linux-next build (arm
multi_v7_defconfig) produced this warning:

drivers/usb/core/hcd.c: In function '__usb_hcd_giveback_urb':
drivers/usb/core/hcd.c:1741:16: warning: unused variable 'flags' [-Wunused-variable]
  unsigned long flags;
                ^

Introduced by commit

  ed194d136769 ("usb: core: remove local_irq_save() around ->complete() handler")

-- 
Cheers,
Stephen Rothwell
Re: [PATCH] iio: proximity: Add driver support for ST's VL53L0X ToF ranging sensor.
On Mon, Sep 10, 2018 at 11:27:47PM +0530, Himanshu Jha wrote:
> On Mon, Sep 10, 2018 at 10:42:59PM +0800, Song Qiang wrote:
> > This driver was originally written by ST in 2016 as a misc input device,
> > and hasn't been maintained for a long time. I grabbed some code from
> > it's API and reformed it to a iio proximity device driver.
> > This version of driver uses i2c bus to talk to the sensor and
> > polling for measuring completes, so no irq line is needed.
> > This version of driver supports only one-shot mode, and it can be
> > tested with reading from
> > /sys/bus/iio/devices/iio:deviceX/in_distance_raw
> >
> > Signed-off-by: Song Qiang
> > ---
>
> The Cc list contains developers who might not be relevant
> for the discussion.
>
> So, copy only those people listed by:
>
> $./scripts/get_maintainer.pl
>
> Don't know why Kate & Greg are cc'ed ?
>

Hi Himanshu,

Since this is a new device driver that may be added to the
drivers/iio/proximity folder, I passed drivers/iio/proximity to
./scripts/get_maintainer.pl and it returned them. I rechecked, and it
lists Greg and Kate as commit_signers. I sent the patches the way Greg's
talk "Write and Submit Your First Linux Kernel Patch" described.
So, should I send the patches only to the reviewers that
get_maintainer.pl returns and stop cc'ing all the commit_signers?

> >  .../bindings/iio/proximity/vl53l0x.txt        |  12 +
> >  drivers/iio/proximity/Kconfig                 |  13 +
> >  drivers/iio/proximity/Makefile                |   2 +
> >  drivers/iio/proximity/vl53l0x-i2c.c           | 295 ++
> >  4 files changed, 322 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/iio/proximity/vl53l0x.txt
> >  create mode 100644 drivers/iio/proximity/vl53l0x-i2c.c
> >
> > diff --git a/Documentation/devicetree/bindings/iio/proximity/vl53l0x.txt b/Documentation/devicetree/bindings/iio/proximity/vl53l0x.txt
> > new file mode 100644
> > index ..64b69442f08e
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/iio/proximity/vl53l0x.txt
> > @@ -0,0 +1,12 @@
> > +ST's VL53L0X ToF ranging sensor
> > +
> > +Required properties:
> > +	- compatible: must be "st,vl53l0x-i2c"
> > +	- reg: i2c address where to find the device
> > +
> > +Example:
> > +
> > +vl53l0x@29 {
> > +	compatible = "st,vl53l0x-i2c";
> > +	reg = <0x29>;
> > +};
> > diff --git a/drivers/iio/proximity/Kconfig b/drivers/iio/proximity/Kconfig
> > index f726f9427602..1563a5f9144d 100644
> > --- a/drivers/iio/proximity/Kconfig
> > +++ b/drivers/iio/proximity/Kconfig
> > @@ -79,4 +79,17 @@ config SRF08
> >  	  To compile this driver as a module, choose M here: the
> >  	  module will be called srf08.
> >
> > +config VL53L0X_I2C
> > +	tristate "STMicroelectronics VL53L0X ToF ranger sensor (I2C)"
> > +	select IIO_BUFFER
> > +	select IIO_TRIGGERED_BUFFER
>
> I don't see any buffer/trigger support, so better to remove these
> two options.
>
> > +	depends on I2C
> > +	help
> > +	  Say Y here to build a driver for STMicroelectronics VL53L0X
> > +	  ToF ranger sensors with i2c interface.
> > +	  This driver can be used to measure the distance of objects.
> > +
> > +	  To compile this driver as a module, choose M here: the
> > +	  module will be called vl53l0x-i2c.
>
> `name` attribute will be VL53L0X_DRV_NAME(vl53l0x) if OF matching
> is not used to probe the driver.
>
> >  endmenu
> > diff --git a/drivers/iio/proximity/Makefile b/drivers/iio/proximity/Makefile
> > index 4f4ed45e87ef..7cb771665c8b 100644
> > --- a/drivers/iio/proximity/Makefile
> > +++ b/drivers/iio/proximity/Makefile
> > @@ -10,3 +10,5 @@ obj-$(CONFIG_RFD77402)	+= rfd77402.o
> >  obj-$(CONFIG_SRF04)		+= srf04.o
> >  obj-$(CONFIG_SRF08)		+= srf08.o
> >  obj-$(CONFIG_SX9500)		+= sx9500.o
> > +obj-$(CONFIG_VL53L0X_I2C)	+= vl53l0x-i2c.o
> > +
> > diff --git a/drivers/iio/proximity/vl53l0x-i2c.c b/drivers/iio/proximity/vl53l0x-i2c.c
> > new file mode 100644
> > index ..c00713041d30
> > --- /dev/null
> > +++ b/drivers/iio/proximity/vl53l0x-i2c.c
> > @@ -0,0 +1,295 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +/*
> > + * vl53l0x-i2c.c - Support for STM VL53L0X FlightSense TOF
> > + * Ranger Sensor on a i2c bus.
> > + *
> > + * Copyright (C) 2016 STMicroelectronics Imaging Division.
> > + * Copyright (C) 2018 Song Qiang
> > + *
> > + */
> > +
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +
> > +#define VL53L0X_DRV_NAME		"vl53l0x"
> > +
> > +/* Device register map */
> > +#define VL_REG_SYSRANGE_START			0x000
> > +#define VL_REG_SYSRANGE_MODE_MASK		0x0F
> > +#define VL_REG_SYSRANGE_MODE_START_STOP		0x01
> > +#define VL_REG_SYSRANGE_MODE_SINGLESHOT		0x00
> >
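The quoted commit message says the driver can be tested by reading in_distance_raw from sysfs. A minimal userspace sketch of such a read follows, assuming the attribute holds a single decimal integer as IIO raw channels usually do. The read_attr_long() helper is a hypothetical name, and the device index in any real path (for example iio:device0) depends on the system.

```c
#include <stdio.h>

/*
 * Read one decimal integer from a sysfs-style attribute file.
 * Returns 0 on success and stores the value in *out, or -1 if the
 * file cannot be opened or does not start with a number.
 */
int read_attr_long(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;

	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);

	return ok ? 0 : -1;
}
```

A caller would pass something like "/sys/bus/iio/devices/iio:device0/in_distance_raw" and check the return value before using the result; the unit and scaling of the raw value are not stated in this thread.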
[PATCH v2 4/4] arm64: dts: rockchip: Enable SD card detection for Rock960 boards
For SD cards to work properly, add the Card Detect GPIO property to the
common devicetree for Rock960 family boards.

Signed-off-by: Manivannan Sadhasivam
---
 arch/arm64/boot/dts/rockchip/rk3399-rock960.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399-rock960.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-rock960.dtsi
index 5a5d8e28ef55..f68254831ad9 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-rock960.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399-rock960.dtsi
@@ -403,6 +403,7 @@
 	cap-sd-highspeed;
 	clock-frequency = <1>;
 	clock-freq-min-max = <10 1>;
+	cd-gpios = < 7 GPIO_ACTIVE_LOW>;
 	disable-wp;
 	sd-uhs-sdr104;
 	vqmmc-supply = <_sd>;
-- 
2.17.1
[PATCH v2 2/4] dt-bindings: arm: rockchip: Add binding for Rock960 board
Add devicetree binding for Rock960 board from Vamrs Limited.

Signed-off-by: Manivannan Sadhasivam
---
 Documentation/devicetree/bindings/arm/rockchip.txt | 4
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/arm/rockchip.txt b/Documentation/devicetree/bindings/arm/rockchip.txt
index acfd3c773dd0..4b6888a21db2 100644
--- a/Documentation/devicetree/bindings/arm/rockchip.txt
+++ b/Documentation/devicetree/bindings/arm/rockchip.txt
@@ -5,6 +5,10 @@ Rockchip platforms device tree bindings
     Required root node properties:
       - compatible = "vamrs,ficus", "rockchip,rk3399";
 
+- 96boards RK3399 Rock960 (ROCK960 Consumer Edition)
+    Required root node properties:
+      - compatible = "vamrs,rock960", "rockchip,rk3399";
+
 - Amarula Vyasa RK3288 board
     Required root node properties:
       - compatible = "amarula,vyasa-rk3288", "rockchip,rk3288";
-- 
2.17.1
[PATCH v2 3/4] arm64: boot: dts: rockchip: Add support for Rock960 board
Add devicetree support for Rock960 board, one of the Consumer Edition
boards of the 96Boards family. This board support utilizes the common
Rock960 family board support that includes Ficus 96Board.

Signed-off-by: Manivannan Sadhasivam
---
 arch/arm64/boot/dts/rockchip/Makefile           |   1 +
 .../boot/dts/rockchip/rk3399-rock960.dts        | 139 ++
 2 files changed, 140 insertions(+)
 create mode 100644 arch/arm64/boot/dts/rockchip/rk3399-rock960.dts

diff --git a/arch/arm64/boot/dts/rockchip/Makefile b/arch/arm64/boot/dts/rockchip/Makefile
index b0092d95b574..57c0d76458e6 100644
--- a/arch/arm64/boot/dts/rockchip/Makefile
+++ b/arch/arm64/boot/dts/rockchip/Makefile
@@ -14,5 +14,6 @@ dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-firefly.dtb
 dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-gru-bob.dtb
 dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-gru-kevin.dtb
 dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-puma-haikou.dtb
+dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-rock960.dtb
 dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-sapphire.dtb
 dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-sapphire-excavator.dtb
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-rock960.dts b/arch/arm64/boot/dts/rockchip/rk3399-rock960.dts
new file mode 100644
index ..37242b64a7a3
--- /dev/null
+++ b/arch/arm64/boot/dts/rockchip/rk3399-rock960.dts
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+/*
+ * Copyright (c) 2018 Linaro Ltd.
+ */
+
+/dts-v1/;
+#include "rk3399-rock960.dtsi"
+
+/ {
+	model = "96boards Rock960";
+	compatible = "vamrs,rock960", "rockchip,rk3399";
+
+	chosen {
+		stdout-path = "serial2:150n8";
+	};
+
+	vcc3v3_pcie: vcc3v3-pcie-regulator {
+		compatible = "regulator-fixed";
+		enable-active-high;
+		gpio = < 5 GPIO_ACTIVE_HIGH>;
+		pinctrl-names = "default";
+		pinctrl-0 = <_drv>;
+		regulator-boot-on;
+		regulator-name = "vcc3v3_pcie";
+		regulator-min-microvolt = <330>;
+		regulator-max-microvolt = <330>;
+		vin-supply = <_sys>;
+	};
+
+	vcc5v0_host: vcc5v0-host-regulator {
+		compatible = "regulator-fixed";
+		enable-active-high;
+		gpio = < 25 GPIO_ACTIVE_HIGH>;
+		pinctrl-names = "default";
+		pinctrl-0 = <_vbus_drv>;
+		regulator-name = "vcc5v0_host";
+		regulator-min-microvolt = <500>;
+		regulator-max-microvolt = <500>;
+		regulator-always-on;
+		vin-supply = <_sys>;
+	};
+};
+
+ {
+	pcie {
+		pcie_drv: pcie-drv {
+			rockchip,pins =
+				<2 RK_PA5 RK_FUNC_GPIO _pull_none>;
+		};
+	};
+
+	usb2 {
+		host_vbus_drv: host-vbus-drv {
+			rockchip,pins =
+				<4 RK_PD1 RK_FUNC_GPIO _pull_none>;
+		};
+	};
+};
+
+_phy {
+	status = "okay";
+};
+
+ {
+	ep-gpios = < RK_PA2 GPIO_ACTIVE_HIGH>;
+	num-lanes = <4>;
+	pinctrl-names = "default";
+	pinctrl-0 = <_clkreqn_cpm>;
+	vpcie3v3-supply = <_pcie>;
+	status = "okay";
+};
+
+ {
+	status = "okay";
+};
+
+ {
+	status = "okay";
+};
+
+ {
+	status = "okay";
+};
+
+ {
+	status = "okay";
+};
+
+_host {
+	phy-supply = <_host>;
+	status = "okay";
+};
+
+_host {
+	phy-supply = <_host>;
+	status = "okay";
+};
+
+_otg {
+	status = "okay";
+};
+
+_otg {
+	status = "okay";
+};
+
+_host0_ehci {
+	status = "okay";
+};
+
+_host0_ohci {
+	status = "okay";
+};
+
+_host1_ehci {
+	status = "okay";
+};
+
+_host1_ohci {
+	status = "okay";
+};
+
+_0 {
+	status = "okay";
+};
+
+_dwc3_0 {
+	status = "okay";
+	dr_mode = "otg";
+};
+
+_1 {
+	status = "okay";
+};
+
+_dwc3_1 {
+	status = "okay";
+	dr_mode = "host";
+};
-- 
2.17.1
[PATCH v2 1/4] arm64: dts: rockchip: Split out common nodes for Rock960 based boards
Since the same family members of Rock960 boards (Rock960 and Ficus) share the same configuration, split out the common nodes into a common dtsi file for reducing code duplication. The board specific nodes for Ficus boards are then placed in corresponding board DTS file. Signed-off-by: Manivannan Sadhasivam --- arch/arm64/boot/dts/rockchip/rk3399-ficus.dts | 429 + .../boot/dts/rockchip/rk3399-rock960.dtsi | 439 ++ 2 files changed, 440 insertions(+), 428 deletions(-) create mode 100644 arch/arm64/boot/dts/rockchip/rk3399-rock960.dtsi diff --git a/arch/arm64/boot/dts/rockchip/rk3399-ficus.dts b/arch/arm64/boot/dts/rockchip/rk3399-ficus.dts index 8978d924eb83..7f6ec37d5a69 100644 --- a/arch/arm64/boot/dts/rockchip/rk3399-ficus.dts +++ b/arch/arm64/boot/dts/rockchip/rk3399-ficus.dts @@ -7,8 +7,7 @@ */ /dts-v1/; -#include "rk3399.dtsi" -#include "rk3399-opp.dtsi" +#include "rk3399-rock960.dtsi" / { model = "96boards RK3399 Ficus"; @@ -25,31 +24,6 @@ #clock-cells = <0>; }; - vcc1v8_s0: vcc1v8-s0 { - compatible = "regulator-fixed"; - regulator-name = "vcc1v8_s0"; - regulator-min-microvolt = <180>; - regulator-max-microvolt = <180>; - regulator-always-on; - }; - - vcc_sys: vcc-sys { - compatible = "regulator-fixed"; - regulator-name = "vcc_sys"; - regulator-min-microvolt = <500>; - regulator-max-microvolt = <500>; - regulator-always-on; - }; - - vcc3v3_sys: vcc3v3-sys { - compatible = "regulator-fixed"; - regulator-name = "vcc3v3_sys"; - regulator-min-microvolt = <330>; - regulator-max-microvolt = <330>; - regulator-always-on; - vin-supply = <_sys>; - }; - vcc3v3_pcie: vcc3v3-pcie-regulator { compatible = "regulator-fixed"; enable-active-high; @@ -75,46 +49,6 @@ regulator-always-on; vin-supply = <_sys>; }; - - vdd_log: vdd-log { - compatible = "pwm-regulator"; - pwms = < 0 25000 0>; - regulator-name = "vdd_log"; - regulator-min-microvolt = <80>; - regulator-max-microvolt = <140>; - regulator-always-on; - regulator-boot-on; - vin-supply = <_sys>; - }; - -}; - -_l0 { - 
cpu-supply = <_cpu_l>; -}; - -_l1 { - cpu-supply = <_cpu_l>; -}; - -_l2 { - cpu-supply = <_cpu_l>; -}; - -_l3 { - cpu-supply = <_cpu_l>; -}; - -_b0 { - cpu-supply = <_cpu_b>; -}; - -_b1 { - cpu-supply = <_cpu_b>; -}; - -_phy { - status = "okay"; }; { @@ -133,263 +67,6 @@ status = "okay"; }; - { - ddc-i2c-bus = <>; - pinctrl-names = "default"; - pinctrl-0 = <_cec>; - status = "okay"; -}; - - { - clock-frequency = <40>; - i2c-scl-rising-time-ns = <168>; - i2c-scl-falling-time-ns = <4>; - status = "okay"; - - vdd_cpu_b: regulator@40 { - compatible = "silergy,syr827"; - reg = <0x40>; - fcs,suspend-voltage-selector = <1>; - regulator-name = "vdd_cpu_b"; - regulator-min-microvolt = <712500>; - regulator-max-microvolt = <150>; - regulator-ramp-delay = <1000>; - regulator-always-on; - regulator-boot-on; - vin-supply = <_sys>; - status = "okay"; - - regulator-state-mem { - regulator-off-in-suspend; - }; - }; - - vdd_gpu: regulator@41 { - compatible = "silergy,syr828"; - reg = <0x41>; - fcs,suspend-voltage-selector = <1>; - regulator-name = "vdd_gpu"; - regulator-min-microvolt = <712500>; - regulator-max-microvolt = <150>; - regulator-ramp-delay = <1000>; - regulator-always-on; - regulator-boot-on; - vin-supply = <_sys>; - regulator-state-mem { - regulator-off-in-suspend; - }; - }; - - rk808: pmic@1b { - compatible = "rockchip,rk808"; - reg = <0x1b>; - interrupt-parent = <>; - interrupts = <21 IRQ_TYPE_LEVEL_LOW>; - pinctrl-names = "default"; - pinctrl-0 = <_int_l>; - rockchip,system-power-controller; - wakeup-source; - #clock-cells = <1>; - clock-output-names = "xin32k", "rk808-clkout2"; - - vcc1-supply = <_sys>; - vcc2-supply = <_sys>; - vcc3-supply = <_sys>; -
Re: [PATCH 4.19 regression fix] printk: For early boot messages check loglevel when flushing the buffer
On (09/10/18 16:57), Petr Mladek wrote:
>
> Good catch.
>
> > ---
> >
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index c036f128cdc3..ede29a7ba6db 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2545,6 +2545,7 @@ void console_flush_on_panic(void)
> >  	 * ensure may_schedule is cleared.
> >  	 */
> >  	console_trylock();
> > +	exclusive_console = NULL;
>
> This is not enough. It would cause replaying old messages
> on all consoles.

Oh, that was intentional. I consider repeated messages to be less
problematic than the missing ones.

> Most problems should probably be solved when we store console_seq
> before setting exclusive_console. Then we could clear
> exclusive_console when reaching the stored sequence number.
>
> Can this be that simple? ;-)

This can work, yes. I also thought about doing it the way Linus, Jan Kara
and Hannes Reinecke proposed:

- store the console_seq nr of the first oops_in_progress message
  (oops_console_seq) and flush only messages that are in
  [oops_console_seq - 200, log_next_seq] range, as opposed to complete
  logbuf flush.

Hannes asked for this several times. And it was in Jan's printk patches
long time ago (if I'm not mistaken - sorry if I am -- Jan said that Linus
wanted that "just N messages prior to oops" thing).

Jan's patch: https://lore.kernel.org/lkml/1457964820-4642-3-git-send-email-sergey.senozhat...@gmail.com/T/#u

> This reverts commit 375899cddcbb26881b03cb3fbdcfd600e4e67f4a.
>
> Reported-by: Hans de Goede
> Signed-off-by: Petr Mladek

Acked-by: Sergey Senozhatsky

	-ss
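To make the "store console_seq before setting exclusive_console" idea concrete, here is a minimal userspace sketch. It is not the printk code: struct flush_state, begin_exclusive_replay(), flush_one() and the field exclusive_stop_seq are hypothetical names invented for illustration, and locking is ignored entirely. It only models the bookkeeping: remember where the other consoles already are, replay older records to the new console alone, and drop the exclusive restriction once the stored sequence number is reached.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a registered console. */
struct console_model {
	int id;
};

/* Hypothetical, simplified model of the flush state. */
struct flush_state {
	uint64_t console_seq;        /* next record to emit */
	uint64_t exclusive_stop_seq; /* replay ends at this record */
	const struct console_model *exclusive_console; /* NULL: all consoles */
};

/* Start replaying the log from first_seq on a newly registered console. */
static void begin_exclusive_replay(struct flush_state *st,
				   const struct console_model *con,
				   uint64_t first_seq)
{
	/* Store the position the other consoles have already reached. */
	st->exclusive_stop_seq = st->console_seq;
	st->exclusive_console = con;
	st->console_seq = first_seq;
}

/*
 * Emit one record. Returns true when the record goes to every console,
 * false when it is replayed to the exclusive console only.
 */
static bool flush_one(struct flush_state *st)
{
	bool to_all;

	/* Replay finished: records from here on are new for everybody. */
	if (st->exclusive_console &&
	    st->console_seq >= st->exclusive_stop_seq)
		st->exclusive_console = NULL;

	to_all = (st->exclusive_console == NULL);
	st->console_seq++;
	return to_all;
}
```

With console_seq at 5 when the new console registers, records 0..4 are replayed to that console only and record 5 is the first one sent everywhere, so nothing is repeated on the old consoles and nothing is lost.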
[PATCH v2 0/4] Add 96Boards Rock960 CE board support
This patchset adds 96Boards Rock960 CE board support. The Rock960 CE
(Consumer Edition) board is one of the members of the 96Boards Consumer
Edition and AI platform and is manufactured by Vamrs Limited.

Most of the board configuration is shared with the Ficus board
manufactured by Vamrs, which is an Enterprise 96Board. To avoid code
duplication, a common rock960.dtsi file with the DT nodes common to both
boards and separate board specific DTS files have been added.

To be specific, below are some of the key differences between both boards:

1. Different host enable GPIO for USB
2. Different power and reset GPIO for PCI-E
3. No Ethernet port on Rock960

While adding the board support, SD card detection support is also added
to the common dtsi file, shared by both boards.

This series has been tested on a Rock960 CE v1.2 board; the Ficus board
maintainer is expected to test the relevant Ficus part.

Thanks,
Mani

Changes in v2:

* Changed the board compatible to "vamrs,rock960"

Manivannan Sadhasivam (4):
  arm64: dts: rockchip: Split out common nodes for Rock960 based boards
  dt-bindings: arm: rockchip: Add binding for Rock960 board
  arm64: boot: dts: rockchip: Add support for Rock960 board
  arm64: dts: rockchip: Enable SD card detection for Rock960 boards

 .../devicetree/bindings/arm/rockchip.txt | 4 +
 arch/arm64/boot/dts/rockchip/Makefile | 1 +
 arch/arm64/boot/dts/rockchip/rk3399-ficus.dts | 429 +
 .../boot/dts/rockchip/rk3399-rock960.dts | 139 ++
 .../boot/dts/rockchip/rk3399-rock960.dtsi | 440 ++
 5 files changed, 585 insertions(+), 428 deletions(-)
 create mode 100644 arch/arm64/boot/dts/rockchip/rk3399-rock960.dts
 create mode 100644 arch/arm64/boot/dts/rockchip/rk3399-rock960.dtsi

-- 
2.17.1
Re: [PATCH v3 1/2] i2c: mediatek: Register i2c adapter driver earlier
On Thu, 2018-09-06 at 20:31 +0200, Wolfram Sang wrote:
> On Thu, Sep 06, 2018 at 09:15:28PM +0800, Jun Gao wrote:
> > From: Jun Gao
> >
> > In order not to block the initializations of some i2c devices.
> > Register i2c adapter driver at appropriate time.
> >
> > Signed-off-by: Jun Gao
>
> The reasons this patch was rejected in v2 still hold.

OK. Thanks for your opinion.
[PATCH] kernel: prevent submission of creds with higher privileges inside container
From: Xin Lin <18650033...@163.com>

Adversaries often attack the Linux kernel by using
commit_creds(prepare_kernel_cred(0)) to submit ROOT credentials for the
purpose of privilege escalation. For processes inside a Linux container,
the above approach also works, because the container and the host share
the same Linux kernel. Therefore, we enforce a check in commit_creds()
before updating the cred of the caller process. If the process is inside
a container (judging from the namespace ID) and tries to submit
credentials with higher privileges than its current ones (judging from the
uid, gid, and cap_bset in the new cred), we stop the modification.

We consider that if the namespace ID of the process is different from the
init namespace ID (enumerated in /include/linux/proc_ns.h), the process is
inside a container. And if the uid/gid in the new cred is smaller or the
cap_bset (capability bounding set) in the new cred is larger, it may be a
privilege escalation operation.

Signed-off-by: Xin Lin <18650033...@163.com>
---
 kernel/cred.c | 12
 1 file changed, 12 insertions(+)

diff --git a/kernel/cred.c b/kernel/cred.c
index ecf0365..968a92c 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -425,6 +425,18 @@ int commit_creds(struct cred *new)
 	struct task_struct *task = current;
 	const struct cred *old = task->real_cred;
 
+	if (task->nsproxy->uts_ns->ns.inum != PROC_UTS_INIT_INO ||
+	    task->nsproxy->ipc_ns->ns.inum != PROC_IPC_INIT_INO ||
+	    task->nsproxy->mnt_ns->ns.inum != 0xF000U ||
+	    task->nsproxy->pid_ns_for_children->ns.inum != PROC_PID_INIT_INO ||
+	    task->nsproxy->net_ns->ns.inum != 0xF075U ||
+	    old->user_ns->ns.inum != PROC_USER_INIT_INO ||
+	    task->nsproxy->cgroup_ns->ns.inum != PROC_CGROUP_INIT_INO) {
+		if (new->uid.val < old->uid.val || new->gid.val < old->gid.val
+		    || new->cap_bset.cap[0] > old->cap_bset.cap[0])
+			return 0;
+	}
+
 	kdebug("commit_creds(%p{%d,%d})", new,
 	       atomic_read(>usage),
 	       read_cred_subscribers(new));
-- 
2.7.4
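The decision the patch adds can be read as a small predicate. The following is a userspace sketch only, with hypothetical names (struct cred_model, looks_like_escalation(), commit_allowed()); the namespace test is reduced to a single in_container flag, and the capability comparison is the same plain numeric comparison of the first cap_bset word that the patch uses, not a subset test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduction of struct cred to the fields the patch inspects. */
struct cred_model {
	uint32_t uid;
	uint32_t gid;
	uint32_t cap_bset0; /* first word of the capability bounding set */
};

/*
 * Mirrors the inner condition of the patch: a smaller uid or gid, or a
 * numerically larger cap_bset word, is treated as a privilege increase.
 */
static bool looks_like_escalation(const struct cred_model *old_cred,
				  const struct cred_model *new_cred)
{
	return new_cred->uid < old_cred->uid ||
	       new_cred->gid < old_cred->gid ||
	       new_cred->cap_bset0 > old_cred->cap_bset0;
}

/*
 * Mirrors the outer condition: the check only applies when the task is
 * considered to be inside a container (any namespace differs from init).
 */
static bool commit_allowed(bool in_container,
			   const struct cred_model *old_cred,
			   const struct cred_model *new_cred)
{
	return !(in_container && looks_like_escalation(old_cred, new_cred));
}
```

Under this model a uid 1000 task that tries to commit uid 0 credentials is refused inside a container and accepted on the host, which is the behaviour the commit message describes.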
Re: [PATCH] iio: proximity: Add driver support for ST's VL53L0X ToF ranging sensor.
On Mon, Sep 10, 2018 at 06:12:43PM +0300, Andy Shevchenko wrote: > On Mon, Sep 10, 2018 at 10:42:59PM +0800, Song Qiang wrote: > > This driver was originally written by ST in 2016 as a misc input device, > > and hasn't been maintained for a long time. I grabbed some code from > > it's API and reformed it to a iio proximity device driver. > > This version of driver uses i2c bus to talk to the sensor and > > polling for measuring completes, so no irq line is needed. > > This version of driver supports only one-shot mode, and it can be > > tested with reading from > > /sys/bus/iio/devices/iio:deviceX/in_distance_raw > > Brief review for almost style issues... > > > + * vl53l0x-i2c.c - Support for STM VL53L0X FlightSense TOF > > + * Ranger Sensor on a i2c bus. > > One line and without file name. > > > + * > > + * Copyright (C) 2016 STMicroelectronics Imaging Division. > > + * Copyright (C) 2018 Song Qiang > > > + * > > Redundant > > > + */ > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > Keep above sorted. > > > +#include > > + > > +#define VL53L0X_DRV_NAME "vl53l0x" > > + > > +/* Device register map */ > > +#define VL_REG_SYSRANGE_START 0x000 > > 0x ? > > > +#define VL_REG_SYSRANGE_MODE_MASK 0x0F > > GENMASK() ? > > > +#define VL_REG_SYSRANGE_MODE_START_STOP0x01 > > +#define VL_REG_SYSRANGE_MODE_SINGLESHOT0x00 > > +#define VL_REG_SYSRANGE_MODE_BACKTOBACK0x02 > > +#define VL_REG_SYSRANGE_MODE_TIMED 0x04 > > +#define VL_REG_SYSRANGE_MODE_HISTOGRAM 0x08 > > BIT() ? > > Above comments related to below definitions as well. > > > + > > +#define VL_REG_SYS_THRESH_HIGH 0x000C > > +#define VL_REG_SYS_THRESH_LOW 0x000E > > + > > +#define VL_REG_SYS_SEQUENCE_CFG0x0001 > > +#define VL_REG_SYS_RANGE_CFG 0x0009 > > +#define VL_REG_SYS_INTERMEASUREMENT_PERIOD 0x0004 > > + > > +#define VL_REG_SYS_INT_CFG_GPIO0x000A > > If you chose 0x format for the registers, please, keep the list of them > sorted by the offset / address. 
> > > +#define VL_REG_SYS_INT_GPIO_DISABLED 0x00 > > +#define VL_REG_SYS_INT_GPIO_LEVEL_LOW 0x01 > > +#define VL_REG_SYS_INT_GPIO_LEVEL_HIGH 0x02 > > +#define VL_REG_SYS_INT_GPIO_OUT_OF_WINDOW 0x03 > > +#define VL_REG_SYS_INT_GPIO_NEW_SAMPLE_READY 0x04 > > +#define VL_REG_GPIO_HV_MUX_ACTIVE_HIGH 0x0084 > > +#define VL_REG_SYS_INT_CLEAR 0x000B > > + > > +/* Result registers */ > > +#define VL_REG_RESULT_INT_STATUS 0x0013 > > +#define VL_REG_RESULT_RANGE_STATUS 0x0014 > > + > > +#define VL_REG_RESULT_CORE_PAGE 1 > > +#define VL_REG_RESULT_CORE_AMBIENT_WINDOW_EVENTS_RTN 0x00BC > > +#define VL_REG_RESULT_CORE_RANGING_TOTAL_EVENTS_RTN0x00C0 > > +#define VL_REG_RESULT_CORE_AMBIENT_WINDOW_EVENTS_REF 0x00D0 > > +#define VL_REG_RESULT_CORE_RANGING_TOTAL_EVENTS_REF0x00D4 > > +#define VL_REG_RESULT_PEAK_SIGNAL_RATE_REF 0x00B6 > > + > > +/* Algo register */ > > +#define VL_REG_ALGO_PART_TO_PART_RANGE_OFFSET_MM 0x0028 > > + > > +#define VL_REG_I2C_SLAVE_DEVICE_ADDRESS > > 0x008a > > + > > +/* Check Limit registers */ > > +#define VL_REG_MSRC_CFG_CONTROL > > 0x0060 > > + > > +#define VL_REG_PRE_RANGE_CFG_MIN_SNR > > 0X0027 > > +#define VL_REG_PRE_RANGE_CFG_VALID_PHASE_LOW 0x0056 > > +#define VL_REG_PRE_RANGE_CFG_VALID_PHASE_HIGH 0x0057 > > +#define VL_REG_PRE_RANGE_MIN_COUNT_RATE_RTN_LIMIT 0x0064 > > + > > +#define VL_REG_FINAL_RANGE_CFG_MIN_SNR > > 0X0067 > > +#define VL_REG_FINAL_RANGE_CFG_VALID_PHASE_LOW 0x0047 > > +#define VL_REG_FINAL_RANGE_CFG_VALID_PHASE_HIGH0x0048 > > +#define VL_REG_FINAL_RANGE_CFG_MIN_COUNT_RATE_RTN_LIMIT0x0044 > > + > > > +#define VL_REG_PRE_RANGE_CFG_SIGMA_THRESH_HI 0X0061 > > +#define VL_REG_PRE_RANGE_CFG_SIGMA_THRESH_LO 0X0062 > > 0x > > > + > > +/* PRE RANGE registers */ > > +#define VL_REG_PRE_RANGE_CFG_VCSEL_PERIOD 0x0050 > > +#define VL_REG_PRE_RANGE_CFG_TIMEOUT_MACROP_HI 0x0051 > > +#define VL_REG_PRE_RANGE_CFG_TIMEOUT_MACROP_LO 0x0052 > > + > > +#define VL_REG_SYS_HISTOGRAM_BIN
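As an illustration of the GENMASK()/BIT() review comments, the SYSRANGE mode definitions could be written as below. This is a sketch, not the revised driver: in the kernel the two macros come from the kernel headers, while here they are re-defined locally so the fragment builds in userspace, and vl_sysrange_mode() is a hypothetical helper added only to show the mask in use.

```c
/* Userspace stand-ins for the kernel's BIT() and GENMASK() macros. */
#define BIT(n)		(1UL << (n))
#define GENMASK(h, l) \
	((~0UL << (l)) & (~0UL >> (sizeof(unsigned long) * 8 - 1 - (h))))

/* Same values as in the posted patch, expressed as bits and a bit range. */
#define VL_REG_SYSRANGE_MODE_MASK	GENMASK(3, 0)	/* 0x0F */
#define VL_REG_SYSRANGE_MODE_SINGLESHOT	0x00
#define VL_REG_SYSRANGE_MODE_START_STOP	BIT(0)		/* 0x01 */
#define VL_REG_SYSRANGE_MODE_BACKTOBACK	BIT(1)		/* 0x02 */
#define VL_REG_SYSRANGE_MODE_TIMED	BIT(2)		/* 0x04 */
#define VL_REG_SYSRANGE_MODE_HISTOGRAM	BIT(3)		/* 0x08 */

/* Hypothetical helper: extract the mode bits from a raw register value. */
static unsigned long vl_sysrange_mode(unsigned long reg)
{
	return reg & VL_REG_SYSRANGE_MODE_MASK;
}
```

The numeric values are unchanged; only the notation makes it visible that the modes are individual bits inside a four-bit field.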
Re: [PATCH] ip6_gre: simplify gre header parsing in ip6gre_err
> On Sep 10, 2018, at 11:36 PM, Jiri Benc wrote:
>
> On Mon, 10 Sep 2018 16:25:09 +0800, Haishuang Yan wrote:
>> +	if (gre_parse_header(skb, , _err, htons(ETH_P_IPV6),
>> +			     offset) < 0) {
>> +		if (!csum_err)		/* ignore csum errors. */
>> +			return;
>> 	}
>
> gre_parse_header stops parsing when csum_err is encountered. Which
> means tpi.key is undefined...
>
>>
>> -	if (!pskb_may_pull(skb, offset + grehlen))
>> -		return;
>> 	ipv6h = (const struct ipv6hdr *)skb->data;
>> -	greh = (const struct gre_base_hdr *)(skb->data + offset);
>> -	key = key_off ? *(__be32 *)(skb->data + key_off) : 0;
>> -
>> 	t = ip6gre_tunnel_lookup(skb->dev, >daddr, >saddr,
>> -				 key, greh->protocol);
>> +				 tpi.key, tpi.proto);
>
> ...and can't be used here.
>
> Jiri
>

You are right. Thanks for reviewing. The same problem also arises in the
ipgre_err code:

	iph = (const struct iphdr *)(icmp_hdr(skb) + 1);
	t = ip_tunnel_lookup(itn, skb->dev->ifindex, tpi->flags,
			     iph->daddr, iph->saddr, tpi->key);

Since csum_err may not be used outside, how about refactoring the
gre_parse_header function like this:

--- a/net/ipv4/gre_demux.c
+++ b/net/ipv4/gre_demux.c
@@ -86,7 +86,7 @@ int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 	options = (__be32 *)(greh + 1);
 	if (greh->flags & GRE_CSUM) {
-		if (skb_checksum_simple_validate(skb)) {
+		if (csum_err && skb_checksum_simple_validate(skb)) {
 			*csum_err = true;
 			return -EINVAL;
 		}

And in the gre_err function, we can call gre_parse_header(skb, , NULL, **)
like this:

--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -234,11 +234,9 @@ static void gre_err(struct sk_buff *skb, u32 info)
 	struct tnl_ptk_info tpi;
 	bool csum_err = false;
 
-	if (gre_parse_header(skb, , _err, htons(ETH_P_IP),
-			     iph->ihl * 4) < 0) {
-		if (!csum_err)		/* ignore csum errors. */
+	if (gre_parse_header(skb, , NULL, htons(ETH_P_IP),
+			     iph->ihl * 4) < 0)
 			return;
-	}
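The effect of the proposed "csum_err may be NULL" convention can be shown with a small userspace model. Everything here is hypothetical (struct gre_model, struct tpi_model, parse_header_model()); it is not gre_parse_header() itself. It only demonstrates the control flow: when the caller passes NULL, checksum validation is skipped and the key is always filled in, and when the caller passes a pointer, a bad checksum stops parsing before the key is written.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-decoded view of a GRE header. */
struct gre_model {
	bool has_csum; /* checksum flag present */
	bool csum_ok;  /* result a checksum validation would give */
	bool has_key;  /* key flag present */
	uint32_t key;
};

/* Hypothetical reduction of the parsed tunnel info to the key. */
struct tpi_model {
	uint32_t key;
};

/*
 * Model of the proposed behaviour: csum_err is an optional out-parameter.
 * With csum_err == NULL the checksum is not validated at all, so parsing
 * always reaches the key. With a non-NULL csum_err a bad checksum sets
 * the flag and returns -EINVAL before tpi->key is written.
 */
static int parse_header_model(const struct gre_model *hdr,
			      struct tpi_model *tpi, bool *csum_err)
{
	if (hdr->has_csum) {
		if (csum_err && !hdr->csum_ok) {
			*csum_err = true;
			return -EINVAL;
		}
	}

	tpi->key = hdr->has_key ? hdr->key : 0;
	return 0;
}
```

An error handler that passes NULL therefore gets a defined key to use in the tunnel lookup even for a packet with a bad checksum, which is the case Jiri pointed out.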
Re: [PATCH] arm64: add NUMA emulation support
Hi Michal,

On 09/10/2018 07:48 AM, Michal Hocko wrote:
> On Fri 07-09-18 16:30:59, Shuah Khan wrote:
>> On 09/07/2018 02:34 AM, Michal Hocko wrote:
>>> On Thu 06-09-18 15:53:34, Shuah Khan wrote:
[]
>>
>> In addition to isolation, being able to reserve a block instead is one of the
>> issues I am looking to address. Unfortunately memory cgroups won't address
>> that issue.
>
> Could you be more specific why you need reservations other than
> isolation.
>

Taking automotive as a specific example, there are two classes of applications:

1. Critical applications that must run.
2. Infotainment and misc. user-space.

In this case, being able to reserve a block of memory for critical applications
will ensure the memory is available for them. If a critical application has to
restart, or when an on-demand critical application starts, it might not be able
to allocate memory if it is not reserved.

When a flat system has multiple memory blocks, NUMA emulation in conjunction
with cpusets allows one or more blocks to be reserved for critical applications
by configuring a set of cpus and one or more memory nodes for them.

Memory cgroups will not support such reservation. Hope this helps explain the
use-case I am trying to address with this patch.

thanks,
-- Shuah
Re: [PATCH v9 3/6] kernel/reboot.c: export pm_power_off_prepare
On Mon, Sep 10, 2018 at 04:19:26PM +0100, Mark Brown wrote:
> On Sun, Sep 09, 2018 at 10:00:23AM +0800, Shawn Guo wrote:
> > On Thu, Sep 06, 2018 at 11:15:17AM +0100, Mark Brown wrote:
>
> > > I was expecting to get a pull request with the precursor patches in it -
> > > the regulator driver seems to get a moderate amount of development so
> > > there's a reasonable risk of conflicts.
>
> > What about you create a stable topic branch for regulator patches and I
> > pull it into IMX tree?
>
> Sure, I can send a pull request back but the first two patches in the
> series are ARM ones - are you OK with me just applying them and sending
> them in the pull request or do you want to apply them first?

I just took another look at the series. It seems that there is no
build-time dependency between regulator and platform patches. So I
think we can handle the series like:

 - You apply patch #3, #4 and #5 on regulator tree;
 - I apply the rest on IMX tree.

There shouldn't be any build or run time regression on either tree,
and the feature that the series adds will be available when both trees
get merged together on -next or Linus tree.

@Oleksij

Is my understanding above correct?

Shawn
[PATCHv3 5/6] tty: Simplify tty->count math in tty_reopen()
As noted by Jiri, tty_ldisc_reinit() shouldn't rely on tty counter.
Simplify math by increasing the counter after reinit success.

Cc: Greg Kroah-Hartman
Cc: Jiri Slaby
Link: lkml.kernel.org/r/<20180829022353.23568-2-d...@arista.com>
Suggested-by: Jiri Slaby
Reviewed-by: Jiri Slaby
Signed-off-by: Dmitry Safonov
---
 drivers/tty/tty_io.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index a947719b4626..7f968ac14bbd 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1268,17 +1268,13 @@ static int tty_reopen(struct tty_struct *tty)
 		return -EBUSY;
 	tty_ldisc_lock(tty, MAX_SCHEDULE_TIMEOUT);
+	if (!tty->ldisc)
+		retval = tty_ldisc_reinit(tty, tty->termios.c_line);
+	tty_ldisc_unlock(tty);
-	tty->count++;
-	if (tty->ldisc)
-		goto out_unlock;
+	if (retval == 0)
+		tty->count++;
-	retval = tty_ldisc_reinit(tty, tty->termios.c_line);
-	if (retval)
-		tty->count--;
-
-out_unlock:
-	tty_ldisc_unlock(tty);
 	return retval;
 }
-- 
2.13.6
[PATCHv3 4/6] tty/lockdep: Add ldisc_sem asserts
Make sure under CONFIG_LOCKDEP that each change to line discipline
is done with the write semaphore held. Otherwise a potential reader will
have a good time dereferencing incomplete/uninitialized ldisc.

Exception here is tty_ldisc_open(), as it's called without ldisc_sem
locked by tty_init_dev() for the tty->link.

Cc: Greg Kroah-Hartman
Cc: Jiri Slaby
Signed-off-by: Dmitry Safonov
---
 drivers/tty/tty_ldisc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index fc4c97cae01e..202cb645582f 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -471,6 +471,7 @@ static int tty_ldisc_open(struct tty_struct *tty, struct tty_ldisc *ld)
 static void tty_ldisc_close(struct tty_struct *tty, struct tty_ldisc *ld)
 {
+	lockdep_assert_held(&tty->ldisc_sem);
 	WARN_ON(!test_bit(TTY_LDISC_OPEN, &tty->flags));
 	clear_bit(TTY_LDISC_OPEN, &tty->flags);
 	if (ld->ops->close)
@@ -492,6 +493,7 @@ static int tty_ldisc_failto(struct tty_struct *tty, int ld)
 	struct tty_ldisc *disc = tty_ldisc_get(tty, ld);
 	int r;
+	lockdep_assert_held(&tty->ldisc_sem);
 	if (IS_ERR(disc))
 		return PTR_ERR(disc);
 	tty->ldisc = disc;
@@ -615,6 +617,7 @@ EXPORT_SYMBOL_GPL(tty_set_ldisc);
  */
 static void tty_ldisc_kill(struct tty_struct *tty)
 {
+	lockdep_assert_held(&tty->ldisc_sem);
 	if (!tty->ldisc)
 		return;
 	/*
@@ -662,6 +665,7 @@ int tty_ldisc_reinit(struct tty_struct *tty, int disc)
 	struct tty_ldisc *ld;
 	int retval;
+	lockdep_assert_held(&tty->ldisc_sem);
 	ld = tty_ldisc_get(tty, disc);
 	if (IS_ERR(ld)) {
 		BUG_ON(disc == N_TTY);
@@ -825,6 +829,7 @@ int tty_ldisc_init(struct tty_struct *tty)
  */
 void tty_ldisc_deinit(struct tty_struct *tty)
 {
+	/* no ldisc_sem, tty is being destroyed */
 	if (tty->ldisc)
 		tty_ldisc_put(tty->ldisc);
 	tty->ldisc = NULL;
-- 
2.13.6
[PATCHv3 1/6] tty: Drop tty->count on tty_reopen() failure
In case of tty_ldisc_reinit() failure, tty->count should be decremented
back, otherwise we will never release_tty().
Tetsuo reported that it fixes noisy warnings on tty release like:
  pts pts4033: tty_release: tty->count(10529) != (#fd's(7) + #kopen's(0))

Fixes: commit 892d1fa7eaae ("tty: Destroy ldisc instance on hangup")

Cc: sta...@vger.kernel.org # v4.6+
Cc: Greg Kroah-Hartman
Cc: Jiri Slaby
Reviewed-by: Jiri Slaby
Tested-by: Jiri Slaby
Tested-by: Tetsuo Handa
Signed-off-by: Dmitry Safonov
---
 drivers/tty/tty_io.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 32bc3e3fe4d3..5e5da9acaf0a 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1255,6 +1255,7 @@ static void tty_driver_remove_tty(struct tty_driver *driver, struct tty_struct *
 static int tty_reopen(struct tty_struct *tty)
 {
 	struct tty_driver *driver = tty->driver;
+	int retval;
 	if (driver->type == TTY_DRIVER_TYPE_PTY &&
 	    driver->subtype == PTY_TYPE_MASTER)
@@ -1268,10 +1269,14 @@ static int tty_reopen(struct tty_struct *tty)
 	tty->count++;
-	if (!tty->ldisc)
-		return tty_ldisc_reinit(tty, tty->termios.c_line);
+	if (tty->ldisc)
+		return 0;
-	return 0;
+	retval = tty_ldisc_reinit(tty, tty->termios.c_line);
+	if (retval)
+		tty->count--;
+
+	return retval;
 }
 /**
-- 
2.13.6
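The counter leak described in this patch can be reproduced in a few lines of userspace C. The following is a sketch under stated assumptions, not the kernel code: `struct toy_tty`, `toy_reinit()` and the `fail` parameter are hypothetical and only mimic the control flow shown in the diff (locking and the PTY checks are left out).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct tty_struct. */
struct toy_tty {
	int count;	/* open count, like tty->count */
	int has_ldisc;	/* non-zero when a line discipline is attached */
};

/* Stand-in for tty_ldisc_reinit(): fails on request, else attaches. */
static int toy_reinit(struct toy_tty *tty, int fail)
{
	assert(tty != NULL);
	if (fail)
		return -1;
	tty->has_ldisc = 1;
	return 0;
}

/* Old flow: the count is bumped first and never undone on failure. */
static int toy_reopen_old(struct toy_tty *tty, int fail)
{
	tty->count++;
	if (!tty->has_ldisc)
		return toy_reinit(tty, fail);
	return 0;
}

/* Patched flow: the increment is rolled back when reinit fails. */
static int toy_reopen_fixed(struct toy_tty *tty, int fail)
{
	int retval;

	tty->count++;
	if (tty->has_ldisc)
		return 0;

	retval = toy_reinit(tty, fail);
	if (retval)
		tty->count--;

	return retval;
}
```

With a failing reinit, `toy_reopen_old()` leaves the count one higher than the number of successful opens, while `toy_reopen_fixed()` leaves it unchanged, which is the mismatch the tty_release warning above complains about.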
[PATCHv3 6/6] tty/ldsem: Decrement wait_readers on timeouted down_read()
It seems like when ldsem_down_read() fails with timeout, it misses
the update for sem->wait_readers. For that reason, when the writer finally
releases the write end of the semaphore, __ldsem_wake_readers() adjusts
sem->count with a wrong value:
sem->wait_readers * (LDSEM_ACTIVE_BIAS - LDSEM_WAIT_BIAS)

I.e., if the update comes with 1 missed wait_readers decrement, sem->count
will be 0x10001, which means that there is an active reader, and it'll
make any further writer fail to acquire the semaphore.

It looks like this is dead code, because ldsem_down_read() is never
called with a timeout different than MAX_SCHEDULE_TIMEOUT, so it might be
worth deleting the timeout parameter and the error path fall-back.

Cc: Greg Kroah-Hartman
Cc: Jiri Slaby
Signed-off-by: Dmitry Safonov
---
 drivers/tty/tty_ldsem.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c
index 832accbbcb6d..f7966ab7b450 100644
--- a/drivers/tty/tty_ldsem.c
+++ b/drivers/tty/tty_ldsem.c
@@ -237,6 +237,7 @@ down_read_failed(struct ld_semaphore *sem, long count, long timeout)
 		raw_spin_lock_irq(&sem->wait_lock);
 		if (waiter.task) {
 			atomic_long_add_return(-LDSEM_WAIT_BIAS, &sem->count);
+			sem->wait_readers--;
 			list_del(&waiter.list);
 			raw_spin_unlock_irq(&sem->wait_lock);
 			put_task_struct(waiter.task);
-- 
2.13.6
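The arithmetic in this commit message can be checked with a small model. This is a sketch only: the `TOY_*` constants and helper names are hypothetical, and a 16-bit active mask is assumed purely so that the numbers line up with the 0x10001 quoted above; the real definitions live in drivers/tty/tty_ldsem.c and may differ per architecture. The model also assumes each waiting reader contributes one wait bias to the count.

```c
#include <assert.h>

/* Assumed layout: low 16 bits count active holders (illustrative only). */
#define TOY_ACTIVE_MASK	0xffffL
#define TOY_ACTIVE_BIAS	1L
#define TOY_WAIT_BIAS	(-TOY_ACTIVE_MASK - 1)

/*
 * Model of the adjustment quoted in the message: every reader on the
 * wait list is converted from "waiting" to "active" in one step.
 */
static long toy_wake_readers(long count, long wait_readers)
{
	assert(wait_readers >= 0);
	return count + wait_readers * (TOY_ACTIVE_BIAS - TOY_WAIT_BIAS);
}

/* Model of a reader dropping the semaphore. */
static long toy_up_read(long count)
{
	return count - TOY_ACTIVE_BIAS;
}
```

Starting from a count of `TOY_WAIT_BIAS` (one reader really waiting), waking with `wait_readers == 1` and then releasing brings the count back to 0. If a timed-out reader left `wait_readers` at 2, the same sequence ends at 0x10001, so the semaphore appears to have an active reader that does not exist.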