INFO: rcu detected stall in kvm_vcpu_ioctl

2018-09-10 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:    3d0e7a9e00fd Merge tag 'md/4.19-rc2' of git://git.kernel.o..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1666429e40
kernel config:  https://syzkaller.appspot.com/x/.config?x=8f59875069d721b6
dashboard link: https://syzkaller.appspot.com/bug?extid=e9b1e8f574404b6e4ed3
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+e9b1e8f574404b6e4...@syzkaller.appspotmail.com

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 	(detected by 0, t=10502 jiffies, g=45997, q=77)
rcu: All QSes seen, last rcu_preempt kthread activity 10502 (4294979638-4294969136), jiffies_till_next_fqs=1, root ->qsmask 0x0

syz-executor7   R  running task    22096 16667   5475 0x
Call Trace:
 <IRQ>
 sched_show_task.cold.83+0x2b6/0x30a kernel/sched/core.c:5296
 print_other_cpu_stall.cold.79+0xa83/0xba5 kernel/rcu/tree.c:1430
 check_cpu_stall kernel/rcu/tree.c:1557 [inline]
 __rcu_pending kernel/rcu/tree.c:3276 [inline]
 rcu_pending kernel/rcu/tree.c:3319 [inline]
 rcu_check_callbacks+0xafc/0x1990 kernel/rcu/tree.c:2665
 update_process_times+0x2d/0x70 kernel/time/timer.c:1636
 tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
 tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
 __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
 __hrtimer_run_queues+0x41c/0x10d0 kernel/time/hrtimer.c:1460
 hrtimer_interrupt+0x313/0x780 kernel/time/hrtimer.c:1518
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1029 [inline]
 smp_apic_timer_interrupt+0x1a1/0x760 arch/x86/kernel/apic/apic.c:1054
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
 </IRQ>
RIP: 0010:__sanitizer_cov_trace_const_cmp4+0x0/0x20 kernel/kcov.c:183
Code: a6 fe ff ff 5d c3 0f 1f 40 00 55 0f b7 d6 0f b7 f7 bf 03 00 00 00 48  
89 e5 48 8b 4d 08 e8 88 fe ff ff 5d c3 66 0f 1f 44 00 00 <55> 89 f2 89 fe  
bf 05 00 00 00 48 89 e5 48 8b 4d 08 e8 6a fe ff ff

RSP: 0018:88019baf7858 EFLAGS: 0246 ORIG_RAX: ff13
RAX:  RBX: 88019ef30700 RCX: c90001ed4000
RDX: 0004 RSI:  RDI: 
RBP: 88019baf78d8 R08: 8801bd9ea700 R09: 112b43cd
R10: 88019baf7860 R11: 8801dae23993 R12: 
R13: 0007 R14: 0007 R15: dc00
 kvm_vcpu_ioctl+0x72b/0x1150 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2590
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:501 [inline]
 do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:685
 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:702
 __do_sys_ioctl fs/ioctl.c:709 [inline]
 __se_sys_ioctl fs/ioctl.c:707 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:707
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457099
Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00

RSP: 002b:7f8361215c78 EFLAGS: 0246 ORIG_RAX: 0010
RAX: ffda RBX: 7f83612166d4 RCX: 00457099
RDX:  RSI: ae80 RDI: 0006
RBP: 009300a0 R08:  R09: 
R10:  R11: 0246 R12: 
R13: 004cf730 R14: 004c59b9 R15: 
rcu: rcu_preempt kthread starved for 10502 jiffies! g45997 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1

rcu: RCU grace-period kthread stack dump:
rcu_preempt R  running task2287210  2 0x8000
Call Trace:
 context_switch kernel/sched/core.c:2825 [inline]
 __schedule+0x86c/0x1ed0 kernel/sched/core.c:3473
 schedule+0xfe/0x460 kernel/sched/core.c:3517
 schedule_timeout+0x140/0x260 kernel/time/timer.c:1804
 rcu_gp_kthread+0x9d9/0x2310 kernel/rcu/tree.c:2194
 kthread+0x35a/0x420 kernel/kthread.c:246
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413
sched: RT throttling activated


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.
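The user-space half of the trace identifies which ioctl was running when the stall hit. On x86_64, ORIG_RAX 0x10 is the ioctl syscall number, RDI holds the file descriptor (6), and RSI holds the request number 0xae80 (the dump above has its leading zeros stripped). The sketch below decodes that request with the asm-generic `_IOC` bit layout that x86 uses. It is a hedged illustration: `struct ioc_fields`, `decode_ioctl()` and `is_kvm_run()` are hypothetical helpers that reimplement the field extraction instead of including `<linux/ioctl.h>`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical decoder for the asm-generic ioctl request layout used on x86:
 *   bits  0-7   nr   (command number)
 *   bits  8-15  type (driver "magic"; KVMIO is 0xAE)
 *   bits 16-29  size (size of the argument struct)
 *   bits 30-31  dir  (0 = none, 1 = write, 2 = read)
 * Reimplemented here so the sketch builds without kernel headers.
 */
struct ioc_fields {
	unsigned int dir, size, type, nr;
};

static struct ioc_fields decode_ioctl(unsigned int cmd)
{
	struct ioc_fields f;

	f.nr   = cmd & 0xffu;
	f.type = (cmd >> 8) & 0xffu;
	f.size = (cmd >> 16) & 0x3fffu;
	f.dir  = (cmd >> 30) & 0x3u;
	return f;
}

/* <linux/kvm.h> defines KVM_RUN as _IO(KVMIO, 0x80): no direction, no size. */
static bool is_kvm_run(unsigned int cmd)
{
	struct ioc_fields f = decode_ioctl(cmd);

	return f.type == 0xaeu && f.nr == 0x80u && f.dir == 0u && f.size == 0u;
}
```

With the value from the dump, `decode_ioctl(0xae80)` yields type 0xae and nr 0x80, so the stalled task was most likely inside KVM_RUN on fd 6, which is consistent with the `kvm_vcpu_ioctl` frame in the kernel trace.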


Re: [PATCHv3 2/6] tty/ldsem: Update waiter->task before waking up reader

2018-09-10 Thread Sergey Senozhatsky
On (09/11/18 14:04), Sergey Senozhatsky wrote:
> > for (;;) {
> > set_current_state(TASK_UNINTERRUPTIBLE);
> 
> I think that set_current_state() also executes memory barrier. Just
> because it accesses task state.
> 
> > -   if (!waiter.task)
> > +   if (!READ_ONCE(waiter.task))
> > break;
> > if (!timeout)
> > break;

This READ_ONCE(waiter.task) looks interesting. Maybe it could be moved
into the loop condition:

while (READ_ONCE(waiter.task)) {
	...
}

-ss
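The handshake under discussion can be modelled in user space with C11 atomics and POSIX threads. This is a hedged analogue, not ldsem code: `struct waiter`, `waker()` and `wait_for_grant()` are hypothetical names; a release store and an acquire load stand in for the ordering the kernel gets from WRITE_ONCE()/READ_ONCE() together with the barriers in the wake-up and set_current_state() paths; and sched_yield() stands in for schedule(). The waiter exits on the same condition as the quoted `if (!READ_ONCE(waiter.task)) break;`, that is, it keeps waiting while ->task is still set.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the ldsem waiter record. */
struct waiter {
	_Atomic(void *) task;	/* non-NULL until the waker hands off */
	int granted;		/* plain data published by the waker */
};

/* Waker: publish the grant first, then clear ->task with release ordering. */
static void *waker(void *arg)
{
	struct waiter *w = arg;

	w->granted = 1;
	atomic_store_explicit(&w->task, NULL, memory_order_release);
	return NULL;
}

/*
 * Waiter: the ->task check is the loop condition; keep waiting while it
 * is still set.  The acquire load pairs with the waker's release store,
 * so ->granted is guaranteed visible once the loop exits.
 */
static int wait_for_grant(struct waiter *w)
{
	while (atomic_load_explicit(&w->task, memory_order_acquire) != NULL)
		sched_yield();	/* stand-in for schedule() */
	return w->granted;
}
```

The release/acquire pairing is what lets the waiter read ->granted without a lock once it observes ->task == NULL.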


[RFC PATCH 2/9] mm: introduce smp_list_del for concurrent list entry removals

2018-09-10 Thread Aaron Lu
From: Daniel Jordan 

Now that the LRU lock is a RW lock, lay the groundwork for fine-grained
synchronization so that multiple threads holding the lock as reader can
safely remove pages from an LRU at the same time.

Add a thread-safe variant of list_del called smp_list_del that allows
multiple threads to delete nodes from a list, and wrap this new list API
in smp_del_page_from_lru to get the LRU statistics updates right.

For bisectability's sake, call the new function only when holding
lru_lock as writer.  In the next patch, switch to taking it as reader.

The algorithm is explained in detail in the comments.  Yosef Lev
conceived of the algorithm, and this patch is heavily based on an
earlier version from him.  Thanks to Dave Dice for suggesting the
prefetch.

[aaronlu: only take list related code here]
Signed-off-by: Yosef Lev 
Signed-off-by: Daniel Jordan 
---
 include/linux/list.h |   2 +
 lib/Makefile |   2 +-
 lib/list.c   | 158 +++
 3 files changed, 161 insertions(+), 1 deletion(-)
 create mode 100644 lib/list.c

diff --git a/include/linux/list.h b/include/linux/list.h
index de04cc5ed536..0fd9c87dd14b 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -47,6 +47,8 @@ static inline bool __list_del_entry_valid(struct list_head *entry)
 }
 #endif
 
+extern void smp_list_del(struct list_head *entry);
+
 /*
  * Insert a new entry between two known consecutive entries.
  *
diff --git a/lib/Makefile b/lib/Makefile
index ca3f7ebb900d..9527b7484653 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -38,7 +38,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \
 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
 percpu-refcount.o rhashtable.o reciprocal_div.o \
-once.o refcount.o usercopy.o errseq.o bucket_locks.o
+once.o refcount.o usercopy.o errseq.o bucket_locks.o list.o
 obj-$(CONFIG_STRING_SELFTEST) += test_string.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/list.c b/lib/list.c
new file mode 100644
index ..4d0949ea1a09
--- /dev/null
+++ b/lib/list.c
@@ -0,0 +1,158 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (c) 2017, 2018 Oracle and/or its affiliates. All rights reserved.
+ *
+ * Authors: Yosef Lev 
+ *  Daniel Jordan 
+ */
+
+#include 
+#include 
+
+/*
+ * smp_list_del is a variant of list_del that allows concurrent list removals
+ * under certain assumptions.  The idea is to get away from overly coarse
+ * synchronization, such as using a lock to guard an entire list, which
+ * serializes all operations even though those operations might be happening on
+ * disjoint parts.
+ *
+ * If you want to use other functions from the list API concurrently,
+ * additional synchronization may be necessary.  For example, you could use a
+ * rwlock as a two-mode lock, where readers use the lock in shared mode and are
+ * allowed to call smp_list_del concurrently, and writers use the lock in
+ * exclusive mode and are allowed to use all list operations.
+ */
+
+/**
+ * smp_list_del - concurrent variant of list_del
+ * @entry: entry to delete from the list
+ *
+ * Safely removes an entry from the list in the presence of other threads that
+ * may try to remove adjacent entries.  Uses the entry's next field and the
+ * predecessor entry's next field as locks to accomplish this.
+ *
+ * Assumes that no two threads may try to delete the same entry.  This
+ * assumption holds, for example, if the objects on the list are
+ * reference-counted so that an object is only removed when its refcount falls
+ * to 0.
+ *
+ * @entry's next and prev fields are poisoned on return just as with list_del.
+ */
+void smp_list_del(struct list_head *entry)
+{
+   struct list_head *succ, *pred, *pred_reread;
+
+   /*
+* The predecessor entry's cacheline is read before it's written, so to
+* avoid an unnecessary cacheline state transition, prefetch for
+* writing.  In the common case, the predecessor won't change.
+*/
+   prefetchw(entry->prev);
+
+   /*
+* Step 1: Lock @entry E by making its next field point to its
+* predecessor D.  This prevents any thread from removing the
+* predecessor because that thread will loop in its step 4 while
+* E->next == D.  This also prevents any thread from removing the
+* successor F because that thread will see that F->prev->next != F in
+* the cmpxchg in its step 3.  Retry if the successor is being removed
+* and has already set this field to NULL in step 3.
+*/
+   succ = READ_ONCE(entry->next);
+   pred = READ_ONCE(entry->prev);
+   while (succ == NULL || cmpxchg(>next, succ, pred) != succ) {
+   /*
+* Reread @entry's successor because it may change until
+   

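Step 1 of the algorithm, as described in the comment above, can be modelled in user space with C11 atomics. This is a sketch of that single step only, not of smp_list_del as a whole (the remaining steps are cut off in this archive). `struct node` and `lock_entry()` are hypothetical names, `atomic_compare_exchange_strong()` stands in for the kernel's `cmpxchg()`, and only the successor is reread on retry, as the visible part of the patch does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type: a list_head whose links are C11 atomics. */
struct node {
	struct node *_Atomic next;
	struct node *_Atomic prev;
};

/*
 * Lock entry E by making E->next point at its predecessor D instead of
 * its successor F.  A NULL E->next means F is being removed and has
 * already cleared this field, so reread and retry.
 * Returns the displaced successor; *pred_out receives the predecessor.
 */
static struct node *lock_entry(struct node *entry, struct node **pred_out)
{
	struct node *succ = atomic_load(&entry->next);
	struct node *pred = atomic_load(&entry->prev);

	while (succ == NULL ||
	       !atomic_compare_exchange_strong(&entry->next, &succ, pred)) {
		/* The successor may have changed under us; reread it. */
		succ = atomic_load(&entry->next);
	}
	*pred_out = pred;
	return succ;
}
```

After a successful lock, E->next == D, which is the state that (per the comment) makes a thread removing D wait and makes a thread removing F fail its own cmpxchg.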
[RFC PATCH 8/9] mm: use smp_list_splice() on free path

2018-09-10 Thread Aaron Lu
With the free path running concurrently, cache bouncing on the free
list head is severe, since multiple threads can be freeing pages and
each free needs to add the page to the free list head.

To improve performance on free path for order-0 pages, we can
choose to not add the merged pages to Buddy immediately after
merge but keep them on a local percpu list first and then after
all pages are finished merging, add these merged pages to Buddy
with smp_list_splice() in one go.

This optimization causes a problem, though: a page held on the local
percpu list can be the buddy of another page being freed, and we lose
the merge opportunity for them. With this patch, Buddy can therefore
contain mergeable pages that were never merged.

Because of this, I don't see much value in keeping the range lock,
which exists to prevent exactly this from happening, so this patch
removes it.

Signed-off-by: Aaron Lu 
---
 include/linux/mm.h |   1 +
 include/linux/mmzone.h |   3 -
 init/main.c|   1 +
 mm/page_alloc.c| 151 +
 4 files changed, 95 insertions(+), 61 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a61ebe8ad4ca..a99ba2cb7a0d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2155,6 +2155,7 @@ extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
 extern void setup_per_zone_wmarks(void);
 extern int __meminit init_per_zone_wmark_min(void);
 extern void mem_init(void);
+extern void percpu_mergelist_init(void);
 extern void __init mmap_init(void);
 extern void show_mem(unsigned int flags, nodemask_t *nodemask);
 extern long si_mem_available(void);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 0ea52e9bb610..e66b8c63d5d1 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -467,9 +467,6 @@ struct zone {
/* Primarily protects free_area */
rwlock_tlock;
 
-   /* Protects merge operation for a range of order=(MAX_ORDER-1) pages */
-   spinlock_t  *range_locks;
-
/* Write-intensive fields used by compaction and vmstats. */
ZONE_PADDING(_pad2_)
 
diff --git a/init/main.c b/init/main.c
index 18f8f0140fa0..68a428e1bf15 100644
--- a/init/main.c
+++ b/init/main.c
@@ -517,6 +517,7 @@ static void __init mm_init(void)
 * bigger than MAX_ORDER unless SPARSEMEM.
 */
page_ext_init_flatmem();
+   percpu_mergelist_init();
mem_init();
kmem_cache_init();
pgtable_init();
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5f5cc671bcf7..df38c3f2a1cc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -339,17 +339,6 @@ static inline bool update_defer_init(pg_data_t *pgdat,
 }
 #endif
 
-/* Return a pointer to the spinblock for a pageblock this page belongs to */
-static inline spinlock_t *get_range_lock(struct page *page)
-{
-   struct zone *zone = page_zone(page);
-   unsigned long zone_start_pfn = zone->zone_start_pfn;
-   unsigned long range = (page_to_pfn(page) - zone_start_pfn) >>
-   (MAX_ORDER - 1);
-
-   return >range_locks[range];
-}
-
 /* Return a pointer to the bitmap storing bits affecting a block of pages */
 static inline unsigned long *get_pageblock_bitmap(struct page *page,
unsigned long pfn)
@@ -711,9 +700,15 @@ static inline void set_page_order(struct page *page, unsigned int order)
 static inline void add_to_buddy(struct page *page, struct zone *zone,
unsigned int order, int mt)
 {
+   /*
+* Add the page to the free list before setting the PageBuddy flag;
+* otherwise another thread doing a merge could notice the flag and
+* attempt to merge with the page, causing list corruption.
+*/
+   smp_list_add(>lru, >free_area[order].free_list[mt]);
+   smp_wmb();
set_page_order(page, order);
atomic_long_inc(>free_area[order].nr_free);
-   smp_list_add(>lru, >free_area[order].free_list[mt]);
 }
 
 static inline void rmv_page_order(struct page *page)
@@ -784,40 +779,17 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
return 0;
 }
 
-/*
- * Freeing function for a buddy system allocator.
- *
- * The concept of a buddy system is to maintain direct-mapped table
- * (containing bit values) for memory blocks of various "orders".
- * The bottom level table contains the map for the smallest allocatable
- * units of memory (here, pages), and each level above it describes
- * pairs of units from the levels below, hence, "buddies".
- * At a high level, all that happens here is marking the table entry
- * at the bottom level available, and propagating the changes upward
- * as necessary, plus some accounting needed to play nicely with other
- * parts of the VM system.
- * At each level, we keep a list of pages, which are heads of continuous
- * 

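A single-threaded sketch of the batching idea: merged pages are collected on a private list and moved to the shared free list in one O(1) splice, so the shared head is touched a constant number of times per batch rather than once per page. The hand-rolled `list_head` and `splice_init()` below are modelled on the kernel's list_splice_init(); they do not model smp_list_splice(), whose concurrent implementation is not part of this excerpt.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space copy of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert @n right after @head, like the kernel's list_add(). */
static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/*
 * Move every entry of @local to the front of @global and reinitialise
 * @local, like list_splice_init(): constant work however many entries move.
 */
static void splice_init(struct list_head *local, struct list_head *global)
{
	struct list_head *first = local->next, *last = local->prev;
	struct list_head *at = global->next;

	if (local->next == local)
		return;		/* nothing to move */
	first->prev = global;
	global->next = first;
	last->next = at;
	at->prev = last;
	list_init(local);
}

static size_t list_len(const struct list_head *h)
{
	size_t n = 0;

	for (const struct list_head *p = h->next; p != h; p = p->next)
		n++;
	return n;
}
```

Per page, the only writes go to the private list; the shared head is written during the single splice at the end.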
[RFC PATCH 1/9] mm: do not add anon pages to LRU

2018-09-10 Thread Aaron Lu
For testing purposes, do not add anonymous pages to the LRU. This
avoids the LRU lock so that the zone lock can be tested in isolation.

Signed-off-by: Aaron Lu 
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index c467102a5cbc..080641255b8b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3208,7 +3208,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, vmf->address, false);
mem_cgroup_commit_charge(page, memcg, false, false);
-   lru_cache_add_active_or_unevictable(page, vma);
+   //lru_cache_add_active_or_unevictable(page, vma);
 setpte:
set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
 
-- 
2.17.1



[RFC PATCH 9/9] mm: page_alloc: merge before sending pages to global pool

2018-09-10 Thread Aaron Lu
Now that mergeable pages can be left unmerged in Buddy, this patch is
a step toward reducing how often that happens.

Suppose two buddy pages are on the list to be freed in free_pcppages_bulk().
The first page goes to merge, but its buddy is not in Buddy yet, so we
hold it locally as an order-0 page. Then its buddy goes to merge and
cannot merge either, because the first page is held locally instead of
being in Buddy. The end result is two mergeable buddy pages that failed
to merge.

So this patch attempts to merge these to-be-freed pages with each other
before acquiring any lock, which reduces, to some extent, the
fragmentation caused by the previous patch.

With this change, the pcp_drain tracepoint is no longer easy to use, so I removed it.

Signed-off-by: Aaron Lu 
---
 mm/page_alloc.c | 75 +++--
 1 file changed, 73 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index df38c3f2a1cc..d3eafe857713 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1098,6 +1098,72 @@ void __init percpu_mergelist_init(void)
}
 }
 
+static inline bool buddy_in_list(struct page *page, struct page *buddy,
+struct list_head *list)
+{
+   list_for_each_entry_continue(page, list, lru)
+   if (page == buddy)
+   return true;
+
+   return false;
+}
+
+static inline void merge_in_pcp(struct list_head *list)
+{
+   int order;
+   struct page *page;
+
+   /* Set order information to 0 initially since they are PCP pages */
+   list_for_each_entry(page, list, lru)
+   set_page_private(page, 0);
+
+   /*
+* Check for mergable pages for each order.
+*
+* For each order, check if their buddy is also in the list and
+* if so, do merge, then remove the merged buddy from the list.
+*/
+   for (order = 0; order < MAX_ORDER - 1; order++) {
+   bool has_merge = false;
+
+   page = list_first_entry(list, struct page, lru);
+   while (&page->lru != list) {
+   unsigned long pfn, buddy_pfn, combined_pfn;
+   struct page *buddy, *n;
+
+   if (page_order(page) != order) {
+   page = list_next_entry(page, lru);
+   continue;
+   }
+
+   pfn = page_to_pfn(page);
+   buddy_pfn = __find_buddy_pfn(pfn, order);
+   buddy = page + (buddy_pfn - pfn);
+   if (!buddy_in_list(page, buddy, list) ||
+   page_order(buddy) != order) {
+   page = list_next_entry(page, lru);
+   continue;
+   }
+
+   combined_pfn = pfn & buddy_pfn;
+   if (combined_pfn == pfn) {
+   set_page_private(page, order + 1);
+   list_del(&buddy->lru);
+   page = list_next_entry(page, lru);
+   } else {
+   set_page_private(buddy, order + 1);
+   n = list_next_entry(page, lru);
+   list_del(&page->lru);
+   page = n;
+   }
+   has_merge = true;
+   }
+
+   if (!has_merge)
+   break;
+   }
+}
+
 /*
  * Frees a number of pages from the PCP lists
  * Assumes all pages on list are in same zone, and of same order.
@@ -1165,6 +1231,12 @@ static void free_pcppages_bulk(struct zone *zone, int count,
} while (--count && --batch_free && !list_empty(list));
}
 
+   /*
+* Before acquiring the possibly heavily contended zone lock, do merge
+* among these to-be-freed PCP pages before sending them to Buddy.
+*/
+   merge_in_pcp();
+
read_lock(&zone->lock);
isolated_pageblocks = has_isolate_pageblock(zone);
 
@@ -1182,10 +1254,9 @@ static void free_pcppages_bulk(struct zone *zone, int count,
if (unlikely(isolated_pageblocks))
mt = get_pageblock_migratetype(page);
 
-   order = 0;
+   order = page_order(page);
merged_page = do_merge(page, page_to_pfn(page), zone, , mt);
list_add(_page->lru, this_cpu_ptr(_lists[order][mt]));
-   trace_mm_page_pcpu_drain(page, 0, mt);
}
 
for_each_migratetype_order(order, migratetype) {
-- 
2.17.1



[RFC PATCH 5/9] mm/page_alloc: use helper functions to add/remove a page to/from buddy

2018-09-10 Thread Aaron Lu
Several places add a page to, or remove a page from, Buddy; introduce
helper functions for them.

This also makes it easier to add code that runs whenever a page is added
to or removed from Buddy.

No functionality change.
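The point of these helpers is that the list operation, the per-order nr_free counter and the buddy/order marking always change together. Below is a minimal user-space model of that invariant, assuming a hand-rolled circular list in place of <linux/list.h>; `struct toy_page`, `struct toy_zone` and the `toy_*` helpers are hypothetical names that only mirror add_to_buddy_head(), add_to_buddy_tail() and remove_from_buddy().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_ORDERS 11	/* assumption: MAX_ORDER of this era */

struct node { struct node *prev, *next; };

struct toy_page {
	struct node lru;
	unsigned int order;	/* stands in for page_private() */
	bool buddy;		/* stands in for PageBuddy */
};

struct toy_zone {
	struct node free_list[NR_ORDERS];
	unsigned long nr_free[NR_ORDERS];
};

static void toy_zone_init(struct toy_zone *z)
{
	for (int i = 0; i < NR_ORDERS; i++) {
		z->free_list[i].prev = z->free_list[i].next = &z->free_list[i];
		z->nr_free[i] = 0;
	}
}

static void link_between(struct node *n, struct node *prev, struct node *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Mirrors add_to_buddy_common(): mark the page and bump the counter. */
static void toy_add_common(struct toy_page *p, struct toy_zone *z, unsigned int order)
{
	p->order = order;
	p->buddy = true;
	z->nr_free[order]++;
}

static void toy_add_head(struct toy_page *p, struct toy_zone *z, unsigned int order)
{
	toy_add_common(p, z, order);
	link_between(&p->lru, &z->free_list[order], z->free_list[order].next);
}

static void toy_add_tail(struct toy_page *p, struct toy_zone *z, unsigned int order)
{
	toy_add_common(p, z, order);
	link_between(&p->lru, z->free_list[order].prev, &z->free_list[order]);
}

/* Mirrors remove_from_buddy(): unlink, drop the counter, clear the marking. */
static void toy_remove(struct toy_page *p, struct toy_zone *z, unsigned int order)
{
	assert(p->buddy && p->order == order);
	p->lru.prev->next = p->lru.next;
	p->lru.next->prev = p->lru.prev;
	p->lru.prev = p->lru.next = NULL;
	z->nr_free[order]--;
	p->buddy = false;
	p->order = 0;
}
```

Because every caller goes through these three entry points, a later patch only has to change them (for example, to use concurrent list primitives) rather than each open-coded site.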

Acked-by: Vlastimil Babka 
Signed-off-by: Aaron Lu 
---
 mm/page_alloc.c | 65 +
 1 file changed, 39 insertions(+), 26 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 38e39ccdd6d9..d0b954783f1d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -697,12 +697,41 @@ static inline void set_page_order(struct page *page, unsigned int order)
__SetPageBuddy(page);
 }
 
+static inline void add_to_buddy_common(struct page *page, struct zone *zone,
+   unsigned int order)
+{
+   set_page_order(page, order);
+   zone->free_area[order].nr_free++;
+}
+
+static inline void add_to_buddy_head(struct page *page, struct zone *zone,
+   unsigned int order, int mt)
+{
+   add_to_buddy_common(page, zone, order);
+   list_add(&page->lru, &zone->free_area[order].free_list[mt]);
+}
+
+static inline void add_to_buddy_tail(struct page *page, struct zone *zone,
+   unsigned int order, int mt)
+{
+   add_to_buddy_common(page, zone, order);
+   list_add_tail(&page->lru, &zone->free_area[order].free_list[mt]);
+}
+
 static inline void rmv_page_order(struct page *page)
 {
__ClearPageBuddy(page);
set_page_private(page, 0);
 }
 
+static inline void remove_from_buddy(struct page *page, struct zone *zone,
+   unsigned int order)
+{
+   list_del(&page->lru);
+   zone->free_area[order].nr_free--;
+   rmv_page_order(page);
+}
+
 /*
  * This function checks whether a page is free && is the buddy
  * we can coalesce a page and its buddy if
@@ -803,13 +832,10 @@ static inline void __free_one_page(struct page *page,
 * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page,
 * merge with it and move up one order.
 */
-   if (page_is_guard(buddy)) {
+   if (page_is_guard(buddy))
clear_page_guard(zone, buddy, order, migratetype);
-   } else {
-   list_del(&buddy->lru);
-   zone->free_area[order].nr_free--;
-   rmv_page_order(buddy);
-   }
+   else
+   remove_from_buddy(buddy, zone, order);
combined_pfn = buddy_pfn & pfn;
page = page + (combined_pfn - pfn);
pfn = combined_pfn;
@@ -841,8 +867,6 @@ static inline void __free_one_page(struct page *page,
}
 
 done_merging:
-   set_page_order(page, order);
-
/*
 * If this is not the largest possible page, check if the buddy
 * of the next-highest order is free. If it is, it's possible
@@ -859,15 +883,12 @@ static inline void __free_one_page(struct page *page,
higher_buddy = higher_page + (buddy_pfn - combined_pfn);
if (pfn_valid_within(buddy_pfn) &&
page_is_buddy(higher_page, higher_buddy, order + 1)) {
-   list_add_tail(&page->lru,
-   &zone->free_area[order].free_list[migratetype]);
-   goto out;
+   add_to_buddy_tail(page, zone, order, migratetype);
+   return;
}
}
 
-   list_add(&page->lru, &zone->free_area[order].free_list[migratetype]);
-out:
-   zone->free_area[order].nr_free++;
+   add_to_buddy_head(page, zone, order, migratetype);
 }
 
 /*
@@ -1805,9 +1826,7 @@ static inline void expand(struct zone *zone, struct page *page,
if (set_page_guard(zone, &page[size], high, migratetype))
continue;
 
-   list_add(&page[size].lru, &area->free_list[migratetype]);
-   area->nr_free++;
-   set_page_order(&page[size], high);
+   add_to_buddy_head(&page[size], zone, high, migratetype);
}
 }
 
@@ -1951,9 +1970,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
struct page, lru);
if (!page)
continue;
-   list_del(&page->lru);
-   rmv_page_order(page);
-   area->nr_free--;
+   remove_from_buddy(page, zone, current_order);
expand(zone, page, order, current_order, area, migratetype);
set_pcppage_migratetype(page, migratetype);
return page;
@@ -2871,9 +2888,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
}
 
/* Remove page from free list */
-   list_del(&page->lru);
-   zone->free_area[order].nr_free--;
-   rmv_page_order(page);
+   remove_from_buddy(page, zone, order);
 
/*
 

[RFC PATCH 0/9] Improve zone lock scalability using Daniel Jordan's list work

2018-09-10 Thread Aaron Lu
At this year's MM summit[0], Daniel Jordan and others proposed an
innovative technique that lets multiple threads concurrently call
list_del() at any position of a list and list_add() at the head of the
list, without taking a lock.

This technique may be useful for improving zone lock scalability, so
here is my attempt. This series is based on Daniel Jordan's most recent
patchset[1]. To make this series self-contained, two of his patches are
included here.

Scalability is best when multiple threads operate at different positions
of the list. Since the free path accesses (buddy) pages at random
positions on the free list during merging, it is a good fit for this
technique. This patchset makes the free path run concurrently.

Patch 1 is for testing purposes only: it removes the LRU lock from the
picture so we can better see how much this patchset improves the zone
lock.

Patches 2-3 are Daniel's work implementing concurrent list_del() and
list_add(); the new APIs are called smp_list_del() and smp_list_splice().

Patches 4-7 make the free path run concurrently by converting the zone
lock from a spinlock to an rwlock and having the free path take the zone
lock in read mode. To avoid complexity and problems, all other code paths
take the zone lock in write mode.

Patch 8 is an optimization that reduces free list head accesses to avoid
severe cache bouncing. It has a side effect: with this patch, some
mergeable pages are left unmerged in Buddy.

Patch 9 mitigates the fragmentation introduced by patch 8 by pre-merging
pages before they are sent to merge under the zone lock.

This patchset is based on v4.19-rc2.

Performance on a 2-socket Intel Skylake server with 56 cores/112 threads,
using will-it-scale/page_fault1 in process mode (higher is better):

kernel   performance (vs patch1)   zone lock contention
patch1    9219349                  76.99%
patch7    2461133  -73.3%          54.46% (another 34.66% on smp_list_add())
patch8   11712766  +27.0%          68.14%
patch9   11386980  +23.5%          67.18%
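The percentages in the performance column are relative changes against the patch1 baseline, (x - base) / base. A small integer-only sketch that reproduces them in tenths of a percent (per mille), rounding half away from zero; `rel_change_permille()` is a hypothetical helper:

```c
#include <assert.h>

/* Relative change of @x against @base, in tenths of a percent, rounded. */
static long long rel_change_permille(long long base, long long x)
{
	long long num = (x - base) * 1000;

	assert(base > 0);
	/* Round half away from zero without touching floating point. */
	return num >= 0 ? (num + base / 2) / base
			: -((-num + base / 2) / base);
}
```

For example, rel_change_permille(9219349, 2461133) gives -733, i.e. the -73.3% reported for patch7.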

Although lock contention dropped a lot with patch7, performance dropped
considerably due to severe cache bouncing on the free list head among
multiple threads freeing pages at the same time, because every page free
needs to add the page to the free list head.

Patch8 is meant to solve this cache bouncing problem and works well,
except for the side effect mentioned above of leaving mergeable pages
unmerged in Buddy. Patch9 reduces the fragmentation problem to some
extent, at the cost of a slight performance drop.

For comparison, here is the no_merge+cluster_alloc approach I posted
earlier[2]:

kernel                   performance (vs patch1)   zone lock contention
patch1                    9219349                  76.99%
no_merge                 11733153  +27.3%          69.18%
no_merge+cluster_alloc   12094893  +31.2%           0.73%

no_merge (skip merging for order-0 pages on the free path) has
performance and zone lock contention similar to patch8/9, while with
cluster_alloc, which also improves the allocation side, zone lock
contention for this workload is almost gone.

To see how fragmentation is affected by patch8 and how much patch9
improves it, here is /proc/buddyinfo after running
will-it-scale/page_fault1 for 3 minutes:

With patch7:
Node 0, zone      DMA      0      2      1      1      3      2      2      1      0      1      3
Node 0, zone    DMA32      7      3      6      5      5     10      6      7      6     10    410
Node 0, zone   Normal  17820  16819  14645  12969  11367   9229   6365   3062    756     69   5646
Node 1, zone   Normal  44789  60354  52331  37532  22071   9604   2750    241     32     11   6378

With patch8:
Node 0, zone      DMA      0      2      1      1      3      2      2      1      0      1      3
Node 0, zone    DMA32      7      9      5      4      5     10      6      7      6     10    410
Node 0, zone   Normal 404917 119614  79446  58303  20679   3106    222     89     28      9   5615
Node 1, zone   Normal 507659 127355  64470  53549  14104   1288     30      4      1      1   6078

With patch9:
Node 0, zone      DMA      0      3      0      1      3      0      1      0      1      1      3
Node 0, zone    DMA32 11423621705726702 60 14 5  6296
Node 0, zone   Normal  20407  21016  18731  16195  13697  10483   6873   3148    735     39   5637
Node 1, zone   Normal  79738  76963  59313  35996  18626   9743   3947    750     21      2   6080

Many more pages stay at order 0 with patch8 than with patch7;
consequently, there are fewer order-5-and-above blocks with patch8 than
with patch7, suggesting that some pages are not properly merged into
high-order pages with patch8 applied. Patch9 leaves far fewer pages at
order 0 than patch8, which is a good sign, but it is still not as good
as patch7.
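One rough way to put a single number on these rows is the share of free memory held in blocks of order 5 or higher, weighting each count by the 2^order pages in a block of that order. A minimal sketch with hypothetical helper names, meant to be applied to the "Node 0, zone Normal" rows above (11 columns, orders 0 to 10):

```c
#include <assert.h>

#define NR_ORDERS 11	/* orders 0..10, as in the buddyinfo rows above */

/* Free pages held in blocks of order >= @min_order. */
static unsigned long long pages_at_or_above(const unsigned long counts[NR_ORDERS],
					    int min_order)
{
	unsigned long long pages = 0;

	assert(min_order >= 0 && min_order < NR_ORDERS);
	for (int order = min_order; order < NR_ORDERS; order++)
		pages += (unsigned long long)counts[order] << order;
	return pages;
}

/* Per-mille share of free memory in blocks of order >= @min_order (truncated). */
static unsigned long long high_order_permille(const unsigned long counts[NR_ORDERS],
					      int min_order)
{
	unsigned long long total = pages_at_or_above(counts, 0);

	return total ? pages_at_or_above(counts, min_order) * 1000 / total : 0;
}
```

For the Node 0 Normal rows with min_order = 5 this gives 947 per mille with patch7, 769 with patch8 and 936 with patch9, the same ordering as described above. The order-10 column dominates all three, so this is only a coarse indicator.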

For comparison, here is the result with no_merge (think of it as the
worst case regarding fragmentation):

With no_merge:
Node 0, zone  DMA  0  2  1 

[RFC PATCH 3/9] mm: introduce smp_list_splice to prepare for concurrent LRU adds

2018-09-10 Thread Aaron Lu
From: Daniel Jordan 

Now that we splice a local list onto the LRU, prepare for multiple tasks
doing this concurrently by adding a variant of the kernel's list
splicing API, list_splice, that's designed to work with multiple tasks.

Although there is naturally less parallelism to be gained from locking
the LRU head this way, the main benefit is to allow removals to happen
concurrently. With lru_lock as it is today, an add needlessly blocks
removal of any page but the first in the LRU.

For now, hold lru_lock as writer to serialize the adds to ensure the
function is correct for a single thread at a time.

Yosef Lev came up with this algorithm.
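The head-locking idea can be tried out in user space. The following is a hedged C11 sketch, not the kernel code: it models only concurrent splices (no concurrent smp_list_del()), uses <stdatomic.h> acquire/release ordering in place of the kernel's cmpxchg() and smp_wmb(), and `splice_at_head()`, `run_concurrent_splices()` and the other names are hypothetical. As in smp_list_splice(), a splicer "locks" the head by swapping head->next to NULL, links its private list in, and publishes the result with a single store that also unlocks the head.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Only ->next is atomic: on the head it doubles as the splice lock. */
struct node {
	_Atomic(struct node *) next;
	struct node *prev;
};

static void list_init(struct node *h)
{
	atomic_init(&h->next, h);
	h->prev = h;
}

/* Splice the non-empty private list @list in right after @head. */
static void splice_at_head(struct node *list, struct node *head)
{
	struct node *first = atomic_load_explicit(&list->next, memory_order_relaxed);
	struct node *last = list->prev;
	struct node *succ;

	/* Lock the head by swapping head->next to NULL; wait while it is held. */
	for (;;) {
		succ = atomic_load_explicit(&head->next, memory_order_relaxed);
		if (succ != NULL &&
		    atomic_compare_exchange_weak_explicit(&head->next, &succ, NULL,
				memory_order_acquire, memory_order_relaxed))
			break;
		sched_yield();
	}

	first->prev = head;
	atomic_store_explicit(&last->next, succ, memory_order_relaxed);
	succ->prev = last;	/* only the head-lock holder writes here */

	/* Complete the splice and unlock the head with one release store. */
	atomic_store_explicit(&head->next, first, memory_order_release);
}

#define PER_THREAD 1000
#define BATCH 4

struct worker { struct node *head; struct node *pool; };

static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	for (int i = 0; i < PER_THREAD; i += BATCH) {
		struct node local;

		list_init(&local);
		for (int k = 0; k < BATCH; k++) {	/* private list, not shared yet */
			struct node *n = &w->pool[i + k];

			n->prev = local.prev;
			atomic_store_explicit(&n->next, &local, memory_order_relaxed);
			atomic_store_explicit(&local.prev->next, n, memory_order_relaxed);
			local.prev = n;
		}
		splice_at_head(&local, w->head);
	}
	return NULL;
}

/* Run @nthreads splicers; return the node count, or -1 on a broken link. */
static long run_concurrent_splices(int nthreads)
{
	struct node head;
	pthread_t tid[16];
	struct worker w[16];
	long count = 0;

	assert(nthreads > 0 && nthreads <= 16);
	list_init(&head);
	for (int t = 0; t < nthreads; t++) {
		w[t].head = &head;
		w[t].pool = calloc(PER_THREAD, sizeof(struct node));
		assert(w[t].pool != NULL);
		int rc = pthread_create(&tid[t], NULL, worker_fn, &w[t]);
		assert(rc == 0);
		(void)rc;
	}
	for (int t = 0; t < nthreads; t++)
		pthread_join(tid[t], NULL);

	struct node *prev = &head, *p = atomic_load(&head.next);
	while (p != &head && count >= 0) {
		count = (p->prev == prev) ? count + 1 : -1;
		prev = p;
		p = atomic_load(&p->next);
	}
	if (count >= 0 && head.prev != prev)
		count = -1;

	for (int t = 0; t < nthreads; t++)
		free(w[t].pool);
	return count;
}
```

With four threads each splicing 1000 nodes in batches of 4, every node should end up on the list exactly once with consistent prev links, which is what `run_concurrent_splices()` checks. Older C libraries may need `-pthread` at link time.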

[aaronlu: drop LRU related code, keep only list related code]
Suggested-by: Yosef Lev 
Signed-off-by: Daniel Jordan 
---
 include/linux/list.h |  1 +
 lib/list.c   | 60 ++--
 2 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/include/linux/list.h b/include/linux/list.h
index 0fd9c87dd14b..5f203fb55939 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -48,6 +48,7 @@ static inline bool __list_del_entry_valid(struct list_head *entry)
 #endif
 
 extern void smp_list_del(struct list_head *entry);
+extern void smp_list_splice(struct list_head *list, struct list_head *head);
 
 /*
  * Insert a new entry between two known consecutive entries.
diff --git a/lib/list.c b/lib/list.c
index 4d0949ea1a09..104faa144abf 100644
--- a/lib/list.c
+++ b/lib/list.c
@@ -10,17 +10,18 @@
 #include 
 
 /*
- * smp_list_del is a variant of list_del that allows concurrent list removals
- * under certain assumptions.  The idea is to get away from overly coarse
- * synchronization, such as using a lock to guard an entire list, which
- * serializes all operations even though those operations might be happening on
- * disjoint parts.
+ * smp_list_del and smp_list_splice are variants of list_del and list_splice,
+ * respectively, that allow concurrent list operations under certain
+ * assumptions.  The idea is to get away from overly coarse synchronization,
+ * such as using a lock to guard an entire list, which serializes all
+ * operations even though those operations might be happening on disjoint
+ * parts.
  *
  * If you want to use other functions from the list API concurrently,
  * additional synchronization may be necessary.  For example, you could use a
  * rwlock as a two-mode lock, where readers use the lock in shared mode and are
- * allowed to call smp_list_del concurrently, and writers use the lock in
- * exclusive mode and are allowed to use all list operations.
+ * allowed to call smp_list_* functions concurrently, and writers use the lock
+ * in exclusive mode and are allowed to use all list operations.
  */
 
 /**
@@ -156,3 +157,48 @@ void smp_list_del(struct list_head *entry)
entry->next = LIST_POISON1;
entry->prev = LIST_POISON2;
 }
+
+/**
+ * smp_list_splice - thread-safe splice of two lists
+ * @list: the new list to add
+ * @head: the place to add it in the first list
+ *
+ * Safely handles concurrent smp_list_splice operations onto the same list head
+ * and concurrent smp_list_del operations of any list entry except @head.
+ * Assumes that @head cannot be removed.
+ */
+void smp_list_splice(struct list_head *list, struct list_head *head)
+{
+   struct list_head *first = list->next;
+   struct list_head *last = list->prev;
+   struct list_head *succ;
+
+   /*
+* Lock the front of @head by replacing its next pointer with NULL.
+* Should another thread be adding to the front, wait until it's done.
+*/
+   succ = READ_ONCE(head->next);
+   while (succ == NULL || cmpxchg(&head->next, succ, NULL) != succ) {
+   cpu_relax();
+   succ = READ_ONCE(head->next);
+   }
+
+   first->prev = head;
+   last->next = succ;
+
+   /*
+* It is safe to write to succ, head's successor, because locking head
+* prevents succ from being removed in smp_list_del.
+*/
+   succ->prev = last;
+
+   /*
+* Pairs with the implied full barrier before the cmpxchg above.
+* Ensures the write that unlocks the head is seen last to avoid list
+* corruption.
+*/
+   smp_wmb();
+
+   /* Simultaneously complete the splice and unlock the head node. */
+   WRITE_ONCE(head->next, first);
+}
-- 
2.17.1



[RFC PATCH 4/9] mm: convert zone lock from spinlock to rwlock

2018-09-10 Thread Aaron Lu
This patch converts the zone lock from a spinlock to an rwlock and always
takes the lock in write mode, so there is no functional change.

This is preparation for the free path to take the lock in read mode so
that frees can run concurrently.

compact_trylock and compact_unlock_should_abort are taken from
Daniel Jordan's patch.

Signed-off-by: Aaron Lu 
---
 include/linux/mmzone.h |  2 +-
 mm/compaction.c        | 90 +-
 mm/hugetlb.c           |  8 ++--
 mm/page_alloc.c        | 52 
 mm/page_isolation.c    | 12 +++---
 mm/vmstat.c            |  4 +-
 6 files changed, 85 insertions(+), 83 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 1e22d96734e0..84cfa56e2d19 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -465,7 +465,7 @@ struct zone {
unsigned long   flags;
 
/* Primarily protects free_area */
-   spinlock_t  lock;
+   rwlock_t        lock;
 
/* Write-intensive fields used by compaction and vmstats. */
ZONE_PADDING(_pad2_)
diff --git a/mm/compaction.c b/mm/compaction.c
index faca45ebe62d..6ecf74d8e287 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -347,20 +347,20 @@ static inline void update_pageblock_skip(struct compact_control *cc,
  * Returns true if the lock is held
  * Returns false if the lock is not held and compaction should abort
  */
-static bool compact_trylock_irqsave(spinlock_t *lock, unsigned long *flags,
-   struct compact_control *cc)
-{
-   if (cc->mode == MIGRATE_ASYNC) {
-   if (!spin_trylock_irqsave(lock, *flags)) {
-   cc->contended = true;
-   return false;
-   }
-   } else {
-   spin_lock_irqsave(lock, *flags);
-   }
-
-   return true;
-}
+#define compact_trylock(lock, flags, cc, lockf, trylockf) \
+({\
+   bool __ret = true; \
+   if ((cc)->mode == MIGRATE_ASYNC) { \
+   if (!trylockf((lock), *(flags))) { \
+   (cc)->contended = true;\
+   __ret = false; \
+   }  \
+   } else {   \
+   lockf((lock), *(flags));   \
+   }  \
+  \
+   __ret; \
+})
 
 /*
  * Compaction requires the taking of some coarse locks that are potentially
@@ -377,29 +377,29 @@ static bool compact_trylock_irqsave(spinlock_t *lock, unsigned long *flags,
  * Returns false when compaction can continue (sync compaction might have
  * scheduled)
  */
-static bool compact_unlock_should_abort(spinlock_t *lock,
-   unsigned long flags, bool *locked, struct compact_control *cc)
-{
-   if (*locked) {
-   spin_unlock_irqrestore(lock, flags);
-   *locked = false;
-   }
-
-   if (fatal_signal_pending(current)) {
-   cc->contended = true;
-   return true;
-   }
-
-   if (need_resched()) {
-   if (cc->mode == MIGRATE_ASYNC) {
-   cc->contended = true;
-   return true;
-   }
-   cond_resched();
-   }
-
-   return false;
-}
+#define compact_unlock_should_abort(lock, flags, locked, cc, unlockf) \
+({\
+   bool __ret = false;\
+  \
+   if (*(locked)) {   \
+   unlockf((lock), (flags));  \
+   *(locked) = false; \
+   }  \
+  \
+   if (fatal_signal_pending(current)) {   \
+   (cc)->contended = true;\
+   __ret = true;  \
+   } else if (need_resched()) {   \
+   if ((cc)->mode == MIGRATE_ASYNC) {   

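Patch 4/9 turns compact_trylock_irqsave() into a compact_trylock() macro that takes the blocking and non-blocking lock functions as arguments. The sketch below imitates that shape in userspace under stated assumptions: toy_lock_t and its helpers are hypothetical stand-ins for the kernel lock primitives, struct sketch_cc models only the two compact_control fields the macro touches, and the IRQ-flags argument is dropped. Like the kernel macro, it relies on a GNU C statement expression.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock standing in for spinlock_t or rwlock_t. */
typedef struct {
	atomic_flag held;
} toy_lock_t;

static bool toy_trylock(toy_lock_t *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

static void toy_lock(toy_lock_t *l)
{
	while (!toy_trylock(l))
		;	/* spin until the holder releases it */
}

static void toy_unlock(toy_lock_t *l)
{
	atomic_flag_clear(&l->held);
}

/* Hypothetical subset of struct compact_control. */
enum sketch_mode { SKETCH_MIGRATE_ASYNC, SKETCH_MIGRATE_SYNC };

struct sketch_cc {
	enum sketch_mode mode;
	bool contended;
};

/*
 * Same control flow as compact_trylock() in the patch, minus the IRQ
 * flags: async callers only try the lock and record contention on
 * failure; sync callers block until they get it.
 */
#define sketch_compact_trylock(lock, cc, lockf, trylockf)	\
({								\
	bool __ret = true;					\
	if ((cc)->mode == SKETCH_MIGRATE_ASYNC) {		\
		if (!trylockf(lock)) {				\
			(cc)->contended = true;			\
			__ret = false;				\
		}						\
	} else {						\
		lockf(lock);					\
	}							\
	__ret;							\
})
```

Taking the lock functions as macro arguments is what lets one macro body serve both the remaining spinlock_t call sites and the zone lock, which is now an rwlock_t.
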
[RFC PATCH 6/9] use atomic for free_area[order].nr_free

2018-09-10 Thread Aaron Lu
Since we will make the free path run concurrently, free_area[].nr_free
has to be atomic.

Signed-off-by: Aaron Lu 
---
 include/linux/mmzone.h |  2 +-
 mm/page_alloc.c        | 12 ++--
 mm/vmstat.c            |  4 ++--
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 84cfa56e2d19..e66b8c63d5d1 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -95,7 +95,7 @@ extern int page_group_by_mobility_disabled;
 
 struct free_area {
struct list_head        free_list[MIGRATE_TYPES];
-   unsigned long   nr_free;
+   atomic_long_t   nr_free;
 };
 
 struct pglist_data;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d0b954783f1d..dff3edc60d71 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -701,7 +701,7 @@ static inline void add_to_buddy_common(struct page *page, struct zone *zone,
unsigned int order)
 {
set_page_order(page, order);
-   zone->free_area[order].nr_free++;
+   atomic_long_inc(&zone->free_area[order].nr_free);
 }
 
 static inline void add_to_buddy_head(struct page *page, struct zone *zone,
@@ -728,7 +728,7 @@ static inline void remove_from_buddy(struct page *page, struct zone *zone,
unsigned int order)
 {
list_del(&page->lru);
-   zone->free_area[order].nr_free--;
+   atomic_long_dec(&zone->free_area[order].nr_free);
rmv_page_order(page);
 }
 
@@ -2225,7 +2225,7 @@ int find_suitable_fallback(struct free_area *area, unsigned int order,
int i;
int fallback_mt;
 
-   if (area->nr_free == 0)
+   if (atomic_long_read(&area->nr_free) == 0)
return -1;
 
*can_steal = false;
@@ -3178,7 +3178,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
struct free_area *area = &z->free_area[o];
int mt;
 
-   if (!area->nr_free)
+   if (atomic_long_read(&area->nr_free) == 0)
continue;
 
for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
@@ -5029,7 +5029,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
struct free_area *area = &zone->free_area[order];
int type;
 
-   nr[order] = area->nr_free;
+   nr[order] = atomic_long_read(&area->nr_free);
total += nr[order] << order;
 
types[order] = 0;
@@ -5562,7 +5562,7 @@ static void __meminit zone_init_free_lists(struct zone *zone)
unsigned int order, t;
for_each_migratetype_order(order, t) {
INIT_LIST_HEAD(&zone->free_area[order].free_list[t]);
-   zone->free_area[order].nr_free = 0;
+   atomic_long_set(&zone->free_area[order].nr_free, 0);
}
 }
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 06d79271a8ae..c1985550bb9f 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1030,7 +1030,7 @@ static void fill_contig_page_info(struct zone *zone,
unsigned long blocks;
 
/* Count number of free blocks */
-   blocks = zone->free_area[order].nr_free;
+   blocks = atomic_long_read(&zone->free_area[order].nr_free);
info->free_blocks_total += blocks;
 
/* Count free base pages */
@@ -1353,7 +1353,7 @@ static void frag_show_print(struct seq_file *m, pg_data_t *pgdat,
 
seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
for (order = 0; order < MAX_ORDER; ++order)
-   seq_printf(m, "%6lu ", zone->free_area[order].nr_free);
+   seq_printf(m, "%6lu ", atomic_long_read(&zone->free_area[order].nr_free));
seq_putc(m, '\n');
 }
 
-- 
2.17.1

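The reason for the conversion is that a plain nr_free++ is a separate load, add and store, so two CPUs updating the same counter without a common lock can lose an update; atomic_long_inc() and atomic_long_dec() make each update indivisible. A userspace analogue using C11 <stdatomic.h> is sketched below. struct sketch_free_area and its helpers are hypothetical, SKETCH_MAX_ORDER assumes the default MAX_ORDER of 11, and relaxed ordering is used because non-value-returning kernel atomics such as atomic_long_inc() do not imply any memory barrier either.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_MAX_ORDER 11	/* assumption: default MAX_ORDER */

/* Hypothetical userspace mirror of the counter in struct free_area. */
struct sketch_free_area {
	atomic_long nr_free;
};

/* Counterpart of atomic_long_set(..., 0) in zone_init_free_lists(). */
static void sketch_init(struct sketch_free_area *area)
{
	for (unsigned int order = 0; order < SKETCH_MAX_ORDER; order++)
		atomic_init(&area[order].nr_free, 0);
}

/* Counterpart of the atomic_long_inc() in add_to_buddy_common(). */
static void sketch_add(struct sketch_free_area *area, unsigned int order)
{
	atomic_fetch_add_explicit(&area[order].nr_free, 1, memory_order_relaxed);
}

/* Counterpart of the atomic_long_dec() in remove_from_buddy(). */
static void sketch_remove(struct sketch_free_area *area, unsigned int order)
{
	atomic_fetch_sub_explicit(&area[order].nr_free, 1, memory_order_relaxed);
}

/* Counterpart of the atomic_long_read() checks, e.g. in __zone_watermark_ok(). */
static long sketch_nr_free(struct sketch_free_area *area, unsigned int order)
{
	return atomic_load_explicit(&area[order].nr_free, memory_order_relaxed);
}
```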


[RFC PATCH 7/9] mm: use read_lock for free path

2018-09-10 Thread Aaron Lu
Daniel Jordan's patch has made it possible for multiple threads to
operate concurrently on a global list, using smp_list_del() at any
position and smp_list_add/splice() at the head position, without
taking any lock.

This patch applies this technique to the free lists. Since only adding
at the list head is safe alongside smp_list_del(), add_to_buddy_tail()
is removed and only add_to_buddy() is used.

Once the free path can run concurrently, multiple threads may free
pages at the same time. If two pages being freed are buddies, they can
miss the opportunity to be merged.

For this reason, introduce range locks to protect the merge operation:
within one range only one merge can happen at a time, and a page's
Buddy status is set while the lock is held. Each range covers an
order-(MAX_ORDER-1) block of pages, since a merge cannot exceed that
order.
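
The range lock index is computed in get_range_lock() as (pfn - zone_start_pfn) >> (MAX_ORDER - 1), so each lock covers 2^(MAX_ORDER-1) consecutive pfns counted from the zone start. The sketch below reproduces that arithmetic and checks that, when the zone start is aligned to the range size, a page and its buddy at any order below MAX_ORDER - 1 map to the same lock. SKETCH_MAX_ORDER = 11 and the helper names are assumptions for illustration; sketch_nr_range_locks() is hypothetical, since the allocation of range_locks is not shown here.

```c
#include <assert.h>

#define SKETCH_MAX_ORDER 11	/* assumption: default MAX_ORDER */

/* Same arithmetic as get_range_lock(): which range a pfn falls into. */
static unsigned long sketch_range_index(unsigned long pfn,
					unsigned long zone_start_pfn)
{
	return (pfn - zone_start_pfn) >> (SKETCH_MAX_ORDER - 1);
}

/* Hypothetical: locks needed for a zone spanning `spanned` pfns, rounded up. */
static unsigned long sketch_nr_range_locks(unsigned long spanned)
{
	unsigned long range = 1UL << (SKETCH_MAX_ORDER - 1);

	return (spanned + range - 1) / range;
}
```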

Signed-off-by: Aaron Lu 
---
 include/linux/list.h   |  1 +
 include/linux/mmzone.h |  3 ++
 lib/list.c             | 23 ++
 mm/page_alloc.c        | 95 +++---
 4 files changed, 78 insertions(+), 44 deletions(-)

diff --git a/include/linux/list.h b/include/linux/list.h
index 5f203fb55939..608e40f6489e 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -49,6 +49,7 @@ static inline bool __list_del_entry_valid(struct list_head *entry)
 
 extern void smp_list_del(struct list_head *entry);
 extern void smp_list_splice(struct list_head *list, struct list_head *head);
+extern void smp_list_add(struct list_head *entry, struct list_head *head);
 
 /*
  * Insert a new entry between two known consecutive entries.
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e66b8c63d5d1..0ea52e9bb610 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -467,6 +467,9 @@ struct zone {
/* Primarily protects free_area */
rwlock_t        lock;
 
+   /* Protects merge operation for a range of order=(MAX_ORDER-1) pages */
+   spinlock_t  *range_locks;
+
/* Write-intensive fields used by compaction and vmstats. */
ZONE_PADDING(_pad2_)
 
diff --git a/lib/list.c b/lib/list.c
index 104faa144abf..3ecf62b88c86 100644
--- a/lib/list.c
+++ b/lib/list.c
@@ -202,3 +202,26 @@ void smp_list_splice(struct list_head *list, struct list_head *head)
/* Simultaneously complete the splice and unlock the head node. */
WRITE_ONCE(head->next, first);
 }
+
+void smp_list_add(struct list_head *entry, struct list_head *head)
+{
+   struct list_head *succ;
+
+   /*
+* Lock the front of @head by replacing its next pointer with NULL.
+* Should another thread be adding to the front, wait until it's done.
+*/
+   succ = READ_ONCE(head->next);
+   while (succ == NULL || cmpxchg(&head->next, succ, NULL) != succ) {
+   cpu_relax();
+   succ = READ_ONCE(head->next);
+   }
+
+   entry->next = succ;
+   entry->prev = head;
+   succ->prev = entry;
+
+   smp_wmb();
+
+   WRITE_ONCE(head->next, entry);
+}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dff3edc60d71..5f5cc671bcf7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -339,6 +339,17 @@ static inline bool update_defer_init(pg_data_t *pgdat,
 }
 #endif
 
+/* Return a pointer to the spinlock for the range this page belongs to */
+static inline spinlock_t *get_range_lock(struct page *page)
+{
+   struct zone *zone = page_zone(page);
+   unsigned long zone_start_pfn = zone->zone_start_pfn;
+   unsigned long range = (page_to_pfn(page) - zone_start_pfn) >>
+   (MAX_ORDER - 1);
+
+   return &zone->range_locks[range];
+}
+
 /* Return a pointer to the bitmap storing bits affecting a block of pages */
 static inline unsigned long *get_pageblock_bitmap(struct page *page,
unsigned long pfn)
@@ -697,25 +708,12 @@ static inline void set_page_order(struct page *page, unsigned int order)
__SetPageBuddy(page);
 }
 
-static inline void add_to_buddy_common(struct page *page, struct zone *zone,
-   unsigned int order)
+static inline void add_to_buddy(struct page *page, struct zone *zone,
+   unsigned int order, int mt)
 {
set_page_order(page, order);
atomic_long_inc(&zone->free_area[order].nr_free);
-}
-
-static inline void add_to_buddy_head(struct page *page, struct zone *zone,
-   unsigned int order, int mt)
-{
-   add_to_buddy_common(page, zone, order);
-   list_add(&page->lru, &zone->free_area[order].free_list[mt]);
-}
-
-static inline void add_to_buddy_tail(struct page *page, struct zone *zone,
-   unsigned int order, int mt)
-{
-   add_to_buddy_common(page, zone, order);
-   list_add_tail(&page->lru, &zone->free_area[order].free_list[mt]);
+   

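The smp_list_add() added to lib/list.c uses head->next itself as a lock: a thread claims the front of the list by swapping head->next to NULL with cmpxchg(), links the new entry, and then publishes it by storing the entry into head->next, while other adders spin as long as they see NULL. A userspace sketch of the same idea with C11 atomics follows. The sketch_node type and the function names are hypothetical, the release store stands in for the smp_wmb() plus WRITE_ONCE() pair, and the sketch covers only concurrent adders, not the interaction with smp_list_del() that the series depends on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct list_head; next is atomic because a
 * NULL value in head->next means "front of list is being modified". */
struct sketch_node {
	struct sketch_node *_Atomic next;
	struct sketch_node *prev;
};

static void sketch_list_init(struct sketch_node *head)
{
	atomic_init(&head->next, head);
	head->prev = head;
}

static void sketch_list_add(struct sketch_node *entry, struct sketch_node *head)
{
	struct sketch_node *succ;

	/* Claim the front: swap head->next from a non-NULL value to NULL. */
	succ = atomic_load_explicit(&head->next, memory_order_relaxed);
	for (;;) {
		if (succ == NULL) {
			/* Another adder owns the front; wait for it to publish. */
			succ = atomic_load_explicit(&head->next, memory_order_relaxed);
			continue;
		}
		/* On failure, succ is refreshed with the current head->next. */
		if (atomic_compare_exchange_weak_explicit(&head->next, &succ, NULL,
							  memory_order_acquire,
							  memory_order_relaxed))
			break;
	}

	/* entry is not yet visible to other threads, so plain init is fine. */
	atomic_init(&entry->next, succ);
	entry->prev = head;
	succ->prev = entry;

	/* Publish; release ordering makes the stores above visible first. */
	atomic_store_explicit(&head->next, entry, memory_order_release);
}
```

Adding a, b and c in that order leaves the list as head, c, b, a, with head->prev pointing at a, exactly as repeated list_add() calls would.
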
Re: [RFC PATCH v2 2/2] fscrypt: enable RCU-walk path for .d_revalidate

2018-09-10 Thread Gao Xiang
Hi Eric,

On 2018/9/11 7:20, Eric Biggers wrote:
> Hi Gao,
> 
> On Mon, Sep 10, 2018 at 09:08:57PM +0800, Gao Xiang wrote:
>> This patch attempts to enable RCU-walk for fscrypt.
>> It looks harmless at glance and could have better
>> performance than do ref-walk only.
>>
>> Signed-off-by: Gao Xiang 
>> ---
>> change log v2:
>>  - READ_ONCE(dir->d_parent) -> READ_ONCE(dentry->d_parent)
>>
>>  fs/crypto/crypto.c | 22 +-
>>  1 file changed, 13 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
>> index b38c574..9bd21c0 100644
>> --- a/fs/crypto/crypto.c
>> +++ b/fs/crypto/crypto.c
>> @@ -319,20 +319,24 @@ static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags)
>>  {
>>  struct dentry *dir;
>>  int dir_has_key, cached_with_key;
>> -
>> -if (flags & LOOKUP_RCU)
>> -return -ECHILD;
>> -
>> -dir = dget_parent(dentry);
>> -if (!IS_ENCRYPTED(d_inode(dir))) {
>> -dput(dir);
>> +struct inode *dir_inode;
>> +
>> +rcu_read_lock();
>> +repeat:
>> +dir = READ_ONCE(dentry->d_parent);
>> +dir_inode = d_inode_rcu(dir);
>> +if (!IS_ENCRYPTED(dir_inode)) {
>> +rcu_read_unlock();
>>  return 0;
>>  }
>> +dir_has_key = (dir_inode->i_crypt_info != NULL);
>> +if (unlikely(READ_ONCE(dir->d_lockref.count) < 0 ||
>> +READ_ONCE(dentry->d_parent) != dir))
>>
>>
>> +rcu_read_unlock();
>>  
>>  cached_with_key = READ_ONCE(dentry->d_flags) &
>>  DCACHE_ENCRYPTED_WITH_KEY;
>> -dir_has_key = (d_inode(dir)->i_crypt_info != NULL);
>> -dput(dir);
>>  
> 
> I think you're right that we don't have to drop out of RCU mode here, but can
> you please Cc linux-fsdevel so that people more knowledgeable about path lookup
> can review this too?  This kind of stuff is very tricky.  Please resend both
> patches.
> 
> Also please indent properly:
> 
>   if (unlikely(READ_ONCE(dir->d_lockref.count) < 0 ||
>  READ_ONCE(dentry->d_parent) != dir))
>   goto repeat;
> 
> Why read d_lockref.count directly instead of using __lockref_is_dead()?

will be fixed in the next version, thanks.

> 
> Also since there's no longer any reference to the parent dentry taken, how do
> you know it's still positive (non-NULL d_inode), i.e. that the directory hasn't
> been removed and turned into a negative dentry (NULL d_inode)?

I think you are right. I came across this piece of fscrypt code while
locating a problem related to fscrypt (I am still working on it since
the problem is urgent). It seems that the parent could be turned into a
negative dentry by d_delete() etc.

I will rethink this flow, send the next version later, and Cc
linux-fsdevel next time.

> 
> I'm also wondering whether the retry loop is actually needed.  Can you explain
> your thoughts more?  But if it is needed, in principle you'd actually need to
> wait until after the loop before taking any action based on dir_inode, right?
> That would mean the 'rcu_read_unlock(); return 0;' is in the wrong place.

My thinking was that claiming the dentry is still valid needs a stricter
check than the other cases (so the IS_ENCRYPTED check does not need to
be as strict; that is just my personal thought, though).

If the parent dentry just sampled is invalid, then since the dentry and
inode are protected by RCU, READ_ONCE(dentry->d_parent) can no longer
equal dir.

Therefore I sample (IS_ENCRYPTED, dir_has_key) and then do a final basic
validity check, currently on the dentry itself (maybe on the inode
later). I prefer to retry, especially in the ref-walk case (which is not
governed by d_seq), since I think it is more lightweight (like a
seqlock) than taking and releasing d_lock (or even returning 0 to do a
real lookup again).

That is just my personal thought and may not be accurate; I am trying to
learn more about fscrypt because of the urgent problem.

If I have made any mistakes, please kindly point them out, thanks...

Thanks,
Gao Xiang

> 
> Thanks,
> 
> - Eric
> 

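Eric's last point, that any decision based on the sampled parent has to wait until the retry loop has confirmed the sample, can be illustrated with a userspace sketch. Everything in it is hypothetical: struct sketch_dir and struct sketch_dentry only model the fields discussed (a dead flag standing in for __lockref_is_dead(), plus the IS_ENCRYPTED and i_crypt_info results), the parents are never freed so no RCU is needed, and it does not address the negative-dentry (NULL d_inode) question raised above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the parent directory's state. */
struct sketch_dir {
	atomic_bool dead;	/* stands in for __lockref_is_dead() */
	atomic_bool encrypted;	/* stands in for IS_ENCRYPTED(dir_inode) */
	atomic_bool has_key;	/* stands in for i_crypt_info != NULL */
};

/* Hypothetical model of a dentry whose parent pointer may change. */
struct sketch_dentry {
	struct sketch_dir *_Atomic parent;
};

struct sketch_sample {
	bool encrypted;
	bool has_key;
};

/*
 * Sample the parent, then confirm the parent is still alive and still
 * the parent; retry otherwise. The caller acts on the result only after
 * the loop, never from inside it.
 */
static struct sketch_sample sketch_sample_parent(struct sketch_dentry *d)
{
	struct sketch_dir *dir;
	struct sketch_sample s;

	do {
		dir = atomic_load(&d->parent);
		s.encrypted = atomic_load(&dir->encrypted);
		s.has_key = atomic_load(&dir->has_key);
	} while (atomic_load(&dir->dead) || atomic_load(&d->parent) != dir);

	return s;
}
```

The early 'rcu_read_unlock(); return 0;' in the quoted patch corresponds to returning from inside the do/while body, which would act on a sample that the final check might still reject.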

RE: [LINUX PATCH v11 1/3] dt-bindings: memory: Add pl353 smc controller devicetree binding information

2018-09-10 Thread Naga Sureshkumar Relli
Hi,

[LINUX PATCH v11 1/3] dt-bindings: memory: Add pl353 smc controller devicetree binding information
[LINUX PATCH v11 2/3] memory: pl353: Add driver for arm pl353 static memory controller

Could somebody apply the above patches?
They have already been reviewed.

Thanks,
Naga Sureshkumar Relli.

> -----Original Message-----
> From: Linus Walleij [mailto:linus.wall...@linaro.org]
> Sent: Friday, July 13, 2018 1:07 PM
> To: Naga Sureshkumar Relli 
> Cc: Boris Brezillon ; Richard Weinberger ;
> David Woodhouse ; Brian Norris
> ; Mark Vasut ; Florian Fainelli
> ; Markus Mayer ; Roger Quadros
> ; Ladislav Michl ; a...@thorsis.co;
> honghui.zh...@mediatek.com; Miquèl Raynal ; linux-m...@lists.infradead.org;
> linux-kernel@vger.kernel.org; nagasureshkumarre...@gmail.com;
> Michal Simek 
> Subject: Re: [LINUX PATCH v11 1/3] dt-bindings: memory: Add pl353 smc controller
> devicetree binding information
> 
> On Wed, Jul 11, 2018 at 9:37 AM Naga Sureshkumar Relli
>  wrote:
> 
> > Add pl353 static memory controller devicetree binding information.
> >
> > Signed-off-by: Naga Sureshkumar Relli
> > 
> > ---
> > Changes in v11:
> 
> Reviewed-by: Linus Walleij  Thanks!
> 
> Yours,
> Linus Walleij


[PATCH] scsi: qla2xxx: reduce time granularity of qla2x00_eh_wait_on_command

2018-09-10 Thread Jianchao Wang
If the command has not been returned after being aborted by
qla2x00_eh_abort, we have to wait for it. However, each wait currently
lasts at least 1000ms. If a lot of commands need to be aborted, the
total delay can be long enough to cause a panic, e.g. via the hung task
detector or the ocfs2 heartbeat, before SCSI recovery gets to work.

Change the granularity to 1ms. This introduces more context switches,
but that should be fine since this is not a hot path.

Signed-off-by: Jianchao Wang 
---
 drivers/scsi/qla2xxx/qla_os.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 42b8f0d..570d93b 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -1065,7 +1065,7 @@ qla2xxx_mqueuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd,
 static int
 qla2x00_eh_wait_on_command(struct scsi_cmnd *cmd)
 {
-#define ABORT_POLLING_PERIOD   1000
+#define ABORT_POLLING_PERIOD   1
 #define ABORT_WAIT_ITER		((2 * 1000) / (ABORT_POLLING_PERIOD))
 	unsigned long wait_iter = ABORT_WAIT_ITER;
 	scsi_qla_host_t *vha = shost_priv(cmd->device->host);
-- 
2.7.4
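The trade-off in this patch can be modelled in a few lines of userspace C. This is a hypothetical sketch, not the driver code: `noticed_at()` and `TOTAL_WAIT_MS` are illustrative names, and the model assumes the loop sleeps exactly `ABORT_POLLING_PERIOD` ms per iteration (the loop body is not shown in the patch). In the kernel, msleep() rounds its argument up to whole jiffies, so the real per-iteration sleep, and with it the total timeout of ABORT_WAIT_ITER iterations, depends on HZ.

```c
#include <assert.h>

/* Hypothetical userspace model of a poll-until-done loop with a fixed total
 * budget, loosely following qla2x00_eh_wait_on_command(). Not driver code. */
#define TOTAL_WAIT_MS (2 * 1000)

/*
 * Returns the time (in ms) at which a poller sleeping period_ms between
 * checks first sees that the command completed at done_at_ms, or -1 if the
 * iteration budget (TOTAL_WAIT_MS / period_ms, like ABORT_WAIT_ITER) runs
 * out first. Assumes every sleep lasts exactly period_ms.
 */
static long noticed_at(long period_ms, long done_at_ms)
{
	long iters, now = 0;

	assert(period_ms > 0);
	iters = TOTAL_WAIT_MS / period_ms;

	/* Check the command; if still pending and budget remains, sleep once. */
	while (now < done_at_ms && iters-- > 0)
		now += period_ms;	/* models msleep(period_ms) */

	return now < done_at_ms ? -1 : now;
}
```

In this model, a command that completes 5 ms after the abort is noticed at 1000 ms with the old period (`noticed_at(1000, 5)`) but at 5 ms with the new one (`noticed_at(1, 5)`), while both settings still give up on a command that needs 2500 ms.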



RE: [LINUX PATCH v10 2/2] mtd: rawnand: arasan: Add support for Arasan NAND Flash Controller

2018-09-10 Thread Naga Sureshkumar Relli
Hi Boris,

> -----Original Message-----
> From: Boris Brezillon [mailto:boris.brezil...@bootlin.com]
> Sent: Monday, August 20, 2018 10:10 PM
> To: Naga Sureshkumar Relli
> Cc: miquel.ray...@bootlin.com; rich...@nod.at; dw...@infradead.org;
> computersforpe...@gmail.com; marek.va...@gmail.com; kyungmin.p...@samsung.com;
> abs...@codeaurora.org; peterpand...@micron.com; frieder.schre...@exceet.de;
> linux-m...@lists.infradead.org; linux-kernel@vger.kernel.org; Michal Simek;
> nagasureshkumarre...@gmail.com
> Subject: Re: [LINUX PATCH v10 2/2] mtd: rawnand: arasan: Add support for Arasan NAND Flash Controller
> 
> Hi Naga,
> 
> On Fri, 17 Aug 2018 18:49:24 +0530
> Naga Sureshkumar Relli  wrote:
> 
> 
> I haven't finished reviewing the driver but there are still a bunch of
> things that look strange, for instance, your ->read/write_page()
> implementation looks suspicious. Let's discuss that before you send a
> new version.
Could you please review the remaining parts of the driver?
I have changes ready that address all the comments you have given on
this series.

Thanks,
Naga Sureshkumar Relli


[LKP] [kernfs, sysfs, cgroup, intel_rdt] a8c7fe83d1: BUG:kernel_hang_in_test_stage

2018-09-10 Thread kernel test robot
FYI, we noticed the following commit (built with gcc-6):

commit: a8c7fe83d17109b77c7b27a23140e76d3753fa6a ("kernfs, sysfs, cgroup, intel_rdt: Support fs_context")
https://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git btrfs-mount-api

in testcase: trinity
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-x86_64 -enable-kvm -cpu Haswell,+smep,+smap -smp 2 -m 2G

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):


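+------------------------------------------------------------------+------------+------------+
|                                                                  | 88ed9083f5 | a8c7fe83d1 |
+------------------------------------------------------------------+------------+------------+
| boot_successes                                                   | 2          | 0          |
| boot_failures                                                    | 10         | 10         |
| BUG:KASAN:null-ptr-deref_in_n                                    | 10         |            |
| BUG:unable_to_handle_kernel                                      | 10         |            |
| Oops:#[##]                                                       | 10         |            |
| RIP:nfs_fs_mount                                                 | 10         |            |
| Kernel_panic-not_syncing:Fatal_exception                         | 10         |            |
| BUG:kernel_hang_in_test_stage                                    | 0          | 8          |
| invoked_oom-killer:gfp_mask=0x                                   | 0          | 2          |
| Mem-Info                                                         | 0          | 2          |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 0          | 2          |
+------------------------------------------------------------------+------------+------------+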



[   17.521153] random: get_random_bytes called from flow_hash_from_keys+0x3e9/0x480 with crng_init=0
[   17.526197] random: get_random_bytes called from addrconf_dad_kick+0xf7/0x1a0 with crng_init=0
[   55.227136] random: get_random_bytes called from __prandom_timer+0x57/0xc0 with crng_init=0
[  442.854487] random: fast init done
[  880.187591] random: crng init done
BUG: kernel hang in test stage

Elapsed time: 2690

#!/bin/bash



To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k  job-script # job-script is attached in this email



Thanks,
Rong, Chen
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 4.19.0-rc1 Kernel Configuration
#

#
# Compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
#
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=60400
CONFIG_CLANG_VERSION=0
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
# CONFIG_SYSVIPC is not set
# CONFIG_POSIX_MQUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_USELIB=y
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_CHIP=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_GENERIC_IRQ_DEBUGFS=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
# CONFIG_HIGH_RES_TIMERS is not set
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set

#
# RCU Subsystem
#
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU=y
CONFIG_RCU_STALL_COMMON=y

Re: [PATCHv3 2/6] tty/ldsem: Update waiter->task before waking up reader

2018-09-10 Thread Sergey Senozhatsky
On (09/11/18 02:48), Dmitry Safonov wrote:
> There is a couple of reports about lockup in ldsem_down_read() without
> anyone holding write end of ldisc semaphore:
> lkml.kernel.org/r/<20171121132855.ajdv4k6swzhvk...@wfg-t540p.sh.intel.com>
> lkml.kernel.org/r/<20180907045041.GF1110@shao2-debian>
> 
> They all looked like a missed wake up.
> I wasn't lucky enough to reproduce it, but it seems like reader on
> another CPU can miss waiter->task update and schedule again, resulting
> in indefinite (MAX_SCHEDULE_TIMEOUT) sleep.

Certainly, something suspicious is going on.

> @@ -118,6 +118,8 @@ static void __ldsem_wake_readers(struct ld_semaphore *sem)
>   tsk = waiter->task;
>   smp_mb();
>   waiter->task = NULL;
> + /* Make sure down_read_failed() will see !waiter->task update */
> + smp_wmb();
>   wake_up_process(tsk);

Hmm. I think wake_up_process() executes a full memory barrier, because
it accesses task state.

>   put_task_struct(tsk);
>   }
> @@ -217,7 +219,7 @@ down_read_failed(struct ld_semaphore *sem, long count, long timeout)
>   for (;;) {
>   set_current_state(TASK_UNINTERRUPTIBLE);

I think set_current_state() also executes a memory barrier, again
because it accesses the task state.

> - if (!waiter.task)
> + if (!READ_ONCE(waiter.task))
>   break;
>   if (!timeout)
>   break;

-ss
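The hand-off under discussion (the waker clears waiter->task, the reader re-checks it before going back to sleep) can be sketched in userspace with C11 atomics and POSIX threads. This is an analogue under stated assumptions, not ldsem code: `struct waiter`, `waker()`, `wait_for_handoff()` and `run_handoff()` are hypothetical, the reader spins instead of sleeping, and a C11 release store/acquire load stands in only loosely for the kernel's smp_wmb()/READ_ONCE() pairing (the kernel memory model is not identical to C11's).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical analogue of an ldsem reader's waiter entry; not kernel code. */
struct waiter {
	_Atomic(void *) task;	/* non-NULL while the reader is still queued */
};

/* Waker side: grant the lock by clearing the task pointer. The release store
 * orders every earlier write before the NULL becomes visible to the reader. */
static void *waker(void *arg)
{
	struct waiter *w = arg;

	atomic_store_explicit(&w->task, NULL, memory_order_release);
	return NULL;
}

/* Reader side: re-read the pointer on every pass (the role READ_ONCE() plays
 * in the patch), so the load cannot be hoisted out of the loop.
 * The kernel sleeps here instead of spinning. */
static void wait_for_handoff(struct waiter *w)
{
	while (atomic_load_explicit(&w->task, memory_order_acquire) != NULL)
		;	/* spin */
}

/* Runs one hand-off between two threads; returns 1 on success, -1 if the
 * waker thread could not be created. */
static int run_handoff(void)
{
	static int self;	/* stand-in for the reader's task_struct */
	struct waiter w;
	pthread_t t;

	atomic_init(&w.task, (void *)&self);
	if (pthread_create(&t, NULL, waker, &w) != 0)
		return -1;
	wait_for_handoff(&w);
	pthread_join(t, NULL);
	assert(atomic_load(&w.task) == NULL);	/* the reader saw the hand-off */
	return 1;
}
```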


linux-next: Tree for Sep 11

2018-09-10 Thread Stephen Rothwell
Hi all,

Changes since 20180910:

Dropped trees: xarray, ida (temporarily)

The vfs tree lost a build failure, but I still disabled building some
samples.

The tty tree gained a build failure so I used the version from
next-20180910.

Non-merge commits (relative to Linus' tree): 2768
 3055 files changed, 91468 insertions(+), 62223 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 287 trees (counting Linus' and 66 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to adding
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (11da3a7f84f1 Linux 4.19-rc3)
Merging fixes/master (72358c0b59b7 linux-next: build warnings from the build of Linus' tree)
Merging kbuild-current/fixes (11da3a7f84f1 Linux 4.19-rc3)
Merging arc-current/for-curr (00a99339f0a3 ARCv2: build: use mcpu=hs38 iso generic mcpu=archs)
Merging arm-current/fixes (afc9f65e01cd ARM: 8781/1: Fix Thumb-2 syscall return for binutils 2.29+)
Merging arm64-fixes/for-next/fixes (fac880c7d074 arm64: fix erroneous warnings in page freeing functions)
Merging m68k-current/for-linus (0986b16ab49b m68k/mac: Use correct PMU response format)
Merging powerpc-fixes/fixes (cca19f0b684f powerpc/64s/radix: Fix missing global invalidations when removing copro)
Merging sparc/master (df2def49c57b Merge tag 'acpi-4.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (7c5cca358854 qmi_wwan: Support dynamic config on Quectel EP06)
Merging bpf/master (28619527b8a7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging ipsec/master (782710e333a5 xfrm: reset crypto_done when iterating over multiple input xfrms)
Merging netfilter/master (7acfda539c0b netfilter: nf_tables: release chain in flushing set)
Merging ipvs/master (feb9f55c33e5 netfilter: nft_dynset: allow dynamic updates of non-anonymous set)
Merging wireless-drivers/master (5b394b2ddf03 Linux 4.19-rc1)
Merging mac80211/master (c42055105785 mac80211: fix TX status reporting for ieee80211s)
Merging rdma-fixes/for-rc (8f28b178f71c RDMA/mlx4: Ensure that maximal send/receive SGE less than supported by HW)
Merging sound-current/for-linus (36f3a6e02c14 ALSA: fireface: fix memory leak in ff400_switch_fetching_mode())
Merging sound-asoc-fixes/for-linus (2c1007fbee3b Merge branch 'asoc-4.19' into asoc-linus)
Merging regmap-fixes/for-linus (57361846b52b Linux 4.19-rc2)
Merging regulator-fixes/for-linus (cde11023609f Merge branch 'regulator-4.19' into regulator-linus)
Merging spi-fixes/for-linus (a3d6be06a30c Merge branch 'spi-4.19' into spi-linus)
Merging pci-current/for-linus (342227b42fe8 PCI: pciehp: Fix hot-add vs powerfault detection order)
Merging driver-core.current/driver-core-linus (11da3a7f84f1 Linux 4.19-rc3)
Merging tty.current/tty-linus (7f2bf7840b74 tty: hvc: hvc_write() fix break condition)
Merging usb.current/usb-linus (df3aa13c7bbb Revert "cdc-acm: implement put_char() and flush_chars()")
Merging usb-gadget-fixes/fixes (d9707490077b usb: dwc2: Fix call location of dwc2_check_core_endianness)
Merging usb-serial-fixes/usb-linus (5dfdd24eb3d3 USB: serial: ti_usb_3410_5052: fix array underflow in completion handler)
Merging usb-chipidea-fixes/ci-for-usb-stable (a930d8bd94d8 usb: chipidea: Always buil

Re: [PATCH v9 3/6] kernel/reboot.c: export pm_power_off_prepare

2018-09-10 Thread Oleksij Rempel
Hi Shawn,

On 11.09.2018 03:53, Shawn Guo wrote:
> On Mon, Sep 10, 2018 at 04:19:26PM +0100, Mark Brown wrote:
>> On Sun, Sep 09, 2018 at 10:00:23AM +0800, Shawn Guo wrote:
>>> On Thu, Sep 06, 2018 at 11:15:17AM +0100, Mark Brown wrote:
>>
>>>> I was expecting to get a pull request with the precursor patches in it -
>>>> the regulator driver seems to get a moderate amount of development so
>>>> there's a reasonable risk of conflicts.
>>
>>> What about you create a stable topic branch for regulator patches and I
>>> pull it into IMX tree?
>>
>> Sure, I can send a pull request back but the first two patches in the
>> series are ARM ones - are you OK with me just applying them and sending
>> them in the pull request or do you want to apply them first?
> 
> I just took another look at the series.  It seems that there is no
> build-time dependency between regulator and platform patches.  So I
> think we can handle the series like:
> 
>  - You apply patch #3, #4 and #5 on regulator tree;
>  - I apply the rest on IMX tree.
> 
> There shouldn't be any build or run time regression on either tree, and
> the feature that the series adds will be available when both trees get
> merged together on -next or Linus tree.
> 
> @Oleksij Is my understanding above correct?

Yes.





Re: get_arg_page() && ptr_size accounting

2018-09-10 Thread Kees Cook
On Mon, Sep 10, 2018 at 10:43 AM, Oleg Nesterov  wrote:
> On 09/10, Oleg Nesterov wrote:
>>
>> On 09/10, Kees Cook wrote:
>> >
>> > On Mon, Sep 10, 2018 at 9:41 AM, Kees Cook  wrote:
>> > > On Mon, Sep 10, 2018 at 5:29 AM, Oleg Nesterov  wrote:
>> > >> Hi Kees,
>> > >>
>> > >> I was thinking about backporting the commit 98da7d08850fb8bde
>> > >> ("fs/exec.c: account for argv/envp pointers"), but I am not sure
>> > >> I understand it...
>> >
>> > BTW, if you backport that, please get the rest associated with the
>> > various Stack Clash related weaknesses:
>>
>> may be...
>>
>> > da029c11e6b1 exec: Limit arg stack to at most 75% of _STK_LIM
>>
>> and I have to admit that I do not understand this patch at all, the
>> changelog explains nothing.
>>
>> Could you explain what this patch actually prevents from? Especially
>> now that we have stack_guard_gap?
>
> forgot to mention...
>
> with this patch
>
> #define MAX_ARG_STRINGS 0x7FFF
>
> doesn't match the reality. perhaps something like below makes sense just
> to make it clear, but this is cosmetic.

Part of the discussion from back then was basically "we don't have
hard-coded limits so programs need to check dynamically themselves".

I'd prefer to leave well enough alone, since I don't want to
introduce regressions here given the many, many Stack Clash-style
weaknesses.

-Kees

-- 
Kees Cook
Pixel Security
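The advice that programs should check limits dynamically rather than rely on hard-coded constants can be illustrated with the POSIX `sysconf(_SC_ARG_MAX)` query. This is an illustrative sketch, not code from this thread: `args_probably_fit()` is a hypothetical helper, its byte accounting (string, NUL terminator and one pointer per entry) is a rough approximation, and how the C library derives `_SC_ARG_MAX` (for example from RLIMIT_STACK) is implementation-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: roughly estimate whether argv + envp fit under the
 * limit the C library reports at run time via sysconf(_SC_ARG_MAX).
 * Each entry is counted as its string, the terminating NUL and one pointer.
 * The kernel's real accounting differs in detail, so "true" is an estimate.
 */
static bool args_probably_fit(char *const argv[], char *const envp[])
{
	long limit = sysconf(_SC_ARG_MAX);	/* -1 if indeterminate or on error */
	size_t total = 0;

	assert(argv != NULL && envp != NULL);
	if (limit <= 0)
		return false;			/* cannot tell: be conservative */
	for (size_t i = 0; argv[i] != NULL; i++)
		total += strlen(argv[i]) + 1 + sizeof(char *);
	for (size_t i = 0; envp[i] != NULL; i++)
		total += strlen(envp[i]) + 1 + sizeof(char *);
	return total <= (size_t)limit;
}
```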


Re: get_arg_page() && ptr_size accounting

2018-09-10 Thread Kees Cook
On Mon, Sep 10, 2018 at 10:21 AM, Oleg Nesterov  wrote:
> On 09/10, Kees Cook wrote:
>>
>> On Mon, Sep 10, 2018 at 9:41 AM, Kees Cook  wrote:
>> > On Mon, Sep 10, 2018 at 5:29 AM, Oleg Nesterov  wrote:
>> >> Hi Kees,
>> >>
>> >> I was thinking about backporting the commit 98da7d08850fb8bde
>> >> ("fs/exec.c: account for argv/envp pointers"), but I am not sure
>> >> I understand it...
>>
>> BTW, if you backport that, please get the rest associated with the
>> various Stack Clash related weaknesses:
>
> maybe...
>
>> da029c11e6b1 exec: Limit arg stack to at most 75% of _STK_LIM
>
> and I have to admit that I do not understand this patch at all, the
> changelog explains nothing.

The issue here is with keeping some stack space available for a
program to reasonably start execution without doing insane things. The
sizes were picked after discussion with Linus while examining the
various Stack Clash weaknesses.

> Could you explain what this patch actually protects against? Especially
> now that we have stack_guard_gap?

One of the many Stack Clash abuses was that it was possible to jump
over the stack gap with outrageous environment variables that got
expanded in stupid ways by, IIRC, glibc or the dynamic linker. The
point here was to be defensive in the face of future weaknesses, and
try to be robust in the face of crazy execs but workable under normal
(but large) execs.

-Kees

-- 
Kees Cook
Pixel Security


Re: get_arg_page() && ptr_size accounting

2018-09-10 Thread Kees Cook
On Mon, Sep 10, 2018 at 10:18 AM, Oleg Nesterov  wrote:
> On 09/10, Kees Cook wrote:
>>
>> > So get_arg_page() does
>> >
>> > /*
>> >  * Since the stack will hold pointers to the strings, we
>> >  * must account for them as well.
>> >  *
>> >  * The size calculation is the entire vma while each arg page is
>> >  * built, so each time we get here it's calculating how far it
>> >  * is currently (rather than each call being just the newly
>> >  * added size from the arg page).  As a result, we need to
>> >  * always add the entire size of the pointers, so that on the
>> >  * last call to get_arg_page() we'll actually have the entire
>> >  * correct size.
>> >  */
>> > ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
>> > if (ptr_size > ULONG_MAX - size)
>> > goto fail;
>> > size += ptr_size;
>> >
>> > OK, but
>> > acct_arg_size(bprm, size / PAGE_SIZE);
>> >
>> > after that doesn't look exactly right. This additional space will be used later
>> > when the process already uses bprm->mm, right? so it shouldn't be accounted by
>> > acct_arg_size().
>>
>> My understanding (based on the comment about acct_arg_size()) is that
>> before exec_mmap() happens, the memory used to build the new arguments
>> copy memory area gets accounted to the MM_ANONPAGES resource limit of
>> the execing process.
>
> Yes, because otherwise oom-killer can't account the memory populated by
> get_arg_page() in bprm->mm.
>
>> I couldn't find any place where the argc/envc
>> pointers were being included in the count,
>
> But why??? To clarify,
>
> size += ptr_size;
>
> after acct_arg_size() is clear and correct, we are going to check rlim_stack
> and thus the size should include the pointers we will add in create_elf_tables().
>
> But acct_arg_size() should only account the pages we allocate for bprm->mm,
> nothing more. create_elf_tables() does not allocate the memory when it populates
> arg_start/arg_end/env_start/env_end. Plus at this time the process has already
> switched to bprm->mm.

I've looked more closely now. So, while I agree with you about
resource limits, there's a corner case that is better handled here:
once we've called flush_old_exec(), we can no longer send errors back
to the parent. We just segfault. So, I think it's better to give a
resource limit error early, since it is able to do the math early.

If we move acct_arg_size() earlier, then the "immediate" resource
utilization is checked, but it means it can just segfault later. If we
leave it as-is, we account for later memory allocations "too early";
the exec still fails either way, but at least we can tell the parent why.

I prefer to leave it as-is.

>> > Not to mention that ptr_size/PAGE_SIZE doesn't look right in any case...
>>
>> Hm? acct_arg_size() takes pages, not bytes. I think this is correct?
>> What doesn't look right to you?
>
> Please forget. I meant that _if_ we actually wanted to account this additional
> memory in bprm->pages, then we would probably need something like
> acct_arg_size(size/PAGE_SIZE + DIV_ROUND_UP(ptr_size, PAGE_SIZE)).

I'd need to study that more, but that change seems reasonable. :)

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH v2 3/5] irqchip: RISC-V Local Interrupt Controller Driver

2018-09-10 Thread Anup Patel
On Tue, Sep 11, 2018 at 3:49 AM, Christoph Hellwig  wrote:
> On Mon, Sep 10, 2018 at 09:37:59PM +0200, Thomas Gleixner wrote:
>> Processor local interrupts really should be architected and there are
>> really not that many of them.
>
> And that is what they are.
>
>> But well, RISC-V decided obviously not to learn from mistakes made by
>> others.
>
> I don't think that is the case.  I think Anup misreads what reserved
> means - if you look at section 2.3 of the RISC-V privileged spec
> it clearly states that reserved fields are for future use and not
> for vendor specific use.

I think I understood what reserved means here. If reserved bits are
not meant for vendor-specific or implementation-specific use, the spec
should say so clearly, which it does not.

The list of currently defined RISC-V local interrupts will definitely grow,
based on my experience in the ARM/ARM64 world.

Like Thomas mentioned, we will definitely end up having a separate
irqchip and irq_domain for RISC-V local interrupts for flexibility. Better
to do it now with a separate RISC-V INTC driver.

Regards,
Anup


[PATCH v4] dt-binding: remoteproc: Add QTI ADSP PIL bindings

2018-09-10 Thread Rohit kumar
Add devicetree bindings documentation file for Qualcomm
Technologies Inc ADSP Peripheral Image Loader.

Signed-off-by: Rohit kumar 
---
Changes since v3:
Addressed comments given by Rob

 .../bindings/remoteproc/qcom,adsp-pil.txt  | 126 +
 1 file changed, 126 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt

diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt b/Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt
new file mode 100644
index 000..06558de
--- /dev/null
+++ b/Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt
@@ -0,0 +1,126 @@
+Qualcomm Technology Inc. ADSP Peripheral Image Loader
+
+This document defines the binding for a component that loads and boots firmware
+on the Qualcomm Technology Inc. ADSP Hexagon core.
+
+- compatible:
+   Usage: required
+   Value type: 
+   Definition: must be one of:
+   "qcom,sdm845-adsp-pil"
+
+- reg:
+   Usage: required
+   Value type: 
+   Definition: must specify the base address and size of the qdsp6ss register
+
+- interrupts-extended:
+   Usage: required
+   Value type: 
+   Definition: must list the watchdog, fatal, ready, handover and
+   stop-ack IRQs
+
+- interrupt-names:
+   Usage: required
+   Value type: 
+   Definition: must be "wdog", "fatal", "ready", "handover", "stop-ack"
+
+- clocks:
+   Usage: required
+   Value type: 
+   Definition:  List of 8 phandle and clock specifier pairs for the adsp.
+
+- clock-names:
+   Usage: required
+   Value type: 
+   Definition: List of clock input name strings sorted in the same
+   order as the clocks property. Must be
+   "xo", "sway_cbcr", "lpass_aon", "lpass_ahbs_aon_cbcr",
+   "lpass_ahbm_aon_cbcr", "qdsp6ss_xo", "qdsp6ss_sleep"
+   and "qdsp6ss_core".
+
+- power-domains:
+   Usage: required
+   Value type: 
+   Definition: reference to cx power domain node.
+
+- resets:
+   Usage: required
+   Value type: 
+   Definition: list of 2 reset-controller references for the adsp.
+
+- reset-names:
+   Usage: required
+   Value type: 
+   Definition: must be "pdc_sync" and "cc_lpass"
+
+- qcom,halt-regs:
+   Usage: required
+   Value type: 
+   Definition: a phandle reference to a syscon representing TCSR followed
+   by the offset within syscon for lpass halt register.
+
+- memory-region:
+   Usage: required
+   Value type: 
+   Definition: reference to the reserved-memory for the ADSP
+
+- qcom,smem-states:
+   Usage: required
+   Value type: 
+   Definition: reference to the smem state for requesting the ADSP to
+   shut down
+
+- qcom,smem-state-names:
+   Usage: required
+   Value type: 
+   Definition: must be "stop"
+
+
+= SUBNODES
+The adsp node may have a subnode named "glink-edge" that describes the
+communication edge, channels and devices related to the ADSP.
+See ../soc/qcom/qcom,glink.txt for details on how to describe these.
+
+= EXAMPLE
+The following example describes the resources needed to boot and control the
+ADSP, as it is found on SDM845 boards.
+   adsp-pil {
+   compatible = "qcom,sdm845-adsp-pil";
+
+   reg = <0x1730 0x40c>;
+
+   interrupts-extended = < 0 162 IRQ_TYPE_EDGE_RISING>,
+   <_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,
+   <_smp2p_in 1 IRQ_TYPE_EDGE_RISING>,
+   <_smp2p_in 2 IRQ_TYPE_EDGE_RISING>,
+   <_smp2p_in 3 IRQ_TYPE_EDGE_RISING>;
+   interrupt-names = "wdog", "fatal", "ready",
+   "handover", "stop-ack";
+
+   clocks = < RPMH_CXO_CLK>,
+   < GCC_LPASS_SWAY_CLK>,
+   < LPASS_AUDIO_WRAPPER_AON_CLK>,
+   < LPASS_Q6SS_AHBS_AON_CLK>,
+   < LPASS_Q6SS_AHBM_AON_CLK>,
+   < LPASS_QDSP6SS_XO_CLK>,
+   < LPASS_QDSP6SS_SLEEP_CLK>,
+   < LPASS_QDSP6SS_CORE_CLK>;
+   clock-names = "xo", "sway_cbcr", "lpass_aon",
+   "lpass_ahbs_aon_cbcr",
+   "lpass_ahbm_aon_cbcr", "qdsp6ss_xo",
+   "qdsp6ss_sleep", "qdsp6ss_core";
+
+   power-domains = < SDM845_CX>;
+
+   resets = <_reset PDC_AUDIO_SYNC_RESET>,
+<_reset AOSS_CC_LPASS_RESTART>;
+   reset-names = "pdc_sync", "cc_lpass";
+
+   qcom,halt-regs = <_mutex_regs 0x22000>;
+
+   memory-region = <_adsp_mem>;
+
+   qcom,smem-states = <_smp2p_out 0>;
+   qcom,smem-state-names = "stop";
+   };
-- 
Qualcomm India Private Limited, on 

linux-next: build warning after merge of the tip tree

2018-09-10 Thread Stephen Rothwell
Hi all,

After merging the tip tree, today's linux-next build (x86_64 allnoconfig)
produced this warning:

arch/x86/kernel/cpu/common.c: In function 'syscall_init':
arch/x86/kernel/cpu/common.c:1534:6: warning: unused variable 'cpu' [-Wunused-variable]
  int cpu = smp_processor_id();
  ^~~

Introduced by commit

  86635715ee42 ("x86/pti/64: Remove the SYSCALL64 entry trampoline")

-- 
Cheers,
Stephen Rothwell




Re: [PATCH v3 2/2] remoteproc: qcom: Introduce Non-PAS ADSP PIL driver

2018-09-10 Thread Rohit Kumar

Thanks Bjorn for reviewing.


On 9/11/2018 12:01 AM, Bjorn Andersson wrote:

On Mon 03 Sep 04:52 PDT 2018, Rohit kumar wrote:


This adds Non PAS ADSP PIL driver for Qualcomm
Technologies Inc SoCs.
Added initial support for SDM845 with ADSP bootup and
shutdown operation handled from Application Processor
SubSystem(APSS).

Signed-off-by: Rohit kumar 

Thanks for the changes Rohit, this looks good.

Once we hear from DT maintainers that patch 1 can be applied I will
update the name of the file and driver as I apply it to match the naming
scheme I'm aiming for - no need for you to resend because of this.
Sure, I will just update the dt-bindings to address some comments given by Rob.



---
  drivers/remoteproc/Kconfig |  14 ++
  drivers/remoteproc/Makefile|   1 +
  drivers/remoteproc/qcom_adsp_pil.c | 500 +
  3 files changed, 515 insertions(+)
  create mode 100644 drivers/remoteproc/qcom_adsp_pil.c

diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index c98c0b2..445de2d 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -139,6 +139,20 @@ config QCOM_Q6V5_WCSS
  Say y here to support the Qualcomm Peripheral Image Loader for the
  Hexagon V5 based WCSS remote processors.
  
+config QCOM_ADSP_PIL

I will make this QCOM_Q6V5_ADSP

[..]

diff --git a/drivers/remoteproc/qcom_adsp_pil.c b/drivers/remoteproc/qcom_adsp_pil.c

Make this qcom_q6v5_adsp.c

[..]

+static struct platform_driver adsp_pil_driver = {
+   .probe = adsp_probe,
+   .remove = adsp_remove,
+   .driver = {
+   .name = "qcom_adsp_pil",

and this "qcom_q6v5_adsp".


+   .of_match_table = adsp_of_match,
+   },
+};

Please let me know if you have any objections to this.

Naming looks fine.

Thanks,
Rohit

Regards,
Bjorn




Re: [PATCH v3 1/2] dt-binding: remoteproc: Add QTI ADSP PIL bindings

2018-09-10 Thread Rohit Kumar

Thanks Rob for reviewing.


On 9/11/2018 1:31 AM, Rob Herring wrote:

On Mon, Sep 03, 2018 at 05:22:39PM +0530, Rohit kumar wrote:

Add devicetree bindings documentation file for Qualcomm
Technologies Inc ADSP Peripheral Image Loader.

Signed-off-by: Rohit kumar 
---
  .../bindings/remoteproc/qcom,adsp-pil.txt  | 123 +
  1 file changed, 123 insertions(+)
  create mode 100644 Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt

diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt b/Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt
new file mode 100644
index 000..f1c215a
--- /dev/null
+++ b/Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt
@@ -0,0 +1,123 @@
+Qualcomm Technology Inc. ADSP Peripheral Image Loader
+
+This document defines the binding for a component that loads and boots firmware
+on the Qualcomm Technology Inc. ADSP Hexagon core.
+
+- compatible:
+   Usage: required
+   Value type: 
+   Definition: must be one of:
+   "qcom,sdm845-adsp-pil"
+
+- reg:
+   Usage: required
+   Value type: 
+   Definition: must specify the base address and size of the qdsp6ss register
+
+- interrupts-extended:
+   Usage: required
+   Value type: 
+   Definition: must list the watchdog, fatal, ready, handover and
+   stop-ack IRQs
+
+- interrupt-names:
+   Usage: required
+   Value type: 
+   Definition: must be "wdog", "fatal", "ready", "handover", "stop-ack"
+
+- clocks:
+   Usage: required
+   Value type: 
+   Definition:  List of phandle and clock specifier pairs

How many clocks?


+
+- clock-names:
+   Usage: required
+   Value type: 
+   Definition: List of clock input name strings sorted in the same
+   order as the clocks property.

What are the names?


I will update these in next spin.

+
+- power-domains:
+   Usage: required
+   Value type: 
+   Definition: reference to cx power domain node.
+
+- resets:
+   Usage: required
+   Value type: 
+   Definition: reference to the reset-controller for the lpass

How many?


+
+- reset-names:
+Usage: required
+Value type: 
+Definition: must be "pdc_sync" and "cc_lpass"
+
+- qcom,halt-regs:
+   Usage: required
+   Value type: 
+   Definition: a phandle reference to a syscon representing TCSR followed
+   by the offset within syscon for lpass halt register.
+
+- memory-region:
+   Usage: required
+   Value type: 
+   Definition: reference to the reserved-memory for the ADSP
+
+- qcom,smem-states:
+   Usage: required
+   Value type: 
+   Definition: reference to the smem state for requesting the ADSP to
+   shut down
+
+- qcom,smem-state-names:
+   Usage: required
+   Value type: 
+   Definition: must be "stop"
+
+
+= SUBNODES
+The adsp node may have a subnode named "glink-edge" that describes the
+communication edge, channels and devices related to the ADSP.
+See ../soc/qcom/qcom,glink.txt for details on how to describe these.
+
+= EXAMPLE
+The following example describes the resources needed to boot and control the
+ADSP, as it is found on SDM845 boards.
+   adsp-pil {
+   compatible = "qcom,sdm845-adsp-pil";
+
+   reg = <0x1730 0x40c>;
+
+   interrupts-extended = < 0 162 IRQ_TYPE_EDGE_RISING>,
+   <_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,
+   <_smp2p_in 1 IRQ_TYPE_EDGE_RISING>,
+   <_smp2p_in 2 IRQ_TYPE_EDGE_RISING>,
+   <_smp2p_in 3 IRQ_TYPE_EDGE_RISING>;
+   interrupt-names = "wdog", "fatal", "ready",
+   "handover", "stop-ack";
+
+   clocks = < RPMH_CXO_CLK>,
+   < GCC_LPASS_SWAY_CLK>,
+   < LPASS_AUDIO_WRAPPER_AON_CLK>,
+   < LPASS_Q6SS_AHBS_AON_CLK>,
+   < LPASS_Q6SS_AHBM_AON_CLK>,
+   < LPASS_QDSP6SS_XO_CLK>,
+   < LPASS_QDSP6SS_SLEEP_CLK>,
+   < LPASS_QDSP6SS_CORE_CLK>;
+   clock-names = "xo", "sway_cbcr", "lpass_aon",
+   "lpass_ahbs_aon_cbcr",
+   "lpass_ahbm_aon_cbcr", "qdsp6ss_xo",
+   "qdsp6ss_sleep", "qdsp6ss_core";
+
+   power-domains = < SDM845_CX>;
+
+   resets = <_reset PDC_AUDIO_SYNC_RESET>,
+<_reset AOSS_CC_LPASS_RESTART>;
+   reset-names = "pdc_sync", "cc_lpass";
+
+   qcom,halt-regs = <_mutex_regs 0x22000>;
+
+   memory-region = <_adsp_mem>;
+
+   qcom,smem-states = <_smp2p_out 0>;
+   qcom,smem-state-names = "stop";
+   };
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
is a 

Re: [PATCH v3 1/2] dt-binding: remoteproc: Add QTI ADSP PIL bindings

2018-09-10 Thread Rohit Kumar

Thanks Rob for reviewing.


On 9/11/2018 1:31 AM, Rob Herring wrote:

On Mon, Sep 03, 2018 at 05:22:39PM +0530, Rohit kumar wrote:

Add devicetree bindings documentation file for Qualcomm
Technolgies Inc ADSP Peripheral Image Loader.

Signed-off-by: Rohit kumar 
---
  .../bindings/remoteproc/qcom,adsp-pil.txt  | 123 +
  1 file changed, 123 insertions(+)
  create mode 100644 
Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt

diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt 
b/Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt
new file mode 100644
index 000..f1c215a
--- /dev/null
+++ b/Documentation/devicetree/bindings/remoteproc/qcom,adsp-pil.txt
@@ -0,0 +1,123 @@
+Qualcomm Technology Inc. ADSP Peripheral Image Loader
+
+This document defines the binding for a component that loads and boots firmware
+on the Qualcomm Technology Inc. ADSP Hexagon core.
+
+- compatible:
+   Usage: required
+   Value type: 
+   Definition: must be one of:
+   "qcom,sdm845-adsp-pil"
+
+- reg:
+   Usage: required
+   Value type: 
+   Definition: must specify the base address and size of the qdsp6ss 
register
+
+- interrupts-extended:
+   Usage: required
+   Value type: 
+   Definition: must list the watchdog, fatal IRQs ready, handover and
+   stop-ack IRQs
+
+- interrupt-names:
+   Usage: required
+   Value type: 
+   Definition: must be "wdog", "fatal", "ready", "handover", "stop-ack"
+
+- clocks:
+   Usage: required
+   Value type: 
+   Definition:  List of phandle and clock specifier pairs

How many clocks?


+
+- clock-names:
+   Usage: required
+   Value type: 
+   Definition: List of clock input name strings sorted in the same
+   order as the clocks property.

What are the names?


I will update these in next spin.

+
+- power-domains:
+   Usage: required
+   Value type: 
+   Definition: reference to cx power domain node.
+
+- resets:
+   Usage: required
+   Value type: 
+   Definition: reference to the reset-controller for the lpass

How many?


+
+- reset-names:
+Usage: required
+Value type: 
+Definition: must be "pdc_sync" and "cc_lpass"
+
+- qcom,halt-regs:
+   Usage: required
+   Value type: 
+   Definition: a phandle reference to a syscon representing TCSR followed
+   by the offset within syscon for lpass halt register.
+
+- memory-region:
+   Usage: required
+   Value type: 
+   Definition: reference to the reserved-memory for the ADSP
+
+- qcom,smem-states:
+   Usage: required
+   Value type: 
+   Definition: reference to the smem state for requesting the ADSP to
+   shut down
+
+- qcom,smem-state-names:
+   Usage: required
+   Value type: 
+   Definition: must be "stop"
+
+
+= SUBNODES
+The adsp node may have an subnode named "glink-edge" that describes the
+communication edge, channels and devices related to the ADSP.
+See ../soc/qcom/qcom,glink.txt for details on how to describe these.
+
+= EXAMPLE
+The following example describes the resources needed to boot and control the
+ADSP, as it is found on SDM845 boards.
+   adsp-pil {
+   compatible = "qcom,sdm845-adsp-pil";
+
+   reg = <0x1730 0x40c>;
+
+   interrupts-extended = < 0 162 IRQ_TYPE_EDGE_RISING>,
+   <_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,
+   <_smp2p_in 1 IRQ_TYPE_EDGE_RISING>,
+   <_smp2p_in 2 IRQ_TYPE_EDGE_RISING>,
+   <_smp2p_in 3 IRQ_TYPE_EDGE_RISING>;
+   interrupt-names = "wdog", "fatal", "ready",
+   "handover", "stop-ack";
+
+   clocks = < RPMH_CXO_CLK>,
+   < GCC_LPASS_SWAY_CLK>,
+   < LPASS_AUDIO_WRAPPER_AON_CLK>,
+   < LPASS_Q6SS_AHBS_AON_CLK>,
+   < LPASS_Q6SS_AHBM_AON_CLK>,
+   < LPASS_QDSP6SS_XO_CLK>,
+   < LPASS_QDSP6SS_SLEEP_CLK>,
+   < LPASS_QDSP6SS_CORE_CLK>;
+   clock-names = "xo", "sway_cbcr", "lpass_aon",
+   "lpass_ahbs_aon_cbcr",
+   "lpass_ahbm_aon_cbcr", "qdsp6ss_xo",
+   "qdsp6ss_sleep", "qdsp6ss_core";
+
+   power-domains = < SDM845_CX>;
+
+   resets = <_reset PDC_AUDIO_SYNC_RESET>,
+<_reset AOSS_CC_LPASS_RESTART>;
+   reset-names = "pdc_sync", "cc_lpass";
+
+   qcom,halt-regs = <_mutex_regs 0x22000>;
+
+   memory-region = <_adsp_mem>;
+
+   qcom,smem-states = <_smp2p_out 0>;
+   qcom,smem-state-names = "stop";
+   };
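The reviewer's "How many?" questions matter because consumers pair these lists by position: the Nth entry of clock-names names the Nth phandle and specifier pair in clocks (reset-names and resets work the same way). A minimal userspace sketch of that positional lookup, using the eight clock names from the example above; clock_index_by_name() is a hypothetical helper, and an in-kernel driver would instead ask the clock framework by name, e.g. devm_clk_get(dev, "xo"):

```c
#include <stddef.h>
#include <string.h>

/* Clock names in the order used by the SDM845 ADSP example above. */
static const char *const adsp_clock_names[] = {
	"xo", "sway_cbcr", "lpass_aon", "lpass_ahbs_aon_cbcr",
	"lpass_ahbm_aon_cbcr", "qdsp6ss_xo", "qdsp6ss_sleep", "qdsp6ss_core",
};

#define ADSP_NUM_CLKS (sizeof(adsp_clock_names) / sizeof(adsp_clock_names[0]))

/*
 * Hypothetical helper: return the position of @name in @names, or -1.
 * That position selects the matching entry in the "clocks" property,
 * so both properties must have the same length and order.
 */
static int clock_index_by_name(const char *const *names, size_t n,
			       const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(names[i], name) == 0)
			return (int)i;
	}
	return -1;
}
```

Because the lookup is purely positional, listing the exact names and count in the binding text is what lets a reader check that clocks and clock-names agree.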
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
is a 

Re: [RFC PATCH 1/5] RISC-V: Make IPI triggering flexible

2018-09-10 Thread Anup Patel
On Mon, Sep 10, 2018 at 7:04 PM, Christoph Hellwig  wrote:
> On Thu, Sep 06, 2018 at 04:15:14PM +0530, Anup Patel wrote:
>> This patch is doing two things:
>> 1. Allow IRQCHIP driver to provide IPI trigger mechanism
>
> And the big question is why do we want that?  The last thing we
> want is for people to "innovate" on how they deliver IPIs.  RISC-V
> has defined an SBI interface for it to hide all the details, and
> we should not try to handle systems that are not SBI compliant.
>
> Eventually we might want to revisit the SBI to improve on shortcomings
> if there are any, but we should not allow random irqchip drivers to
> override this.

I have already dropped this part from the PATCH v2.

>
>> 2. Have more generic IPI handler in arch/riscv so that IRQCHIP driver
>> can call it
>
> And that is rather irrelevant without 1) above.

Nopes, this is required for the RISC-V INTC driver.

Regards,
Anup
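Point 2 separates the IPI path into a trigger side and a generic handling side. A minimal userspace model of that split, loosely following the per-CPU pending-bitmask pattern; the names ipi_mark_pending(), ipi_handle() and the op values are hypothetical and are not the arch/riscv code. The trigger itself stays in one place (the SBI, per Christoph's point), and an interrupt controller driver only has to call the generic handler:

```c
#include <stdatomic.h>

/* Hypothetical IPI operations, one bit each in a per-CPU pending mask. */
enum ipi_op {
	IPI_RESCHEDULE = 0,
	IPI_CALL_FUNC = 1,
	IPI_MAX
};

struct ipi_cpu_state {
	atomic_ulong pending;		/* bitmask of enum ipi_op */
	unsigned long handled[IPI_MAX];	/* per-op handling counter */
};

/*
 * Sender side: record the request. Actually raising the interrupt
 * (for example through the SBI) is a separate, platform-specific step.
 */
static void ipi_mark_pending(struct ipi_cpu_state *cpu, enum ipi_op op)
{
	atomic_fetch_or(&cpu->pending, 1UL << op);
}

/*
 * Receiver side: the generic handler an interrupt controller driver
 * would call. It takes every pending op in one atomic exchange, so
 * repeated requests of the same kind coalesce, then dispatches each.
 * Returns the mask of ops that were handled.
 */
static unsigned long ipi_handle(struct ipi_cpu_state *cpu)
{
	unsigned long ops = atomic_exchange(&cpu->pending, 0UL);

	for (int op = 0; op < IPI_MAX; op++) {
		if (ops & (1UL << op))
			cpu->handled[op]++;	/* real code: reschedule, run call */
	}
	return ops;
}
```

Two reschedule requests sent before the handler runs are serviced once, which is the usual intent for such IPIs.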


Compiler flags for libapi and libtraceevent

2018-09-10 Thread Ben Hutchings
I noticed that tools/lib/api/Makefile has these conditional
assignments, similar to tools/perf/Makefile.config:

ifeq ($(DEBUG),0)
ifeq ($(CC_NO_CLANG), 0)
  CFLAGS += -O3
else
  CFLAGS += -O6
endif
endif

ifeq ($(DEBUG),0)
  CFLAGS += -D_FORTIFY_SOURCE
endif

But it doesn't set DEBUG to 0 by default, and nothing under tools/perf
exports its value of CFLAGS or DEBUG.

tools/lib/traceevent/Makefile doesn't seem to have any logic to enable
optimisation or Fortify.

Shouldn't these libraries both have optimisations and Fortify turned on
by default, like perf itself?

Ben.

-- 
Ben Hutchings
Computers are not intelligent.  They only think they are.

linux-next: build failure after merge of the tty tree

2018-09-10 Thread Stephen Rothwell
Hi Greg,

After merging the tty tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:

drivers/mfd/at91-usart.c:51:34: error: array type has incomplete element type 'struct of_device_id'
 static const struct of_device_id at91_usart_mode_of_match[] = {
  ^~~~
drivers/mfd/at91-usart.c:52:4: error: field name not in record or union initializer
  { .compatible = "atmel,at91rm9200-usart" },
^
drivers/mfd/at91-usart.c:52:4: note: (near initialization for 'at91_usart_mode_of_match')
drivers/mfd/at91-usart.c:53:4: error: field name not in record or union initializer
  { .compatible = "atmel,at91sam9260-usart" },
^
drivers/mfd/at91-usart.c:53:4: note: (near initialization for 'at91_usart_mode_of_match')
drivers/mfd/at91-usart.c:51:34: warning: 'at91_usart_mode_of_match' defined but not used [-Wunused-variable]
 static const struct of_device_id at91_usart_mode_of_match[] = {
  ^~~~

Caused by commit

  7d3aa342cef7 ("mfd: at91-usart: Add MFD driver for USART")

Forgot to include ?

I used the version of the tty tree from next-20180910 for today.
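The errors all stem from struct of_device_id being an incomplete type at line 51: its definition was not in scope when the table was declared, so GCC can neither size the array nor resolve the .compatible designator. A userspace sketch with a simplified stand-in struct (of_device_id_sketch, usart_of_match and match_compatible() are hypothetical names, and the real kernel struct has more fields) shows the same table compiling once the full type is visible, and how such a sentinel-terminated table is scanned:

```c
#include <stddef.h>
#include <string.h>

/*
 * Simplified stand-in for struct of_device_id. What matters is that this
 * complete definition is visible before the table is declared; with only
 * an incomplete type, GCC emits "array type has incomplete element type".
 */
struct of_device_id_sketch {
	const char *compatible;
};

static const struct of_device_id_sketch usart_of_match[] = {
	{ .compatible = "atmel,at91rm9200-usart" },
	{ .compatible = "atmel,at91sam9260-usart" },
	{ .compatible = NULL },	/* sentinel */
};

/* Scan until the sentinel; return the index of the match, or -1. */
static int match_compatible(const struct of_device_id_sketch *table,
			    const char *compatible)
{
	for (int i = 0; table[i].compatible != NULL; i++) {
		if (strcmp(table[i].compatible, compatible) == 0)
			return i;
	}
	return -1;
}
```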



-- 
Cheers,
Stephen Rothwell

[PATCH v12 1/2] leds: core: Introduce LED pattern trigger

2018-09-10 Thread Baolin Wang
This patch adds a new LED trigger that lets an LED device be configured
with a software or hardware pattern and triggered.

Consumers can write to the 'pattern' file to enable a software pattern,
which alters the brightness for the specified durations using a
software timer.

Moreover, consumers can write to the 'hw_pattern' file to enable a
hardware pattern on LED controllers that can autonomously control
brightness over time according to preprogrammed hardware patterns.

Signed-off-by: Raphael Teysseyre 
Signed-off-by: Baolin Wang 
---
Changes from v11:
 - Change so that -1 means repeat indefinitely.

Changes from v10:
 - Change 'int' to 'u32' for delta_t field.

Changes from v9:
 - None.

Changes from v8:
 - None.

Changes from v7:
 - Move the SC27XX hardware patterns description into its own ABI file.

Changes from v6:
 - Improve commit message.
 - Optimize the description of the hw_pattern file.
 - Simplify some logic.

Changes from v5:
 - Add one 'hw_pattern' file for hardware patterns.

Changes from v4:
 - Change the repeat file to return the originally written number.
 - Improve comments.
 - Fix some build warnings.

Changes from v3:
 - Reset pattern number to 0 if user provides incorrect pattern string.
 - Support one pattern.

Changes from v2:
 - Remove hardware_pattern boolean.
 - Change the pattern string format.

Changes from v1:
 - Use ATTRIBUTE_GROUPS() to define attributes.
 - Introduce a hardware_pattern flag to determine whether a software
 or hardware pattern is used.
 - Re-implement pattern_trig_store_pattern() function.
 - Remove pattern_get() interface.
 - Improve comments.
 - Other small optimizations.
---
 .../ABI/testing/sysfs-class-led-trigger-pattern|   39 +++
 drivers/leds/trigger/Kconfig   |7 +
 drivers/leds/trigger/Makefile  |1 +
 drivers/leds/trigger/ledtrig-pattern.c |  344 
 include/linux/leds.h   |   15 +
 5 files changed, 406 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-led-trigger-pattern
 create mode 100644 drivers/leds/trigger/ledtrig-pattern.c

diff --git a/Documentation/ABI/testing/sysfs-class-led-trigger-pattern b/Documentation/ABI/testing/sysfs-class-led-trigger-pattern
new file mode 100644
index 000..afff9e3
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-led-trigger-pattern
@@ -0,0 +1,39 @@
+What:  /sys/class/leds//pattern
+Date:  September 2018
+KernelVersion: 4.20
+Description:
+   Specify a software pattern for the LED that alters the brightness
+   for the specified durations using a software timer.
+
+   The pattern is given by a series of tuples of brightness and
+   duration (ms). The LED is expected to traverse the series and
+   hold each brightness value for the specified duration. A duration
+   of 0 means the brightness should change to the new value immediately.
+
+   The format of the software pattern values should be:
+   "brightness_1 duration_1 brightness_2 duration_2 brightness_3
+   duration_3 ...".
+
+What:  /sys/class/leds//hw_pattern
+Date:  September 2018
+KernelVersion: 4.20
+Description:
+   Specify a hardware pattern for the LED, for LED hardware that
+   supports autonomously controlling brightness over time according
+   to some preprogrammed hardware patterns.
+
+   Since different LED hardware can have different semantics of
+   hardware patterns, each driver is expected to provide its own
+   description for the hardware patterns in its ABI documentation
+   file.
+
+What:  /sys/class/leds//repeat
+Date:  September 2018
+KernelVersion: 4.20
+Description:
+   Specify a pattern repeat number. -1 means repeat indefinitely;
+   0 and other negative numbers are invalid.
+
+   This file will always return the originally written repeat
+   number.
diff --git a/drivers/leds/trigger/Kconfig b/drivers/leds/trigger/Kconfig
index 4018af7..b76fc3c 100644
--- a/drivers/leds/trigger/Kconfig
+++ b/drivers/leds/trigger/Kconfig
@@ -129,4 +129,11 @@ config LEDS_TRIGGER_NETDEV
  This allows LEDs to be controlled by network device activity.
  If unsure, say Y.
 
+config LEDS_TRIGGER_PATTERN
+   tristate "LED Pattern Trigger"
+   help
+ This allows LEDs to be controlled by a software or hardware pattern,
+ which is a series of tuples of brightness and duration (ms).
+ If unsure, say N.
+
 endif # LEDS_TRIGGERS
diff --git a/drivers/leds/trigger/Makefile b/drivers/leds/trigger/Makefile
index f3cfe19..9bcb64e 100644
--- a/drivers/leds/trigger/Makefile
+++ b/drivers/leds/trigger/Makefile
@@ -13,3 +13,4 @@ obj-$(CONFIG_LEDS_TRIGGER_TRANSIENT)  += ledtrig-transient.o
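The pattern string format and the repeat rules in the ABI text above can be illustrated with a small userspace parser. This is a sketch under stated assumptions, not the code in ledtrig-pattern.c (which is not shown here): struct pattern_step, parse_pattern() and validate_repeat() are hypothetical names, and rejecting an odd number of values is inferred from the tuple format rather than confirmed driver behavior.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace mirror of one pattern entry. */
struct pattern_step {
	unsigned int brightness;
	unsigned int delta_t;	/* duration in ms; 0 means change immediately */
};

/*
 * Parse "brightness_1 duration_1 brightness_2 duration_2 ..." into @out.
 * Returns the number of tuples parsed, or -EINVAL for a malformed string
 * (a brightness without a duration, non-numeric input, too many tuples).
 */
static int parse_pattern(const char *buf, struct pattern_step *out, int max)
{
	int n = 0;
	char *end;

	for (;;) {
		while (*buf == ' ' || *buf == '\n')	/* sysfs writes end in '\n' */
			buf++;
		if (*buf == '\0')
			break;
		if (n == max)
			return -EINVAL;

		out[n].brightness = strtoul(buf, &end, 10);
		if (end == buf)
			return -EINVAL;
		buf = end;

		out[n].delta_t = strtoul(buf, &end, 10);
		if (end == buf)
			return -EINVAL;
		buf = end;
		n++;
	}
	return n;
}

/* Repeat rules from the ABI text: -1 repeats forever; 0 and other
 * negative values are rejected. */
static int validate_repeat(long repeat)
{
	return (repeat == -1 || repeat > 0) ? 0 : -EINVAL;
}
```

For example, writing "255 500 0 500" describes a blink: full brightness for 500 ms, then off for 500 ms.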
 


[PATCH v12 2/2] leds: sc27xx: Add pattern_set/clear interfaces for LED controller

2018-09-10 Thread Baolin Wang
This patch implements the 'pattern_set' and 'pattern_clear'
interfaces to support SC27XX LED breathing mode.

Signed-off-by: Baolin Wang 
Acked-by: Pavel Machek 
---
Changes from v11:
 - None.

Changes from v10:
 - Add duration alignment function suggested by Jacek.
 - Add acked tag from Pavel.

Changes from v9:
 - Optimize the ABI documentation file.
 - Update the brightness value in hardware pattern mode.

Changes from v8:
 - Optimize the ABI documentation file.

Changes from v7:
 - Add its own ABI documentation file.

Changes from v6:
 - None.

Changes from v5:
 - None.

Changes from v4:
 - None.

Changes from v3:
 - None.

Changes from v2:
 - None.

Changes from v1:
 - Remove pattern_get interface.
---
 .../ABI/testing/sysfs-class-led-driver-sc27xx  |   22 
 drivers/leds/leds-sc27xx-bltc.c|  121 
 2 files changed, 143 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-led-driver-sc27xx

diff --git a/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx b/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx
new file mode 100644
index 000..45b1e60
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx
@@ -0,0 +1,22 @@
+What:  /sys/class/leds//hw_pattern
+Date:  September 2018
+KernelVersion: 4.20
+Description:
+   Specify a hardware pattern for the SC27XX LED. For the SC27XX
+   LED controller, it only supports 4 stages to make a single
+   hardware pattern, which is used to configure the rise time,
+   high time, fall time and low time for the breathing mode.
+
+   For the breathing mode, the SC27XX LED only expects one brightness
+   for the high stage. To be compatible with the hardware pattern
+   format, the brightness should be set to 0 for the rise, fall
+   and low stages.
+
+   Min stage duration: 125 ms
+   Max stage duration: 31875 ms
+
+   Since the stage duration step is 125 ms, the duration should be
+   a multiple of 125, like 125ms, 250ms, 375ms, 500ms ... 31875ms.
+
+   Thus the format of the hardware pattern values should be:
+   "0 rise_duration brightness high_duration 0 fall_duration 0 low_duration".
diff --git a/drivers/leds/leds-sc27xx-bltc.c b/drivers/leds/leds-sc27xx-bltc.c
index 9d9b7aa..fecf27f 100644
--- a/drivers/leds/leds-sc27xx-bltc.c
+++ b/drivers/leds/leds-sc27xx-bltc.c
@@ -32,8 +32,18 @@
 #define SC27XX_DUTY_MASK   GENMASK(15, 0)
 #define SC27XX_MOD_MASK    GENMASK(7, 0)
 
+#define SC27XX_CURVE_SHIFT 8
+#define SC27XX_CURVE_L_MASK    GENMASK(7, 0)
+#define SC27XX_CURVE_H_MASK    GENMASK(15, 8)
+
 #define SC27XX_LEDS_OFFSET 0x10
 #define SC27XX_LEDS_MAX    3
+#define SC27XX_LEDS_PATTERN_CNT    4
+/* Stage duration step, in milliseconds */
+#define SC27XX_LEDS_STEP   125
+/* Minimum and maximum duration, in milliseconds */
+#define SC27XX_DELTA_T_MIN SC27XX_LEDS_STEP
+#define SC27XX_DELTA_T_MAX (SC27XX_LEDS_STEP * 255)
 
 struct sc27xx_led {
char name[LED_MAX_NAME_SIZE];
@@ -122,6 +132,113 @@ static int sc27xx_led_set(struct led_classdev *ldev, enum led_brightness value)
return err;
 }
 
+static void sc27xx_led_clamp_align_delta_t(u32 *delta_t)
+{
+   u32 v, offset, t = *delta_t;
+
+   v = t + SC27XX_LEDS_STEP / 2;
+   v = clamp_t(u32, v, SC27XX_DELTA_T_MIN, SC27XX_DELTA_T_MAX);
+   offset = v - SC27XX_DELTA_T_MIN;
+   offset = SC27XX_LEDS_STEP * (offset / SC27XX_LEDS_STEP);
+
+   *delta_t = SC27XX_DELTA_T_MIN + offset;
+}
+
+static int sc27xx_led_pattern_clear(struct led_classdev *ldev)
+{
+   struct sc27xx_led *leds = to_sc27xx_led(ldev);
+   struct regmap *regmap = leds->priv->regmap;
+   u32 base = sc27xx_led_get_offset(leds);
+   u32 ctrl_base = leds->priv->base + SC27XX_LEDS_CTRL;
+   u8 ctrl_shift = SC27XX_CTRL_SHIFT * leds->line;
+   int err;
+
+   mutex_lock(&leds->priv->lock);
+
+   /* Reset the rise, high, fall and low time to zero. */
+   regmap_write(regmap, base + SC27XX_LEDS_CURVE0, 0);
+   regmap_write(regmap, base + SC27XX_LEDS_CURVE1, 0);
+
+   err = regmap_update_bits(regmap, ctrl_base,
+   (SC27XX_LED_RUN | SC27XX_LED_TYPE) << ctrl_shift, 0);
+
+   ldev->brightness = LED_OFF;
+
+   mutex_unlock(&leds->priv->lock);
+
+   return err;
+}
+
+static int sc27xx_led_pattern_set(struct led_classdev *ldev,
+ struct led_pattern *pattern,
+ u32 len, int repeat)
+{
+   struct sc27xx_led *leds = to_sc27xx_led(ldev);
+   u32 base = sc27xx_led_get_offset(leds);
+   u32 ctrl_base = leds->priv->base + SC27XX_LEDS_CTRL;
+   u8 ctrl_shift = SC27XX_CTRL_SHIFT * leds->line;
+   struct regmap *regmap = leds->priv->regmap;
+   int err;
+
+   /*
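sc27xx_led_clamp_align_delta_t() above rounds each requested stage duration to the nearest multiple of the 125 ms step and clamps it to the 125 to 31875 ms range given in the ABI file. The userspace copy below reproduces that arithmetic so it can be checked in isolation. delta_t_to_steps() is a hypothetical helper, and the idea that the driver programs step counts (not milliseconds) into the 8-bit CURVE fields is an assumption inferred from SC27XX_DELTA_T_MAX being 125 * 255, not confirmed by the code shown.

```c
#include <stdint.h>

/* Constants copied from the patch above. */
#define SC27XX_LEDS_STEP	125			/* ms per step */
#define SC27XX_DELTA_T_MIN	SC27XX_LEDS_STEP	/* 125 ms */
#define SC27XX_DELTA_T_MAX	(SC27XX_LEDS_STEP * 255)	/* 31875 ms */

/*
 * Userspace copy of sc27xx_led_clamp_align_delta_t(): add half a step,
 * clamp to [MIN, MAX], then round down to a whole step above MIN. As in
 * the original, u32 inputs within 62 of UINT32_MAX would wrap around.
 */
static void clamp_align_delta_t(uint32_t *delta_t)
{
	uint32_t v, offset;

	v = *delta_t + SC27XX_LEDS_STEP / 2;
	if (v < SC27XX_DELTA_T_MIN)
		v = SC27XX_DELTA_T_MIN;
	if (v > SC27XX_DELTA_T_MAX)
		v = SC27XX_DELTA_T_MAX;
	offset = v - SC27XX_DELTA_T_MIN;
	offset = SC27XX_LEDS_STEP * (offset / SC27XX_LEDS_STEP);
	*delta_t = SC27XX_DELTA_T_MIN + offset;
}

/* Hypothetical: step count for an aligned duration; at most 255,
 * which fits an 8-bit field such as SC27XX_CURVE_L_MASK. */
static uint32_t delta_t_to_steps(uint32_t aligned)
{
	return aligned / SC27XX_LEDS_STEP;
}
```

Since 125 is odd there is no exact midpoint: 187 ms rounds down to 125 ms, 188 ms rounds up to 250 ms.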


linux-next: build warning after merge of the usb tree

2018-09-10 Thread Stephen Rothwell
Hi Greg,

After merging the usb tree, today's linux-next build (arm
multi_v7_defconfig) produced this warning:

drivers/usb/core/hcd.c: In function '__usb_hcd_giveback_urb':
drivers/usb/core/hcd.c:1741:16: warning: unused variable 'flags' [-Wunused-variable]
  unsigned long flags;
^

Introduced by commit

  ed194d136769 ("usb: core: remove local_irq_save() around ->complete() handler")

-- 
Cheers,
Stephen Rothwell

Re: [PATCH] iio: proximity: Add driver support for ST's VL53L0X ToF ranging sensor.

2018-09-10 Thread Song Qiang
On Mon, Sep 10, 2018 at 11:27:47PM +0530, Himanshu Jha wrote:
> On Mon, Sep 10, 2018 at 10:42:59PM +0800, Song Qiang wrote:
> > This driver was originally written by ST in 2016 as a misc input device,
> > and hasn't been maintained for a long time. I grabbed some code from
> > its API and reworked it into an IIO proximity device driver.
> > This version of the driver uses the I2C bus to talk to the sensor and
> > polls for measurement completion, so no IRQ line is needed.
> > This version of the driver supports only one-shot mode, and it can be
> > tested with reading from
> > /sys/bus/iio/devices/iio:deviceX/in_distance_raw
> > 
> > Signed-off-by: Song Qiang 
> > ---
> 
> The Cc list contains developers who might not be relevant
> for the discussion.
> 
> So, copy only those people listed by:
> 
> $./scripts/get_maintainer.pl 
> 
> Don't know why Kate & Greg are cc'ed ?
> 

Hi Himanshu,

Since this is a new device driver that may be added to the
drivers/iio/proximity folder, I passed drivers/iio/proximity as
the parameter to ./scripts/get_maintainer.pl and it returned
them. I rechecked, and it seems to list Greg and Kate as
commit_signers. I sent the patches as Greg's talk "Write and Submit
Your First Linux Kernel Patch" described. So, should I send the
patches only to the reviewers that get_maintainer.pl returned and stop
cc'ing all the commit_signers?

> >  .../bindings/iio/proximity/vl53l0x.txt|  12 +
> >  drivers/iio/proximity/Kconfig |  13 +
> >  drivers/iio/proximity/Makefile|   2 +
> >  drivers/iio/proximity/vl53l0x-i2c.c   | 295 ++
> >  4 files changed, 322 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/iio/proximity/vl53l0x.txt
> >  create mode 100644 drivers/iio/proximity/vl53l0x-i2c.c
> > 
> > diff --git a/Documentation/devicetree/bindings/iio/proximity/vl53l0x.txt b/Documentation/devicetree/bindings/iio/proximity/vl53l0x.txt
> > new file mode 100644
> > index ..64b69442f08e
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/iio/proximity/vl53l0x.txt
> > @@ -0,0 +1,12 @@
> > +ST's VL53L0X ToF ranging sensor
> > +
> > +Required properties:
> > +   - compatible: must be "st,vl53l0x-i2c"
> > +   - reg: i2c address where to find the device
> > +
> > +Example:
> > +
> > +vl53l0x@29 {
> > +   compatible = "st,vl53l0x-i2c";
> > +   reg = <0x29>;
> > +};
> > diff --git a/drivers/iio/proximity/Kconfig b/drivers/iio/proximity/Kconfig
> > index f726f9427602..1563a5f9144d 100644
> > --- a/drivers/iio/proximity/Kconfig
> > +++ b/drivers/iio/proximity/Kconfig
> > @@ -79,4 +79,17 @@ config SRF08
> >   To compile this driver as a module, choose M here: the
> >   module will be called srf08.
> >  
> > +config VL53L0X_I2C
> > +   tristate "STMicroelectronics VL53L0X ToF ranger sensor (I2C)"
> > +   select IIO_BUFFER
> > +   select IIO_TRIGGERED_BUFFER
> 
> I don't see any buffer/trigger support, so better to remove these
> two options.
> 
> > +   depends on I2C
> > +   help
> > + Say Y here to build a driver for STMicroelectronics VL53L0X
> > + ToF ranger sensors with i2c interface.
> > + This driver can be used to measure the distance of objects.
> > +
> > + To compile this driver as a module, choose M here: the
> > + module will be called vl53l0x-i2c.
> 
> `name` attribute will be VL53L0X_DRV_NAME(vl53l0x) if OF matching
> is not used to probe the driver.
> 
> >  endmenu
> > diff --git a/drivers/iio/proximity/Makefile b/drivers/iio/proximity/Makefile
> > index 4f4ed45e87ef..7cb771665c8b 100644
> > --- a/drivers/iio/proximity/Makefile
> > +++ b/drivers/iio/proximity/Makefile
> > @@ -10,3 +10,5 @@ obj-$(CONFIG_RFD77402)+= rfd77402.o
> >  obj-$(CONFIG_SRF04)  += srf04.o
> >  obj-$(CONFIG_SRF08)  += srf08.o
> >  obj-$(CONFIG_SX9500)  += sx9500.o
> > +obj-$(CONFIG_VL53L0X_I2C)  += vl53l0x-i2c.o
> > +
> > diff --git a/drivers/iio/proximity/vl53l0x-i2c.c b/drivers/iio/proximity/vl53l0x-i2c.c
> > new file mode 100644
> > index ..c00713041d30
> > --- /dev/null
> > +++ b/drivers/iio/proximity/vl53l0x-i2c.c
> > @@ -0,0 +1,295 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +/*
> > + *  vl53l0x-i2c.c - Support for STM VL53L0X FlightSense TOF
> > + * Ranger Sensor on a i2c bus.
> > + *
> > + *  Copyright (C) 2016 STMicroelectronics Imaging Division.
> > + *  Copyright (C) 2018 Song Qiang 
> > + *
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#define VL53L0X_DRV_NAME   "vl53l0x"
> > +
> > +/* Device register map */
> > +#define VL_REG_SYSRANGE_START  0x000
> > +#define VL_REG_SYSRANGE_MODE_MASK  0x0F
> > +#define VL_REG_SYSRANGE_MODE_START_STOP  0x01
> > +#define VL_REG_SYSRANGE_MODE_SINGLESHOT  0x00
> > 
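The commit message says the driver polls for measurement completion instead of using an interrupt line. A userspace sketch of a bounded polling loop against a simulated sensor follows. The assumption that a single-shot measurement is started by setting MODE_START_STOP in SYSRANGE_START, and is finished once the device clears that bit, comes from common VL53L0X usage rather than from the driver code quoted here; struct fake_sensor and all helper names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Register definitions copied from the patch above. */
#define VL_REG_SYSRANGE_START		0x000
#define VL_REG_SYSRANGE_MODE_MASK	0x0F
#define VL_REG_SYSRANGE_MODE_START_STOP	0x01
#define VL_REG_SYSRANGE_MODE_SINGLESHOT	0x00

/* Hypothetical simulated sensor: clears the start bit after N reads. */
struct fake_sensor {
	uint8_t sysrange_start;	/* models register VL_REG_SYSRANGE_START */
	int reads_until_done;
};

/* Assumption: a single-shot range is started by setting START_STOP. */
static void start_single_shot(struct fake_sensor *s)
{
	s->sysrange_start = (VL_REG_SYSRANGE_MODE_SINGLESHOT |
			     VL_REG_SYSRANGE_MODE_START_STOP) &
			    VL_REG_SYSRANGE_MODE_MASK;
}

static uint8_t fake_read_sysrange_start(struct fake_sensor *s)
{
	if (s->reads_until_done > 0 && --s->reads_until_done == 0)
		s->sysrange_start &= (uint8_t)~VL_REG_SYSRANGE_MODE_START_STOP;
	return s->sysrange_start;
}

/* Bounded poll: returns the number of reads needed, or -ETIMEDOUT. */
static int poll_until_done(struct fake_sensor *s, int max_reads)
{
	for (int i = 1; i <= max_reads; i++) {
		uint8_t v = fake_read_sysrange_start(s);

		if (!(v & VL_REG_SYSRANGE_MODE_START_STOP))
			return i;
		/* a real driver would sleep between reads here */
	}
	return -ETIMEDOUT;
}
```

Bounding the loop matters because, without an interrupt, a sensor that never finishes would otherwise stall every read of in_distance_raw.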


[PATCH v2 4/4] arm64: dts: rockchip: Enable SD card detection for Rock960 boards

2018-09-10 Thread Manivannan Sadhasivam
For SD cards to work properly, add the Card Detect GPIO property to the
common devicetree for the Rock960 family of boards.

Signed-off-by: Manivannan Sadhasivam 
---
 arch/arm64/boot/dts/rockchip/rk3399-rock960.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399-rock960.dtsi 
b/arch/arm64/boot/dts/rockchip/rk3399-rock960.dtsi
index 5a5d8e28ef55..f68254831ad9 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-rock960.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399-rock960.dtsi
@@ -403,6 +403,7 @@
cap-sd-highspeed;
clock-frequency = <1>;
clock-freq-min-max = <10 1>;
+   cd-gpios = < 7 GPIO_ACTIVE_LOW>;
disable-wp;
sd-uhs-sdr104;
vqmmc-supply = <_sd>;
-- 
2.17.1
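The GPIO_ACTIVE_LOW flag on `cd-gpios` means the card-detect signal is asserted when the line is driven low. In the kernel, gpiod_get_value() already reports this logical value with the inversion applied. The following is a small sketch of that polarity handling, and the helper names are hypothetical:

```c
#include <stdbool.h>

/* Logical (asserted) value of a GPIO given its raw line level and polarity. */
static bool gpio_logical_value(int raw_level, bool active_low)
{
	bool high = raw_level != 0;

	return active_low ? !high : high;
}

/* With cd-gpios marked GPIO_ACTIVE_LOW, a card is present when the line reads low. */
static bool sd_card_present(int raw_cd_level)
{
	return gpio_logical_value(raw_cd_level, true);
}
```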



[PATCH v2 2/4] dt-bindings: arm: rockchip: Add binding for Rock960 board

2018-09-10 Thread Manivannan Sadhasivam
Add a devicetree binding for the Rock960 board from Vamrs Limited.

Signed-off-by: Manivannan Sadhasivam 
---
 Documentation/devicetree/bindings/arm/rockchip.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/arm/rockchip.txt 
b/Documentation/devicetree/bindings/arm/rockchip.txt
index acfd3c773dd0..4b6888a21db2 100644
--- a/Documentation/devicetree/bindings/arm/rockchip.txt
+++ b/Documentation/devicetree/bindings/arm/rockchip.txt
@@ -5,6 +5,10 @@ Rockchip platforms device tree bindings
 Required root node properties:
   - compatible = "vamrs,ficus", "rockchip,rk3399";
 
+- 96boards RK3399 Rock960 (ROCK960 Consumer Edition)
+Required root node properties:
+  - compatible = "vamrs,rock960", "rockchip,rk3399";
+
 - Amarula Vyasa RK3288 board
 Required root node properties:
   - compatible = "amarula,vyasa-rk3288", "rockchip,rk3288";
-- 
2.17.1
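The binding lists the board string before the SoC string because a compatible list runs from the most specific entry to the least specific one. Code that knows this exact board can match on "vamrs,rock960", while generic RK3399 code still matches on "rockchip,rk3399". The following is a simplified sketch of that fallback. The function and tables are hypothetical, and the kernel's own OF matching (which also scores matches) is more involved.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the index of the first entry in @compat (ordered most specific
 * first, as in a root node's "compatible" property) that appears in the
 * @supported table, or -1 if none does.
 */
static int first_supported_compatible(const char *const *compat, size_t n_compat,
				      const char *const *supported, size_t n_supported)
{
	for (size_t i = 0; i < n_compat; i++)
		for (size_t j = 0; j < n_supported; j++)
			if (strcmp(compat[i], supported[j]) == 0)
				return (int)i;
	return -1;
}
```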



[PATCH v2 3/4] arm64: boot: dts: rockchip: Add support for Rock960 board

2018-09-10 Thread Manivannan Sadhasivam
Add devicetree support for the Rock960 board, one of the Consumer Edition
boards in the 96Boards family. The board support reuses the common
Rock960 family dtsi, which is shared with the Ficus 96Board.

Signed-off-by: Manivannan Sadhasivam 
---
 arch/arm64/boot/dts/rockchip/Makefile |   1 +
 .../boot/dts/rockchip/rk3399-rock960.dts  | 139 ++
 2 files changed, 140 insertions(+)
 create mode 100644 arch/arm64/boot/dts/rockchip/rk3399-rock960.dts

diff --git a/arch/arm64/boot/dts/rockchip/Makefile 
b/arch/arm64/boot/dts/rockchip/Makefile
index b0092d95b574..57c0d76458e6 100644
--- a/arch/arm64/boot/dts/rockchip/Makefile
+++ b/arch/arm64/boot/dts/rockchip/Makefile
@@ -14,5 +14,6 @@ dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-firefly.dtb
 dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-gru-bob.dtb
 dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-gru-kevin.dtb
 dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-puma-haikou.dtb
+dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-rock960.dtb
 dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-sapphire.dtb
 dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-sapphire-excavator.dtb
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-rock960.dts 
b/arch/arm64/boot/dts/rockchip/rk3399-rock960.dts
new file mode 100644
index ..37242b64a7a3
--- /dev/null
+++ b/arch/arm64/boot/dts/rockchip/rk3399-rock960.dts
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+/*
+ * Copyright (c) 2018 Linaro Ltd.
+ */
+
+/dts-v1/;
+#include "rk3399-rock960.dtsi"
+
+/ {
+   model = "96boards Rock960";
+   compatible = "vamrs,rock960", "rockchip,rk3399";
+
+   chosen {
+   stdout-path = "serial2:150n8";
+   };
+
+   vcc3v3_pcie: vcc3v3-pcie-regulator {
+   compatible = "regulator-fixed";
+   enable-active-high;
+   gpio = < 5 GPIO_ACTIVE_HIGH>;
+   pinctrl-names = "default";
+   pinctrl-0 = <_drv>;
+   regulator-boot-on;
+   regulator-name = "vcc3v3_pcie";
+   regulator-min-microvolt = <330>;
+   regulator-max-microvolt = <330>;
+   vin-supply = <_sys>;
+   };
+
+   vcc5v0_host: vcc5v0-host-regulator {
+   compatible = "regulator-fixed";
+   enable-active-high;
+   gpio = < 25 GPIO_ACTIVE_HIGH>;
+   pinctrl-names = "default";
+   pinctrl-0 = <_vbus_drv>;
+   regulator-name = "vcc5v0_host";
+   regulator-min-microvolt = <500>;
+   regulator-max-microvolt = <500>;
+   regulator-always-on;
+   vin-supply = <_sys>;
+   };
+};
+
+ {
+   pcie {
+   pcie_drv: pcie-drv {
+   rockchip,pins =
+   <2 RK_PA5 RK_FUNC_GPIO _pull_none>;
+   };
+   };
+
+   usb2 {
+   host_vbus_drv: host-vbus-drv {
+   rockchip,pins =
+   <4 RK_PD1 RK_FUNC_GPIO _pull_none>;
+   };
+   };
+};
+
+_phy {
+   status = "okay";
+};
+
+ {
+   ep-gpios = < RK_PA2 GPIO_ACTIVE_HIGH>;
+   num-lanes = <4>;
+   pinctrl-names = "default";
+   pinctrl-0 = <_clkreqn_cpm>;
+   vpcie3v3-supply = <_pcie>;
+   status = "okay";
+};
+
+ {
+   status = "okay";
+};
+
+ {
+   status = "okay";
+};
+
+ {
+   status = "okay";
+};
+
+ {
+   status = "okay";
+};
+
+_host {
+   phy-supply = <_host>;
+   status = "okay";
+};
+
+_host {
+   phy-supply = <_host>;
+   status = "okay";
+};
+
+_otg {
+   status = "okay";
+};
+
+_otg {
+   status = "okay";
+};
+
+_host0_ehci {
+   status = "okay";
+};
+
+_host0_ohci {
+   status = "okay";
+};
+
+_host1_ehci {
+   status = "okay";
+};
+
+_host1_ohci {
+   status = "okay";
+};
+
+_0 {
+   status = "okay";
+};
+
+_dwc3_0 {
+   status = "okay";
+   dr_mode = "otg";
+};
+
+_1 {
+   status = "okay";
+};
+
+_dwc3_1 {
+   status = "okay";
+   dr_mode = "host";
+};
-- 
2.17.1
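The numeric GPIO offsets in the regulator nodes line up with the pinctrl entries. The offset 5 used by vcc3v3_pcie corresponds to `RK_PA5`, and the offset 25 used by vcc5v0_host corresponds to `RK_PD1`. This works because each Rockchip GPIO bank has four 8-pin ports, A to D, numbered in order in <dt-bindings/pinctrl/rockchip.h>. The following is a small sketch of that mapping, and the helper name is hypothetical:

```c
/*
 * Pin offset within a Rockchip GPIO bank for the RK_P<port><pin> naming:
 * 8 pins per port letter A..D, so RK_PD1 = 3 * 8 + 1 = 25.
 * Returns -1 for an out-of-range port or pin.
 */
static int rk_pin_offset(char port, int pin)
{
	if (port < 'A' || port > 'D' || pin < 0 || pin > 7)
		return -1;
	return (port - 'A') * 8 + pin;
}
```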



[PATCH v2 1/4] arm64: dts: rockchip: Split out common nodes for Rock960 based boards

2018-09-10 Thread Manivannan Sadhasivam
Since the members of the Rock960 board family (Rock960 and Ficus)
share most of their configuration, split the common nodes out into a
common dtsi file to reduce code duplication. The board-specific nodes
for the Ficus board are kept in its board DTS file.

Signed-off-by: Manivannan Sadhasivam 
---
 arch/arm64/boot/dts/rockchip/rk3399-ficus.dts | 429 +
 .../boot/dts/rockchip/rk3399-rock960.dtsi | 439 ++
 2 files changed, 440 insertions(+), 428 deletions(-)
 create mode 100644 arch/arm64/boot/dts/rockchip/rk3399-rock960.dtsi

diff --git a/arch/arm64/boot/dts/rockchip/rk3399-ficus.dts 
b/arch/arm64/boot/dts/rockchip/rk3399-ficus.dts
index 8978d924eb83..7f6ec37d5a69 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-ficus.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3399-ficus.dts
@@ -7,8 +7,7 @@
  */
 
 /dts-v1/;
-#include "rk3399.dtsi"
-#include "rk3399-opp.dtsi"
+#include "rk3399-rock960.dtsi"
 
 / {
model = "96boards RK3399 Ficus";
@@ -25,31 +24,6 @@
#clock-cells = <0>;
};
 
-   vcc1v8_s0: vcc1v8-s0 {
-   compatible = "regulator-fixed";
-   regulator-name = "vcc1v8_s0";
-   regulator-min-microvolt = <180>;
-   regulator-max-microvolt = <180>;
-   regulator-always-on;
-   };
-
-   vcc_sys: vcc-sys {
-   compatible = "regulator-fixed";
-   regulator-name = "vcc_sys";
-   regulator-min-microvolt = <500>;
-   regulator-max-microvolt = <500>;
-   regulator-always-on;
-   };
-
-   vcc3v3_sys: vcc3v3-sys {
-   compatible = "regulator-fixed";
-   regulator-name = "vcc3v3_sys";
-   regulator-min-microvolt = <330>;
-   regulator-max-microvolt = <330>;
-   regulator-always-on;
-   vin-supply = <_sys>;
-   };
-
vcc3v3_pcie: vcc3v3-pcie-regulator {
compatible = "regulator-fixed";
enable-active-high;
@@ -75,46 +49,6 @@
regulator-always-on;
vin-supply = <_sys>;
};
-
-   vdd_log: vdd-log {
-   compatible = "pwm-regulator";
-   pwms = < 0 25000 0>;
-   regulator-name = "vdd_log";
-   regulator-min-microvolt = <80>;
-   regulator-max-microvolt = <140>;
-   regulator-always-on;
-   regulator-boot-on;
-   vin-supply = <_sys>;
-   };
-
-};
-
-_l0 {
-   cpu-supply = <_cpu_l>;
-};
-
-_l1 {
-   cpu-supply = <_cpu_l>;
-};
-
-_l2 {
-   cpu-supply = <_cpu_l>;
-};
-
-_l3 {
-   cpu-supply = <_cpu_l>;
-};
-
-_b0 {
-   cpu-supply = <_cpu_b>;
-};
-
-_b1 {
-   cpu-supply = <_cpu_b>;
-};
-
-_phy {
-   status = "okay";
 };
 
  {
@@ -133,263 +67,6 @@
status = "okay";
 };
 
- {
-   ddc-i2c-bus = <>;
-   pinctrl-names = "default";
-   pinctrl-0 = <_cec>;
-   status = "okay";
-};
-
- {
-   clock-frequency = <40>;
-   i2c-scl-rising-time-ns = <168>;
-   i2c-scl-falling-time-ns = <4>;
-   status = "okay";
-
-   vdd_cpu_b: regulator@40 {
-   compatible = "silergy,syr827";
-   reg = <0x40>;
-   fcs,suspend-voltage-selector = <1>;
-   regulator-name = "vdd_cpu_b";
-   regulator-min-microvolt = <712500>;
-   regulator-max-microvolt = <150>;
-   regulator-ramp-delay = <1000>;
-   regulator-always-on;
-   regulator-boot-on;
-   vin-supply = <_sys>;
-   status = "okay";
-
-   regulator-state-mem {
-   regulator-off-in-suspend;
-   };
-   };
-
-   vdd_gpu: regulator@41 {
-   compatible = "silergy,syr828";
-   reg = <0x41>;
-   fcs,suspend-voltage-selector = <1>;
-   regulator-name = "vdd_gpu";
-   regulator-min-microvolt = <712500>;
-   regulator-max-microvolt = <150>;
-   regulator-ramp-delay = <1000>;
-   regulator-always-on;
-   regulator-boot-on;
-   vin-supply = <_sys>;
-   regulator-state-mem {
-   regulator-off-in-suspend;
-   };
-   };
-
-   rk808: pmic@1b {
-   compatible = "rockchip,rk808";
-   reg = <0x1b>;
-   interrupt-parent = <>;
-   interrupts = <21 IRQ_TYPE_LEVEL_LOW>;
-   pinctrl-names = "default";
-   pinctrl-0 = <_int_l>;
-   rockchip,system-power-controller;
-   wakeup-source;
-   #clock-cells = <1>;
-   clock-output-names = "xin32k", "rk808-clkout2";
-
-   vcc1-supply = <_sys>;
-   vcc2-supply = <_sys>;
-   vcc3-supply = <_sys>;
-   
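Splitting the nodes out works because dtc merges repeated definitions of a node. The board .dts includes rk3399-rock960.dtsi and can then reopen a node by its label to add properties or to override ones the .dtsi set. When a property is assigned more than once, the later assignment wins. The following is a toy model of that rule. The struct and function are hypothetical, and real dtc also handles /delete-property/, /delete-node/ and nested node merging.

```c
#include <stddef.h>
#include <string.h>

/* One "name = value" assignment inside a node, in source order. */
struct prop {
	const char *name;
	const char *value;
};

/*
 * Resolve @name the way dtc does when a node is defined more than once
 * (first in an included .dtsi, then again in the board .dts):
 * the last assignment in source order wins. Returns NULL if unset.
 */
static const char *resolve_prop(const struct prop *props, size_t n, const char *name)
{
	const char *value = NULL;

	for (size_t i = 0; i < n; i++)
		if (strcmp(props[i].name, name) == 0)
			value = props[i].value;
	return value;
}
```

For example, a node left as status = "disabled" in a shared .dtsi and reopened in a board file with status = "okay" resolves to "okay". These values are illustrative only.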

Re: [PATCH 4.19 regression fix] printk: For early boot messages check loglevel when flushing the buffer

2018-09-10 Thread Sergey Senozhatsky
On (09/10/18 16:57), Petr Mladek wrote:
> 
> Good catch.
> 
> > ---
> > 
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index c036f128cdc3..ede29a7ba6db 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2545,6 +2545,7 @@ void console_flush_on_panic(void)
> >  * ensure may_schedule is cleared.
> >  */
> > console_trylock();
> > +   exclusive_console = NULL;
> 
> This is not enough. It would cause replaying old messages
> on all consoles.

Oh, that was intentional. I consider repeated messages to be less
problematic than the missing ones.

> Most problems should probably be solved when we store console_seq
> before setting exclusive_console. Then we could clear
> exclusive_console when reaching the stored sequence number.
> 
> Can this be that simple? ;-)

This can work, yes.
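The following is a simplified model of Petr's suggestion. It remembers console_seq when exclusive_console is set and drops exclusivity once the replay reaches that sequence number. Records replayed to the new console therefore do not repeat on the old consoles, and records that arrive during the replay still reach every console. All names here are hypothetical. The real code in kernel/printk/printk.c also deals with locking, dropped records and console levels.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the kernel's actual data structures. */
struct console_model {
	const char *name;
	int lines_printed;
};

struct printk_model {
	uint64_t console_seq;             /* next record to print */
	uint64_t log_next_seq;            /* next record to be stored */
	struct console_model *exclusive;  /* replay target, or NULL */
	uint64_t exclusive_stop_seq;      /* seq at which the replay ends */
};

/* A new console registered: replay old records to it alone. */
static void start_replay(struct printk_model *p, struct console_model *con,
			 uint64_t replay_from)
{
	p->exclusive_stop_seq = p->console_seq; /* where normal output stopped */
	p->console_seq = replay_from;
	p->exclusive = con;
}

/* Print pending records to the exclusive console or to all consoles. */
static void flush(struct printk_model *p, struct console_model **all, size_t n)
{
	while (p->console_seq < p->log_next_seq) {
		/* Replay has caught up with what the other consoles already saw. */
		if (p->exclusive && p->console_seq >= p->exclusive_stop_seq)
			p->exclusive = NULL;

		if (p->exclusive) {
			p->exclusive->lines_printed++;
		} else {
			for (size_t i = 0; i < n; i++)
				all[i]->lines_printed++;
		}
		p->console_seq++;
	}
	if (p->exclusive && p->console_seq >= p->exclusive_stop_seq)
		p->exclusive = NULL;
}
```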

I also thought about doing it the way Linus, Jan Kara and Hannes
Reinecke proposed:

- store the console_seq nr of the first oops_in_progress message
  (oops_console_seq) and flush only messages that are in
  [oops_console_seq - 200, log_next_seq] range, as opposed to complete
  logbuf flush.
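The start of the proposed range can be computed with two clamps. The flush then runs up to, but not including, log_next_seq, which is the next sequence number to be stored. This sketch assumes that `context` is the "N messages prior to oops" count (200 above) and that `log_first_seq` is the oldest record still in the ring buffer. The helper itself is hypothetical.

```c
#include <stdint.h>

/*
 * First sequence number to flush on panic under the proposal: up to
 * @context records before the first oops message, but never older than
 * the oldest record still in the buffer and never past log_next_seq.
 */
static uint64_t panic_flush_start(uint64_t oops_console_seq, uint64_t context,
				  uint64_t log_first_seq, uint64_t log_next_seq)
{
	uint64_t start = oops_console_seq > context ? oops_console_seq - context : 0;

	if (start < log_first_seq)
		start = log_first_seq;   /* older records were already overwritten */
	if (start > log_next_seq)
		start = log_next_seq;
	return start;
}
```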

Hannes asked for this several times, and it was in Jan's printk patches
a long time ago (if I'm not mistaken, and sorry if I am: Jan said that Linus
wanted that "just N messages prior to oops" thing).

Jan's patch:
 
https://lore.kernel.org/lkml/1457964820-4642-3-git-send-email-sergey.senozhat...@gmail.com/T/#u


> This reverts commit 375899cddcbb26881b03cb3fbdcfd600e4e67f4a.
> 
> Reported-by: Hans de Goede 
> Signed-off-by: Petr Mladek 

Acked-by: Sergey Senozhatsky 

-ss


[PATCH v2 0/4] Add 96Boards Rock960 CE board support

2018-09-10 Thread Manivannan Sadhasivam
This patchset adds support for the 96Boards Rock960 CE board. The Rock960 CE
(Consumer Edition) board is one of the members of the 96Boards Consumer
Edition and AI platforms and is manufactured by Vamrs Limited. Most of
the board configuration is shared with the Ficus board, also manufactured
by Vamrs, which is an Enterprise Edition 96Board.

To avoid code duplication, a common rk3399-rock960.dtsi file containing
the DT nodes shared by both boards has been added, along with separate
board-specific DTS files.

The key differences between the two boards are:

1. Different host enable GPIO for USB
2. Different power and reset GPIOs for PCI-E
3. No Ethernet port on the Rock960

While adding the board support, SD card detection support is also
added to the common dtsi file shared by both boards.

This series has been tested on a Rock960 CE v1.2 board; the Ficus board
maintainer is expected to test the Ficus-specific part.

Thanks,
Mani

Changes in v2:

* Changed the board compatible to "vamrs,rock960"

Manivannan Sadhasivam (4):
  arm64: dts: rockchip: Split out common nodes for Rock960 based boards
  dt-bindings: arm: rockchip: Add binding for Rock960 board
  arm64: boot: dts: rockchip: Add support for Rock960 board
  arm64: dts: rockchip: Enable SD card detection for Rock960 boards

 .../devicetree/bindings/arm/rockchip.txt  |   4 +
 arch/arm64/boot/dts/rockchip/Makefile |   1 +
 arch/arm64/boot/dts/rockchip/rk3399-ficus.dts | 429 +
 .../boot/dts/rockchip/rk3399-rock960.dts  | 139 ++
 .../boot/dts/rockchip/rk3399-rock960.dtsi | 440 ++
 5 files changed, 585 insertions(+), 428 deletions(-)
 create mode 100644 arch/arm64/boot/dts/rockchip/rk3399-rock960.dts
 create mode 100644 arch/arm64/boot/dts/rockchip/rk3399-rock960.dtsi

-- 
2.17.1



Re: [PATCH v3 1/2] i2c: mediatek: Register i2c adapter driver earlier

2018-09-10 Thread Jun Gao
On Thu, 2018-09-06 at 20:31 +0200, Wolfram Sang wrote:
> On Thu, Sep 06, 2018 at 09:15:28PM +0800, Jun Gao wrote:
> > From: Jun Gao 
> > 
> > Register the i2c adapter driver earlier, so that it does not block
> > the initialization of some i2c devices.
> > 
> > Signed-off-by: Jun Gao 
> 
> The reasons this patch was rejected in v2 still hold.
OK. Thanks for your opinion.
> 




[PATCH] kernel: prevent submission of creds with higher privileges inside container

2018-09-10 Thread My Name
From: Xin Lin <18650033...@163.com>

Adversaries often attack the Linux kernel by using
commit_creds(prepare_kernel_cred(0)) to submit root
credentials for the purpose of privilege escalation.
For processes inside a Linux container, this approach
also works, because the container and the host share
the same kernel. Therefore, we enforce a check in
commit_creds() before updating the cred of the calling
process. If the process is inside a container (judged
from its namespace IDs) and tries to submit credentials
with higher privileges than its current ones (judged
from the uid, gid, and cap_bset in the new cred), we
block the modification. We consider a process to be
inside a container if any of its namespace IDs differs
from the corresponding init namespace ID (enumerated in
include/linux/proc_ns.h). If the uid/gid in the new
cred is smaller, or the cap_bset (capability bounding
set) in the new cred is larger, the change may be a
privilege escalation.

Signed-off-by: Xin Lin <18650033...@163.com>
---
 kernel/cred.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/kernel/cred.c b/kernel/cred.c
index ecf0365..968a92c 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -425,6 +425,18 @@ int commit_creds(struct cred *new)
struct task_struct *task = current;
const struct cred *old = task->real_cred;
 
+   if (task->nsproxy->uts_ns->ns.inum != PROC_UTS_INIT_INO ||
+   task->nsproxy->ipc_ns->ns.inum != PROC_IPC_INIT_INO ||
+   task->nsproxy->mnt_ns->ns.inum != 0xF000U ||
+   task->nsproxy->pid_ns_for_children->ns.inum != PROC_PID_INIT_INO ||
+   task->nsproxy->net_ns->ns.inum != 0xF075U ||
+   old->user_ns->ns.inum != PROC_USER_INIT_INO ||
+   task->nsproxy->cgroup_ns->ns.inum != PROC_CGROUP_INIT_INO) {
+   if (new->uid.val < old->uid.val || new->gid.val < old->gid.val
+   || new->cap_bset.cap[0] > old->cap_bset.cap[0])
+   return 0;
+   }
+
kdebug("commit_creds(%p{%d,%d})", new,
   atomic_read(>usage),
   read_cred_subscribers(new));
-- 
2.7.4
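The uid/gid/cap_bset test in the hunk above can be looked at in isolation. This is a userspace sketch only: `struct cred_sketch` and both helpers are hypothetical, and the bounding set is flattened into one 64-bit mask instead of the kernel's capability array. The first helper has the same shape as the patch's numeric comparison of the first capability word; the second is a swapped-in alternative, not what the patch does, that treats the bounding set as a bit set and flags any capability absent from the old set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the cred fields the patch inspects. */
struct cred_sketch {
	uint32_t uid;
	uint32_t gid;
	uint64_t cap_bset;	/* bounding set flattened to one 64-bit mask */
};

/*
 * Same shape as the patch's test: a numerically smaller uid/gid, or a
 * numerically larger first 32-bit capability word, counts as escalation.
 */
static bool escalation_numeric(const struct cred_sketch *old,
			       const struct cred_sketch *new_cred)
{
	return new_cred->uid < old->uid || new_cred->gid < old->gid ||
	       (uint32_t)new_cred->cap_bset > (uint32_t)old->cap_bset;
}

/*
 * Alternative (not the patch's method): flag any capability bit that is
 * set in the new bounding set but absent from the old one.
 */
static bool escalation_bitwise(const struct cred_sketch *old,
			       const struct cred_sketch *new_cred)
{
	return new_cred->uid < old->uid || new_cred->gid < old->gid ||
	       (new_cred->cap_bset & ~old->cap_bset) != 0;
}
```

For example, with equal uid/gid, an old set of 0x6 (bits 1 and 2) and a new set of 0x5 (bits 0 and 2) is numerically smaller, so the numeric test lets it through even though bit 0 is newly gained; the bitwise test flags it.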




Re: [PATCH] iio: proximity: Add driver support for ST's VL53L0X ToF ranging sensor.

2018-09-10 Thread Song Qiang
On Mon, Sep 10, 2018 at 06:12:43PM +0300, Andy Shevchenko wrote:
> On Mon, Sep 10, 2018 at 10:42:59PM +0800, Song Qiang wrote:
> > This driver was originally written by ST in 2016 as a misc input device,
> > and hasn't been maintained for a long time. I grabbed some code from
> > its API and reworked it into an IIO proximity device driver.
> > This version of the driver uses the i2c bus to talk to the sensor and
> > polls for measurement completion, so no irq line is needed.
> > This version of the driver supports only one-shot mode, and it can be
> > tested by reading from
> > /sys/bus/iio/devices/iio:deviceX/in_distance_raw
> 
> Brief review, mostly of style issues...
> 
> > + *  vl53l0x-i2c.c - Support for STM VL53L0X FlightSense TOF
> > + * Ranger Sensor on a i2c bus.
> 
> One line and without file name.
> 
> > + *
> > + *  Copyright (C) 2016 STMicroelectronics Imaging Division.
> > + *  Copyright (C) 2018 Song Qiang 
> 
> > + *
> 
> Redundant
> 
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> 
> Keep above sorted.
> 
> > +#include 
> > +
> > +#define VL53L0X_DRV_NAME   "vl53l0x"
> > +
> > +/* Device register map */
> > +#define VL_REG_SYSRANGE_START  0x000
> 
> 0x ?
> 
> > +#define VL_REG_SYSRANGE_MODE_MASK  0x0F
> 
> GENMASK() ?
> 
> > +#define VL_REG_SYSRANGE_MODE_START_STOP 0x01
> > +#define VL_REG_SYSRANGE_MODE_SINGLESHOT 0x00
> > +#define VL_REG_SYSRANGE_MODE_BACKTOBACK 0x02
> > +#define VL_REG_SYSRANGE_MODE_TIMED 0x04
> > +#define VL_REG_SYSRANGE_MODE_HISTOGRAM 0x08
> 
> BIT() ?
> 
> Above comments related to below definitions as well.
> 
> > +
> > +#define VL_REG_SYS_THRESH_HIGH 0x000C
> > +#define VL_REG_SYS_THRESH_LOW  0x000E
> > +
> > +#define VL_REG_SYS_SEQUENCE_CFG 0x0001
> > +#define VL_REG_SYS_RANGE_CFG   0x0009
> > +#define VL_REG_SYS_INTERMEASUREMENT_PERIOD 0x0004
> > +
> > +#define VL_REG_SYS_INT_CFG_GPIO 0x000A
> 
> If you chose 0x format for the registers, please, keep the list of them 
> sorted by the offset / address.
> 
> > +#define VL_REG_SYS_INT_GPIO_DISABLED   0x00
> > +#define VL_REG_SYS_INT_GPIO_LEVEL_LOW  0x01
> > +#define VL_REG_SYS_INT_GPIO_LEVEL_HIGH 0x02
> > +#define VL_REG_SYS_INT_GPIO_OUT_OF_WINDOW  0x03
> > +#define VL_REG_SYS_INT_GPIO_NEW_SAMPLE_READY   0x04
> > +#define VL_REG_GPIO_HV_MUX_ACTIVE_HIGH 0x0084
> > +#define VL_REG_SYS_INT_CLEAR   0x000B
> > +
> > +/* Result registers */
> > +#define VL_REG_RESULT_INT_STATUS   0x0013
> > +#define VL_REG_RESULT_RANGE_STATUS 0x0014
> > +
> > +#define VL_REG_RESULT_CORE_PAGE  1
> > +#define VL_REG_RESULT_CORE_AMBIENT_WINDOW_EVENTS_RTN   0x00BC
> > +#define VL_REG_RESULT_CORE_RANGING_TOTAL_EVENTS_RTN 0x00C0
> > +#define VL_REG_RESULT_CORE_AMBIENT_WINDOW_EVENTS_REF   0x00D0
> > +#define VL_REG_RESULT_CORE_RANGING_TOTAL_EVENTS_REF 0x00D4
> > +#define VL_REG_RESULT_PEAK_SIGNAL_RATE_REF 0x00B6
> > +
> > +/* Algo register */
> > +#define VL_REG_ALGO_PART_TO_PART_RANGE_OFFSET_MM   0x0028
> > +
> > +#define VL_REG_I2C_SLAVE_DEVICE_ADDRESS 0x008a
> > +
> > +/* Check Limit registers */
> > +#define VL_REG_MSRC_CFG_CONTROL 0x0060
> > +
> > +#define VL_REG_PRE_RANGE_CFG_MIN_SNR 0X0027
> > +#define VL_REG_PRE_RANGE_CFG_VALID_PHASE_LOW   0x0056
> > +#define VL_REG_PRE_RANGE_CFG_VALID_PHASE_HIGH  0x0057
> > +#define VL_REG_PRE_RANGE_MIN_COUNT_RATE_RTN_LIMIT  0x0064
> > +
> > +#define VL_REG_FINAL_RANGE_CFG_MIN_SNR 0X0067
> > +#define VL_REG_FINAL_RANGE_CFG_VALID_PHASE_LOW 0x0047
> > +#define VL_REG_FINAL_RANGE_CFG_VALID_PHASE_HIGH 0x0048
> > +#define VL_REG_FINAL_RANGE_CFG_MIN_COUNT_RATE_RTN_LIMIT 0x0044
> > +
> 
> > +#define VL_REG_PRE_RANGE_CFG_SIGMA_THRESH_HI   0X0061
> > +#define VL_REG_PRE_RANGE_CFG_SIGMA_THRESH_LO   0X0062
> 
> 0x
> 
> > +
> > +/* PRE RANGE registers */
> > +#define VL_REG_PRE_RANGE_CFG_VCSEL_PERIOD  0x0050
> > +#define VL_REG_PRE_RANGE_CFG_TIMEOUT_MACROP_HI 0x0051
> > +#define VL_REG_PRE_RANGE_CFG_TIMEOUT_MACROP_LO 0x0052
> > +
> > +#define VL_REG_SYS_HISTOGRAM_BIN
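The GENMASK()/BIT() suggestion above could look roughly like this for the SYSRANGE mode values. So that it compiles outside the kernel, the sketch defines local stand-ins `SK_BIT()` and `SK_GENMASK()`, assumed to behave like the kernel's `BIT()` and `GENMASK()` macros; in the driver itself the kernel macros would be used directly. The bit positions are read off the hex values in the patch.

```c
#include <limits.h>

/* Local stand-ins, assumed equivalent to the kernel's BIT() and GENMASK(). */
#define SK_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define SK_BIT(nr)		(1UL << (nr))
#define SK_GENMASK(h, l) \
	(((~0UL) << (l)) & (~0UL >> (SK_BITS_PER_LONG - 1 - (h))))

/* SYSRANGE_START mode field, expressed with bit helpers. */
#define VL_REG_SYSRANGE_MODE_MASK		SK_GENMASK(3, 0)
#define VL_REG_SYSRANGE_MODE_SINGLESHOT		0x00	/* no bit set */
#define VL_REG_SYSRANGE_MODE_START_STOP		SK_BIT(0)
#define VL_REG_SYSRANGE_MODE_BACKTOBACK		SK_BIT(1)
#define VL_REG_SYSRANGE_MODE_TIMED		SK_BIT(2)
#define VL_REG_SYSRANGE_MODE_HISTOGRAM		SK_BIT(3)
```

SINGLESHOT stays a plain 0x00 because it is the absence of the other mode bits rather than a bit of its own.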

Re: [PATCH] ip6_gre: simplify gre header parsing in ip6gre_err

2018-09-10 Thread Haishuang Yan



> On Sep 10, 2018, at 11:36 PM, Jiri Benc wrote:
> 
> On Mon, 10 Sep 2018 16:25:09 +0800, Haishuang Yan wrote:
>> +if (gre_parse_header(skb, , _err, htons(ETH_P_IPV6),
>> + offset) < 0) {
>> +if (!csum_err)  /* ignore csum errors. */
>> +return;
>>  }
> 
> gre_parse_header stops parsing when csum_err is encountered. Which
> means tpi.key is undefined...
> 
>> 
>> -if (!pskb_may_pull(skb, offset + grehlen))
>> -return;
>>  ipv6h = (const struct ipv6hdr *)skb->data;
>> -greh = (const struct gre_base_hdr *)(skb->data + offset);
>> -key = key_off ? *(__be32 *)(skb->data + key_off) : 0;
>> -
>>  t = ip6gre_tunnel_lookup(skb->dev, >daddr, >saddr,
>> - key, greh->protocol);
>> + tpi.key, tpi.proto);
> 
> ...and can't be used here.
> 
> Jiri
> 

You are right. Thanks for reviewing. The same problem also arises in the
ipgre_err code:

	iph = (const struct iphdr *)(icmp_hdr(skb) + 1);
	t = ip_tunnel_lookup(itn, skb->dev->ifindex, tpi->flags,
			     iph->daddr, iph->saddr, tpi->key);

Since csum_err may not be needed by the caller, how about refactoring the
gre_parse_header function like this:

--- a/net/ipv4/gre_demux.c
+++ b/net/ipv4/gre_demux.c
@@ -86,7 +86,7 @@ int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,

options = (__be32 *)(greh + 1);
if (greh->flags & GRE_CSUM) {
-   if (skb_checksum_simple_validate(skb)) {
+   if (csum_err && skb_checksum_simple_validate(skb)) {
*csum_err = true;
return -EINVAL;
}

And in the gre_err function, we can call gre_parse_header(skb, , NULL, **) like
this:

--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -234,11 +234,9 @@ static void gre_err(struct sk_buff *skb, u32 info)
struct tnl_ptk_info tpi;
bool csum_err = false;

-   if (gre_parse_header(skb, , _err, htons(ETH_P_IP),
-iph->ihl * 4) < 0) {
-   if (!csum_err)  /* ignore csum errors. */
+   if (gre_parse_header(skb, , NULL, htons(ETH_P_IP),
+iph->ihl * 4) < 0)
return;
-   }
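The proposal turns `csum_err` into an optional out-parameter: a caller that passes NULL skips checksum validation, so parsing continues and the key is filled in. The userspace sketch below illustrates that contract on a raw byte buffer. It is not the kernel's gre_parse_header(): the function, struct and flag names are hypothetical, flags are handled in host byte order, and the checksum is checked with a plain 16-bit one's-complement sum over the whole buffer. The flag values assume the C, K and S bit positions of the GRE header (RFC 2784/2890).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_GRE_CSUM	0x8000u	/* C bit */
#define SK_GRE_KEY	0x2000u	/* K bit */
#define SK_GRE_SEQ	0x1000u	/* S bit */

struct tpi_sketch {
	uint16_t flags;
	uint16_t proto;
	uint32_t key;
};

/* One's-complement sum over the buffer; a valid checksum folds to 0xFFFF. */
static bool csum16_ok(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)p[i] << 8 | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);
	return sum == 0xFFFF;
}

static uint32_t rd_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | p[3];
}

/*
 * Returns the GRE header length, or -1 on error. The checksum is validated
 * only when csum_err is non-NULL, mirroring the proposed change.
 */
static int parse_gre_sketch(const uint8_t *buf, size_t len,
			    struct tpi_sketch *tpi, bool *csum_err)
{
	const uint8_t *opt;
	size_t hdr_len = 4;
	uint16_t flags;

	if (len < 4)
		return -1;
	flags = (uint16_t)(buf[0] << 8 | buf[1]);
	if (flags & SK_GRE_CSUM)
		hdr_len += 4;
	if (flags & SK_GRE_KEY)
		hdr_len += 4;
	if (flags & SK_GRE_SEQ)
		hdr_len += 4;
	if (len < hdr_len)
		return -1;

	opt = buf + 4;
	if (flags & SK_GRE_CSUM) {
		if (csum_err && !csum16_ok(buf, len)) {
			*csum_err = true;
			return -1;	/* tpi is left untouched */
		}
		opt += 4;	/* skip checksum + reserved1 */
	}

	tpi->flags = flags;
	tpi->proto = (uint16_t)(buf[2] << 8 | buf[3]);
	tpi->key = (flags & SK_GRE_KEY) ? rd_be32(opt) : 0;
	return (int)hdr_len;
}
```

Jiri's point still applies to the non-NULL case: when validation is requested and fails, the caller must not use the contents of tpi.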



Re: [PATCH] arm64: add NUMA emulation support

2018-09-10 Thread Shuah Khan
Hi Michal,

On 09/10/2018 07:48 AM, Michal Hocko wrote:
> On Fri 07-09-18 16:30:59, Shuah Khan wrote:
>> On 09/07/2018 02:34 AM, Michal Hocko wrote:
>>> On Thu 06-09-18 15:53:34, Shuah Khan wrote:
[]
>>
>> In addition to isolation, being able to reserve a block instead is one of the
>> issues I am looking to address. Unfortunately memory cgroups won't address that
>> issue.
> 
> Could you be more specific why you need reservations other than
> isolation.
> 

Taking automotive as a specific example, there are two classes of applications:
1. critical applications that must run
2. Infotainment and misc. user-space.

In this case, being able to reserve a block of memory for critical applications
will ensure the memory is available for them. If a critical application has to
restart, or when an on-demand critical application starts, it might not be able
to allocate memory if it is not reserved.

When a flat system has multiple memory blocks, NUMA emulation in conjunction with
cpusets allows one or more blocks to be reserved for critical applications by
configuring a set of cpus and one or more memory nodes for them.

Memory cgroups will not support such reservation. Hope this helps explain the
use-case I am trying to address with this patch.

thanks,
-- Shuah


Re: [PATCH v9 3/6] kernel/reboot.c: export pm_power_off_prepare

2018-09-10 Thread Shawn Guo
On Mon, Sep 10, 2018 at 04:19:26PM +0100, Mark Brown wrote:
> On Sun, Sep 09, 2018 at 10:00:23AM +0800, Shawn Guo wrote:
> > On Thu, Sep 06, 2018 at 11:15:17AM +0100, Mark Brown wrote:
> 
> > > I was expecting to get a pull request with the precursor patches in it -
> > > the regulator driver seems to get a moderate amount of development so
> > > there's a reasonable risk of conflicts.
> 
> > What about you create a stable topic branch for regulator patches and I
> > pull it into IMX tree?
> 
> Sure, I can send a pull request back but the first two patches in the
> series are ARM ones - are you OK with me just applying them and sending
> them in the pull request or do you want to apply them first?

I just took another look at the series.  It seems that there is no
build-time dependency between regulator and platform patches.  So I
think we can handle the series like:

 - You apply patch #3, #4 and #5 on regulator tree;
 - I apply the rest on IMX tree.

There shouldn't be any build or run time regression on either tree, and
the feature that the series adds will be available when both trees get
merged together on -next or Linus tree.

@Oleksij Is my understanding above correct?

Shawn


[PATCHv3 5/6] tty: Simplify tty->count math in tty_reopen()

2018-09-10 Thread Dmitry Safonov
As noted by Jiri, tty_ldisc_reinit() shouldn't rely on the tty counter.
Simplify the math by increasing the counter only after a successful reinit.

Cc: Greg Kroah-Hartman 
Cc: Jiri Slaby 
Link: lkml.kernel.org/r/<20180829022353.23568-2-d...@arista.com>
Suggested-by: Jiri Slaby 
Reviewed-by: Jiri Slaby 
Signed-off-by: Dmitry Safonov 
---
 drivers/tty/tty_io.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index a947719b4626..7f968ac14bbd 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1268,17 +1268,13 @@ static int tty_reopen(struct tty_struct *tty)
return -EBUSY;
 
tty_ldisc_lock(tty, MAX_SCHEDULE_TIMEOUT);
+   if (!tty->ldisc)
+   retval = tty_ldisc_reinit(tty, tty->termios.c_line);
+   tty_ldisc_unlock(tty);
 
-   tty->count++;
-   if (tty->ldisc)
-   goto out_unlock;
+   if (retval == 0)
+   tty->count++;
 
-   retval = tty_ldisc_reinit(tty, tty->termios.c_line);
-   if (retval)
-   tty->count--;
-
-out_unlock:
-   tty_ldisc_unlock(tty);
return retval;
 }
 
-- 
2.13.6
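The reordering in this patch, taking the reference only once the line discipline is known to be usable, can be modelled outside the kernel. The sketch below is not tty code: `struct tty_model`, `fake_ldisc_reinit()` and `reopen_model()` are hypothetical, and the reinit result is preset. It initializes `retval` to 0 because the `retval == 0` check relies on that when an ldisc is already attached; the hunk above does not show the declaration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the parts of tty_struct the patch touches. */
struct tty_model {
	int count;		/* open count, like tty->count */
	bool has_ldisc;		/* like tty->ldisc != NULL */
	int reinit_result;	/* preset outcome of the fake reinit */
};

static int fake_ldisc_reinit(struct tty_model *tty)
{
	if (tty->reinit_result == 0)
		tty->has_ldisc = true;
	return tty->reinit_result;
}

static int reopen_model(struct tty_model *tty)
{
	int retval = 0;

	/* the ldisc lock would be taken here */
	if (!tty->has_ldisc)
		retval = fake_ldisc_reinit(tty);
	/* ...and released here */

	/* Count the new opener only on success, so no undo path is needed. */
	if (retval == 0)
		tty->count++;
	return retval;
}
```

Compared with incrementing first and decrementing on failure, there is no window in which the count includes an opener that is about to be rejected.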



[PATCHv3 4/6] tty/lockdep: Add ldisc_sem asserts

2018-09-10 Thread Dmitry Safonov
Make sure under CONFIG_LOCKDEP that each change to the line discipline
is done with the write semaphore held.
Otherwise a potential reader will have a good time dereferencing an
incomplete/uninitialized ldisc.

The exception here is tty_ldisc_open(), as it's called without ldisc_sem
locked by tty_init_dev() for the tty->link.

Cc: Greg Kroah-Hartman 
Cc: Jiri Slaby 
Signed-off-by: Dmitry Safonov 
---
 drivers/tty/tty_ldisc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index fc4c97cae01e..202cb645582f 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -471,6 +471,7 @@ static int tty_ldisc_open(struct tty_struct *tty, struct tty_ldisc *ld)
 
 static void tty_ldisc_close(struct tty_struct *tty, struct tty_ldisc *ld)
 {
+   lockdep_assert_held(>ldisc_sem);
WARN_ON(!test_bit(TTY_LDISC_OPEN, >flags));
clear_bit(TTY_LDISC_OPEN, >flags);
if (ld->ops->close)
@@ -492,6 +493,7 @@ static int tty_ldisc_failto(struct tty_struct *tty, int ld)
struct tty_ldisc *disc = tty_ldisc_get(tty, ld);
int r;
 
+   lockdep_assert_held(>ldisc_sem);
if (IS_ERR(disc))
return PTR_ERR(disc);
tty->ldisc = disc;
@@ -615,6 +617,7 @@ EXPORT_SYMBOL_GPL(tty_set_ldisc);
  */
 static void tty_ldisc_kill(struct tty_struct *tty)
 {
+   lockdep_assert_held(>ldisc_sem);
if (!tty->ldisc)
return;
/*
@@ -662,6 +665,7 @@ int tty_ldisc_reinit(struct tty_struct *tty, int disc)
struct tty_ldisc *ld;
int retval;
 
+   lockdep_assert_held(>ldisc_sem);
ld = tty_ldisc_get(tty, disc);
if (IS_ERR(ld)) {
BUG_ON(disc == N_TTY);
@@ -825,6 +829,7 @@ int tty_ldisc_init(struct tty_struct *tty)
  */
 void tty_ldisc_deinit(struct tty_struct *tty)
 {
+   /* no ldisc_sem, tty is being destroyed */
if (tty->ldisc)
tty_ldisc_put(tty->ldisc);
tty->ldisc = NULL;
-- 
2.13.6
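The asserts added here document a locking rule and check it at runtime when lockdep is enabled. A rough userspace analogue follows, under stated assumptions: the semaphore is a hypothetical single-threaded flag, and the check only asks whether the write side is held at all, whereas lockdep_assert_held() checks the lock state that lockdep tracks for the current context. Like the kernel macro, the sketch's check compiles away when debugging is off (here via NDEBUG rather than CONFIG_LOCKDEP).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded stand-in for ld_semaphore. */
struct ldsem_sketch {
	int writers;	/* 1 while the write side is held */
};

struct tty_sketch {
	struct ldsem_sketch ldisc_sem;
	const char *ldisc;	/* current line discipline, or NULL */
};

static void ldsem_down_write_sketch(struct ldsem_sketch *sem)
{
	assert(sem->writers == 0);	/* no contention in this model */
	sem->writers = 1;
}

static void ldsem_up_write_sketch(struct ldsem_sketch *sem)
{
	assert(sem->writers == 1);
	sem->writers = 0;
}

static bool ldsem_write_held_sketch(const struct ldsem_sketch *sem)
{
	return sem->writers == 1;
}

/* Rough analogue of lockdep_assert_held(): a no-op when debugging is off. */
#ifndef NDEBUG
#define assert_held_sketch(sem)	assert(ldsem_write_held_sketch(sem))
#else
#define assert_held_sketch(sem)	((void)(sem))
#endif

static void tty_change_ldisc_sketch(struct tty_sketch *tty, const char *ld)
{
	assert_held_sketch(&tty->ldisc_sem);	/* writer must hold ldisc_sem */
	tty->ldisc = ld;
}
```

Calling tty_change_ldisc_sketch() without first taking the write side aborts in a debug build, which is the kind of mistake the new asserts are meant to surface.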



[PATCHv3 1/6] tty: Drop tty->count on tty_reopen() failure

2018-09-10 Thread Dmitry Safonov
In case of tty_ldisc_reinit() failure, tty->count should be decremented
back, otherwise release_tty() will never be called.
Tetsuo reported that this fixes noisy warnings on tty release like:
  pts pts4033: tty_release: tty->count(10529) != (#fd's(7) + #kopen's(0))

Fixes: commit 892d1fa7eaae ("tty: Destroy ldisc instance on hangup")

Cc: sta...@vger.kernel.org # v4.6+
Cc: Greg Kroah-Hartman 
Cc: Jiri Slaby 
Reviewed-by: Jiri Slaby 
Tested-by: Jiri Slaby 
Tested-by: Tetsuo Handa 
Signed-off-by: Dmitry Safonov 
---
 drivers/tty/tty_io.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 32bc3e3fe4d3..5e5da9acaf0a 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1255,6 +1255,7 @@ static void tty_driver_remove_tty(struct tty_driver *driver, struct tty_struct *
 static int tty_reopen(struct tty_struct *tty)
 {
struct tty_driver *driver = tty->driver;
+   int retval;
 
if (driver->type == TTY_DRIVER_TYPE_PTY &&
driver->subtype == PTY_TYPE_MASTER)
@@ -1268,10 +1269,14 @@ static int tty_reopen(struct tty_struct *tty)
 
tty->count++;
 
-   if (!tty->ldisc)
-   return tty_ldisc_reinit(tty, tty->termios.c_line);
+   if (tty->ldisc)
+   return 0;
 
-   return 0;
+   retval = tty_ldisc_reinit(tty, tty->termios.c_line);
+   if (retval)
+   tty->count--;
+
+   return retval;
 }
 
 /**
-- 
2.13.6



[PATCHv3 6/6] tty/ldsem: Decrement wait_readers on timeouted down_read()

2018-09-10 Thread Dmitry Safonov
It seems that when ldsem_down_read() fails with a timeout, it misses
the update of sem->wait_readers. For that reason, when the writer finally
releases the write end of the semaphore, __ldsem_wake_readers() adjusts
sem->count with a wrong value:
sem->wait_readers * (LDSEM_ACTIVE_BIAS - LDSEM_WAIT_BIAS)

I.e., if the update comes with 1 missed wait_readers decrement, sem->count
will be 0x10001, which means that there is an active reader, and it'll
make any further writer fail to acquire the semaphore.

It looks like this is dead code, because ldsem_down_read() is never
called with a timeout other than MAX_SCHEDULE_TIMEOUT, so it might be
worth deleting the timeout parameter and the error-path fallback.

Cc: Greg Kroah-Hartman 
Cc: Jiri Slaby 
Signed-off-by: Dmitry Safonov 
---
 drivers/tty/tty_ldsem.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c
index 832accbbcb6d..f7966ab7b450 100644
--- a/drivers/tty/tty_ldsem.c
+++ b/drivers/tty/tty_ldsem.c
@@ -237,6 +237,7 @@ down_read_failed(struct ld_semaphore *sem, long count, long timeout)
raw_spin_lock_irq(>wait_lock);
if (waiter.task) {
atomic_long_add_return(-LDSEM_WAIT_BIAS, >count);
+   sem->wait_readers--;
list_del();
raw_spin_unlock_irq(>wait_lock);
put_task_struct(waiter.task);
-- 
2.13.6
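The 0x10001 figure in the description can be checked with a little arithmetic. The sketch assumes the 32-bit layout of the ldsem count word (the low 16 bits hold the active count, so the wait bias is -0x10000); the constant and function names are local to the sketch, not the kernel's. One missed wait_readers decrement makes the wake-up adjustment add 0x10001, whose low half reads as one active holder.

```c
#include <stdbool.h>

/* Assumed 32-bit layout of the ldsem count (low 16 bits: active count). */
#define SK_ACTIVE_MASK	0x0000ffffL
#define SK_ACTIVE_BIAS	1L
#define SK_WAIT_BIAS	(-SK_ACTIVE_MASK - 1)

/* wait_readers * (ACTIVE_BIAS - WAIT_BIAS), as quoted in the description. */
static long wake_readers_adjust(long wait_readers)
{
	return wait_readers * (SK_ACTIVE_BIAS - SK_WAIT_BIAS);
}

/* A non-zero active field means someone appears to hold the semaphore. */
static bool has_active_holder(long count)
{
	return (count & SK_ACTIVE_MASK) != 0;
}
```

Starting from an unlocked count of 0, a single stale waiter leaves the count at 0x10001, so has_active_holder() reports a phantom reader and later writers cannot get in.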


