Re: [PATCH 18/22] perf scripts python: exported-sql-viewer.py: Add IPC information to the Branch reports
On 31/05/19 7:44 PM, Arnaldo Carvalho de Melo wrote:
> Em Mon, May 20, 2019 at 02:37:24PM +0300, Adrian Hunter escreveu:
>> Enhance the "All branches" and "Selected branches" reports to display IPC
>> information if it is available.
>
> So, testing this I noticed that it all starts with the left arrow in every
> line, that should mean there is some tree there, i.e. look at all those ▶
> symbols:
>
> Time               CPU  Command          PID    TID    Branch Type  In Tx  Insn Cnt  Cyc Cnt  IPC   Branch
> ▶ 187836112195670  7    simple-retpolin  23003  23003  trace begin  No     0         0              0 unknown (unknown) -> 7f6f33d4f110 _start (ld-2.28.so)
> ▶ 187836112195987  7    simple-retpolin  23003  23003  trace end    No     0         883      0     7f6f33d4f110 _start (ld-2.28.so) -> 0 unknown (unknown)
> ▶ 187836112199189  7    simple-retpolin  23003  23003  trace begin  No     0         0              0 unknown (unknown) -> 7f6f33d4f110 _start (ld-2.28.so)
> ▶ 187836112199189  7    simple-retpolin  23003  23003  call         No     0         0              7f6f33d4f113 _start+0x3 (ld-2.28.so) -> 7f6f33d4ff50 _dl_start (ld-2.28.so)
> ▶ 187836112199544  7    simple-retpolin  23003  23003  trace end    No     17        996      0.02  7f6f33d4ff73 _dl_start+0x23 (ld-2.28.so) -> 0 unknown (unknown)
> ▶ 187836112200939  7    simple-retpolin  23003  23003  trace begin  No     0         0              0 unknown (unknown) -> 7f6f33d4ff73 _dl_start+0x23 (ld-2.28.so)
> ▶ 187836112201229  7    simple-retpolin  23003  23003  trace end    No     1         816      0.00  7f6f33d4ff7a _dl_start+0x2a (ld-2.28.so) -> 0 unknown (unknown)
> ▶ 187836112203500  7    simple-retpolin  23003  23003  trace begin  No     0         0              0 unknown (unknown) -> 7f6f33d4ff7a _dl_start+0x2a (ld-2.28.so)
>
> But if you click on it, that ▶ disappears and a new click doesn't make it
> reappear, looks buggy, but seems like a minor oddity that will not prevent me
> from applying it now, please check and provide a fix on top of this,

The arrow is to display disassembly, but only if xed is installed and the object is in the buildid cache.
Unfortunately, it is not efficient to determine if there is anything to expand before the user clicks.
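The constraint can be sketched in plain Python (class and method names here are illustrative, not the actual ones in exported-sql-viewer.py): the model must report a child count before it has run the SQL query and xed disassembly that would produce the children, so it shows an expander speculatively.

```python
class BranchTreeItem:
    """Lazy tree row: children (the disassembly lines) are fetched only
    when the user expands the row, because running the query up front
    for every row would be too slow."""

    def __init__(self, fetch_children):
        self._fetch = fetch_children   # e.g. SQL lookup + xed invocation
        self._children = None          # unknown until the first expand

    def child_count(self):
        # Before the first expand we must guess: report 1 so the view
        # draws the arrow. If the fetch then turns up nothing, the
        # arrow disappears - and, per the bug above, never comes back.
        if self._children is None:
            return 1
        return len(self._children)

    def expand(self):
        if self._children is None:
            self._children = self._fetch()
        return self._children


item = BranchTreeItem(lambda: [])   # object not in buildid cache: no children
print(item.child_count())           # 1 - arrow shown speculatively
print(len(item.expand()))           # 0 - nothing to show
print(item.child_count())           # 0 - arrow gone after the click
```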
RE: [External] Re: linux kernel page allocation failure and tuning of page cache
-----Original Message-----
From: Matthew Wilcox [mailto:wi...@infradead.org]
Sent: Saturday, June 1, 2019 1:01 AM
To: Nagal, Amit UTC CCS
Cc: linux-kernel@vger.kernel.org; linux...@kvack.org; CHAWLA, RITU UTC CCS; net...@vger.kernel.org
Subject: [External] Re: linux kernel page allocation failure and tuning of page cache

> 1) the platform is a low memory platform having 64MB of memory.
>
> 2) we are doing around 45MB TCP data transfer from PC to target using the
> netcat utility. On the target, a process receives data over a socket and
> writes the data to a flash disk.

>I think your network is faster than your disk ...

Ok, I need to check it. But how does this affect the page reclaim procedure?

> 5) sometimes, we observed kernel memory getting exhausted as a page
> allocation failure happens in the kernel with the backtrace printed below:
> # [ 775.947949] nc.traditional: page allocation failure: order:0,
> mode:0x2080020(GFP_ATOMIC)

>We're in the soft interrupt handler at this point, so we have very few
>options for freeing memory; we can't wait for I/O to complete, for example.
>That said, this is a TCP connection. We could drop the packet silently
>without such a noisy warning. Perhaps just collect statistics on how many
>packets we dropped due to a low memory situation.

I will collect statistics for it.
> [ 775.956362] CPU: 0 PID: 1288 Comm: nc.traditional Tainted: G O 4.9.123-pic6-g31a13de-dirty #19
> [ 775.966085] Hardware name: Generic R7S72100 (Flattened Device Tree)
> [ 775.972501] [] (unwind_backtrace) from [] (show_stack+0xb/0xc)
> [ 775.980118] [] (show_stack) from [] (warn_alloc+0x89/0xba)
> [ 775.987361] [] (warn_alloc) from [] (__alloc_pages_nodemask+0x1eb/0x634)
> [ 775.995790] [] (__alloc_pages_nodemask) from [] (__alloc_page_frag+0x39/0xde)
> [ 776.004685] [] (__alloc_page_frag) from [] (__netdev_alloc_skb+0x51/0xb0)
> [ 776.013217] [] (__netdev_alloc_skb) from [] (sh_eth_poll+0xbf/0x3c0)
> [ 776.021342] [] (sh_eth_poll) from [] (net_rx_action+0x77/0x170)
> [ 776.029051] [] (net_rx_action) from [] (__do_softirq+0x107/0x160)
> [ 776.036896] [] (__do_softirq) from [] (irq_exit+0x5d/0x80)
> [ 776.044165] [] (irq_exit) from [] (__handle_domain_irq+0x57/0x8c)
> [ 776.052007] [] (__handle_domain_irq) from [] (gic_handle_irq+0x31/0x48)
> [ 776.060362] [] (gic_handle_irq) from [] (__irq_svc+0x65/0xac)
> [ 776.067835] Exception stack(0xc1cafd70 to 0xc1cafdb8)
> [ 776.072876] fd60: 0002751c c1dec6a0 000c 521c3be5
> [ 776.081042] fd80: 56feb08e f64823a6 ffb35f7b feab513d f9cb0643 056c c1caff10 e000
> [ 776.089204] fda0: b1f49160 c1cafdc4 c180c677 c0234ace 200e0033
> [ 776.095816] [] (__irq_svc) from [] (__copy_to_user_std+0x7e/0x430)
> [ 776.103796] [] (__copy_to_user_std) from [] (copy_page_to_iter+0x105/0x250)
> [ 776.112503] [] (copy_page_to_iter) from [] (skb_copy_datagram_iter+0xa3/0x108)
> [ 776.121469] [] (skb_copy_datagram_iter) from [] (tcp_recvmsg+0x3ab/0x5f4)
> [ 776.130045] [] (tcp_recvmsg) from [] (inet_recvmsg+0x21/0x2c)
> [ 776.137576] [] (inet_recvmsg) from [] (sock_read_iter+0x51/0x6e)
> [ 776.145384] [] (sock_read_iter) from [] (__vfs_read+0x97/0xb0)
> [ 776.152967] [] (__vfs_read) from [] (vfs_read+0x51/0xb0)
> [ 776.159983] [] (vfs_read) from [] (SyS_read+0x27/0x52)
> [ 776.166837] [] (SyS_read) from [] (ret_fast_syscall+0x1/0x54)
> [ 776.174308] Mem-Info:
> [ 776.176650] active_anon:2037 inactive_anon:23 isolated_anon:0
> [ 776.176650] active_file:2636 inactive_file:7391 isolated_file:32
> [ 776.176650] unevictable:0 dirty:1366 writeback:1281 unstable:0

>Almost all the dirty pages are under writeback at this point.

> [ 776.176650] slab_reclaimable:719 slab_unreclaimable:724
> [ 776.176650] mapped:1990 shmem:26 pagetables:159 bounce:0
> [ 776.176650] free:373 free_pcp:6 free_cma:0

>We have 373 free pages, but refused to allocate one of them to GFP_ATOMIC?
>I don't understand why that failed. We also didn't try to steal an
>inactive_file or inactive_anon page, which seems like an obvious thing we
>might want to do.

Yes, that's where I am concerned. We do not have a swap device, so I am assuming that is perhaps why inactive_anon pages are not stolen, but inactive_file pages could have been used.

> [ 776.209062] Node 0 active_anon:8148kB inactive_anon:92kB active_file:10544kB inactive_file:29564kB unevictable:0kB isolated(anon):0kB isolated(file):128kB mapped:7960kB dirty:5464kB writeback:5124kB shmem:104kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
> [ 776.233602] Normal free:1492kB min:964kB low:1204kB high:1444kB active_anon:8148kB inactive_anon:92kB active_file:10544kB inactive_file:29564kB unevictable:0kB writepending:10588kB present:65536kB managed:59304kB mlocked:0kB slab_reclaimable:2876kB slab_unreclaimable:2896kB kernel_stack:1152kB
[PATCH v1 2/4] mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM
The local variable `references` in shrink_page_list() defaults to PAGEREF_RECLAIM_CLEAN. It is there to prevent reclaiming dirty pages when CMA tries to migrate pages. Strictly speaking, we don't need it, because CMA doesn't allow writeout anyway, via .may_writepage = 0 in reclaim_clean_pages_from_list(). Moreover, it prevents anonymous pages from being swapped out even when force_reclaim = true in shrink_page_list() in an upcoming patch. So this patch makes the default value of `references` PAGEREF_RECLAIM and renames force_reclaim to ignore_references to make it clearer. This is preparatory work for the next patch.

* RFCv1
  * use ignore_references as parameter name - hannes

Acked-by: Johannes Weiner
Signed-off-by: Minchan Kim
---
 mm/vmscan.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 84dcb651d05c..0973a46a0472 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1102,7 +1102,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				      struct scan_control *sc,
 				      enum ttu_flags ttu_flags,
 				      struct reclaim_stat *stat,
-				      bool force_reclaim)
+				      bool ignore_references)
 {
 	LIST_HEAD(ret_pages);
 	LIST_HEAD(free_pages);
@@ -1116,7 +1116,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		struct address_space *mapping;
 		struct page *page;
 		int may_enter_fs;
-		enum page_references references = PAGEREF_RECLAIM_CLEAN;
+		enum page_references references = PAGEREF_RECLAIM;
 		bool dirty, writeback;
 		unsigned int nr_pages;
@@ -1247,7 +1247,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			}
 		}
-		if (!force_reclaim)
+		if (!ignore_references)
 			references = page_check_references(page, sc);

 		switch (references) {
--
2.22.0.rc1.311.g5d7573a151-goog
[PATCH v1 3/4] mm: account nr_isolated_xxx in [isolate|putback]_lru_page
The isolated page counters are percpu counters, so it would not be a huge gain to batch them. Rather than complicating the code to batch them, let's make it more straightforward by adding the counting logic into the [isolate|putback]_lru_page() APIs.

Link: http://lkml.kernel.org/r/20190531165927.ga20...@cmpxchg.org
Suggested-by: Johannes Weiner
Signed-off-by: Minchan Kim
---
 mm/compaction.c     |  2 --
 mm/gup.c            |  7 +--
 mm/khugepaged.c     |  3 ---
 mm/memory-failure.c |  3 ---
 mm/memory_hotplug.c |  4 ----
 mm/mempolicy.c      |  6 +-
 mm/migrate.c        | 37 -
 mm/vmscan.c         | 22 --
 8 files changed, 26 insertions(+), 58 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9e1b9acb116b..c6591682deda 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -982,8 +982,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,

 		/* Successfully isolated */
 		del_page_from_lru_list(page, lruvec, page_lru(page));
-		inc_node_page_state(page,
-				NR_ISOLATED_ANON + page_is_file_cache(page));

 isolate_success:
 		list_add(&page->lru, &cc->migratepages);

diff --git a/mm/gup.c b/mm/gup.c
index 63ac50e48072..2d9a9bc358c7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1360,13 +1360,8 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk,
 					drain_allow = false;
 				}

-				if (!isolate_lru_page(head)) {
+				if (!isolate_lru_page(head))
 					list_add_tail(&head->lru, &cma_page_list);
-					mod_node_page_state(page_pgdat(head),
-							    NR_ISOLATED_ANON +
-							    page_is_file_cache(head),
-							    hpage_nr_pages(head));
-				}
 			}
 		}
 	}

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a335f7c1fac4..3359df994fb4 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -503,7 +503,6 @@ void __khugepaged_exit(struct mm_struct *mm)

 static void release_pte_page(struct page *page)
 {
-	dec_node_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page));
 	unlock_page(page);
 	putback_lru_page(page);
 }
@@ -602,8 +601,6 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			result = SCAN_DEL_PAGE_LRU;
 			goto out;
 		}
-		inc_node_page_state(page,
-				NR_ISOLATED_ANON + page_is_file_cache(page));
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
 		VM_BUG_ON_PAGE(PageLRU(page), page);

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index bc749265a8f3..2187bad7ceff 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1796,9 +1796,6 @@ static int __soft_offline_page(struct page *page, int flags)
 		 * so use !__PageMovable instead for LRU page's mapping
 		 * cannot have PAGE_MAPPING_MOVABLE.
 		 */
-		if (!__PageMovable(page))
-			inc_node_page_state(page, NR_ISOLATED_ANON +
-						page_is_file_cache(page));
 		list_add(&page->lru, &pagelist);
 		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
 					MIGRATE_SYNC, MR_MEMORY_FAILURE);

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a88c5f334e5a..a41bea24d0c9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1390,10 +1390,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			ret = isolate_movable_page(page, ISOLATE_UNEVICTABLE);
 		if (!ret) { /* Success */
 			list_add_tail(&page->lru, &source);
-			if (!__PageMovable(page))
-				inc_node_page_state(page, NR_ISOLATED_ANON +
-						    page_is_file_cache(page));
-
 		} else {
 			pr_warn("failed to isolate pfn %lx\n", pfn);
 			dump_page(page, "isolation failed");

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 5b3bf1747c19..cfb0590f69bb 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -948,12 +948,8 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist,
 	 * Avoid migrating a page that is shared with others.
 	 */
 	if ((flags & MPOL_MF_MOVE_ALL) || page_mapcount(head) == 1) {
-		if (!isolate_lru_page(head)) {
+		if (!isolate_lru_page(head))
 			list_add_tail(&head->lru, pagelist);
-
[PATCH v1 0/4] Introduce MADV_COLD and MADV_PAGEOUT
This patch is part of a previous series:
https://lore.kernel.org/lkml/20190531064313.193437-1-minc...@kernel.org/T/#u

Originally, it was created for an external madvise hinting feature.

https://lkml.org/lkml/2019/5/31/463

Michal wanted to separate the discussion from the external hinting interface, so this patchset includes only the first part of my entire patchset - introducing the MADV_COLD and MADV_PAGEOUT hints for madvise. However, I keep the entire description for others for easier understanding of why this kind of hint was born.

Thanks.

This patchset is against next-20190530. Below is the description of the previous entire patchset.

= 8< =

- Background

The Android terminology used for forking a new process and starting an app from scratch is a cold start, while resuming an existing app is a hot start. While we continually try to improve the performance of cold starts, hot starts will always be significantly less power hungry as well as faster, so we are trying to make hot starts more likely than cold starts.

To increase hot starts, Android userspace manages the order in which apps should be killed in a process called ActivityManagerService. ActivityManagerService tracks every Android app or service that the user could be interacting with at any time and translates that into a ranked list for lmkd (the low memory killer daemon). They are likely to be killed by lmkd if the system has to reclaim memory. In that sense they are similar to entries in any other cache. Those apps are kept alive for opportunistic performance improvements, but those performance improvements will vary based on the memory requirements of individual workloads.

- Problem

Naturally, cached apps were dominant consumers of memory on the system. However, they were not significant consumers of swap even though they are good candidates for swap.
Under investigation, swapping out only begins once the low zone watermark is hit and kswapd wakes up, but the overall allocation rate in the system might trip lmkd thresholds and cause a cached process to be killed (we measured the performance of swapping out vs. zapping the memory by killing a process; unsurprisingly, zapping is 10x faster even though we use zram, which is much faster than real storage), so a kill from lmkd will often satisfy the high zone watermark, resulting in very few pages actually being moved to swap.

- Approach

The approach we chose was to use a new interface to allow userspace to proactively reclaim entire processes by leveraging platform information. This allowed us to bypass the inaccuracy of the kernel's LRUs for pages that are known to be cold from userspace and to avoid races with lmkd by reclaiming apps as soon as they entered the cached state. Additionally, it could provide many chances for the platform to use much information to optimize memory efficiency.

To achieve the goal, the patchset introduces two new options for madvise. One is MADV_COLD, which will deactivate activated pages, and the other is MADV_PAGEOUT, which will reclaim private pages instantly. These new options complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to gain some free memory space. MADV_PAGEOUT is similar to MADV_DONTNEED in that it hints the kernel that the memory region is not currently needed and should be reclaimed immediately; MADV_COLD is similar to MADV_FREE in that it hints the kernel that the memory region is not currently needed and should be reclaimed when memory pressure rises.

This approach is similar in spirit to madvise(MADV_DONTNEED), but the information required to make the reclaim decision is not known to the app. Instead, it is known to a centralized userspace daemon, and that daemon must be able to initiate reclaim on its own without any app involvement.
To solve the concern, this patch introduces a new syscall -

	struct pr_madvise_param {
		int size;		/* the size of this structure */
		int cookie;		/* reserved to support atomicity */
		int nr_elem;		/* count of below array fields */
		int __user *hints;	/* hints for each range */
		/* to store result of each operation */
		const struct iovec __user *results;
		/* input address ranges */
		const struct iovec __user *ranges;
	};

	int process_madvise(int pidfd, struct pr_madvise_param *u_param,
			    unsigned long flags);

The syscall takes a pidfd to give hints to an external process and provides a pair of results/ranges vector arguments so that it can give several hints to each address range all at once. It also has a cookie variable to support atomicity of the API for address range operations. IOW, if the target process changes its address space after the monitor process has parsed the address ranges via map_files or maps, the API can detect the race and cancel the entire address space operation. It's not implemented yet. Daniel Colascione suggested an idea (Please read
[PATCH v1 1/4] mm: introduce MADV_COLD
When a process expects no accesses to a certain memory range, it can give a hint to the kernel that the pages can be reclaimed when memory pressure happens, but the data should be preserved for future use. This could reduce workingset eviction, so it ends up increasing performance.

This patch introduces the new MADV_COLD hint to the madvise(2) syscall. MADV_COLD can be used by a process to mark a memory range as not expected to be used in the near future. The hint can help the kernel in deciding which pages to evict early during memory pressure.

It works for every LRU page like MADV_[DONTNEED|FREE]. IOW, it moves

	active file page -> inactive file LRU
	active anon page -> inactive anon LRU

Unlike MADV_FREE, it doesn't move active anonymous pages to the inactive file LRU's head, because MADV_COLD has slightly different semantics. MADV_FREE means it's okay to discard the page under memory pressure because the content of the page is *garbage*, so freeing such pages has almost zero overhead: we don't need to swap them out, and an access afterward causes only a minor fault. Thus, it makes sense to put those freeable pages on the inactive file LRU to compete with other used-once pages, and it can even give a bonus by making them reclaimable on a swapless system. However, MADV_COLD doesn't mean the content is garbage, so reclaiming such pages requires swap-out/in in the end. So it's better to move them to the inactive anon LRU list, not the file LRU. Furthermore, it helps to avoid unnecessary scanning of cold anonymous pages if the system doesn't have a swap device.

All of the error rules are the same as MADV_DONTNEED.

Note: This hint works only with private pages (IOW, page_mapcount(page) < 2), because a shared page has more chance of being accessed from other processes sharing the page even though the caller resets the reference bits. That ends up preventing the reclaim of the page and wastes CPU cycles.
* RFCv2
  * add more description - mhocko
* RFCv1
  * renaming from MADV_COOL to MADV_COLD - hannes
* internal review
  * use clear_page_young in deactivate_page - joelaf
  * Revise the description - surenb
  * Renaming from MADV_WARM to MADV_COOL - surenb

Signed-off-by: Minchan Kim
---
 include/linux/page-flags.h             |   1 +
 include/linux/page_idle.h              |  15
 include/linux/swap.h                   |   1 +
 include/uapi/asm-generic/mman-common.h |   1 +
 mm/internal.h                          |   2 +-
 mm/madvise.c                           | 115 -
 mm/oom_kill.c                          |   2 +-
 mm/swap.c                              |  43 +
 8 files changed, 176 insertions(+), 4 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 9f8712a4b1a5..58b06654c8dd 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -424,6 +424,7 @@ static inline bool set_hwpoison_free_buddy_page(struct page *page)
 TESTPAGEFLAG(Young, young, PF_ANY)
 SETPAGEFLAG(Young, young, PF_ANY)
 TESTCLEARFLAG(Young, young, PF_ANY)
+CLEARPAGEFLAG(Young, young, PF_ANY)
 PAGEFLAG(Idle, idle, PF_ANY)
 #endif

diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h
index 1e894d34bdce..f3f43b317150 100644
--- a/include/linux/page_idle.h
+++ b/include/linux/page_idle.h
@@ -19,6 +19,11 @@ static inline void set_page_young(struct page *page)
 	SetPageYoung(page);
 }

+static inline void clear_page_young(struct page *page)
+{
+	ClearPageYoung(page);
+}
+
 static inline bool test_and_clear_page_young(struct page *page)
 {
 	return TestClearPageYoung(page);
@@ -65,6 +70,16 @@ static inline void set_page_young(struct page *page)
 	set_bit(PAGE_EXT_YOUNG, &page_ext->flags);
 }

+static void clear_page_young(struct page *page)
+{
+	struct page_ext *page_ext = lookup_page_ext(page);
+
+	if (unlikely(!page_ext))
+		return;
+
+	clear_bit(PAGE_EXT_YOUNG, &page_ext->flags);
+}
+
 static inline bool test_and_clear_page_young(struct page *page)
 {
 	struct page_ext *page_ext = lookup_page_ext(page);

diff --git a/include/linux/swap.h b/include/linux/swap.h
index de2c67a33b7e..0ce997edb8bb 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -340,6 +340,7 @@ extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_all(void);
 extern void rotate_reclaimable_page(struct page *page);
 extern void deactivate_file_page(struct page *page);
+extern void deactivate_page(struct page *page);
 extern void mark_page_lazyfree(struct page *page);
 extern void swap_setup(void);

diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index bea0278f65ab..1190f4e7f7b9 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -43,6 +43,7 @@
 #define MADV_SEQUENTIAL	2	/* expect sequential page references */
 #define MADV_WILLNEED	3	/* will need these pages */
 #define MADV_DONTNEED	4	/* don't need these pages */
+#define
[PATCH v1 4/4] mm: introduce MADV_PAGEOUT
When a process expects no accesses to a certain memory range for a long time, it can hint the kernel that the pages can be reclaimed instantly, but the data should be preserved for future use. This could reduce workingset eviction, so it ends up increasing performance.

This patch introduces the new MADV_PAGEOUT hint to the madvise(2) syscall. MADV_PAGEOUT can be used by a process to mark a memory range as not expected to be used for a long time, so that the kernel reclaims *any LRU* pages instantly. The hint can help the kernel in deciding which pages to evict proactively.

All of the error rules are the same as MADV_DONTNEED.

Note: This hint works only with private pages (IOW, page_mapcount(page) < 2), because a shared page has more chance of being accessed from other processes sharing the page, so reclaiming it could cause a major fault soon, which is inefficient.

* RFC v2
  * make reclaim_pages simple via factoring out isolate logic - hannes
* RFCv1
  * rename from MADV_COLD to MADV_PAGEOUT - hannes
  * bail out if process is being killed - Hillf
  * fix reclaim_pages bugs - Hillf

Signed-off-by: Minchan Kim
---
 include/linux/swap.h                   |   1 +
 include/uapi/asm-generic/mman-common.h |   1 +
 mm/madvise.c                           | 126 +
 mm/vmscan.c                            |  34 +++
 4 files changed, 162 insertions(+)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 0ce997edb8bb..063c0c1e112b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -365,6 +365,7 @@ extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 extern unsigned long vm_total_pages;
+extern unsigned long reclaim_pages(struct list_head *page_list);
 #ifdef CONFIG_NUMA
 extern int node_reclaim_mode;
 extern int sysctl_min_unmapped_ratio;

diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 1190f4e7f7b9..92e347a89ddc 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -44,6 +44,7 @@
 #define MADV_WILLNEED	3	/* will need these pages */
 #define MADV_DONTNEED	4	/* don't need these pages */
 #define MADV_COLD	5	/* deactivate these pages */
+#define MADV_PAGEOUT	6	/* reclaim these pages */

 /* common parameters: try to keep these consistent across architectures */
 #define MADV_FREE	8	/* free pages only if memory pressure */

diff --git a/mm/madvise.c b/mm/madvise.c
index ab158766858a..b010249cb8b6 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -41,6 +41,7 @@ static int madvise_need_mmap_write(int behavior)
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
 	case MADV_COLD:
+	case MADV_PAGEOUT:
 	case MADV_FREE:
 		return 0;
 	default:
@@ -415,6 +416,128 @@ static long madvise_cold(struct vm_area_struct *vma,
 	return 0;
 }

+static int madvise_pageout_pte_range(pmd_t *pmd, unsigned long addr,
+				unsigned long end, struct mm_walk *walk)
+{
+	pte_t *orig_pte, *pte, ptent;
+	spinlock_t *ptl;
+	LIST_HEAD(page_list);
+	struct page *page;
+	int isolated = 0;
+	struct vm_area_struct *vma = walk->vma;
+	unsigned long next;
+
+	if (fatal_signal_pending(current))
+		return -EINTR;
+
+	next = pmd_addr_end(addr, end);
+	if (pmd_trans_huge(*pmd)) {
+		ptl = pmd_trans_huge_lock(pmd, vma);
+		if (!ptl)
+			return 0;
+
+		if (is_huge_zero_pmd(*pmd))
+			goto huge_unlock;
+
+		page = pmd_page(*pmd);
+		if (page_mapcount(page) > 1)
+			goto huge_unlock;
+
+		if (next - addr != HPAGE_PMD_SIZE) {
+			int err;
+
+			get_page(page);
+			spin_unlock(ptl);
+			lock_page(page);
+			err = split_huge_page(page);
+			unlock_page(page);
+			put_page(page);
+			if (!err)
+				goto regular_page;
+			return 0;
+		}
+
+		if (isolate_lru_page(page))
+			goto huge_unlock;
+
+		list_add(&page->lru, &page_list);
+huge_unlock:
+		spin_unlock(ptl);
+		reclaim_pages(&page_list);
+		return 0;
+	}
+
+	if (pmd_trans_unstable(pmd))
+		return 0;
+regular_page:
+	orig_pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	for (pte = orig_pte; addr < end; pte++, addr += PAGE_SIZE) {
+		ptent = *pte;
+		if (!pte_present(ptent))
+			continue;
+
+		page = vm_normal_page(vma, addr, ptent);
+		if (!page)
+
Re: [PATCH] regulator: bd70528: Drop unused include
Thanks Linus!

On Sat, 2019-06-01 at 01:06 +0200, Linus Walleij wrote:
> This driver does not use any symbols from
> so just drop the include.
>
> Cc: Matti Vaittinen
> Signed-off-by: Linus Walleij

Acked-by: Matti Vaittinen

Br,
Matti Vaittinen
Re: [PATCH] regulator: bd718x7: Drop unused include
And thanks for this too =)

On Sat, 2019-06-01 at 01:08 +0200, Linus Walleij wrote:
> This driver does not use any symbols from
> so just drop the include.
>
> Cc: Matti Vaittinen
> Signed-off-by: Linus Walleij

Acked-by: Matti Vaittinen

Br,
Matti Vaittinen
Re: rcu_read_lock lost its compiler barrier
On Sun, Jun 02, 2019 at 08:47:07PM -0700, Paul E. McKenney wrote:
>
> 1.	These guarantees are of full memory barriers, -not- compiler
>	barriers.

What I'm saying is that wherever they are, they must come with compiler barriers. I'm not aware of any synchronisation mechanism in the kernel that gives a memory barrier without a compiler barrier.

> 2.	These rules don't say exactly where these full memory barriers
>	go. SRCU is at one extreme, placing those full barriers in
>	srcu_read_lock() and srcu_read_unlock(), and !PREEMPT Tree RCU
>	at the other, placing these barriers entirely within the callback
>	queueing/invocation, grace-period computation, and the scheduler.
>	Preemptible Tree RCU is in the middle, with rcu_read_unlock()
>	sometimes including a full memory barrier, but other times with
>	the full memory barrier being confined as it is with !PREEMPT
>	Tree RCU.

The rules do say that the (full) memory barrier must precede any RCU read-side section that occurs after the synchronize_rcu and come after the end of any RCU read-side section that occurs before the synchronize_rcu. All I'm arguing is that wherever that full mb is, as long as it also carries with it a barrier() (which it must do if it's done using an existing kernel mb/locking primitive), then we're fine.

> Interleaving and inserting full memory barriers as per the rules above:
>
> CPU1: WRITE_ONCE(a, 1)
> CPU1: synchronize_rcu
> /* Could put a full memory barrier here, but it wouldn't help. */

CPU1: smp_mb();
CPU2: smp_mb();

Let's put them in because I think they are critical. smp_mb() also carries with it a barrier().

> CPU2: rcu_read_lock();
> CPU1: b = 2;
> CPU2: if (READ_ONCE(a) == 0)
> CPU2:   if (b != 1) /* Weakly ordered CPU moved this up! */
> CPU2:     b = 1;
> CPU2: rcu_read_unlock
>
> In fact, CPU2's load from b might be moved up to race with CPU1's store,
> which (I believe) is why the model complains in this case.
Let's put aside my doubt over how we're even allowing a compiler to turn

	b = 1;

into

	if (b != 1)
		b = 1;

Since you seem to be assuming that (a == 0) is true in this case (as the assignment b = 1 is carried out), then because of the presence of the full memory barrier, the RCU read-side section must have started prior to the synchronize_rcu. This means that synchronize_rcu is not allowed to return until at least the end of the grace period, or at least until the end of rcu_read_unlock. So it actually should be:

CPU1: WRITE_ONCE(a, 1)
CPU1: synchronize_rcu called
/* Could put a full memory barrier here, but it wouldn't help. */
CPU1: smp_mb();
CPU2: smp_mb();
CPU2: grace period starts
...time passes...
CPU2: rcu_read_lock();
CPU2: if (READ_ONCE(a) == 0)
CPU2:   if (b != 1) /* Weakly ordered CPU moved this up! */
CPU2:     b = 1;
CPU2: rcu_read_unlock
...time passes...
CPU2: grace period ends
/* This full memory barrier is also guaranteed by RCU. */
CPU2: smp_mb();
CPU1: synchronize_rcu returns
CPU1: b = 2;

Cheers,
-- 
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [EXT] INFO: trying to register non-static key in del_timer_sync (2)
On Sat, Jun 1, 2019 at 7:52 PM Ganapathi Bhat wrote:
>
> Hi syzbot,
>
> > syzbot found the following crash on:
>
> As per the link (https://syzkaller.appspot.com/bug?extid=dc4127f950da51639216),
> the issue is fixed; Is it OK? Let us know if we need to do something?

Hi Ganapathi,

The "fixed" status relates to a similar past bug that was reported and fixed more than a year ago:

https://groups.google.com/forum/#!msg/syzkaller-bugs/3YnGX1chF2w/jeQjeihtBAAJ
https://syzkaller.appspot.com/bug?id=b4b5c74c57c4b69f4fff86131abb799106182749

This one is still well alive and kicking, with 1200+ crashes, and the last one happened less than 30 minutes ago.
[GIT] Sparc
Please pull to get these three bug fixes; the TLB flushing one is of particular brown paper bag quality... Thanks.

The following changes since commit f2c7c76c5d0a443053e94adb9f0918fa2fb85c3a:

  Linux 5.2-rc3 (2019-06-02 13:55:33 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc.git

for you to fetch changes up to 56cd0aefa475079e9613085b14a0f05037518fed:

  sparc: perf: fix updated event period in response to PERF_EVENT_IOC_PERIOD (2019-06-02 22:16:33 -0700)

Gen Zhang (1):
      mdesc: fix a missing-check bug in get_vdev_port_node_info()

James Clarke (1):
      sparc64: Fix regression in non-hypervisor TLB flush xcall

Young Xiao (1):
      sparc: perf: fix updated event period in response to PERF_EVENT_IOC_PERIOD

 arch/sparc/kernel/mdesc.c      | 2 ++
 arch/sparc/kernel/perf_event.c | 4 ++++
 arch/sparc/mm/ultra.S          | 4 ++--
 3 files changed, 8 insertions(+), 2 deletions(-)
Re: [PATCH] mdesc: fix a missing-check bug in get_vdev_port_node_info()
From: Gen Zhang
Date: Fri, 31 May 2019 09:24:18 +0800

> In get_vdev_port_node_info(), 'node_info->vdev_port.name' is allocated
> by kstrdup_const(), and it returns NULL when it fails. So
> 'node_info->vdev_port.name' should be checked.
>
> Signed-off-by: Gen Zhang

Applied, thanks.
Re: [PATCH] sparc: perf: fix updated event period in response to PERF_EVENT_IOC_PERIOD
From: Young Xiao <92siuy...@gmail.com>
Date: Wed, 29 May 2019 10:21:48 +0800

> The PERF_EVENT_IOC_PERIOD ioctl command can be used to change the
> sample period of a running perf_event. Consequently, when calculating
> the next event period, the new period will only be considered after the
> previous one has overflowed.
>
> This patch changes the calculation of the remaining event ticks so that
> they are offset if the period has changed.
>
> See commit 3581fe0ef37c ("ARM: 7556/1: perf: fix updated event period in
> response to PERF_EVENT_IOC_PERIOD") for details.
>
> Signed-off-by: Young Xiao <92siuy...@gmail.com>

Applied, thanks.
Re: [PATCHv6 5/6] arm64: dts: lx2160a: Add PCIe controller DT nodes
Hi Hou Zhiqiang Two instances [@360 and @380] of the six has a different window count, the RC can not have more than 8 windows. apio-wins = <256>; //Can we change it to 8 ppio-wins = <24>;//Can we change it to 8 On Tue, May 28, 2019 at 12:20 PM Z.q. Hou wrote: > > From: Hou Zhiqiang > > The LX2160A integrated 6 PCIe Gen4 controllers. > > Signed-off-by: Hou Zhiqiang > Reviewed-by: Minghuan Lian > --- > V6: > - No change. > > .../arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 163 ++ > 1 file changed, 163 insertions(+) > > diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi > b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi > index 125a8cc2c5b3..7a2b91ff1fbc 100644 > --- a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi > +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi > @@ -964,5 +964,168 @@ > }; > }; > }; > + > + pcie@340 { > + compatible = "fsl,lx2160a-pcie"; > + reg = <0x00 0x0340 0x0 0x0010 /* controller > registers */ > + 0x80 0x 0x0 0x1000>; /* > configuration space */ > + reg-names = "csr_axi_slave", "config_axi_slave"; > + interrupts = , /* > AER interrupt */ > +, /* > PME interrupt */ > +; /* > controller interrupt */ > + interrupt-names = "aer", "pme", "intr"; > + #address-cells = <3>; > + #size-cells = <2>; > + device_type = "pci"; > + dma-coherent; > + apio-wins = <8>; > + ppio-wins = <8>; > + bus-range = <0x0 0xff>; > + ranges = <0x8200 0x0 0x4000 0x80 0x4000 > 0x0 0x4000>; /* non-prefetchable memory */ > + msi-parent = <>; > + #interrupt-cells = <1>; > + interrupt-map-mask = <0 0 0 7>; > + interrupt-map = < 0 0 1 0 0 GIC_SPI 109 > IRQ_TYPE_LEVEL_HIGH>, > + < 0 0 2 0 0 GIC_SPI 110 > IRQ_TYPE_LEVEL_HIGH>, > + < 0 0 3 0 0 GIC_SPI 111 > IRQ_TYPE_LEVEL_HIGH>, > + < 0 0 4 0 0 GIC_SPI 112 > IRQ_TYPE_LEVEL_HIGH>; > + status = "disabled"; > + }; > + > + pcie@350 { > + compatible = "fsl,lx2160a-pcie"; > + reg = <0x00 0x0350 0x0 0x0010 /* controller > registers */ > + 0x88 0x 0x0 0x1000>; /* > configuration space */ > + reg-names = "csr_axi_slave", 
"config_axi_slave"; > + interrupts = , /* > AER interrupt */ > +, /* > PME interrupt */ > +; /* > controller interrupt */ > + interrupt-names = "aer", "pme", "intr"; > + #address-cells = <3>; > + #size-cells = <2>; > + device_type = "pci"; > + dma-coherent; > + apio-wins = <8>; > + ppio-wins = <8>; > + bus-range = <0x0 0xff>; > + ranges = <0x8200 0x0 0x4000 0x88 0x4000 > 0x0 0x4000>; /* non-prefetchable memory */ > + msi-parent = <>; > + #interrupt-cells = <1>; > + interrupt-map-mask = <0 0 0 7>; > + interrupt-map = < 0 0 1 0 0 GIC_SPI 114 > IRQ_TYPE_LEVEL_HIGH>, > + < 0 0 2 0 0 GIC_SPI 115 > IRQ_TYPE_LEVEL_HIGH>, > + < 0 0 3 0 0 GIC_SPI 116 > IRQ_TYPE_LEVEL_HIGH>, > + < 0 0 4 0 0 GIC_SPI 117 > IRQ_TYPE_LEVEL_HIGH>; > + status = "disabled"; > + }; > + > + pcie@360 { > + compatible = "fsl,lx2160a-pcie"; > + reg = <0x00 0x0360 0x0 0x0010 /* controller > registers */ > + 0x90 0x 0x0 0x1000>; /* > configuration space */ > + reg-names = "csr_axi_slave", "config_axi_slave"; > + interrupts = , /* > AER interrupt */ > +, /* > PME interrupt */ > +; /* > controller interrupt */ > + interrupt-names = "aer", "pme", "intr"; > + #address-cells = <3>; > + #size-cells
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 03/06/2019 12:23, Shawn Anastasio wrote: > > > On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote: >> >> >> On 31/05/2019 08:49, Shawn Anastasio wrote: >>> On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote: On 28/05/2019 17:39, Shawn Anastasio wrote: > > > On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote: >> >> >> On 28/05/2019 15:36, Oliver wrote: >>> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio >>> wrote: Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } >>> >>> I think the logic here is questionable to begin with. 
If the user >>> has >>> explicitly requested re-aligning a resource via the command line >>> then >>> we should probably do it even if PCI_PROBE_ONLY is set. When it >>> breaks >>> they get to keep the pieces. >>> >>> That said, the real issue here is that PCI_PROBE_ONLY probably >>> shouldn't be set under qemu/kvm. Under the other hypervisor >>> (PowerVM) >>> hotplugged devices are configured by firmware before it's passed to >>> the guest and we need to keep the FW assignments otherwise things >>> break. QEMU however doesn't do any BAR assignments and relies on >>> that >>> being handled by the guest. At boot time this is done by SLOF, but >>> Linux only keeps SLOF around until it's extracted the device-tree. >>> Once that's done SLOF gets blown away and the kernel needs to do >>> it's >>> own BAR assignments. I'm guessing there's a hack in there to make it >>> work today, but it's a little surprising that it works at all... >> >> >> The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the >> guest which receives an event from qemu (RAS_EPOW from >> /proc/interrupts), fetches device tree chunks (and as I understand >> it - >> they come with BARs from phyp but without from qemu) and writes >> "1" to >> "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: > > Interesting. Does this mean that the PHYP hotplug path doesn't > call pci_assign_resource? I'd expect dlpar_add_slot() to be called under phyp and eventually pci_device_add() which (I think) may or may not trigger later reassignment. > If so it means the patch may not > break that platform after all, though it still may not be > the correct way of doing things. We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems that (unless resource_alignment= is used) the pseries guest should just walk through all allocated resources and leave them unchanged. 
>>> >>> If we add a pcibios_default_alignment() implementation like was >>> suggested earlier, then it will behave as if the user has >>> specified resource_alignment= by default and SLOF's assignments >>> won't be honored (I think). >> >> >> I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and >> tried booting with and without
linux-next: manual merge of the akpm-current tree with the dma-mapping tree
Hi all, Today's linux-next merge of the akpm-current tree got a conflict in: include/linux/genalloc.h between commit: 3334e1dc5d71 ("lib/genalloc: add gen_pool_dma_zalloc() for zeroed DMA allocations") from the dma-mapping tree and commit: 1c6b703cba18 ("lib/genalloc: introduce chunk owners") from the akpm-current tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc include/linux/genalloc.h index 6c62eeca754f,b0ab64879ccb.. --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@@ -116,13 -124,47 +124,48 @@@ static inline int gen_pool_add(struct g return gen_pool_add_virt(pool, addr, -1, size, nid); } extern void gen_pool_destroy(struct gen_pool *); - extern unsigned long gen_pool_alloc(struct gen_pool *, size_t); - extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t, - genpool_algo_t algo, void *data); + unsigned long gen_pool_alloc_algo_owner(struct gen_pool *pool, size_t size, + genpool_algo_t algo, void *data, void **owner); + + static inline unsigned long gen_pool_alloc_owner(struct gen_pool *pool, + size_t size, void **owner) + { + return gen_pool_alloc_algo_owner(pool, size, pool->algo, pool->data, + owner); + } + + static inline unsigned long gen_pool_alloc_algo(struct gen_pool *pool, + size_t size, genpool_algo_t algo, void *data) + { + return gen_pool_alloc_algo_owner(pool, size, algo, data, NULL); + } + + /** + * gen_pool_alloc - allocate special memory from the pool + * @pool: pool to allocate from + * @size: number of bytes to allocate from the pool + * + * Allocate the requested number of bytes from the specified pool. 
+ * Uses the pool allocation function (with first-fit algorithm by default). + * Can not be used in NMI handler on architectures without + * NMI-safe cmpxchg implementation. + */ + static inline unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size) + { + return gen_pool_alloc_algo(pool, size, pool->algo, pool->data); + } + extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma); +void *gen_pool_dma_zalloc(struct gen_pool *pool, size_t size, dma_addr_t *dma); - extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); + extern void gen_pool_free_owner(struct gen_pool *pool, unsigned long addr, + size_t size, void **owner); + static inline void gen_pool_free(struct gen_pool *pool, unsigned long addr, + size_t size) + { + gen_pool_free_owner(pool, addr, size, NULL); + } + extern void gen_pool_for_each_chunk(struct gen_pool *, void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *); extern size_t gen_pool_avail(struct gen_pool *); pgp0hv1vZlEf0.pgp Description: OpenPGP digital signature
Re: [RFC] mm: Generalize notify_page_fault()
On 05/31/2019 11:18 PM, Matthew Wilcox wrote: > On Fri, May 31, 2019 at 02:17:43PM +0530, Anshuman Khandual wrote: >> On 05/30/2019 07:09 PM, Matthew Wilcox wrote: >>> On Thu, May 30, 2019 at 05:31:15PM +0530, Anshuman Khandual wrote: On 05/30/2019 04:36 PM, Matthew Wilcox wrote: > The two handle preemption differently. Why is x86 wrong and this one > correct? Here it expects context to be already non-preemptible where as the proposed generic function makes it non-preemptible with a preempt_[disable|enable]() pair for the required code section, irrespective of it's present state. Is not this better ? >>> >>> git log -p arch/x86/mm/fault.c >>> >>> search for 'kprobes'. >>> >>> tell me what you think. >> >> Are you referring to these following commits >> >> a980c0ef9f6d ("x86/kprobes: Refactor kprobes_fault() like >> kprobe_exceptions_notify()") >> b506a9d08bae ("x86: code clarification patch to Kprobes arch code") >> >> In particular the later one (b506a9d08bae). It explains how the invoking >> context >> in itself should be non-preemptible for the kprobes processing context >> irrespective >> of whether kprobe_running() or perhaps smp_processor_id() is safe or not. >> Hence it >> does not make much sense to continue when original invoking context is >> preemptible. >> Instead just bail out earlier. This seems to be making more sense than >> preempt >> disable-enable pair. If there are no concerns about this change from other >> platforms, >> I will change the preemption behavior in proposed generic function next time >> around. > > Exactly. > > So, any of the arch maintainers know of a reason they behave differently > from x86 in this regard? Or can Anshuman use the x86 implementation > for all the architectures supporting kprobes? So the generic notify_page_fault() will be like this. 
int __kprobes notify_page_fault(struct pt_regs *regs, unsigned int trap) { int ret = 0; /* * To be potentially processing a kprobe fault and to be allowed * to call kprobe_running(), we have to be non-preemptible. */ if (kprobes_built_in() && !preemptible() && !user_mode(regs)) { if (kprobe_running() && kprobe_fault_handler(regs, trap)) ret = 1; } return ret; }
[PATCH] sched/fair: don't restart enqueued cfs quota slack timer
From: "liangyan.ply" start_cfs_slack_bandwidth() restarts the quota slack timer. If it is called frequently, the timer is restarted continuously and may never get a chance to expire and unthrottle cfs tasks. As a result, throttled tasks cannot be unthrottled in time even though they have remaining quota. Signed-off-by: Liangyan --- kernel/sched/fair.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d90a64620072..fdb03c752f97 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4411,9 +4411,11 @@ static void start_cfs_slack_bandwidth(struct cfs_bandwidth *cfs_b)
 	if (runtime_refresh_within(cfs_b, min_left))
 		return;
 
-	hrtimer_start(&cfs_b->slack_timer,
+	if (!hrtimer_active(&cfs_b->slack_timer)) {
+		hrtimer_start(&cfs_b->slack_timer,
 			ns_to_ktime(cfs_bandwidth_slack_period),
 			HRTIMER_MODE_REL);
+	}
 }
 
 /* we know any runtime found here is valid as update_curr() precedes return */
-- 
2.14.4.44.g2045bb6
Re: [PATCH] PCI: endpoint: Add DMA to Linux PCI EP Framework
Hi Alan, On 31/05/19 11:46 PM, Alan Mikhak wrote: > On Thu, May 30, 2019 at 10:08 PM Kishon Vijay Abraham I wrote: >> Hi Alan, >>> >>> Hi Kishon, >>> >>> I have some improvements in mind for a v2 patch in response to >>> feedback from Gustavo Pimentel that the current implementation is HW >>> specific. I hesitate from submitting a v2 patch because it seems best >>> to seek comment on possible directions this may be taking. >>> >>> One alternative is to wait for or modify test functions in >>> pci-epf-test.c to call DMAengine client APIs, if possible. I imagine >>> pci-epf-test.c test functions would still allocate the necessary local >>> buffer on the endpoint side for the same canned tests for everyone to >>> use. They would prepare the buffer in the existing manner by filling >>> it with random bytes and calculate CRC in the case of a write test. >>> However, they would then initiate DMA operations by using DMAengine >>> client APIs in a generic way instead of calling memcpy_toio() and >>> memcpy_fromio(). They would post-process the buffer in the existing >> >> No, you can't remove memcpy_toio/memcpy_fromio APIs. There could be platforms >> without system DMA or they could have system DMA but without MEMCOPY channels >> or without DMA in their PCI controller. > > I agree. I wouldn't remove memcpy_toio/fromio. That is the reason this > patch introduces the '-d' flag for pcitest to communicate that user > intent across the PCIe bus to pci-epf-test so the endpoint can > initiate the transfer using either memcpy_toio/fromio or DMA. > >>> manner such as the checking for CRC in the case of a read test. >>> Finally, they would release the resources and report results back to >>> the user of pcitest across the PCIe bus through the existing methods. 
>>> >>> Another alternative I have in mind for v2 is to change the struct >>> pci_epc_dma that this patch added to pci-epc.h from the following: >>> >>> struct pci_epc_dma { >>> u32 control; >>> u32 size; >>> u64 sar; >>> u64 dar; >>> }; >>> >>> to something similar to the following: >>> >>> struct pci_epc_dma { >>> size_t size; >>> void *buffer; >>> int flags; >>> }; >>> >>> The 'flags' field can be a bit field or separate boolean values to >>> specify such things as linked-list mode vs single-block, etc. >>> Associated #defines would be removed from pci-epc.h to be replaced if >>> needed with something generic. The 'size' field specifies the size of >>> DMA transfer that can fit in the buffer. >> >> I still have to look closer into your DMA patch but linked-list mode or >> single >> block mode shouldn't be an user select-able option but should be determined >> by >> the size of transfer. > > Please consider the following when taking a closer look at this patch. After seeing comments from Vinod and Arnd, it looks like the better way of adding DMA support would be to register DMA within PCI endpoint controller to DMA subsystem (as dmaengine) and use only dmaengine APIs in pci_epf_test. > > In my specific use case, I need to verify that any valid block size, > including a one byte transfer, can be transferred across the PCIe bus > by memcpy_toio/fromio() or by DMA either as a single block or as > linked-list. That is why, instead of deciding based on transfer size, > this patch introduces the '-L' flag for pcitest to communicate the > user intent across the PCIe bus to pci-epf-test so the endpoint can > initiate the DMA transfer using a single block or in linked-list mode. The -L option seems to select an internal DMA configuration which might be specific to one implementation. As Gustavo already pointed, we should have only generic options in pcitest. This would no longer be applicable when we move to dmaengine. Thanks Kishon
Re: [PATCH] PCI: endpoint: Add DMA to Linux PCI EP Framework
Hi Kishon, On 03-06-19, 09:54, Kishon Vijay Abraham I wrote: > right. For the endpoint case, drivers/pci/controller should register with the > dmaengine i.e if the controller has aN embedded DMA (I think it should be okay > to keep that in drivers/pci/controller itself instead of drivers/dma) and > drivers/pci/endpoint/functions/ should use dmaengine API's (Depending on the > platform, this will either use system DMA or DMA within the PCI controller). Typically I would prefer the driver to be part of drivers/dma. Would this be a standalone driver or part of the endpoint driver. In former case we can move to dmaengine for latter i guess it makes sense to stay in PCI Thanks -- ~Vinod
[PATCH] cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending
When "deep" suspend is enabled, all CPUs except the primary CPU are hotplugged out. Since CPU hotplug is a costly operation, check if we have to abort the suspend in between each CPU hotplug. This would improve the system suspend abort latency upon detecting a wakeup condition. Signed-off-by: Pavankumar Kondeti --- kernel/cpu.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/kernel/cpu.c b/kernel/cpu.c index f2ef104..784b33d 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1221,6 +1221,13 @@ int freeze_secondary_cpus(int primary) for_each_online_cpu(cpu) { if (cpu == primary) continue; + + if (pm_wakeup_pending()) { + pr_info("Aborting disabling non-boot CPUs..\n"); + error = -EBUSY; + break; + } + trace_suspend_resume(TPS("CPU_OFF"), cpu, true); error = _cpu_down(cpu, 1, CPUHP_OFFLINE); trace_suspend_resume(TPS("CPU_OFF"), cpu, false); -- Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH] PCI: endpoint: Add DMA to Linux PCI EP Framework
Hi, On 31/05/19 1:19 PM, Arnd Bergmann wrote: > On Fri, May 31, 2019 at 8:32 AM Vinod Koul wrote: >> On 31-05-19, 10:50, Kishon Vijay Abraham I wrote: >>> On 31/05/19 10:37 AM, Vinod Koul wrote: On 30-05-19, 11:16, Kishon Vijay Abraham I wrote: > > right, my initial thought process was to use only dmaengine APIs in > pci-epf-test so that the system DMA or DMA within the PCIe controller can > be > used transparently. But can we register DMA within the PCIe controller to > the > DMA subsystem? AFAIK only system DMA should register with the DMA > subsystem. > (ADMA in SDHCI doesn't use dmaengine). Vinod Koul can confirm. So would this DMA be dedicated for PCI and all PCI devices on the bus? >>> >>> Yes, this DMA will be used only by PCI ($patch is w.r.t PCIe device mode. So >>> all endpoint functions both physical and virtual functions will use the DMA >>> in >>> the controller). If so I do not see a reason why this cannot be using dmaengine. The use >>> >>> Thanks for clarifying. I was under the impression any DMA within a >>> peripheral >>> controller shouldn't use DMAengine. >> >> That is indeed a correct assumption. The dmaengine helps in cases where >> we have a dma controller with multiple users, for a single user case it >> might be overhead to setup dma driver and then use it thru framework. >> >> Someone needs to see the benefit and cost of using the framework and >> decide. > > I think the main question is about how generalized we want this to be. > There are lots of difference PCIe endpoint implementations, and in > case of some licensable IP cores like the designware PCIe there are > many variants, as each SoC will do the implementation in a slightly > different way. > > If we can have a single endpoint driver than can either have an > integrated DMA engine or use an external one, then abstracting that > DMA engine helps make the driver work more readily either way. 
> > Similarly, there may be PCIe endpoint implementations that have > a dedicated DMA engine in them that is not usable for anything else, > but that is closely related to an IP core we already have a dmaengine > driver for. In this case, we can avoid duplication. right. Either way it makes more sense to register DMA embedded within the PCIe endpoint controller instead of creating epc_ops for DMA transfers. Thanks Kishon
[PATCH 12/15] dcache: Provide a dentry constructor
In order to support object migration on the dentry cache we need to have a determined object state at all times. Without a constructor the object would have a random state after allocation. Provide a dentry constructor. Signed-off-by: Tobin C. Harding --- fs/dcache.c | 30 +- 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index c435398f2c81..867d97a86940 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1603,6 +1603,16 @@ void d_invalidate(struct dentry *dentry)
 }
 EXPORT_SYMBOL(d_invalidate);
 
+static void dcache_ctor(void *p)
+{
+	struct dentry *dentry = p;
+
+	/* Mimic lockref_mark_dead() */
+	dentry->d_lockref.count = -128;
+
+	spin_lock_init(&dentry->d_lock);
+}
+
 /**
  * __d_alloc - allocate a dcache entry
  * @sb: filesystem it will belong to
@@ -1658,7 +1668,6 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 	dentry->d_lockref.count = 1;
 	dentry->d_flags = 0;
-	spin_lock_init(&dentry->d_lock);
 	seqcount_init(&dentry->d_seq);
 	dentry->d_inode = NULL;
 	dentry->d_parent = dentry;
@@ -3096,14 +3105,17 @@ static void __init dcache_init_early(void)
 static void __init dcache_init(void)
 {
-	/*
-	 * A constructor could be added for stable state like the lists,
-	 * but it is probably not worth it because of the cache nature
-	 * of the dcache.
-	 */
-	dentry_cache = KMEM_CACHE_USERCOPY(dentry,
-		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
-		d_iname);
+	slab_flags_t flags =
+		SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | SLAB_MEM_SPREAD | SLAB_ACCOUNT;
+
+	dentry_cache =
+		kmem_cache_create_usercopy("dentry",
+					   sizeof(struct dentry),
+					   __alignof__(struct dentry),
+					   flags,
+					   offsetof(struct dentry, d_iname),
+					   sizeof_field(struct dentry, d_iname),
+					   dcache_ctor);
 
 	/* Hash may have been set up in dcache_init_early */
 	if (!hashdist)
-- 
2.21.0
[PATCH 14/15] slub: Enable moving objects to/from specific nodes
We have just implemented Slab Movable Objects (SMO, object migration). Currently object migration is used to defrag a cache. On NUMA systems it would be nice to be able to control the source and destination nodes when moving objects. Add CONFIG_SLUB_SMO_NODE to guard this feature. CONFIG_SLUB_SMO_NODE depends on CONFIG_SLUB_DEBUG because we use the full list. Implement moving all objects (including those in full slabs) to a specific node. Expose this functionality to userspace via a sysfs entry. Add sysfs entry: /sysfs/kernel/slab//move With this users get access to the following functionality: - Move all objects to specified node. echo "N1" > move - Move all objects from specified node to other specified node (from N1 -> to N2): echo "N1 N2" > move This also enables shrinking slabs on a specific node: echo "N1 N1" > move Signed-off-by: Tobin C. Harding --- mm/Kconfig | 7 ++ mm/slub.c | 247 + 2 files changed, 254 insertions(+) diff --git a/mm/Kconfig b/mm/Kconfig index f0c76ba47695..c1438b9e578b 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -259,6 +259,13 @@ config ARCH_ENABLE_THP_MIGRATION config CONTIG_ALLOC def_bool (MEMORY_ISOLATION && COMPACTION) || CMA +config SLUB_SMO_NODE + bool "Enable per node control of Slab Movable Objects" + depends on SLUB && SYSFS + select SLUB_DEBUG + help + On NUMA systems enable moving objects to and from a specified node. + config PHYS_ADDR_T_64BIT def_bool 64BIT diff --git a/mm/slub.c b/mm/slub.c index 2157205df7ba..23566e5a712b 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -4336,6 +4336,130 @@ static void move_slab_page(struct page *page, void *scratch, int node) s->migrate(s, vector, count, node, private); } +#ifdef CONFIG_SLUB_SMO_NODE +/* + * kmem_cache_move() - Attempt to move all slab objects. + * @s: The cache we are working on. + * @node: The node to move objects away from. + * @target_node: The node to move objects on to. + * + * Attempts to move all objects (partial slabs and full slabs) to target + * node. 
+ *
+ * Context: Takes the list_lock.
+ * Return: The number of slabs remaining on node.
+ */
+static unsigned long kmem_cache_move(struct kmem_cache *s,
+				     int node, int target_node)
+{
+	struct kmem_cache_node *n = get_node(s, node);
+	LIST_HEAD(move_list);
+	struct page *page, *page2;
+	unsigned long flags;
+	void **scratch;
+
+	if (!s->migrate) {
+		pr_warn("%s SMO not enabled, cannot move objects\n", s->name);
+		goto out;
+	}
+
+	scratch = alloc_scratch(s);
+	if (!scratch)
+		goto out;
+
+	spin_lock_irqsave(&n->list_lock, flags);
+
+	list_for_each_entry_safe(page, page2, &n->partial, lru) {
+		if (!slab_trylock(page))
+			/* Busy slab. Get out of the way */
+			continue;
+
+		if (page->inuse) {
+			list_move(&page->lru, &move_list);
+			/* Stop page being considered for allocations */
+			n->nr_partial--;
+			page->frozen = 1;
+
+			slab_unlock(page);
+		} else {	/* Empty slab page */
+			list_del(&page->lru);
+			n->nr_partial--;
+			slab_unlock(page);
+			discard_slab(s, page);
+		}
+	}
+	list_for_each_entry_safe(page, page2, &n->full, lru) {
+		if (!slab_trylock(page))
+			continue;
+
+		list_move(&page->lru, &move_list);
+		page->frozen = 1;
+		slab_unlock(page);
+	}
+
+	spin_unlock_irqrestore(&n->list_lock, flags);
+
+	list_for_each_entry(page, &move_list, lru) {
+		if (page->inuse)
+			move_slab_page(page, scratch, target_node);
+	}
+	kfree(scratch);
+
+	/* Bail here to save taking the list_lock */
+	if (list_empty(&move_list))
+		goto out;
+
+	/* Inspect results and dispose of pages */
+	spin_lock_irqsave(&n->list_lock, flags);
+	list_for_each_entry_safe(page, page2, &move_list, lru) {
+		list_del(&page->lru);
+		slab_lock(page);
+		page->frozen = 0;
+
+		if (page->inuse) {
+			if (page->inuse == page->objects) {
+				list_add(&page->lru, &n->full);
+				slab_unlock(page);
+			} else {
+				n->nr_partial++;
+				list_add_tail(&page->lru, &n->partial);
+				slab_unlock(page);
+			}
+		} else {
+			slab_unlock(page);
+			discard_slab(s, page);
+		}
+	}
[PATCH 15/15] slub: Enable balancing slabs across nodes
We have just implemented Slab Movable Objects (SMO). On NUMA systems slabs can become unbalanced i.e. many slabs on one node while other nodes have few slabs. Using SMO we can balance the slabs across all the nodes. The algorithm used is as follows: 1. Move all objects to node 0 (this has the effect of defragmenting the cache). 2. Calculate the desired number of slabs for each node (this is done using the approximation nr_slabs / nr_nodes). 3. Loop over the nodes moving the desired number of slabs from node 0 to the node. Feature is conditionally built in with CONFIG_SMO_NODE, this is because we need the full list (we enable SLUB_DEBUG to get this). Future version may separate final list out of SLUB_DEBUG. Expose this functionality to userspace via a sysfs entry. Add sysfs entry: /sysfs/kernel/slab//balance Write of '1' to this file triggers balance, no other value accepted. This feature relies on SMO being enable for the cache, this is done with a call to, after the isolate/migrate functions have been defined. kmem_cache_setup_mobility(s, isolate, migrate) Signed-off-by: Tobin C. Harding --- mm/slub.c | 130 ++ 1 file changed, 130 insertions(+) diff --git a/mm/slub.c b/mm/slub.c index 23566e5a712b..70e46c4db757 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -4458,6 +4458,119 @@ static unsigned long kmem_cache_move_to_node(struct kmem_cache *s, int node) return left; } + +/* + * kmem_cache_move_slabs() - Attempt to move @num slabs to target_node, + * @s: The cache we are working on. + * @node: The node to move objects from. + * @target_node: The node to move objects to. + * @num: The number of slabs to move. + * + * Attempts to move @num slabs from @node to @target_node. This is done + * by migrating objects from slabs on the full_list. + * + * Return: The number of slabs moved or error code. 
+ */
+static long kmem_cache_move_slabs(struct kmem_cache *s,
+				  int node, int target_node, long num)
+{
+	struct kmem_cache_node *n = get_node(s, node);
+	LIST_HEAD(move_list);
+	struct page *page, *page2;
+	unsigned long flags;
+	void **scratch;
+	long done = 0;
+
+	if (!s->migrate) {
+		pr_warn("%s SMO not enabled, cannot move objects\n", s->name);
+		goto out;
+	}
+
+	if (node == target_node)
+		return -EINVAL;
+
+	scratch = alloc_scratch(s);
+	if (!scratch)
+		return -ENOMEM;
+
+	spin_lock_irqsave(&n->list_lock, flags);
+
+	list_for_each_entry_safe(page, page2, &n->full, lru) {
+		if (!slab_trylock(page))
+			/* Busy slab. Get out of the way */
+			continue;
+
+		list_move(&page->lru, &move_list);
+		page->frozen = 1;
+		slab_unlock(page);
+
+		if (++done >= num)
+			break;
+	}
+	spin_unlock_irqrestore(&n->list_lock, flags);
+
+	list_for_each_entry(page, &move_list, lru) {
+		if (page->inuse)
+			move_slab_page(page, scratch, target_node);
+	}
+	kfree(scratch);
+
+	/* Bail here to save taking the list_lock */
+	if (list_empty(&move_list))
+		goto out;
+
+	/* Inspect results and dispose of pages */
+	spin_lock_irqsave(&n->list_lock, flags);
+	list_for_each_entry_safe(page, page2, &move_list, lru) {
+		list_del(&page->lru);
+		slab_lock(page);
+		page->frozen = 0;
+
+		if (page->inuse) {
+			/*
+			 * This is best effort only, if slab still has
+			 * objects just put it back on the partial list.
+			 */
+			n->nr_partial++;
+			list_add_tail(&page->lru, &n->partial);
+			slab_unlock(page);
+		} else {
+			slab_unlock(page);
+			discard_slab(s, page);
+		}
+	}
+	spin_unlock_irqrestore(&n->list_lock, flags);
+out:
+	return done;
+}
+
+/*
+ * kmem_cache_balance_nodes() - Balance slabs across nodes.
+ * @s: The cache we are working on.
+ */
+static void kmem_cache_balance_nodes(struct kmem_cache *s)
+{
+	struct kmem_cache_node *n = get_node(s, 0);
+	unsigned long desired_nr_slabs_per_node;
+	unsigned long nr_slabs;
+	int nr_nodes = 0;
+	int nid;
+
+	(void)kmem_cache_move_to_node(s, 0);
+
+	for_each_node_state(nid, N_NORMAL_MEMORY)
+		nr_nodes++;
+
+	nr_slabs = atomic_long_read(&n->nr_slabs);
+	desired_nr_slabs_per_node = nr_slabs / nr_nodes;
+
+	for_each_node_state(nid, N_NORMAL_MEMORY) {
+		if (nid == 0)
+			continue;
+
+		kmem_cache_move_slabs(s, 0, nid, desired_nr_slabs_per_node);
+	}
+}
[PATCH 10/15] xarray: Implement migration function for xa_node objects
Recently Slab Movable Objects (SMO) was implemented for the SLUB allocator. The XArray can take advantage of this and make the xa_node slab cache objects movable. Implement functions to migrate objects and activate SMO when we initialise the XArray slab cache. This is based on initial code by Matthew Wilcox and was modified to work with slab object migration. Cc: Matthew Wilcox Signed-off-by: Tobin C. Harding --- lib/xarray.c | 61 1 file changed, 61 insertions(+) diff --git a/lib/xarray.c b/lib/xarray.c index 861c042daa1d..9354e0f01f26 100644 --- a/lib/xarray.c +++ b/lib/xarray.c @@ -1993,12 +1993,73 @@ static void xa_node_ctor(void *arg) INIT_LIST_HEAD(>private_list); } +static void xa_object_migrate(struct xa_node *node, int numa_node) +{ + struct xarray *xa = READ_ONCE(node->array); + void __rcu **slot; + struct xa_node *new_node; + int i; + + /* Freed or not yet in tree then skip */ + if (!xa || xa == XA_RCU_FREE) + return; + + new_node = kmem_cache_alloc_node(xa_node_cachep, GFP_KERNEL, numa_node); + if (!new_node) { + pr_err("%s: slab cache allocation failed\n", __func__); + return; + } + + xa_lock_irq(xa); + + /* Check again. 
+	 */
+	if (xa != node->array) {
+		node = new_node;
+		goto unlock;
+	}
+
+	memcpy(new_node, node, sizeof(struct xa_node));
+
+	if (list_empty(&node->private_list))
+		INIT_LIST_HEAD(&new_node->private_list);
+	else
+		list_replace(&node->private_list, &new_node->private_list);
+
+	for (i = 0; i < XA_CHUNK_SIZE; i++) {
+		void *x = xa_entry_locked(xa, new_node, i);
+
+		if (xa_is_node(x))
+			rcu_assign_pointer(xa_to_node(x)->parent, new_node);
+	}
+	if (!new_node->parent)
+		slot = &xa->xa_head;
+	else
+		slot = &xa_parent_locked(xa, new_node)->slots[new_node->offset];
+	rcu_assign_pointer(*slot, xa_mk_node(new_node));
+
+unlock:
+	xa_unlock_irq(xa);
+	xa_node_free(node);
+	rcu_barrier();
+}
+
+static void xa_migrate(struct kmem_cache *s, void **objects, int nr,
+		       int node, void *_unused)
+{
+	int i;
+
+	for (i = 0; i < nr; i++)
+		xa_object_migrate(objects[i], node);
+}
+
 void __init xarray_slabcache_init(void)
 {
 	xa_node_cachep = kmem_cache_create("xarray_node",
 					   sizeof(struct xa_node), 0,
 					   SLAB_PANIC | SLAB_RECLAIM_ACCOUNT,
 					   xa_node_ctor);
+
+	kmem_cache_setup_mobility(xa_node_cachep, NULL, xa_migrate);
 }
 
 #ifdef XA_DEBUG
-- 
2.21.0
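The heart of xa_object_migrate() is pointer surgery: copy the node, repoint each child's parent pointer at the copy, then swing the slot in the old node's parent (or the tree root) over to the copy. A toy userspace model of that surgery, with a hypothetical `struct node` far simpler than `struct xa_node` and none of the RCU or locking:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NR_SLOTS 4

/* Hypothetical stand-in for struct xa_node. */
struct node {
	struct node *parent;
	struct node *slots[NR_SLOTS];
	int offset;		/* index of this node in parent->slots */
	int value;
};

static struct node *root;	/* stand-in for xa->xa_head */

/* Relocate @old to a freshly allocated copy and free the original. */
static struct node *migrate_node(struct node *old)
{
	struct node *new = malloc(sizeof(*new));

	memcpy(new, old, sizeof(*new));

	/* Children must now point at the copy ... */
	for (int i = 0; i < NR_SLOTS; i++)
		if (new->slots[i])
			new->slots[i]->parent = new;

	/* ... and so must the slot that referenced the old node. */
	if (!new->parent)
		root = new;
	else
		new->parent->slots[new->offset] = new;

	free(old);
	return new;
}
```

In the kernel the same two updates are done under xa_lock_irq() with rcu_assign_pointer(), and the old node is released via xa_node_free() so concurrent readers drain first.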
[PATCH 07/15] tools/testing/slab: Add object migration test module
We just implemented slab movable objects for the SLUB allocator. We should test that code. In order to do so we need to be able to do a number of things:

 - Create a cache
 - Enable Slab Movable Objects for the cache
 - Allocate objects to the cache
 - Free objects from within specific slabs of the cache

We can do all this via a loadable module. Add a module that defines functions that can be triggered from userspace via a debugfs entry. From the source:

  /*
   * SLUB defragmentation a.k.a. Slab Movable Objects (SMO).
   *
   * This module is used for testing the SLUB allocator. Enables
   * userspace to run kernel functions via a debugfs file.
   *
   * debugfs: /sys/kernel/debugfs/smo/callfn (write only)
   *
   * String written to `callfn` is parsed by the module and the associated
   * function is called. See fn_tab for mapping of strings to functions.
   */

References to allocated objects are kept by the module in a linked list so that userspace can control which object to free.

We introduce the following four functions via the function table:

  "enable":     Enables object migration for the test cache.
  "alloc X":    Allocates X objects.
  "free X [Y]": Frees X objects starting at list position Y (default Y==0).
  "test":       Runs [stress] tests from within the module (see below).

  {"enable", smo_enable_cache_mobility},
  {"alloc", smo_alloc_objects},
  {"free", smo_free_object},
  {"test", smo_run_module_tests},

Freeing from the start of the list creates a hole in the slab being freed from (i.e. creates a partial slab). The results of running these commands can be seen using `slabinfo` (available in tools/vm/):

  gcc -o slabinfo tools/vm/slabinfo.c

Stress tests can be run from within the module. These tests are internal to the module because we verify that object references are still good after object migration. These are called 'stress' tests because it is intended that they create/free a lot of objects. Userspace can control the number of objects to create; the default is 1000.
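The fn_tab lookup described above is a classic string-to-function dispatch table. A userspace sketch of the same pattern (toy model only: `do_alloc`/`do_free` and the counter are illustrative stand-ins, not the module's handlers):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int nr_allocated;

static int do_alloc(int n) { nr_allocated += n; return 0; }
static int do_free(int n)  { nr_allocated -= n; return 0; }

/* Maps a command verb to its handler, like the module's fn_tab. */
struct fn_entry {
	const char *name;
	int (*fn)(int arg);
};

static const struct fn_entry fn_tab[] = {
	{ "alloc", do_alloc },
	{ "free",  do_free },
};

/* Parse "verb [arg]" and dispatch; returns -1 on unknown command. */
static int call_fn(const char *cmd)
{
	char verb[16];
	int arg = 0;
	size_t i;

	if (sscanf(cmd, "%15s %d", verb, &arg) < 1)
		return -1;

	for (i = 0; i < sizeof(fn_tab) / sizeof(fn_tab[0]); i++)
		if (strcmp(verb, fn_tab[i].name) == 0)
			return fn_tab[i].fn(arg);
	return -1;
}
```

In the module the string arrives through a debugfs write handler instead of a function argument, but the table walk is the same idea.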
Example test session

Relevant /proc/slabinfo column headers:

  name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>

# mount -t debugfs none /sys/kernel/debug/
$ cd path/to/linux/tools/testing/slab; make
...
# insmod slub_defrag.ko
# cat /proc/slabinfo | grep smo_test | sed 's/:.*//'
smo_test               0      0    392   20    2

From this we can see that the module created cache 'smo_test' with 20 objects per slab and 2 pages per slab (and the cache is currently empty).

We can play with the slab allocator manually:

# insmod slub_defrag.ko
# echo 'alloc 21' > callfn
# cat /proc/slabinfo | grep smo_test | sed 's/:.*//'
smo_test              21     40    392   20    2

We see here that 21 active objects have been allocated, creating 2 slabs (40 total objects).

# slabinfo smo_test --report

Slabcache: smo_test             Aliases:  0 Order :  1 Objects: 21

Sizes (bytes)     Slabs              Debug                Memory
------------------------------------------------------------------------
Object :      56  Total  :       2   Sanity Checks : On   Total:   16384
SlabObj:     392  Full   :       1   Redzoning     : On   Used :    1176
SlabSiz:    8192  Partial:       1   Poisoning     : On   Loss :   15208
Loss   :     336  CpuSlab:       0   Tracking      : On   Lalig:    7056
Align  :       8  Objects:      20   Tracing       : Off  Lpadd:     704

Now free an object from the first slot of the first slab:

# echo 'free 1' > callfn
# cat /proc/slabinfo | grep smo_test | sed 's/:.*//'
smo_test              20     40    392   20    2

# slabinfo smo_test --report

Slabcache: smo_test             Aliases:  0 Order :  1 Objects: 20

Sizes (bytes)     Slabs              Debug                Memory
------------------------------------------------------------------------
Object :      56  Total  :       2   Sanity Checks : On   Total:   16384
SlabObj:     392  Full   :       0   Redzoning     : On   Used :    1120
SlabSiz:    8192  Partial:       2   Poisoning     : On   Loss :   15264
Loss   :     336  CpuSlab:       0   Tracking      : On   Lalig:    6720
Align  :       8  Objects:      20   Tracing       : Off  Lpadd:     704

Calling shrink now on the cache does nothing because object migration is not enabled (output omitted). If we enable object migration then shrink the cache, we expect the object from the second slab to be moved to the first slot in the first slab and the second slab to be removed from the partial list.
# echo 'enable' > callfn
# slabinfo smo_test --shrink
# slabinfo smo_test --report

Slabcache: smo_test             Aliases:  0 Order :  1 Objects: 20
** Defragmentation at 30%

Sizes (bytes)     Slabs              Debug                Memory
------------------------------------------------------------------------
Object :      56  Total  :       1   Sanity Checks : On   Total:    8192
SlabObj:     392  Full   :       1
[PATCH 09/15] lib: Separate radix_tree_node and xa_node slab cache
Earlier, Slab Movable Objects (SMO) was implemented. The XArray is now able to take advantage of SMO in order to make xarray nodes movable (when using the SLUB allocator). Currently the radix tree uses the same slab cache as the XArray. Only XArray nodes are movable _not_ radix tree nodes. We can give the radix tree its own slab cache to overcome this. In preparation for implementing XArray object migration (xa_node objects) via Slab Movable Objects add a slab cache solely for XArray nodes and make the XArray use this slab cache instead of the radix_tree_node slab cache. Cc: Matthew Wilcox Signed-off-by: Tobin C. Harding --- include/linux/xarray.h | 3 +++ init/main.c| 2 ++ lib/radix-tree.c | 2 +- lib/xarray.c | 48 ++ 4 files changed, 45 insertions(+), 10 deletions(-) diff --git a/include/linux/xarray.h b/include/linux/xarray.h index 0e01e6129145..773f91f8e1db 100644 --- a/include/linux/xarray.h +++ b/include/linux/xarray.h @@ -42,6 +42,9 @@ #define BITS_PER_XA_VALUE (BITS_PER_LONG - 1) +/* Called from init/main.c */ +void xarray_slabcache_init(void); + /** * xa_mk_value() - Create an XArray entry from an integer. * @v: Value to store in XArray. diff --git a/init/main.c b/init/main.c index 66a196c5e4c3..8c409a5dc937 100644 --- a/init/main.c +++ b/init/main.c @@ -107,6 +107,7 @@ static int kernel_init(void *); extern void init_IRQ(void); extern void radix_tree_init(void); +extern void xarray_slabcache_init(void); /* * Debug helper: via this flag we know that we are in 'early bootup code' @@ -622,6 +623,7 @@ asmlinkage __visible void __init start_kernel(void) "Interrupts were enabled *very* early, fixing it\n")) local_irq_disable(); radix_tree_init(); + xarray_slabcache_init(); /* * Set up housekeeping before setting up workqueues to allow the unbound diff --git a/lib/radix-tree.c b/lib/radix-tree.c index 18c1dfbb1765..e6127c4c84b5 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -31,7 +31,7 @@ /* * Radix tree node cache. 
*/ -struct kmem_cache *radix_tree_node_cachep; +static struct kmem_cache *radix_tree_node_cachep; /* * The radix tree is variable-height, so an insert operation not only has diff --git a/lib/xarray.c b/lib/xarray.c index 6be3acbb861f..861c042daa1d 100644 --- a/lib/xarray.c +++ b/lib/xarray.c @@ -27,6 +27,8 @@ * @entry refers to something stored in a slot in the xarray */ +static struct kmem_cache *xa_node_cachep; + static inline unsigned int xa_lock_type(const struct xarray *xa) { return (__force unsigned int)xa->xa_flags & 3; @@ -244,9 +246,21 @@ void *xas_load(struct xa_state *xas) } EXPORT_SYMBOL_GPL(xas_load); -/* Move the radix tree node cache here */ -extern struct kmem_cache *radix_tree_node_cachep; -extern void radix_tree_node_rcu_free(struct rcu_head *head); +static void xa_node_rcu_free(struct rcu_head *head) +{ + struct xa_node *node = container_of(head, struct xa_node, rcu_head); + + /* +* Must only free zeroed nodes into the slab. We can be left with +* non-NULL entries by radix_tree_free_nodes, so clear the entries +* and tags here. 
+*/ + memset(node->slots, 0, sizeof(node->slots)); + memset(node->tags, 0, sizeof(node->tags)); + INIT_LIST_HEAD(>private_list); + + kmem_cache_free(xa_node_cachep, node); +} #define XA_RCU_FREE((struct xarray *)1) @@ -254,7 +268,7 @@ static void xa_node_free(struct xa_node *node) { XA_NODE_BUG_ON(node, !list_empty(>private_list)); node->array = XA_RCU_FREE; - call_rcu(>rcu_head, radix_tree_node_rcu_free); + call_rcu(>rcu_head, xa_node_rcu_free); } /* @@ -270,7 +284,7 @@ static void xas_destroy(struct xa_state *xas) if (!node) return; XA_NODE_BUG_ON(node, !list_empty(>private_list)); - kmem_cache_free(radix_tree_node_cachep, node); + kmem_cache_free(xa_node_cachep, node); xas->xa_alloc = NULL; } @@ -298,7 +312,7 @@ bool xas_nomem(struct xa_state *xas, gfp_t gfp) xas_destroy(xas); return false; } - xas->xa_alloc = kmem_cache_alloc(radix_tree_node_cachep, gfp); + xas->xa_alloc = kmem_cache_alloc(xa_node_cachep, gfp); if (!xas->xa_alloc) return false; XA_NODE_BUG_ON(xas->xa_alloc, !list_empty(>xa_alloc->private_list)); @@ -327,10 +341,10 @@ static bool __xas_nomem(struct xa_state *xas, gfp_t gfp) } if (gfpflags_allow_blocking(gfp)) { xas_unlock_type(xas, lock_type); - xas->xa_alloc = kmem_cache_alloc(radix_tree_node_cachep, gfp); + xas->xa_alloc = kmem_cache_alloc(xa_node_cachep, gfp); xas_lock_type(xas, lock_type); } else { - xas->xa_alloc = kmem_cache_alloc(radix_tree_node_cachep, gfp); +
[PATCH 13/15] dcache: Implement partial shrink via Slab Movable Objects
The dentry slab cache is susceptible to internal fragmentation. Now that we have Slab Movable Objects we can attempt to defragment the dcache. Dentry objects are inherently _not_ relocatable; however, under some conditions they can be free'd. This is the same as shrinking the dcache but instead of shrinking the whole cache we only attempt to free those objects that are located in partially full slab pages. There is no guarantee that this will reduce the memory usage of the system; it is a compromise between fragmented memory and total cache shrinkage, with the hope that some memory pressure can be alleviated.

This is implemented using the newly added Slab Movable Objects infrastructure. The dcache 'migration' function is intentionally _not_ called 'd_migrate' because we only free, we do not migrate. Call it 'd_partial_shrink' to make explicit that no reallocation is done.

In order to enable SMO a call to kmem_cache_setup_mobility() must be made; we do this during initialization of the dcache.

Implement isolate and 'migrate' functions for the dentry slab cache. Enable SMO for the dcache during initialization.

Signed-off-by: Tobin C. Harding
---
 fs/dcache.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 867d97a86940..3ca721752723 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3072,6 +3072,79 @@ void d_tmpfile(struct dentry *dentry, struct inode *inode)
 }
 EXPORT_SYMBOL(d_tmpfile);
 
+/*
+ * d_isolate() - Dentry isolation callback function.
+ * @s: The dentry cache.
+ * @v: Vector of pointers to the objects to isolate.
+ * @nr: Number of objects in @v.
+ *
+ * The slab allocator is holding off frees. We can safely examine
+ * the object without the danger of it vanishing from under us.
+ */
+static void *d_isolate(struct kmem_cache *s, void **v, int nr)
+{
+	struct list_head *dispose;
+	struct dentry *dentry;
+	int i;
+
+	dispose = kmalloc(sizeof(*dispose), GFP_KERNEL);
+	if (!dispose)
+		return NULL;
+
+	INIT_LIST_HEAD(dispose);
+
+	for (i = 0; i < nr; i++) {
+		dentry = v[i];
+		spin_lock(&dentry->d_lock);
+
+		if (dentry->d_lockref.count > 0 ||
+		    dentry->d_flags & DCACHE_SHRINK_LIST) {
+			spin_unlock(&dentry->d_lock);
+			continue;
+		}
+
+		if (dentry->d_flags & DCACHE_LRU_LIST)
+			d_lru_del(dentry);
+
+		d_shrink_add(dentry, dispose);
+		spin_unlock(&dentry->d_lock);
+	}
+
+	return dispose;
+}
+
+/*
+ * d_partial_shrink() - Dentry migration callback function.
+ * @s: The dentry cache.
+ * @_unused: We do not access the vector.
+ * @__unused: No need for length of vector.
+ * @___unused: We do not do any allocation.
+ * @private: list_head pointer representing the shrink list.
+ *
+ * Dispose of the shrink list created during the isolation function.
+ *
+ * Dentry objects can _not_ be relocated and shrinking the whole dcache
+ * can be expensive. This is an effort to free dentry objects that are
+ * stopping slab pages from being free'd without clearing the whole dcache.
+ *
+ * This callback is called from the SLUB allocator object migration
+ * infrastructure in an attempt to free up slab pages by freeing dentry
+ * objects from partially full slabs.
+ */
+static void d_partial_shrink(struct kmem_cache *s, void **_unused, int __unused,
+			     int ___unused, void *private)
+{
+	struct list_head *dispose = private;
+
+	if (!private)	/* kmalloc error during isolate.
+			 */
+		return;
+
+	if (!list_empty(dispose))
+		shrink_dentry_list(dispose);
+
+	kfree(private);
+}
+
 static __initdata unsigned long dhash_entries;
 static int __init set_dhash_entries(char *str)
 {
@@ -3117,6 +3190,8 @@ static void __init dcache_init(void)
 				   sizeof_field(struct dentry, d_iname),
 				   dcache_ctor);
 
+	kmem_cache_setup_mobility(dentry_cache, d_isolate, d_partial_shrink);
+
 	/* Hash may have been set up in dcache_init_early */
 	if (!hashdist)
 		return;
-- 
2.21.0
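The isolate/dispose split used by d_isolate() and d_partial_shrink() can be modelled in userspace: the first pass collects only the objects that are safe to drop onto a dispose list, the second pass frees that list. A toy sketch (hypothetical types; a plain array stands in for the kernel's list_head machinery, and there is no locking):

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a dentry: just a refcount and a freed marker. */
struct dent {
	int refcount;
	int freed;
};

struct dispose_list {
	struct dent *items[16];
	int nr;
};

/* Pass 1: collect droppable objects; busy (referenced) ones are skipped. */
static struct dispose_list *isolate_dents(struct dent **v, int nr)
{
	struct dispose_list *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	for (int i = 0; i < nr; i++)
		if (v[i]->refcount == 0)
			d->items[d->nr++] = v[i];
	return d;
}

/* Pass 2: dispose of whatever pass 1 collected. */
static void partial_shrink(struct dispose_list *d)
{
	if (!d)			/* allocation failed during isolation */
		return;
	for (int i = 0; i < d->nr; i++)
		d->items[i]->freed = 1;
	free(d);
}
```

The NULL check mirrors d_partial_shrink() tolerating a kmalloc failure in d_isolate(): the migrate callback must cope with the isolate callback having produced no private data.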
[PATCH 11/15] tools/testing/slab: Add XArray movable objects tests
We just implemented movable objects for the XArray. Let's test it in-tree.

Add a test module for the XArray's movable objects implementation.

Functionality of the XArray Slab Movable Object implementation can usually be seen simply by using `slabinfo` on a running machine, since the radix tree is typically in use on a running machine and will have partial slabs. For repeated testing we can use the test module to simulate a workload on the XArray and then use `slabinfo` to test that object migration is functioning.

If testing on a freshly spun-up VM (low radix tree workload) it may be necessary to load/unload the module a number of times to create partial slabs.

Example test session

Relevant /proc/slabinfo column headers:

  name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>

Prior to testing, slabinfo report for radix_tree_node:

# slabinfo radix_tree_node --report

Slabcache: radix_tree_node      Aliases:  0 Order :  2 Objects: 8352
** Reclaim accounting active
** Defragmentation at 30%

Sizes (bytes)     Slabs              Debug                Memory
------------------------------------------------------------------------
Object :     576  Total  :     497   Sanity Checks : On   Total: 8142848
SlabObj:     912  Full   :     473   Redzoning     : On   Used : 4810752
SlabSiz:   16384  Partial:      24   Poisoning     : On   Loss : 3332096
Loss   :     336  CpuSlab:       0   Tracking      : On   Lalig: 2806272
Align  :       8  Objects:      17   Tracing       : Off  Lpadd:  437360

Here you can see the kernel was built with Slab Movable Objects enabled for the XArray (the XArray uses the radix tree below the surface).
After inserting the test module (note we have triggered allocation of a number of radix tree nodes, increasing the object count but decreasing the number of partial slabs):

# slabinfo radix_tree_node --report

Slabcache: radix_tree_node      Aliases:  0 Order :  2 Objects: 8442
** Reclaim accounting active
** Defragmentation at 30%

Sizes (bytes)     Slabs              Debug                Memory
------------------------------------------------------------------------
Object :     576  Total  :     499   Sanity Checks : On   Total: 8175616
SlabObj:     912  Full   :     484   Redzoning     : On   Used : 4862592
SlabSiz:   16384  Partial:      15   Poisoning     : On   Loss : 3313024
Loss   :     336  CpuSlab:       0   Tracking      : On   Lalig: 2836512
Align  :       8  Objects:      17   Tracing       : Off  Lpadd:  439120

Now we can shrink the radix_tree_node cache:

# slabinfo radix_tree_node --shrink
# slabinfo radix_tree_node --report

Slabcache: radix_tree_node      Aliases:  0 Order :  2 Objects: 8515
** Reclaim accounting active
** Defragmentation at 30%

Sizes (bytes)     Slabs              Debug                Memory
------------------------------------------------------------------------
Object :     576  Total  :     501   Sanity Checks : On   Total: 8208384
SlabObj:     912  Full   :     500   Redzoning     : On   Used : 4904640
SlabSiz:   16384  Partial:       1   Poisoning     : On   Loss : 3303744
Loss   :     336  CpuSlab:       0   Tracking      : On   Lalig: 2861040
Align  :       8  Objects:      17   Tracing       : Off  Lpadd:  440880

Note the single remaining partial slab.

Signed-off-by: Tobin C. Harding
---
 tools/testing/slab/Makefile             |   2 +-
 tools/testing/slab/slub_defrag_xarray.c | 211 ++++++++++++++++++++++++
 2 files changed, 212 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/slab/slub_defrag_xarray.c

diff --git a/tools/testing/slab/Makefile b/tools/testing/slab/Makefile
index 440c2e3e356f..44c18d9a4d52 100644
--- a/tools/testing/slab/Makefile
+++ b/tools/testing/slab/Makefile
@@ -1,4 +1,4 @@
-obj-m += slub_defrag.o
+obj-m += slub_defrag.o slub_defrag_xarray.o
 
 KTREE=../../..
diff --git a/tools/testing/slab/slub_defrag_xarray.c b/tools/testing/slab/slub_defrag_xarray.c
new file mode 100644
index ..41143f73256c
--- /dev/null
+++ b/tools/testing/slab/slub_defrag_xarray.c
@@ -0,0 +1,211 @@
+// SPDX-License-Identifier: GPL-2.0+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define SMOX_CACHE_NAME "smox_test"
+static struct kmem_cache *cachep;
+
+/*
+ * Declare XArrays globally so we can clean them up on module unload.
+ */
+
+/* Used by test_smo_xarray() */
+DEFINE_XARRAY(things);
+
+/* Thing to store pointers to in the XArray */
+struct smox_thing {
+	long id;
+};
+
+/* It's up to the caller to ensure id is unique */
+static struct smox_thing *alloc_thing(int id)
+{
+	struct smox_thing *thing;
+
+	thing = kmem_cache_alloc(cachep, GFP_KERNEL);
+	if (!thing)
+		return ERR_PTR(-ENOMEM);
+
+	thing->id = id;
+	return thing;
+}
+
+/**
+ * smox_object_ctor() - SMO object constructor function.
+ * @ptr: Pointer to memory where the object should be constructed.
+ */
+void
[PATCH 08/15] tools/testing/slab: Add object migration test suite
We just added a module that enables testing the SLUB allocator's ability to defrag/shrink caches via movable objects. Tests are better when they are automated. Add automated testing via a python script for SLUB movable objects.

Example output:

  $ cd path/to/linux/tools/testing/slab
  $ ./slub_defrag.py
  Please run script as root

  $ sudo ./slub_defrag.py

  $ sudo ./slub_defrag.py --debug
  Loading module ...
  Slab cache smo_test created
  Objects per slab: 20
  Running sanity checks ...

  Running module stress test (see dmesg for additional test output) ...
  Removing module slub_defrag ...
  Loading module ...
  Slab cache smo_test created

  Running test non-movable ...
  testing slab 'smo_test' prior to enabling movable objects ...
  verified non-movable slabs are NOT shrinkable

  Running test movable ...
  testing slab 'smo_test' after enabling movable objects ...
  verified movable slabs are shrinkable

  Removing module slub_defrag ...

Signed-off-by: Tobin C. Harding
---
 tools/testing/slab/slub_defrag.c  |   1 +
 tools/testing/slab/slub_defrag.py | 451 ++++++++++++++++++++++++++++++
 2 files changed, 452 insertions(+)
 create mode 100755 tools/testing/slab/slub_defrag.py

diff --git a/tools/testing/slab/slub_defrag.c b/tools/testing/slab/slub_defrag.c
index 4a5c24394b96..8332e69ee868 100644
--- a/tools/testing/slab/slub_defrag.c
+++ b/tools/testing/slab/slub_defrag.c
@@ -337,6 +337,7 @@ static int smo_run_module_tests(int nr_objs, int keep)
 
 /*
  * struct functions() - Map command to a function pointer.
+ * If you update this please update the documentation in slub_defrag.py
  */
 struct functions {
 	char *fn_name;

diff --git a/tools/testing/slab/slub_defrag.py b/tools/testing/slab/slub_defrag.py
new file mode 100755
index ..41747c0db39b
--- /dev/null
+++ b/tools/testing/slab/slub_defrag.py
@@ -0,0 +1,451 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+import subprocess
+import sys
+from os import path
+
+# SLUB Movable Objects test suite.
+#
+# Requirements:
+#  - CONFIG_SLUB=y
+#  - CONFIG_SLUB_DEBUG=y
+#  - The slub_defrag module in this directory.
+
+# Test SMO using a kernel module that enables triggering arbitrary
+# kernel code from userspace via a debugfs file.
+#
+# Module code is in ./slub_defrag.c, basically the functionality is as
+# follows:
+#
+#  - Creates debugfs file /sys/kernel/debugfs/smo/callfn
+#  - Writes to 'callfn' are parsed as a command string and the function
+#    associated with command is called.
+#  - Defines 4 commands (all commands operate on smo_test cache):
+#     - 'test': Runs module stress tests.
+#     - 'alloc N': Allocates N slub objects
+#     - 'free N POS': Frees N objects starting at POS (see below)
+#     - 'enable': Enables SLUB Movable Objects
+#
+# The module maintains a list of allocated objects.  Allocation adds
+# objects to the tail of the list.  Free'ing frees from the head of the
+# list.  This has the effect of creating free slots in the slab.  For
+# finer grained control over where in the cache slots are free'd the POS
+# (position) argument may be used.
+
+# The main() function is reasonably readable; the test suite does the
+# following:
+#
+# 1. Runs the module stress tests.
+# 2. Tests the cache without movable objects enabled.
+#    - Creates multiple partial slabs as explained above.
+#    - Verifies that partial slabs are _not_ removed by shrink (see below).
+# 3. Tests the cache with movable objects enabled.
+#    - Creates multiple partial slabs as explained above.
+#    - Verifies that partial slabs _are_ removed by shrink (see below).
+
+# The sysfs file /sys/kernel/slab/<cache>/shrink enables calling the
+# function kmem_cache_shrink() (see mm/slab_common.c and mm/slub.c).
+# Shrinking a cache attempts to consolidate all partial slabs by moving
+# objects if object migration is enabled for the cache; otherwise
+# shrinking a cache simply re-orders the partial list so that the most
+# densely populated slabs are at the head of the list.
+
+# Enable/disable debugging output (also enabled via -d | --debug).
+debug = False
+
+# Used in debug messages and when running `insmod`.
+MODULE_NAME = "slub_defrag"
+
+# Slab cache created by the test module.
+CACHE_NAME = "smo_test"
+
+# Set by get_slab_config()
+objects_per_slab = 0
+pages_per_slab = 0
+debugfs_mounted = False  # Set to true if we mount debugfs.
+
+
+def eprint(*args, **kwargs):
+    print(*args, file=sys.stderr, **kwargs)
+
+
+def dprint(*args, **kwargs):
+    if debug:
+        print(*args, file=sys.stderr, **kwargs)
+
+
+def run_shell(cmd):
+    return subprocess.call([cmd], shell=True)
+
+
+def run_shell_get_stdout(cmd):
+    return subprocess.check_output([cmd], shell=True)
+
+
+def assert_root():
+    user = run_shell_get_stdout('whoami')
+    if user != b'root\n':
+        eprint("Please run script as root")
+        sys.exit(1)
+
+
+def mount_debugfs():
+    mounted = False
+
+    # Check if debugfs is mounted at a known
[PATCH 06/15] tools/vm/slabinfo: Add defrag_used_ratio output
Add output for the newly added defrag_used_ratio sysfs knob. Signed-off-by: Tobin C. Harding --- tools/vm/slabinfo.c | 4 1 file changed, 4 insertions(+) diff --git a/tools/vm/slabinfo.c b/tools/vm/slabinfo.c index d2c22f9ee2d8..ef4ff93df4cc 100644 --- a/tools/vm/slabinfo.c +++ b/tools/vm/slabinfo.c @@ -34,6 +34,7 @@ struct slabinfo { unsigned int sanity_checks, slab_size, store_user, trace; int order, poison, reclaim_account, red_zone; int movable, ctor; + int defrag_used_ratio; int remote_node_defrag_ratio; unsigned long partial, objects, slabs, objects_partial, objects_total; unsigned long alloc_fastpath, alloc_slowpath; @@ -549,6 +550,8 @@ static void report(struct slabinfo *s) printf("** Slabs are destroyed via RCU\n"); if (s->reclaim_account) printf("** Reclaim accounting active\n"); + if (s->movable) + printf("** Defragmentation at %d%%\n", s->defrag_used_ratio); printf("\nSizes (bytes) Slabs Debug Memory\n"); printf("\n"); @@ -1279,6 +1282,7 @@ static void read_slab_dir(void) slab->deactivate_bypass = get_obj("deactivate_bypass"); slab->remote_node_defrag_ratio = get_obj("remote_node_defrag_ratio"); + slab->defrag_used_ratio = get_obj("defrag_used_ratio"); chdir(".."); if (read_slab_obj(slab, "ops")) { if (strstr(buffer, "ctor :")) -- 2.21.0
[PATCH 05/15] tools/vm/slabinfo: Add remote node defrag ratio output
Add output line for NUMA remote node defrag ratio. Signed-off-by: Tobin C. Harding --- tools/vm/slabinfo.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/tools/vm/slabinfo.c b/tools/vm/slabinfo.c index cbfc56c44c2f..d2c22f9ee2d8 100644 --- a/tools/vm/slabinfo.c +++ b/tools/vm/slabinfo.c @@ -34,6 +34,7 @@ struct slabinfo { unsigned int sanity_checks, slab_size, store_user, trace; int order, poison, reclaim_account, red_zone; int movable, ctor; + int remote_node_defrag_ratio; unsigned long partial, objects, slabs, objects_partial, objects_total; unsigned long alloc_fastpath, alloc_slowpath; unsigned long free_fastpath, free_slowpath; @@ -377,6 +378,10 @@ static void slab_numa(struct slabinfo *s, int mode) if (skip_zero && !s->slabs) return; + if (mode) { + printf("\nNUMA remote node defrag ratio: %3d\n", + s->remote_node_defrag_ratio); + } if (!line) { printf("\n%-21s:", mode ? "NUMA nodes" : "Slab"); for(node = 0; node <= highest_node; node++) @@ -1272,6 +1277,8 @@ static void read_slab_dir(void) slab->cpu_partial_free = get_obj("cpu_partial_free"); slab->alloc_node_mismatch = get_obj("alloc_node_mismatch"); slab->deactivate_bypass = get_obj("deactivate_bypass"); + slab->remote_node_defrag_ratio = + get_obj("remote_node_defrag_ratio"); chdir(".."); if (read_slab_obj(slab, "ops")) { if (strstr(buffer, "ctor :")) -- 2.21.0
[PATCH 01/15] slub: Add isolate() and migrate() methods
Add the two methods needed for moving objects and enable the display of the callbacks via the /sys/kernel/slab interface. Add documentation explaining the use of these methods and the prototypes for slab.h. Add functions to setup the callbacks method for a slab cache. Add empty functions for SLAB/SLOB. The API is generic so it could be theoretically implemented for these allocators as well. Change sysfs 'ctor' field to be 'ops' to contain all the callback operations defined for a slab cache. Display the existing 'ctor' callback in the ops fields contents along with 'isolate' and 'migrate' callbacks. Signed-off-by: Tobin C. Harding --- include/linux/slab.h | 70 include/linux/slub_def.h | 3 ++ mm/slub.c| 59 + 3 files changed, 126 insertions(+), 6 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index 9449b19c5f10..886fc130334d 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -154,6 +154,76 @@ void memcg_create_kmem_cache(struct mem_cgroup *, struct kmem_cache *); void memcg_deactivate_kmem_caches(struct mem_cgroup *); void memcg_destroy_kmem_caches(struct mem_cgroup *); +/* + * Function prototypes passed to kmem_cache_setup_mobility() to enable + * mobile objects and targeted reclaim in slab caches. + */ + +/** + * typedef kmem_cache_isolate_func - Object migration callback function. + * @s: The cache we are working on. + * @ptr: Pointer to an array of pointers to the objects to isolate. + * @nr: Number of objects in @ptr array. + * + * The purpose of kmem_cache_isolate_func() is to pin each object so that + * they cannot be freed until kmem_cache_migrate_func() has processed + * them. This may be accomplished by increasing the refcount or setting + * a flag. + * + * The object pointer array passed is also passed to + * kmem_cache_migrate_func(). The function may remove objects from the + * array by setting pointers to %NULL. 
+ * This is useful if we can determine that an object is being freed
+ * because kmem_cache_isolate_func() was called when the subsystem was
+ * calling kmem_cache_free(). In that case it is not necessary to
+ * increase the refcount or specially mark the object because the
+ * release of the slab lock will lead to the immediate freeing of the
+ * object.
+ *
+ * Context: Called with locks held so that the slab objects cannot be
+ *          freed. We are in an atomic context and no slab operations
+ *          may be performed.
+ * Return:  A pointer that is passed to the migrate function. If any
+ *          objects cannot be touched at this point then the pointer may
+ *          indicate a failure and then the migration function can simply
+ *          remove the references that were already obtained. The private
+ *          data could be used to track the objects that were already pinned.
+ */
+typedef void *kmem_cache_isolate_func(struct kmem_cache *s, void **ptr, int nr);
+
+/**
+ * typedef kmem_cache_migrate_func - Object migration callback function.
+ * @s: The cache we are working on.
+ * @ptr: Pointer to an array of pointers to the objects to migrate.
+ * @nr: Number of objects in @ptr array.
+ * @node: The NUMA node where the object should be allocated.
+ * @private: The pointer returned by kmem_cache_isolate_func().
+ *
+ * This function is responsible for migrating objects. Typically, for
+ * each object in the input array you will want to allocate a new
+ * object, copy the original object, update any pointers, and free the
+ * old object.
+ *
+ * After this function returns all pointers to the old object should now
+ * point to the new object.
+ *
+ * Context: Called with no locks held and interrupts enabled. Sleeping
+ *          is possible. Any operation may be performed.
+ */
+typedef void kmem_cache_migrate_func(struct kmem_cache *s, void **ptr,
+				     int nr, int node, void *private);
+
+/*
+ * kmem_cache_setup_mobility() is used to setup callbacks for a slab cache.
+ */
+#ifdef CONFIG_SLUB
+void kmem_cache_setup_mobility(struct kmem_cache *, kmem_cache_isolate_func,
+			       kmem_cache_migrate_func);
+#else
+static inline void
+kmem_cache_setup_mobility(struct kmem_cache *s, kmem_cache_isolate_func isolate,
+			  kmem_cache_migrate_func migrate) {}
+#endif
+
 /*
  * Please use this macro to create slab caches. Simply specify the
  * name of the structure and maybe some flags that are listed above.

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index d2153789bd9f..2879a2f5f8eb 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -99,6 +99,9 @@ struct kmem_cache {
 	gfp_t allocflags;		/* gfp flags to use on each alloc */
 	int refcount;			/* Refcount for slab cache destroy */
 	void (*ctor)(void *);
+	kmem_cache_isolate_func *isolate;
+	kmem_cache_migrate_func *migrate;
+
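The contract between the two callbacks can be illustrated with a userspace toy: isolate() pins what it can and NULLs out the entries it cannot migrate; migrate() then copies the survivors and frees the originals. Everything here is hypothetical (names, types, a `pinned` flag standing in for a refcount) and none of the kernel's locking or atomic-context rules apply:

```c
#include <assert.h>
#include <stdlib.h>

struct obj {
	int payload;
	int pinned;	/* stand-in for a refcount/flag */
};

/* Phase 1: pin each object, or drop it from the vector if it is busy. */
static void *isolate(void **v, int nr)
{
	for (int i = 0; i < nr; i++) {
		struct obj *o = v[i];

		if (o->pinned)		/* busy: leave it where it is */
			v[i] = NULL;
		else
			o->pinned = 1;	/* pin until migrate() is done */
	}
	return NULL;		/* no private state needed in this toy */
}

/* Phase 2: allocate a copy, transfer the payload, free the original. */
static void migrate(void **v, int nr, struct obj **out, void *private)
{
	(void)private;
	for (int i = 0; i < nr; i++) {
		struct obj *old = v[i], *new;

		if (!old)
			continue;	/* removed during isolation */
		new = malloc(sizeof(*new));
		if (!new)
			continue;	/* leave the original in place */
		new->payload = old->payload;
		new->pinned = 0;
		out[i] = new;		/* "repoint" users at the copy */
		free(old);
	}
}
```

The `out` array plays the role of "update any pointers": in a real subsystem the migrate callback would walk its own data structures to repoint every reference at the new object.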
[PATCH 02/15] tools/vm/slabinfo: Add support for -C and -M options
-C lists caches that use a ctor.

-M lists caches that support object migration.

Add command line options to show caches with a constructor and caches
that are movable (i.e. have migrate function).

Signed-off-by: Tobin C. Harding
---
 tools/vm/slabinfo.c | 40 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/tools/vm/slabinfo.c b/tools/vm/slabinfo.c
index 73818f1b2ef8..cbfc56c44c2f 100644
--- a/tools/vm/slabinfo.c
+++ b/tools/vm/slabinfo.c
@@ -33,6 +33,7 @@ struct slabinfo {
 	unsigned int hwcache_align, object_size, objs_per_slab;
 	unsigned int sanity_checks, slab_size, store_user, trace;
 	int order, poison, reclaim_account, red_zone;
+	int movable, ctor;
 	unsigned long partial, objects, slabs, objects_partial, objects_total;
 	unsigned long alloc_fastpath, alloc_slowpath;
 	unsigned long free_fastpath, free_slowpath;
@@ -67,6 +68,8 @@ int show_report;
 int show_alias;
 int show_slab;
 int skip_zero = 1;
+int show_movable;
+int show_ctor;
 int show_numa;
 int show_track;
 int show_first_alias;
@@ -109,11 +112,13 @@ static void fatal(const char *x, ...)
 
 static void usage(void)
 {
-	printf("slabinfo 4/15/2011. (c) 2007 sgi/(c) 2011 Linux Foundation.\n\n"
-		"slabinfo [-aADefhilnosrStTvz1LXBU] [N=K] [-dafzput] [slab-regexp]\n"
+	printf("slabinfo 4/15/2017. (c) 2007 sgi/(c) 2011 Linux Foundation/(c) 2017 Jump Trading LLC.\n\n"
+		"slabinfo [-aACDefhilMnosrStTvz1LXBU] [N=K] [-dafzput] [slab-regexp]\n"
+		"-a|--aliases           Show aliases\n"
 		"-A|--activity          Most active slabs first\n"
 		"-B|--Bytes             Show size in bytes\n"
+		"-C|--ctor              Show slabs with ctors\n"
 		"-D|--display-active    Switch line format to activity\n"
 		"-e|--empty             Show empty slabs\n"
 		"-f|--first-alias       Show first alias\n"
@@ -121,6 +126,7 @@ static void usage(void)
 		"-i|--inverted          Inverted list\n"
 		"-l|--slabs             Show slabs\n"
 		"-L|--Loss              Sort by loss\n"
+		"-M|--movable           Show caches that support movable objects\n"
 		"-n|--numa              Show NUMA information\n"
 		"-N|--lines=K           Show the first K slabs\n"
 		"-o|--ops               Show kmem_cache_ops\n"
@@ -588,6 +594,12 @@ static void slabcache(struct slabinfo *s)
 	if (show_empty && s->slabs)
 		return;
 
+	if (show_ctor && !s->ctor)
+		return;
+
+	if (show_movable && !s->movable)
+		return;
+
 	if (sort_loss == 0)
 		store_size(size_str, slab_size(s));
 	else
@@ -602,6 +614,10 @@ static void slabcache(struct slabinfo *s)
 		*p++ = '*';
 	if (s->cache_dma)
 		*p++ = 'd';
+	if (s->ctor)
+		*p++ = 'C';
+	if (s->movable)
+		*p++ = 'M';
 	if (s->hwcache_align)
 		*p++ = 'A';
 	if (s->poison)
@@ -636,7 +652,8 @@ static void slabcache(struct slabinfo *s)
 		printf("%-21s %8ld %7d %15s %14s %4d %1d %3ld %3ld %s\n",
 			s->name, s->objects, s->object_size, size_str, dist_str,
 			s->objs_per_slab, s->order,
-			s->slabs ? (s->partial * 100) / s->slabs : 100,
+			s->slabs ? (s->partial * 100) /
+					(s->slabs * s->objs_per_slab) : 100,
 			s->slabs ? (s->objects * s->object_size * 100) /
 					(s->slabs * (page_size << s->order)) : 100,
 			flags);
@@ -1256,6 +1273,13 @@ static void read_slab_dir(void)
 			slab->alloc_node_mismatch = get_obj("alloc_node_mismatch");
 			slab->deactivate_bypass = get_obj("deactivate_bypass");
 			chdir("..");
+			if (read_slab_obj(slab, "ops")) {
+				if (strstr(buffer, "ctor :"))
+					slab->ctor = 1;
+				if (strstr(buffer, "migrate :"))
+					slab->movable = 1;
+			}
+
 			if (slab->name[0] == ':')
 				alias_targets++;
 			slab++;
@@ -1332,6 +1356,8 @@ static void xtotals(void)
 }
 
 struct option opts[] = {
+	{ "ctor", no_argument, NULL, 'C' },
+	{ "movable", no_argument, NULL, 'M' },
	{ "aliases", no_argument, NULL, 'a' },
 	{ "activity", no_argument, NULL, 'A' },
 	{ "debug", optional_argument, NULL, 'd' },
@@ -1367,7 +1393,7 @@ int main(int argc, char *argv[])
 
 	page_size = getpagesize();
 
-	while ((c = getopt_long(argc,
[PATCH 03/15] slub: Sort slab cache list
It is advantageous to have all defragmentable slabs together at the
beginning of the list of slabs so that there is no need to scan the
complete list. Put defragmentable caches first when adding a slab cache
and others last.

Signed-off-by: Tobin C. Harding
---
 mm/slab_common.c | 2 +-
 mm/slub.c        | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 58251ba63e4a..db5e9a0b1535 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -393,7 +393,7 @@ static struct kmem_cache *create_cache(const char *name,
 		goto out_free_cache;
 
 	s->refcount = 1;
-	list_add(&s->list, &slab_caches);
+	list_add_tail(&s->list, &slab_caches);
 	memcg_link_cache(s);
 out:
 	if (err)
diff --git a/mm/slub.c b/mm/slub.c
index 1c380a2bc78a..66d474397c0f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4333,6 +4333,8 @@ void kmem_cache_setup_mobility(struct kmem_cache *s,
 		return;
 	}
 
+	mutex_lock(&slab_mutex);
+
 	s->isolate = isolate;
 	s->migrate = migrate;
 
@@ -4341,6 +4343,10 @@ void kmem_cache_setup_mobility(struct kmem_cache *s,
 	 * to disable fast cmpxchg based processing.
 	 */
 	s->flags &= ~__CMPXCHG_DOUBLE;
+
+	list_move(&s->list, &slab_caches);	/* Move to top */
+
+	mutex_unlock(&slab_mutex);
 }
 EXPORT_SYMBOL(kmem_cache_setup_mobility);
-- 
2.21.0
[PATCH 04/15] slub: Slab defrag core
Internal fragmentation can occur within pages used by the slub
allocator. Under some workloads large numbers of pages can be used by
partial slab pages. This under-utilisation is bad simply because it
wastes memory but also because if the system is under memory pressure
higher order allocations may become difficult to satisfy. If we can
defrag slab caches we can alleviate these problems.

Implement Slab Movable Objects in order to defragment slab caches.

Slab defragmentation may occur:

1. Unconditionally when __kmem_cache_shrink() is called on a slab cache
   by the kernel calling kmem_cache_shrink().

2. Unconditionally through the use of the slabinfo command.

	slabinfo -s

3. Conditionally via the use of kmem_cache_defrag()

- Use Slab Movable Objects when shrinking cache.

Currently when the kernel calls kmem_cache_shrink() we curate the
partial slabs list. If object migration is not enabled for the cache we
still do this, if however, SMO is enabled we attempt to move objects in
partially full slabs in order to defragment the cache. Shrink attempts
to move all objects in order to reduce the cache to a single partial
slab for each node.

- Add conditional per node defrag via new function:

	kmem_defrag_slabs(int node)

kmem_defrag_slabs() attempts to defragment all slab caches for
node. Defragmentation is done conditionally dependent on MAX_PARTIAL
_and_ defrag_used_ratio. Caches are only considered for defragmentation
if the number of partial slabs exceeds MAX_PARTIAL (per node). Also,
defragmentation only occurs if the usage ratio of the slab is lower
than the configured percentage (sysfs field added in this patch).
Fragmentation ratios are measured by calculating the percentage of
objects in use compared to the total number of objects that the slab
page can accommodate.

The scanning of slab caches is optimized because the defragmentable
slabs come first on the list. Thus we can terminate scans on the first
slab encountered that does not support defragmentation.

kmem_defrag_slabs() takes a node parameter. This can either be -1 if
defragmentation should be performed on all nodes, or a node number.

Defragmentation may be disabled by setting defrag ratio to 0

	echo 0 > /sys/kernel/slab//defrag_used_ratio

- Add a defrag ratio sysfs field and set it to 30% by default. A limit
  of 30% specifies that more than 3 out of 10 available slots for
  objects need to be in use otherwise slab defragmentation will be
  attempted on the remaining objects.

In order for a cache to be defragmentable the cache must support object
migration (SMO). Enabling SMO for a cache is done via a call to the
recently added function:

	void kmem_cache_setup_mobility(struct kmem_cache *,
				       kmem_cache_isolate_func,
				       kmem_cache_migrate_func);

Signed-off-by: Tobin C. Harding
---
 Documentation/ABI/testing/sysfs-kernel-slab |  14 +
 include/linux/slab.h                        |   1 +
 include/linux/slub_def.h                    |   7 +
 mm/slub.c                                   | 385
 4 files changed, 334 insertions(+), 73 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-slab b/Documentation/ABI/testing/sysfs-kernel-slab
index 29601d93a1c2..8bd893968e4f 100644
--- a/Documentation/ABI/testing/sysfs-kernel-slab
+++ b/Documentation/ABI/testing/sysfs-kernel-slab
@@ -180,6 +180,20 @@ Description:
 		list.  It can be written to clear the current count.
 		Available when CONFIG_SLUB_STATS is enabled.
 
+What:		/sys/kernel/slab/cache/defrag_used_ratio
+Date:		June 2019
+KernelVersion:	5.2
+Contact:	Christoph Lameter
+		Pekka Enberg,
+Description:
+		The defrag_used_ratio file allows the control of how aggressive
+		slab fragmentation reduction works at reclaiming objects from
+		sparsely populated slabs. This is a percentage. If a slab has
+		less than this percentage of objects allocated then reclaim will
+		attempt to reclaim objects so that the whole slab page can be
+		freed. 0% specifies no reclaim attempt (defrag disabled), 100%
+		specifies attempt to reclaim all pages. The default is 30%.
+
 What:		/sys/kernel/slab/cache/deactivate_to_tail
 Date:		February 2008
 KernelVersion:	2.6.25
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 886fc130334d..4bf381b34829 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -149,6 +149,7 @@ struct kmem_cache *kmem_cache_create_usercopy(const char *name,
 			void (*ctor)(void *));
 void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
+unsigned long kmem_defrag_slabs(int node);
 void memcg_create_kmem_cache(struct mem_cgroup *, struct kmem_cache *);
 void
[PATCH 00/15] Slab Movable Objects (SMO)
Hi,

TL;DR - Add object migration (SMO) to the SLUB allocator and implement
object migration for the XArray and the dcache.

Thanks for your patience with all the RFC's of this patch set. Here it
is, ready for prime time.

Internal fragmentation can occur within pages used by the slub
allocator. Under some workloads large numbers of pages can be used by
partial slab pages. This under-utilisation is bad simply because it
wastes memory but also because if the system is under memory pressure
higher order allocations may become difficult to satisfy. If we can
defrag slab caches we can alleviate these problems.

In order to be able to defrag slab caches we need to be able to migrate
objects to a new slab. Slab object migration is the core functionality
added by this patch series.

Internal slab fragmentation is a long known problem. This series does
not claim to completely _fix_ the issue. Instead we are adding core code
to the SLUB allocator to enable users of the allocator to help mitigate
internal fragmentation. Object migration is on a per cache basis, with
each cache being able to take advantage of object migration to varying
degrees depending on the nature of the objects stored in the cache.

Series includes test modules and test code that can be used to verify
the claimed behaviour.

Patch #1 - Adds the callbacks used to enable SMO for a particular cache.
Patch #2 - Updates the slabinfo tool to show operations related to SMO.
Patch #3 - Sorts the cache list putting migratable slabs at front.
Patch #4 - Adds the SMO infrastructure. This is the core patch of the
           series.
Patch #5, #6 - Further update slabinfo tool for information just added.
Patch #7 - Add a module for testing SMO.
Patch #8 - Add unit test suite in Python utilising test module from #7.
Patch #9 - Add a new slab cache for the XArray (separate from radix
           tree).
Patch #10 - Implement SMO for the XArray.
Patch #11 - Add module for testing XArray SMO implementation.
Patch #12 - Add a dentry constructor.
Patch #13 - Use SMO to attempt to reduce fragmentation of the dcache by
            selectively freeing dentry objects.
Patch #14 - Add functionality to move slab objects to a specific NUMA
            node.
Patch #15 - Add functionality to balance slab objects across all NUMA
            nodes.

The last RFC (RFCv5 and discussion on it) included code to conditionally
exclude SMO for the dcache. This has been removed. IMO it is now not
needed. Al sufficiently bollock'ed me during development that I believe
the dentry code is good and does not negatively affect the dcache. If
someone would like to prove me wrong simply remove the call to

	kmem_cache_setup_mobility(dentry_cache, d_isolate, d_partial_shrink);

Testing: The series has been tested to verify that objects are moved
using bare metal (core i5) and also Qemu. This has not been tested on
big metal or on NUMA hardware. I have no measurements on performance
gains achievable with this set, I have just verified that the migration
works and does not appear to break anything.

Patch #14 and #15 depend on CONFIG_SLUB_DEBUG_ON or boot with
'slub_debug'.

Thanks for taking the time to look at this.

Tobin


Tobin C. Harding (15):
  slub: Add isolate() and migrate() methods
  tools/vm/slabinfo: Add support for -C and -M options
  slub: Sort slab cache list
  slub: Slab defrag core
  tools/vm/slabinfo: Add remote node defrag ratio output
  tools/vm/slabinfo: Add defrag_used_ratio output
  tools/testing/slab: Add object migration test module
  tools/testing/slab: Add object migration test suite
  lib: Separate radix_tree_node and xa_node slab cache
  xarray: Implement migration function for xa_node objects
  tools/testing/slab: Add XArray movable objects tests
  dcache: Provide a dentry constructor
  dcache: Implement partial shrink via Slab Movable Objects
  slub: Enable moving objects to/from specific nodes
  slub: Enable balancing slabs across nodes

 Documentation/ABI/testing/sysfs-kernel-slab |   14 +
 fs/dcache.c                                 |  105 ++-
 include/linux/slab.h                        |   71 ++
 include/linux/slub_def.h                    |   10 +
 include/linux/xarray.h                      |    3 +
 init/main.c                                 |    2 +
 lib/radix-tree.c                            |    2 +-
 lib/xarray.c                                |  109 ++-
 mm/Kconfig                                  |    7 +
 mm/slab_common.c                            |    2 +-
 mm/slub.c                                   |  827 ++--
 tools/testing/slab/Makefile                 |   10 +
 tools/testing/slab/slub_defrag.c            |  567 ++
 tools/testing/slab/slub_defrag.py           |  451 +++
 tools/testing/slab/slub_defrag_xarray.c     |  211 +
 tools/vm/slabinfo.c                         |   51 +-
 16 files changed, 2339 insertions(+), 103 deletions(-)

 create mode 100644 tools/testing/slab/Makefile
 create mode 100644
Re: [RFC PATCH v5 16/16] dcache: Add CONFIG_DCACHE_SMO
On Wed, May 29, 2019 at 04:16:51PM +, Roman Gushchin wrote: > On Wed, May 29, 2019 at 01:54:06PM +1000, Tobin C. Harding wrote: > > On Tue, May 21, 2019 at 02:05:38AM +, Roman Gushchin wrote: > > > On Tue, May 21, 2019 at 11:31:18AM +1000, Tobin C. Harding wrote: > > > > On Tue, May 21, 2019 at 12:57:47AM +, Roman Gushchin wrote: > > > > > On Mon, May 20, 2019 at 03:40:17PM +1000, Tobin C. Harding wrote: > > > > > > In an attempt to make the SMO patchset as non-invasive as possible > > > > > > add a > > > > > > config option CONFIG_DCACHE_SMO (under "Memory Management options") > > > > > > for > > > > > > enabling SMO for the DCACHE. Whithout this option dcache > > > > > > constructor is > > > > > > used but no other code is built in, with this option enabled slab > > > > > > mobility is enabled and the isolate/migrate functions are built in. > > > > > > > > > > > > Add CONFIG_DCACHE_SMO to guard the partial shrinking of the dcache > > > > > > via > > > > > > Slab Movable Objects infrastructure. > > > > > > > > > > Hm, isn't it better to make it a static branch? Or basically anything > > > > > that allows switching on the fly? > > > > > > > > If that is wanted, turning SMO on and off per cache, we can probably do > > > > this in the SMO code in SLUB. > > > > > > Not necessarily per cache, but without recompiling the kernel. > > > > > > > > > It seems that the cost of just building it in shouldn't be that high. > > > > > And the question if the defragmentation worth the trouble is so much > > > > > easier to answer if it's possible to turn it on and off without > > > > > rebooting. > > > > > > > > If the question is 'is defragmentation worth the trouble for the > > > > dcache', I'm not sure having SMO turned off helps answer that question. > > > > If one doesn't shrink the dentry cache there should be very little > > > > overhead in having SMO enabled. So if one wants to explore this > > > > question then they can turn on the config option. 
Please correct me if > > > > I'm wrong. > > > > > > The problem with a config option is that it's hard to switch over. > > > > > > So just to test your changes in production a new kernel should be built, > > > tested and rolled out to a representative set of machines (which can be > > > measured in thousands of machines). Then if results are questionable, > > > it should be rolled back. > > > > > > What you're actually guarding is the kmem_cache_setup_mobility() call, > > > which can be perfectly avoided using a boot option, for example. Turning > > > it on and off completely dynamic isn't that hard too. > > > > Hi Roman, > > > > I've added a boot parameter to SLUB so that admins can enable/disable > > SMO at boot time system wide. Then for each object that implements SMO > > (currently XArray and dcache) I've also added a boot parameter to > > enable/disable SMO for that cache specifically (these depend on SMO > > being enabled system wide). > > > > All three boot parameters default to 'off', I've added a config option > > to default each to 'on'. > > > > I've got a little more testing to do on another part of the set then the > > PATCH version is coming at you :) > > > > This is more a courtesy email than a request for comment, but please > > feel free to shout if you don't like the method outlined above. > > > > Fully dynamic config is not currently possible because currently the SMO > > implementation does not support disabling mobility for a cache once it > > is turned on, a bit of extra logic would need to be added and some state > > stored - I'm not sure it warrants it ATM but that can be easily added > > later if wanted. Maybe Christoph will give his opinion on this. > > Perfect! Hi Roman, I'm about to post PATCH series. I have removed all the boot time config options in contrast to what I stated in this thread. I feel it requires some comment so as not to seem rude to you. 
Please feel free to re-raise these issues on the series if you feel it
is a better place to do it than on this thread.

I still hear you re making testing easier if there are boot parameters.
I don't have extensive experience testing on a large number of machines
so I have no basis to contradict what you said.

It was suggested to me that having switches to turn SMO off implies the
series is not ready. I am claiming that SMO _is_ ready and also that it
has no negative effects (especially on the dcache). I therefore think
this comment is pertinent.

So ... I re-did the boot parameters defaulting to 'on'. However I could
then see no reason (outside of testing) to turn them off. It seems ugly
to have code that is only required during testing and never after.
Please correct me if I'm wrong.

Finally I decided that since adding a boot parameter is trivial that
hackers could easily add one to test if they wanted to test a specific
cache. Otherwise we just test 'patched kernel' vs 'unpatched kernel'.
Again, please correct me if I'm wrong.

So, that
Re: [PATCH] PCI: endpoint: Add DMA to Linux PCI EP Framework
Hi Vinod, On 31/05/19 12:02 PM, Vinod Koul wrote: > On 31-05-19, 10:50, Kishon Vijay Abraham I wrote: >> Hi Vinod, >> >> On 31/05/19 10:37 AM, Vinod Koul wrote: >>> Hi Kishon, >>> >>> On 30-05-19, 11:16, Kishon Vijay Abraham I wrote: +Vinod Koul Hi, On 30/05/19 4:07 AM, Alan Mikhak wrote: > On Mon, May 27, 2019 at 2:09 AM Gustavo Pimentel > wrote: >> >> On Fri, May 24, 2019 at 20:42:43, Alan Mikhak >> wrote: >> >> Hi Alan, >> >>> On Fri, May 24, 2019 at 1:59 AM Gustavo Pimentel >>> wrote: Hi Alan, This patch implementation is very HW implementation dependent and requires the DMA to exposed through PCIe BARs, which aren't always the case. Besides, you are defining some control bits on include/linux/pci-epc.h that may not have any meaning to other types of DMA. I don't think this was what Kishon had in mind when he developed the pcitest, but let see what Kishon was to say about it. I've developed a DMA driver for DWC PCI using Linux Kernel DMAengine API and which I submitted some days ago. By having a DMA driver which implemented using DMAengine API, means the pcitest can use the DMAengine client API, which will be completely generic to any other DMA implementation. right, my initial thought process was to use only dmaengine APIs in pci-epf-test so that the system DMA or DMA within the PCIe controller can be used transparently. But can we register DMA within the PCIe controller to the DMA subsystem? AFAIK only system DMA should register with the DMA subsystem. (ADMA in SDHCI doesn't use dmaengine). Vinod Koul can confirm. >>> >>> So would this DMA be dedicated for PCI and all PCI devices on the bus? >> >> Yes, this DMA will be used only by PCI ($patch is w.r.t PCIe device mode. So >> all endpoint functions both physical and virtual functions will use the DMA >> in >> the controller). >>> If so I do not see a reason why this cannot be using dmaengine. The use >> >> Thanks for clarifying. 
>> I was under the impression any DMA within a peripheral
>> controller shouldn't use DMAengine.
>
> That is indeed a correct assumption. The dmaengine helps in cases where
> we have a dma controller with multiple users, for a single user case it
> might be overhead to setup dma driver and then use it thru framework.
>
> Someone needs to see the benefit and cost of using the framework and
> decide.

The DMA within the endpoint controller can indeed be used by multiple
users, e.g. in the case of multi function EP devices or SR-IOV devices,
all the function drivers can use the DMA in the endpoint controller. I
think it makes sense to use dmaengine for DMA within the endpoint
controller.

>
>>> case would be memcpy for DMA right or mem to device (vice versa) transfers?
>>
>> The device is memory mapped so it would be only memcopy.
>>>
>>> Btw many driver in sdhci do use dmaengine APIs and yes we are missing
>>> support in framework than individual drivers
>>
>> I think dmaengine APIs is used only when the platform uses system DMA and not
>> ADMA within the SDHCI controller. IOW there is no dma_async_device_register()
>> to register ADMA in SDHCI with DMA subsystem.
>
> We are looking it from the different point of view. You are looking for
> dmaengine drivers in that (which would be in drivers/dma/) and I am
> pointing to users of dmaengine in that.
>
> So the users in mmc would be ones using dmaengine APIs:
> $git grep -l dmaengine_prep_* drivers/mmc/
>
> which tells me 17 drivers!

right. For the endpoint case, drivers/pci/controller should register
with the dmaengine, i.e. if the controller has an embedded DMA (I think
it should be okay to keep that in drivers/pci/controller itself instead
of drivers/dma), and drivers/pci/endpoint/functions/ should use
dmaengine APIs (depending on the platform, this will either use system
DMA or DMA within the PCI controller).

Thanks
Kishon
Re: [PATCH] sched/core: add __sched tag for io_schedule()
On 2019/5/31 22:37, Tejun Heo wrote:
> On Fri, May 31, 2019 at 04:29:12PM +0800, Gao Xiang wrote:
>> non-inline io_schedule() was introduced in
>> commit 10ab56434f2f ("sched/core: Separate out io_schedule_prepare() and
>> io_schedule_finish()")
>>
>> Keep in line with io_schedule_timeout(), otherwise
>> "/proc//wchan" will report io_schedule()
>> rather than its callers when waiting on I/O.
>>
>> Reported-by: Jilong Kou
>> Cc: Tejun Heo
>> Cc: Ingo Molnar
>> Cc: Peter Zijlstra
>> Signed-off-by: Gao Xiang
>
> Acked-by: Tejun Heo

Cc: # 4.11+

Thanks Tejun.

This patch will be needed for I/O performance analysis since we found
that the Android systrace tool cannot show the callers of iowait raised
from io_schedule() on the linux-4.14 LTS kernel.

Hi Andrew, could you kindly take this patch?

Thanks,
Gao Xiang

>
> Thanks.
>
Re: [PATCH -next v2] mm/hotplug: fix a null-ptr-deref during NUMA boot
On Fri, May 31, 2019 at 5:03 PM Michal Hocko wrote: > > On Thu 30-05-19 20:55:32, Pingfan Liu wrote: > > On Wed, May 29, 2019 at 2:20 AM Michal Hocko wrote: > > > > > > [Sorry for a late reply] > > > > > > On Thu 23-05-19 11:58:45, Pingfan Liu wrote: > > > > On Wed, May 22, 2019 at 7:16 PM Michal Hocko wrote: > > > > > > > > > > On Wed 22-05-19 15:12:16, Pingfan Liu wrote: > > > [...] > > > > > > But in fact, we already have for_each_node_state(nid, N_MEMORY) to > > > > > > cover this purpose. > > > > > > > > > > I do not really think we want to spread N_MEMORY outside of the core > > > > > MM. > > > > > It is quite confusing IMHO. > > > > > . > > > > But it has already like this. Just git grep N_MEMORY. > > > > > > I might be wrong but I suspect a closer review would reveal that the use > > > will be inconsistent or dubious so following the existing users is not > > > the best approach. > > > > > > > > > Furthermore, changing the definition of online may > > > > > > break something in the scheduler, e.g. in task_numa_migrate(), where > > > > > > it calls for_each_online_node. > > > > > > > > > > Could you be more specific please? Why should numa balancing consider > > > > > nodes without any memory? > > > > > > > > > As my understanding, the destination cpu can be on a memory less node. > > > > BTW, there are several functions in the scheduler facing the same > > > > scenario, task_numa_migrate() is an example. > > > > > > Even if the destination node is memoryless then any migration would fail > > > because there is no memory. Anyway I still do not see how using online > > > node would break anything. > > > > > Suppose we have nodes A, B,C, where C is memory less but has little > > distance to B, comparing with the one from A to B. Then if a task is > > running on A, but prefer to run on B due to memory footprint. > > task_numa_migrate() allows us to migrate the task to node C. Changing > > for_each_online_node will break this. 
> That would require the task to have preferred node to be C no? Or do I
> missunderstand the task migration logic?

I think in task_numa_migrate(), the migration logic should look like:

	env.dst_nid = p->numa_preferred_nid;	// Here dst nid is B

But later in

	if (env.best_cpu == -1 || (p->numa_group &&
	    p->numa_group->active_nodes > 1)) {
		for_each_online_node(nid) {
			[...]
			task_numa_find_cpu(, taskimp, groupimp);
			// Here is a chance to change p->numa_preferred_nid

There are several other places broken by changing for_each_online_node():
 -1. show_numa_stats()
 -2. init_numa_topology_type(), where sched_numa_topology_type may be
     mistakenly evaluated.
 -3. ... one can check calls to for_each_online_node() one by one in the
     scheduler.

That is my understanding of the code.

Thanks,
Pingfan
Re: rcu_read_lock lost its compiler barrier
On Mon, Jun 03, 2019 at 12:01:14PM +0800, Herbert Xu wrote:
> On Sun, Jun 02, 2019 at 08:47:07PM -0700, Paul E. McKenney wrote:
> >
> > CPU2: if (b != 1)
> > CPU2:	b = 1;
>
> Stop right there. The kernel is full of code that assumes that
> assignment to an int/long is atomic. If your compiler breaks this
> assumption that we can kiss the kernel good-bye.

The slippery slope apparently started here:

: commit ea435467500612636f8f4fb639ff6e76b2496e4b
: Author: Matthew Wilcox
: Date:   Tue Jan 6 14:40:39 2009 -0800
:
:     atomic_t: unify all arch definitions
:
: diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
: index ad5b9f6ecddf..85b46fba4229 100644
: --- a/arch/x86/include/asm/atomic_32.h
: +++ b/arch/x86/include/asm/atomic_32.h
: ...
: @@ -10,15 +11,6 @@
:   * resource counting etc..
:   */
:
: -/*
: - * Make sure gcc doesn't try to be clever and move things around
: - * on us. We need to use _exactly_ the address the user gave us,
: - * not some alias that contains the same information.
: - */
: -typedef struct {
: -	int counter;
: -} atomic_t;
:
: diff --git a/include/linux/types.h b/include/linux/types.h
: index 121f349cb7ec..3b864f2d9560 100644
: --- a/include/linux/types.h
: +++ b/include/linux/types.h
: @@ -195,6 +195,16 @@ typedef u32 phys_addr_t;
:
:  typedef phys_addr_t resource_size_t;
:
: +typedef struct {
: +	volatile int counter;
: +} atomic_t;
: +

Before evolving into the READ_ONCE/WRITE_ONCE that we have now.

Linus, are we now really supporting a compiler where an assignment (or a
read) from an int/long/pointer can be non-atomic without the volatile
marker? Because if that's the case then we have a lot of code to audit.

Cheers,
-- 
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] mm/gup: fix omission of check on FOLL_LONGTERM in get_user_pages_fast()
On Sat, Jun 1, 2019 at 1:06 AM John Hubbard wrote: > > On 5/31/19 4:05 AM, Pingfan Liu wrote: > > On Fri, May 31, 2019 at 7:21 AM John Hubbard wrote: > >> On 5/30/19 2:47 PM, Ira Weiny wrote: > >>> On Thu, May 30, 2019 at 06:54:04AM +0800, Pingfan Liu wrote: > >> [...] > >> Rather lightly tested...I've compile-tested with CONFIG_CMA and > >> !CONFIG_CMA, > >> and boot tested with CONFIG_CMA, but could use a second set of eyes on > >> whether > >> I've added any off-by-one errors, or worse. :) > >> > > Do you mind I send V2 based on your above patch? Anyway, it is a simple bug > > fix. > > > > Sure, that's why I sent it. :) Note that Ira also recommended splitting the > "nr --> nr_pinned" renaming into a separate patch. > Thanks for your kind help. I will split out nr_pinned to a separate patch. Regards, Pingfan
Re: [PATCH] mm/gup: fix omission of check on FOLL_LONGTERM in get_user_pages_fast()
On Sat, Jun 1, 2019 at 1:12 AM Ira Weiny wrote: > > On Fri, May 31, 2019 at 07:05:27PM +0800, Pingfan Liu wrote: > > On Fri, May 31, 2019 at 7:21 AM John Hubbard wrote: > > > > > > > > > Rather lightly tested...I've compile-tested with CONFIG_CMA and > > > !CONFIG_CMA, > > > and boot tested with CONFIG_CMA, but could use a second set of eyes on > > > whether > > > I've added any off-by-one errors, or worse. :) > > > > > Do you mind I send V2 based on your above patch? Anyway, it is a simple bug > > fix. > > FWIW please split out the nr_pinned change to a separate patch. > OK. Thanks, Pingfan
Re: rcu_read_lock lost its compiler barrier
On Sun, Jun 02, 2019 at 08:47:07PM -0700, Paul E. McKenney wrote:
>
> CPU2: if (b != 1)
> CPU2:	b = 1;

Stop right there. The kernel is full of code that assumes that
assignment to an int/long is atomic. If your compiler breaks this
assumption that we can kiss the kernel good-bye.

Cheers,
-- 
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH 1/2] dt-bindings: i3c: Document MediaTek I3C master bindings
Document MediaTek I3C master DT bindings.

Signed-off-by: Qii Wang
---
 .../devicetree/bindings/i3c/mtk,i3c-master.txt | 50
 1 file changed, 50 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/i3c/mtk,i3c-master.txt

diff --git a/Documentation/devicetree/bindings/i3c/mtk,i3c-master.txt b/Documentation/devicetree/bindings/i3c/mtk,i3c-master.txt
new file mode 100644
index 000..89ec380
--- /dev/null
+++ b/Documentation/devicetree/bindings/i3c/mtk,i3c-master.txt
@@ -0,0 +1,50 @@
+Bindings for MediaTek I3C master block
+======================================
+
+Required properties:
+
+- compatible: shall be "mediatek,i3c-master"
+- reg: physical base address of the controller and apdma base, length of
+  memory mapped region.
+- reg-names: should be "main" for controller and "dma" for apdma.
+- interrupts: interrupt number to the cpu.
+- clock-div: the fixed value for frequency divider of clock source in i3c
+  module. Each IC may be different.
+- clocks: clock name from clock manager.
+- clock-names: must include "main" and "dma".
+
+Mandatory properties defined by the generic binding (see
+Documentation/devicetree/bindings/i3c/i3c.txt for more details):
+
+- #address-cells: shall be set to 3
+- #size-cells: shall be set to 0
+
+Optional properties defined by the generic binding (see
+Documentation/devicetree/bindings/i3c/i3c.txt for more details):
+
+- i2c-scl-hz
+- i3c-scl-hz
+
+I3C device connected on the bus follow the generic description (see
+Documentation/devicetree/bindings/i3c/i3c.txt for more details).
+
+Example:
+
+	i3c0: i3c@1100d000 {
+		compatible = "mediatek,i3c-master";
+		reg = <0x1100d000 0x100>,
+		      <0x11000300 0x80>;
+		reg-names = "main", "dma";
+		interrupts = ;
+		clock-div = <16>;
+		clocks = <_ck>, <_dma_ck>;
+		clock-names = "main", "dma";
+		#address-cells = <1>;
+		#size-cells = <0>;
+		i2c-scl-hz = <10>;
+
+		nunchuk: nunchuk@52 {
+			compatible = "nintendo,nunchuk";
+			reg = <0x52 0x8010 0>;
+		};
+	};
-- 
1.7.9.5
[PATCH 2/2] i3c: master: Add driver for MediaTek IP
Add a driver for MediaTek I3C master IP. Signed-off-by: Qii Wang --- drivers/i3c/master/Kconfig | 10 + drivers/i3c/master/Makefile |1 + drivers/i3c/master/i3c-master-mtk.c | 1246 +++ 3 files changed, 1257 insertions(+) create mode 100644 drivers/i3c/master/i3c-master-mtk.c diff --git a/drivers/i3c/master/Kconfig b/drivers/i3c/master/Kconfig index 26c6b58..acc00d9 100644 --- a/drivers/i3c/master/Kconfig +++ b/drivers/i3c/master/Kconfig @@ -20,3 +20,13 @@ config DW_I3C_MASTER This driver can also be built as a module. If so, the module will be called dw-i3c-master. + +config MTK_I3C_MASTER + tristate "MediaTek I3C master driver" + depends on I3C + depends on HAS_IOMEM + depends on !(ALPHA || PARISC) + help + This selects the MediaTek(R) I3C master controller driver. + If you want to use MediaTek(R) I3C interface, say Y here. + If unsure, say N or M. diff --git a/drivers/i3c/master/Makefile b/drivers/i3c/master/Makefile index fc53939..fe7ccf5 100644 --- a/drivers/i3c/master/Makefile +++ b/drivers/i3c/master/Makefile @@ -1,2 +1,3 @@ obj-$(CONFIG_CDNS_I3C_MASTER) += i3c-master-cdns.o obj-$(CONFIG_DW_I3C_MASTER)+= dw-i3c-master.o +obj-$(CONFIG_MTK_I3C_MASTER) += i3c-master-mtk.o diff --git a/drivers/i3c/master/i3c-master-mtk.c b/drivers/i3c/master/i3c-master-mtk.c new file mode 100644 index 000..a209bb6 --- /dev/null +++ b/drivers/i3c/master/i3c-master-mtk.c @@ -0,0 +1,1246 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2019 MediaTek Design Systems Inc. 
+ * + * Author: Qii Wang + */ + +#include +#include +#include +#include +#include +#include +#include + +#define DRV_NAME "i3c-master-mtk" + +#define SLAVE_ADDR 0x04 +#define INTR_MASK 0x08 +#define INTR_STAT 0x0c +#define INTR_TRANSAC_COMP BIT(0) +#define INTR_ACKERRGENMASK(2, 1) +#define INTR_ARB_LOST BIT(3) +#define INTR_RS_MULTI BIT(4) +#define INTR_MAS_ERR BIT(8) +#define INTR_ALL (INTR_MAS_ERR | INTR_ARB_LOST |\ + INTR_ACKERR | INTR_TRANSAC_COMP) + +#define CONTROL0x10 +#define CONTROL_WRAPPERBIT(0) +#define CONTROL_RS BIT(1) +#define CONTROL_DMA_EN BIT(2) +#define CONTROL_CLK_EXT_EN BIT(3) +#define CONTROL_DIR_CHANGE BIT(4) +#define CONTROL_ACKERR_DET_EN BIT(5) +#define CONTROL_LEN_CHANGE BIT(6) +#define CONTROL_DMAACK_EN BIT(8) +#define CONTROL_ASYNC_MODE BIT(9) + +#define TRANSFER_LEN 0x14 +#define TRANSAC_LEN0x18 +#define TRANSAC_LEN_WRRD 0x0002 +#define TRANS_ONE_LEN 0x0001 + +#define DELAY_LEN 0x1c +#define DELAY_LEN_DEFAULT 0x000a + +#define TIMING 0x20 +#define TIMING_VALUE(sample_cnt, step_cnt) ({ \ + typeof(sample_cnt) sample_cnt_ = (sample_cnt); \ + typeof(step_cnt) step_cnt_ = (step_cnt); \ + (((sample_cnt_) << 8) | (step_cnt_)); \ +}) + +#define START 0x24 +#define START_EN BIT(0) +#define START_MUL_TRIG BIT(14) +#define START_MUL_CNFG BIT(15) + +#define EXT_CONF 0x28 +#define EXT_CONF_DEFAULT 0x0a1f + +#define LTIMING0x2c +#define LTIMING_VALUE(sample_cnt, step_cnt) ({ \ + typeof(sample_cnt) sample_cnt_ = (sample_cnt); \ + typeof(step_cnt) step_cnt_ = (step_cnt); \ + (((sample_cnt_) << 6) | (step_cnt_) | \ + ((sample_cnt_) << 12) | ((step_cnt_) << 9)); \ +}) + +#define HS 0x30 +#define HS_CLR_VALUE 0x +#define HS_DEFAULT_VALUE 0x0083 +#define HS_VALUE(sample_cnt, step_cnt) ({ \ + typeof(sample_cnt) sample_cnt_ = (sample_cnt); \ + typeof(step_cnt) step_cnt_ = (step_cnt); \ + (HS_DEFAULT_VALUE | \ + ((sample_cnt_) << 12) | ((step_cnt_) << 8)); \ +}) + +#define IO_CONFIG 0x34 +#define IO_CONFIG_PUSH_PULL0x + +#define FIFO_ADDR_CLR 0x38 +#define 
FIFO_CLR 0x0003 + +#define MCU_INTR 0x40 +#define MCU_INTR_EN BIT(0) + +#define TRANSFER_LEN_AUX 0x44 +#define CLOCK_DIV 0x48 +#define CLOCK_DIV_DEFAULT ((INTER_CLK_DIV - 1) << 8 |\ + (INTER_CLK_DIV - 1)) + +#define SOFTRESET 0x50 +#define SOFT_RST BIT(0) + +#define TRAFFIC 0x54 +#define TRAFFIC_DAA_EN BIT(4) +#define TRAFFIC_TBIT BIT(7) +#define TRAFFIC_HEAD_ONLY BIT(9) +#define TRAFFIC_SKIP_SLV_ADDR BIT(10) +#define TRAFFIC_HANDOFF BIT(14) + +#define DEF_DA 0x68 +#define DEF_DAA_SLV_PARITY BIT(8) + +#define SHAPE 0x6c +#define SHAPE_TBIT_STALL BIT(1) + +#define HFIFO_DATA
[PATCH 0/2] Add MediaTek I3C master controller driver
This series is based on 5.2-rc1; we provide two patches to support the MediaTek I3C master controller. Qii Wang (2): dt-bindings: i3c: Document MediaTek I3C master bindings i3c: master: Add driver for MediaTek IP .../devicetree/bindings/i3c/mtk,i3c-master.txt | 50 + drivers/i3c/master/Kconfig | 10 + drivers/i3c/master/Makefile | 1 + drivers/i3c/master/i3c-master-mtk.c | 1246 4 files changed, 1307 insertions(+) create mode 100644 Documentation/devicetree/bindings/i3c/mtk,i3c-master.txt create mode 100644 drivers/i3c/master/i3c-master-mtk.c -- 1.7.9.5
Re: rcu_read_lock lost its compiler barrier
On Mon, Jun 03, 2019 at 10:46:40AM +0800, Herbert Xu wrote:
> On Sun, Jun 02, 2019 at 01:54:12PM -0700, Linus Torvalds wrote:
> > On Sat, Jun 1, 2019 at 10:56 PM Herbert Xu wrote:
> > >
> > > You can't then go and decide to remove the compiler barrier! To do
> > > that you'd need to audit every single use of rcu_read_lock in the
> > > kernel to ensure that they're not depending on the compiler barrier.
> >
> > What's the possible case where it would matter when there is no preemption?
>
> The case we were discussing is from net/ipv4/inet_fragment.c from
> the net-next tree:
>
> void fqdir_exit(struct fqdir *fqdir)
> {
> ...
> 	fqdir->dead = true;
>
> 	/* call_rcu is supposed to provide memory barrier semantics,
> 	 * separating the setting of fqdir->dead with the destruction
> 	 * work. This implicit barrier is paired with inet_frag_kill().
> 	 */
>
> 	INIT_RCU_WORK(&fqdir->destroy_rwork, fqdir_rwork_fn);
> 	queue_rcu_work(system_wq, &fqdir->destroy_rwork);
> }
>
> and
>
> void inet_frag_kill(struct inet_frag_queue *fq)
> {
> ...
> 	rcu_read_lock();
> 	/* The RCU read lock provides a memory barrier
> 	 * guaranteeing that if fqdir->dead is false then
> 	 * the hash table destruction will not start until
> 	 * after we unlock. Paired with inet_frags_exit_net().
> 	 */
> 	if (!fqdir->dead) {
> 		rhashtable_remove_fast(&fqdir->rhashtable, &fq->node,
> 				       fqdir->f->rhash_params);
> ...
> 	}
> ...
> 	rcu_read_unlock();
> ...
> }
>
> I simplified this to
>
> Initial values:
>
> a = 0
> b = 0
>
> CPU1                  CPU2
>
> a = 1                 rcu_read_lock
> synchronize_rcu       if (a == 0)
> b = 2                     b = 1
>                       rcu_read_unlock
>
> On exit we want this to be true:
> b == 2
>
> Now what Paul was telling me is that unless every memory operation
> is done with READ_ONCE/WRITE_ONCE then his memory model shows that
> the exit constraint won't hold.

But please note that the plain-variable portion of the memory model is very new and likely still has a bug or two. In fact, see below.
> IOW, we need
>
> CPU1                      CPU2
>
> WRITE_ONCE(a, 1)          rcu_read_lock
> synchronize_rcu           if (READ_ONCE(a) == 0)
> WRITE_ONCE(b, 2)              WRITE_ONCE(b, 1)
>                           rcu_read_unlock
>
> Now I think this is bullshit because if we really needed these compiler
> barriers then we surely would need real memory barriers to go with
> them.

On the one hand, you have no code before your rcu_read_lock() and also no code after your rcu_read_unlock(). So in this particular example, adding compiler barriers to these guys won't help you. On the other hand, on CPU 1's write to "b", I agree with you and disagree with the model, though perhaps my partners in LKMM crime will show me the error of my ways on this point. On CPU 2's write to "b", I can see the memory model's point, but getting there requires some gymnastics on the part of both the compiler and the CPU. The WRITE_ONCE() and READ_ONCE() for "a" are the normal requirement for variables that are concurrently loaded and stored. Please note that garden-variety uses of RCU have similar requirements, namely the rcu_assign_pointer() on the one side and the rcu_dereference() on the other. Your use case allows rcu_assign_pointer() to be weakened to WRITE_ONCE() and rcu_dereference() to be weakened to READ_ONCE() (not that this last is all that much of a weakening these days).

> In fact, the sole purpose of the RCU mechanism is to provide those
> memory barriers. Quoting from
> Documentation/RCU/Design/Requirements/Requirements.html:
>
> Each CPU that has an RCU read-side critical section that
> begins before synchronize_rcu() starts is
> guaranteed to execute a full memory barrier between the time
> that the RCU read-side critical section ends and the time that
> synchronize_rcu() returns.
> Without this guarantee, a pre-existing RCU read-side critical section
> might hold a reference to the newly removed struct foo
> after the kfree() on line 14 of
> remove_gp_synchronous().
> Each CPU that has an RCU read-side critical section that ends
> after synchronize_rcu() returns is guaranteed
> to execute a full memory barrier between the time that
> synchronize_rcu() begins and the time that the RCU
> read-side critical section begins.
> Without this guarantee, a later RCU read-side critical section
> running after the kfree() on line 14 of
>
[PATCH v10 4/7] rpmsg: add rpmsg support for mt8183 SCP.
Add simple rpmsg support for the mt8183 SCP, which uses IPI / IPC directly. Signed-off-by: Pi-Hsun Shih --- Changes from v9, v8, v7: - No change. Changes from v6: - Decouple mtk_rpmsg from mtk_scp by putting all necessary information (name service IPI id, register/unregister/send functions) into a struct, and pass it to the mtk_rpmsg_create_rproc_subdev function. Changes from v5: - CONFIG_MTK_SCP now selects CONFIG_RPMSG_MTK_SCP, and the dummy implementation for mtk_rpmsg_{create,destroy}_rproc_subdev when CONFIG_RPMSG_MTK_SCP is not defined is removed. Changes from v4: - Match and fill the device tree node to the created rpmsg subdevice, so the rpmsg subdevice can utilize the properties and subnodes on device tree (This is similar to what drivers/rpmsg/qcom_smd.c does). Changes from v3: - Change from unprepare to stop, to stop the rpmsg driver before the rproc is stopped, avoiding the problem that some rpmsg would fail after the rproc is stopped. - Add missing spin_lock_init, and use destroy_ept instead of kref_put. Changes from v2: - Unregister IPI handler on unprepare. - Lock the channel list on operations. - Move SCP_IPI_NS_SERVICE to 0xFF. Changes from v1: - Do cleanup properly in mtk_rpmsg.c, which also removes the problem of short-lived work items. - Fix several issues checkpatch found. 
--- drivers/remoteproc/Kconfig | 1 + drivers/remoteproc/mtk_common.h | 2 + drivers/remoteproc/mtk_scp.c | 38 ++- drivers/remoteproc/mtk_scp_ipi.c | 1 + drivers/rpmsg/Kconfig | 9 + drivers/rpmsg/Makefile | 1 + drivers/rpmsg/mtk_rpmsg.c | 396 ++ include/linux/platform_data/mtk_scp.h | 4 +- include/linux/rpmsg/mtk_rpmsg.h | 30 ++ 9 files changed, 477 insertions(+), 5 deletions(-) create mode 100644 drivers/rpmsg/mtk_rpmsg.c create mode 100644 include/linux/rpmsg/mtk_rpmsg.h diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig index ad3a0de04d9e..82747a5b9caf 100644 --- a/drivers/remoteproc/Kconfig +++ b/drivers/remoteproc/Kconfig @@ -26,6 +26,7 @@ config IMX_REMOTEPROC config MTK_SCP tristate "Mediatek SCP support" depends on ARCH_MEDIATEK + select RPMSG_MTK_SCP help Say y here to support Mediatek's System Companion Processor (SCP) via the remote processor framework. diff --git a/drivers/remoteproc/mtk_common.h b/drivers/remoteproc/mtk_common.h index 7504ae1bc0ef..19a907810271 100644 --- a/drivers/remoteproc/mtk_common.h +++ b/drivers/remoteproc/mtk_common.h @@ -54,6 +54,8 @@ struct mtk_scp { void __iomem *cpu_addr; phys_addr_t phys_addr; size_t dram_size; + + struct rproc_subdev *rpmsg_subdev; }; /** diff --git a/drivers/remoteproc/mtk_scp.c b/drivers/remoteproc/mtk_scp.c index bebecd470b8d..0c73aba6858d 100644 --- a/drivers/remoteproc/mtk_scp.c +++ b/drivers/remoteproc/mtk_scp.c @@ -13,6 +13,7 @@ #include #include #include +#include #include "mtk_common.h" #include "remoteproc_internal.h" @@ -513,6 +514,31 @@ static int scp_map_memory_region(struct mtk_scp *scp) return 0; } +static struct mtk_rpmsg_info mtk_scp_rpmsg_info = { + .send_ipi = scp_ipi_send, + .register_ipi = scp_ipi_register, + .unregister_ipi = scp_ipi_unregister, + .ns_ipi_id = SCP_IPI_NS_SERVICE, +}; + +static void scp_add_rpmsg_subdev(struct mtk_scp *scp) +{ + scp->rpmsg_subdev = + mtk_rpmsg_create_rproc_subdev(to_platform_device(scp->dev), + &mtk_scp_rpmsg_info); + if (scp->rpmsg_subdev) 
+ rproc_add_subdev(scp->rproc, scp->rpmsg_subdev); +} + +static void scp_remove_rpmsg_subdev(struct mtk_scp *scp) +{ + if (scp->rpmsg_subdev) { + rproc_remove_subdev(scp->rproc, scp->rpmsg_subdev); + mtk_rpmsg_destroy_rproc_subdev(scp->rpmsg_subdev); + scp->rpmsg_subdev = NULL; + } +} + static int scp_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; @@ -594,22 +620,25 @@ static int scp_probe(struct platform_device *pdev) init_waitqueue_head(&scp->run.wq); init_waitqueue_head(&scp->ack_wq); + scp_add_rpmsg_subdev(scp); + ret = devm_request_threaded_irq(dev, platform_get_irq(pdev, 0), NULL, scp_irq_handler, IRQF_ONESHOT, pdev->name, scp); if (ret) { dev_err(dev, "failed to request irq\n"); - goto destroy_mutex; + goto remove_subdev; } ret = rproc_add(rproc); if (ret) - goto destroy_mutex; + goto remove_subdev; - return ret; + return 0; -destroy_mutex: +remove_subdev: + scp_remove_rpmsg_subdev(scp); mutex_destroy(&scp->lock); free_rproc: rproc_free(rproc); @@ -621,6
[PATCH v10 6/7] mfd: cros_ec: differentiate SCP from EC by feature bit.
The System Companion Processor (SCP) is a Cortex-M4 co-processor on some MediaTek platforms that can run EC-style firmware. Since an SCP and an EC would both exist on a system, and both use the cros_ec_dev driver, we need to differentiate between them for userspace, or they would both be registered at /dev/cros_ec, causing a conflict. Signed-off-by: Pi-Hsun Shih Acked-by: Enric Balletbo i Serra --- Changes from v9: - Remove changes in cros_ec_commands.h (which is sync in https://lore.kernel.org/lkml/20190518063949.GY4319@dell/T/). Changes from v8: - No change. Changes from v7: - Address comments in v7. - Rebase the series onto https://lore.kernel.org/patchwork/patch/1059196/. Changes from v6, v5, v4, v3, v2: - No change. Changes from v1: - New patch extracted from Patch 5. --- drivers/mfd/cros_ec_dev.c | 10 ++ include/linux/mfd/cros_ec.h | 1 + 2 files changed, 11 insertions(+) diff --git a/drivers/mfd/cros_ec_dev.c b/drivers/mfd/cros_ec_dev.c index a5391f96eafd..66107de3dbce 100644 --- a/drivers/mfd/cros_ec_dev.c +++ b/drivers/mfd/cros_ec_dev.c @@ -440,6 +440,16 @@ static int ec_device_probe(struct platform_device *pdev) ec_platform->ec_name = CROS_EC_DEV_TP_NAME; } + /* Check whether this is actually a SCP rather than an EC. */ + if (cros_ec_check_features(ec, EC_FEATURE_SCP)) { + dev_info(dev, "CrOS SCP MCU detected.\n"); + /* +* Help userspace differentiating ECs from SCP, +* regardless of the probing order. +*/ + ec_platform->ec_name = CROS_EC_DEV_SCP_NAME; + } + /* * Add the class device * Link to the character device for creating the /dev entry diff --git a/include/linux/mfd/cros_ec.h b/include/linux/mfd/cros_ec.h index cfa78bb4990f..751cb3756d49 100644 --- a/include/linux/mfd/cros_ec.h +++ b/include/linux/mfd/cros_ec.h @@ -27,6 +27,7 @@ #define CROS_EC_DEV_PD_NAME "cros_pd" #define CROS_EC_DEV_TP_NAME "cros_tp" #define CROS_EC_DEV_ISH_NAME "cros_ish" +#define CROS_EC_DEV_SCP_NAME "cros_scp" /* * The EC is unresponsive for a time after a reboot command. 
Add a -- 2.22.0.rc1.257.g3120a18244-goog
[PATCH v10 7/7] arm64: dts: mt8183: add scp node
From: Eddie Huang Add scp node to mt8183 and mt8183-evb Signed-off-by: Erin Lo Signed-off-by: Pi-Hsun Shih Signed-off-by: Eddie Huang --- Changes from v9: - Remove extra reserve-memory-vpu_share node. Changes from v8: - New patch. --- arch/arm64/boot/dts/mediatek/mt8183-evb.dts | 11 +++ arch/arm64/boot/dts/mediatek/mt8183.dtsi | 12 2 files changed, 23 insertions(+) diff --git a/arch/arm64/boot/dts/mediatek/mt8183-evb.dts b/arch/arm64/boot/dts/mediatek/mt8183-evb.dts index d8e555cbb5d3..e46e34ce3159 100644 --- a/arch/arm64/boot/dts/mediatek/mt8183-evb.dts +++ b/arch/arm64/boot/dts/mediatek/mt8183-evb.dts @@ -24,6 +24,17 @@ chosen { stdout-path = "serial0:921600n8"; }; + + reserved-memory { + #address-cells = <2>; + #size-cells = <2>; + ranges; + scp_mem_reserved: scp_mem_region { + compatible = "shared-dma-pool"; + reg = <0 0x5000 0 0x290>; + no-map; + }; + }; }; { diff --git a/arch/arm64/boot/dts/mediatek/mt8183.dtsi b/arch/arm64/boot/dts/mediatek/mt8183.dtsi index c2749c4631bc..133146b52904 100644 --- a/arch/arm64/boot/dts/mediatek/mt8183.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt8183.dtsi @@ -254,6 +254,18 @@ clock-names = "spi", "wrap"; }; + scp: scp@1050 { + compatible = "mediatek,mt8183-scp"; + reg = <0 0x1050 0 0x8>, + <0 0x105c 0 0x5000>; + reg-names = "sram", "cfg"; + interrupts = ; + clocks = <&infracfg CLK_INFRA_SCPSYS>; + clock-names = "main"; + memory-region = <&scp_mem_reserved>; + status = "disabled"; + }; + auxadc: auxadc@11001000 { compatible = "mediatek,mt8183-auxadc", "mediatek,mt8173-auxadc"; -- 2.22.0.rc1.257.g3120a18244-goog
[PATCH v10 3/7] remoteproc: mt8183: add reserved memory manager API
From: Erin Lo Add memory table mapping API for other driver to lookup reserved physical and virtual memory Signed-off-by: Erin Lo Signed-off-by: Pi-Hsun Shih --- Changes from v9: - No change. Changes from v8: - Add more reserved regions for camera ISP. Changes from v7, v6, v5: - No change. Changes from v4: - New patch. --- drivers/remoteproc/mtk_scp.c | 135 ++ include/linux/platform_data/mtk_scp.h | 24 + 2 files changed, 159 insertions(+) diff --git a/drivers/remoteproc/mtk_scp.c b/drivers/remoteproc/mtk_scp.c index c4d900f4fe1c..bebecd470b8d 100644 --- a/drivers/remoteproc/mtk_scp.c +++ b/drivers/remoteproc/mtk_scp.c @@ -348,6 +348,137 @@ void *scp_mapping_dm_addr(struct platform_device *pdev, u32 mem_addr) } EXPORT_SYMBOL_GPL(scp_mapping_dm_addr); +#if SCP_RESERVED_MEM +phys_addr_t scp_mem_base_phys; +phys_addr_t scp_mem_base_virt; +phys_addr_t scp_mem_size; + +static struct scp_reserve_mblock scp_reserve_mblock[] = { + { + .num = SCP_ISP_MEM_ID, + .start_phys = 0x0, + .start_virt = 0x0, + .size = 0x20, /*2MB*/ + }, + { + .num = SCP_ISP_MEM2_ID, + .start_phys = 0x0, + .start_virt = 0x0, + .size = 0x80, /*8MB*/ + }, + { + .num = SCP_DIP_MEM_ID, + .start_phys = 0x0, + .start_virt = 0x0, + .size = 0x90, /*9MB*/ + }, + { + .num = SCP_MDP_MEM_ID, + .start_phys = 0x0, + .start_virt = 0x0, + .size = 0x60, /*6MB*/ + }, + { + .num = SCP_FD_MEM_ID, + .start_phys = 0x0, + .start_virt = 0x0, + .size = 0x10, /*1MB*/ + }, +}; + +static int scp_reserve_mem_init(struct mtk_scp *scp) +{ + enum scp_reserve_mem_id_t id; + phys_addr_t accumlate_memory_size = 0; + + scp_mem_base_phys = (phys_addr_t) (scp->phys_addr + MAX_CODE_SIZE); + scp_mem_size = (phys_addr_t) (scp->dram_size - MAX_CODE_SIZE); + + dev_info(scp->dev, +"phys:0x%llx - 0x%llx (0x%llx)\n", +scp_mem_base_phys, +scp_mem_base_phys + scp_mem_size, +scp_mem_size); + accumlate_memory_size = 0; + for (id = 0; id < SCP_NUMS_MEM_ID; id++) { + scp_reserve_mblock[id].start_phys = + scp_mem_base_phys + accumlate_memory_size; + 
accumlate_memory_size += scp_reserve_mblock[id].size; + dev_info(scp->dev, +"[reserve_mem:%d]: phys:0x%llx - 0x%llx (0x%llx)\n", +id, scp_reserve_mblock[id].start_phys, +scp_reserve_mblock[id].start_phys + +scp_reserve_mblock[id].size, +scp_reserve_mblock[id].size); + } + return 0; +} + +static int scp_reserve_memory_ioremap(struct mtk_scp *scp) +{ + enum scp_reserve_mem_id_t id; + phys_addr_t accumlate_memory_size = 0; + + scp_mem_base_virt = (phys_addr_t)(size_t)ioremap_wc(scp_mem_base_phys, + scp_mem_size); + + dev_info(scp->dev, +"virt:0x%llx - 0x%llx (0x%llx)\n", + (phys_addr_t)scp_mem_base_virt, + (phys_addr_t)scp_mem_base_virt + (phys_addr_t)scp_mem_size, + scp_mem_size); + for (id = 0; id < SCP_NUMS_MEM_ID; id++) { + scp_reserve_mblock[id].start_virt = + scp_mem_base_virt + accumlate_memory_size; + accumlate_memory_size += scp_reserve_mblock[id].size; + } + /* the reserved memory should be larger then expected memory +* or scp_reserve_mblock does not match dts +*/ + WARN_ON(accumlate_memory_size > scp_mem_size); +#ifdef DEBUG + for (id = 0; id < NUMS_MEM_ID; id++) { + dev_info(scp->dev, +"[mem_reserve-%d] phys:0x%llx,virt:0x%llx,size:0x%llx\n", +id, +scp_get_reserve_mem_phys(id), +scp_get_reserve_mem_virt(id), +scp_get_reserve_mem_size(id)); + } +#endif + return 0; +} +phys_addr_t scp_get_reserve_mem_phys(enum scp_reserve_mem_id_t id) +{ + if (id >= SCP_NUMS_MEM_ID) { + pr_err("[SCP] no reserve memory for %d", id); + return 0; + } else + return scp_reserve_mblock[id].start_phys; +} +EXPORT_SYMBOL_GPL(scp_get_reserve_mem_phys); + +phys_addr_t scp_get_reserve_mem_virt(enum scp_reserve_mem_id_t id) +{ + if (id >= SCP_NUMS_MEM_ID) { + pr_err("[SCP] no reserve memory for %d", id); + return 0; + } else + return scp_reserve_mblock[id].start_virt; +}
[PATCH v10 5/7] dt-bindings: Add binding for cros-ec-rpmsg.
Add a DT binding documentation for ChromeOS EC driver over rpmsg. Signed-off-by: Pi-Hsun Shih Acked-by: Rob Herring --- Changes from v9, v8, v7, v6: - No change. Changes from v5: - New patch. --- Documentation/devicetree/bindings/mfd/cros-ec.txt | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/mfd/cros-ec.txt b/Documentation/devicetree/bindings/mfd/cros-ec.txt index 6245c9b1a68b..4860eabd0f72 100644 --- a/Documentation/devicetree/bindings/mfd/cros-ec.txt +++ b/Documentation/devicetree/bindings/mfd/cros-ec.txt @@ -3,7 +3,7 @@ ChromeOS Embedded Controller Google's ChromeOS EC is a Cortex-M device which talks to the AP and implements various function such as keyboard and battery charging. -The EC can be connect through various means (I2C, SPI, LPC) and the +The EC can be connect through various means (I2C, SPI, LPC, RPMSG) and the compatible string used depends on the interface. Each connection method has its own driver which connects to the top level interface-agnostic EC driver. Other Linux driver (such as cros-ec-keyb for the matrix keyboard) connect to @@ -17,6 +17,9 @@ Required properties (SPI): - compatible: "google,cros-ec-spi" - reg: SPI chip select +Required properties (RPMSG): +- compatible: "google,cros-ec-rpmsg" + Optional properties (SPI): - google,cros-ec-spi-pre-delay: Some implementations of the EC need a little time to wake up from sleep before they can receive SPI transfers at a high -- 2.22.0.rc1.257.g3120a18244-goog
[PATCH v10 2/7] remoteproc/mediatek: add SCP support for mt8183
From: Erin Lo Provide a basic driver to control Cortex M4 co-processor Signed-off-by: Erin Lo Signed-off-by: Nicolas Boichat Signed-off-by: Pi-Hsun Shih --- Changes from v9: - No change. Changes from v8: - Add a missing space. Changes from v7: - Moved the location of shared SCP buffer. - Fix clock enable/disable sequence. - Add more IPI ID that would be used. Changes from v6: - No change. Changes from v5: - Changed some space to tab. Changes from v4: - Rename most function from mtk_scp_* to scp_*. - Change the irq to threaded handler. - Load ELF file instead of plain binary file as firmware by default (Squashed patch 6 in v4 into this patch). Changes from v3: - Fix some issue found by checkpatch. - Make writes aligned in scp_ipi_send. Changes from v2: - Squash patch 3 from v2 (separate the ipi interface) into this patch. - Remove unused name argument from scp_ipi_register. - Add scp_ipi_unregister for proper cleanup. - Move IPI ids in sync with firmware. - Add mb() in proper place, and correctly clear the run->signaled. Changes from v1: - Extract functions and rename variables in mtk_scp.c. --- drivers/remoteproc/Kconfig| 9 + drivers/remoteproc/Makefile | 1 + drivers/remoteproc/mtk_common.h | 75 drivers/remoteproc/mtk_scp.c | 513 ++ drivers/remoteproc/mtk_scp_ipi.c | 162 include/linux/platform_data/mtk_scp.h | 141 +++ 6 files changed, 901 insertions(+) create mode 100644 drivers/remoteproc/mtk_common.h create mode 100644 drivers/remoteproc/mtk_scp.c create mode 100644 drivers/remoteproc/mtk_scp_ipi.c create mode 100644 include/linux/platform_data/mtk_scp.h diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig index 18be41b8aa7e..ad3a0de04d9e 100644 --- a/drivers/remoteproc/Kconfig +++ b/drivers/remoteproc/Kconfig @@ -23,6 +23,15 @@ config IMX_REMOTEPROC It's safe to say N here. 
+config MTK_SCP + tristate "Mediatek SCP support" + depends on ARCH_MEDIATEK + help + Say y here to support Mediatek's System Companion Processor (SCP) via + the remote processor framework. + + It's safe to say N here. + config OMAP_REMOTEPROC tristate "OMAP remoteproc support" depends on ARCH_OMAP4 || SOC_OMAP5 diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile index ce5d061e92be..16b3e5e7a81c 100644 --- a/drivers/remoteproc/Makefile +++ b/drivers/remoteproc/Makefile @@ -10,6 +10,7 @@ remoteproc-y += remoteproc_sysfs.o remoteproc-y += remoteproc_virtio.o remoteproc-y += remoteproc_elf_loader.o obj-$(CONFIG_IMX_REMOTEPROC) += imx_rproc.o +obj-$(CONFIG_MTK_SCP) += mtk_scp.o mtk_scp_ipi.o obj-$(CONFIG_OMAP_REMOTEPROC) += omap_remoteproc.o obj-$(CONFIG_WKUP_M3_RPROC)+= wkup_m3_rproc.o obj-$(CONFIG_DA8XX_REMOTEPROC) += da8xx_remoteproc.o diff --git a/drivers/remoteproc/mtk_common.h b/drivers/remoteproc/mtk_common.h new file mode 100644 index ..7504ae1bc0ef --- /dev/null +++ b/drivers/remoteproc/mtk_common.h @@ -0,0 +1,75 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (c) 2018 MediaTek Inc. 
+ */ + +#ifndef __RPROC_MTK_COMMON_H +#define __RPROC_MTK_COMMON_H + +#include +#include +#include +#include + +#define MT8183_SW_RSTN 0x0 +#define MT8183_SW_RSTN_BIT BIT(0) +#define MT8183_SCP_TO_HOST 0x1C +#define MT8183_SCP_IPC_INT_BIT BIT(0) +#define MT8183_SCP_WDT_INT_BIT BIT(8) +#define MT8183_HOST_TO_SCP 0x28 +#define MT8183_HOST_IPC_INT_BIT BIT(0) +#define MT8183_SCP_SRAM_PDN 0x402C + +#define SCP_FW_VER_LEN 32 + +struct scp_run { + u32 signaled; + s8 fw_ver[SCP_FW_VER_LEN]; + u32 dec_capability; + u32 enc_capability; + wait_queue_head_t wq; +}; + +struct scp_ipi_desc { + scp_ipi_handler_t handler; + void *priv; +}; + +struct mtk_scp { + struct device *dev; + struct rproc *rproc; + struct clk *clk; + void __iomem *reg_base; + void __iomem *sram_base; + size_t sram_size; + + struct share_obj *recv_buf; + struct share_obj *send_buf; + struct scp_run run; + struct mutex lock; /* for protecting mtk_scp data structure */ + struct scp_ipi_desc ipi_desc[SCP_IPI_MAX]; + bool ipi_id_ack[SCP_IPI_MAX]; + wait_queue_head_t ack_wq; + + void __iomem *cpu_addr; + phys_addr_t phys_addr; + size_t dram_size; +}; + +/** + * struct share_obj - SRAM buffer shared with + * AP and SCP + * + * @id: IPI id + * @len: share buffer length + * @share_buf: share buffer data + */ +struct share_obj { + s32 id; + u32 len; + u8 share_buf[288]; +}; + +void scp_memcpy_aligned(void
[PATCH v10 1/7] dt-bindings: Add a binding for Mediatek SCP
From: Erin Lo Add a DT binding documentation of SCP for the MT8183 SoC from Mediatek. Signed-off-by: Erin Lo Signed-off-by: Pi-Hsun Shih Reviewed-by: Rob Herring --- Changes from v9, v8, v7, v6: - No change. Changes from v5: - Remove dependency on CONFIG_RPMSG_MTK_SCP. Changes from v4: - Add detail of more properties. - Document the usage of mtk,rpmsg-name in subnode from the new design. Changes from v3: - No change. Changes from v2: - No change. I realized that for this patch series, there's no need to add anything under the mt8183-scp node (neither the mt8183-rpmsg or the cros-ec-rpmsg) for them to work, since mt8183-rpmsg is added directly as a rproc_subdev by code, and cros-ec-rpmsg is dynamically created by SCP name service. Changes from v1: - No change. --- .../bindings/remoteproc/mtk,scp.txt | 36 +++ 1 file changed, 36 insertions(+) create mode 100644 Documentation/devicetree/bindings/remoteproc/mtk,scp.txt diff --git a/Documentation/devicetree/bindings/remoteproc/mtk,scp.txt b/Documentation/devicetree/bindings/remoteproc/mtk,scp.txt new file mode 100644 index ..3ba668bab14b --- /dev/null +++ b/Documentation/devicetree/bindings/remoteproc/mtk,scp.txt @@ -0,0 +1,36 @@ +Mediatek SCP Bindings + + +This binding provides support for ARM Cortex M4 Co-processor found on some +Mediatek SoCs. + +Required properties: +- compatible Should be "mediatek,mt8183-scp" +- reg Should contain the address ranges for the two memory + regions, SRAM and CFG. +- reg-namesContains the corresponding names for the two memory + regions. These should be named "sram" & "cfg". +- clocks Clock for co-processor (See: ../clock/clock-bindings.txt) +- clock-names Contains the corresponding name for the clock. This + should be named "main". + +Subnodes + + +Subnodes of the SCP represent rpmsg devices. The names of the devices are not +important. 
The properties of these nodes are defined by the individual bindings for the rpmsg devices - but must contain the following property: + +- mtk,rpmsg-name Contains the name for the rpmsg device. Used to match + the subnode to rpmsg device announced by SCP. + +Example: + + scp: scp@1050 { + compatible = "mediatek,mt8183-scp"; + reg = <0 0x1050 0 0x8>, + <0 0x105c 0 0x5000>; + reg-names = "sram", "cfg"; + clocks = <&infracfg CLK_INFRA_SCPSYS>; + clock-names = "main"; + }; -- 2.22.0.rc1.257.g3120a18244-goog
[PATCH v10 0/7] Add support for mt8183 SCP.
Add support for controlling and communicating with mt8183's system control processor (SCP), using the remoteproc & rpmsg framework. And also add a cros_ec driver for CrOS EC host command over rpmsg. The overall structure of the series is: * remoteproc/mtk_scp.c: Control the start / stop of SCP (Patch 2, 3). * remoteproc/mtk_scp_ipi.c: Communicates to SCP using inter-processor interrupt (IPI) and shared memory (Patch 2, 3). * rpmsg/mtk_rpmsg.c: Wrapper to wrap the IPI communication into a rpmsg device. Supports name service for SCP firmware to announce channels (Patch 4). * platform/chrome/cros_ec_rpmsg.c: Communicates with the SCP over the rpmsg framework (like what platform/chrome/cros_ec_{i2c,spi}.c does) (Patch 5, 6). * add scp dts node to mt8183 platform (Patch 7). This series (in particular, Patch 7) is based on https://patchwork.kernel.org/cover/10962385/. Changes from v9: - Remove reserve-memory-vpu_share node. - Remove change to cros_ec_commands.h (That is already in https://lore.kernel.org/lkml/20190518063949.GY4319@dell/T/) Changes from v8: - Rebased onto https://patchwork.kernel.org/cover/10962385/. - Drop merged cros_ec_rpmsg patch, and add scp dts node patch. - Add more reserved memory region. Changes from v7: - Rebase onto https://lore.kernel.org/patchwork/patch/1059196/. - Fix clock enable/disable timing for SCP driver. - Add more SCP IPI ID. Changes from v6: - Decouple mtk_rpmsg from mtk_scp. - Change data of EC response to be aligned to 4 bytes. Changes from v5: - Add device tree binding document for cros_ec_rpmsg. - Better document in comments for cros_ec_rpmsg. - Remove dependency on CONFIG_ in binding tree document. Changes from v4: - Merge patch 6 (Load ELF firmware) into patch 2, so the driver loads ELF firmware by default, and no longer accepts plain binary. - rpmsg_device listed in device tree (as a child of the SCP node) would have its device tree node mapped to the rpmsg_device, so the rpmsg driver can use the properties on device tree. 
Changes from v3: - Make writing to SCP SRAM aligned. - Add a new patch (Patch 6) to load ELF instead of bin firmware. - Add host event support for EC driver. - Fix some bugs found in testing (missing spin_lock_init, rproc_subdev_unprepare to rproc_subdev_stop). - Fix some coding style issue found by checkpatch.pl. Changes from v2: - Fold patch 3 into patch 2 in v2. - Move IPI id around to support cross-testing for old and new firmware. - Finish more TODO items. Changes from v1: - Extract functions and rename variables in mtk_scp.c. - Do cleanup properly in mtk_rpmsg.c, which also removes the problem of short-lived work items. - Code format fix based on feedback for cros_ec_rpmsg.c. - Extract feature detection for SCP into separate patch (Patch 6). Eddie Huang (1): arm64: dts: mt8183: add scp node Erin Lo (3): dt-bindings: Add a binding for Mediatek SCP remoteproc/mediatek: add SCP support for mt8183 remoteproc: mt8183: add reserved memory manager API Pi-Hsun Shih (3): rpmsg: add rpmsg support for mt8183 SCP. dt-bindings: Add binding for cros-ec-rpmsg. mfd: cros_ec: differentiate SCP from EC by feature bit. 
.../devicetree/bindings/mfd/cros-ec.txt | 5 +- .../bindings/remoteproc/mtk,scp.txt | 36 + arch/arm64/boot/dts/mediatek/mt8183-evb.dts | 11 + arch/arm64/boot/dts/mediatek/mt8183.dtsi | 12 + drivers/mfd/cros_ec_dev.c | 10 + drivers/remoteproc/Kconfig| 10 + drivers/remoteproc/Makefile | 1 + drivers/remoteproc/mtk_common.h | 77 ++ drivers/remoteproc/mtk_scp.c | 678 ++ drivers/remoteproc/mtk_scp_ipi.c | 163 + drivers/rpmsg/Kconfig | 9 + drivers/rpmsg/Makefile| 1 + drivers/rpmsg/mtk_rpmsg.c | 396 ++ include/linux/mfd/cros_ec.h | 1 + include/linux/platform_data/mtk_scp.h | 167 + include/linux/rpmsg/mtk_rpmsg.h | 30 + 16 files changed, 1606 insertions(+), 1 deletion(-) create mode 100644 Documentation/devicetree/bindings/remoteproc/mtk,scp.txt create mode 100644 drivers/remoteproc/mtk_common.h create mode 100644 drivers/remoteproc/mtk_scp.c create mode 100644 drivers/remoteproc/mtk_scp_ipi.c create mode 100644 drivers/rpmsg/mtk_rpmsg.c create mode 100644 include/linux/platform_data/mtk_scp.h create mode 100644 include/linux/rpmsg/mtk_rpmsg.h -- 2.22.0.rc1.257.g3120a18244-goog
Re: [PATCH] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work
Hi Michael, On Sun, Jun 02, 2019 at 08:15:38PM +1000, Michael Ellerman wrote: > Hi Nathan, > > It's always preferable IMHO to keep any initialisation as localised as > possible, so that the compiler can continue to warn about uninitialised > usages elsewhere. In this case that would mean doing the rc = 0 in the > switch, something like: I am certainly okay with implementing this in a v2. I mulled over which would be preferred, I suppose I guessed wrong :) Thank you for the review and input. > > diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c > b/drivers/scsi/ibmvscsi/ibmvscsi.c > index 727c31dc11a0..7ee5755cf636 100644 > --- a/drivers/scsi/ibmvscsi/ibmvscsi.c > +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c > @@ -2123,9 +2123,6 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data > *hostdata) > > spin_lock_irqsave(hostdata->host->host_lock, flags); > switch (hostdata->action) { > - case IBMVSCSI_HOST_ACTION_NONE: > - case IBMVSCSI_HOST_ACTION_UNBLOCK: > - break; > case IBMVSCSI_HOST_ACTION_RESET: > spin_unlock_irqrestore(hostdata->host->host_lock, flags); > rc = ibmvscsi_reset_crq_queue(&hostdata->queue, hostdata); > @@ -2142,7 +2139,10 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data > *hostdata) > if (!rc) > rc = ibmvscsi_send_crq(hostdata, > 0xC001LL, 0); > break; > + case IBMVSCSI_HOST_ACTION_NONE: > + case IBMVSCSI_HOST_ACTION_UNBLOCK: > default: > + rc = 0; > break; > } > > > But then that makes me wonder if that's actually correct? > > If we get an action that we don't recognise should we just throw it away > like that? (by doing hostdata->action = IBMVSCSI_HOST_ACTION_NONE). Tyrel? However, because of this, I will hold off on v2 until Tyrel can give some feedback. Thanks, Nathan
[PATCH] ipvlan: Don't propagate IFF_ALLMULTI changes on down interfaces.
Clearing the IFF_ALLMULTI flag on a down interface could cause an allmulti overflow on the underlying interface. Attempting to set IFF_ALLMULTI on the underlying interface would cause an error and the log message: "allmulti touches root, set allmulti failed." Signed-off-by: Young Xiao <92siuy...@gmail.com> --- drivers/net/ipvlan/ipvlan_main.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c index bbeb162..523bb83 100644 --- a/drivers/net/ipvlan/ipvlan_main.c +++ b/drivers/net/ipvlan/ipvlan_main.c @@ -242,8 +242,10 @@ static void ipvlan_change_rx_flags(struct net_device *dev, int change) struct ipvl_dev *ipvlan = netdev_priv(dev); struct net_device *phy_dev = ipvlan->phy_dev; - if (change & IFF_ALLMULTI) - dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1); + if (dev->flags & IFF_UP) { + if (change & IFF_ALLMULTI) + dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI ? 1 : -1); + } } static void ipvlan_set_multicast_mac_filter(struct net_device *dev) -- 2.7.4
Re: rcu_read_lock lost its compiler barrier
On Sun, Jun 02, 2019 at 05:06:17PM -0700, Paul E. McKenney wrote:
>
> Please note that preemptible Tree RCU has lacked the compiler barrier on
> all but the outermost rcu_read_unlock() for years before Boqun's patch.

Actually this is not true. Boqun's patch (commit bb73c52bad36) does not
add a barrier() to __rcu_read_lock. In fact I dug into the git history
and this compiler barrier() has existed in preemptible tree RCU since
the very start in 2009:

: commit f41d911f8c49a5d65c86504c19e8204bb605c4fd
: Author: Paul E. McKenney
: Date:   Sat Aug 22 13:56:52 2009 -0700
:
:     rcu: Merge preemptable-RCU functionality into hierarchical RCU
:
: +/*
: + * Tree-preemptable RCU implementation for rcu_read_lock().
: + * Just increment ->rcu_read_lock_nesting, shared state will be updated
: + * if we block.
: + */
: +void __rcu_read_lock(void)
: +{
: +	ACCESS_ONCE(current->rcu_read_lock_nesting)++;
: +	barrier();  /* needed if we ever invoke rcu_read_lock in rcutree.c */
: +}
: +EXPORT_SYMBOL_GPL(__rcu_read_lock);

However, you are correct that in the non-preempt tree RCU case, the
compiler barrier in __rcu_read_lock was not always present. In fact it
was added by:

: commit 386afc91144b36b42117b0092893f15bc8798a80
: Author: Linus Torvalds
: Date:   Tue Apr 9 10:48:33 2013 -0700
:
:     spinlocks and preemption points need to be at least compiler barriers

I suspect this is what prompted you to remove it in 2015.

> I do not believe that reverting that patch will help you at all.
>
> But who knows? So please point me at the full code body that was being
> debated earlier on this thread. It will no doubt take me quite a while to
> dig through it, given my being on the road for the next couple of weeks,
> but so it goes.

Please refer to my response to Linus for the code in question.

In any case, I am now even more certain that compiler barriers are not
needed in the code in question. The reasoning is quite simple: if you
need those compiler barriers then you surely need real memory barriers.
Vice versa, if real memory barriers are already present thanks to RCU,
then you don't need those compiler barriers.

In fact this calls into question the use of READ_ONCE/WRITE_ONCE in RCU
primitives such as rcu_dereference and rcu_assign_pointer. IIRC when RCU
was first added to the Linux kernel we did not have compiler barriers in
rcu_dereference and rcu_assign_pointer. They were added later on.

As compiler barriers per se are useless, these are surely meant to be
coupled with the memory barriers provided by RCU grace periods and
synchronize_rcu. But then those real memory barriers would have compiler
barriers too. So why do we need the compiler barriers in rcu_dereference
and rcu_assign_pointer?

Cheers,
-- 
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
RE: [PATCH] arm64: dts: imx8mm: Fix build warnings
Hi, Fabio > -Original Message- > From: Fabio Estevam > Sent: Monday, June 3, 2019 10:49 AM > To: Anson Huang > Cc: Rob Herring ; Mark Rutland > ; Shawn Guo ; Sascha > Hauer ; Sascha Hauer ; > Leonard Crestez ; Aisheng Dong > ; viresh kumar ; Jacky > Bai ; open list:OPEN FIRMWARE AND FLATTENED > DEVICE TREE BINDINGS ; moderated > list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE ker...@lists.infradead.org>; linux-kernel ; dl- > linux-imx > Subject: Re: [PATCH] arm64: dts: imx8mm: Fix build warnings > > Hi Anson, > > On Sun, Jun 2, 2019 at 9:46 PM wrote: > > > > From: Anson Huang > > > > This patch fixes below build warning with "W=1": > > I have already sent patches to fix these warnings. OK, thanks, then please ignore this patch. Anson.
Re: [PATCH 3/3] ACPI / device_sysfs: Add eject show attr to monitor eject status
On Fri, May 31, 2019 at 06:38:59AM -0700, Greg KH wrote:
> On Fri, May 31, 2019 at 02:56:42PM +0800, Chester Lin wrote:
> > An acpi_eject_show attribute for users to monitor current status because
> > sometimes it might take time to finish an ejection so we need to know
> > whether it is still in progress or not.
> >
> > Signed-off-by: Chester Lin
> > ---
> >  drivers/acpi/device_sysfs.c | 20 ++++++++++++++++++-
> >  drivers/acpi/internal.h     |  1 +
> >  drivers/acpi/scan.c         | 27 +++++++++++++++++++++++
> >  3 files changed, 47 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/acpi/device_sysfs.c b/drivers/acpi/device_sysfs.c
> > index 78c2653bf020..70b22eec6bbc 100644
> > --- a/drivers/acpi/device_sysfs.c
> > +++ b/drivers/acpi/device_sysfs.c
> > @@ -403,7 +403,25 @@ acpi_eject_store(struct device *d, struct device_attribute *attr,
> >  	return status == AE_NO_MEMORY ? -ENOMEM : -EAGAIN;
> >  }
> >
> > -static DEVICE_ATTR(eject, 0200, NULL, acpi_eject_store);
> > +static ssize_t acpi_eject_show(struct device *d,
> > +			       struct device_attribute *attr, char *buf)
> > +{
> > +	struct acpi_device *acpi_device = to_acpi_device(d);
> > +	acpi_object_type not_used;
> > +	acpi_status status;
> > +
> > +	if ((!acpi_device->handler || !acpi_device->handler->hotplug.enabled)
> > +	    && !acpi_device->driver)
> > +		return -ENODEV;
> > +
> > +	status = acpi_get_type(acpi_device->handle, &not_used);
> > +	if (ACPI_FAILURE(status) || !acpi_device->flags.ejectable)
> > +		return -ENODEV;
> > +
> > +	return sprintf(buf, "%s\n", acpi_eject_status_string(acpi_device));
> > +}
> > +
> > +static DEVICE_ATTR(eject, 0644, acpi_eject_show, acpi_eject_store);
>
> DEVICE_ATTR_RW()?
>
> And you need to document the new sysfs file in Documentation/ABI/
>
> thanks,
>
> greg k-h

Hi Greg,

Thank you for the reminder and I will fix these two in v2.

Regards,
Chester
Re: [PATCH] arm64: dts: imx8mm: Fix build warnings
Hi Anson, On Sun, Jun 2, 2019 at 9:46 PM wrote: > > From: Anson Huang > > This patch fixes below build warning with "W=1": I have already sent patches to fix these warnings.
Re: rcu_read_lock lost its compiler barrier
On Sun, Jun 02, 2019 at 01:54:12PM -0700, Linus Torvalds wrote:
> On Sat, Jun 1, 2019 at 10:56 PM Herbert Xu wrote:
> >
> > You can't then go and decide to remove the compiler barrier! To do
> > that you'd need to audit every single use of rcu_read_lock in the
> > kernel to ensure that they're not depending on the compiler barrier.
>
> What's the possible case where it would matter when there is no preemption?

The case we were discussing is from net/ipv4/inet_fragment.c from the
net-next tree:

void fqdir_exit(struct fqdir *fqdir)
{
	...
	fqdir->dead = true;

	/* call_rcu is supposed to provide memory barrier semantics,
	 * separating the setting of fqdir->dead with the destruction
	 * work.  This implicit barrier is paired with inet_frag_kill().
	 */
	INIT_RCU_WORK(&fqdir->destroy_rwork, fqdir_rwork_fn);
	queue_rcu_work(system_wq, &fqdir->destroy_rwork);
}

and

void inet_frag_kill(struct inet_frag_queue *fq)
{
	...
	rcu_read_lock();
	/* The RCU read lock provides a memory barrier
	 * guaranteeing that if fqdir->dead is false then
	 * the hash table destruction will not start until
	 * after we unlock.  Paired with inet_frags_exit_net().
	 */
	if (!fqdir->dead) {
		rhashtable_remove_fast(&fqdir->rhashtable, &fq->node,
				       fqdir->f->rhash_params);
		...
	}
	...
	rcu_read_unlock();
	...
}

I simplified this to

Initial values:

	a = 0
	b = 0

	CPU1			CPU2
	----			----
	a = 1			rcu_read_lock
	synchronize_rcu		if (a == 0)
	b = 2				b = 1
				rcu_read_unlock

On exit we want this to be true:

	b == 2

Now what Paul was telling me is that unless every memory operation is
done with READ_ONCE/WRITE_ONCE then his memory model shows that the
exit constraint won't hold.  IOW, we need

	CPU1			CPU2
	----			----
	WRITE_ONCE(a, 1)	rcu_read_lock
	synchronize_rcu		if (READ_ONCE(a) == 0)
	WRITE_ONCE(b, 2)		WRITE_ONCE(b, 1)
				rcu_read_unlock

Now I think this is bullshit because if we really needed these compiler
barriers then we surely would need real memory barriers to go with them.
In fact, the sole purpose of the RCU mechanism is to provide those
memory barriers.
Quoting from Documentation/RCU/Design/Requirements/Requirements.html:

	Each CPU that has an RCU read-side critical section that begins
	before synchronize_rcu() starts is guaranteed to execute a full
	memory barrier between the time that the RCU read-side critical
	section ends and the time that synchronize_rcu() returns.
	Without this guarantee, a pre-existing RCU read-side critical
	section might hold a reference to the newly removed struct foo
	after the kfree() on line 14 of remove_gp_synchronous().

	Each CPU that has an RCU read-side critical section that ends
	after synchronize_rcu() returns is guaranteed to execute a full
	memory barrier between the time that synchronize_rcu() begins
	and the time that the RCU read-side critical section begins.
	Without this guarantee, a later RCU read-side critical section
	running after the kfree() on line 14 of remove_gp_synchronous()
	might later run do_something_gp() and find the newly deleted
	struct foo.

My review of the RCU code shows that these memory barriers are indeed
present (at least when we're not in tiny mode where all this discussion
would be moot anyway).  For example, in call_rcu we eventually get down
to rcu_segcblist_enqueue which has an smp_mb.  On the reader side
(correct me if I'm wrong Paul) the memory barrier is implicitly coming
from the scheduler.

My point is that within our kernel whenever we have a CPU memory barrier
we always have a compiler barrier too.  Therefore my code example above
does not need any extra compiler barriers such as the ones provided by
READ_ONCE/WRITE_ONCE.

I think Paul was perhaps thinking that I'm expecting
rcu_read_lock/rcu_read_unlock themselves to provide the memory or
compiler barriers.  That would indeed be wrong but this is not what I
need.  All I need is the RCU semantics as documented for there to be
memory and compiler barriers around the whole grace period.
Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[cgroup] c03cd7738a: BUG:KASAN:slab-out-of-bounds_in_c
FYI, we noticed the following commit (built with gcc-7):

commit: c03cd7738a83b13739f00546166969342c8ff014 ("cgroup: Include dying leaders with live threads in PROCS iterations")
https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git for-next

in testcase: trinity
with following parameters:

	runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 2G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):

+----------------------------------------------------------+------------+------------+
|                                                          | b636fd38dc | c03cd7738a |
+----------------------------------------------------------+------------+------------+
| boot_successes                                           | 18         | 5          |
| boot_failures                                            | 1          | 9          |
| BUG:kernel_hang_in_boot-around-mounting-root_stage       | 1          |            |
| BUG:KASAN:slab-out-of-bounds_in_c                        | 0          | 7          |
| WARNING:at_lib/refcount.c:#refcount_inc_checked          | 0          | 8          |
| RIP:refcount_inc_checked                                 | 0          | 8          |
| WARNING:at_lib/refcount.c:#refcount_sub_and_test_checked | 0          | 8          |
| RIP:refcount_sub_and_test_checked                        | 0          | 8          |
| BUG:KASAN:use-after-free_in_c                            | 0          | 1          |
+----------------------------------------------------------+------------+------------+

If you fix the issue, kindly add following tag
Reported-by: kernel test robot

[   18.337218] BUG: KASAN: slab-out-of-bounds in css_task_iter_advance+0x1bd/0x240
[   18.338974] Read of size 4 at addr 888050ff294c by task systemd/1
[   18.340408]
[   18.340960] CPU: 1 PID: 1 Comm: systemd Not tainted 5.2.0-rc2-00013-gc03cd77 #1
[   18.342728] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[   18.344685] Call Trace:
[   18.345424]  dump_stack+0x7d/0xb8
[   18.346304]  ? css_task_iter_advance+0x1bd/0x240
[   18.347420]  print_address_description+0xa1/0x330
[   18.348547]  ? css_task_iter_advance+0x1bd/0x240
[   18.349658]  ? css_task_iter_advance+0x1bd/0x240
[   18.350767]  ? css_task_iter_advance+0x1bd/0x240
[   18.351878]  __kasan_report+0x11d/0x163
[   18.352850]  ?
css_task_iter_advance+0x1bd/0x240 [ 18.353965] kasan_report+0x2f/0x40 [ 18.354873] __asan_load4+0x6a/0x90 [ 18.355780] css_task_iter_advance+0x1bd/0x240 [ 18.356857] css_task_iter_start+0xd0/0x120 [ 18.357889] pidlist_array_load+0x107/0x540 [ 18.358921] ? cgroup_pidlist_find+0xa0/0xa0 [ 18.359972] cgroup_pidlist_start+0x24e/0x2b0 [ 18.361037] cgroup_seqfile_start+0x57/0x60 [ 18.362065] ? cgroup_file_release+0x60/0x60 [ 18.363111] kernfs_seq_start+0x86/0xd0 [ 18.364080] seq_read+0x16e/0x750 [ 18.364960] kernfs_fop_read+0x23c/0x2b0 [ 18.365949] ? security_file_permission+0x140/0x1c0 [ 18.367106] ? kernfs_fop_write+0x280/0x280 [ 18.368149] __vfs_read+0x59/0xb0 [ 18.369024] vfs_read+0xeb/0x1d0 [ 18.369888] ksys_read+0x134/0x1b0 [ 18.370787] ? kernel_write+0xa0/0xa0 [ 18.371734] ? __this_cpu_preempt_check+0x2f/0x150 [ 18.372922] __x64_sys_read+0x43/0x50 [ 18.373876] do_syscall_64+0xd3/0x3a0 [ 18.374930] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 18.376117] RIP: 0033:0x7f02f62f56d0 [ 18.377046] Code: b6 fe ff ff 48 8d 3d 17 be 08 00 48 83 ec 08 e8 06 db 01 00 66 0f 1f 44 00 00 83 3d 39 30 2c 00 00 75 10 b8 00 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 de 9b 01 00 48 89 04 24 [ 18.381026] RSP: 002b:7ffd6f7342b8 EFLAGS: 0246 ORIG_RAX: [ 18.382834] RAX: ffda RBX: 55f99ecc8110 RCX: 7f02f62f56d0 [ 18.384393] RDX: 1000 RSI: 55f99ec82710 RDI: 0023 [ 18.385954] RBP: 0d68 R08: 7f02f65b41a8 R09: 1010 [ 18.387509] R10: 0050 R11: 0246 R12: 7f02f65b0440 [ 18.389058] R13: 7f02f65af900 R14: R15: [ 18.390621] [ 18.391170] Allocated by task 1: [ 18.392034] __kasan_kmalloc+0xe4/0x150 [ 18.393199] kasan_kmalloc+0x28/0x40 [ 18.394119] find_css_set+0x1ad/0x770 [ 18.395058] cgroup_migrate_prepare_dst+0x10d/0x3a0 [ 18.396226] cgroup_attach_task+0x1ee/0x290 [ 18.397258] __cgroup1_procs_write+0x17a/0x210 [ 18.398523] cgroup1_procs_write+0x2a/0x40 [ 18.399539] cgroup_file_write+0x190/0x330 [ 18.400559] kernfs_fop_write+0x1d9/0x280 [ 18.401563] __vfs_write+0x59/0xb0 [ 18.402461] 
vfs_write+0x13c/0x2d0 [ 18.403354] ksys_write+0x134/0x1b0 [ 18.404262] __x64_sys_write+0x43/0x50 [ 18.405218] do_syscall_64+0xd3/0x3a0
RE: [PATCH] usb: dwc3: Enable the USB snooping
Hi Felipe, On Thursday, May 30, 2019 17:09, Ran Wang wrote: > > > > >> >> > /* Global Debug Queue/FIFO Space Available Register */ > > >> >> > #define DWC3_GDBGFIFOSPACE_NUM(n) ((n) & 0x1f) > > >> >> > #define DWC3_GDBGFIFOSPACE_TYPE(n) (((n) << 5) & 0x1e0) > > >> >> > @@ -859,6 +867,7 @@ struct dwc3_scratchpad_array { > > >> >> > * 3 - Reserved > > >> >> > * @imod_interval: set the interrupt moderation interval in 250ns > > >> >> > * increments or 0 to disable. > > >> >> > + * @dma_coherent: set if enable dma-coherent. > > >> >> > > >> >> you're not enabling dma coherency, you're enabling cache snooping. > > >> >> And this property should describe that. Also, keep in mind that > > >> >> different devices may want different cache types for each of > > >> >> those fields, so your property would have to be a lot more > > >> >> complex. Something > > like: > > >> >> > > >> >> snps,cache-type = , , ... > > >> >> > > >> >> Then driver would have to parse this properly to setup GSBUSCFG0. > > > > > > According to the DesignWare Cores SuperSpeed USB 3.0 Controller > > > Databook (v2.60a), it has described Type Bit Assignments for all > > > supported > > master bus type: > > > AHB, AXI3, AXI4 and Native. I found the bit definition are different > > > among > > them. > > > So, for the example you gave above, feel a little bit confused. > > > Did you mean: > > > snps,cache-type = , > > "cacheable">, , > > > > yeah, something like that. > > I think DATA_RD should be a macro, right? So, where I can put its define? > Create a dwc3.h in include/dt-bindings/usb/ ? Could you please give me some advice here? I'd like to prepare next version patch after getting this settled. 
> Another question about this that remains open is: the DWC3 databook's
> Table 6-5 "Cache Type Bit Assignments" shows that the bit definitions
> differ per MBUS_TYPE as below:
>
> MBUS_TYPE | bit[3]         | bit[2]        | bit[1]      | bit[0]
> ----------|----------------|---------------|-------------|------------
> AHB       | Cacheable      | Bufferable    | Privilege   | Data
> AXI3      | Write Allocate | Read Allocate | Cacheable   | Bufferable
> AXI4      | Allocate Other | Allocate      | Modifiable  | Bufferable
> AXI4      | Other Allocate | Allocate      | Modifiable  | Bufferable
> Native    | Same as AXI    | Same as AXI   | Same as AXI | Same as AXI
>
> Note: The AHB, AXI3, AXI4, and PCIe busses use different names for certain
> signals, which have the same meaning:
>     Bufferable = Posted
>     Cacheable = Modifiable = Snoop (negation of No Snoop)
>
> For Layerscape SoCs, MBUS_TYPE is AXI3. So I am not sure how to use
> snps,cache-type = <...> to cover all MBUS_TYPEs? (You can notice that
> AHB's and AXI3's cacheable are on different bits.) Or do I just need to
> handle the AXI3 case?

Also on this open question, thank you in advance.

Regards,
Ran
[PATCH] unicore32: check stack pointer in get_wchan
get_wchan() is lockless. Task may wake up at any time and change its own
stack, thus each next stack frame may be overwritten and filled with
random stuff.

This patch fixes oops in unwind_frame() by adding stack pointer
validation on each step (as x86 code does); unwind_frame() already
checks the frame pointer.

See commit 1b15ec7a7427 ("ARM: 7912/1: check stack pointer in
get_wchan") for details.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 arch/unicore32/kernel/process.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
index 2bc10b8..1899ebc 100644
--- a/arch/unicore32/kernel/process.c
+++ b/arch/unicore32/kernel/process.c
@@ -277,6 +277,7 @@ EXPORT_SYMBOL(dump_fpu);
 unsigned long get_wchan(struct task_struct *p)
 {
 	struct stackframe frame;
+	unsigned long stack_page;
 	int count = 0;
 	if (!p || p == current || p->state == TASK_RUNNING)
 		return 0;
@@ -285,9 +286,11 @@ unsigned long get_wchan(struct task_struct *p)
 	frame.sp = thread_saved_sp(p);
 	frame.lr = 0;			/* recovered from the stack */
 	frame.pc = thread_saved_pc(p);
+	stack_page = (unsigned long)task_stack_page(p);
 	do {
-		int ret = unwind_frame(&frame);
-		if (ret < 0)
+		if (frame.sp < stack_page ||
+		    frame.sp >= stack_page + THREAD_SIZE ||
+		    unwind_frame(&frame) < 0)
 			return 0;
 		if (!in_sched_functions(frame.pc))
 			return frame.pc;
-- 
2.7.4
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote: On 31/05/2019 08:49, Shawn Anastasio wrote: On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote: On 28/05/2019 17:39, Shawn Anastasio wrote: On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote: On 28/05/2019 15:36, Oliver wrote: On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } I think the logic here is questionable to begin with. If the user has explicitly requested re-aligning a resource via the command line then we should probably do it even if PCI_PROBE_ONLY is set. When it breaks they get to keep the pieces. 
That said, the real issue here is that PCI_PROBE_ONLY probably shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) hotplugged devices are configured by firmware before it's passed to the guest and we need to keep the FW assignments otherwise things break. QEMU however doesn't do any BAR assignments and relies on that being handled by the guest. At boot time this is done by SLOF, but Linux only keeps SLOF around until it's extracted the device-tree. Once that's done SLOF gets blown away and the kernel needs to do it's own BAR assignments. I'm guessing there's a hack in there to make it work today, but it's a little surprising that it works at all... The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the guest which receives an event from qemu (RAS_EPOW from /proc/interrupts), fetches device tree chunks (and as I understand it - they come with BARs from phyp but without from qemu) and writes "1" to "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: Interesting. Does this mean that the PHYP hotplug path doesn't call pci_assign_resource? I'd expect dlpar_add_slot() to be called under phyp and eventually pci_device_add() which (I think) may or may not trigger later reassignment. If so it means the patch may not break that platform after all, though it still may not be the correct way of doing things. We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems that (unless resource_alignment= is used) the pseries guest should just walk through all allocated resources and leave them unchanged. If we add a pcibios_default_alignment() implementation like was suggested earlier, then it will behave as if the user has specified resource_alignment= by default and SLOF's assignments won't be honored (I think). 
I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and tried booting with and without pci=resource_alignment= and I can see no difference - BARs are still aligned to 64K as programmed in SLOF; if I hack SLOF to align to 4K or 32K - BARs get packed and the guest leaves them unchanged. I guess it boils down to one question - is it important that we observe SLOF's initial BAR assignments? It isn't if it's SLOF but it is if it's phyp. It used to not allow/support BAR reassignment and even if it does not, I'd rather avoid touching them. A quick update. I tried removing pci_add_flags(PCI_PROBE_ONLY) which worked, but if I add an implementation of pcibios_default_alignment which simply returns PAGE_SIZE, my VM fails to boot and many errors from the virtio disk driver are printed to the console. After some investigation, it seems that with pcibios_default_alignment present, Linux will reallocate all resources provided by SLOF on boot. I'm still not sure why exactly
[PATCH v3] mtd: rawnand: Add Macronix NAND read retry support
Add support for Macronix NAND read retry.

Macronix NANDs support a specific read operation for data recovery,
which can be enabled with a SET_FEATURE. The driver checks byte 167 of
the Vendor Blocks in the ONFI parameter page table to see if this
high-reliability function is supported.

Signed-off-by: Mason Yang
---
 drivers/mtd/nand/raw/nand_macronix.c | 45 ++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/drivers/mtd/nand/raw/nand_macronix.c b/drivers/mtd/nand/raw/nand_macronix.c
index fad57c3..58511ae 100644
--- a/drivers/mtd/nand/raw/nand_macronix.c
+++ b/drivers/mtd/nand/raw/nand_macronix.c
@@ -8,6 +8,50 @@
 
 #include "internals.h"
 
+#define MACRONIX_READ_RETRY_BIT BIT(0)
+#define MACRONIX_NUM_READ_RETRY_MODES 6
+
+struct nand_onfi_vendor_macronix {
+	u8 reserved;
+	u8 reliability_func;
+} __packed;
+
+static int macronix_nand_setup_read_retry(struct nand_chip *chip, int mode)
+{
+	u8 feature[ONFI_SUBFEATURE_PARAM_LEN];
+
+	if (!chip->parameters.supports_set_get_features ||
+	    !test_bit(ONFI_FEATURE_ADDR_READ_RETRY,
+		      chip->parameters.set_feature_list))
+		return -ENOTSUPP;
+
+	feature[0] = mode;
+	return nand_set_features(chip, ONFI_FEATURE_ADDR_READ_RETRY, feature);
+}
+
+static void macronix_nand_onfi_init(struct nand_chip *chip)
+{
+	struct nand_parameters *p = &chip->parameters;
+	struct nand_onfi_vendor_macronix *mxic;
+
+	if (!p->onfi)
+		return;
+
+	mxic = (struct nand_onfi_vendor_macronix *)p->onfi->vendor;
+	if ((mxic->reliability_func & MACRONIX_READ_RETRY_BIT) == 0)
+		return;
+
+	chip->read_retries = MACRONIX_NUM_READ_RETRY_MODES;
+	chip->setup_read_retry = macronix_nand_setup_read_retry;
+
+	if (p->supports_set_get_features) {
+		bitmap_set(p->set_feature_list,
+			   ONFI_FEATURE_ADDR_READ_RETRY, 1);
+		bitmap_set(p->get_feature_list,
+			   ONFI_FEATURE_ADDR_READ_RETRY, 1);
+	}
+}
+
 /*
  * Macronix AC series does not support using SET/GET_FEATURES to change
  * the timings unlike what is declared in the parameter page. Unflag
@@ -56,6 +100,7 @@ static int macronix_nand_init(struct nand_chip *chip)
 		chip->options |= NAND_BBM_FIRSTPAGE | NAND_BBM_SECONDPAGE;
 
 	macronix_nand_fix_broken_get_timings(chip);
+	macronix_nand_onfi_init(chip);
 
 	return 0;
 }
-- 
1.9.1
Re: [v4 3/7] drm/mediatek: add dsi reg commit disable control
Hi, Jitao:

On Sat, 2019-06-01 at 17:26 +0800, Jitao Shi wrote:
> New DSI IP has a shadow register and a working register. The register
> values are written to the shadow register, and then, on a trigger via
> the commit register, the values are moved to the working register.
>
> This function is default on, but this driver doesn't use it, so add
> the disable control.

Reviewed-by: CK Hu

> Signed-off-by: Jitao Shi
> ---
>  drivers/gpu/drm/mediatek/mtk_dsi.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/mediatek/mtk_dsi.c b/drivers/gpu/drm/mediatek/mtk_dsi.c
> index a48db056df6c..eea47294079e 100644
> --- a/drivers/gpu/drm/mediatek/mtk_dsi.c
> +++ b/drivers/gpu/drm/mediatek/mtk_dsi.c
> @@ -131,6 +131,10 @@
>  #define VM_CMD_EN		BIT(0)
>  #define TS_VFP_EN		BIT(5)
>
> +#define DSI_SHADOW_DEBUG	0x190U
> +#define FORCE_COMMIT		BIT(0)
> +#define BYPASS_SHADOW		BIT(1)
> +
>  #define CONFIG			(0xff << 0)
>  #define SHORT_PACKET		0
>  #define LONG_PACKET		2
> @@ -157,6 +161,7 @@ struct phy;
>
>  struct mtk_dsi_driver_data {
>  	const u32 reg_cmdq_off;
> +	bool has_shadow_ctl;
>  };
>
>  struct mtk_dsi {
> @@ -594,6 +599,11 @@ static int mtk_dsi_poweron(struct mtk_dsi *dsi)
>  	}
>
>  	mtk_dsi_enable(dsi);
> +
> +	if (dsi->driver_data->has_shadow_ctl)
> +		writel(FORCE_COMMIT | BYPASS_SHADOW,
> +		       dsi->regs + DSI_SHADOW_DEBUG);
> +
>  	mtk_dsi_reset_engine(dsi);
>  	mtk_dsi_phy_timconfig(dsi);
>
linux-next: build failure after merge of the clockevents tree
Hi Daniel,

After merging the clockevents tree, today's linux-next build (x86_64
allmodconfig) failed like this:

drivers/clocksource/timer-atmel-tcb.c: In function 'tcb_clksrc_init':
drivers/clocksource/timer-atmel-tcb.c:448:17: error: invalid use of undefined type 'struct delay_timer'
  tc_delay_timer.read_current_timer = tc_delay_timer_read32;
                 ^
drivers/clocksource/timer-atmel-tcb.c:461:17: error: invalid use of undefined type 'struct delay_timer'
  tc_delay_timer.read_current_timer = tc_delay_timer_read;
                 ^
drivers/clocksource/timer-atmel-tcb.c:476:16: error: invalid use of undefined type 'struct delay_timer'
  tc_delay_timer.freq = divided_rate;
                ^
drivers/clocksource/timer-atmel-tcb.c:477:2: error: implicit declaration of function 'register_current_timer_delay'; did you mean 'read_current_timer'? [-Werror=implicit-function-declaration]
  register_current_timer_delay(&tc_delay_timer);
  ^~~~
  read_current_timer
drivers/clocksource/timer-atmel-tcb.c: At top level:
drivers/clocksource/timer-atmel-tcb.c:129:27: error: storage size of 'tc_delay_timer' isn't known
 static struct delay_timer tc_delay_timer;
                           ^~
cc1: some warnings being treated as errors

Caused by commit dd40f5020581 ("clocksource/drivers/tcb_clksrc: Register
delay timer").

I have reverted that commit for today.

-- 
Cheers,
Stephen Rothwell
[PATCH V2 net-next 09/10] net: hns3: add opcode about query and clear RAS & MSI-X to special opcode
From: Weihang Li

There are four commands being used to query and clear RAS and MSI-X
interrupt status. They should be contained in the array of special
opcodes because these commands have several descriptors, and we need to
judge the return value in the first descriptor rather than the last one
as for other opcodes. In addition, we shouldn't set the NEXT_FLAG of the
first descriptor. This patch fixes both issues.

Signed-off-by: Weihang Li
Signed-off-by: Peng Li
Signed-off-by: Huazhong Tan
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c |  6 +++++-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 16 ----------------
 2 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
index e532905..7a3bde7 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
@@ -173,7 +173,11 @@ static bool hclge_is_special_opcode(u16 opcode)
 			     HCLGE_OPC_STATS_MAC,
 			     HCLGE_OPC_STATS_MAC_ALL,
 			     HCLGE_OPC_QUERY_32_BIT_REG,
-			     HCLGE_OPC_QUERY_64_BIT_REG};
+			     HCLGE_OPC_QUERY_64_BIT_REG,
+			     HCLGE_QUERY_CLEAR_MPF_RAS_INT,
+			     HCLGE_QUERY_CLEAR_PF_RAS_INT,
+			     HCLGE_QUERY_CLEAR_ALL_MPF_MSIX_INT,
+			     HCLGE_QUERY_CLEAR_ALL_PF_MSIX_INT};
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(spec_opcode); i++) {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
index 83b07ce..b4a7e6a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
@@ -1098,8 +1098,6 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev,
 	/* query all main PF RAS errors */
 	hclge_cmd_setup_basic_desc(&desc[0], HCLGE_QUERY_CLEAR_MPF_RAS_INT,
 				   true);
-	desc[0].flag |= cpu_to_le16(HCLGE_CMD_FLAG_NEXT);
-
 	ret = hclge_cmd_send(&hdev->hw, &desc[0], num);
 	if (ret) {
 		dev_err(dev, "query all mpf ras int cmd failed (%d)\n", ret);
@@ -1262,8 +1260,6 @@ static int hclge_handle_mpf_ras_error(struct hclge_dev *hdev,
 	/* clear all main PF RAS errors */
 	hclge_cmd_reuse_desc(&desc[0], false);
-	desc[0].flag |= cpu_to_le16(HCLGE_CMD_FLAG_NEXT);
-
 	ret = hclge_cmd_send(&hdev->hw, &desc[0], num);
 	if (ret)
 		dev_err(dev, "clear all mpf ras int cmd failed (%d)\n", ret);
@@ -1293,8 +1289,6 @@ static int hclge_handle_pf_ras_error(struct hclge_dev *hdev,
 	/* query all PF RAS errors */
 	hclge_cmd_setup_basic_desc(&desc[0], HCLGE_QUERY_CLEAR_PF_RAS_INT,
 				   true);
-	desc[0].flag |= cpu_to_le16(HCLGE_CMD_FLAG_NEXT);
-
 	ret = hclge_cmd_send(&hdev->hw, &desc[0], num);
 	if (ret) {
 		dev_err(dev, "query all pf ras int cmd failed (%d)\n", ret);
@@ -1348,8 +1342,6 @@ static int hclge_handle_pf_ras_error(struct hclge_dev *hdev,
 	/* clear all PF RAS errors */
 	hclge_cmd_reuse_desc(&desc[0], false);
-	desc[0].flag |= cpu_to_le16(HCLGE_CMD_FLAG_NEXT);
-
 	ret = hclge_cmd_send(&hdev->hw, &desc[0], num);
 	if (ret)
 		dev_err(dev, "clear all pf ras int cmd failed (%d)\n", ret);
@@ -1667,8 +1659,6 @@ int hclge_handle_hw_msix_error(struct hclge_dev *hdev,
 	/* query all main PF MSIx errors */
 	hclge_cmd_setup_basic_desc(&desc[0], HCLGE_QUERY_CLEAR_ALL_MPF_MSIX_INT,
 				   true);
-	desc[0].flag |= cpu_to_le16(HCLGE_CMD_FLAG_NEXT);
-
 	ret = hclge_cmd_send(&hdev->hw, &desc[0], mpf_bd_num);
 	if (ret) {
 		dev_err(dev, "query all mpf msix int cmd failed (%d)\n",
@@ -1700,8 +1690,6 @@ int hclge_handle_hw_msix_error(struct hclge_dev *hdev,
 	/* clear all main PF MSIx errors */
 	hclge_cmd_reuse_desc(&desc[0], false);
-	desc[0].flag |= cpu_to_le16(HCLGE_CMD_FLAG_NEXT);
-
 	ret = hclge_cmd_send(&hdev->hw, &desc[0], mpf_bd_num);
 	if (ret) {
 		dev_err(dev, "clear all mpf msix int cmd failed (%d)\n",
@@ -1713,8 +1701,6 @@ int hclge_handle_hw_msix_error(struct hclge_dev *hdev,
 	memset(desc, 0, bd_num * sizeof(struct hclge_desc));
 	hclge_cmd_setup_basic_desc(&desc[0], HCLGE_QUERY_CLEAR_ALL_PF_MSIX_INT,
 				   true);
-	desc[0].flag |= cpu_to_le16(HCLGE_CMD_FLAG_NEXT);
-
 	ret = hclge_cmd_send(&hdev->hw, &desc[0], pf_bd_num);
 	if (ret) {
 		dev_err(dev, "query all pf msix int cmd failed (%d)\n",
@@ -1753,8 +1739,6 @@ int hclge_handle_hw_msix_error(struct hclge_dev *hdev,
 	/* clear all PF MSIx errors */
 	hclge_cmd_reuse_desc(&desc[0], false);
-	desc[0].flag |=
[PATCH V2 net-next 10/10] net: hns3: delay and separate enabling of NIC and ROCE HW errors
From: Weihang Li All RAS and MSI-X should be enabled just in the final stage of HNS3 initialization. It means that they should be enabled in hclge_init_xxx_client_instance instead of hclge_ae_dev(). Especially MSI-X, if it is enabled before opening vector0 IRQ, there are some chances that an MSI-X error will cause failure on initialization of the NIC client instance. So this patch delays enabling of HW errors. Also, we separate enabling of ROCE RAS from NIC, because it's not reasonable to enable ROCE RAS if we don't even have a ROCE driver. Signed-off-by: Weihang Li Signed-off-by: Peng Li Signed-off-by: Huazhong Tan --- .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 9 + .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h | 3 +- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 45 +++--- 3 files changed, 36 insertions(+), 21 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c index b4a7e6a..784512d 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c @@ -1493,7 +1493,7 @@ hclge_log_and_clear_rocee_ras_error(struct hclge_dev *hdev) return reset_type; } -static int hclge_config_rocee_ras_interrupt(struct hclge_dev *hdev, bool en) +int hclge_config_rocee_ras_interrupt(struct hclge_dev *hdev, bool en) { struct device *dev = &hdev->pdev->dev; struct hclge_desc desc; @@ -1566,10 +1566,9 @@ static const struct hclge_hw_blk hw_blk[] = { { /* sentinel */ } }; -int hclge_hw_error_set_state(struct hclge_dev *hdev, bool state) +int hclge_config_nic_hw_error(struct hclge_dev *hdev, bool state) { const struct hclge_hw_blk *module = hw_blk; - struct device *dev = &hdev->pdev->dev; int ret = 0; while (module->name) { @@ -1581,10 +1580,6 @@ int hclge_hw_error_set_state(struct hclge_dev *hdev, bool state) module++; } - ret = hclge_config_rocee_ras_interrupt(hdev, state); - if (ret) - dev_err(dev, "fail(%d) to configure ROCEE err
int\n", ret); - return ret; } diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h index c56b11e..81d115a 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h @@ -119,7 +119,8 @@ struct hclge_hw_error { }; int hclge_config_mac_tnl_int(struct hclge_dev *hdev, bool en); -int hclge_hw_error_set_state(struct hclge_dev *hdev, bool state); +int hclge_config_nic_hw_error(struct hclge_dev *hdev, bool state); +int hclge_config_rocee_ras_interrupt(struct hclge_dev *hdev, bool en); pci_ers_result_t hclge_handle_hw_ras_error(struct hnae3_ae_dev *ae_dev); int hclge_handle_hw_msix_error(struct hclge_dev *hdev, unsigned long *reset_requests); diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 4873a8e..35d2a45 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -8202,10 +8202,16 @@ static int hclge_init_nic_client_instance(struct hnae3_ae_dev *ae_dev, set_bit(HCLGE_STATE_NIC_REGISTERED, &hdev->state); hnae3_set_client_init_flag(client, ae_dev, 1); + /* Enable nic hw error interrupts */ + ret = hclge_config_nic_hw_error(hdev, true); + if (ret) + dev_err(&ae_dev->pdev->dev, + "fail(%d) to enable hw error interrupts\n", ret); + if (netif_msg_drv(&hdev->vport->nic)) hclge_info_show(hdev); - return 0; + return ret; } static int hclge_init_roce_client_instance(struct hnae3_ae_dev *ae_dev, @@ -8285,7 +8291,13 @@ static int hclge_init_client_instance(struct hnae3_client *client, } } - return 0; + /* Enable roce ras interrupts */ + ret = hclge_config_rocee_ras_interrupt(hdev, true); + if (ret) + dev_err(&ae_dev->pdev->dev, + "fail(%d) to enable roce ras interrupts\n", ret); + + return ret; clear_nic: hdev->nic_client = NULL; @@ -8589,13 +8601,6 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev
*ae_dev) goto err_mdiobus_unreg; } - ret = hclge_hw_error_set_state(hdev, true); - if (ret) { - dev_err(&pdev->dev, - "fail(%d) to enable hw error interrupts\n", ret); - goto err_mdiobus_unreg; - } - INIT_KFIFO(hdev->mac_tnl_log); hclge_dcb_ops_set(hdev); @@ -8719,15 +8724,26 @@ static int hclge_reset_ae_dev(struct hnae3_ae_dev *ae_dev) } /* Re-enable the hw error interrupts because -* the interrupts get disabled on core/global
[PATCH V2 net-next 06/10] net: hns3: set ops to null when unregister ad_dev
From: Weihang Li The hclge/hclgevf and hns3 modules can be unloaded independently. When hclge/hclgevf is unloaded first, the ops of ae_dev should be set to NULL, otherwise it will cause a use-after-free problem. Fixes: 38caee9d3ee8 ("net: hns3: Add support of the HNAE3 framework") Signed-off-by: Weihang Li Signed-off-by: Peng Li Signed-off-by: Huazhong Tan --- drivers/net/ethernet/hisilicon/hns3/hnae3.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.c b/drivers/net/ethernet/hisilicon/hns3/hnae3.c index fa8b850..738e013 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hnae3.c +++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.c @@ -251,6 +251,7 @@ void hnae3_unregister_ae_algo(struct hnae3_ae_algo *ae_algo) ae_algo->ops->uninit_ae_dev(ae_dev); hnae3_set_bit(ae_dev->flag, HNAE3_DEV_INITED_B, 0); + ae_dev->ops = NULL; } list_del(&ae_algo->node); @@ -351,6 +352,7 @@ void hnae3_unregister_ae_dev(struct hnae3_ae_dev *ae_dev) ae_algo->ops->uninit_ae_dev(ae_dev); hnae3_set_bit(ae_dev->flag, HNAE3_DEV_INITED_B, 0); + ae_dev->ops = NULL; } list_del(&ae_dev->node); -- 2.7.4
[PATCH V2 net-next 03/10] net: hns3: fix VLAN filter restore issue after reset
From: Jian Shen In the original code, the driver only restores VLAN filter entries for the PF after reset, so the VLAN entries of the VFs are lost in this case. This patch fixes it by recording the VLAN IDs for each function when adding a VLAN, and restoring them after reset. Fixes: 681ec3999b3d ("net: hns3: fix for vlan table lost problem when resetting") Signed-off-by: Jian Shen Signed-off-by: Huazhong Tan --- drivers/net/ethernet/hisilicon/hns3/hnae3.h| 3 ++ drivers/net/ethernet/hisilicon/hns3/hns3_enet.c| 34 ++ drivers/net/ethernet/hisilicon/hns3/hns3_enet.h| 1 - .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 42 +++--- 4 files changed, 43 insertions(+), 37 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h index 51c2ff1..2e478d9 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h +++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h @@ -338,6 +338,8 @@ struct hnae3_ae_dev { * Set vlan filter config of Ports * set_vf_vlan_filter() * Set vlan filter config of vf + * restore_vlan_table() + * Restore vlan filter entries after reset * enable_hw_strip_rxvtag() * Enable/disable hardware strip vlan tag of packets received * set_gro_en @@ -505,6 +507,7 @@ struct hnae3_ae_ops { void (*set_timer_task)(struct hnae3_handle *handle, bool enable); int (*mac_connect_phy)(struct hnae3_handle *handle); void (*mac_disconnect_phy)(struct hnae3_handle *handle); + void (*restore_vlan_table)(struct hnae3_handle *handle); }; struct hnae3_dcb_ops { diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index f6dc305..1e68bcb 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -1548,15 +1548,11 @@ static int hns3_vlan_rx_add_vid(struct net_device *netdev, __be16 proto, u16 vid) { struct hnae3_handle *h = hns3_get_handle(netdev); - struct hns3_nic_priv *priv = netdev_priv(netdev); int ret = -EIO; if
(h->ae_algo->ops->set_vlan_filter) ret = h->ae_algo->ops->set_vlan_filter(h, proto, vid, false); - if (!ret) - set_bit(vid, priv->active_vlans); - return ret; } @@ -1564,33 +1560,11 @@ static int hns3_vlan_rx_kill_vid(struct net_device *netdev, __be16 proto, u16 vid) { struct hnae3_handle *h = hns3_get_handle(netdev); - struct hns3_nic_priv *priv = netdev_priv(netdev); int ret = -EIO; if (h->ae_algo->ops->set_vlan_filter) ret = h->ae_algo->ops->set_vlan_filter(h, proto, vid, true); - if (!ret) - clear_bit(vid, priv->active_vlans); - - return ret; -} - -static int hns3_restore_vlan(struct net_device *netdev) -{ - struct hns3_nic_priv *priv = netdev_priv(netdev); - int ret = 0; - u16 vid; - - for_each_set_bit(vid, priv->active_vlans, VLAN_N_VID) { - ret = hns3_vlan_rx_add_vid(netdev, htons(ETH_P_8021Q), vid); - if (ret) { - netdev_err(netdev, "Restore vlan: %d filter, ret:%d\n", - vid, ret); - return ret; - } - } - return ret; } @@ -4301,12 +4275,8 @@ static int hns3_reset_notify_restore_enet(struct hnae3_handle *handle) vlan_filter_enable = netdev->flags & IFF_PROMISC ? 
false : true; hns3_enable_vlan_filter(netdev, vlan_filter_enable); - /* Hardware table is only clear when pf resets */ - if (!(handle->flags & HNAE3_SUPPORT_VF)) { - ret = hns3_restore_vlan(netdev); - if (ret) - return ret; - } + if (handle->ae_algo->ops->restore_vlan_table) + handle->ae_algo->ops->restore_vlan_table(handle); return hns3_restore_fd_rules(netdev); } diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h index 408efd5..efab15f 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h @@ -550,7 +550,6 @@ struct hns3_nic_priv { struct notifier_block notifier_block; /* Vxlan/Geneve information */ struct hns3_udp_tunnel udp_tnl[HNS3_UDP_TNL_MAX]; - unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)]; struct hns3_enet_coalesce tx_coal; struct hns3_enet_coalesce rx_coal; }; diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 1215455..4873a8e 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -7401,10 +7401,6 @@ static void
Re: [PATCH v2 net-next] net: link_watch: prevent starvation when processing linkwatch wq
On 2019/5/31 19:17, Salil Mehta wrote: >> From: netdev-ow...@vger.kernel.org [mailto:netdev- >> ow...@vger.kernel.org] On Behalf Of Yunsheng Lin >> Sent: Friday, May 31, 2019 10:01 AM >> To: da...@davemloft.net >> Cc: hkallwe...@gmail.com; f.faine...@gmail.com; >> step...@networkplumber.org; net...@vger.kernel.org; linux- >> ker...@vger.kernel.org; Linuxarm >> Subject: [PATCH v2 net-next] net: link_watch: prevent starvation when >> processing linkwatch wq >> >> When user has configured a large number of virtual netdev, such >> as 4K vlans, the carrier on/off operation of the real netdev >> will also cause it's virtual netdev's link state to be processed >> in linkwatch. Currently, the processing is done in a work queue, >> which may cause cpu and rtnl locking starvation problem. >> >> This patch releases the cpu and rtnl lock when link watch worker >> has processed a fixed number of netdev' link watch event. >> >> Currently __linkwatch_run_queue is called with rtnl lock, so >> enfore it with ASSERT_RTNL(); >> >> Signed-off-by: Yunsheng Lin >> --- >> V2: use cond_resched and rtnl_unlock after processing a fixed >> number of events >> --- >> net/core/link_watch.c | 17 + >> 1 file changed, 17 insertions(+) >> >> diff --git a/net/core/link_watch.c b/net/core/link_watch.c >> index 7f51efb..07eebfb 100644 >> --- a/net/core/link_watch.c >> +++ b/net/core/link_watch.c >> @@ -168,9 +168,18 @@ static void linkwatch_do_dev(struct net_device >> *dev) >> >> static void __linkwatch_run_queue(int urgent_only) >> { >> +#define MAX_DO_DEV_PER_LOOP 100 >> + >> +int do_dev = MAX_DO_DEV_PER_LOOP; >> struct net_device *dev; >> LIST_HEAD(wrk); >> >> +ASSERT_RTNL(); >> + >> +/* Give urgent case more budget */ >> +if (urgent_only) >> +do_dev += MAX_DO_DEV_PER_LOOP; >> + >> /* >> * Limit the number of linkwatch events to one >> * per second so that a runaway driver does not >> @@ -200,6 +209,14 @@ static void __linkwatch_run_queue(int urgent_only) >> } >> spin_unlock_irq(_lock); >> 
linkwatch_do_dev(dev); >> + >> +if (--do_dev < 0) { >> +rtnl_unlock(); >> +cond_resched(); > > > > Sorry, missed in my earlier comment. I could see multiple problems here > and please correct me if I am wrong: > > 1. It looks like releasing the rtnl_lock here and then res-scheduling might >not be safe, especially when you have already held *lweventlist_lock* >(which is global and not per-netdev), and when you are trying to >reschedule. This can cause *deadlock* with itself. > >Reason: once you release the rtnl_lock() the similar leg of function >netdev_wait_allrefs() could be called for some other netdevice which >might end up in waiting for same global linkwatch event list lock >i.e. *lweventlist_lock*. lweventlist_lock has been released before releasing the rtnl_lock and rescheduling. > > 2. After releasing the rtnl_lock() we have not ensured that all the rcu >operations are complete. Perhaps we need to take rcu_barrier() before >retaking the rtnl_lock() Why do we need to ensure all the rcu operations are complete here? > > > > >> +do_dev = MAX_DO_DEV_PER_LOOP; > > > > Here, I think rcu_barrier() should exist. In netdev_wait_allrefs, rcu_barrier is indeed called between __rtnl_unlock and rtnl_lock and is added by below commit 0115e8e30d6f ("net: remove delay at device dismantle"), which seems to work with NETDEV_UNREGISTER_FINAL. And the NETDEV_UNREGISTER_FINAL is removed by commit 070f2d7e264a ("net: Drop NETDEV_UNREGISTER_FINAL"), which says something about whether the rcu_barrier is still needed. "dev_change_net_namespace() and netdev_wait_allrefs() have rcu_barrier() before NETDEV_UNREGISTER_FINAL call, and the source commits say they were introduced to delemit the call with NETDEV_UNREGISTER, but this patch leaves them on the places, since they require additional analysis, whether we need in them for something else." So the reason of calling rcu_barrier in netdev_wait_allrefs is unclear now. 
Also rcu_barrier in netdev_wait_allrefs is added to fix the device dismantle problem, so for linkwatch, maybe it is not needed. > > > >> +rtnl_lock(); >> +} >> + >> spin_lock_irq(&lweventlist_lock); >> } > > > . >
[PATCH V2 net-next 04/10] net: hns3: set the port shaper according to MAC speed
From: Yunsheng Lin This patch sets the port shaper according to the MAC speed as suggested by hardware user manual. Signed-off-by: Yunsheng Lin Signed-off-by: Peng Li Signed-off-by: Huazhong Tan --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c index a7bbb6d..fac5193 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c @@ -397,7 +397,7 @@ static int hclge_tm_port_shaper_cfg(struct hclge_dev *hdev) u8 ir_u, ir_b, ir_s; int ret; - ret = hclge_shaper_para_calc(HCLGE_ETHER_MAX_RATE, + ret = hclge_shaper_para_calc(hdev->hw.mac.speed, HCLGE_SHAPER_LVL_PORT, &ir_b, &ir_u, &ir_s); if (ret) -- 2.7.4
[PATCH V2 net-next 07/10] net: hns3: add handling of two bits in MAC tunnel interrupts
From: Weihang Li LINK_UP and LINK_DOWN are two bits of the MAC tunnel interrupts, but the previous HNS3 driver didn't handle them. If they were enabled, the value of these two bits would change during link down and link up, which caused the HNS3 driver to keep receiving IRQs it couldn't handle. This patch adds handling of these two bits of interrupts; we will record and clear them as we do for the other MAC tunnel interrupts. Signed-off-by: Weihang Li Signed-off-by: Peng Li Signed-off-by: Huazhong Tan --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c | 2 +- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c index ed1f533..e1007d9 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c @@ -1053,7 +1053,7 @@ static void hclge_dbg_dump_mac_tnl_status(struct hclge_dev *hdev) while (kfifo_get(&hdev->mac_tnl_log, &stats)) { rem_nsec = do_div(stats.time, HCLGE_BILLION_NANO_SECONDS); - dev_info(&hdev->pdev->dev, "[%07lu.%03lu]status = 0x%x\n", + dev_info(&hdev->pdev->dev, "[%07lu.%03lu] status = 0x%x\n", (unsigned long)stats.time, rem_nsec / 1000, stats.status); } diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h index 9645590..c56b11e 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h @@ -47,9 +47,9 @@ #define HCLGE_NCSI_ERR_INT_TYPE0x9 #define HCLGE_MAC_COMMON_ERR_INT_EN0x107FF #define HCLGE_MAC_COMMON_ERR_INT_EN_MASK 0x107FF -#define HCLGE_MAC_TNL_INT_EN GENMASK(7, 0) -#define HCLGE_MAC_TNL_INT_EN_MASK GENMASK(7, 0) -#define HCLGE_MAC_TNL_INT_CLR GENMASK(7, 0) +#define HCLGE_MAC_TNL_INT_EN GENMASK(9, 0) +#define HCLGE_MAC_TNL_INT_EN_MASK GENMASK(9, 0) +#define
HCLGE_MAC_TNL_INT_CLR GENMASK(9, 0) #define HCLGE_PPU_MPF_ABNORMAL_INT0_EN GENMASK(31, 0) #define HCLGE_PPU_MPF_ABNORMAL_INT0_EN_MASKGENMASK(31, 0) #define HCLGE_PPU_MPF_ABNORMAL_INT1_EN GENMASK(31, 0) -- 2.7.4
[PATCH V2 net-next 05/10] net: hns3: add a check to pointer in error_detected and slot_reset
From: Weihang Li If we add a VF without loading hclgevf.ko and then a RAS error occurs, PCIe AER will call error_detected and slot_reset of all functions, and will get a NULL pointer when we check ae_dev->ops->handle_hw_ras_error. This will cause a call trace and failures in handling follow-up RAS errors. This patch checks ae_dev and ae_dev->ops first to solve the above issues. Signed-off-by: Weihang Li Signed-off-by: Peng Li Signed-off-by: Huazhong Tan --- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index 1e68bcb..0501b78 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -1920,9 +1920,9 @@ static pci_ers_result_t hns3_error_detected(struct pci_dev *pdev, if (state == pci_channel_io_perm_failure) return PCI_ERS_RESULT_DISCONNECT; - if (!ae_dev) { + if (!ae_dev || !ae_dev->ops) { dev_err(&pdev->dev, - "Can't recover - error happened during device init\n"); + "Can't recover - error happened before device initialized\n"); return PCI_ERS_RESULT_NONE; } @@ -1941,6 +1941,9 @@ static pci_ers_result_t hns3_slot_reset(struct pci_dev *pdev) dev_info(dev, "requesting reset due to PCI error\n"); + if (!ae_dev || !ae_dev->ops) + return PCI_ERS_RESULT_NONE; + /* request the reset */ if (ae_dev->ops->reset_event) { if (!ae_dev->override_pci_need_reset) -- 2.7.4
[PATCH V2 net-next 02/10] net: hns3: don't configure new VLAN ID into VF VLAN table when it's full
From: Jian Shen VF VLAN table can only support no more than 256 VLANs. When the user adds too many VLANs, the VF VLAN table will be full, and firmware will close the VF VLAN table for the function. When the VF VLAN table is full and the user keeps adding new VLANs, it's unnecessary to configure the VF VLAN table, because it will always fail and print a warning message. In the worst case, adding 4K VLANs and then doing a reset will take much time to restore these VLANs, which may cause the VF reset to fail by timeout. Fixes: 6c251711b37f ("net: hns3: Disable vf vlan filter when vf vlan table is full") Signed-off-by: Jian Shen Signed-off-by: Peng Li Signed-off-by: Huazhong Tan --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 8 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h | 1 + 2 files changed, 9 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index f0f618d..1215455 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -7025,6 +7025,12 @@ static int hclge_set_vf_vlan_common(struct hclge_dev *hdev, int vfid, u8 vf_byte_off; int ret; + /* if vf vlan table is full, firmware will close vf vlan filter, it +* is unable and unnecessary to add new vlan id to vf vlan filter +*/ + if (test_bit(vfid, hdev->vf_vlan_full) && !is_kill) + return 0; + hclge_cmd_setup_basic_desc(&desc[0], HCLGE_OPC_VLAN_FILTER_VF_CFG, false); hclge_cmd_setup_basic_desc(&desc[1], @@ -7060,6 +7066,7 @@ static int hclge_set_vf_vlan_common(struct hclge_dev *hdev, int vfid, return 0; if (req0->resp_code == HCLGE_VF_VLAN_NO_ENTRY) { + set_bit(vfid, hdev->vf_vlan_full); dev_warn(&hdev->pdev->dev, "vf vlan table is full, vf vlan filter is disabled\n"); return 0; @@ -8621,6 +8628,7 @@ static int hclge_reset_ae_dev(struct hnae3_ae_dev *ae_dev) hclge_stats_clear(hdev); memset(hdev->vlan_table, 0, sizeof(hdev->vlan_table)); + memset(hdev->vf_vlan_full, 0,
sizeof(hdev->vf_vlan_full)); ret = hclge_cmd_init(hdev); if (ret) { diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h index 2b3bc95..414f7db 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h @@ -820,6 +820,7 @@ struct hclge_dev { struct hclge_vlan_type_cfg vlan_type_cfg; unsigned long vlan_table[VLAN_N_VID][BITS_TO_LONGS(HCLGE_VPORT_NUM)]; + unsigned long vf_vlan_full[BITS_TO_LONGS(HCLGE_VPORT_NUM)]; struct hclge_fd_cfg fd_cfg; struct hlist_head fd_rule_list; -- 2.7.4
[PATCH V2 net-next 08/10] net: hns3: remove setting bit of reset_requests when handling mac tunnel interrupts
From: Weihang Li We shouldn't set the HNAE3_NONE_RESET bit of the variable that represents a reset request during handling of MSI-X errors, otherwise it may cause issues when triggering a reset. Signed-off-by: Weihang Li Signed-off-by: Peng Li Signed-off-by: Huazhong Tan --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c index 55c4a1b..83b07ce 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c @@ -1783,7 +1783,6 @@ int hclge_handle_hw_msix_error(struct hclge_dev *hdev, ret = hclge_clear_mac_tnl_int(hdev); if (ret) dev_err(dev, "clear mac tnl int failed (%d)\n", ret); - set_bit(HNAE3_NONE_RESET, reset_requests); } msi_error: -- 2.7.4
[PATCH V2 net-next 01/10] net: hns3: remove redundant core reset
Since core reset is similar to the global reset, so this patch removes it and uses global reset to replace it. Signed-off-by: Huazhong Tan Signed-off-by: Peng Li --- drivers/net/ethernet/hisilicon/hns3/hnae3.h| 1 - .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 24 +-- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 28 -- 3 files changed, 12 insertions(+), 41 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h index a18645e..51c2ff1 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h +++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h @@ -154,7 +154,6 @@ enum hnae3_reset_type { HNAE3_VF_FULL_RESET, HNAE3_FLR_RESET, HNAE3_FUNC_RESET, - HNAE3_CORE_RESET, HNAE3_GLOBAL_RESET, HNAE3_IMP_RESET, HNAE3_UNKNOWN_RESET, diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c index 4ac8063..55c4a1b 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c @@ -87,25 +87,25 @@ static const struct hclge_hw_error hclge_msix_sram_ecc_int[] = { static const struct hclge_hw_error hclge_igu_int[] = { { .int_msk = BIT(0), .msg = "igu_rx_buf0_ecc_mbit_err", - .reset_level = HNAE3_CORE_RESET }, + .reset_level = HNAE3_GLOBAL_RESET }, { .int_msk = BIT(2), .msg = "igu_rx_buf1_ecc_mbit_err", - .reset_level = HNAE3_CORE_RESET }, + .reset_level = HNAE3_GLOBAL_RESET }, { /* sentinel */ } }; static const struct hclge_hw_error hclge_igu_egu_tnl_int[] = { { .int_msk = BIT(0), .msg = "rx_buf_overflow", - .reset_level = HNAE3_CORE_RESET }, + .reset_level = HNAE3_GLOBAL_RESET }, { .int_msk = BIT(1), .msg = "rx_stp_fifo_overflow", - .reset_level = HNAE3_CORE_RESET }, + .reset_level = HNAE3_GLOBAL_RESET }, { .int_msk = BIT(2), .msg = "rx_stp_fifo_undeflow", - .reset_level = HNAE3_CORE_RESET }, + .reset_level = HNAE3_GLOBAL_RESET }, { .int_msk = BIT(3), .msg = "tx_buf_overflow", - .reset_level 
= HNAE3_CORE_RESET }, + .reset_level = HNAE3_GLOBAL_RESET }, { .int_msk = BIT(4), .msg = "tx_buf_underrun", - .reset_level = HNAE3_CORE_RESET }, + .reset_level = HNAE3_GLOBAL_RESET }, { .int_msk = BIT(5), .msg = "rx_stp_buf_overflow", - .reset_level = HNAE3_CORE_RESET }, + .reset_level = HNAE3_GLOBAL_RESET }, { /* sentinel */ } }; @@ -413,13 +413,13 @@ static const struct hclge_hw_error hclge_ppu_mpf_abnormal_int_st2[] = { static const struct hclge_hw_error hclge_ppu_mpf_abnormal_int_st3[] = { { .int_msk = BIT(4), .msg = "gro_bd_ecc_mbit_err", - .reset_level = HNAE3_CORE_RESET }, + .reset_level = HNAE3_GLOBAL_RESET }, { .int_msk = BIT(5), .msg = "gro_context_ecc_mbit_err", - .reset_level = HNAE3_CORE_RESET }, + .reset_level = HNAE3_GLOBAL_RESET }, { .int_msk = BIT(6), .msg = "rx_stash_cfg_ecc_mbit_err", - .reset_level = HNAE3_CORE_RESET }, + .reset_level = HNAE3_GLOBAL_RESET }, { .int_msk = BIT(7), .msg = "axi_rd_fbd_ecc_mbit_err", - .reset_level = HNAE3_CORE_RESET }, + .reset_level = HNAE3_GLOBAL_RESET }, { /* sentinel */ } }; diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 0545f38..f0f618d 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -2706,15 +2706,6 @@ static u32 hclge_check_event_cause(struct hclge_dev *hdev, u32 *clearval) return HCLGE_VECTOR0_EVENT_RST; } - if (BIT(HCLGE_VECTOR0_CORERESET_INT_B) & rst_src_reg) { - dev_info(&hdev->pdev->dev, "core reset interrupt\n"); - set_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state); - set_bit(HNAE3_CORE_RESET, &hdev->reset_pending); - *clearval = BIT(HCLGE_VECTOR0_CORERESET_INT_B); - hdev->rst_stats.core_rst_cnt++; - return HCLGE_VECTOR0_EVENT_RST; - } - /* check for vector0 msix event source */ if (msix_src_reg & HCLGE_VECTOR0_REG_MSIX_MASK) { dev_dbg(&hdev->pdev->dev, "received event 0x%x\n", @@ -2941,10 +2932,6 @@ static int hclge_reset_wait(struct hclge_dev *hdev) reg =
HCLGE_GLOBAL_RESET_REG; reg_bit = HCLGE_GLOBAL_RESET_BIT; break; - case HNAE3_CORE_RESET: - reg = HCLGE_GLOBAL_RESET_REG; - reg_bit = HCLGE_CORE_RESET_BIT; - break; case HNAE3_FUNC_RESET: reg = HCLGE_FUN_RST_ING; reg_bit = HCLGE_FUN_RST_ING_B; @@ -3076,12 +3063,6 @@
[PATCH V2 net-next 00/10] code optimizations & bugfixes for HNS3 driver
This patch-set includes code optimizations and bugfixes for the HNS3 ethernet controller driver. [patch 1/10] removes the redundant core reset type [patch 2/10 - 3/10] fixes two VLAN related issues [patch 4/10] fixes a TM issue [patch 5/10 - 10/10] includes some patches related to RAS & MSI-X error Change log: V1->V2: removes two patches which needs to change HNS's infiniband driver as well, they will be upstreamed later with the infiniband's one. Huazhong Tan (1): net: hns3: remove redundant core reset Jian Shen (2): net: hns3: don't configure new VLAN ID into VF VLAN table when it's full net: hns3: fix VLAN filter restore issue after reset Weihang Li (6): net: hns3: add a check to pointer in error_detected and slot_reset net: hns3: set ops to null when unregister ad_dev net: hns3: add handling of two bits in MAC tunnel interrupts net: hns3: remove setting bit of reset_requests when handling mac tunnel interrupts net: hns3: add opcode about query and clear RAS & MSI-X to special opcode net: hns3: delay and separate enabling of NIC and ROCE HW errors Yunsheng Lin (1): net: hns3: set the port shaper according to MAC speed drivers/net/ethernet/hisilicon/hns3/hnae3.c| 2 + drivers/net/ethernet/hisilicon/hns3/hnae3.h| 4 +- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c| 41 ++- drivers/net/ethernet/hisilicon/hns3/hns3_enet.h| 1 - .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c | 6 +- .../ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c | 2 +- .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 50 +++-- .../net/ethernet/hisilicon/hns3/hns3pf/hclge_err.h | 9 +- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 123 + .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h| 1 + .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c | 2 +- 11 files changed, 117 insertions(+), 124 deletions(-) -- 2.7.4
[v2, PATCH 1/4] net: stmmac: dwmac-mediatek: enable Ethernet power domain
Add Ethernet power on/off operations in the init/exit flow. Signed-off-by: Biao Huang --- .../net/ethernet/stmicro/stmmac/dwmac-mediatek.c |7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c index 126b66b..b84269e 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include @@ -298,6 +299,9 @@ static int mediatek_dwmac_init(struct platform_device *pdev, void *priv) return ret; } + pm_runtime_enable(&pdev->dev); + pm_runtime_get_sync(&pdev->dev); + return 0; } @@ -307,6 +311,9 @@ static void mediatek_dwmac_exit(struct platform_device *pdev, void *priv) const struct mediatek_dwmac_variant *variant = plat->variant; clk_bulk_disable_unprepare(variant->num_clks, plat->clks); + + pm_runtime_put_sync(&pdev->dev); + pm_runtime_disable(&pdev->dev); } static int mediatek_dwmac_probe(struct platform_device *pdev) -- 1.7.9.5
[v2, PATCH 4/4] net: stmmac: dwmac4: fix flow control issue
The current dwmac4_flow_ctrl will not clear the GMAC_RX_FLOW_CTRL_RFE/GMAC_TX_FLOW_CTRL_TFE bits, so the MAC hw will keep flow control enabled even after ethtool requests it be disabled. Add code to fix it. Fixes: 477286b53f55 ("stmmac: add GMAC4 core support") Signed-off-by: Biao Huang --- drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c |8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c index 2544cff..9322b71 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c @@ -488,8 +488,9 @@ static void dwmac4_flow_ctrl(struct mac_device_info *hw, unsigned int duplex, if (fc & FLOW_RX) { pr_debug("\tReceive Flow-Control ON\n"); flow |= GMAC_RX_FLOW_CTRL_RFE; - writel(flow, ioaddr + GMAC_RX_FLOW_CTRL); } + writel(flow, ioaddr + GMAC_RX_FLOW_CTRL); + if (fc & FLOW_TX) { pr_debug("\tTransmit Flow-Control ON\n"); @@ -497,7 +498,7 @@ static void dwmac4_flow_ctrl(struct mac_device_info *hw, unsigned int duplex, pr_debug("\tduplex mode: PAUSE %d\n", pause_time); for (queue = 0; queue < tx_cnt; queue++) { - flow |= GMAC_TX_FLOW_CTRL_TFE; + flow = GMAC_TX_FLOW_CTRL_TFE; if (duplex) flow |= @@ -505,6 +506,9 @@ static void dwmac4_flow_ctrl(struct mac_device_info *hw, unsigned int duplex, writel(flow, ioaddr + GMAC_QX_TX_FLOW_CTRL(queue)); } + } else { + for (queue = 0; queue < tx_cnt; queue++) + writel(0, ioaddr + GMAC_QX_TX_FLOW_CTRL(queue)); } } -- 1.7.9.5
[v2, PATCH 2/4] net: stmmac: dwmac-mediatek: disable rx watchdog
Disable the rx watchdog for dwmac-mediatek, so the hw will issue an rx interrupt as soon as a packet is received, reducing the rx path response time. Signed-off-by: Biao Huang --- .../net/ethernet/stmicro/stmmac/dwmac-mediatek.c |1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c index b84269e..79f2ee3 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c @@ -356,6 +356,7 @@ static int mediatek_dwmac_probe(struct platform_device *pdev) plat_dat->has_gmac4 = 1; plat_dat->has_gmac = 0; plat_dat->pmt = 0; + plat_dat->riwt_off = 1; plat_dat->maxmtu = ETH_DATA_LEN; plat_dat->bsp_priv = priv_plat; plat_dat->init = mediatek_dwmac_init; -- 1.7.9.5
[v2, PATCH 0/4] complete dwmac-mediatek driver and fix flow control issue
Changes in v2:
  patch#1: there is no extra action in mediatek_dwmac_remove, so remove it.

v1:
This series mainly completes the dwmac-mediatek driver:
1. add power on/off operations for dwmac-mediatek.
2. disable rx watchdog to reduce rx path responding time.
3. change the default value of tx-frames from 25 to 1, so the ptp4l test will pass by default.
It also fixes the issue that flow control won't be disabled any more once it has been enabled.

Biao Huang (4):
  net: stmmac: dwmac-mediatek: enable Ethernet power domain
  net: stmmac: dwmac-mediatek: disable rx watchdog
  net: stmmac: modify default value of tx-frames
  net: stmmac: dwmac4: fix flow control issue

 drivers/net/ethernet/stmicro/stmmac/common.h | 2 +-
 .../net/ethernet/stmicro/stmmac/dwmac-mediatek.c | 8
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 8 ++--
 3 files changed, 15 insertions(+), 3 deletions(-)

-- 
1.7.9.5
[v2, PATCH 3/4] net: stmmac: modify default value of tx-frames
The default value of tx-frames is 25; the tstamp is then passed to the stack too late, and ptp4l fails:

ptp4l -i eth0 -f gPTP.cfg -m
ptp4l: selected /dev/ptp0 as PTP clock
ptp4l: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l: port 0: INITIALIZING to LISTENING on INITIALIZE
ptp4l: port 1: link up
ptp4l: timed out while polling for tx timestamp
ptp4l: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l: port 1: send peer delay response failed
ptp4l: port 1: LISTENING to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)

ptp4l tests pass when changing tx-frames from 25 to 1 with the ethtool -C option. It should be fine to set the default value of tx-frames to 1, so ptp4l will pass by default.

Signed-off-by: Biao Huang
---
 drivers/net/ethernet/stmicro/stmmac/common.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 26bbcd8..6a08cec 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -261,7 +261,7 @@ struct stmmac_safety_stats {
 #define STMMAC_COAL_TX_TIMER	1000
 #define STMMAC_MAX_COAL_TX_TICK	10
 #define STMMAC_TX_MAX_FRAMES	256
-#define STMMAC_TX_FRAMES	25
+#define STMMAC_TX_FRAMES	1
 
 /* Packets types */
 enum packets_types {
-- 
1.7.9.5
[PATCH] arm64: dts: imx8mm: Move gic node into soc node
From: Anson Huang GIC is inside of SoC from architecture perspective, it should be located inside of soc node in DT. Signed-off-by: Anson Huang --- arch/arm64/boot/dts/freescale/imx8mm.dtsi | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/arch/arm64/boot/dts/freescale/imx8mm.dtsi b/arch/arm64/boot/dts/freescale/imx8mm.dtsi index dc99f45..429312e 100644 --- a/arch/arm64/boot/dts/freescale/imx8mm.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mm.dtsi @@ -169,15 +169,6 @@ clock-output-names = "clk_ext4"; }; - gic: interrupt-controller@3880 { - compatible = "arm,gic-v3"; - reg = <0x0 0x3880 0 0x1>, /* GIC Dist */ - <0x0 0x3888 0 0xC>; /* GICR (RD_base + SGI_base) */ - #interrupt-cells = <3>; - interrupt-controller; - interrupts = ; - }; - psci { compatible = "arm,psci-1.0"; method = "smc"; @@ -739,6 +730,15 @@ dma-names = "rx-tx"; status = "disabled"; }; + + gic: interrupt-controller@3880 { + compatible = "arm,gic-v3"; + reg = <0x3880 0x1>, /* GIC Dist */ + <0x3888 0xc>; /* GICR (RD_base + SGI_base) */ + #interrupt-cells = <3>; + interrupt-controller; + interrupts = ; + }; }; usbphynop1: usbphynop1 { -- 2.7.4
Re: [PATCH net-next 00/12] code optimizations & bugfixes for HNS3 driver
On 2019/6/1 8:18, David Miller wrote:
> From: David Miller
> Date: Fri, 31 May 2019 17:15:29 -0700 (PDT)
>
>> From: Huazhong Tan
>> Date: Fri, 31 May 2019 16:54:46 +0800
>>
>>> This patch-set includes code optimizations and bugfixes for the HNS3
>>> ethernet controller driver.
>>>
>>> [patch 1/12] removes the redundant core reset type
>>> [patch 2/12 - 3/12] fixes two VLAN related issues
>>> [patch 4/12] fixes a TM issue
>>> [patch 5/12 - 12/12] includes some patches related to RAS & MSI-X error
>>
>> Series applied.
>
> I reverted, you need to actually build test the infiniband side of your
> driver.
>
> drivers/infiniband/hw/hns/hns_roce_hw_v2.c: In function ‘hns_roce_v2_msix_interrupt_abn’:
> drivers/infiniband/hw/hns/hns_roce_hw_v2.c:5032:14: warning: passing argument 2 of ‘ops->set_default_reset_request’ makes pointer from integer without a cast [-Wint-conversion]
>    HNAE3_FUNC_RESET);
>    ^~~~
> drivers/infiniband/hw/hns/hns_roce_hw_v2.c:5032:14: note: expected ‘long unsigned int *’ but argument is of type ‘int’
> C-c C-cmake[5]: *** Deleting file 'drivers/net/wireless/ath/carl9170/cmd.o'

Sorry, I will remove [10/12 - 11/12] for V2; these two patches need to modify HNS's infiniband driver at the same time, so they will be upstreamed later together with the infiniband one.
Re: rcu_read_lock lost its compiler barrier
On Sun, Jun 02, 2019 at 01:56:07PM +0800, Herbert Xu wrote: > Digging up an old email because I was not aware of this previously > but Paul pointed me to it during another discussion. > > On Mon, Sep 21, 2015 at 01:43:27PM -0700, Paul E. McKenney wrote: > > On Mon, Sep 21, 2015 at 09:30:49PM +0200, Frederic Weisbecker wrote: > > > > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > > > > index d63bb77..6c3cece 100644 > > > > --- a/include/linux/rcupdate.h > > > > +++ b/include/linux/rcupdate.h > > > > @@ -297,12 +297,14 @@ void synchronize_rcu(void); > > > > > > > > static inline void __rcu_read_lock(void) > > > > { > > > > - preempt_disable(); > > > > + if (IS_ENABLED(CONFIG_PREEMPT_COUNT)) > > > > + preempt_disable(); > > > > > > preempt_disable() is a no-op when !CONFIG_PREEMPT_COUNT, right? > > > Or rather it's a barrier(), which is anyway implied by rcu_read_lock(). > > > > > > So perhaps we can get rid of the IS_ENABLED() check? > > > > Actually, barrier() is not intended to be implied by rcu_read_lock(). > > In a non-preemptible RCU implementation, it doesn't help anything > > to have the compiler flush its temporaries upon rcu_read_lock() > > and rcu_read_unlock(). > > This is seriously broken. RCU has been around for years and is > used throughout the kernel while the compiler barrier existed. Please note that preemptible Tree RCU has lacked the compiler barrier on all but the outermost rcu_read_unlock() for years before Boqun's patch. So exactly where in the code that we are currently discussing are you relying on compiler barriers in either rcu_read_lock() or rcu_read_unlock()? The grace-period guarantee allows the compiler ordering to be either in the readers (SMP&), in the grace-period mechanism (SMP&&!PREEMPT), or both (SRCU). > You can't then go and decide to remove the compiler barrier! To do > that you'd need to audit every single use of rcu_read_lock in the > kernel to ensure that they're not depending on the compiler barrier. 
> > This is also contrary to the definition of almost every other > *_lock primitive in the kernel where the compiler barrier is > included. > > So please revert this patch. I do not believe that reverting that patch will help you at all. But who knows? So please point me at the full code body that was being debated earlier on this thread. It will no doubt take me quite a while to dig through it, given my being on the road for the next couple of weeks, but so it goes. Thanx, Paul
[PATCH V2 3/3] arm64: defconfig: Select CONFIG_CLK_IMX8MN by default
From: Anson Huang Enable CONFIG_CLK_IMX8MN to support i.MX8MN clock driver. Signed-off-by: Anson Huang --- No changes. --- arch/arm64/configs/defconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig index 8d4f25c..aef797c 100644 --- a/arch/arm64/configs/defconfig +++ b/arch/arm64/configs/defconfig @@ -654,6 +654,7 @@ CONFIG_COMMON_CLK_CS2000_CP=y CONFIG_COMMON_CLK_S2MPS11=y CONFIG_CLK_QORIQ=y CONFIG_COMMON_CLK_PWM=y +CONFIG_CLK_IMX8MN=y CONFIG_CLK_IMX8MM=y CONFIG_CLK_IMX8MQ=y CONFIG_CLK_IMX8QXP=y -- 2.7.4
[PATCH V2 1/3] dt-bindings: imx: Add clock binding doc for i.MX8MN
From: Anson Huang Add the clock binding doc for i.MX8MN. Signed-off-by: Anson Huang --- No changes. --- .../devicetree/bindings/clock/imx8mn-clock.txt | 29 +++ include/dt-bindings/clock/imx8mn-clock.h | 215 + 2 files changed, 244 insertions(+) create mode 100644 Documentation/devicetree/bindings/clock/imx8mn-clock.txt create mode 100644 include/dt-bindings/clock/imx8mn-clock.h diff --git a/Documentation/devicetree/bindings/clock/imx8mn-clock.txt b/Documentation/devicetree/bindings/clock/imx8mn-clock.txt new file mode 100644 index 000..d83db5c --- /dev/null +++ b/Documentation/devicetree/bindings/clock/imx8mn-clock.txt @@ -0,0 +1,29 @@ +* Clock bindings for NXP i.MX8M Nano + +Required properties: +- compatible: Should be "fsl,imx8mn-ccm" +- reg: Address and length of the register set +- #clock-cells: Should be <1> +- clocks: list of clock specifiers, must contain an entry for each required + entry in clock-names +- clock-names: should include the following entries: +- "osc_32k" +- "osc_24m" +- "clk_ext1" +- "clk_ext2" +- "clk_ext3" +- "clk_ext4" + +clk: clock-controller@3038 { + compatible = "fsl,imx8mn-ccm"; + reg = <0x0 0x3038 0x0 0x1>; + #clock-cells = <1>; + clocks = <_32k>, <_24m>, <_ext1>, <_ext2>, +<_ext3>, <_ext4>; + clock-names = "osc_32k", "osc_24m", "clk_ext1", "clk_ext2", + "clk_ext3", "clk_ext4"; +}; + +The clock consumer should specify the desired clock by having the clock +ID in its "clocks" phandle cell. See include/dt-bindings/clock/imx8mn-clock.h +for the full list of i.MX8M Nano clock IDs. 
diff --git a/include/dt-bindings/clock/imx8mn-clock.h b/include/dt-bindings/clock/imx8mn-clock.h new file mode 100644 index 000..5255b1c --- /dev/null +++ b/include/dt-bindings/clock/imx8mn-clock.h @@ -0,0 +1,215 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright 2018-2019 NXP + */ + +#ifndef __DT_BINDINGS_CLOCK_IMX8MN_H +#define __DT_BINDINGS_CLOCK_IMX8MN_H + +#define IMX8MN_CLK_DUMMY 0 +#define IMX8MN_CLK_32K 1 +#define IMX8MN_CLK_24M 2 +#define IMX8MN_OSC_HDMI_CLK3 +#define IMX8MN_CLK_EXT14 +#define IMX8MN_CLK_EXT25 +#define IMX8MN_CLK_EXT36 +#define IMX8MN_CLK_EXT47 +#define IMX8MN_AUDIO_PLL1_REF_SEL 8 +#define IMX8MN_AUDIO_PLL2_REF_SEL 9 +#define IMX8MN_VIDEO_PLL1_REF_SEL 10 +#define IMX8MN_DRAM_PLL_REF_SEL11 +#define IMX8MN_GPU_PLL_REF_SEL 12 +#define IMX8MN_VPU_PLL_REF_SEL 13 +#define IMX8MN_ARM_PLL_REF_SEL 14 +#define IMX8MN_SYS_PLL1_REF_SEL15 +#define IMX8MN_SYS_PLL2_REF_SEL16 +#define IMX8MN_SYS_PLL3_REF_SEL17 +#define IMX8MN_AUDIO_PLL1 18 +#define IMX8MN_AUDIO_PLL2 19 +#define IMX8MN_VIDEO_PLL1 20 +#define IMX8MN_DRAM_PLL21 +#define IMX8MN_GPU_PLL 22 +#define IMX8MN_VPU_PLL 23 +#define IMX8MN_ARM_PLL 24 +#define IMX8MN_SYS_PLL125 +#define IMX8MN_SYS_PLL226 +#define IMX8MN_SYS_PLL327 +#define IMX8MN_AUDIO_PLL1_BYPASS 28 +#define IMX8MN_AUDIO_PLL2_BYPASS 29 +#define IMX8MN_VIDEO_PLL1_BYPASS 30 +#define IMX8MN_DRAM_PLL_BYPASS 31 +#define IMX8MN_GPU_PLL_BYPASS 32 +#define IMX8MN_VPU_PLL_BYPASS 33 +#define IMX8MN_ARM_PLL_BYPASS 34 +#define IMX8MN_SYS_PLL1_BYPASS 35 +#define IMX8MN_SYS_PLL2_BYPASS 36 +#define IMX8MN_SYS_PLL3_BYPASS 37 +#define IMX8MN_AUDIO_PLL1_OUT 38 +#define IMX8MN_AUDIO_PLL2_OUT 39 +#define IMX8MN_VIDEO_PLL1_OUT 40 +#define IMX8MN_DRAM_PLL_OUT41 +#define IMX8MN_GPU_PLL_OUT 42 +#define IMX8MN_VPU_PLL_OUT 43 +#define IMX8MN_ARM_PLL_OUT 44 +#define IMX8MN_SYS_PLL1_OUT45 +#define IMX8MN_SYS_PLL2_OUT46 +#define IMX8MN_SYS_PLL3_OUT47 +#define IMX8MN_SYS_PLL1_40M48 +#define IMX8MN_SYS_PLL1_80M49 +#define IMX8MN_SYS_PLL1_100M 50 +#define 
IMX8MN_SYS_PLL1_133M 51 +#define IMX8MN_SYS_PLL1_160M 52 +#define IMX8MN_SYS_PLL1_200M 53 +#define IMX8MN_SYS_PLL1_266M 54 +#define IMX8MN_SYS_PLL1_400M
[PATCH V2 2/3] clk: imx: Add support for i.MX8MN clock driver
From: Anson Huang This patch adds i.MX8MN clock driver support. Signed-off-by: Anson Huang --- Changes since V1: - add GPIOx clocks. --- drivers/clk/imx/Kconfig | 6 + drivers/clk/imx/Makefile | 1 + drivers/clk/imx/clk-imx8mn.c | 614 +++ 3 files changed, 621 insertions(+) create mode 100644 drivers/clk/imx/clk-imx8mn.c diff --git a/drivers/clk/imx/Kconfig b/drivers/clk/imx/Kconfig index 0eaf418..1ac0c79 100644 --- a/drivers/clk/imx/Kconfig +++ b/drivers/clk/imx/Kconfig @@ -14,6 +14,12 @@ config CLK_IMX8MM help Build the driver for i.MX8MM CCM Clock Driver +config CLK_IMX8MN + bool "IMX8MN CCM Clock Driver" + depends on ARCH_MXC && ARM64 + help + Build the driver for i.MX8MN CCM Clock Driver + config CLK_IMX8MQ bool "IMX8MQ CCM Clock Driver" depends on ARCH_MXC && ARM64 diff --git a/drivers/clk/imx/Makefile b/drivers/clk/imx/Makefile index 05641c6..70a55cd 100644 --- a/drivers/clk/imx/Makefile +++ b/drivers/clk/imx/Makefile @@ -25,6 +25,7 @@ obj-$(CONFIG_MXC_CLK_SCU) += \ clk-scu.o \ clk-lpcg-scu.o +obj-$(CONFIG_CLK_IMX8MN) += clk-imx8mn.o obj-$(CONFIG_CLK_IMX8MM) += clk-imx8mm.o obj-$(CONFIG_CLK_IMX8MQ) += clk-imx8mq.o obj-$(CONFIG_CLK_IMX8QXP) += clk-imx8qxp.o clk-imx8qxp-lpcg.o diff --git a/drivers/clk/imx/clk-imx8mn.c b/drivers/clk/imx/clk-imx8mn.c new file mode 100644 index 000..7a92c75a --- /dev/null +++ b/drivers/clk/imx/clk-imx8mn.c @@ -0,0 +1,614 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2018-2019 NXP. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "clk.h" + +static u32 share_count_sai2; +static u32 share_count_sai3; +static u32 share_count_sai5; +static u32 share_count_sai6; +static u32 share_count_sai7; +static u32 share_count_disp; +static u32 share_count_pdm; +static u32 share_count_nand; + +enum { + ARM_PLL, + GPU_PLL, + VPU_PLL, + SYS_PLL1, + SYS_PLL2, + SYS_PLL3, + DRAM_PLL, + AUDIO_PLL1, + AUDIO_PLL2, + VIDEO_PLL2, + NR_PLLS, +}; + +#define PLL_1416X_RATE(_rate, _m, _p, _s) \ + { \ + .rate = (_rate),\ + .mdiv = (_m), \ + .pdiv = (_p), \ + .sdiv = (_s), \ + } + +#define PLL_1443X_RATE(_rate, _m, _p, _s, _k) \ + { \ + .rate = (_rate),\ + .mdiv = (_m), \ + .pdiv = (_p), \ + .sdiv = (_s), \ + .kdiv = (_k), \ + } + +static const struct imx_pll14xx_rate_table imx8mn_pll1416x_tbl[] = { + PLL_1416X_RATE(18U, 225, 3, 0), + PLL_1416X_RATE(16U, 200, 3, 0), + PLL_1416X_RATE(12U, 300, 3, 1), + PLL_1416X_RATE(10U, 250, 3, 1), + PLL_1416X_RATE(8U, 200, 3, 1), + PLL_1416X_RATE(75000U, 250, 2, 2), + PLL_1416X_RATE(7U, 350, 3, 2), + PLL_1416X_RATE(6U, 300, 3, 2), +}; + +static const struct imx_pll14xx_rate_table imx8mn_audiopll_tbl[] = { + PLL_1443X_RATE(786432000U, 655, 5, 2, 23593), + PLL_1443X_RATE(722534400U, 301, 5, 1, 3670), +}; + +static const struct imx_pll14xx_rate_table imx8mn_videopll_tbl[] = { + PLL_1443X_RATE(65000U, 325, 3, 2, 0), + PLL_1443X_RATE(59400U, 198, 2, 2, 0), +}; + +static const struct imx_pll14xx_rate_table imx8mn_drampll_tbl[] = { + PLL_1443X_RATE(65000U, 325, 3, 2, 0), +}; + +static struct imx_pll14xx_clk imx8mn_audio_pll __initdata = { + .type = PLL_1443X, + .rate_table = imx8mn_audiopll_tbl, +}; + +static struct imx_pll14xx_clk imx8mn_video_pll __initdata = { + .type = PLL_1443X, + .rate_table = imx8mn_videopll_tbl, +}; + +static struct imx_pll14xx_clk imx8mn_dram_pll __initdata = { + .type = PLL_1443X, + .rate_table = imx8mn_drampll_tbl, +}; + +static struct 
imx_pll14xx_clk imx8mn_arm_pll __initdata = { + .type = PLL_1416X, + .rate_table = imx8mn_pll1416x_tbl, +}; + +static struct imx_pll14xx_clk imx8mn_gpu_pll __initdata = { + .type = PLL_1416X, + .rate_table = imx8mn_pll1416x_tbl, +}; + +static struct imx_pll14xx_clk imx8mn_vpu_pll __initdata = { + .type = PLL_1416X, + .rate_table = imx8mn_pll1416x_tbl, +}; + +static struct imx_pll14xx_clk imx8mn_sys_pll __initdata = { + .type = PLL_1416X, + .rate_table = imx8mn_pll1416x_tbl,
Re: [PATCH v4] ARM: dts: aspeed: Add YADRO VESNIN BMC
On Fri, 31 May 2019, at 18:40, Alexander Filippov wrote: > VESNIN is an OpenPower machine with an Aspeed 2400 BMC SoC manufactured > by YADRO. > > Signed-off-by: Alexander Filippov Reviewed-by: Andrew Jeffery > --- > arch/arm/boot/dts/Makefile | 1 + > arch/arm/boot/dts/aspeed-bmc-opp-vesnin.dts | 224 > 2 files changed, 225 insertions(+) > create mode 100644 arch/arm/boot/dts/aspeed-bmc-opp-vesnin.dts > > diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile > index dab2914fa293..64a956372fe1 100644 > --- a/arch/arm/boot/dts/Makefile > +++ b/arch/arm/boot/dts/Makefile > @@ -1272,6 +1272,7 @@ dtb-$(CONFIG_ARCH_ASPEED) += \ > aspeed-bmc-opp-lanyang.dtb \ > aspeed-bmc-opp-palmetto.dtb \ > aspeed-bmc-opp-romulus.dtb \ > + aspeed-bmc-opp-vesnin.dtb \ > aspeed-bmc-opp-witherspoon.dtb \ > aspeed-bmc-opp-zaius.dtb \ > aspeed-bmc-portwell-neptune.dtb \ > diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-vesnin.dts > b/arch/arm/boot/dts/aspeed-bmc-opp-vesnin.dts > new file mode 100644 > index ..0b9e29c3212e > --- /dev/null > +++ b/arch/arm/boot/dts/aspeed-bmc-opp-vesnin.dts > @@ -0,0 +1,224 @@ > +// SPDX-License-Identifier: GPL-2.0+ > +// Copyright 2019 YADRO > +/dts-v1/; > + > +#include "aspeed-g4.dtsi" > +#include > + > +/ { > + model = "Vesnin BMC"; > + compatible = "yadro,vesnin-bmc", "aspeed,ast2400"; > + > + chosen { > + stdout-path = > + bootargs = "console=ttyS4,115200 earlyprintk"; > + }; > + > + memory { > + reg = <0x4000 0x2000>; > + }; > + > + reserved-memory { > + #address-cells = <1>; > + #size-cells = <1>; > + ranges; > + > + vga_memory: framebuffer@5f00 { > + no-map; > + reg = <0x5f00 0x0100>; /* 16MB */ > + }; > + flash_memory: region@5c00 { > + no-map; > + reg = <0x5c00 0x0200>; /* 32M */ > + }; > + }; > + > + leds { > + compatible = "gpio-leds"; > + > + heartbeat { > + gpios = < ASPEED_GPIO(R, 4) GPIO_ACTIVE_LOW>; > + }; > + power_red { > + gpios = < ASPEED_GPIO(N, 1) GPIO_ACTIVE_LOW>; > + }; > + > + id_blue { > + gpios = < ASPEED_GPIO(O, 0) 
GPIO_ACTIVE_LOW>; > + }; > + > + alarm_red { > + gpios = < ASPEED_GPIO(N, 6) GPIO_ACTIVE_LOW>; > + }; > + > + alarm_yel { > + gpios = < ASPEED_GPIO(N, 7) GPIO_ACTIVE_HIGH>; > + }; > + }; > + > + gpio-keys { > + compatible = "gpio-keys"; > + > + button_checkstop { > + label = "checkstop"; > + linux,code = <74>; > + gpios = < ASPEED_GPIO(P, 5) GPIO_ACTIVE_LOW>; > + }; > + > + button_identify { > + label = "identify"; > + linux,code = <152>; > + gpios = < ASPEED_GPIO(O, 7) GPIO_ACTIVE_LOW>; > + }; > + }; > +}; > + > + { > + status = "okay"; > + flash@0 { > + status = "okay"; > + m25p,fast-read; > +label = "bmc"; > +#include "openbmc-flash-layout.dtsi" > + }; > +}; > + > + { > + status = "okay"; > + pinctrl-names = "default"; > + pinctrl-0 = <_spi1debug_default>; > + > + flash@0 { > + status = "okay"; > + label = "pnor"; > + m25p,fast-read; > + }; > +}; > + > + { > + status = "okay"; > + > + use-ncsi; > + no-hw-checksum; > + > + pinctrl-names = "default"; > + pinctrl-0 = <_rmii1_default>; > +}; > + > + > + { > + status = "okay"; > +}; > + > +_ctrl { > + status = "okay"; > + memory-region = <_memory>; > + flash = <>; > +}; > + > + { > + status = "okay"; > +}; > + > + { > + status = "okay"; > + pinctrl-names = "default"; > + pinctrl-0 = <_txd2_default _rxd2_default>; > +}; > + > + { > + status = "okay"; > + > + eeprom@50 { > + compatible = "atmel,24c256"; > + reg = <0x50>; > + pagesize = <64>; > + }; > +}; > + > + { > + status = "okay"; > + > + tmp75@49 { > + compatible = "ti,tmp75"; > + reg = <0x49>; > + }; > +}; > + > + { > + status = "okay"; > +}; > + > + { > + status = "okay"; > +}; > + > + { > + status = "okay"; > + > + occ-hwmon@50 { > + compatible = "ibm,p8-occ-hwmon"; > + reg = <0x50>; > + }; > +}; > + > + { > + status = "okay"; > + > + occ-hwmon@51 { > + compatible = "ibm,p8-occ-hwmon"; > + reg = <0x51>; > + }; > +}; > + > + { > + status = "okay";
[PATCH V2 2/3] arm64: dts: freescale: Add i.MX8MN dtsi support
From: Anson Huang The i.MX8M Nano Media Applications Processor is a new SoC of the i.MX8M family, it is a 14nm FinFET product of the growing mScale family targeting the consumer market. It is built in Samsung 14LPP to achieve both high performance and low power consumption and relies on a powerful fully coherent core complex based on a quad core ARM Cortex-A53 cluster, Cortex-M7 low-power coprocessor and graphics accelerator. This patch adds the basic dtsi support for i.MX8MN. Signed-off-by: Anson Huang --- Changes since V1: - fix build warnings of soc/aips bus unit name and reg properties; - move gic into soc node; - move usbphynop1/usbphynop2 node outside the soc node. --- arch/arm64/boot/dts/freescale/imx8mn.dtsi | 710 ++ 1 file changed, 710 insertions(+) create mode 100644 arch/arm64/boot/dts/freescale/imx8mn.dtsi diff --git a/arch/arm64/boot/dts/freescale/imx8mn.dtsi b/arch/arm64/boot/dts/freescale/imx8mn.dtsi new file mode 100644 index 000..1fb9148 --- /dev/null +++ b/arch/arm64/boot/dts/freescale/imx8mn.dtsi @@ -0,0 +1,710 @@ +// SPDX-License-Identifier: (GPL-2.0+ OR MIT) +/* + * Copyright 2019 NXP + */ + +#include +#include +#include +#include + +#include "imx8mn-pinfunc.h" + +/ { + compatible = "fsl,imx8mn"; + interrupt-parent = <>; + #address-cells = <2>; + #size-cells = <2>; + + aliases { + ethernet0 = + gpio0 = + gpio1 = + gpio2 = + gpio3 = + gpio4 = + i2c0 = + i2c1 = + i2c2 = + i2c3 = + mmc0 = + mmc1 = + mmc2 = + serial0 = + serial1 = + serial2 = + serial3 = + spi0 = + spi1 = + spi2 = + }; + + cpus { + #address-cells = <1>; + #size-cells = <0>; + + A53_0: cpu@0 { + device_type = "cpu"; + compatible = "arm,cortex-a53"; + reg = <0x0>; + clock-latency = <61036>; + clocks = < IMX8MN_CLK_ARM>; + enable-method = "psci"; + next-level-cache = <_L2>; + }; + + A53_1: cpu@1 { + device_type = "cpu"; + compatible = "arm,cortex-a53"; + reg = <0x1>; + clock-latency = <61036>; + clocks = < IMX8MN_CLK_ARM>; + enable-method = "psci"; + next-level-cache = <_L2>; + }; + + 
A53_2: cpu@2 { + device_type = "cpu"; + compatible = "arm,cortex-a53"; + reg = <0x2>; + clock-latency = <61036>; + clocks = < IMX8MN_CLK_ARM>; + enable-method = "psci"; + next-level-cache = <_L2>; + }; + + A53_3: cpu@3 { + device_type = "cpu"; + compatible = "arm,cortex-a53"; + reg = <0x3>; + clock-latency = <61036>; + clocks = < IMX8MN_CLK_ARM>; + enable-method = "psci"; + next-level-cache = <_L2>; + }; + + A53_L2: l2-cache0 { + compatible = "cache"; + }; + }; + + memory@4000 { + device_type = "memory"; + reg = <0x0 0x4000 0 0x8000>; + }; + + osc_32k: clock-osc-32k { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <32768>; + clock-output-names = "osc_32k"; + }; + + osc_24m: clock-osc-24m { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <2400>; + clock-output-names = "osc_24m"; + }; + + clk_ext1: clock-ext1 { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <13300>; + clock-output-names = "clk_ext1"; + }; + + clk_ext2: clock-ext2 { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <13300>; + clock-output-names = "clk_ext2"; + }; + + clk_ext3: clock-ext3 { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <13300>; + clock-output-names = "clk_ext3"; + }; + + clk_ext4: clock-ext4 { + compatible = "fixed-clock"; +
[PATCH V2 3/3] arm64: dts: freescale: Add i.MX8MN DDR4 EVK board support
From: Anson Huang This patch adds basic i.MX8MN DDR4 EVK board support. Signed-off-by: Anson Huang --- arch/arm64/boot/dts/freescale/Makefile| 1 + arch/arm64/boot/dts/freescale/imx8mn-ddr4-evk.dts | 217 ++ 2 files changed, 218 insertions(+) create mode 100644 arch/arm64/boot/dts/freescale/imx8mn-ddr4-evk.dts diff --git a/arch/arm64/boot/dts/freescale/Makefile b/arch/arm64/boot/dts/freescale/Makefile index 0bd122f..2cdd4cc 100644 --- a/arch/arm64/boot/dts/freescale/Makefile +++ b/arch/arm64/boot/dts/freescale/Makefile @@ -20,6 +20,7 @@ dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-rdb.dtb dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-qds.dtb dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-rdb.dtb +dtb-$(CONFIG_ARCH_MXC) += imx8mn-ddr4-evk.dtb dtb-$(CONFIG_ARCH_MXC) += imx8mm-evk.dtb dtb-$(CONFIG_ARCH_MXC) += imx8mq-evk.dtb dtb-$(CONFIG_ARCH_MXC) += imx8mq-zii-ultra-rmb3.dtb diff --git a/arch/arm64/boot/dts/freescale/imx8mn-ddr4-evk.dts b/arch/arm64/boot/dts/freescale/imx8mn-ddr4-evk.dts new file mode 100644 index 000..da552c2 --- /dev/null +++ b/arch/arm64/boot/dts/freescale/imx8mn-ddr4-evk.dts @@ -0,0 +1,217 @@ +// SPDX-License-Identifier: (GPL-2.0+ OR MIT) +/* + * Copyright 2019 NXP + */ + +/dts-v1/; + +#include "imx8mn.dtsi" + +/ { + model = "NXP i.MX8MNano DDR4 EVK board"; + compatible = "fsl,imx8mn-ddr4-evk", "fsl,imx8mn"; + + chosen { + stdout-path = + }; + + reg_usdhc2_vmmc: regulator-usdhc2 { + compatible = "regulator-fixed"; + pinctrl-names = "default"; + pinctrl-0 = <_reg_usdhc2_vmmc>; + regulator-name = "VSD_3V3"; + regulator-min-microvolt = <330>; + regulator-max-microvolt = <330>; + gpio = < 19 GPIO_ACTIVE_HIGH>; + enable-active-high; + }; +}; + + { + pinctrl-names = "default"; + + pinctrl_fec1: fec1grp { + fsl,pins = < + MX8MN_IOMUXC_ENET_MDC_ENET1_MDC 0x3 + MX8MN_IOMUXC_ENET_MDIO_ENET1_MDIO 0x3 + MX8MN_IOMUXC_ENET_TD3_ENET1_RGMII_TD3 0x1f + MX8MN_IOMUXC_ENET_TD2_ENET1_RGMII_TD2 0x1f + MX8MN_IOMUXC_ENET_TD1_ENET1_RGMII_TD1 0x1f + 
MX8MN_IOMUXC_ENET_TD0_ENET1_RGMII_TD0 0x1f + MX8MN_IOMUXC_ENET_RD3_ENET1_RGMII_RD3 0x91 + MX8MN_IOMUXC_ENET_RD2_ENET1_RGMII_RD2 0x91 + MX8MN_IOMUXC_ENET_RD1_ENET1_RGMII_RD1 0x91 + MX8MN_IOMUXC_ENET_RD0_ENET1_RGMII_RD0 0x91 + MX8MN_IOMUXC_ENET_TXC_ENET1_RGMII_TXC 0x1f + MX8MN_IOMUXC_ENET_RXC_ENET1_RGMII_RXC 0x91 + MX8MN_IOMUXC_ENET_RX_CTL_ENET1_RGMII_RX_CTL 0x91 + MX8MN_IOMUXC_ENET_TX_CTL_ENET1_RGMII_TX_CTL 0x1f + MX8MN_IOMUXC_SAI2_RXC_GPIO4_IO220x19 + >; + }; + + pinctrl_reg_usdhc2_vmmc: regusdhc2vmmc { + fsl,pins = < + MX8MN_IOMUXC_SD2_RESET_B_GPIO2_IO19 0x41 + >; + }; + + pinctrl_uart2: uart2grp { + fsl,pins = < + MX8MN_IOMUXC_UART2_RXD_UART2_DCE_RX 0x140 + MX8MN_IOMUXC_UART2_TXD_UART2_DCE_TX 0x140 + >; + }; + + pinctrl_usdhc2_gpio: usdhc2grpgpio { + fsl,pins = < + MX8MN_IOMUXC_GPIO1_IO15_GPIO1_IO15 0x1c4 + >; + }; + + pinctrl_usdhc2: usdhc2grp { + fsl,pins = < + MX8MN_IOMUXC_SD2_CLK_USDHC2_CLK 0x190 + MX8MN_IOMUXC_SD2_CMD_USDHC2_CMD 0x1d0 + MX8MN_IOMUXC_SD2_DATA0_USDHC2_DATA0 0x1d0 + MX8MN_IOMUXC_SD2_DATA1_USDHC2_DATA1 0x1d0 + MX8MN_IOMUXC_SD2_DATA2_USDHC2_DATA2 0x1d0 + MX8MN_IOMUXC_SD2_DATA3_USDHC2_DATA3 0x1d0 + MX8MN_IOMUXC_GPIO1_IO04_USDHC2_VSELECT 0x1d0 + >; + }; + + pinctrl_usdhc2_100mhz: usdhc2grp100mhz { + fsl,pins = < + MX8MN_IOMUXC_SD2_CLK_USDHC2_CLK 0x194 + MX8MN_IOMUXC_SD2_CMD_USDHC2_CMD 0x1d4 + MX8MN_IOMUXC_SD2_DATA0_USDHC2_DATA0 0x1d4 + MX8MN_IOMUXC_SD2_DATA1_USDHC2_DATA1 0x1d4 + MX8MN_IOMUXC_SD2_DATA2_USDHC2_DATA2 0x1d4 + MX8MN_IOMUXC_SD2_DATA3_USDHC2_DATA3 0x1d4 + MX8MN_IOMUXC_GPIO1_IO04_USDHC2_VSELECT 0x1d0 + >; + }; + + pinctrl_usdhc2_200mhz: usdhc2grp200mhz { + fsl,pins = < +
[PATCH V2 1/3] dt-bindings: arm: imx: Add the soc binding for i.MX8MN
From: Anson Huang This patch adds the soc & board binding for i.MX8MN. Signed-off-by: Anson Huang --- No changes. --- Documentation/devicetree/bindings/arm/fsl.yaml | 6 ++ 1 file changed, 6 insertions(+) diff --git a/Documentation/devicetree/bindings/arm/fsl.yaml b/Documentation/devicetree/bindings/arm/fsl.yaml index 407138e..b1a5231 100644 --- a/Documentation/devicetree/bindings/arm/fsl.yaml +++ b/Documentation/devicetree/bindings/arm/fsl.yaml @@ -171,6 +171,12 @@ properties: - const: compulab,cl-som-imx7 - const: fsl,imx7d + - description: i.MX8MN based Boards +items: + - enum: + - fsl,imx8mn-ddr4-evk# i.MX8MN DDR4 EVK Board + - const: fsl,imx8mn + - description: i.MX8MM based Boards items: - enum: -- 2.7.4