Re: [PATCH] powerpc/time: use get_tb instead of get_vtb in running_clock
Hi Ben,

I added some printk logs in watchdog_timer_fn in the guest:

[   16.025222] get_vtb=8236291881, get_tb=13756711357, get_timestamp=4
[   20.025624] get_vtb=9745285807, get_tb=15804711283, get_timestamp=7
[   24.025042] get_vtb=11518119641, get_tb=17852711085, get_timestamp=10
[   28.024074] get_vtb=13192704319, get_tb=19900711071, get_timestamp=13
[   32.024086] get_vtb=14856516982, get_tb=21948711066, get_timestamp=16
[   36.024075] get_vtb=16569127618, get_tb=23996711078, get_timestamp=20
[   40.024138] get_vtb=17008865823, get_tb=26044718418, get_timestamp=20
[   44.023993] get_vtb=17020637241, get_tb=28092716383, get_timestamp=20
[   48.023996] get_vtb=17022857170, get_tb=30140718472, get_timestamp=20
[   52.023996] get_vtb=17024268541, get_tb=32188718432, get_timestamp=20
[   56.023996] get_vtb=17036577783, get_tb=34236718077, get_timestamp=20
[   60.023996] get_vtb=17037829743, get_tb=36284718437, get_timestamp=20
[   64.023992] get_vtb=17039846747, get_tb=38332716609, get_timestamp=20
[   68.023991] get_vtb=17041448345, get_tb=40380715903, get_timestamp=20

get_timestamp (which uses get_vtb(); unit: seconds) falls behind the printk
timestamps. You can also clearly see that get_vtb advances more slowly than
get_tb. Without this patch, I think some softlockup warnings could be missed
in the guest.

-Jia

On 13/07/2017 6:45 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2017-07-12 at 23:01 +0800, Jia He wrote:
>> Virtual time base (vtb) is a register which increases only in the guest.
>> Any exit from guest to host stops the vtb (it is saved and restored by
>> kvm). But if an I/O causes the guest to exit to the host, the guest's
>> watchdog (watchdog_timer_fn -> is_softlockup -> get_timestamp ->
>> running_clock) needs to also include the time elapsed in the host, and
>> get_vtb is not correct in this case. Also, the TB_OFFSET is properly
>> saved and restored by qemu since commit [1]. So we can use get_tb here.
>
> That completely defeats the purpose here... This was done specifically to
> exploit the VTB, which doesn't count in hypervisor mode.

[1] http://git.qemu.org/?p=qemu.git;a=commit;h=42043e4f1

Signed-off-by: Jia He
---
 arch/powerpc/kernel/time.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index fe6f3a2..c542dd3 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -695,16 +695,15 @@ notrace unsigned long long sched_clock(void)
 unsigned long long running_clock(void)
 {
 	/*
-	 * Don't read the VTB as a host since KVM does not switch in host
-	 * timebase into the VTB when it takes a guest off the CPU, reading the
-	 * VTB would result in reading 'last switched out' guest VTB.
+	 * Use get_tb instead of get_vtb for guest since the TB_OFFSET has been
+	 * well saved/restored when qemu does suspend/resume.
 	 *
 	 * Host kernels are often compiled with CONFIG_PPC_PSERIES checked, it
 	 * would be unsafe to rely only on the #ifdef above.
 	 */
 	if (firmware_has_feature(FW_FEATURE_LPAR) &&
 	    cpu_has_feature(CPU_FTR_ARCH_207S))
-		return mulhdu(get_vtb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
+		return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;

 	/*
 	 * This is a next best approximation without a VTB.
Re: [PATCH v2] mm/vmscan: fix high cpu usage of kswapd if there are no reclaimable pages
Hi Johannes,

I have another concern. The code path kswapd -> balance_pgdat ->
age_active_anon does some background work to age the anon list. Will this
patch affect that if the retry count exceeds 16 and kswapd is not woken up?

B.R.
Jia

On 28/02/2017 1:06 AM, Johannes Weiner wrote:
> On Mon, Feb 27, 2017 at 09:50:24AM +0100, Michal Hocko wrote:
>> On Fri 24-02-17 11:51:05, Johannes Weiner wrote:
>> [...]
>>> From 29fefdca148e28830e0934d4e6cceb95ed2ee36e Mon Sep 17 00:00:00 2001
>>> From: Johannes Weiner
>>> Date: Fri, 24 Feb 2017 10:56:32 -0500
>>> Subject: [PATCH] mm: vmscan: disable kswapd on unreclaimable nodes
>>>
>>> Jia He reports a problem with kswapd spinning at 100% CPU when
>>> requesting more hugepages than memory available in the system:
>>>
>>> $ echo 4000 >/proc/sys/vm/nr_hugepages
>>>
>>> top - 13:42:59 up  3:37,  1 user,  load average: 1.09, 1.03, 1.01
>>> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
>>> %Cpu(s):  0.0 us, 12.5 sy,  0.0 ni, 85.5 id,  2.0 wa,  0.0 hi,  0.0 si,  0.0 st
>>> KiB Mem:  31371520 total, 30915136 used,   456384 free,      320 buffers
>>> KiB Swap:  6284224 total,   115712 used,  6168512 free.    48192 cached Mem
>>>
>>>   PID USER  PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>>>    76 root  20   0       0      0      0 R 100.0 0.000 217:17.29 kswapd3
>>>
>>> At that time, there are no reclaimable pages left in the node, but as
>>> kswapd fails to restore the high watermarks it refuses to go to sleep.
>>>
>>> Kswapd needs to back away from nodes that fail to balance. Up until
>>> 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of nodes")
>>> kswapd had such a mechanism. It considered zones whose theoretically
>>> reclaimable pages it had reclaimed six times over as unreclaimable and
>>> backed away from them. This guard was erroneously removed as the patch
>>> changed the definition of a balanced node.
>>>
>>> However, simply restoring this code wouldn't help in the case reported
>>> here: there *are* no reclaimable pages that could be scanned until the
>>> threshold is met. Kswapd would stay awake anyway.
>>>
>>> Introduce a new and much simpler way of backing off. If kswapd runs
>>> through MAX_RECLAIM_RETRIES (16) cycles without reclaiming a single
>>> page, make it back off from the node. This is the same number of shots
>>> direct reclaim takes before declaring OOM. Kswapd will go to sleep on
>>> that node until a direct reclaimer manages to reclaim some pages, thus
>>> proving the node reclaimable again.
>>
>> Yes, this looks nice & simple. I would just be worried about [1] a bit.
>> Maybe that is worth a separate patch though.
>>
>> [1] http://lkml.kernel.org/r/20170223111609.hlncnvokhq3qu...@dhcp22.suse.cz
>
> I think I'd prefer the simplicity of keeping this contained inside
> vmscan.c, as an interaction between direct reclaimers and kswapd, as
> well as leaving the wakeup tied to actually seeing reclaimable pages
> rather than merely producing free pages (e.g. should we also add a kick
> to a large munmap() for example?). OOM kills come with such high
> latencies that I cannot imagine a slightly quicker kswapd restart would
> matter in practice.
>
>>> Reported-by: Jia He
>>> Signed-off-by: Johannes Weiner
>>
>> Acked-by: Michal Hocko
>
> Thanks!
>
>> I would have just one more suggestion. Please move MAX_RECLAIM_RETRIES
>> to mm/internal.h. This is an MM-internal thing and there is no need to
>> make it visible.
>
> Good point, I'll move it.
Re: [PATCH v2] mm/vmscan: fix high cpu usage of kswapd if there are no reclaimable pages
Hi,

Tested-by: Jia He

cat /proc/meminfo
[...]
CmaFree:               0 kB
HugePages_Total:    1831
HugePages_Free:     1831
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:      16384 kB

top - 06:50:29 up  1:26,  1 user,  load average: 0.00, 0.00, 0.00
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.2 sy,  0.0 ni, 99.6 id,  0.2 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  31371520 total, 30577664 used,   793856 free,      256 buffers
KiB Swap:  6284224 total,      128 used,  6284096 free.   281280 cached Mem

  PID USER  PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   79 root  20   0       0      0      0 S 0.000 0.000   0:00.00 kswapd3

On 25/02/2017 12:51 AM, Johannes Weiner wrote:
> On Fri, Feb 24, 2017 at 09:49:50AM +0100, Michal Hocko wrote:
>> I believe we should pursue the proposal from Johannes which is more
>> generic and copes with corner cases much better.
>
> Jia, can you try this? I'll put the cleanups in follow-up patches.
>
> ---
> From 29fefdca148e28830e0934d4e6cceb95ed2ee36e Mon Sep 17 00:00:00 2001
> From: Johannes Weiner
> Date: Fri, 24 Feb 2017 10:56:32 -0500
> Subject: [PATCH] mm: vmscan: disable kswapd on unreclaimable nodes
>
> Jia He reports a problem with kswapd spinning at 100% CPU when
> requesting more hugepages than memory available in the system:
>
> $ echo 4000 >/proc/sys/vm/nr_hugepages
>
> top - 13:42:59 up  3:37,  1 user,  load average: 1.09, 1.03, 1.01
> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  0.0 us, 12.5 sy,  0.0 ni, 85.5 id,  2.0 wa,  0.0 hi,  0.0 si,  0.0 st
> KiB Mem:  31371520 total, 30915136 used,   456384 free,      320 buffers
> KiB Swap:  6284224 total,   115712 used,  6168512 free.    48192 cached Mem
>
>   PID USER  PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>    76 root  20   0       0      0      0 R 100.0 0.000 217:17.29 kswapd3
>
> At that time, there are no reclaimable pages left in the node, but as
> kswapd fails to restore the high watermarks it refuses to go to sleep.
>
> Kswapd needs to back away from nodes that fail to balance. Up until
> 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of nodes")
> kswapd had such a mechanism. It considered zones whose theoretically
> reclaimable pages it had reclaimed six times over as unreclaimable and
> backed away from them. This guard was erroneously removed as the patch
> changed the definition of a balanced node.
>
> However, simply restoring this code wouldn't help in the case reported
> here: there *are* no reclaimable pages that could be scanned until the
> threshold is met. Kswapd would stay awake anyway.
>
> Introduce a new and much simpler way of backing off. If kswapd runs
> through MAX_RECLAIM_RETRIES (16) cycles without reclaiming a single
> page, make it back off from the node. This is the same number of shots
> direct reclaim takes before declaring OOM. Kswapd will go to sleep on
> that node until a direct reclaimer manages to reclaim some pages, thus
> proving the node reclaimable again.
>
> Reported-by: Jia He
> Signed-off-by: Johannes Weiner
> ---
>  include/linux/mmzone.h |  2 ++
>  include/linux/swap.h   |  1 +
>  mm/page_alloc.c        |  6 ------
>  mm/vmscan.c            | 20 ++++++++++++++++++++
>  4 files changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 8e02b3750fe0..d2c50ab6ae40 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -630,6 +630,8 @@ typedef struct pglist_data {
>  	int kswapd_order;
>  	enum zone_type kswapd_classzone_idx;
>
> +	int kswapd_failures;	/* Number of 'reclaimed == 0' runs */
> +
>  #ifdef CONFIG_COMPACTION
>  	int kcompactd_max_order;
>  	enum zone_type kcompactd_classzone_idx;
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 45e91dd6716d..5c06581a730b 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -288,6 +288,7 @@ extern void lru_cache_add_active_or_unevictable(struct page *page,
>  						struct vm_area_struct *vma);
>
>  /* linux/mm/vmscan.c */
> +#define MAX_RECLAIM_RETRIES 16
>  extern unsigned long zone_reclaimable_pages(struct zone *zone);
>  extern unsigned long pgdat_reclaimable_pages(struct pglist_data *pgdat);
>  extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 614cd0397ce3..83f0442f07fa 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3516,12 +3516,6 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
>  }
>
>  /*
> - * Maximum number of reclaim retries without any progress before OOM killer
> - * is consider as the only way to move forward.
> - */
> -#define MAX_RECLAIM_RETRIES 16
> -
> -/*
>   * Checks whether it makes sense to retry the reclaim to make a forward progress
>   * for the given allocation request.
>   * The reclaim feedback represented by did_some_progress (any progress during
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 26c3b405ef34..8e9bdd172182 100644
Fwd: Re: [RFC PATCH] mm/vmscan: fix high cpu usage of kswapd if there
resend it to lkml only.

-------- Forwarded Message --------
Subject: Re: [RFC PATCH] mm/vmscan: fix high cpu usage of kswapd if there
Date: Thu, 23 Feb 2017 10:46:01 +0800
From: hejianet
To: Johannes Weiner
CC: linux...@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
    Mel Gorman, Vlastimil Babka, Michal Hocko, Minchan Kim, Rik van Riel

sorry, resend it due to a delivery failure:
"Wrong MIME labeling on 8-bit character texts"
I am sorry if anybody received it twice.

Hi Johannes,

On 23/02/2017 4:16 AM, Johannes Weiner wrote:
> On Wed, Feb 22, 2017 at 05:04:48PM +0800, Jia He wrote:
>> When I try to dynamically allocate more hugepages than the system's
>> total free memory, kswapd takes 100% cpu for a long time (more than 3
>> hours, with no sign of ending). The root cause is that kswapd3 tries
>> to reclaim again and again but makes no progress. At that time, there
>> are no reclaimable pages in that node:
>
> Yes, this is a problem with the current kswapd code.
>
> A less artificial scenario that I observed recently was machines with
> two NUMA nodes, after being up for 200+ days, getting into a state
> where node0 is mostly consumed by anon and some kernel allocations,
> leaving less than the high watermark free. The machines don't have
> swap, so the anon isn't reclaimable. But also, the anon LRU is never
> even *scanned*, so the "all unreclaimable" logic doesn't kick in.
> Kswapd is spinning at 100% CPU calculating scan counts and checking
> zone states.
>
> One specific problem with your patch, Jia, is that there might be some
> cache pages that are pinned one way or another. That was the case on
> our machines, and so reclaimable pages wasn't 0. Even if we check the
> reclaimable pages, we need a hard cutoff after X attempts. And then it
> sounds pretty much like what the allocator/direct reclaim already does.
>
> Can we use the *exact* same cutoff conditions for direct reclaim and
> kswapd, though? I don't think so. For direct reclaim, the goal is the
> watermark, to make an allocation happen in the caller. While kswapd
> tries to restore the watermarks too, it might never meet them but still
> do useful work on behalf of concurrently allocating threads. It should
> only stop when it tries and fails to free any pages at all.

Yes, this is what I thought before this patch, but it seems Michal
doesn't like the idea :)
Please see https://lkml.org/lkml/2017/1/24/543

Frankly speaking, it had some impact, but not much, on the kswapd high
cpu usage (I haven't tested your patch here, but I did test my 1st
patch). That is because, when the memory in nodeN is mostly consumed,
any direct reclaim path will try to wake up kswapd:

__alloc_pages_slowpath
    wake_all_kswapds
        wakeup_kswapd
    (retried up to 16 times)

Compared with the idea of limiting kswapd's no-progress loop, avoiding
the kswapd wakeup has the more significant impact on the high cpu
usage.

B.R.
Jia

> The recent removal of the scanned > reclaimable * 6 exit condition
> from kswapd was a mistake, we need something like it. But it's also
> broken the way we perform reclaim on swapless systems, so instead of
> re-instating it, we should remove the remnants and do something new.
>
> Can we simply count the number of balance_pgdat() runs that didn't
> reclaim anything and have kswapd sleep after MAX_RECLAIM_RETRIES?
>
> And a follow-up: once it gives up, when should kswapd return to work?
> We used to reset NR_PAGES_SCANNED whenever a page gets freed. But
> that's a branch in a common allocator path, just to recover kswapd - a
> latency tool, not a necessity for functional correctness - from a
> situation that's exceedingly rare. How about we leave it disabled
> until a direct reclaimer manages to free something?
Something like this (untested):

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 96c78d840916..611c5fb52d7b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -149,7 +149,6 @@ enum node_stat_item {
 	NR_UNEVICTABLE,		/*  "     "     "   "       "         */
 	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
 	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
-	NR_PAGES_SCANNED,	/* pages scanned since last reclaim */
 	PGSCAN_ANON,
 	PGSCAN_FILE,
 	PGSTEAL_ANON,
@@ -630,6 +629,8 @@ typedef struct pglist_data {
 	int kswapd_order;
 	enum zone_type kswapd_classzone_idx;

+	int kswapd_failed_runs;
+
 #ifdef CONFIG_COMPACTION
 	int kcompactd_max_order;
 	enum zone_type kcompactd_classzone_idx;
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 145005e6dc85..9de49e2710af 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -285,8 +285,8 @@ extern void lru_cache_add_active_or_unevictable(struct page *page, struc
Re: [RFC PATCH] mm/vmscan: fix high cpu usage of kswapd if there
Hi Michal,

On 22/02/2017 11:48 PM, Michal Hocko wrote:
> On Wed 22-02-17 22:31:50, hejianet wrote:
>> Hi Michal
>>
>> On 22/02/2017 7:41 PM, Michal Hocko wrote:
>>> On Wed 22-02-17 17:04:48, Jia He wrote:
>>>> When I try to dynamically allocate more hugepages than the system's
>>>> total free memory:
>>>> e.g. echo 4000 >/proc/sys/vm/nr_hugepages
>>>
>>> I assume that the command has terminated with less huge pages
>>> allocated than requested but
>>
>> Yes, in the end the allocated hugepages are less than 4000:
>> HugePages_Total:    1864
>> HugePages_Free:     1864
>> HugePages_Rsvd:        0
>> HugePages_Surp:        0
>> Hugepagesize:      16384 kB
>>
>> In the bad case, although kswapd takes 100% cpu, the number of
>> HugePages_Total does not increase at all.
>>
>>>> Node 3, zone DMA
>>>> [...]
>>>>   pages free     2951
>>>>         min      2821
>>>>         low      3526
>>>>         high     4231
>>>
>>> it left the zone below the high watermark with
>>>
>>>>    node_scanned  0
>>>>         spanned  245760
>>>>         present  245760
>>>>         managed  245388
>>>>   nr_free_pages 2951
>>>>   nr_zone_inactive_anon 0
>>>>   nr_zone_active_anon 0
>>>>   nr_zone_inactive_file 0
>>>>   nr_zone_active_file 0
>>>
>>> no pages reclaimable, so kswapd will not go to sleep. It would be
>>> quite easy and comfortable to call it a misconfiguration but it
>>> seems that it might be quite easy to hit with NUMA machines which
>>> have large differences in the node sizes. I guess it makes sense to
>>> back off the kswapd rather than burning CPU without any way to make
>>> forward progress.
>>
>> agree
>
> please make sure that this information is in the changelog
>
> [...]
>>>> @@ -3502,6 +3503,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
>>>>  {
>>>>  	pg_data_t *pgdat;
>>>>  	int z;
>>>> +	int node_has_relaimable_pages = 0;
>>>>
>>>>  	if (!managed_zone(zone))
>>>>  		return;
>>>> @@ -3522,8 +3524,15 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
>>>>  		if (zone_balanced(zone, order, classzone_idx))
>>>>  			return;
>>>> +
>>>> +		if (!zone_reclaimable_pages(zone))
>>>> +			node_has_relaimable_pages = 1;
>>>
>>> What, this doesn't make any sense? Did you mean
>>> if (zone_reclaimable_pages)?
>>
>> I mean, if any one zone has reclaimable pages, then this zone's
>> *node* has reclaimable pages. Thus, the kswapdN for this node should
>> be woken up. e.g. node 1 has 2 zones: zone A has no reclaimable pages
>> but zone B has. Thus node 1 has reclaimable pages, and kswapd1 will
>> be woken up. I use node_has_relaimable_pages in the loop to check all
>> the zones' reclaimable page counts, so I prefer the name
>> node_has_relaimable_pages instead of zone_has_relaimable_pages.
>
> I still do not understand. This code starts with
> node_has_relaimable_pages = 0. If you see a zone with no reclaimable
> pages then you make it node_has_relaimable_pages = 1, which means that
>
>>>> +	/* Dont wake kswapd if no reclaimable pages */
>>>> +	if (!node_has_relaimable_pages)
>>>> +		return;
>
> this will not hold and we will wake up the kswapd. I believe what you
> want instead is to skip the wake up if _all_ zones have
> !zone_reclaimable_pages(). Or am I missing your point? This means that
> you want
>
> 	if (zone_reclaimable_pages(zone))
> 		node_has_relaimable_pages = 1;

You are right, will send v2 soon after testing it.

B.R.
Jia

>>>> +	trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
>>>>  	wake_up_interruptible(&pgdat->kswapd_wait);
Re: [RFC PATCH] mm/vmscan: fix high cpu usage of kswapd if there
Hi Michal,

On 22/02/2017 7:41 PM, Michal Hocko wrote:
> On Wed 22-02-17 17:04:48, Jia He wrote:
>> When I try to dynamically allocate more hugepages than the system's
>> total free memory:
>> e.g. echo 4000 >/proc/sys/vm/nr_hugepages
>
> I assume that the command has terminated with less huge pages
> allocated than requested but

Yes, in the end the allocated hugepages are less than 4000:
HugePages_Total:    1864
HugePages_Free:     1864
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:      16384 kB

In the bad case, although kswapd takes 100% cpu, the number of
HugePages_Total does not increase at all.

>> Node 3, zone DMA
>> [...]
>>   pages free     2951
>>         min      2821
>>         low      3526
>>         high     4231
>
> it left the zone below the high watermark with
>
>>    node_scanned  0
>>         spanned  245760
>>         present  245760
>>         managed  245388
>>   nr_free_pages 2951
>>   nr_zone_inactive_anon 0
>>   nr_zone_active_anon 0
>>   nr_zone_inactive_file 0
>>   nr_zone_active_file 0
>
> no pages reclaimable, so kswapd will not go to sleep. It would be
> quite easy and comfortable to call it a misconfiguration but it seems
> that it might be quite easy to hit with NUMA machines which have large
> differences in the node sizes. I guess it makes sense to back off the
> kswapd rather than burning CPU without any way to make forward
> progress.

agree

> [...]
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 532a2a7..a05e3ab 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -3139,7 +3139,8 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, int classzone_idx)
>>  		if (!managed_zone(zone))
>>  			continue;
>>
>> -		if (!zone_balanced(zone, order, classzone_idx))
>> +		if (!zone_balanced(zone, order, classzone_idx)
>> +			&& zone_reclaimable_pages(zone))
>>  			return false;
>
> OK, this makes some sense, although zone_reclaimable_pages doesn't
> count SLAB reclaimable pages. So we might go to sleep with reclaimable
> slab still around. This is not really easy to address because
> reclaimable slab doesn't really imply that those pages will be
> reclaimed...

Yes, even in the bad case, when kswapd takes all the cpu, the
reclaimable page count does not decrease.

>>  }
>>
>> @@ -3502,6 +3503,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
>>  {
>>  	pg_data_t *pgdat;
>>  	int z;
>> +	int node_has_relaimable_pages = 0;
>>
>>  	if (!managed_zone(zone))
>>  		return;
>> @@ -3522,8 +3524,15 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
>>  		if (zone_balanced(zone, order, classzone_idx))
>>  			return;
>> +
>> +		if (!zone_reclaimable_pages(zone))
>> +			node_has_relaimable_pages = 1;
>
> What, this doesn't make any sense? Did you mean
> if (zone_reclaimable_pages)?

I mean, if any one zone has reclaimable pages, then this zone's *node*
has reclaimable pages. Thus, the kswapdN for this node should be woken
up. e.g. node 1 has 2 zones: zone A has no reclaimable pages but zone B
has. Thus node 1 has reclaimable pages, and kswapd1 will be woken up. I
use node_has_relaimable_pages in the loop to check all the zones'
reclaimable page counts, so I prefer the name node_has_relaimable_pages
instead of zone_has_relaimable_pages. Did I understand it correctly?

Thanks
B.R.
Jia

>>  	}
>>
>> +	/* Dont wake kswapd if no reclaimable pages */
>> +	if (!node_has_relaimable_pages)
>> +		return;
>> +
>>  	trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
>>  	wake_up_interruptible(&pgdat->kswapd_wait);
>>  }
>> --
>> 1.8.5.6
Re: [GIT PULL] cputime: Convert core use of cputime_t to nsecs
Hi Frederic,

Do we need to support CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=n on ppc64le?
If yes, I get a compile error after applying your patch set:

arch/powerpc/kernel/time.c:712:2: error: implicit declaration of function 'cputime_to_nsecs' [-Werror=implicit-function-declaration]
  return local_clock() - cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);
  ^

I think it is due to CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=n on my ppc64le
server. My kernel config:

# grep -n CONFIG_VIRT_CPU .config
136:CONFIG_VIRT_CPU_ACCOUNTING=y
137:CONFIG_VIRT_CPU_ACCOUNTING_GEN=y

B.R.
Jia

On 30/01/2017 12:46 PM, Frederic Weisbecker wrote:
> Ingo,
>
> Please pull the cputime/nsecs-for-tip branch that can be found at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
>   cputime/nsecs-for-tip
>
> HEAD: 14d889bef71ff808c450f82bcd257b10f05bb061
>
> The patches are the same as on my previous post:
> https://lwn.net/Articles/712213/
>
> If you apply them by email, please ignore the very last one ([PATCH
> 37/37] s390: Prevent from cputime leaks) because we need to find a
> better solution with Martin. The branch doesn't have this patch.
>
> --- Summary ---
>
> cputime_t is a type that can map to different time units and
> granularities: jiffies, nsecs or architecture clock. This type and its
> accessors and mutators have been designed to deal with all these time
> units, which can vary depending on the kernel config, in order to
> support a model where the cputime is stored as-is in the source unit.
>
> The pro of this model is that it avoids expensive conversions from the
> source-unit cputime to a more generic type during the accounting
> hotpath, especially for configs with
> CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y.
>
> Now there are several cons:
>
> * We need to maintain a whole set of cputime_t mutators and accessors
>   for all implementations of cputime_t (currently 4 of them). And we
>   need such functions for every kind of time conversion: to/from
>   jiffies, nsecs, usecs, timeval, clock_t, ...
>
> * The core code needs to deal with different possible granularities of
>   cputime_t while converting to/from another time unit. Especially,
>   functions like nsecs_to_cputime() can leak some nsecs remainder.
>   This adds complexity, and sometimes even performance loss (involving
>   reverse conversion), in order to avoid losing such time remainders
>   (eg: irqtime accounting, steal time accounting, posix cpu timers, ...).
>
> * Kernel developers are seldom familiar with these granularity issues:
>   cputime leaks often appear in patches dealing with cputime code.
>
> * In general cputime_t, as a varying type, is more opaque and harder
>   to deal with than static nsecs, making the core code less readable.
>
> This patchset removes all core use of cputime_t and stores the cputime
> in nsec units. Only s390 and powerpc (with
> CONFIG_VIRT_CPU_ACCOUNTING_NATIVE for the latter) still use cputime_t.
>
> Much code gets simplified, the diffstat is appealing, and some
> fastpaths should even see a small performance gain (irqtime and steal
> time accounting).
>
> Now let's admit one drawback: s390 and powerpc with
> CONFIG_VIRT_CPU_ACCOUNTING_NATIVE get a new cputime_t-to-nsecs
> conversion on the cputime accounting path. But this should be offset
> by the recent changes which delay the cputime accounting to tick and
> context switch.
>
> Thanks,
> 	Frederic
> ---
>
> Frederic Weisbecker (36):
>       jiffies: Reuse TICK_NSEC instead of NSEC_PER_JIFFY
>       time: Introduce jiffies64_to_nsecs()
>       sched: Remove unused INIT_CPUTIME macro
>       cputime: Convert kcpustat to nsecs
>       macintosh/rack-meter: Remove cputime_t internal use
>       cputime: Convert guest time accounting to nsecs
>       cputime: Special API to return old-typed cputime
>       cputime: Convert task/group cputime to nsecs
>       alpha: Convert obsolete cputime_t to nsecs
>       x86: Convert obsolete cputime type to nsecs
>       isdn: Convert obsolete cputime type to nsecs
>       binfmt: Convert obsolete cputime type to nsecs
>       acct: Convert obsolete cputime type to nsecs
>       delaycct: Convert obsolete cputime type to nsecs
>       tsacct: Convert obsolete cputime type to nsecs
>       signal: Convert obsolete cputime type to nsecs
>       cputime: Increment kcpustat directly on irqtime account
>       posix-timers: Use TICK_NSEC instead of a dynamically ad-hoc calculated version
>       posix-timers: Convert internals to use nsecs
>       itimer: Convert internal cputime_t units to nsec
>       sched: Remove temporary cputime_t accessors
>       cputime: Push time to account_user_time() in nsecs
>       cputime: Push time to account_steal_time() in nsecs
>       cputime: Push time to account_idle_time() in nsecs
>       cputime: Push time to account_system_time() in nsecs
>       cputime: Complete nsec conversion of tick based accounting
>       vtime: Return nsecs instead of cputime_t to account
>       cputime: Remove jiffies based cputime
Re: [PATCH RFC 2/3] mm, vmscan: limit kswapd loop if no progress is made
On 25/01/2017 12:54 AM, Michal Hocko wrote:
> On Tue 24-01-17 15:49:03, Jia He wrote:
>> Currently there is no hard limit on kswapd retries when no progress
>> is made.
>
> Yes, because the main objective of kswapd is to balance all memory
> zones. So having a hard limit on retries doesn't make any sense.

But do you think kswapd still needs to run and burn cpu uselessly even
when it can make no progress? I've watched kswapd take 100% cpu for a
whole night.

>> Then kswapd will take 100% cpu for a long time.
>
> Where is it spending its time?

In my test, I tried to allocate 4000 hugepages by:
echo 4000 > /proc/sys/vm/nr_hugepages
Then kswapd takes 100% cpu for a long time.

The numa layout is:
available: 7 nodes (0-6)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 6611 MB
node 0 free: 1103 MB
node 1 cpus:
node 1 size: 12527 MB
node 1 free: 8477 MB
node 2 cpus:
node 2 size: 15087 MB
node 2 free: 11037 MB
node 3 cpus:
node 3 size: 16111 MB
node 3 free: 12060 MB
node 4 cpus: 8 9 10 11 12 13 14 15
node 4 size: 24815 MB
node 4 free: 20704 MB
node 5 cpus:
node 5 size: 4095 MB
node 5 free: 61 MB
node 6 cpus:
node 6 size: 22750 MB
node 6 free: 18716 MB

The cause is that kswapd loops for a long time even when balance_pgdat
makes no progress.

> How does this solve anything? If kswapd just backs off then more work
> has to be done in the direct reclaim context.

What if there is still no progress in the direct context?

B.R.
Jia

>> Signed-off-by: Jia He
>> ---
>>  mm/vmscan.c | 25 ++++++++++++++++++++++++++---
>>  1 file changed, 22 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 532a2a7..7396a0a 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -59,6 +59,7 @@
>>  #define CREATE_TRACE_POINTS
>>  #include <trace/events/vmscan.h>
>>
>> +#define MAX_KSWAPD_RECLAIM_RETRIES 16
>>  struct scan_control {
>>  	/* How many pages shrink_list() should reclaim */
>>  	unsigned long nr_to_reclaim;
>> @@ -3202,7 +3203,8 @@ static bool kswapd_shrink_node(pg_data_t *pgdat,
>>   * or lower is eligible for reclaim until at least one usable zone is
>>   * balanced.
>>   */
>> -static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
>> +static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx,
>> +			 int *did_some_progress)
>>  {
>>  	int i;
>>  	unsigned long nr_soft_reclaimed;
>> @@ -3322,6 +3324,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
>>  	 * entered the allocator slow path while kswapd was awake, order will
>>  	 * remain at the higher level.
>>  	 */
>> +	*did_some_progress = !!(sc.nr_scanned || sc.nr_reclaimed);
>>  	return sc.order;
>>  }
>>
>> @@ -3417,6 +3420,8 @@ static int kswapd(void *p)
>>  	unsigned int alloc_order, reclaim_order, classzone_idx;
>>  	pg_data_t *pgdat = (pg_data_t*)p;
>>  	struct task_struct *tsk = current;
>> +	int no_progress_loops = 0;
>> +	int did_some_progress = 0;
>>
>>  	struct reclaim_state reclaim_state = {
>>  		.reclaimed_slab = 0,
>> @@ -3480,9 +3485,23 @@ static int kswapd(void *p)
>>  		 */
>>  		trace_mm_vmscan_kswapd_wake(pgdat->node_id, classzone_idx, alloc_order);
>> -		reclaim_order = balance_pgdat(pgdat, alloc_order, classzone_idx);
>> -		if (reclaim_order < alloc_order)
>> +		reclaim_order = balance_pgdat(pgdat, alloc_order, classzone_idx,
>> +					      &did_some_progress);
>> +
>> +		if (reclaim_order < alloc_order) {
>> +			no_progress_loops = 0;
>>  			goto kswapd_try_sleep;
>> +		}
>> +
>> +		if (did_some_progress)
>> +			no_progress_loops = 0;
>> +		else
>> +			no_progress_loops++;
>> +
>> +		if (no_progress_loops >= MAX_KSWAPD_RECLAIM_RETRIES) {
>> +			no_progress_loops = 0;
>> +			goto kswapd_try_sleep;
>> +		}
>>
>>  		alloc_order = reclaim_order = pgdat->kswapd_order;
>>  		classzone_idx = pgdat->kswapd_classzone_idx;
>> --
>> 2.5.5
Re: [PATCH RFC 3/3] mm, vmscan: correct prepare_kswapd_sleep return value
On 25/01/2017 6:01 AM, Rik van Riel wrote:
> On Tue, 2017-01-24 at 15:49 +0800, Jia He wrote:
>> When there are no reclaimable pages in the zone, we let kswapd go to
>> sleep even if the zone is not balanced. That is, prepare_kswapd_sleep
>> will return true in this case.
>>
>> Signed-off-by: Jia He
>> ---
>>  mm/vmscan.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 7396a0a..54445e2 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -3140,7 +3140,8 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, int classzone_idx)
>>  		if (!managed_zone(zone))
>>  			continue;
>>
>> -		if (!zone_balanced(zone, order, classzone_idx))
>> +		if (!zone_balanced(zone, order, classzone_idx)
>> +			&& !zone_reclaimable_pages(zone))
>>  			return false;
>>  	}
>
> This patch does the opposite of what your changelog says. The above
> keeps kswapd running forever if the zone is not balanced and there are
> no reclaimable pages.

Sorry for the mistake, I will check what happened; I did test it on my
local system.

B.R.
Jia
Re: [PATCH RFC 0/3] optimize kswapd when it does reclaim for hugepage
Hi Michal,

Thanks for the comments. I will resend the patch as per your comments
after my 2-week vacation.

B.R.
Jia

On 25/01/2017 12:46 AM, Michal Hocko wrote:
> On Tue 24-01-17 15:49:01, Jia He wrote:
>> If there is a server with an uneven numa memory layout:
>>
>> available: 7 nodes (0-6)
>> node 0 cpus: 0 1 2 3 4 5 6 7
>> node 0 size: 6603 MB
>> node 0 free: 91 MB
>> node 1 cpus:
>> node 1 size: 12527 MB
>> node 1 free: 157 MB
>> node 2 cpus:
>> node 2 size: 15087 MB
>> node 2 free: 189 MB
>> node 3 cpus:
>> node 3 size: 16111 MB
>> node 3 free: 205 MB
>> node 4 cpus: 8 9 10 11 12 13 14 15
>> node 4 size: 24815 MB
>> node 4 free: 310 MB
>> node 5 cpus:
>> node 5 size: 4095 MB
>> node 5 free: 61 MB
>> node 6 cpus:
>> node 6 size: 22750 MB
>> node 6 free: 283 MB
>> node distances:
>> node   0   1   2   3   4   5   6
>>   0:  10  20  40  40  40  40  40
>>   1:  20  10  40  40  40  40  40
>>   2:  40  40  10  20  40  40  40
>>   3:  40  40  20  10  40  40  40
>>   4:  40  40  40  40  10  20  40
>>   5:  40  40  40  40  20  10  40
>>   6:  40  40  40  40  40  40  10
>>
>> then node 5 has less memory, and we will allocate the hugepages from
>> these nodes one by one after we trigger
>> echo 4000 > /proc/sys/vm/nr_hugepages
>> Then kswapd5 will take 100% cpu for a long time. This is a livelock
>> issue in kswapd. This patch set fixes it.
>
> It would be really helpful to describe what the issue is and whether
> it is specific to the configuration above. Also a high-level overview
> of the fix and why it is the right approach would be appreciated.
>
>> The 3rd patch improves kswapd's bad performance significantly.
>
> Numbers?
>
>> Jia He (3):
>>   mm/hugetlb: split alloc_fresh_huge_page_node into fast and slow path
>>   mm, vmscan: limit kswapd loop if no progress is made
>>   mm, vmscan: correct prepare_kswapd_sleep return value
>>
>>  mm/hugetlb.c | 9 +
>>  mm/vmscan.c  | 28
>>  2 files changed, 33 insertions(+), 4 deletions(-)
>> --
>> 2.5.5
Re: [PATCH v5 1/2] sysctl: introduce new proc handler proc_dobool
On 05/01/2017 5:09 AM, J. Bruce Fields wrote: On Thu, Dec 15, 2016 at 03:24:20PM +0800, Jia He wrote: This is to let bool variable could be correctly displayed in big/little endian sysctl procfs. sizeof(bool) is arch dependent, proc_dobool should work in all arches. Did Alexey Debriyan agree that this dealt with his objections? Hi Alexey, any thoughts? ;-) Thanks B.R. Jia Also it'd be nice to have a sign-off from somebody that's actually touched sysctl.c. Then I could take it through the nfsd tree if that doesn't cause anyone trouble. --b. Suggested-by: Pan Xinhui Signed-off-by: Jia He --- include/linux/sysctl.h | 2 ++ kernel/sysctl.c| 41 + 2 files changed, 43 insertions(+) diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index adf4e51..255a9c7 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -41,6 +41,8 @@ typedef int proc_handler (struct ctl_table *ctl, int write, extern int proc_dostring(struct ctl_table *, int, void __user *, size_t *, loff_t *); +extern int proc_dobool(struct ctl_table *, int, + void __user *, size_t *, loff_t *); extern int proc_dointvec(struct ctl_table *, int, void __user *, size_t *, loff_t *); extern int proc_douintvec(struct ctl_table *, int, diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 706309f..c4bec65 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2112,6 +2112,20 @@ static int proc_put_char(void __user **buf, size_t *size, char c) return 0; } +static int do_proc_dobool_conv(bool *negp, unsigned long *lvalp, + int *valp, + int write, void *data) +{ + if (write) + *(bool *)valp = *lvalp; + else { + int val = *(bool *)valp; + + *lvalp = (unsigned long)val; + } + return 0; +} + static int do_proc_dointvec_conv(bool *negp, unsigned long *lvalp, int *valp, int write, void *data) @@ -2258,6 +2272,26 @@ static int do_proc_dointvec(struct ctl_table *table, int write, } /** + * proc_dobool - read/write a bool + * @table: the sysctl table + * @write: %TRUE if this is a write to the sysctl file + * 
@buffer: the user buffer + * @lenp: the size of the user buffer + * @ppos: file position + * + * Reads/writes up to table->maxlen/sizeof(unsigned int) integer + * values from/to the user buffer, treated as an ASCII string. + * + * Returns 0 on success. + */ +int proc_dobool(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, loff_t *ppos) +{ + return do_proc_dointvec(table, write, buffer, lenp, ppos, + do_proc_dobool_conv, NULL); +} + +/** * proc_dointvec - read a vector of integers * @table: the sysctl table * @write: %TRUE if this is a write to the sysctl file @@ -2885,6 +2919,12 @@ int proc_dostring(struct ctl_table *table, int write, return -ENOSYS; } +int proc_dobool(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, loff_t *ppos) +{ +return -ENOSYS; +} + int proc_dointvec(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos) { @@ -2941,6 +2981,7 @@ int proc_doulongvec_ms_jiffies_minmax(struct ctl_table *table, int write, * No sense putting this after each symbol definition, twice, * exception granted :-) */ +EXPORT_SYMBOL(proc_dobool); EXPORT_SYMBOL(proc_dointvec); EXPORT_SYMBOL(proc_douintvec); EXPORT_SYMBOL(proc_dointvec_jiffies); -- 2.5.5
Re: [PATCH v5 0/2] change the proc handler for nsm_use_hostnames
Ping, any comments are welcome. Thanks B.R. Jia

On 15/12/2016 3:24 PM, Jia He wrote: nsm_use_hostnames is a module parameter and it will be exported to sysctl procfs. This is to let users change it from userspace when needed. But the minimal unit for sysctl procfs reads/writes is sizeof(int). On big endian systems, converting between bool and int causes errors for proc items. This patch introduces a new proc handler proc_dobool for nsm_use_hostnames.

Changes:
v5: Fix compilation error when CONFIG_PROC_SYSCTL is not set
v4: Change (u8 *) to (bool *)
v3: Introduce a new proc handler proc_dou8vec (suggested by Xinhui Pan)
v2: Change extern type in lockd.h

The test case I used:
/***/
#include #include #include
bool __read_mostly nsm_use_hostnames;
module_param(nsm_use_hostnames, bool, 0644);

static struct ctl_table my_sysctl[] = {
	{
		.procname = "nsm_use_hostnames",
		.data = &nsm_use_hostnames,
		.maxlen = sizeof(int),
		.mode = 0644,
		.proc_handler = &proc_dointvec,
	},
	{}
};

static struct ctl_table my_root[] = {
	{
		.procname = "mysysctl",
		.mode = 0555,
		.child = my_sysctl,
	},
	{}
};

static struct ctl_table_header *my_ctl_header;

static int __init sysctl_exam_init(void)
{
	my_ctl_header = register_sysctl_table(&my_root);
	if (my_ctl_header == NULL)
		printk("error registering sysctl");
	return 0;
}

static void __exit sysctl_exam_exit(void)
{
	unregister_sysctl_table(my_ctl_header);
}

module_init(sysctl_exam_init);
module_exit(sysctl_exam_exit);
MODULE_LICENSE("GPL");
/***/

[root@bigendian my]# insmod -f /root/my/hello.ko nsm_use_hostnames=1
[root@bigendian my]# cat /proc/sys/mysysctl/nsm_use_hostnames
16777216

After I change the proc_dointvec to the new handler proc_dou8vec with the patch:

[root@bigendian my]# insmod -f /root/my/hello.ko nsm_use_hostnames=1
[root@bigendian my]# cat /proc/sys/mysysctl/nsm_use_hostnames
1

On little endian systems, there is no such issue.
Already tested on both big and little endian (ppc64 and ppc64le).

Jia He (2):
  sysctl: introduce new proc handler proc_dobool
  lockd: change the proc_handler for nsm_use_hostnames

 fs/lockd/svc.c         | 2 +-
 include/linux/sysctl.h | 2 ++
 kernel/sysctl.c        | 41 +
 3 files changed, 44 insertions(+), 1 deletion(-)
Re: [PATCH RFC 1/1] mm, page_alloc: fix incorrect zone_statistics data
On 20/12/2016 8:31 PM, Mel Gorman wrote: On Mon, Dec 12, 2016 at 01:59:07PM +0800, Jia He wrote: In commit b9f00e147f27 ("mm, page_alloc: reduce branches in zone_statistics"), the code was restructured to reduce the branch miss rate. Compared with the original logic, it assumed that if !(flag & __GFP_OTHER_NODE), z->node would not be equal to preferred_zone->node. That seems to be incorrect. Fixes: commit b9f00e147f27 ("mm, page_alloc: reduce branches in zone_statistics") Signed-off-by: Jia He

This is slightly curious. It appears it would only occur if a process was running on a node that was outside the memory policy. Can you confirm that is the case?

Yes, here is what I caught: dump_stack() is triggered when z->node(5) == preferred_zone->node(5) and z->node != numa_node_id(4), without flag __GFP_OTHER_NODE. It is not a rare case. I saw hundreds of such cases when the kernel boots up:

[c00cdcaef440] [c02e88cc] cache_grow_begin+0xcc/0x500
[c00cdcaef6f0] [c02ecb44] do_tune_cpucache+0x64/0x100
[c00cdcaef750] [c02ecc7c] enable_cpucache+0x9c/0x180
[c00cdcaef7d0] [c02ed01c] __kmem_cache_create+0x1ec/0x2c0
[c00cdcaef820] [c0291c98] create_cache+0xb8/0x240
[c00cdcaef890] [c0291fa8] kmem_cache_create+0x188/0x2290
[c00cdcaef950] [d00011dc5c70] ext4_mb_init+0x3c0/0x5e0 [ext4]
[c00cdcaef9f0] [d00011daaedc] ext4_fill_super+0x266c/0x33390 [ext4]
[c00cdcaefb30] [c0328b8c] mount_bdev+0x22c/0x260
[c00cdcaefbd0] [d00011da1fa8] ext4_mount+0x48/0x60 [ext4]
[c00cdcaefc10] [c032a11c] mount_fs+0x8c/0x230
[c00cdcaefcb0] [c0351f98] vfs_kern_mount+0x78/0x180
[c00cdcaefd00] [c0356d68] do_mount+0x258/0xea0
[c00cdcaefde0] [c0357da0] SyS_mount+0xa0/0x110
[c00cdcaefe30] [c000bd84] system_call+0x38/0xe0

If so, your patch is a semantic curiosity because it's actually impossible for a NUMA allocation to be local and the definition of "HIT" is fuzzy enough to be useless. I won't object to the patch but it makes me trust "hit" even less than I already do for any analysis.
Note that after this mail that I'll be unavailable by mail until early new years.
Re: [PATCH RFC 1/1] mm, page_alloc: fix incorrect zone_statistics data
On 20/12/2016 5:18 PM, Michal Hocko wrote: On Mon 12-12-16 13:59:07, Jia He wrote: In commit b9f00e147f27 ("mm, page_alloc: reduce branches in zone_statistics"), it reconstructed codes to reduce the branch miss rate. Compared with the original logic, it assumed if !(flag & __GFP_OTHER_NODE) z->node would not be equal to preferred_zone->node. That seems to be incorrect. I am sorry but I have hard time following the changelog. It is clear that you are trying to fix a missed NUMA_{HIT,OTHER} accounting but it is not really clear when such thing happens. You are adding preferred_zone->node check. preferred_zone is the first zone in the requested zonelist. So for the most allocations it is a node from the local node. But if something request an explicit numa node (without __GFP_OTHER_NODE which would be the majority I suspect) then we could indeed end up accounting that as a NUMA_MISS, NUMA_FOREIGN so the referenced patch indeed caused an unintended change of accounting AFAIU. If this is correct then it should be a part of the changelog. I also cannot say I would like the fix. First of all I am not sure __GFP_OTHER_NODE is a good idea at all. How is an explicit usage of the flag any different from an explicit __alloc_pages_node(non_local_nid)? In both cases we ask for an allocation on a remote node and successful allocation is a NUMA_HIT and NUMA_OTHER. That being said, why cannot we simply do the following? As a bonus, we can get rid of a barely used __GFP_OTHER_NODE. Also the number of branches will stay same. Yes, I agree maybe we can get rid of __GFP_OTHER_NODE if no objections Seems currently it is only used for hugepage and statistics --- diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 429855be6ec9..f035d5c8b864 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2583,25 +2583,17 @@ int __isolate_free_page(struct page *page, unsigned int order) * Update NUMA hit/miss statistics * * Must be called with interrupts disabled. 
- * - * When __GFP_OTHER_NODE is set assume the node of the preferred - * zone is the local node. This is useful for daemons who allocate - * memory on behalf of other processes. */ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z, gfp_t flags) { #ifdef CONFIG_NUMA - int local_nid = numa_node_id(); - enum zone_stat_item local_stat = NUMA_LOCAL; - - if (unlikely(flags & __GFP_OTHER_NODE)) { - local_stat = NUMA_OTHER; - local_nid = preferred_zone->node; - } + if (z->node == preferred_zone->node) { + enum zone_stat_item local_stat = NUMA_LOCAL; - if (z->node == local_nid) { __inc_zone_state(z, NUMA_HIT); + if (z->node != numa_node_id()) + local_stat = NUMA_OTHER; __inc_zone_state(z, local_stat); } else { __inc_zone_state(z, NUMA_MISS); I thought the logic here is different Here is the zone_statistics() before introducing __GFP_OTHER_NODE: if (z->zone_pgdat == preferred_zone->zone_pgdat) { __inc_zone_state(z, NUMA_HIT); } else { __inc_zone_state(z, NUMA_MISS); __inc_zone_state(preferred_zone, NUMA_FOREIGN); } if (z->node == numa_node_id()) __inc_zone_state(z, NUMA_LOCAL); else __inc_zone_state(z, NUMA_OTHER); B.R. Jia
Re: [RESEND PATCH v4 1/2] sysctl: introduce new proc handler proc_dobool
Thanks, this error is caused by "# CONFIG_PROC_SYSCTL is not set". Will fix it in the next version. B.R. Jia

On 12/14/16 2:13 PM, kbuild test robot wrote: Hi Jia, [auto build test ERROR on linus/master] [also build test ERROR on v4.9 next-20161214] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Jia-He/sysctl-introduce-new-proc-handler-proc_dobool/20161214-112656
config: x86_64-randconfig-n0-12141159 (attached as .config)
compiler: gcc-4.8 (Debian 4.8.4-1) 4.8.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

All errors (new ones prefixed by >>):

   kernel/built-in.o:(___ksymtab+proc_dobool+0x0): undefined reference to `proc_dobool'

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
Re: [PATCH v2 1/1] lockd: Change nsm_use_hostnames from bool to u32
Hi Xinhui Thanks, it really works. Will send out V3 soon afterwards B.R. Jia On 12/12/16 1:43 AM, Pan Xinhui wrote: hi, jia nice catch! However I think we should fix it totally. This is because do_proc_dointvec_conv() try to get a int value from a bool *. something like below might help. pls. ignore the code style and this is tested :) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index fc4084e..7eeaee4 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -519,6 +519,8 @@ EXPORT_SYMBOL_GPL(lockd_down); * Sysctl parameters (same as module parameters, different interface). */ +int proc_dou8vec(struct ctl_table *table, int write, +void __user *buffer, size_t *lenp, loff_t *ppos); static struct ctl_table nlm_sysctls[] = { { .procname = "nlm_grace_period", @@ -561,7 +563,7 @@ static struct ctl_table nlm_sysctls[] = { .data = &nsm_use_hostnames, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_dou8vec, }, { .procname = "nsm_local_state", diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 706309f..6307737 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2112,6 +2112,30 @@ static int proc_put_char(void __user **buf, size_t *size, char c) return 0; } + +static int do_proc_dou8vec_conv(bool *negp, unsigned long *lvalp, +u8 *valp, +int write, void *data) +{ + if (write) { + if (*negp) { + *valp = -*lvalp; + } else { + *valp = *lvalp; + } + } else { + int val = *valp; + if (val < 0) { + *negp = true; + *lvalp = -(unsigned long)val; + } else { + *negp = false; + *lvalp = (unsigned long)val; + } + } + return 0; +} + static int do_proc_dointvec_conv(bool *negp, unsigned long *lvalp, int *valp, int write, void *data) @@ -2296,6 +2320,14 @@ int proc_douintvec(struct ctl_table *table, int write, do_proc_douintvec_conv, NULL); } +int proc_dou8vec(struct ctl_table *table, int write, +void __user *buffer, size_t *lenp, loff_t *ppos) +{ + return do_proc_dointvec(table, write, buffer, lenp, ppos, + do_proc_dou8vec_conv, NULL); +} + + 在 
On 2016/12/11 23:36, Jia He wrote: nsm_use_hostnames is a module parameter and it will be exported to sysctl procfs. This is to let users change it from userspace when needed. But the minimal unit for sysctl procfs reads/writes is sizeof(int). On big endian systems, converting between bool and int causes errors for proc items. This patch changes the type definition of nsm_use_hostnames.

V2: Changes extern type in lockd.h

Signed-off-by: Jia He
---
 fs/lockd/mon.c              | 2 +-
 fs/lockd/svc.c              | 2 +-
 include/linux/lockd/lockd.h | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
index 19166d4..3e7ff4d 100644
--- a/fs/lockd/mon.c
+++ b/fs/lockd/mon.c
@@ -57,7 +57,7 @@ static DEFINE_SPINLOCK(nsm_lock);
  * Local NSM state
  */
 u32	__read_mostly	nsm_local_state;
-bool	__read_mostly	nsm_use_hostnames;
+u32	__read_mostly	nsm_use_hostnames;
 
 static inline struct sockaddr *nsm_addr(const struct nsm_handle *nsm)
 {
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index fc4084e..308033d 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -658,7 +658,7 @@ module_param_call(nlm_udpport, param_set_port, param_get_int,
 		  &nlm_udpport, 0644);
 module_param_call(nlm_tcpport, param_set_port, param_get_int,
 		  &nlm_tcpport, 0644);
-module_param(nsm_use_hostnames, bool, 0644);
+module_param(nsm_use_hostnames, u32, 0644);
 module_param(nlm_max_connections, uint, 0644);
 
 static int lockd_init_net(struct net *net)
diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h
index c153738..db52152 100644
--- a/include/linux/lockd/lockd.h
+++ b/include/linux/lockd/lockd.h
@@ -196,7 +196,7 @@ extern struct svc_procedure	nlmsvc_procedures4[];
 #endif
 extern int			nlmsvc_grace_period;
 extern unsigned long		nlmsvc_timeout;
-extern bool			nsm_use_hostnames;
+extern u32			nsm_use_hostnames;
 extern u32			nsm_local_state;
 
 /*
Re: [PATCH v5 0/7] Reduce cache miss for snmp_fold_field
On 9/28/16 5:08 PM, David Miller wrote: From: Jia He Date: Wed, 28 Sep 2016 14:22:21 +0800 v5: - order local variables from longest to shortest line I still see many cases where this problem still exists. Please do not resubmit this patch series until you fix all of them. Patch #2: -static int snmp_seq_show(struct seq_file *seq, void *v) +static int snmp_seq_show_ipstats(struct seq_file *seq, void *v) { int i; + u64 buff64[IPSTATS_MIB_MAX]; struct net *net = seq->private; The order should be "net" then "buff64" then "i". Sorry for my bad eyesight and quick hand :( B.R. Jia +static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v) +{ + int i; + struct net *net = seq->private; + unsigned long buff[TCPUDP_MIB_MAX]; The order should be "buff", "net", then "i". Patch #3: @@ -192,13 +197,19 @@ static void snmp6_seq_show_item(struct seq_file *seq, void __percpu *pcpumib, const struct snmp_mib *itemlist) { int i; - unsigned long val; - ... + unsigned long buff[SNMP_MIB_MAX]; The order should be "buff" then "i". @@ -206,10 +217,13 @@ static void snmp6_seq_show_item64(struct seq_file *seq, void __percpu *mib, const struct snmp_mib *itemlist, size_t syncpoff) { int i; + u64 buff64[SNMP_MIB_MAX]; Likewise. I cannot be any more explicit in my request than this.
Re: [RFC PATCH v3 2/7] proc: Reduce cache miss in {snmp,netstat}_seq_show
On 9/22/16 2:24 AM, Marcelo wrote: On Thu, Sep 22, 2016 at 12:18:46AM +0800, hejianet wrote: Hi Marcelo sorry for the late, just came back from a vacation. Hi, no problem. Hope your batteries are recharged now :-) On 9/14/16 7:55 PM, Marcelo wrote: Hi Jia, On Wed, Sep 14, 2016 at 01:58:42PM +0800, hejianet wrote: Hi Marcelo On 9/13/16 2:57 AM, Marcelo wrote: On Fri, Sep 09, 2016 at 02:33:57PM +0800, Jia He wrote: This is to use the generic interface snmp_get_cpu_field{,64}_batch to aggregate the data by going through all the items of each cpu sequentially. Then snmp_seq_show and netstat_seq_show are split into 2 parts to avoid build warning "the frame size" larger than 1024 on s390. Yeah about that, did you test it with stack overflow detection? These arrays can be quite large. One more below.. Do you think it is acceptable if the stack usage is a little larger than 1024? e.g. 1120 I can't find any other way to reduce the stack usage except use "static" before unsigned long buff[TCP_MIB_MAX] PS. sizeof buff is about TCP_MIB_MAX(116)*8=928 B.R. That's pretty much the question. Linux has the option on some archs to run with 4Kb (4KSTACKS option), so this function alone would be using 25% of it in this last case. While on x86_64, it uses 16Kb (6538b8ea886e ("x86_64: expand kernel stack to 16K")). Adding static to it is not an option as it actually makes the variable shared amongst the CPUs (and then you have concurrency issues), plus the fact that it's always allocated, even while not in use. Others here certainly know better than me if it's okay to make such usage of the stach. What about this patch instead? It is a trade-off. I split the aggregation process into 2 parts, it will increase the cache miss a little bit, but it can reduce the stack usage. 
After this, stack usage is 672bytes objdump -d vmlinux | ./scripts/checkstack.pl ppc64 | grep seq_show 0xc07f7cc0 netstat_seq_show_tcpext.isra.3 [vmlinux]:672 diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c index c6ee8a2..cc41590 100644 --- a/net/ipv4/proc.c +++ b/net/ipv4/proc.c @@ -486,22 +486,37 @@ static const struct file_operations snmp_seq_fops = { */ static int netstat_seq_show_tcpext(struct seq_file *seq, void *v) { - int i; - unsigned long buff[LINUX_MIB_MAX]; + int i, c; + unsigned long buff[LINUX_MIB_MAX/2 + 1]; struct net *net = seq->private; - memset(buff, 0, sizeof(unsigned long) * LINUX_MIB_MAX); + memset(buff, 0, sizeof(unsigned long) * (LINUX_MIB_MAX/2 + 1)); seq_puts(seq, "TcpExt:"); for (i = 0; snmp4_net_list[i].name; i++) seq_printf(seq, " %s", snmp4_net_list[i].name); seq_puts(seq, "\nTcpExt:"); - snmp_get_cpu_field_batch(buff, snmp4_net_list, -net->mib.net_statistics); - for (i = 0; snmp4_net_list[i].name; i++) + for_each_possible_cpu(c) { + for (i = 0; i < LINUX_MIB_MAX/2; i++) + buff[i] += snmp_get_cpu_field( + net->mib.net_statistics, + c, snmp4_net_list[i].entry); + } + for (i = 0; i < LINUX_MIB_MAX/2; i++) seq_printf(seq, " %lu", buff[i]); + memset(buff, 0, sizeof(unsigned long) * (LINUX_MIB_MAX/2 + 1)); + for_each_possible_cpu(c) { + for (i = LINUX_MIB_MAX/2; snmp4_net_list[i].name; i++) + buff[i - LINUX_MIB_MAX/2] += snmp_get_cpu_field( + net->mib.net_statistics, + c, + snmp4_net_list[i].entry); + } +for (i = LINUX_MIB_MAX/2; snmp4_net_list[i].name; i++) +seq_printf(seq, " %lu", buff[i - LINUX_MIB_MAX/2]); + return 0; } Yep, it halves the stack usage, but it doesn't look good heh But well, you may try to post the patchset (with or without this last change, you pick) officially and see how it goes. As you're posting as RFC, it's not being evaluated as seriously. 
Thanks for the suggestion, I will remove it in future patch version FWIW, I tested your patches, using your test and /proc/net/snmp file on a x86_64 box, Intel(R) Xeon(R) CPU E5-2643 v3. Before the patches: Performance counter stats for './test /proc/net/snmp': 5.225 cache-misses 12.708.673.785 L1-dcache-loads 1.288.450.174 L1-dcache-load-misses # 10,14% of all L1-dcache hits 1.271.857.028 LLC-loads 4.122 LLC-load-misses #0,00% of all LL-cache hits 9,174936524 seconds time elapsed After: Performance counter stats for './test /proc/net/snmp': 2.865 cache-misses 30.203.883.807 L1-dcache-loads 1.215.774.643 L1
Re: [RFC PATCH v3 2/7] proc: Reduce cache miss in {snmp,netstat}_seq_show
Hi Marcelo sorry for the late, just came back from a vacation. On 9/14/16 7:55 PM, Marcelo wrote: Hi Jia, On Wed, Sep 14, 2016 at 01:58:42PM +0800, hejianet wrote: Hi Marcelo On 9/13/16 2:57 AM, Marcelo wrote: On Fri, Sep 09, 2016 at 02:33:57PM +0800, Jia He wrote: This is to use the generic interface snmp_get_cpu_field{,64}_batch to aggregate the data by going through all the items of each cpu sequentially. Then snmp_seq_show and netstat_seq_show are split into 2 parts to avoid build warning "the frame size" larger than 1024 on s390. Yeah about that, did you test it with stack overflow detection? These arrays can be quite large. One more below.. Do you think it is acceptable if the stack usage is a little larger than 1024? e.g. 1120 I can't find any other way to reduce the stack usage except use "static" before unsigned long buff[TCP_MIB_MAX] PS. sizeof buff is about TCP_MIB_MAX(116)*8=928 B.R. That's pretty much the question. Linux has the option on some archs to run with 4Kb (4KSTACKS option), so this function alone would be using 25% of it in this last case. While on x86_64, it uses 16Kb (6538b8ea886e ("x86_64: expand kernel stack to 16K")). Adding static to it is not an option as it actually makes the variable shared amongst the CPUs (and then you have concurrency issues), plus the fact that it's always allocated, even while not in use. Others here certainly know better than me if it's okay to make such usage of the stach. What about this patch instead? It is a trade-off. I split the aggregation process into 2 parts, it will increase the cache miss a little bit, but it can reduce the stack usage. 
After this, stack usage is 672bytes objdump -d vmlinux | ./scripts/checkstack.pl ppc64 | grep seq_show 0xc07f7cc0 netstat_seq_show_tcpext.isra.3 [vmlinux]:672 diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c index c6ee8a2..cc41590 100644 --- a/net/ipv4/proc.c +++ b/net/ipv4/proc.c @@ -486,22 +486,37 @@ static const struct file_operations snmp_seq_fops = { */ static int netstat_seq_show_tcpext(struct seq_file *seq, void *v) { - int i; - unsigned long buff[LINUX_MIB_MAX]; + int i, c; + unsigned long buff[LINUX_MIB_MAX/2 + 1]; struct net *net = seq->private; - memset(buff, 0, sizeof(unsigned long) * LINUX_MIB_MAX); + memset(buff, 0, sizeof(unsigned long) * (LINUX_MIB_MAX/2 + 1)); seq_puts(seq, "TcpExt:"); for (i = 0; snmp4_net_list[i].name; i++) seq_printf(seq, " %s", snmp4_net_list[i].name); seq_puts(seq, "\nTcpExt:"); - snmp_get_cpu_field_batch(buff, snmp4_net_list, -net->mib.net_statistics); - for (i = 0; snmp4_net_list[i].name; i++) + for_each_possible_cpu(c) { + for (i = 0; i < LINUX_MIB_MAX/2; i++) + buff[i] += snmp_get_cpu_field( + net->mib.net_statistics, + c, snmp4_net_list[i].entry); + } + for (i = 0; i < LINUX_MIB_MAX/2; i++) seq_printf(seq, " %lu", buff[i]); + memset(buff, 0, sizeof(unsigned long) * (LINUX_MIB_MAX/2 + 1)); + for_each_possible_cpu(c) { + for (i = LINUX_MIB_MAX/2; snmp4_net_list[i].name; i++) + buff[i - LINUX_MIB_MAX/2] += snmp_get_cpu_field( + net->mib.net_statistics, + c, + snmp4_net_list[i].entry); + } +for (i = LINUX_MIB_MAX/2; snmp4_net_list[i].name; i++) +seq_printf(seq, " %lu", buff[i - LINUX_MIB_MAX/2]); + return 0; } +static int netstat_seq_show_ipext(struct seq_file *seq, void *v) +{ + int i; + u64 buff64[IPSTATS_MIB_MAX]; + struct net *net = seq->private; seq_puts(seq, "\nIpExt:"); for (i = 0; snmp4_ipextstats_list[i].name != NULL; i++) seq_printf(seq, " %s", snmp4_ipextstats_list[i].name); seq_puts(seq, "\nIpExt:"); You're missing a memset() call here. Not sure if you missed this one or not.. indeed, thanks B.R. Jia Thanks, Marcelo
Re: [RFC PATCH v3 2/7] proc: Reduce cache miss in {snmp,netstat}_seq_show
Hi Marcelo On 9/13/16 2:57 AM, Marcelo wrote: On Fri, Sep 09, 2016 at 02:33:57PM +0800, Jia He wrote: This is to use the generic interface snmp_get_cpu_field{,64}_batch to aggregate the data by going through all the items of each cpu sequentially. Then snmp_seq_show and netstat_seq_show are split into 2 parts to avoid build warning "the frame size" larger than 1024 on s390. Yeah about that, did you test it with stack overflow detection? These arrays can be quite large. One more below.. Do you think it is acceptable if the stack usage is a little larger than 1024? e.g. 1120 I can't find any other way to reduce the stack usage except use "static" before unsigned long buff[TCP_MIB_MAX] PS. sizeof buff is about TCP_MIB_MAX(116)*8=928 B.R. Signed-off-by: Jia He --- net/ipv4/proc.c | 106 +++- 1 file changed, 74 insertions(+), 32 deletions(-) diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c index 9f665b6..c6fc80e 100644 --- a/net/ipv4/proc.c +++ b/net/ipv4/proc.c @@ -46,6 +46,8 @@ #include #include +#define TCPUDP_MIB_MAX max_t(u32, UDP_MIB_MAX, TCP_MIB_MAX) + /* *Report socket allocation statistics [m...@utu.fi] */ @@ -378,13 +380,15 @@ static void icmp_put(struct seq_file *seq) /* *Called from the PROCfs module. This outputs /proc/net/snmp. 
*/ -static int snmp_seq_show(struct seq_file *seq, void *v) +static int snmp_seq_show_ipstats(struct seq_file *seq, void *v) { int i; + u64 buff64[IPSTATS_MIB_MAX]; struct net *net = seq->private; - seq_puts(seq, "Ip: Forwarding DefaultTTL"); + memset(buff64, 0, IPSTATS_MIB_MAX * sizeof(u64)); + seq_puts(seq, "Ip: Forwarding DefaultTTL"); for (i = 0; snmp4_ipstats_list[i].name != NULL; i++) seq_printf(seq, " %s", snmp4_ipstats_list[i].name); @@ -393,57 +397,77 @@ static int snmp_seq_show(struct seq_file *seq, void *v) net->ipv4.sysctl_ip_default_ttl); BUILD_BUG_ON(offsetof(struct ipstats_mib, mibs) != 0); + snmp_get_cpu_field64_batch(buff64, snmp4_ipstats_list, + net->mib.ip_statistics, + offsetof(struct ipstats_mib, syncp)); for (i = 0; snmp4_ipstats_list[i].name != NULL; i++) - seq_printf(seq, " %llu", - snmp_fold_field64(net->mib.ip_statistics, -snmp4_ipstats_list[i].entry, -offsetof(struct ipstats_mib, syncp))); + seq_printf(seq, " %llu", buff64[i]); - icmp_put(seq); /* RFC 2011 compatibility */ - icmpmsg_put(seq); + return 0; +} + +static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v) +{ + int i; + unsigned long buff[TCPUDP_MIB_MAX]; + struct net *net = seq->private; + + memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long)); seq_puts(seq, "\nTcp:"); for (i = 0; snmp4_tcp_list[i].name != NULL; i++) seq_printf(seq, " %s", snmp4_tcp_list[i].name); seq_puts(seq, "\nTcp:"); + snmp_get_cpu_field_batch(buff, snmp4_tcp_list, +net->mib.tcp_statistics); for (i = 0; snmp4_tcp_list[i].name != NULL; i++) { /* MaxConn field is signed, RFC 2012 */ if (snmp4_tcp_list[i].entry == TCP_MIB_MAXCONN) - seq_printf(seq, " %ld", - snmp_fold_field(net->mib.tcp_statistics, - snmp4_tcp_list[i].entry)); + seq_printf(seq, " %ld", buff[i]); else - seq_printf(seq, " %lu", - snmp_fold_field(net->mib.tcp_statistics, - snmp4_tcp_list[i].entry)); + seq_printf(seq, " %lu", buff[i]); } + memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long)); + + snmp_get_cpu_field_batch(buff, 
snmp4_udp_list, +net->mib.udp_statistics); seq_puts(seq, "\nUdp:"); for (i = 0; snmp4_udp_list[i].name != NULL; i++) seq_printf(seq, " %s", snmp4_udp_list[i].name); - seq_puts(seq, "\nUdp:"); for (i = 0; snmp4_udp_list[i].name != NULL; i++) - seq_printf(seq, " %lu", - snmp_fold_field(net->mib.udp_statistics, - snmp4_udp_list[i].entry)); + seq_printf(seq, " %lu", buff[i]); + + memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long)); /* the UDP and UDP-Lite MIBs are the same */ seq_puts(seq, "\nUdpLite:"); + snmp_get_cpu_field_batch(buff, snmp4_udp_list, +net->mib.udplite_statistics); for (i = 0; snmp4_udp_list[i].name != NULL; i++)
Re: [RFC PATCH v3 3/7] proc: Reduce cache miss in snmp6_seq_show
On 9/13/16 3:05 AM, Marcelo wrote: On Fri, Sep 09, 2016 at 02:33:58PM +0800, Jia He wrote: This is to use the generic interface snmp_get_cpu_field{,64}_batch to aggregate the data by going through all the items of each cpu sequentially. Signed-off-by: Jia He --- net/ipv6/proc.c | 32 +++- 1 file changed, 23 insertions(+), 9 deletions(-) diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c index 679253d0..50ba2c3 100644 --- a/net/ipv6/proc.c +++ b/net/ipv6/proc.c @@ -30,6 +30,11 @@ #include #include +#define MAX4(a, b, c, d) \ + max_t(u32, max_t(u32, a, b), max_t(u32, c, d)) +#define SNMP_MIB_MAX MAX4(UDP_MIB_MAX, TCP_MIB_MAX, \ + IPSTATS_MIB_MAX, ICMP_MIB_MAX) + static int sockstat6_seq_show(struct seq_file *seq, void *v) { struct net *net = seq->private; @@ -192,13 +197,19 @@ static void snmp6_seq_show_item(struct seq_file *seq, void __percpu *pcpumib, const struct snmp_mib *itemlist) { int i; - unsigned long val; - - for (i = 0; itemlist[i].name; i++) { - val = pcpumib ? - snmp_fold_field(pcpumib, itemlist[i].entry) : - atomic_long_read(smib + itemlist[i].entry); - seq_printf(seq, "%-32s\t%lu\n", itemlist[i].name, val); + unsigned long buff[SNMP_MIB_MAX]; + + memset(buff, 0, sizeof(unsigned long) * SNMP_MIB_MAX); This memset() could be moved... + + if (pcpumib) { ... here, so it's not executed if it hits the else block. Thanks for the suggestion B.R. 
Jia + snmp_get_cpu_field_batch(buff, itemlist, pcpumib); + for (i = 0; itemlist[i].name; i++) + seq_printf(seq, "%-32s\t%lu\n", + itemlist[i].name, buff[i]); + } else { + for (i = 0; itemlist[i].name; i++) + seq_printf(seq, "%-32s\t%lu\n", itemlist[i].name, + atomic_long_read(smib + itemlist[i].entry)); } } @@ -206,10 +217,13 @@ static void snmp6_seq_show_item64(struct seq_file *seq, void __percpu *mib, const struct snmp_mib *itemlist, size_t syncpoff) { int i; + u64 buff64[SNMP_MIB_MAX]; + + memset(buff64, 0, sizeof(unsigned long) * SNMP_MIB_MAX); + snmp_get_cpu_field64_batch(buff64, itemlist, mib, syncpoff); for (i = 0; itemlist[i].name; i++) - seq_printf(seq, "%-32s\t%llu\n", itemlist[i].name, - snmp_fold_field64(mib, itemlist[i].entry, syncpoff)); + seq_printf(seq, "%-32s\t%llu\n", itemlist[i].name, buff64[i]); } static int snmp6_seq_show(struct seq_file *seq, void *v) -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-sctp" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v3 2/7] proc: Reduce cache miss in {snmp,netstat}_seq_show
Hi Marcelo

On 9/13/16 2:57 AM, Marcelo wrote:
On Fri, Sep 09, 2016 at 02:33:57PM +0800, Jia He wrote:

This is to use the generic interface snmp_get_cpu_field{,64}_batch to aggregate the data by going through all the items of each cpu sequentially. Then snmp_seq_show and netstat_seq_show are split into 2 parts to avoid the build warning "the frame size is larger than 1024" on s390.

Yeah, about that, did you test it with stack overflow detection? These arrays can be quite large. One more below..

I found scripts/checkstack.pl can analyze the stack usage statically.

[root@tian-lp1 kernel]# objdump -d vmlinux | scripts/checkstack.pl ppc64 | grep seq
0xc07d4b18 netstat_seq_show_tcpext.isra.7 [vmlinux]:	1120
0xc07ccbe8 fib_triestat_seq_show [vmlinux]:		496
0xc083e7a4 tcp6_seq_show [vmlinux]:			480
0xc07d4908 snmp_seq_show_ipstats.isra.6 [vmlinux]:	464
0xc07d4d18 netstat_seq_show_ipext.isra.8 [vmlinux]:	464
0xc06f5bd8 proto_seq_show [vmlinux]:			416
0xc07f5718 xfrm_statistics_seq_show [vmlinux]:		416
0xc07405b4 dev_seq_printf_stats [vmlinux]:		400

It seems the stack usage in netstat_seq_show_tcpext is too big. Will consider how to reduce it.

B.R. Jia

Signed-off-by: Jia He
---
 net/ipv4/proc.c | 106 +++-
 1 file changed, 74 insertions(+), 32 deletions(-)

diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 9f665b6..c6fc80e 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -46,6 +46,8 @@
 #include
 #include

+#define TCPUDP_MIB_MAX max_t(u32, UDP_MIB_MAX, TCP_MIB_MAX)
+
 /*
  *	Report socket allocation statistics [m...@utu.fi]
  */
@@ -378,13 +380,15 @@ static void icmp_put(struct seq_file *seq)

 /*
  *	Called from the PROCfs module. This outputs /proc/net/snmp.
  */
-static int snmp_seq_show(struct seq_file *seq, void *v)
+static int snmp_seq_show_ipstats(struct seq_file *seq, void *v)
 {
 	int i;
+	u64 buff64[IPSTATS_MIB_MAX];
 	struct net *net = seq->private;

-	seq_puts(seq, "Ip: Forwarding DefaultTTL");
+	memset(buff64, 0, IPSTATS_MIB_MAX * sizeof(u64));

+	seq_puts(seq, "Ip: Forwarding DefaultTTL");
 	for (i = 0; snmp4_ipstats_list[i].name != NULL; i++)
 		seq_printf(seq, " %s", snmp4_ipstats_list[i].name);

@@ -393,57 +397,77 @@ static int snmp_seq_show(struct seq_file *seq, void *v)
 		   net->ipv4.sysctl_ip_default_ttl);

 	BUILD_BUG_ON(offsetof(struct ipstats_mib, mibs) != 0);
+	snmp_get_cpu_field64_batch(buff64, snmp4_ipstats_list,
+				   net->mib.ip_statistics,
+				   offsetof(struct ipstats_mib, syncp));
 	for (i = 0; snmp4_ipstats_list[i].name != NULL; i++)
-		seq_printf(seq, " %llu",
-			   snmp_fold_field64(net->mib.ip_statistics,
-					     snmp4_ipstats_list[i].entry,
-					     offsetof(struct ipstats_mib, syncp)));
+		seq_printf(seq, " %llu", buff64[i]);

-	icmp_put(seq);	/* RFC 2011 compatibility */
-	icmpmsg_put(seq);
+	return 0;
+}
+
+static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v)
+{
+	int i;
+	unsigned long buff[TCPUDP_MIB_MAX];
+	struct net *net = seq->private;
+
+	memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));

 	seq_puts(seq, "\nTcp:");
 	for (i = 0; snmp4_tcp_list[i].name != NULL; i++)
 		seq_printf(seq, " %s", snmp4_tcp_list[i].name);

 	seq_puts(seq, "\nTcp:");
+	snmp_get_cpu_field_batch(buff, snmp4_tcp_list,
+				 net->mib.tcp_statistics);
 	for (i = 0; snmp4_tcp_list[i].name != NULL; i++) {
 		/* MaxConn field is signed, RFC 2012 */
 		if (snmp4_tcp_list[i].entry == TCP_MIB_MAXCONN)
-			seq_printf(seq, " %ld",
-				   snmp_fold_field(net->mib.tcp_statistics,
-						   snmp4_tcp_list[i].entry));
+			seq_printf(seq, " %ld", buff[i]);
 		else
-			seq_printf(seq, " %lu",
-				   snmp_fold_field(net->mib.tcp_statistics,
-						   snmp4_tcp_list[i].entry));
+			seq_printf(seq, " %lu", buff[i]);
 	}

+	memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
+
+	snmp_get_cpu_field_batch(buff, snmp4_udp_list,
+				 net->mib.udp_statistics);
 	seq_puts(seq, "\nUdp:");
 	for (i = 0; snmp4_udp_list[i].name != NULL; i++)
 		seq_printf(seq, " %s", snmp4_udp_list[i].name);

-	seq_puts(seq, "\nUdp:");
 	for (i = 0; snmp4_udp_list[i].name != NULL; i++)
-		seq_printf(seq, " %lu",
-			   snmp_fold_fie
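The core of the patch is swapping the loop order: instead of folding one field at a time across all CPUs (one pass over every percpu block per field), it walks each CPU's block once and accumulates every field into a stack buffer. A minimal userspace sketch of the idea (the array shapes and `fold_*` names here are illustrative, not the kernel API):

```c
#include <assert.h>
#include <string.h>

#define NCPU  4
#define NITEM 8

/* stand-in for a percpu mib: one counter block per "cpu" */
static unsigned long mib[NCPU][NITEM];

/* naive fold: for each item, walk every cpu block (NITEM passes) */
static unsigned long fold_one(int item)
{
	unsigned long sum = 0;
	for (int c = 0; c < NCPU; c++)
		sum += mib[c][item];
	return sum;
}

/* batched fold: one sequential, cache-friendly pass per cpu block,
 * accumulating all items at once */
static void fold_batch(unsigned long *buff)
{
	memset(buff, 0, NITEM * sizeof(*buff));	/* buff must start zeroed */
	for (int c = 0; c < NCPU; c++)
		for (int i = 0; i < NITEM; i++)
			buff[i] += mib[c][i];
}
```

Both produce the same sums; the batched form touches each percpu block exactly once, which is what reduces the cache misses the series targets.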
Re: [RFC PATCH v2 0/6] Reduce cache miss for snmp_fold_field
Hi Marcelo

Thanks for the suggestion. Will consider that.

B.R. Jia

On 9/6/16 8:44 PM, Marcelo Ricardo Leitner wrote:
On Tue, Sep 06, 2016 at 10:30:03AM +0800, Jia He wrote:
...
v2:
- 1/6 fix bug in udplite statistics.
- 1/6 snmp_seq_show is split into 2 parts

Jia He (6):
  proc: Reduce cache miss in {snmp,netstat}_seq_show
  proc: Reduce cache miss in snmp6_seq_show
  proc: Reduce cache miss in sctp_snmp_seq_show
  proc: Reduce cache miss in xfrm_statistics_seq_show
  ipv6: Remove useless parameter in __snmp6_fill_statsdev
  net: Suppress the "Comparison to NULL could be written" warning

Hi Jia,

Did you try to come up with a generic interface for this, like snmp_fold_fields64() (note the fieldS) or snmp_fold_field64_batch()? It sounds like we have the same code in several places, and they all operate very similarly: a percpu table, an item maximum, a destination buffer. If this is possible, it would reduce the chance of hiccups in any one copy of the code.

Marcelo
Re: [RFC PATCH v2 1/6] proc: Reduce cache miss in {snmp,netstat}_seq_show
On 9/7/16 6:57 AM, David Miller wrote:
From: Jia He
Date: Tue, 6 Sep 2016 10:30:04 +0800

+#define MAX(a, b) ((u32)(a) >= (u32)(b) ? (a) : (b))

Please do not define private min/max macros, use the existing max_t() or similar as needed.

Thanks
B.R. Jia
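For reference, the kernel's `max_t(type, x, y)` evaluates each argument once, as the named type, which is what makes it safe where an ad-hoc cast macro is not. A simplified userspace rendition (the real definition lives in the kernel headers and differs in detail; the item counts below are made-up):

```c
#include <assert.h>

typedef unsigned int u32;	/* kernel type, spelled out for userspace */

/* simplified stand-in for the kernel's max_t(); like the original it
 * evaluates each argument once, as the named type, via a GCC/Clang
 * statement expression */
#define max_t(type, x, y) ({		\
	type __x = (x);			\
	type __y = (y);			\
	__x > __y ? __x : __y; })

/* illustrative item counts (not the kernel's real values) */
#define UDP_MIB_MAX 20
#define TCP_MIB_MAX 16

/* the patch's buffer bound, written with max_t instead of a private MAX */
#define TCPUDP_MIB_MAX max_t(u32, UDP_MIB_MAX, TCP_MIB_MAX)
```

A statement expression is not a compile-time constant, so `TCPUDP_MIB_MAX` can size a local (variable-length) array as the patch does, but not a file-scope one.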
Re: [RFC PATCH 1/6] proc: Reduce cache miss in {snmp,netstat}_seq_show
Hi Eric

On 8/30/16 12:41 AM, Eric Dumazet wrote:
On Tue, 2016-08-30 at 00:03 +0800, Jia He wrote:

This patch exchanges the two loops for collecting the percpu statistics data. This can aggregate the data by going through all the items of each cpu sequentially. In snmp_seq_show, just use one buff copy to display the Udp and UdpLite data because they are the same.

This is obviously not true. On my laptop it seems it handled no UdpLite frames, but got plenty of Udp ones.

$ grep Udp /proc/net/snmp
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors InCsumErrors
Udp: 1555889 108318 0 3740780 0 0 0
UdpLite: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors InCsumErrors
UdpLite: 0 0 0 0 0 0 0

Thanks, you are right. I misunderstood the comments in the source code. Will resend it.

B.R. Jia
Re: [RFC PATCH 1/3] net: Remove unnecessary memset in __snmp6_fill_stats64
Yes, sorry about it, I was too hasty.

B.R. Jia He

On 8/8/16 7:12 PM, Florian Westphal wrote:
Jia He wrote:

buff[] will be assigned later, so memset is not necessary.

Signed-off-by: Jia He
Cc: "David S. Miller"
Cc: Alexey Kuznetsov
Cc: James Morris
Cc: Hideaki YOSHIFUJI
Cc: Patrick McHardy
---
 net/ipv6/addrconf.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index ab3e796..43fa8d0 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4967,7 +4967,6 @@ static inline void __snmp6_fill_stats64(u64 *stats, void __percpu *mib,
 	BUG_ON(pad < 0);

-	memset(buff, 0, sizeof(buff));
 	buff[0] = IPSTATS_MIB_MAX;
 	for_each_possible_cpu(c) {
 		for (i = 1; i < IPSTATS_MIB_MAX; i++)
 			buff[i] += snmp_get_cpu_field64(mib, c, i, syncpoff);

Without the memset, the result of buff[i] += ... is undefined.
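Florian's point is that `buff[i] += ...` is a read-modify-write: the accumulator is read before it is written, so it must start from a known state. A small userspace sketch of the correct pattern (the data and names are illustrative):

```c
#include <assert.h>
#include <string.h>

#define NITEMS 4
#define NCPUS  2

/* stand-in for percpu counter blocks */
static unsigned long percpu[NCPUS][NITEMS] = {
	{ 1, 2, 3, 4 },
	{ 5, 6, 7, 8 },
};

/* accumulate per-cpu counters into buff; because buff[i] += ... reads
 * buff[i] first, buff must be zeroed before the loop -- removing the
 * memset would accumulate onto indeterminate stack garbage */
static void fill_stats(unsigned long *buff)
{
	memset(buff, 0, NITEMS * sizeof(*buff));
	for (int c = 0; c < NCPUS; c++)
		for (int i = 0; i < NITEMS; i++)
			buff[i] += percpu[c][i];
}
```

With a local (stack) buffer there is no implicit zero-initialization, which is why the memset in `__snmp6_fill_stats64` cannot be dropped.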
Re: [RFC PATCH] mm/hugetlb: Avoid soft lockup in set_max_huge_pages()
Hi Dave

On 7/26/16 11:58 PM, Dave Hansen wrote:
On 07/26/2016 08:44 AM, Jia He wrote:

This patch is to fix such a soft lockup. I thought it is safe to call cond_resched() because alloc_fresh_gigantic_page and alloc_fresh_huge_page are outside the spin_lock/unlock section.

Yikes. So the call site for both of the things you patch is this:

	while (count > persistent_huge_pages(h)) {
		...
		spin_unlock(&hugetlb_lock);
		if (hstate_is_gigantic(h))
			ret = alloc_fresh_gigantic_page(h, nodes_allowed);
		else
			ret = alloc_fresh_huge_page(h, nodes_allowed);
		spin_lock(&hugetlb_lock);

and you choose to patch both of the alloc_*() functions. Why not just fix it at the common call site? Seems like that spin_lock(&hugetlb_lock) could be a cond_resched_lock(), which would fix both cases.

I agree to moving the cond_resched() to a common site in set_max_huge_pages(). But do you mean the spin_lock in this while loop can be replaced by cond_resched_lock? IIUC, cond_resched_lock = spin_unlock + cond_resched + spin_lock. Could you please explain it in more detail? Thanks.

B.R. Justin

Also, putting that cond_resched() inside the for_each_node*() loop is an odd choice. It seems to indicate that the loops can take a long time, which really isn't the case. The _loop_ isn't long, right?
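On Justin's question: `cond_resched_lock(&lock)` is called while holding the lock, and it only drops and retakes it when a reschedule is actually pending (or the lock is contended), so in the common case it is nearly free. Roughly (a sketch of the semantics, not the kernel's exact code):

```c
/* kernel-style sketch, not compilable stand-alone */
static int cond_resched_lock(spinlock_t *lock)
{
	if (spin_needbreak(lock) || need_resched()) {
		spin_unlock(lock);
		cond_resched();
		spin_lock(lock);
		return 1;	/* the lock was dropped and retaken */
	}
	return 0;
}
```

So the attraction of Dave's suggestion is a single yield point at the common call site rather than cond_resched() calls sprinkled into both alloc_*() paths; whether it can literally replace the bare spin_lock() at the bottom of the loop, which runs while the lock is *not* held, is exactly what Justin is asking about.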
Re: [RFC PATCH] mm/hugetlb: Avoid soft lockup in set_max_huge_pages()
On 7/26/16 11:58 PM, Dave Hansen wrote:
On 07/26/2016 08:44 AM, Jia He wrote:

This patch is to fix such a soft lockup. I thought it is safe to call cond_resched() because alloc_fresh_gigantic_page and alloc_fresh_huge_page are outside the spin_lock/unlock section.

Yikes. So the call site for both of the things you patch is this:

	while (count > persistent_huge_pages(h)) {
		...
		spin_unlock(&hugetlb_lock);
		if (hstate_is_gigantic(h))
			ret = alloc_fresh_gigantic_page(h, nodes_allowed);
		else
			ret = alloc_fresh_huge_page(h, nodes_allowed);
		spin_lock(&hugetlb_lock);

and you choose to patch both of the alloc_*() functions. Why not just fix it at the common call site? Seems like that spin_lock(&hugetlb_lock) could be a cond_resched_lock(), which would fix both cases. Also, putting that cond_resched() inside the for_each_node*() loop is an odd choice. It seems to indicate that the loops can take a long time, which really isn't the case. The _loop_ isn't long, right?

Yes, thanks for the suggestions. Will send out V2 later.

B.R.
Re: [PATCH v2 2/3] lib: Introduce 2 bit ops api: all_is_bit_{one,zero}
Thanks, I will add it in the next version.

B.R. Justin

On 11/19/15 4:40 PM, xinhui wrote:

hi, jia

Nice patch. But I have one minor question. See inline comments.

On 2015/11/19 14:48, Jia He wrote:

This patch introduces 2 lightweight bit APIs. all_bit_is_zero returns 1 if the bit string is all zero. The addr is the start address, the size is the bit size of the bit string. all_bit_is_one is the opposite.

Signed-off-by: Jia He
---
 lib/find_bit.c | 50 ++
 1 file changed, 50 insertions(+)

diff --git a/lib/find_bit.c b/lib/find_bit.c
index 18072ea..1d56d8d 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -131,6 +131,56 @@ unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
 EXPORT_SYMBOL(find_last_bit);
 #endif

+#ifndef all_bit_is_zero
+/*
+ * return val: 1 means all bits are zero
+ */
+unsigned int all_bit_is_zero(const unsigned long *addr, unsigned size)
+{

It seems better that size should be of type "unsigned long". Otherwise I'm afraid that when we compare idx * BITS_PER_LONG with size, there might be an overflow issue.

+	unsigned long idx;
+	unsigned long mask = size;
+
+	if (unlikely(size == 0))
+		return 1;
+
+	if (size > BITS_PER_LONG) {
+		for (idx = 0; idx * BITS_PER_LONG < size; idx++)
+			if (addr[idx])
+				return 0;
+
+		mask = size - (idx - 1) * BITS_PER_LONG;
+	}
+
+	return !(*addr & BITMAP_LAST_WORD_MASK(mask));
+}
+EXPORT_SYMBOL(all_bit_is_zero);
+#endif
+
+#ifndef all_bit_is_one
+/*
+ * return val: 1 means all bits are one
+ */
+unsigned int all_bit_is_one(const unsigned long *addr, unsigned size)
+{

This size argument should be of type "unsigned long", too.
thanks
xinhui

+	unsigned long idx;
+	unsigned long mask = size;
+
+	if (unlikely(size == 0))
+		return 1;
+
+	if (size > BITS_PER_LONG) {
+		for (idx = 0; idx * BITS_PER_LONG < size; idx++)
+			if (~addr[idx])
+				return 0;
+
+		mask = size - (idx - 1) * BITS_PER_LONG;
+	}
+
+	return !(~(*addr) & BITMAP_LAST_WORD_MASK(mask));
+}
+EXPORT_SYMBOL(all_bit_is_one);
+#endif
+
 #ifdef __BIG_ENDIAN
 /* include/linux/byteorder does not support "unsigned long" type */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
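For comparison, the same checks can be written in portable userspace C with an explicit tail-word mask and an `unsigned long` size (which also addresses xinhui's overflow concern). This is a re-derivation of the idea for illustration, not the patch as posted:

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* return 1 iff bits [0, size) of the bitmap are all zero */
static int all_bits_zero(const unsigned long *addr, unsigned long size)
{
	unsigned long full = size / BITS_PER_LONG;	/* complete words */
	unsigned long rem  = size % BITS_PER_LONG;	/* bits in the tail */

	for (unsigned long i = 0; i < full; i++)
		if (addr[i])
			return 0;
	/* mask off the valid bits of the last, partial word */
	return !rem || !(addr[full] & ((1UL << rem) - 1));
}

/* return 1 iff bits [0, size) of the bitmap are all one */
static int all_bits_one(const unsigned long *addr, unsigned long size)
{
	unsigned long full = size / BITS_PER_LONG;
	unsigned long rem  = size % BITS_PER_LONG;

	for (unsigned long i = 0; i < full; i++)
		if (~addr[i])
			return 0;
	return !rem || !(~addr[full] & ((1UL << rem) - 1));
}
```

Splitting the size into complete words plus a remainder keeps the tail-word indexing explicit, which is the part that is easiest to get wrong in this kind of helper.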
Re: [PATCH 3/3] linux/bitmap: Replace find_fisrt_{zero_}bit with the new lightweight api
Thanks, I only compiled and tested in x86_64, will check what's wrong in m68k B.R. Justin 在 11/19/15 10:53 AM, kbuild test robot 写道: Hi Jia, [auto build test ERROR on: v4.4-rc1] [also build test ERROR on: next-20151118] url: https://github.com/0day-ci/linux/commits/Jia-He/Improve-bitmap_empty-and-bitmap_full/20151119-103554 config: m68k-allyesconfig (attached as .config) reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=m68k All errors (new ones prefixed by >>): kernel/time/Kconfig:155:warning: range is invalid warning: (USB_OTG_FSM && FSL_USB2_OTG && USB_MV_OTG) selects USB_OTG which has unmet direct dependencies (USB_SUPPORT && USB && PM) warning: (SINGLE_MEMORY_CHUNK) selects NEED_MULTIPLE_NODES which has unmet direct dependencies (DISCONTIGMEM || NUMA) warning: (PREEMPT && DEBUG_ATOMIC_SLEEP) selects PREEMPT_COUNT which has unmet direct dependencies (COLDFIRE) warning: (USB_OTG_FSM && FSL_USB2_OTG && USB_MV_OTG) selects USB_OTG which has unmet direct dependencies (USB_SUPPORT && USB && PM) warning: (SINGLE_MEMORY_CHUNK) selects NEED_MULTIPLE_NODES which has unmet direct dependencies (DISCONTIGMEM || NUMA) warning: (PREEMPT && DEBUG_ATOMIC_SLEEP) selects PREEMPT_COUNT which has unmet direct dependencies (COLDFIRE) In file included from include/linux/cpumask.h:11:0, from include/linux/rcupdate.h:40, from include/linux/rbtree.h:34, from include/linux/sched.h:22, from arch/m68k/kernel/asm-offsets.c:14: include/linux/bitmap.h: In function 'bitmap_empty': include/linux/bitmap.h:284:2: error: implicit declaration of function 'all_bit_is_zero' [-Werror=implicit-function-declaration] return all_bit_is_zero(src, nbits); ^ include/linux/bitmap.h: In function 'bitmap_full': include/linux/bitmap.h:292:2: error: implicit declaration of function 'all_bit_is_one' [-Werror=implicit-function-declaration] return 
all_bit_is_one(src, nbits); ^ cc1: some warnings being treated as errors make[2]: *** [arch/m68k/kernel/asm-offsets.s] Error 1 make[2]: Target '__build' not remade because of errors. make[1]: *** [prepare0] Error 2 make[1]: Target 'prepare' not remade because of errors. make: *** [sub-make] Error 2 vim +/all_bit_is_zero +284 include/linux/bitmap.h 278 279 static inline int bitmap_empty(const unsigned long *src, unsigned nbits) 280 { 281 if (small_const_nbits(nbits)) 282 return ! (*src & BITMAP_LAST_WORD_MASK(nbits)); 283 > 284 return all_bit_is_zero(src, nbits); 285 } 286 287 static inline int bitmap_full(const unsigned long *src, unsigned int nbits) 288 { 289 if (small_const_nbits(nbits)) 290 return ! (~(*src) & BITMAP_LAST_WORD_MASK(nbits)); 291 > 292 return all_bit_is_one(src, nbits); 293 } 294 295 static __always_inline int bitmap_weight(const unsigned long *src, unsigned int nbits) --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] staging: lustre: Fix sparse warning on address-space
Hi all Ping again, any comments are welcome Justin On Mon, 29 Dec 2014 20:49:40 +0800 from hejia...@gmail.com wrote: > This patch is to fix the below warnings generated by sparse: > warning: incorrect type in argument 1 (different address spaces) > expected void const volatile [noderef] * > got int * > Adding the macro "__user" can suppress the warnings, and aslo some lines > are adjusted for 80 chars line limitation. > Already checked by sparse "C=2" > > Signed-off-by: Jia He > Cc: Oleg Drokin > Cc: Andreas Dilger > Cc: Greg Kroah-Hartman > --- > .../staging/lustre/lustre/include/lprocfs_status.h | 18 - > .../lustre/lustre/obdclass/lprocfs_status.c| 6 +-- > drivers/staging/lustre/lustre/osc/lproc_osc.c | 44 > +++--- > drivers/staging/lustre/lustre/osc/osc_request.c| 6 +-- > .../staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c| 6 +-- > 5 files changed, 40 insertions(+), 40 deletions(-) > > diff --git a/drivers/staging/lustre/lustre/include/lprocfs_status.h > b/drivers/staging/lustre/lustre/include/lprocfs_status.h > index cfe503b..ed4932f 100644 > --- a/drivers/staging/lustre/lustre/include/lprocfs_status.h > +++ b/drivers/staging/lustre/lustre/include/lprocfs_status.h > @@ -631,13 +631,13 @@ extern int lprocfs_wr_timeouts(struct file *file, const > char *buffer, > unsigned long count, void *data); > extern int lprocfs_wr_evict_client(struct file *file, const char *buffer, > size_t count, loff_t *off); > -extern int lprocfs_wr_ping(struct file *file, const char *buffer, > +extern int lprocfs_wr_ping(struct file *file, const char __user *buffer, > size_t count, loff_t *off); > -extern int lprocfs_wr_import(struct file *file, const char *buffer, > +extern int lprocfs_wr_import(struct file *file, const char __user *buffer, > size_t count, loff_t *off); > extern int lprocfs_rd_pinger_recov(struct seq_file *m, void *n); > -extern int lprocfs_wr_pinger_recov(struct file *file, const char *buffer, > -size_t count, loff_t *off); > +extern int lprocfs_wr_pinger_recov(struct file 
*file, > + const char __user *buffer, size_t count, loff_t *off); > > /* Statfs helpers */ > extern int lprocfs_rd_blksize(struct seq_file *m, void *data); > @@ -650,9 +650,9 @@ extern int lprocfs_rd_filesfree(struct seq_file *m, void > *data); > extern int lprocfs_write_helper(const char __user *buffer, unsigned long > count, > int *val); > extern int lprocfs_seq_read_frac_helper(struct seq_file *m, long val, int > mult); > -extern int lprocfs_write_u64_helper(const char *buffer, unsigned long count, > - __u64 *val); > -extern int lprocfs_write_frac_u64_helper(const char *buffer, > +extern int lprocfs_write_u64_helper(const char __user *buffer, > + unsigned long count, __u64 *val); > +extern int lprocfs_write_frac_u64_helper(const char __user *buffer, >unsigned long count, >__u64 *val, int mult); > extern char *lprocfs_find_named_value(const char *buffer, const char *name, > @@ -716,7 +716,7 @@ static struct file_operations name##_fops = { > \ > return lprocfs_rd_##type(m, m->private);\ > } \ > static ssize_t name##_##type##_seq_write(struct file *file, \ > - const char *buffer, size_t count, loff_t *off) \ > + const char __user *buffer, size_t count, loff_t *off) \ > { \ > struct seq_file *seq = file->private_data; \ > return lprocfs_wr_##type(file, buffer, \ > @@ -726,7 +726,7 @@ static struct file_operations name##_fops = { > \ > > #define LPROC_SEQ_FOPS_WR_ONLY(name, type) \ > static ssize_t name##_##type##_write(struct file *file, \ > - const char *buffer, size_t count, loff_t *off) \ > + const char __user *buffer, size_t count, loff_t *off) \ > { \ > return lprocfs_wr_##type(file, buffer, count, off); \ > } \ > diff --git a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c > b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c > index 3b7dfc3..9a84b52 100644 > --- a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c > +++ b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c > @@ -1849,15 +1849,15 @@ int 
lprocfs_seq_read_frac_helper(struct seq_file *m, > long val, int mult)
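The point of the `__user` annotation is that sparse treats it as a distinct address space: the pointer names user memory, so the kernel must never dereference it directly and instead has to cross the boundary with copy_from_user()/copy_to_user(). A generic kernel-style sketch of the pattern (illustrative only, not the lustre code):

```c
/* kernel-style sketch, not compilable stand-alone */
static ssize_t example_write(struct file *file, const char __user *buffer,
			     size_t count, loff_t *off)
{
	char kbuf[32];

	if (count >= sizeof(kbuf))
		return -EINVAL;
	/* *buffer must not be dereferenced directly; with __user in the
	 * prototype, sparse (make C=2) flags any attempt to do so */
	if (copy_from_user(kbuf, buffer, count))
		return -EFAULT;
	kbuf[count] = '\0';
	/* ... parse kbuf ... */
	return count;
}
```

That is why the patch only touches prototypes: once the declarations carry `__user`, the existing copy_from_user() call sites type-check cleanly and the address-space warnings disappear.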
Re: [PATCH] staging: lustre: libcfs: fix sparse warnings about static declaration
Hi all

Ping again, any comments are welcome.

Justin

On Tue, 16 Dec 2014 22:39:38 +0800 from hejia...@gmail.com wrote:
> Make sparse happy, since these two functions are only used in module.c.
> Tested by successful compilation.
>
> Signed-off-by: Jia He
> Cc: Oleg Drokin
> Cc: Andreas Dilger
> Cc: Greg Kroah-Hartman
> ---
>  drivers/staging/lustre/lustre/libcfs/module.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/libcfs/module.c
> b/drivers/staging/lustre/lustre/libcfs/module.c
> index 2c4fc74..b9ee37c 100644
> --- a/drivers/staging/lustre/lustre/libcfs/module.c
> +++ b/drivers/staging/lustre/lustre/libcfs/module.c
> @@ -42,8 +42,7 @@
>  #include "../../include/linux/lnet/lnet.h"
>  #include "tracefile.h"
>
> -void
> -kportal_memhog_free (struct libcfs_device_userstate *ldu)
> +static void kportal_memhog_free (struct libcfs_device_userstate *ldu)
> {
> 	struct page **level0p = &ldu->ldu_memhog_root_page;
> 	struct page **level1p;
> @@ -86,8 +85,7 @@ kportal_memhog_free (struct libcfs_device_userstate *ldu)
> 	LASSERT (ldu->ldu_memhog_pages == 0);
> }
>
> -int
> -kportal_memhog_alloc(struct libcfs_device_userstate *ldu, int npages,
> +static int kportal_memhog_alloc(struct libcfs_device_userstate *ldu, int
> npages,
> 	gfp_t flags)
> {
> 	struct page **level0p;
Re: [PATCH] staging: lustre: libcfs: fix sparse warnings about static declaration
Ping. Thanks for any comments.

On Tue, 16 Dec 2014 22:39:38 +0800 from hejia...@gmail.com wrote:
> Make sparse happy, since these two functions are only used in module.c.
> Tested by successful compilation.
>
> Signed-off-by: Jia He
> Cc: Oleg Drokin
> Cc: Andreas Dilger
> Cc: Greg Kroah-Hartman
> ---
>  drivers/staging/lustre/lustre/libcfs/module.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/libcfs/module.c
> b/drivers/staging/lustre/lustre/libcfs/module.c
> index 2c4fc74..b9ee37c 100644
> --- a/drivers/staging/lustre/lustre/libcfs/module.c
> +++ b/drivers/staging/lustre/lustre/libcfs/module.c
> @@ -42,8 +42,7 @@
>  #include "../../include/linux/lnet/lnet.h"
>  #include "tracefile.h"
>
> -void
> -kportal_memhog_free (struct libcfs_device_userstate *ldu)
> +static void kportal_memhog_free (struct libcfs_device_userstate *ldu)
> {
> 	struct page **level0p = &ldu->ldu_memhog_root_page;
> 	struct page **level1p;
> @@ -86,8 +85,7 @@ kportal_memhog_free (struct libcfs_device_userstate *ldu)
> 	LASSERT (ldu->ldu_memhog_pages == 0);
> }
>
> -int
> -kportal_memhog_alloc(struct libcfs_device_userstate *ldu, int npages,
> +static int kportal_memhog_alloc(struct libcfs_device_userstate *ldu, int
> npages,
> 	gfp_t flags)
> {
> 	struct page **level0p;
Re: [PATCH] staging: rtl8188eu: Fix coding style space related ERROR problems
Hi Greg

Sorry for the inconvenience. It seems I generated the patch against your master branch instead of staging-test. I will rebase it later.

B.R. Justin

On Mon, 3 Nov 2014 16:25:57 -0800 from gre...@linuxfoundation.org wrote:
> On Fri, Oct 31, 2014 at 10:07:15AM +0800, Jia He wrote:
>> This fixes space related ERROR reports by checkpatch.pl
>> Generated by $ git ls-files "drivers/staging/rtl8188eu/*.[ch]" | \
>> xargs ./scripts/checkpatch.pl -f --fix-inplace --strict --types=SPACING
>> Already checked by text comparison
>> $ git diff -w
>> and binary comparison of r8188eu.ko
>> $ objdiff diff
>>
>> Signed-off-by: Jia He
> Doesn't apply to my tree at all:
>
> checking file drivers/staging/rtl8188eu/core/rtw_cmd.c
> Hunk #1 succeeded at 637 (offset -1 lines).
> checking file drivers/staging/rtl8188eu/core/rtw_ieee80211.c
> checking file drivers/staging/rtl8188eu/core/rtw_mlme.c
> Hunk #1 succeeded at 2115 (offset -2 lines).
> checking file drivers/staging/rtl8188eu/core/rtw_mlme_ext.c
> checking file drivers/staging/rtl8188eu/core/rtw_pwrctrl.c
> Reversed (or previously applied) patch detected! Assume -R? [n]
> Apply anyway? [n]
> Skipping patch.
> 1 out of 1 hunk ignored
> checking file drivers/staging/rtl8188eu/core/rtw_recv.c
> checking file drivers/staging/rtl8188eu/core/rtw_security.c
> checking file drivers/staging/rtl8188eu/core/rtw_sta_mgt.c
> checking file drivers/staging/rtl8188eu/core/rtw_xmit.c
> checking file drivers/staging/rtl8188eu/hal/bb_cfg.c
> Reversed (or previously applied) patch detected! Assume -R? [n]
> Apply anyway? [n]
> Skipping patch.
> 2 out of 2 hunks ignored
> checking file drivers/staging/rtl8188eu/hal/fw.c
> checking file drivers/staging/rtl8188eu/hal/mac_cfg.c
> checking file drivers/staging/rtl8188eu/hal/odm.c
> checking file drivers/staging/rtl8188eu/hal/odm_HWConfig.c
> checking file drivers/staging/rtl8188eu/hal/odm_RTL8188E.c
> checking file drivers/staging/rtl8188eu/hal/phy.c
> checking file drivers/staging/rtl8188eu/hal/rf.c
> checking file drivers/staging/rtl8188eu/hal/rf_cfg.c
> checking file drivers/staging/rtl8188eu/hal/rtl8188e_cmd.c
> Hunk #1 succeeded at 252 (offset -2 lines).
> checking file drivers/staging/rtl8188eu/hal/rtl8188eu_xmit.c
> checking file drivers/staging/rtl8188eu/hal/usb_halinit.c
> checking file drivers/staging/rtl8188eu/include/ieee80211_ext.h
> checking file drivers/staging/rtl8188eu/include/odm_debug.h
> Hunk #1 FAILED at 125.
> 1 out of 1 hunk FAILED
> checking file drivers/staging/rtl8188eu/include/osdep_service.h
> checking file drivers/staging/rtl8188eu/include/rtw_debug.h
> checking file drivers/staging/rtl8188eu/include/rtw_led.h
> checking file drivers/staging/rtl8188eu/include/rtw_mlme_ext.h
> checking file drivers/staging/rtl8188eu/include/wifi.h
> checking file drivers/staging/rtl8188eu/os_dep/ioctl_linux.c
> checking file drivers/staging/rtl8188eu/os_dep/os_intfs.c
> checking file drivers/staging/rtl8188eu/os_dep/usb_ops_linux.c
> checking file drivers/staging/rtl8188eu/os_dep/xmit_linux.c
Re: [PATCHv3] staging: rtl8188eu: Fix coding style space related ERROR problems
Hi Greg

On Thu, 30 Oct 2014 08:29:14 -0700 from gre...@linuxfoundation.org wrote:
> On Thu, Oct 30, 2014 at 11:21:41PM +0800, Jia He wrote:
>> >From v2: rebased upon gregkh/staging.git
>> >From v1: suggested by j...@perches.com, fixed some points not checked
>> or not correctly checked by checkpatch.pl
> What is all of this in here for?

This is my changelog for patch V1 -> V3. I have removed it in my latest mail in case it was boring. Thanks for any further comments.

B.R. Justin (Jia He)

> And you have odd spacing problems in your Subject :(
>
> greg k-h
Re: [PATCHv2] staging: rtl8188eu: Fix coding style space related ERROR problems
Hi Greg Just ping it, coz this is a practising patch for Eudyptula Challenge task 10. Any comment is welcome, thank you :) On Tue, 14 Oct 2014 11:28:32 +0800 from hejia...@gmail.com wrote: > This fixes space related ERROR reports by checkpatch.pl > Generated by $ git ls-files "drivers/staging/rtl8188eu/*.[ch]" | \ > xargs ./scripts/checkpatch.pl -f --fix-inplace --strict --types=SPACING > Already checked by text comparasion > $git diff -w > and binary comparasion of r8188eu.ko > $objdiff diff > > Signed-off-by: Jia He > Cc: Greg Kroah-Hartman > --- > drivers/staging/rtl8188eu/core/rtw_cmd.c | 2 +- > drivers/staging/rtl8188eu/core/rtw_ieee80211.c| 16 +- > drivers/staging/rtl8188eu/core/rtw_mlme.c | 2 +- > drivers/staging/rtl8188eu/core/rtw_mlme_ext.c | 22 +++--- > drivers/staging/rtl8188eu/core/rtw_pwrctrl.c | 2 +- > drivers/staging/rtl8188eu/core/rtw_recv.c | 12 > drivers/staging/rtl8188eu/core/rtw_security.c | 20 ++--- > drivers/staging/rtl8188eu/core/rtw_sta_mgt.c | 2 +- > drivers/staging/rtl8188eu/core/rtw_xmit.c | 10 +++ > drivers/staging/rtl8188eu/hal/bb_cfg.c| 4 +-- > drivers/staging/rtl8188eu/hal/fw.c| 8 ++--- > drivers/staging/rtl8188eu/hal/mac_cfg.c | 2 +- > drivers/staging/rtl8188eu/hal/odm.c | 8 ++--- > drivers/staging/rtl8188eu/hal/odm_HWConfig.c | 2 +- > drivers/staging/rtl8188eu/hal/odm_RTL8188E.c | 2 +- > drivers/staging/rtl8188eu/hal/phy.c | 2 +- > drivers/staging/rtl8188eu/hal/rf.c| 4 +-- > drivers/staging/rtl8188eu/hal/rf_cfg.c| 4 +-- > drivers/staging/rtl8188eu/hal/rtl8188e_cmd.c | 2 +- > drivers/staging/rtl8188eu/hal/rtl8188eu_xmit.c| 2 +- > drivers/staging/rtl8188eu/hal/usb_halinit.c | 2 +- > drivers/staging/rtl8188eu/include/ieee80211_ext.h | 20 ++--- > drivers/staging/rtl8188eu/include/odm_debug.h | 2 +- > drivers/staging/rtl8188eu/include/osdep_service.h | 4 +-- > drivers/staging/rtl8188eu/include/rtw_debug.h | 2 +- > drivers/staging/rtl8188eu/include/rtw_led.h | 2 +- > drivers/staging/rtl8188eu/include/rtw_mlme_ext.h | 26 > 
drivers/staging/rtl8188eu/include/wifi.h | 36 > +++ > drivers/staging/rtl8188eu/os_dep/ioctl_linux.c| 2 +- > drivers/staging/rtl8188eu/os_dep/os_intfs.c | 2 +- > drivers/staging/rtl8188eu/os_dep/usb_ops_linux.c | 4 +-- > drivers/staging/rtl8188eu/os_dep/xmit_linux.c | 2 +- > 32 files changed, 116 insertions(+), 116 deletions(-) > > diff --git a/drivers/staging/rtl8188eu/core/rtw_cmd.c > b/drivers/staging/rtl8188eu/core/rtw_cmd.c > index 9935e66..7b59a10 100644 > --- a/drivers/staging/rtl8188eu/core/rtw_cmd.c > +++ b/drivers/staging/rtl8188eu/core/rtw_cmd.c > @@ -638,7 +638,7 @@ u8 rtw_setstakey_cmd(struct adapter *padapter, u8 *psta, > u8 unicast_key) > ether_addr_copy(psetstakey_para->addr, sta->hwaddr); > > if (check_fwstate(pmlmepriv, WIFI_STATION_STATE)) > - psetstakey_para->algorithm = (unsigned char) > psecuritypriv->dot11PrivacyAlgrthm; > + psetstakey_para->algorithm = (unsigned > char)psecuritypriv->dot11PrivacyAlgrthm; > else > GET_ENCRY_ALGO(psecuritypriv, sta, psetstakey_para->algorithm, > false); > > diff --git a/drivers/staging/rtl8188eu/core/rtw_ieee80211.c > b/drivers/staging/rtl8188eu/core/rtw_ieee80211.c > index 755d3ef..f2c3ca7 100644 > --- a/drivers/staging/rtl8188eu/core/rtw_ieee80211.c > +++ b/drivers/staging/rtl8188eu/core/rtw_ieee80211.c > @@ -159,7 +159,7 @@ u8 *rtw_set_ie > return pbuf + len + 2; > } > > -inline u8 *rtw_set_ie_ch_switch (u8 *buf, u32 *buf_len, u8 ch_switch_mode, > +inline u8 *rtw_set_ie_ch_switch(u8 *buf, u32 *buf_len, u8 ch_switch_mode, > u8 new_ch, u8 ch_switch_cnt) > { > u8 ie_data[3]; > @@ -870,7 +870,7 @@ static int rtw_ieee802_11_parse_vendor_specific(u8 *pos, > uint elen, > if (elen < 4) { > if (show_errors) { > DBG_88E("short vendor specific information element > ignored (len=%lu)\n", > - (unsigned long) elen); > + (unsigned long)elen); > } > return -1; > } > @@ -890,7 +890,7 @@ static int rtw_ieee802_11_parse_vendor_specific(u8 *pos, > uint elen, > case WME_OUI_TYPE: /* this is a Wi-Fi WME info. 
element */ > if (elen < 5) { > DBG_88E("short WME information element ignored > (len=%lu)\n", > - (unsigned long) elen); > + (unsigned long)elen); > return -1; > } > switch (pos[4]) { > @@ -905,7 +905,7 @@ static int rtw_ieee802_11_parse_
Re: [PATCH 1/2] staging: rtl8188eu: Fix coding style space missing problems
Thanks for the suggestion, I will update it later On Thu, 09 Oct 2014 07:35:59 -0700 from j...@perches.com wrote: > git ls-files "drivers/staging/rtl8188eu/*.[ch]" | \ > xargs ./scripts/checkpatch.pl -f --fix-inplace --strict --types=SPACING -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/