Re: [PATCH 0/4] (CMA_AGGRESSIVE) Make CMA memory be more aggressive about allocation
On 10/22/14 20:02, Peter Hurley wrote:
> On 10/16/2014 04:55 AM, Laura Abbott wrote:
>> On 10/15/2014 8:35 PM, Hui Zhu wrote:
>>> In the fallbacks table of page_alloc.c, MIGRATE_CMA is the fallback of
>>> MIGRATE_MOVABLE: MIGRATE_MOVABLE allocations will use MIGRATE_CMA when
>>> no free page of the order the kernel wants is available.
>>>
>>> On a system that runs a lot of user-space programs, for instance an
>>> Android board, most memory is in MIGRATE_MOVABLE and already allocated.
>>> Before __rmqueue_fallback can take memory from MIGRATE_CMA, the OOM
>>> killer will kill a task to release memory whenever the kernel wants
>>> MIGRATE_UNMOVABLE memory, because the fallbacks of MIGRATE_UNMOVABLE
>>> are only MIGRATE_RECLAIMABLE and MIGRATE_MOVABLE.
>>> This situation is odd: MIGRATE_CMA has a lot of free memory, yet the
>>> kernel kills tasks to release memory.
>>>
>>> This patch series adds a new option, CMA_AGGRESSIVE, to make CMA memory
>>> allocation more aggressive.
>>> When CMA_AGGRESSIVE is enabled and conditions allow, __rmqueue will
>>> satisfy MIGRATE_MOVABLE requests from MIGRATE_CMA first; if MIGRATE_CMA
>>> does not have enough pages for the allocation, it goes back to
>>> allocating from MIGRATE_MOVABLE.
>>> The MIGRATE_MOVABLE memory can then be kept for MIGRATE_UNMOVABLE and
>>> MIGRATE_RECLAIMABLE, which cannot fall back to MIGRATE_CMA.
>>
>> It's good to see another proposal to fix CMA utilization. Do you have
>> any data about the success rate of CMA contiguous allocation after
>> this patch series? I played around with a similar approach of using
>> CMA for MIGRATE_MOVABLE allocations and found that although utilization
>> did increase, contiguous allocations failed at a higher rate and were
>> much slower. I see what this series is trying to do with avoiding
>> allocation from CMA pages when a contiguous allocation is in progress.
>> My concern is that there would still be problems with contiguous
>> allocation after all the MIGRATE_MOVABLE fallback has happened.
>
> What impact does this series have on x86 platforms now that CMA is the
> backup allocator for all iommu dma allocations?

The series will not affect driver CMA memory allocation.

Thanks,
Hui

> Regards,
> Peter Hurley
Re: [PATCH 1/4] (CMA_AGGRESSIVE) Add CMA_AGGRESSIVE to Kconfig
On 10/22/14 09:27, Pavel Machek wrote:
> Hi!
>
>> Add a CMA_AGGRESSIVE config option that depends on CMA to the Linux
>> kernel config. Add CMA_AGGRESSIVE_PHY_MAX, CMA_AGGRESSIVE_FREE_MIN and
>> CMA_AGGRESSIVE_SHRINK, which depend on CMA_AGGRESSIVE.
>>
>> If the physical memory size (not including CMA memory) in bytes is less
>> than or equal to CMA_AGGRESSIVE_PHY_MAX, the CMA aggressive switch
>> (sysctl vm.cma-aggressive-switch) will be turned on.
>
> Ok...
>
> Do I understand it correctly that there is some problem with
> hibernation not working on machines with big CMA areas...?

No, these patches address the issue that most CMA memory is still
unallocated when the lowmemorykiller or the OOM killer begins to kill
tasks.

> But adding 4 config options the end user has no chance to set right can
> not be the best solution, can it?
>
>> +config CMA_AGGRESSIVE_PHY_MAX
>> +	hex "Physical memory size in bytes that auto turns on the CMA aggressive switch"
>> +	depends on CMA_AGGRESSIVE
>> +	default 0x4000
>> +	help
>> +	  If the physical memory size (not including CMA memory) in bytes is
>> +	  less than or equal to this value, the CMA aggressive switch will be
>> +	  turned on.
>> +	  After boot, the sysctl "vm.cma-aggressive-switch" can control the
>> +	  CMA aggressive switch.
>
> For example... how am I expected to figure right value to place here?

I agree. I will update this config to be set automatically in the next
version.

Thanks,
Hui

> Pavel
Re: [PATCH 0/4] (CMA_AGGRESSIVE) Make CMA memory be more aggressive about allocation
On 10/16/14 16:56, Laura Abbott wrote:
> On 10/15/2014 8:35 PM, Hui Zhu wrote:
>> In the fallbacks table of page_alloc.c, MIGRATE_CMA is the fallback of
>> MIGRATE_MOVABLE: MIGRATE_MOVABLE allocations will use MIGRATE_CMA when
>> no free page of the order the kernel wants is available.
>>
>> On a system that runs a lot of user-space programs, for instance an
>> Android board, most memory is in MIGRATE_MOVABLE and already allocated.
>> Before __rmqueue_fallback can take memory from MIGRATE_CMA, the OOM
>> killer will kill a task to release memory whenever the kernel wants
>> MIGRATE_UNMOVABLE memory, because the fallbacks of MIGRATE_UNMOVABLE
>> are only MIGRATE_RECLAIMABLE and MIGRATE_MOVABLE.
>> This situation is odd: MIGRATE_CMA has a lot of free memory, yet the
>> kernel kills tasks to release memory.
>>
>> This patch series adds a new option, CMA_AGGRESSIVE, to make CMA memory
>> allocation more aggressive.
>> When CMA_AGGRESSIVE is enabled and conditions allow, __rmqueue will
>> satisfy MIGRATE_MOVABLE requests from MIGRATE_CMA first; if MIGRATE_CMA
>> does not have enough pages for the allocation, it goes back to
>> allocating from MIGRATE_MOVABLE.
>> The MIGRATE_MOVABLE memory can then be kept for MIGRATE_UNMOVABLE and
>> MIGRATE_RECLAIMABLE, which cannot fall back to MIGRATE_CMA.
>
> It's good to see another proposal to fix CMA utilization.

Thanks Laura.

> Do you have any data about the success rate of CMA contiguous allocation
> after this patch series? I played around with a similar approach of using
> CMA for MIGRATE_MOVABLE allocations and found that although utilization
> did increase, contiguous allocations failed at a higher rate and were
> much slower. I see what this series is trying to do with avoiding
> allocation from CMA pages when a contiguous allocation is in progress.
> My concern is that there would still be problems with contiguous
> allocation after all the MIGRATE_MOVABLE fallback has happened.

I did some tests with cma_alloc_counter and cma-aggressive-shrink on an
Android board that has 1 GB of memory. I ran some apps to bring free CMA
close to the value of cma_aggressive_free_min (500 pages), then had a
driver request CMA more than 10 times, more than 3000 pages each time.

I don't have an established number for the success rate, because it is
really hard to get a failure; I think the success rate is over 95% at
least.

And I think the isolate failures may be related to the page alloc and
free code. Maybe letting zone->lock protect more code can handle this
issue.

Thanks,
Hui

> Thanks,
> Laura
Re: [PATCH 2/4] (CMA_AGGRESSIVE) Add argument hibernation to function shrink_all_memory
On 10/16/14 16:29, Rafael J. Wysocki wrote:
> [CC list trimmed]
>
> On Thursday, October 16, 2014 11:35:49 AM Hui Zhu wrote:
>> Function shrink_all_memory tries to free `nr_to_reclaim' pages of
>> memory. The CMA_AGGRESSIVE_SHRINK function will call this function to
>> free `nr_to_reclaim' pages. It needs a different scan_control from the
>> current caller, hibernate_preallocate_memory.
>>
>> If hibernation is true, the caller is hibernate_preallocate_memory;
>> if not, the caller is the CMA alloc function.
>>
>> Signed-off-by: Hui Zhu
>> ---
>>  include/linux/swap.h    |  3 ++-
>>  kernel/power/snapshot.c |  2 +-
>>  mm/vmscan.c             | 19 +--
>>  3 files changed, 16 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index 37a585b..9f2cb43 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -335,7 +335,8 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
>>  						gfp_t gfp_mask, bool noswap,
>>  						struct zone *zone,
>>  						unsigned long *nr_scanned);
>> -extern unsigned long shrink_all_memory(unsigned long nr_pages);
>> +extern unsigned long shrink_all_memory(unsigned long nr_pages,
>> +					bool hibernation);
>>  extern int vm_swappiness;
>>  extern int remove_mapping(struct address_space *mapping, struct page *page);
>>  extern unsigned long vm_total_pages;
>> diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
>> index 791a618..a00fc35 100644
>> --- a/kernel/power/snapshot.c
>> +++ b/kernel/power/snapshot.c
>> @@ -1657,7 +1657,7 @@ int hibernate_preallocate_memory(void)
>>  	 * NOTE: If this is not done, performance will be hurt badly in some
>>  	 * test cases.
>>  	 */
>> -	shrink_all_memory(saveable - size);
>> +	shrink_all_memory(saveable - size, true);
>
> Instead of doing this, can you please define
>
> __shrink_all_memory()
>
> that will take the appropriate struct scan_control as an argument and
> then define two wrappers around that, one for hibernation and one for CMA?
>
> The way you did it opens a field for bugs caused by passing a wrong value
> as the second argument.

Thanks Rafael. I will update the patch according to your comments.

Best,
Hui

>>  	/*
>>  	 * The number of saveable pages in memory was too high, so apply some
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index dcb4707..fdcfa30 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -3404,7 +3404,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
>>  	wake_up_interruptible(&pgdat->kswapd_wait);
>>  }
>>
>> -#ifdef CONFIG_HIBERNATION
>> +#if defined CONFIG_HIBERNATION || defined CONFIG_CMA_AGGRESSIVE
>>  /*
>>   * Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
>>   * freed pages.
>> @@ -3413,22 +3413,29 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
>>   * LRU order by reclaiming preferentially
>>   * inactive > active > active referenced > active mapped
>>   */
>> -unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
>> +unsigned long shrink_all_memory(unsigned long nr_to_reclaim, bool hibernation)
>>  {
>>  	struct reclaim_state reclaim_state;
>>  	struct scan_control sc = {
>>  		.nr_to_reclaim = nr_to_reclaim,
>> -		.gfp_mask = GFP_HIGHUSER_MOVABLE,
>>  		.priority = DEF_PRIORITY,
>> -		.may_writepage = 1,
>>  		.may_unmap = 1,
>>  		.may_swap = 1,
>> -		.hibernation_mode = 1,
>>  	};
>>  	struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
>>  	struct task_struct *p = current;
>>  	unsigned long nr_reclaimed;
>>
>> +	if (hibernation) {
>> +		sc.hibernation_mode = 1;
>> +		sc.may_writepage = 1;
>> +		sc.gfp_mask = GFP_HIGHUSER_MOVABLE;
>> +	} else {
>> +		sc.hibernation_mode = 0;
>> +		sc.may_writepage = !laptop_mode;
>> +		sc.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_HIGHMEM;
>> +	}
>> +
>>  	p->flags |= PF_MEMALLOC;
>>  	lockdep_set_current_reclaim_state(sc.gfp_mask);
>>  	reclaim_state.reclaimed_slab = 0;
>> @@ -3442,7 +3449,7 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
>>
>>  	return nr_reclaimed;
>>  }
>> -#endif /* CONFIG_HIBERNATION */
>> +#endif /* CONFIG_HIBERNATION || CONFIG_CMA_AGGRESSIVE */
>>
>>  /* It's optimal to keep kswapds on the same CPUs as their memory, but
>>     not required for correctness.  So if the last cpu in a node goes
Re: [PATCH] Fix the issue that lowmemkiller fell into a cycle that try to kill a task
On 2014-09-24 23:36, Rik van Riel wrote:
> On 09/22/2014 10:57 PM, Hui Zhu wrote:
>> The cause of this issue is that when the free memory size is low and a
>> lot of tasks are trying to shrink memory, the task that is killed by
>> lowmemorykiller cannot get CPU time to exit.
>>
>> Fix this issue by changing the scheduling policy to SCHED_FIFO if a
>> task's flag is TIF_MEMDIE in lowmemorykiller.
>
> Is it actually true that the task that was killed by lowmemkiller
> cannot get CPU time?

I am sorry to answer this mail late; I tried to do more testing around
it, but this issue is really hard to reproduce. I have a special app
that can reproduce it easily, but I still need to retry a lot of times.

And I found that most of the time the task cannot be killed because it
is blocked on binder_lock. It looks like something is wrong with a task
that holds binder_lock and is blocked on something else. So I made a
patch that changes a binder_lock to binder_lock_killable to handle this
issue (I will post it later). It works sometimes, but I am not sure it
is right.

And once, with the binder patch but without the lowmemorykiller
SCHED_FIFO patch, I saw a task that was not blocked on a lock while
different tasks calling lowmemorykiller tried to kill it. I think the
root cause of this issue is that the killed task cannot get CPU time,
but I have only hit this once.

> It is also possible that the task is busy in the kernel, for example
> in the reclaim code, and is not breaking out of some loop fast enough,
> despite the TIF_MEMDIE flag being set.
>
> I suspect SCHED_FIFO simply papers over that kind of issue, by not
> letting anything else run until the task is gone, instead of fixing
> the root cause of the problem.

Given the observations above, I think the lowmemorykiller SCHED_FIFO
patch may still handle some of these issues.

Thanks,
Hui
Re: [PATCH] Fix the issue that lowmemkiller fell into a cycle that try to kill a task
On 09/23/14 16:00, Weijie Yang wrote:
> On Tue, Sep 23, 2014 at 12:48 PM, 朱辉 wrote:
>> On 09/23/14 12:18, Greg KH wrote:
>>> On Tue, Sep 23, 2014 at 10:57:09AM +0800, Hui Zhu wrote:
>>>> The cause of this issue is that when the free memory size is low and
>>>> a lot of tasks are trying to shrink memory, the task that is killed
>>>> by lowmemorykiller cannot get CPU time to exit.
>>>>
>>>> Fix this issue by changing the scheduling policy to SCHED_FIFO if a
>>>> task's flag is TIF_MEMDIE in lowmemorykiller.
>>>>
>>>> Signed-off-by: Hui Zhu
>>>> ---
>>>>  drivers/staging/android/lowmemorykiller.c | 4 ++++
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
>>>> index b545d3d..ca1ffac 100644
>>>> --- a/drivers/staging/android/lowmemorykiller.c
>>>> +++ b/drivers/staging/android/lowmemorykiller.c
>>>> @@ -129,6 +129,10 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
>>>>
>>>>  		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
>>>>  		    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
>>>> +			struct sched_param param = { .sched_priority = 1 };
>>>> +
>>>> +			if (p->policy == SCHED_NORMAL)
>>>> +				sched_setscheduler(p, SCHED_FIFO, &param);
>>>
>>> This seems really specific to a specific scheduler pattern now. Isn't
>>> there some other way to resolve this?
>
> hui, how about modify lowmem_deathpending_timeout if we don't
> touch scheduler pattern?

I tried changing the line "lowmem_deathpending_timeout = jiffies + HZ"
to "lowmem_deathpending_timeout = jiffies + HZ * 10", but the issue can
still be reproduced sometimes. Could you give me some comments on this
part?

>> I tried to let the task that calls lowmemorykiller sleep some time
>> when it tries to kill the same task, but it doesn't work.
>> I think the issue is that the free memory size is too low, which makes
>> more and more tasks come to call lowmemorykiller.
>
> I am not opposed to the idea that the task which is selected to be killed
> should exit ASAP.
>
> I want to make it clear, what is problem for the existing code and which
> effect we can get by applying this patch.
> 1. LMK count is increased, which can be reduced by applying this patch?

I think the free memory size will grow faster than without this patch.

> 2. app become more sluggish?

I didn't see that in my tests.

> By the way, whether we need to modify out_of_memory() which also
> try to kill task?

I am not sure, because LMK handles the memory issue earlier than OOM.
But I think this issue will not affect OOM, because OOM has
oom_zonelist_trylock and oom_zonelist_unlock.

Thanks,
Hui

>> Thanks,
>> Hui
>>
>>> thanks,
>>>
>>> greg k-h
Re: [PATCH] Fix the issue that lowmemkiller fell into a cycle that try to kill a task
On 09/23/14 12:18, Greg KH wrote:
> On Tue, Sep 23, 2014 at 10:57:09AM +0800, Hui Zhu wrote:
>> The cause of this issue is that when the free memory size is low and a
>> lot of tasks are trying to shrink memory, the task that is killed by
>> lowmemorykiller cannot get CPU time to exit.
>>
>> Fix this issue by changing the scheduling policy to SCHED_FIFO if a
>> task's flag is TIF_MEMDIE in lowmemorykiller.
>>
>> Signed-off-by: Hui Zhu
>> ---
>>  drivers/staging/android/lowmemorykiller.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
>> index b545d3d..ca1ffac 100644
>> --- a/drivers/staging/android/lowmemorykiller.c
>> +++ b/drivers/staging/android/lowmemorykiller.c
>> @@ -129,6 +129,10 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
>>
>>  		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
>>  		    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
>> +			struct sched_param param = { .sched_priority = 1 };
>> +
>> +			if (p->policy == SCHED_NORMAL)
>> +				sched_setscheduler(p, SCHED_FIFO, &param);
>
> This seems really specific to a specific scheduler pattern now. Isn't
> there some other way to resolve this?

I tried to let the task that calls lowmemorykiller sleep some time when
it tries to kill the same task, but it doesn't work.
I think the issue is that the free memory size is too low, which makes
more and more tasks come to call lowmemorykiller.

Thanks,
Hui

> thanks,
>
> greg k-h