Re: [PATCH 0/4] (CMA_AGGRESSIVE) Make CMA memory be more aggressive about allocation

2014-10-22 Thread


On 10/22/14 20:02, Peter Hurley wrote:
> On 10/16/2014 04:55 AM, Laura Abbott wrote:
>> On 10/15/2014 8:35 PM, Hui Zhu wrote:
>>> In fallbacks of page_alloc.c, MIGRATE_CMA is the fallback of
>>> MIGRATE_MOVABLE.
>>> MIGRATE_MOVABLE will use MIGRATE_CMA when it doesn't have a page in
>>> order that Linux kernel want.
>>>
>>> If a system that has a lot of user space program is running, for
>>> instance, an Android board, most of memory is in MIGRATE_MOVABLE and
>>> allocated.  Before function __rmqueue_fallback get memory from
>>> MIGRATE_CMA, the oom_killer will kill a task to release memory when
>>> kernel want get MIGRATE_UNMOVABLE memory because fallbacks of
>>> MIGRATE_UNMOVABLE are MIGRATE_RECLAIMABLE and MIGRATE_MOVABLE.
>>> This status is odd.  The MIGRATE_CMA has a lot free memory but Linux
>>> kernel kill some tasks to release memory.
>>>
>>> This patch series adds a new function CMA_AGGRESSIVE to make CMA memory
>>> be more aggressive about allocation.
>>> If function CMA_AGGRESSIVE is available, when Linux kernel call function
>>> __rmqueue try to get pages from MIGRATE_MOVABLE and conditions allow,
>>> MIGRATE_CMA will be allocated as MIGRATE_MOVABLE first.  If MIGRATE_CMA
>>> doesn't have enough pages for allocation, go back to allocate memory from
>>> MIGRATE_MOVABLE.
>>> Then the memory of MIGRATE_MOVABLE can be kept for MIGRATE_UNMOVABLE and
>>> MIGRATE_RECLAIMABLE which doesn't have fallback MIGRATE_CMA.
>>>
>>
>> It's good to see another proposal to fix CMA utilization. Do you have
>> any data about the success rate of CMA contiguous allocation after
>> this patch series? I played around with a similar approach of using
>> CMA for MIGRATE_MOVABLE allocations and found that although utilization
>> did increase, contiguous allocations failed at a higher rate and were
>> much slower. I see what this series is trying to do with avoiding
>> allocation from CMA pages when a contiguous allocation is progress.
>> My concern is that there would still be problems with contiguous
>> allocation after all the MIGRATE_MOVABLE fallback has happened.
>
> What impact does this series have on x86 platforms now that CMA is the
> backup allocator for all iommu dma allocations?

They will not affect driver CMA memory allocation.

Thanks,
Hui

>
> Regards,
> Peter Hurley
>


Re: [PATCH 0/4] (CMA_AGGRESSIVE) Make CMA memory be more aggressive about allocation

2014-10-22 Thread


On 10/22/14 20:02, Peter Hurley wrote:
 On 10/16/2014 04:55 AM, Laura Abbott wrote:
 On 10/15/2014 8:35 PM, Hui Zhu wrote:
 In fallbacks of page_alloc.c, MIGRATE_CMA is the fallback of
 MIGRATE_MOVABLE.
 MIGRATE_MOVABLE will use MIGRATE_CMA when it doesn't have a page in
 order that Linux kernel want.

 If a system that has a lot of user space program is running, for
 instance, an Android board, most of memory is in MIGRATE_MOVABLE and
 allocated.  Before function __rmqueue_fallback get memory from
 MIGRATE_CMA, the oom_killer will kill a task to release memory when
 kernel want get MIGRATE_UNMOVABLE memory because fallbacks of
 MIGRATE_UNMOVABLE are MIGRATE_RECLAIMABLE and MIGRATE_MOVABLE.
 This status is odd.  The MIGRATE_CMA has a lot free memory but Linux
 kernel kill some tasks to release memory.

 This patch series adds a new function CMA_AGGRESSIVE to make CMA memory
 be more aggressive about allocation.
 If function CMA_AGGRESSIVE is available, when Linux kernel call function
 __rmqueue try to get pages from MIGRATE_MOVABLE and conditions allow,
 MIGRATE_CMA will be allocated as MIGRATE_MOVABLE first.  If MIGRATE_CMA
 doesn't have enough pages for allocation, go back to allocate memory from
 MIGRATE_MOVABLE.
 Then the memory of MIGRATE_MOVABLE can be kept for MIGRATE_UNMOVABLE and
 MIGRATE_RECLAIMABLE which doesn't have fallback MIGRATE_CMA.


 It's good to see another proposal to fix CMA utilization. Do you have
 any data about the success rate of CMA contiguous allocation after
 this patch series? I played around with a similar approach of using
 CMA for MIGRATE_MOVABLE allocations and found that although utilization
 did increase, contiguous allocations failed at a higher rate and were
 much slower. I see what this series is trying to do with avoiding
 allocation from CMA pages when a contiguous allocation is progress.
 My concern is that there would still be problems with contiguous
 allocation after all the MIGRATE_MOVABLE fallback has happened.

 What impact does this series have on x86 platforms now that CMA is the
 backup allocator for all iommu dma allocations?

They will not affect driver CMA memory allocation.

Thanks,
Hui


 Regards,
 Peter Hurley



Re: [PATCH 1/4] (CMA_AGGRESSIVE) Add CMA_AGGRESSIVE to Kconfig

2014-10-21 Thread


On 10/22/14 09:27, Pavel Machek wrote:
> Hi!
>
>> Add CMA_AGGRESSIVE config that depend on CMA to Linux kernel config.
>> Add CMA_AGGRESSIVE_PHY_MAX, CMA_AGGRESSIVE_FREE_MIN and CMA_AGGRESSIVE_SHRINK
>> that depend on CMA_AGGRESSIVE.
>>
>> If physical memory size (not include CMA memory) in byte less than or equal 
>> to
>> CMA_AGGRESSIVE_PHY_MAX, CMA aggressive switch (sysctl 
>> vm.cma-aggressive-switch)
>> will be opened.
>
> Ok...
>
> Do I understand it correctly that there is some problem with
> hibernation not working on machines not working on machines with big
> CMA areas...?

No, these patches want to handle this issue that most of CMA memory is 
not allocated before lowmemorykiller or oom_killer begin to kill tasks.

>
> But adding 4 config options end-user has no chance to set right can
> not be the best solution, can it?
>
>> +config CMA_AGGRESSIVE_PHY_MAX
>> +hex "Physical memory size in Bytes that auto turn on the CMA aggressive 
>> switch"
>> +depends on CMA_AGGRESSIVE
>> +default 0x4000
>> +help
>> +  If physical memory size (not include CMA memory) in byte less than or
>> +  equal to this value, CMA aggressive switch will be opened.
>> +  After the Linux boot, sysctl "vm.cma-aggressive-switch" can control
>> +  the CMA AGGRESSIVE switch.
>
> For example... how am I expected to figure right value to place here?

I agree with that.  I will update this config to auto set in next version.

Thanks,
Hui

>
>   Pavel
>


Re: [PATCH 1/4] (CMA_AGGRESSIVE) Add CMA_AGGRESSIVE to Kconfig

2014-10-21 Thread


On 10/22/14 09:27, Pavel Machek wrote:
 Hi!

 Add CMA_AGGRESSIVE config that depend on CMA to Linux kernel config.
 Add CMA_AGGRESSIVE_PHY_MAX, CMA_AGGRESSIVE_FREE_MIN and CMA_AGGRESSIVE_SHRINK
 that depend on CMA_AGGRESSIVE.

 If physical memory size (not include CMA memory) in byte less than or equal 
 to
 CMA_AGGRESSIVE_PHY_MAX, CMA aggressive switch (sysctl 
 vm.cma-aggressive-switch)
 will be opened.

 Ok...

 Do I understand it correctly that there is some problem with
 hibernation not working on machines not working on machines with big
 CMA areas...?

No, these patches want to handle this issue that most of CMA memory is 
not allocated before lowmemorykiller or oom_killer begin to kill tasks.


 But adding 4 config options end-user has no chance to set right can
 not be the best solution, can it?

 +config CMA_AGGRESSIVE_PHY_MAX
 +hex Physical memory size in Bytes that auto turn on the CMA aggressive 
 switch
 +depends on CMA_AGGRESSIVE
 +default 0x4000
 +help
 +  If physical memory size (not include CMA memory) in byte less than or
 +  equal to this value, CMA aggressive switch will be opened.
 +  After the Linux boot, sysctl vm.cma-aggressive-switch can control
 +  the CMA AGGRESSIVE switch.

 For example... how am I expected to figure right value to place here?

I agree with that.  I will update this config to auto set in next version.

Thanks,
Hui


   Pavel



Re: [PATCH 0/4] (CMA_AGGRESSIVE) Make CMA memory be more aggressive about allocation

2014-10-17 Thread


On 10/16/14 16:56, Laura Abbott wrote:
> On 10/15/2014 8:35 PM, Hui Zhu wrote:
>> In fallbacks of page_alloc.c, MIGRATE_CMA is the fallback of
>> MIGRATE_MOVABLE.
>> MIGRATE_MOVABLE will use MIGRATE_CMA when it doesn't have a page in
>> order that Linux kernel want.
>>
>> If a system that has a lot of user space program is running, for
>> instance, an Android board, most of memory is in MIGRATE_MOVABLE and
>> allocated.  Before function __rmqueue_fallback get memory from
>> MIGRATE_CMA, the oom_killer will kill a task to release memory when
>> kernel want get MIGRATE_UNMOVABLE memory because fallbacks of
>> MIGRATE_UNMOVABLE are MIGRATE_RECLAIMABLE and MIGRATE_MOVABLE.
>> This status is odd.  The MIGRATE_CMA has a lot free memory but Linux
>> kernel kill some tasks to release memory.
>>
>> This patch series adds a new function CMA_AGGRESSIVE to make CMA memory
>> be more aggressive about allocation.
>> If function CMA_AGGRESSIVE is available, when Linux kernel call function
>> __rmqueue try to get pages from MIGRATE_MOVABLE and conditions allow,
>> MIGRATE_CMA will be allocated as MIGRATE_MOVABLE first.  If MIGRATE_CMA
>> doesn't have enough pages for allocation, go back to allocate memory from
>> MIGRATE_MOVABLE.
>> Then the memory of MIGRATE_MOVABLE can be kept for MIGRATE_UNMOVABLE and
>> MIGRATE_RECLAIMABLE which doesn't have fallback MIGRATE_CMA.
>>
>
> It's good to see another proposal to fix CMA utilization.

Thanks Laura.

Do you have
> any data about the success rate of CMA contiguous allocation after
> this patch series?   I played around with a similar approach of using
> CMA for MIGRATE_MOVABLE allocations and found that although utilization
> did increase, contiguous allocations failed at a higher rate and were
> much slower. I see what this series is trying to do with avoiding
> allocation from CMA pages when a contiguous allocation is progress.
> My concern is that there would still be problems with contiguous
> allocation after all the MIGRATE_MOVABLE fallback has happened.

I did some test with the cma_alloc_counter and cma-aggressive-shrink in 
a android board that has 1g memory.  Run some apps to make free CMA 
close to the value of cma_aggressive_free_min(500 pages).  A driver 
Begin to request CMA more than 10 times. Each time, it will request more 
than 3000 pages.

I don't have established number for that because it is really hard to 
get a fail.  I think the success rate is over 95% at least.

And I think maybe the isolate fail has relation with page alloc and free 
code.  Maybe let zone->lock protect more code can handle this issue.

Thanks,
Hui

>
> Thanks,
> Laura
>


Re: [PATCH 2/4] (CMA_AGGRESSIVE) Add argument hibernation to function shrink_all_memory

2014-10-17 Thread


On 10/16/14 16:29, Rafael J. Wysocki wrote:
> [CC list trimmed]
>
> On Thursday, October 16, 2014 11:35:49 AM Hui Zhu wrote:
>> Function shrink_all_memory try to free `nr_to_reclaim' of memory.
>> CMA_AGGRESSIVE_SHRINK function will call this functon to free 
>> `nr_to_reclaim' of
>> memory.  It need different scan_control with current caller function
>> hibernate_preallocate_memory.
>>
>> If hibernation is true, the caller is hibernate_preallocate_memory.
>> if not, the caller is CMA alloc function.
>>
>> Signed-off-by: Hui Zhu 
>> ---
>>   include/linux/swap.h|  3 ++-
>>   kernel/power/snapshot.c |  2 +-
>>   mm/vmscan.c | 19 +--
>>   3 files changed, 16 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index 37a585b..9f2cb43 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -335,7 +335,8 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct 
>> mem_cgroup *mem,
>>  gfp_t gfp_mask, bool noswap,
>>  struct zone *zone,
>>  unsigned long *nr_scanned);
>> -extern unsigned long shrink_all_memory(unsigned long nr_pages);
>> +extern unsigned long shrink_all_memory(unsigned long nr_pages,
>> +   bool hibernation);
>>   extern int vm_swappiness;
>>   extern int remove_mapping(struct address_space *mapping, struct page 
>> *page);
>>   extern unsigned long vm_total_pages;
>> diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
>> index 791a618..a00fc35 100644
>> --- a/kernel/power/snapshot.c
>> +++ b/kernel/power/snapshot.c
>> @@ -1657,7 +1657,7 @@ int hibernate_preallocate_memory(void)
>>   * NOTE: If this is not done, performance will be hurt badly in some
>>   * test cases.
>>   */
>> -shrink_all_memory(saveable - size);
>> +shrink_all_memory(saveable - size, true);
>
> Instead of doing this, can you please define
>
> __shrink_all_memory()
>
> that will take the appropriate struct scan_control as an argument and
> then define two wrappers around that, one for hibernation and one for CMA?
>
> The way you did it opens a field for bugs caused by passing a wrong value
> as the second argument.

Thanks Rafael.
I will update patch according to your comments.

Best,
Hui

>
>>
>>  /*
>>   * The number of saveable pages in memory was too high, so apply some
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index dcb4707..fdcfa30 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -3404,7 +3404,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum 
>> zone_type classzone_idx)
>>  wake_up_interruptible(>kswapd_wait);
>>   }
>>
>> -#ifdef CONFIG_HIBERNATION
>> +#if defined CONFIG_HIBERNATION || defined CONFIG_CMA_AGGRESSIVE
>>   /*
>>* Try to free `nr_to_reclaim' of memory, system-wide, and return the 
>> number of
>>* freed pages.
>> @@ -3413,22 +3413,29 @@ void wakeup_kswapd(struct zone *zone, int order, 
>> enum zone_type classzone_idx)
>>* LRU order by reclaiming preferentially
>>* inactive > active > active referenced > active mapped
>>*/
>> -unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
>> +unsigned long shrink_all_memory(unsigned long nr_to_reclaim, bool 
>> hibernation)
>>   {
>>  struct reclaim_state reclaim_state;
>>  struct scan_control sc = {
>>  .nr_to_reclaim = nr_to_reclaim,
>> -.gfp_mask = GFP_HIGHUSER_MOVABLE,
>>  .priority = DEF_PRIORITY,
>> -.may_writepage = 1,
>>  .may_unmap = 1,
>>  .may_swap = 1,
>> -.hibernation_mode = 1,
>>  };
>>  struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
>>  struct task_struct *p = current;
>>  unsigned long nr_reclaimed;
>>
>> +if (hibernation) {
>> +sc.hibernation_mode = 1;
>> +sc.may_writepage = 1;
>> +sc.gfp_mask = GFP_HIGHUSER_MOVABLE;
>> +} else {
>> +sc.hibernation_mode = 0;
>> +sc.may_writepage = !laptop_mode;
>> +sc.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_HIGHMEM;
>> +}
>> +
>>  p->flags |= PF_MEMALLOC;
>>  lockdep_set_current_reclaim_state(sc.gfp_mask);
>>  reclaim_state.reclaimed_slab = 0;
>> @@ -3442,7 +3449,7 @@ unsigned long shrink_all_memory(unsigned long 
>> nr_to_reclaim)
>>
>>  return nr_reclaimed;
>>   }
>> -#endif /* CONFIG_HIBERNATION */
>> +#endif /* CONFIG_HIBERNATION || CONFIG_CMA_AGGRESSIVE */
>>
>>   /* It's optimal to keep kswapds on the same CPUs as their memory, but
>>  not required for correctness.  So if the last cpu in a node goes
>>
>


Re: [PATCH 2/4] (CMA_AGGRESSIVE) Add argument hibernation to function shrink_all_memory

2014-10-17 Thread


On 10/16/14 16:29, Rafael J. Wysocki wrote:
 [CC list trimmed]

 On Thursday, October 16, 2014 11:35:49 AM Hui Zhu wrote:
 Function shrink_all_memory try to free `nr_to_reclaim' of memory.
 CMA_AGGRESSIVE_SHRINK function will call this functon to free 
 `nr_to_reclaim' of
 memory.  It need different scan_control with current caller function
 hibernate_preallocate_memory.

 If hibernation is true, the caller is hibernate_preallocate_memory.
 if not, the caller is CMA alloc function.

 Signed-off-by: Hui Zhu zhu...@xiaomi.com
 ---
   include/linux/swap.h|  3 ++-
   kernel/power/snapshot.c |  2 +-
   mm/vmscan.c | 19 +--
   3 files changed, 16 insertions(+), 8 deletions(-)

 diff --git a/include/linux/swap.h b/include/linux/swap.h
 index 37a585b..9f2cb43 100644
 --- a/include/linux/swap.h
 +++ b/include/linux/swap.h
 @@ -335,7 +335,8 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct 
 mem_cgroup *mem,
  gfp_t gfp_mask, bool noswap,
  struct zone *zone,
  unsigned long *nr_scanned);
 -extern unsigned long shrink_all_memory(unsigned long nr_pages);
 +extern unsigned long shrink_all_memory(unsigned long nr_pages,
 +   bool hibernation);
   extern int vm_swappiness;
   extern int remove_mapping(struct address_space *mapping, struct page 
 *page);
   extern unsigned long vm_total_pages;
 diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
 index 791a618..a00fc35 100644
 --- a/kernel/power/snapshot.c
 +++ b/kernel/power/snapshot.c
 @@ -1657,7 +1657,7 @@ int hibernate_preallocate_memory(void)
   * NOTE: If this is not done, performance will be hurt badly in some
   * test cases.
   */
 -shrink_all_memory(saveable - size);
 +shrink_all_memory(saveable - size, true);

 Instead of doing this, can you please define

 __shrink_all_memory()

 that will take the appropriate struct scan_control as an argument and
 then define two wrappers around that, one for hibernation and one for CMA?

 The way you did it opens a field for bugs caused by passing a wrong value
 as the second argument.

Thanks Rafael.
I will update patch according to your comments.

Best,
Hui



  /*
   * The number of saveable pages in memory was too high, so apply some
 diff --git a/mm/vmscan.c b/mm/vmscan.c
 index dcb4707..fdcfa30 100644
 --- a/mm/vmscan.c
 +++ b/mm/vmscan.c
 @@ -3404,7 +3404,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum 
 zone_type classzone_idx)
  wake_up_interruptible(pgdat-kswapd_wait);
   }

 -#ifdef CONFIG_HIBERNATION
 +#if defined CONFIG_HIBERNATION || defined CONFIG_CMA_AGGRESSIVE
   /*
* Try to free `nr_to_reclaim' of memory, system-wide, and return the 
 number of
* freed pages.
 @@ -3413,22 +3413,29 @@ void wakeup_kswapd(struct zone *zone, int order, 
 enum zone_type classzone_idx)
* LRU order by reclaiming preferentially
* inactive  active  active referenced  active mapped
*/
 -unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 +unsigned long shrink_all_memory(unsigned long nr_to_reclaim, bool 
 hibernation)
   {
  struct reclaim_state reclaim_state;
  struct scan_control sc = {
  .nr_to_reclaim = nr_to_reclaim,
 -.gfp_mask = GFP_HIGHUSER_MOVABLE,
  .priority = DEF_PRIORITY,
 -.may_writepage = 1,
  .may_unmap = 1,
  .may_swap = 1,
 -.hibernation_mode = 1,
  };
  struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
  struct task_struct *p = current;
  unsigned long nr_reclaimed;

 +if (hibernation) {
 +sc.hibernation_mode = 1;
 +sc.may_writepage = 1;
 +sc.gfp_mask = GFP_HIGHUSER_MOVABLE;
 +} else {
 +sc.hibernation_mode = 0;
 +sc.may_writepage = !laptop_mode;
 +sc.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_HIGHMEM;
 +}
 +
  p-flags |= PF_MEMALLOC;
  lockdep_set_current_reclaim_state(sc.gfp_mask);
  reclaim_state.reclaimed_slab = 0;
 @@ -3442,7 +3449,7 @@ unsigned long shrink_all_memory(unsigned long 
 nr_to_reclaim)

  return nr_reclaimed;
   }
 -#endif /* CONFIG_HIBERNATION */
 +#endif /* CONFIG_HIBERNATION || CONFIG_CMA_AGGRESSIVE */

   /* It's optimal to keep kswapds on the same CPUs as their memory, but
  not required for correctness.  So if the last cpu in a node goes




Re: [PATCH 0/4] (CMA_AGGRESSIVE) Make CMA memory be more aggressive about allocation

2014-10-17 Thread


On 10/16/14 16:56, Laura Abbott wrote:
 On 10/15/2014 8:35 PM, Hui Zhu wrote:
 In fallbacks of page_alloc.c, MIGRATE_CMA is the fallback of
 MIGRATE_MOVABLE.
 MIGRATE_MOVABLE will use MIGRATE_CMA when it doesn't have a page in
 order that Linux kernel want.

 If a system that has a lot of user space program is running, for
 instance, an Android board, most of memory is in MIGRATE_MOVABLE and
 allocated.  Before function __rmqueue_fallback get memory from
 MIGRATE_CMA, the oom_killer will kill a task to release memory when
 kernel want get MIGRATE_UNMOVABLE memory because fallbacks of
 MIGRATE_UNMOVABLE are MIGRATE_RECLAIMABLE and MIGRATE_MOVABLE.
 This status is odd.  The MIGRATE_CMA has a lot free memory but Linux
 kernel kill some tasks to release memory.

 This patch series adds a new function CMA_AGGRESSIVE to make CMA memory
 be more aggressive about allocation.
 If function CMA_AGGRESSIVE is available, when Linux kernel call function
 __rmqueue try to get pages from MIGRATE_MOVABLE and conditions allow,
 MIGRATE_CMA will be allocated as MIGRATE_MOVABLE first.  If MIGRATE_CMA
 doesn't have enough pages for allocation, go back to allocate memory from
 MIGRATE_MOVABLE.
 Then the memory of MIGRATE_MOVABLE can be kept for MIGRATE_UNMOVABLE and
 MIGRATE_RECLAIMABLE which doesn't have fallback MIGRATE_CMA.


 It's good to see another proposal to fix CMA utilization.

Thanks Laura.

Do you have
 any data about the success rate of CMA contiguous allocation after
 this patch series?   I played around with a similar approach of using
 CMA for MIGRATE_MOVABLE allocations and found that although utilization
 did increase, contiguous allocations failed at a higher rate and were
 much slower. I see what this series is trying to do with avoiding
 allocation from CMA pages when a contiguous allocation is progress.
 My concern is that there would still be problems with contiguous
 allocation after all the MIGRATE_MOVABLE fallback has happened.

I did some test with the cma_alloc_counter and cma-aggressive-shrink in 
a android board that has 1g memory.  Run some apps to make free CMA 
close to the value of cma_aggressive_free_min(500 pages).  A driver 
Begin to request CMA more than 10 times. Each time, it will request more 
than 3000 pages.

I don't have established number for that because it is really hard to 
get a fail.  I think the success rate is over 95% at least.

And I think maybe the isolate fail has relation with page alloc and free 
code.  Maybe let zone-lock protect more code can handle this issue.

Thanks,
Hui


 Thanks,
 Laura



Re: [PATCH] Fix the issue that lowmemkiller fell into a cycle that try to kill a task

2014-10-14 Thread
2014 09 24 23:36, Rik van Riel:
> On 09/22/2014 10:57 PM, Hui Zhu wrote:
>> The cause of this issue is when free memroy size is low and a lot of task is
>> trying to shrink the memory, the task that is killed by lowmemkiller cannot 
>> get
>> CPU to exit itself.
>>
>> Fix this issue with change the scheduling policy to SCHED_FIFO if a task's 
>> flag
>> is TIF_MEMDIE in lowmemkiller.
>
> Is it actually true that the task that was killed by lowmemkiller
> cannot get CPU time?

I am so sorry that answer this mail late because I tried to do more test 
around it.
But this issue is really hard to reproduce the issue.  I got a special 
app that can reproduce this issue easyly. But I still need retry a lot 
of times to repdroduce this issue.

And I found that most of time, the task cannot be killed because it is 
blocked by binder_lock.
It looks like there are something wrong with a task that get binder_lock 
and it is blocked by another thing.

So I make a patch that change a binder_lock to binder_lock_killable to 
handle this issue.(I will post it later)
It work sometime but I am not sure it is right.
And I just met one time, the kernel with the binder patch and without 
the lowmemkiller SCHED_FIFO patch, a task that didn't blocked by a lock. 
  And different tasks call lowmemkiller tried to kill this task.
I think the root cause of this issue is killed task cannot get cpu.
But I just got this issue one time.

>
> It is also possible that the task is busy in the kernel, for example
> in the reclaim code, and is not breaking out of some loop fast enough,
> despite the TIF_MEMDIE flag being set.
>
> I suspect SCHED_FIFO simply papers over that kind of issue, by not
> letting anything else run until the task is gone, instead of fixing
> the root cause of the problem.
>
>

According to I introduction, I think lowmemkiller SCHED_FIFO patch maybe 
can handle some issue.

Thanks,
Hui


Re: [PATCH] Fix the issue that lowmemkiller fell into a cycle that try to kill a task

2014-10-14 Thread
2014 09 24 23:36, Rik van Riel:
 On 09/22/2014 10:57 PM, Hui Zhu wrote:
 The cause of this issue is when free memroy size is low and a lot of task is
 trying to shrink the memory, the task that is killed by lowmemkiller cannot 
 get
 CPU to exit itself.

 Fix this issue with change the scheduling policy to SCHED_FIFO if a task's 
 flag
 is TIF_MEMDIE in lowmemkiller.

 Is it actually true that the task that was killed by lowmemkiller
 cannot get CPU time?

I am so sorry that answer this mail late because I tried to do more test 
around it.
But this issue is really hard to reproduce the issue.  I got a special 
app that can reproduce this issue easyly. But I still need retry a lot 
of times to repdroduce this issue.

And I found that most of time, the task cannot be killed because it is 
blocked by binder_lock.
It looks like there are something wrong with a task that get binder_lock 
and it is blocked by another thing.

So I make a patch that change a binder_lock to binder_lock_killable to 
handle this issue.(I will post it later)
It work sometime but I am not sure it is right.
And I just met one time, the kernel with the binder patch and without 
the lowmemkiller SCHED_FIFO patch, a task that didn't blocked by a lock. 
  And different tasks call lowmemkiller tried to kill this task.
I think the root cause of this issue is killed task cannot get cpu.
But I just got this issue one time.


 It is also possible that the task is busy in the kernel, for example
 in the reclaim code, and is not breaking out of some loop fast enough,
 despite the TIF_MEMDIE flag being set.

 I suspect SCHED_FIFO simply papers over that kind of issue, by not
 letting anything else run until the task is gone, instead of fixing
 the root cause of the problem.



According to I introduction, I think lowmemkiller SCHED_FIFO patch maybe 
can handle some issue.

Thanks,
Hui


Re: [PATCH] Fix the issue that lowmemkiller fell into a cycle that try to kill a task

2014-09-23 Thread
On 09/23/14 16:00, Weijie Yang wrote:
> On Tue, Sep 23, 2014 at 12:48 PM, 朱辉  wrote:
>>
>>
>> On 09/23/14 12:18, Greg KH wrote:
>>> On Tue, Sep 23, 2014 at 10:57:09AM +0800, Hui Zhu wrote:
>>>> The cause of this issue is when free memroy size is low and a lot of task 
>>>> is
>>>> trying to shrink the memory, the task that is killed by lowmemkiller 
>>>> cannot get
>>>> CPU to exit itself.
>>>>
>>>> Fix this issue with change the scheduling policy to SCHED_FIFO if a task's 
>>>> flag
>>>> is TIF_MEMDIE in lowmemkiller.
>>>>
>>>> Signed-off-by: Hui Zhu 
>>>> ---
>>>>drivers/staging/android/lowmemorykiller.c | 4 
>>>>1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/drivers/staging/android/lowmemorykiller.c 
>>>> b/drivers/staging/android/lowmemorykiller.c
>>>> index b545d3d..ca1ffac 100644
>>>> --- a/drivers/staging/android/lowmemorykiller.c
>>>> +++ b/drivers/staging/android/lowmemorykiller.c
>>>> @@ -129,6 +129,10 @@ static unsigned long lowmem_scan(struct shrinker *s, 
>>>> struct shrink_control *sc)
>>>>
>>>>   if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
>>>>   time_before_eq(jiffies, lowmem_deathpending_timeout)) {
>>>> +struct sched_param param = { .sched_priority = 1 };
>>>> +
>>>> +if (p->policy == SCHED_NORMAL)
>>>> +sched_setscheduler(p, SCHED_FIFO, );
>>>
>>> This seems really specific to a specific scheduler pattern now.  Isn't
>>> there some other way to resolve this?
>
> hui, how about modify lowmem_deathpending_timeout if we don't
> touch scheduler pattern?

I tried to change line "lowmem_deathpending_timeout = jiffies + HZ" to 
"lowmem_deathpending_timeout = jiffies + HZ * 10".
But the issue can be reproduced sometimes.
Could you give me some comments on this part?

>
>> I tried to let the task that call lowmemkiller sleep some time when it
>> try to kill same task.  But it doesn't work.
>> I think the issue is that the free memroy size is too low to make more
>> and more tasks come to call lowmemkiller.
>
> I am not opposed to the idea that the task which is selected to be killed
> should exit ASAP.
>
> I want to make it clear, what is problem for the existing code and which
> effect we can get by applying this patch.
> 1. LMK count is increased, which can be reduced by applying this patch?
I think the free mem size will grow faster than without this patch.
> 2. app become more sluggish?
I didn't get that in my part.

>
> By the way, whether we need to modify out_of_memory() which also
> try to kill task?

I am not sure because LMK handle the memory issue early than OOM.
But I think this issue will not affect OOM because OOM has oom_zonelist_trylock 
and oom_zonelist_unlock.

Thanks,
Hui

>
>> Thanks,
>> Hui
>>
>>>
>>> thanks,
>>>
>>> greg k-h
>>>
>

Re: [PATCH] Fix the issue that lowmemkiller fell into a cycle that try to kill a task

2014-09-23 Thread
On 09/23/14 16:00, Weijie Yang wrote:
 On Tue, Sep 23, 2014 at 12:48 PM, 朱辉 zhu...@xiaomi.com wrote:


 On 09/23/14 12:18, Greg KH wrote:
 On Tue, Sep 23, 2014 at 10:57:09AM +0800, Hui Zhu wrote:
 The cause of this issue is when free memroy size is low and a lot of task 
 is
 trying to shrink the memory, the task that is killed by lowmemkiller 
 cannot get
 CPU to exit itself.

 Fix this issue with change the scheduling policy to SCHED_FIFO if a task's 
 flag
 is TIF_MEMDIE in lowmemkiller.

 Signed-off-by: Hui Zhu zhu...@xiaomi.com
 ---
drivers/staging/android/lowmemorykiller.c | 4 
1 file changed, 4 insertions(+)

 diff --git a/drivers/staging/android/lowmemorykiller.c 
 b/drivers/staging/android/lowmemorykiller.c
 index b545d3d..ca1ffac 100644
 --- a/drivers/staging/android/lowmemorykiller.c
 +++ b/drivers/staging/android/lowmemorykiller.c
 @@ -129,6 +129,10 @@ static unsigned long lowmem_scan(struct shrinker *s, 
 struct shrink_control *sc)

   if (test_tsk_thread_flag(p, TIF_MEMDIE) 
   time_before_eq(jiffies, lowmem_deathpending_timeout)) {
 +struct sched_param param = { .sched_priority = 1 };
 +
 +if (p-policy == SCHED_NORMAL)
 +sched_setscheduler(p, SCHED_FIFO, param);

 This seems really specific to a specific scheduler pattern now.  Isn't
 there some other way to resolve this?

 hui, how about modify lowmem_deathpending_timeout if we don't
 touch scheduler pattern?

I tried to change line lowmem_deathpending_timeout = jiffies + HZ to 
lowmem_deathpending_timeout = jiffies + HZ * 10.
But the issue can be reproduced sometimes.
Could you give me some comments on this part?


 I tried to let the task that call lowmemkiller sleep some time when it
 try to kill same task.  But it doesn't work.
 I think the issue is that the free memroy size is too low to make more
 and more tasks come to call lowmemkiller.

 I am not opposed to the idea that the task which is selected to be killed
 should exit ASAP.

 I want to make it clear, what is problem for the existing code and which
 effect we can get by applying this patch.
 1. LMK count is increased, which can be reduced by applying this patch?
I think the free mem size will grow faster than without this patch.
 2. app become more sluggish?
I didn't get that in my part.


 By the way, whether we need to modify out_of_memory() which also
 try to kill task?

I am not sure because LMK handle the memory issue early than OOM.
But I think this issue will not affect OOM because OOM has oom_zonelist_trylock 
and oom_zonelist_unlock.

Thanks,
Hui


 Thanks,
 Hui


 thanks,

 greg k-h



Re: [PATCH] Fix the issue that lowmemkiller fell into a cycle that try to kill a task

2014-09-22 Thread


On 09/23/14 12:18, Greg KH wrote:
> On Tue, Sep 23, 2014 at 10:57:09AM +0800, Hui Zhu wrote:
>> The cause of this issue is when free memroy size is low and a lot of task is
>> trying to shrink the memory, the task that is killed by lowmemkiller cannot 
>> get
>> CPU to exit itself.
>>
>> Fix this issue with change the scheduling policy to SCHED_FIFO if a task's 
>> flag
>> is TIF_MEMDIE in lowmemkiller.
>>
>> Signed-off-by: Hui Zhu 
>> ---
>>   drivers/staging/android/lowmemorykiller.c | 4 
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/staging/android/lowmemorykiller.c 
>> b/drivers/staging/android/lowmemorykiller.c
>> index b545d3d..ca1ffac 100644
>> --- a/drivers/staging/android/lowmemorykiller.c
>> +++ b/drivers/staging/android/lowmemorykiller.c
>> @@ -129,6 +129,10 @@ static unsigned long lowmem_scan(struct shrinker *s, 
>> struct shrink_control *sc)
>>
>>  if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
>>  time_before_eq(jiffies, lowmem_deathpending_timeout)) {
>> +struct sched_param param = { .sched_priority = 1 };
>> +
>> +if (p->policy == SCHED_NORMAL)
>> +sched_setscheduler(p, SCHED_FIFO, );
>
> This seems really specific to a specific scheduler pattern now.  Isn't
> there some other way to resolve this?

I tried to let the task that call lowmemkiller sleep some time when it 
try to kill same task.  But it doesn't work.
I think the issue is that the free memroy size is too low to make more 
and more tasks come to call lowmemkiller.

Thanks,
Hui

>
> thanks,
>
> greg k-h
>


Re: [PATCH] Fix the issue that lowmemkiller fell into a cycle that try to kill a task

2014-09-22 Thread


On 09/23/14 12:18, Greg KH wrote:
 On Tue, Sep 23, 2014 at 10:57:09AM +0800, Hui Zhu wrote:
 The cause of this issue is when free memroy size is low and a lot of task is
 trying to shrink the memory, the task that is killed by lowmemkiller cannot 
 get
 CPU to exit itself.

 Fix this issue with change the scheduling policy to SCHED_FIFO if a task's 
 flag
 is TIF_MEMDIE in lowmemkiller.

 Signed-off-by: Hui Zhu zhu...@xiaomi.com
 ---
   drivers/staging/android/lowmemorykiller.c | 4 
   1 file changed, 4 insertions(+)

 diff --git a/drivers/staging/android/lowmemorykiller.c 
 b/drivers/staging/android/lowmemorykiller.c
 index b545d3d..ca1ffac 100644
 --- a/drivers/staging/android/lowmemorykiller.c
 +++ b/drivers/staging/android/lowmemorykiller.c
 @@ -129,6 +129,10 @@ static unsigned long lowmem_scan(struct shrinker *s, 
 struct shrink_control *sc)

  if (test_tsk_thread_flag(p, TIF_MEMDIE) 
  time_before_eq(jiffies, lowmem_deathpending_timeout)) {
 +struct sched_param param = { .sched_priority = 1 };
 +
 +if (p-policy == SCHED_NORMAL)
 +sched_setscheduler(p, SCHED_FIFO, param);

 This seems really specific to a specific scheduler pattern now.  Isn't
 there some other way to resolve this?

I tried to let the task that call lowmemkiller sleep some time when it 
try to kill same task.  But it doesn't work.
I think the issue is that the free memroy size is too low to make more 
and more tasks come to call lowmemkiller.

Thanks,
Hui


 thanks,

 greg k-h