Re: [PATCH v6 2/6] mm/cma: introduce new zone, ZONE_CMA

2016-11-10 Thread Joonsoo Kim
On Tue, Nov 08, 2016 at 02:59:19PM +0800, Chen Feng wrote:
> 
> 
> On 2016/11/8 11:59, Joonsoo Kim wrote:
> > On Mon, Nov 07, 2016 at 03:46:02PM +0800, Chen Feng wrote:
> >>
> >>
> >> On 2016/11/7 15:44, Chen Feng wrote:
> >>> On 2016/11/7 15:27, Joonsoo Kim wrote:
> >>>> On Mon, Nov 07, 2016 at 03:08:49PM +0800, Chen Feng wrote:
> >>>>>
> >>>>>
> >>>>> On 2016/11/7 14:15, Joonsoo Kim wrote:
> >>>>>> On Tue, Nov 01, 2016 at 03:58:32PM +0800, Chen Feng wrote:
> >>>>>>> Hello, I have a question on cma zone.
> >>>>>>>
> >>>>>>> When we have cma zone, cma zone will be the highest zone of system.
> >>>>>>>
> >>>>>>> In android system, the most memory allocator is ION. Media system will
> >>>>>>> alloc unmovable memory from it.
> >>>>>>>
> >>>>>>> On low memory scene, will the CMA zone always do balance?
> >>>>>>
> >>>>>> Allocation request for low zone (normal zone) would not cause CMA zone
> >>>>>> to be balanced since it isn't helpful.
> >>>>>>
> >>>>> Yes. But the cma zone will run out soon. And it always need to do balance.
> >>>>>
> >>>>> How about use migrate cma before movable and let cma type to fallback
> >>>>> movable.
> >>>>>
> >>>>> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1263745.html
> >>>>
> >>>> ZONE_CMA approach will act like as your solution. Could you elaborate
> >>>> more on the problem of zone approach?
> >>>>
> >>>
> >>> The ZONE approach is that makes cma pages in a zone. It can cause a
> >>> higher swapin/out than use migrate cma first.
> > 
> > Interesting result. I should look at it more deeply. Could you explain
> > me why the ZONE approach causes a higher swapin/out?
> > 
> The result is that. I don't have an obvious reason. Maybe adding a zone
> needs more balancing to keep the watermark of cma-zone. cma-zone is always
> used first. The test case allocated the same amount of memory in total.

Please do more analysis.
Without a correct analysis, the result doesn't have any meaning. We
can't be sure that one approach is always better than the other. IMHO,
numbers are important, but a theory is more important. Numbers are just
an auxiliary way to prove the theory.

> 
> >>>
> >>> The higher swapin/out may have a performance effect to application. The
> >>> application may use too much time swapin memory.
> >>>
> >>> You can see my tested result attached for detail. And the baseline is 
> >>> result of [1].
> >>>
> >>>
> >> My test case is run 60 applications and alloc 512MB ION memory.
> >>
> >> Repeat this action 50 times
> > 
> > Could you tell me more detail about your test?
> > Kernel version? Total Memory? Total CMA Memory? Android system? What
> > type of memory does ION uses? Other statistics? Etc...
> 
> Tested on 4.1, android 7, 512MB-cma in 4G memory.
> ION use normal unmovable memory, I use it to simulate a camera open operator.

Okay. The kernel version would be one of the reasons.

On 4.1, there is a fair zone allocator, so the behaviour of ZONE_CMA
differs from the movable-first policy. Allocations would be interleaved
between zones. It has pros and cons. The fair zone allocator has been
removed in recent kernels, so please test on a recent kernel for an
apples-to-apples comparison.

> > 
> > If it tested on the Android, I'm not sure that we need to consider
> > its result. Android has a lowmemory killer which is quite different
> > from normal reclaim behaviour.
> Why?

The lowmemory killer doesn't keep the LRU ordering of pages. It uses the
LRU ordering of apps and kills an app to reclaim memory. That makes
reclaim behaviour quite different from the original one. And it largely
depends on userspace settings, so we can't take care of it.

Thanks.



Re: [PATCH v6 2/6] mm/cma: introduce new zone, ZONE_CMA

2016-11-07 Thread Chen Feng


On 2016/11/8 11:59, Joonsoo Kim wrote:
> On Mon, Nov 07, 2016 at 03:46:02PM +0800, Chen Feng wrote:
>>
>>
>> On 2016/11/7 15:44, Chen Feng wrote:
>>> On 2016/11/7 15:27, Joonsoo Kim wrote:
>>>> On Mon, Nov 07, 2016 at 03:08:49PM +0800, Chen Feng wrote:
>>>>>
>>>>>
>>>>> On 2016/11/7 14:15, Joonsoo Kim wrote:
>>>>>> On Tue, Nov 01, 2016 at 03:58:32PM +0800, Chen Feng wrote:
>>>>>>> Hello, I have a question on cma zone.
>>>>>>>
>>>>>>> When we have cma zone, cma zone will be the highest zone of system.
>>>>>>>
>>>>>>> In android system, the most memory allocator is ION. Media system will
>>>>>>> alloc unmovable memory from it.
>>>>>>>
>>>>>>> On low memory scene, will the CMA zone always do balance?
>>>>>>
>>>>>> Allocation request for low zone (normal zone) would not cause CMA zone
>>>>>> to be balanced since it isn't helpful.
>>>>>>
>>>>> Yes. But the cma zone will run out soon. And it always need to do balance.
>>>>>
>>>>> How about use migrate cma before movable and let cma type to fallback
>>>>> movable.
>>>>>
>>>>> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1263745.html
>>>>
>>>> ZONE_CMA approach will act like as your solution. Could you elaborate
>>>> more on the problem of zone approach?
>>>>
>>>
>>> The ZONE approach is that makes cma pages in a zone. It can cause a higher
>>> swapin/out than use migrate cma first.
> 
> Interesting result. I should look at it more deeply. Could you explain
> me why the ZONE approach causes a higher swapin/out?
> 
The result is that. I don't have an obvious reason. Maybe adding a zone
needs more balancing to keep the watermark of cma-zone. cma-zone is always
used first. The test case allocated the same amount of memory in total.

>>>
>>> The higher swapin/out may have a performance effect to application. The
>>> application may use too much time swapin memory.
>>>
>>> You can see my tested result attached for detail. And the baseline is 
>>> result of [1].
>>>
>>>
>> My test case is run 60 applications and alloc 512MB ION memory.
>>
>> Repeat this action 50 times
> 
> Could you tell me more detail about your test?
> Kernel version? Total Memory? Total CMA Memory? Android system? What
> type of memory does ION uses? Other statistics? Etc...

Tested on kernel 4.1, Android 7, with 512MB of CMA in 4G of memory.
ION uses normal unmovable memory; I use it to simulate a camera-open operation.
> 
> If it tested on the Android, I'm not sure that we need to consider
> its result. Android has a lowmemory killer which is quite different
> from normal reclaim behaviour.
Why?
> 
> Thanks.
> 
> 
> .
> 



Re: [PATCH v6 2/6] mm/cma: introduce new zone, ZONE_CMA

2016-11-07 Thread Joonsoo Kim
On Mon, Nov 07, 2016 at 03:46:02PM +0800, Chen Feng wrote:
> 
> 
> On 2016/11/7 15:44, Chen Feng wrote:
> > On 2016/11/7 15:27, Joonsoo Kim wrote:
> >> On Mon, Nov 07, 2016 at 03:08:49PM +0800, Chen Feng wrote:
> >>>
> >>>
> >>> On 2016/11/7 14:15, Joonsoo Kim wrote:
> >>>> On Tue, Nov 01, 2016 at 03:58:32PM +0800, Chen Feng wrote:
> >>>>> Hello, I have a question on cma zone.
> >>>>>
> >>>>> When we have cma zone, cma zone will be the highest zone of system.
> >>>>>
> >>>>> In android system, the most memory allocator is ION. Media system will
> >>>>> alloc unmovable memory from it.
> >>>>>
> >>>>> On low memory scene, will the CMA zone always do balance?
> >>>>
> >>>> Allocation request for low zone (normal zone) would not cause CMA zone
> >>>> to be balanced since it isn't helpful.
> >>>>
> >>> Yes. But the cma zone will run out soon. And it always need to do balance.
> >>>
> >>> How about use migrate cma before movable and let cma type to fallback 
> >>> movable.
> >>>
> >>> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1263745.html
> >>
> >> ZONE_CMA approach will act like as your solution. Could you elaborate
> >> more on the problem of zone approach?
> >>
> > 
> > The ZONE approach is that makes cma pages in a zone. It can cause a higher
> > swapin/out than use migrate cma first.

Interesting result. I should look at it more deeply. Could you explain
to me why the ZONE approach causes higher swapin/out?

> > 
> > The higher swapin/out may have a performance effect to application. The
> > application may use too much time swapin memory.
> > 
> > You can see my tested result attached for detail. And the baseline is 
> > result of [1].
> > 
> > 
> My test case is run 60 applications and alloc 512MB ION memory.
> 
> Repeat this action 50 times

Could you tell me more details about your test?
Kernel version? Total memory? Total CMA memory? Android system? What
type of memory does ION use? Other statistics? Etc.

If it was tested on Android, I'm not sure that we need to consider
its result. Android has a lowmemory killer, which behaves quite
differently from normal reclaim.

Thanks.



Re: [PATCH v6 2/6] mm/cma: introduce new zone, ZONE_CMA

2016-11-06 Thread Chen Feng


On 2016/11/7 15:44, Chen Feng wrote:
> On 2016/11/7 15:27, Joonsoo Kim wrote:
>> On Mon, Nov 07, 2016 at 03:08:49PM +0800, Chen Feng wrote:
>>>
>>>
>>> On 2016/11/7 14:15, Joonsoo Kim wrote:
>>>> On Tue, Nov 01, 2016 at 03:58:32PM +0800, Chen Feng wrote:
>>>>> Hello, I have a question on cma zone.
>>>>>
>>>>> When we have cma zone, cma zone will be the highest zone of system.
>>>>>
>>>>> In android system, the most memory allocator is ION. Media system will
>>>>> alloc unmovable memory from it.
>>>>>
>>>>> On low memory scene, will the CMA zone always do balance?
>>>>
>>>> Allocation request for low zone (normal zone) would not cause CMA zone
>>>> to be balanced since it isn't helpful.
>>>>
>>> Yes. But the cma zone will run out soon. And it always need to do balance.
>>>
>>> How about use migrate cma before movable and let cma type to fallback 
>>> movable.
>>>
>>> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1263745.html
>>
>> ZONE_CMA approach will act like as your solution. Could you elaborate
>> more on the problem of zone approach?
>>
> 
> The ZONE approach is that makes cma pages in a zone. It can cause a higher
> swapin/out than use migrate cma first.
> 
> The higher swapin/out may have a performance effect to application. The
> application may use too much time swapin memory.
> 
> You can see my tested result attached for detail. And the baseline is result
> of [1].
> 
> 
My test case runs 60 applications and allocates 512MB of ION memory.

This action is repeated 50 times.

> [1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1263745.html
>> Thanks.
>>
>> .
>>



Re: [PATCH v6 2/6] mm/cma: introduce new zone, ZONE_CMA

2016-11-06 Thread Chen Feng
On 2016/11/7 15:27, Joonsoo Kim wrote:
> On Mon, Nov 07, 2016 at 03:08:49PM +0800, Chen Feng wrote:
>>
>>
>> On 2016/11/7 14:15, Joonsoo Kim wrote:
>>> On Tue, Nov 01, 2016 at 03:58:32PM +0800, Chen Feng wrote:
>>>> Hello, I have a question on cma zone.
>>>>
>>>> When we have cma zone, cma zone will be the highest zone of system.
>>>>
>>>> In android system, the most memory allocator is ION. Media system will
>>>> alloc unmovable memory from it.
>>>>
>>>> On low memory scene, will the CMA zone always do balance?
>>>
>>> Allocation request for low zone (normal zone) would not cause CMA zone
>>> to be balanced since it isn't helpful.
>>>
>> Yes. But the cma zone will run out soon. And it always need to do balance.
>>
>> How about use migrate cma before movable and let cma type to fallback 
>> movable.
>>
>> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1263745.html
> 
> ZONE_CMA approach will act like as your solution. Could you elaborate
> more on the problem of zone approach?
> 

The ZONE approach puts the cma pages in a zone. It can cause higher
swapin/out than the migrate-cma-first approach.

The higher swapin/out may affect application performance. The
application may spend too much time swapping memory in.

You can see my test results attached for details. The baseline is the
result of [1].


[1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1263745.html
> Thanks.
> 
> .
> 


Re: [PATCH v6 2/6] mm/cma: introduce new zone, ZONE_CMA

2016-11-06 Thread Joonsoo Kim
On Mon, Nov 07, 2016 at 03:08:49PM +0800, Chen Feng wrote:
> 
> 
> On 2016/11/7 14:15, Joonsoo Kim wrote:
> > On Tue, Nov 01, 2016 at 03:58:32PM +0800, Chen Feng wrote:
> >> Hello, I have a question on cma zone.
> >>
> >> When we have cma zone, cma zone will be the highest zone of system.
> >>
> >> In android system, the most memory allocator is ION. Media system will
> >> alloc unmovable memory from it.
> >>
> >> On low memory scene, will the CMA zone always do balance?
> > 
> > Allocation request for low zone (normal zone) would not cause CMA zone
> > to be balanced since it isn't helpful.
> > 
> Yes. But the cma zone will run out soon. And it always need to do balance.
> 
> How about use migrate cma before movable and let cma type to fallback movable.
> 
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1263745.html

The ZONE_CMA approach will act like your solution. Could you elaborate
more on the problem with the zone approach?

Thanks.


Re: [PATCH v6 2/6] mm/cma: introduce new zone, ZONE_CMA

2016-11-06 Thread Chen Feng


On 2016/11/7 14:15, Joonsoo Kim wrote:
> On Tue, Nov 01, 2016 at 03:58:32PM +0800, Chen Feng wrote:
>> Hello, I have a question on cma zone.
>>
>> When we have cma zone, cma zone will be the highest zone of system.
>>
>> In android system, the most memory allocator is ION. Media system will
>> alloc unmovable memory from it.
>>
>> On low memory scene, will the CMA zone always do balance?
> 
> Allocation request for low zone (normal zone) would not cause CMA zone
> to be balanced since it isn't helpful.
> 
Yes. But the cma zone will run out soon, and then it always needs to be
balanced.

How about using migrate cma before movable, and letting the cma type fall
back to movable?

https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1263745.html

>> Should we transmit the highest available zone to kswapd?
> 
> It is already done when necessary.
> 
> Thanks.
> 
> 
> .
> 



Re: [PATCH v6 2/6] mm/cma: introduce new zone, ZONE_CMA

2016-11-06 Thread Joonsoo Kim
On Tue, Nov 01, 2016 at 03:58:32PM +0800, Chen Feng wrote:
> Hello, I have a question on cma zone.
> 
> When we have cma zone, cma zone will be the highest zone of system.
> 
> In android system, the most memory allocator is ION. Media system will
> alloc unmovable memory from it.
> 
> On low memory scene, will the CMA zone always do balance?

An allocation request for a lower zone (e.g. the normal zone) would not
cause the CMA zone to be balanced, since that wouldn't help.

> Should we transmit the highest available zone to kswapd?

It is already done when necessary.
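
As a rough illustration of the two answers above, the following toy Python
model (not kernel code) shows how the highest zone usable by a request
limits which zones are worth balancing. The zone list, the ordering below
the CMA zone, and the function name are assumptions made for illustration;
the thread itself only states that the CMA zone is the highest zone.

```python
# Toy model, NOT kernel code. Zone names and order are a simplified
# assumption; the thread only says that the CMA zone is the highest one.
ZONES = ["DMA", "NORMAL", "MOVABLE", "CMA"]   # lowest to highest


def zones_to_balance(highest_usable_zone):
    """Zones worth balancing for a request that can use zones up to
    `highest_usable_zone`; higher zones cannot satisfy the request."""
    idx = ZONES.index(highest_usable_zone)
    return ZONES[: idx + 1]


if __name__ == "__main__":
    # An unmovable request can go no higher than the normal zone,
    # so the CMA zone is left alone.
    print(zones_to_balance("NORMAL"))
    # A movable user allocation may use every zone, including CMA.
    print(zones_to_balance("CMA"))
```

In this model a request limited to the normal zone never causes the CMA
zone to be considered, which matches the behaviour described above.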

Thanks.



Re: [PATCH v6 2/6] mm/cma: introduce new zone, ZONE_CMA

2016-11-01 Thread Chen Feng
Hello, I have a question about the cma zone.

When we have a cma zone, it will be the highest zone of the system.

In an Android system, the main memory allocator is ION. The media system
will allocate unmovable memory from it.

In a low-memory scenario, will the CMA zone always be balanced?

Should we pass the highest available zone to kswapd?

On 2016/10/14 11:03, js1...@gmail.com wrote:
> From: Joonsoo Kim 
> 
> Attached cover-letter:
> 
> This series try to solve problems of current CMA implementation.
> 
> CMA is introduced to provide physically contiguous pages at runtime
> without exclusive reserved memory area. But, current implementation
> works like as previous reserved memory approach, because freepages
> on CMA region are used only if there is no movable freepage. In other
> words, freepages on CMA region are only used as fallback. In that
> situation where freepages on CMA region are used as fallback, kswapd
> would be woken up easily since there is no unmovable and reclaimable
> freepage, too. If kswapd starts to reclaim memory, fallback allocation
> to MIGRATE_CMA doesn't occur any more since movable freepages are
> already refilled by kswapd and then most of freepage on CMA are left
> to be in free. This situation looks like exclusive reserved memory case.
> 
> In my experiment, I found that if system memory has 1024 MB memory and
> 512 MB is reserved for CMA, kswapd is mostly woken up when roughly 512 MB
> free memory is left. Detailed reason is that for keeping enough free
> memory for unmovable and reclaimable allocation, kswapd uses below
> equation when calculating free memory and it easily go under the watermark.
> 
> Free memory for unmovable and reclaimable = Free total - Free CMA pages
> 
> This is derived from the property of CMA freepage that CMA freepage
> can't be used for unmovable and reclaimable allocation.
> 
> Anyway, in this case, kswapd are woken up when (FreeTotal - FreeCMA)
> is lower than low watermark and tries to make free memory until
> (FreeTotal - FreeCMA) is higher than high watermark. That results
> in that FreeTotal is moving around 512MB boundary consistently. It
> then means that we can't utilize full memory capacity.
> 
> To fix this problem, I submitted some patches [1] about 10 months ago,
> but, found some more problems to be fixed before solving this problem.
> It requires many hooks in allocator hotpath so some developers doesn't
> like it. Instead, some of them suggest different approach [2] to fix
> all the problems related to CMA, that is, introducing a new zone to deal
> with free CMA pages. I agree that it is the best way to go so implement
> here. Although properties of ZONE_MOVABLE and ZONE_CMA is similar, I
> decide to add a new zone rather than piggyback on ZONE_MOVABLE since
> they have some differences. First, reserved CMA pages should not be
> offlined. If freepage for CMA is managed by ZONE_MOVABLE, we need to keep
> MIGRATE_CMA migratetype and insert many hooks on memory hotplug code
> to distinguish hotpluggable memory and reserved memory for CMA in the same
> zone. It would make memory hotplug code which is already complicated
> more complicated. Second, cma_alloc() can be called more frequently
> than memory hotplug operation and possibly we need to control
> allocation rate of ZONE_CMA to optimize latency in the future.
> In this case, separate zone approach is easy to modify. Third, I'd
> like to see statistics for CMA, separately. Sometimes, we need to debug
> why cma_alloc() is failed and separate statistics would be more helpful
> in this situation.
> 
> Anyway, this patchset solves four problems related to CMA implementation.
> 
> 1) Utilization problem
> As mentioned above, we can't utilize full memory capacity due to the
> limitation of CMA freepage and fallback policy. This patchset implements
> a new zone for CMA and uses it for GFP_HIGHUSER_MOVABLE request. This
> typed allocation is used for page cache and anonymous pages which
> occupies most of memory usage in normal case so we can utilize full
> memory capacity. Below is the experiment result about this problem.
> 
> 8 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j16
> 
> Before this series:
> CMA reserve:    0 MB     512 MB
> Elapsed-time:   92.4     186.5
> pswpin:         82       18647
> pswpout:        160      69839
> 
> After this series:
> CMA reserve:    0 MB     512 MB
> Elapsed-time:   93.1     93.4
> pswpin:         84       46
> pswpout:        183      92
> 
> FYI, there is another attempt [3] trying to solve this problem in lkml.
> And, as far as I know, Qualcomm also has out-of-tree solution for this
> problem.
> 
> 2) Reclaim problem
> Currently, there is no logic to distinguish CMA pages in reclaim path.
> If reclaim is initiated for unmovable and reclaimable allocation,
> reclaiming CMA pages doesn't help to satisfy the request and reclaiming
> CMA page is just waste. By managing CMA pages in the new zone,

[PATCH v6 2/6] mm/cma: introduce new zone, ZONE_CMA

2016-10-13 Thread js1304
From: Joonsoo Kim 

Attached cover-letter:

This series tries to solve problems in the current CMA implementation.

CMA was introduced to provide physically contiguous pages at runtime
without an exclusively reserved memory area. But the current
implementation works like the previous reserved-memory approach, because
freepages in the CMA region are used only if there is no movable
freepage. In other words, freepages in the CMA region are only used as a
fallback. In the situation where freepages in the CMA region are used as
a fallback, kswapd would be woken up easily, since there are no unmovable
and reclaimable freepages either. Once kswapd starts to reclaim memory,
fallback allocation to MIGRATE_CMA doesn't occur any more, since movable
freepages are already refilled by kswapd, and most of the freepages in
the CMA region are left free. This situation looks like the exclusively
reserved memory case.

In my experiment, I found that if the system has 1024 MB of memory and
512 MB is reserved for CMA, kswapd is mostly woken up when roughly 512 MB
of free memory is left. The detailed reason is that, to keep enough free
memory for unmovable and reclaimable allocations, kswapd uses the
equation below when calculating free memory, and the result easily goes
under the watermark.

Free memory for unmovable and reclaimable = Free total - Free CMA pages

This is derived from the property that CMA freepages can't be used for
unmovable and reclaimable allocations.

Anyway, in this case, kswapd is woken up when (FreeTotal - FreeCMA)
is lower than the low watermark, and it tries to free memory until
(FreeTotal - FreeCMA) is higher than the high watermark. As a result,
FreeTotal consistently hovers around the 512MB boundary. That means
we can't utilize the full memory capacity.
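
To make the arithmetic above concrete, here is a toy model in Python (not
kernel code). The function names and the watermark values are assumptions
chosen only for illustration; only the equation and the 1024 MB / 512 MB
setup come from the text above.

```python
# Toy model of the free-memory check described above. This is NOT kernel
# code: the function names and the watermark values are hypothetical.

def usable_free_mb(free_total_mb, free_cma_mb):
    """Free memory usable for unmovable and reclaimable allocations."""
    return free_total_mb - free_cma_mb


def kswapd_should_wake(free_total_mb, free_cma_mb, low_wmark_mb):
    """kswapd wakes when (FreeTotal - FreeCMA) drops below the low watermark."""
    return usable_free_mb(free_total_mb, free_cma_mb) < low_wmark_mb


def kswapd_target_free_total_mb(free_cma_mb, high_wmark_mb):
    """kswapd reclaims until (FreeTotal - FreeCMA) exceeds the high
    watermark, so FreeTotal ends up just above FreeCMA + high watermark."""
    return free_cma_mb + high_wmark_mb


if __name__ == "__main__":
    LOW_WMARK_MB, HIGH_WMARK_MB = 8, 12   # assumed values, for illustration
    FREE_CMA_MB = 512                      # CMA region still entirely free

    # 516 MB free in total, but only 4 MB usable outside CMA: kswapd wakes.
    print(kswapd_should_wake(516, FREE_CMA_MB, LOW_WMARK_MB))
    # Same total free memory without a CMA reservation: no wakeup.
    print(kswapd_should_wake(516, 0, LOW_WMARK_MB))
    # FreeTotal that kswapd drives towards while the CMA region stays free.
    print(kswapd_target_free_total_mb(FREE_CMA_MB, HIGH_WMARK_MB))
```

With the CMA region entirely free, the model wakes kswapd while more than
512 MB is still free in total, which is the "FreeTotal around 512MB"
behaviour described above.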

To fix this problem, I submitted some patches [1] about 10 months ago,
but found some more problems to be fixed before solving this one.
That approach requires many hooks in the allocator hotpath, so some
developers don't like it. Instead, some of them suggested a different
approach [2] to fix all the problems related to CMA, that is, introducing
a new zone to deal with free CMA pages. I agree that it is the best way
to go, so I implement it here. Although the properties of ZONE_MOVABLE
and ZONE_CMA are similar, I decided to add a new zone rather than
piggyback on ZONE_MOVABLE since they have some differences. First,
reserved CMA pages should not be offlined. If freepages for CMA were
managed by ZONE_MOVABLE, we would need to keep the MIGRATE_CMA
migratetype and insert many hooks into the memory hotplug code to
distinguish hotpluggable memory from memory reserved for CMA in the same
zone. That would make the memory hotplug code, which is already
complicated, even more complicated. Second, cma_alloc() can be called
more frequently than memory hotplug operations, and we may need to
control the allocation rate of ZONE_CMA to optimize latency in the
future. In that case, the separate zone approach is easy to modify.
Third, I'd like to see statistics for CMA separately. Sometimes we need
to debug why cma_alloc() failed, and separate statistics would be more
helpful in that situation.

Anyway, this patchset solves four problems related to CMA implementation.

1) Utilization problem
As mentioned above, we can't utilize the full memory capacity due to the
limitation of CMA freepages and the fallback policy. This patchset
implements a new zone for CMA and uses it for GFP_HIGHUSER_MOVABLE
requests. This type of allocation is used for page cache and anonymous
pages, which account for most memory usage in the normal case, so we can
utilize the full memory capacity. Below is the experiment result for
this problem.

8 CPUs, 1024 MB, VIRTUAL MACHINE
make -j16

Before this series:
CMA reserve:    0 MB     512 MB
Elapsed-time:   92.4     186.5
pswpin:         82       18647
pswpout:        160      69839

After this series:
CMA reserve:    0 MB     512 MB
Elapsed-time:   93.1     93.4
pswpin:         84       46
pswpout:        183      92

FYI, there is another attempt [3] on lkml to solve this problem.
And, as far as I know, Qualcomm also has an out-of-tree solution for it.

2) Reclaim problem
Currently, there is no logic to distinguish CMA pages in the reclaim path.
If reclaim is initiated for an unmovable or reclaimable allocation,
reclaiming CMA pages doesn't help to satisfy the request and is just a
waste. By managing CMA pages in the new zone, we can skip reclaiming
ZONE_CMA completely when it is unnecessary.

3) Atomic allocation failure problem
Kswapd isn't started to reclaim pages when the allocation request is of
movable type and there are enough free pages in the CMA region. After a
bunch of consecutive movable allocation requests, free pages in the
ordinary region (not the CMA region) would be exhausted without waking up
kswapd. At that point, if an atomic unmovable allocation comes, it can't
succeed since there are not enough pages in the ordinary region. This
problem is reported
by Aneesh [4