Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-05-17 Thread Joonsoo Kim
> > 
> > Okay. We did a lot of discussion so it's better to summarise it.
> > 
> > 1. ZONE_CMA might be a nicer solution than MIGRATETYPE.
> > 2. Additional bit in page flags would cause another kind of
> > maintenance problem so it's better to avoid it as much as possible.
> > 3. Abusing ZONE_MOVABLE looks better than introducing ZONE_CMA since
> > it doesn't need additional bit in page flag.
> > 4. (Not-yet-finished) If ZONE_CMA doesn't need extra bit in page
> > flags with hacky magic and it has no performance regression,
> > ??? (it's okay to use separate zone for CMA?)
> 
> As mentioned above. I do not see why we should go over additional hops
> just to have a zone which is not strictly needed. So if there are no
> inherent problems reusing MOVABLE/HIGHMEM zone then a separate zone
> sounds like a wrong direction.
> 
> But let me repeat. I am _not_ convinced that the migratetype situation
> is all that bad and unfixable. You have mentioned some issues with the
> current approach but none of them seem inherently unfixable. So I would
> still prefer keeping the current way. But I am not going to insist if
> you _really_ believe that the long term maintenance cost will be higher
> than a zone approach and you can reuse MOVABLE/HIGHMEM zones without
> disruptive changes. I can help you with the hotplug part of the MOVABLE
> zone because that is desirable on its own.

Okay. Thanks for sharing your opinion. I will decide the final
direction after some investigation.

Thanks.


Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-05-16 Thread Michal Hocko
On Mon 15-05-17 12:57:15, Joonsoo Kim wrote:
> On Fri, May 12, 2017 at 08:38:15AM +0200, Michal Hocko wrote:
[...]
> > I really do not want to question your "simple test" but page_zonenum is
> > used in many performance sensitive paths and proving it doesn't regress
> > would require testing many different workload. Are you going to do that?
> 
> In fact, I don't think that we need to take care about this
> performance problem seriously. The reasons are that:
> 
> 1. Currently, there is a usable bit in the page flags.
> 2. Even if others consume one usable bit, there still exists spare bit
> in 64b kernel. And, for 32b kernel, the number of the zone can be five
> if both, ZONE_CMA and ZONE_HIGHMEM, are used. And, using ZONE_HIGHMEM
> in 32b system is out of the trend.
> 3. Even if we fall into the latter category, I can optimize it not to
> regress if both the zone, ZONE_MOVABLE and ZONE_CMA, aren't used
> simultaneously with two zone bits in page flags. However, using both
> zones is not usual case.
> 4. This performance problem only affects CMA users and there is also a
> benefit due to removal of many hooks in MM subsystem so net result would
> not be worse.

A lot of fiddling for something that we can address in a different way,
really.

> So, I think that performance would be better in most of cases. It
> would be marginally worse in rare cases and they could bear with it. Do
> you still think that using ZONE_MOVABLE for CMA memory is
> necessary rather than separate zone, ZONE_CMA?

Yes, because the main point is that a new zone is not really needed
AFAICS. Just try to reuse what we already have (ZONE_MOVABLE). Moreover,
a new zone just pulls in a lot of infrastructure which will never be
used.

> > > > But I feel we are looping without much progress. So let me NAK this
> > > > until it is _proven_ that the current code is unfixable nor ZONE_MOVABLE
> > > > can be reused
> > > 
> > > I want to keep all the possibilities open, so could you check whether ZONE_MOVABLE
> > > can be overlapped with other zones? IIRC, your rework doesn't allow
> > > it.
> > 
> > My rework keeps the status quo, which is based on the assumption that
> > zones cannot overlap. A longer term plan is that this restriction is
> > removed. As I've said earlier overlapping zones is an interesting
> > concept which is definitely worth pursuing.
> 
> Okay. We did a lot of discussion so it's better to summarise it.
> 
> 1. ZONE_CMA might be a nicer solution than MIGRATETYPE.
> 2. Additional bit in page flags would cause another kind of
> maintenance problem so it's better to avoid it as much as possible.
> 3. Abusing ZONE_MOVABLE looks better than introducing ZONE_CMA since
> it doesn't need additional bit in page flag.
> 4. (Not-yet-finished) If ZONE_CMA doesn't need extra bit in page
> flags with hacky magic and it has no performance regression,
> ??? (it's okay to use separate zone for CMA?)

As mentioned above, I do not see why we should go through additional hoops
just to have a zone which is not strictly needed. So if there are no
inherent problems reusing the MOVABLE/HIGHMEM zone then a separate zone
sounds like the wrong direction.

But let me repeat. I am _not_ convinced that the migratetype situation
is all that bad and unfixable. You have mentioned some issues with the
current approach but none of them seem inherently unfixable. So I would
still prefer keeping the current way. But I am not going to insist if
you _really_ believe that the long term maintenance cost will be higher
than a zone approach and you can reuse MOVABLE/HIGHMEM zones without
disruptive changes. I can help you with the hotplug part of the MOVABLE
zone because that is desirable on its own.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-05-14 Thread Joonsoo Kim
On Fri, May 12, 2017 at 08:38:15AM +0200, Michal Hocko wrote:
> On Fri 12-05-17 11:00:48, Joonsoo Kim wrote:
> > On Thu, May 11, 2017 at 11:13:04AM +0200, Michal Hocko wrote:
> > > On Thu 11-05-17 11:12:43, Joonsoo Kim wrote:
> > > > Sorry for the late response. I was on a vacation.
> > > > 
> > > > On Tue, May 02, 2017 at 03:32:29PM +0200, Michal Hocko wrote:
> > > > > On Tue 02-05-17 13:01:32, Joonsoo Kim wrote:
> > > > > > On Thu, Apr 27, 2017 at 05:06:36PM +0200, Michal Hocko wrote:
> > > > > [...]
> > > > > > > > I see this point and I agree that using a specific zone might be a
> > > > > > > > _nicer_ solution in the end but you have to consider another aspects as
> > > > > > > > well. The main one I am worried about is a long term maintainability.
> > > > > > > > We are really out of page flags and consuming one for a rather specific
> > > > > > > > usecase is not good. Look at ZONE_DMA. I am pretty sure that almost
> > > > > > > > no sane HW needs 16MB zone anymore, yet we have hard time to get rid
> > > > > > > > of it and so we have that memory laying around unused all the time
> > > > > > > > and blocking one page flag bit. CMA falls into a similar category
> > > > > > > > AFAIU. I wouldn't be all that surprised if a future HW will not need CMA
> > > > > > > > allocations in few years, yet we will have to fight to get rid of it
> > > > > > > > like we do with ZONE_DMA. And not only that. We will also have to fight
> > > > > > > > finding page flags for other more general usecases in the meantime.
> > > > > > 
> > > > > > This maintenance problem is inherent. This problem exists even if we
> > > > > > uses MIGRATETYPE approach. We cannot remove many hooks for CMA if a
> > > > > > future HW will not need CMA allocation in few years. The only
> > > > > > > difference is that one takes single zone bit only for CMA user and the
> > > > > > > other approach takes many hooks that we need to take care about it all
> > > > > > > the time.
> > > > > 
> > > > > And I consider this a big difference. Because while hooks are not nice
> > > > > > they will affect CMA users (in a sense of bugs/performance etc.). While
> > > > > > an additional bit consumed will affect potential future and more generic
> > > > > > features.
> > > > 
> > > > Because these hooks are so tricky and are spread on many places,
> > > > bugs about these hooks can be made by *non-CMA* user and they hurt
> > > > > *CMA* user. These hooks could also delay non-CMA user's development speed
> > > > > since there are many hooks about CMA and understanding how CMA is managed
> > > > > is rather difficult.
> > > 
> > > Than make those hooks easier to maintain. Seriously this is a
> > > non-argument.
> > 
> > I can't understand what you said here. 
> 
> I wanted to say that you can make those hooks so non-intrusive that
> nobody outside of the CMA has to even care that CMA exists.

I guess that the current code is the result of such an effort and it is
still intrusive.

> 
> > With zone approach, someone who
> > isn't related to CMA don't need to understand how CMA is managed.
> > 
> > > 
> > > [...]
> > > 
> > > > > And all this can be isolated to CMA specific hooks with mostly minimum
> > > > > > impact to most users. I hear you saying that zone approach is more natural
> > > > > > and I would agree if we wouldn't have to care about the number of zones.
> > > > 
> > > > I attach a solution about one more bit in page flags although I don't
> > > > agree with your opinion that additional bit is no-go approach. Just
> > > > note that we have already used three bits for zone encoding in some
> > > > configuration due to ZONE_DEVICE.
> > > 
> > > I am absolutely not happy about ZONE_DEVICE but there is _no_ other
> > > viable solution right now. I know that people behind this are still
> > > considering struct page over direct pfn usage but they are not in the
> > > same situation as CMA which _can_ work without additional zone.
> > 
> > IIUC, ZONE_DEVICE can reuse the other zone and migratetype.
> 
> They are not going to migrate anything or define any allocation fallback
> policy because those pages are outside of the page allocator completely.
> And that is why a zone approach is a reasonable approach. There are
> probably other ways and I will certainly push going that way.

I have a different opinion, but it's not the main issue here so I won't
argue further.

> [...]
> 
> > > If you _really_ insist on using zone for CMA then reuse ZONE_MOVABLE.
> > > I absolutely miss why do you push a specialized zone so hard.
> > 
> > As I said before, there is no fundamental issue to reuse ZONE_MOVABLE.
> > I just don't want to reuse it because they have a different
> > characteristic. In MM subsystem context, their characteristic is the same.
> > However, CMA memory should be used for the 

Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-05-12 Thread Michal Hocko
On Fri 12-05-17 11:00:48, Joonsoo Kim wrote:
> On Thu, May 11, 2017 at 11:13:04AM +0200, Michal Hocko wrote:
> > On Thu 11-05-17 11:12:43, Joonsoo Kim wrote:
> > > Sorry for the late response. I was on a vacation.
> > > 
> > > On Tue, May 02, 2017 at 03:32:29PM +0200, Michal Hocko wrote:
> > > > On Tue 02-05-17 13:01:32, Joonsoo Kim wrote:
> > > > > On Thu, Apr 27, 2017 at 05:06:36PM +0200, Michal Hocko wrote:
> > > > [...]
> > > > > > > I see this point and I agree that using a specific zone might be a
> > > > > > > _nicer_ solution in the end but you have to consider another aspects as
> > > > > > > well. The main one I am worried about is a long term maintainability.
> > > > > > > We are really out of page flags and consuming one for a rather specific
> > > > > > > usecase is not good. Look at ZONE_DMA. I am pretty sure that almost
> > > > > > > no sane HW needs 16MB zone anymore, yet we have hard time to get rid
> > > > > > > of it and so we have that memory laying around unused all the time
> > > > > > > and blocking one page flag bit. CMA falls into a similar category
> > > > > > > AFAIU. I wouldn't be all that surprised if a future HW will not need CMA
> > > > > > > allocations in few years, yet we will have to fight to get rid of it
> > > > > > > like we do with ZONE_DMA. And not only that. We will also have to fight
> > > > > > > finding page flags for other more general usecases in the meantime.
> > > > > 
> > > > > This maintenance problem is inherent. This problem exists even if we
> > > > > uses MIGRATETYPE approach. We cannot remove many hooks for CMA if a
> > > > > future HW will not need CMA allocation in few years. The only
> > > > > difference is that one takes single zone bit only for CMA user and the
> > > > > other approach takes many hooks that we need to take care about it all
> > > > > the time.
> > > > 
> > > > And I consider this a big difference. Because while hooks are not nice
> > > > they will affect CMA users (in a sense of bugs/performance etc.). While
> > > > an additional bit consumed will affect potential future and more generic
> > > > features.
> > > 
> > > Because these hooks are so tricky and are spread on many places,
> > > bugs about these hooks can be made by *non-CMA* user and they hurt
> > > *CMA* user. These hooks could also delay non-CMA user's development speed
> > > since there are many hooks about CMA and understanding how CMA is managed
> > > is rather difficult.
> > 
> > Than make those hooks easier to maintain. Seriously this is a
> > non-argument.
> 
> I can't understand what you said here. 

I wanted to say that you can make those hooks so non-intrusive that
nobody outside of the CMA has to even care that CMA exists.

> With zone approach, someone who
> isn't related to CMA don't need to understand how CMA is managed.
> 
> > 
> > [...]
> > 
> > > > And all this can be isolated to CMA specific hooks with mostly minimum
> > > > > impact to most users. I hear you saying that zone approach is more natural
> > > > and I would agree if we wouldn't have to care about the number of zones.
> > > 
> > > I attach a solution about one more bit in page flags although I don't
> > > agree with your opinion that additional bit is no-go approach. Just
> > > note that we have already used three bits for zone encoding in some
> > > configuration due to ZONE_DEVICE.
> > 
> > I am absolutely not happy about ZONE_DEVICE but there is _no_ other
> > viable solution right now. I know that people behind this are still
> > considering struct page over direct pfn usage but they are not in the
> > same situation as CMA which _can_ work without additional zone.
> 
> IIUC, ZONE_DEVICE can reuse the other zone and migratetype.

They are not going to migrate anything or define any allocation fallback
policy because those pages are outside of the page allocator completely.
And that is why a zone approach is a reasonable approach. There are
probably other ways and I will certainly push going that way.

[...]

> > If you _really_ insist on using zone for CMA then reuse ZONE_MOVABLE.
> > I absolutely miss why do you push a specialized zone so hard.
> 
> As I said before, there is no fundamental issue to reuse ZONE_MOVABLE.
> I just don't want to reuse it because they have a different
> characteristic. In MM subsystem context, their characteristic is the same.
> However, CMA memory should be used for the device in runtime so more
> allocation guarantee is needed. See the offline_pages() in
> mm/memory_hotplug.c. They can bear in 120 sec to offline the
> memory but CMA memory can't.

This is just an implementation detail. Pinned pages in the CMA ranges
should be easily checked. Moreover memory hotplug cares only about
hotpluggable memory and placing CMA ranges there could be seen as a
configuration bug.

> And, this is a design issue. I don't want to talk about why should we
> pursuit the good design. Originally, ZONE exists to manage 

Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-05-11 Thread Joonsoo Kim
On Thu, May 11, 2017 at 11:13:04AM +0200, Michal Hocko wrote:
> On Thu 11-05-17 11:12:43, Joonsoo Kim wrote:
> > Sorry for the late response. I was on a vacation.
> > 
> > On Tue, May 02, 2017 at 03:32:29PM +0200, Michal Hocko wrote:
> > > On Tue 02-05-17 13:01:32, Joonsoo Kim wrote:
> > > > On Thu, Apr 27, 2017 at 05:06:36PM +0200, Michal Hocko wrote:
> > > [...]
> > > > > > I see this point and I agree that using a specific zone might be a
> > > > > > _nicer_ solution in the end but you have to consider another aspects as
> > > > > > well. The main one I am worried about is a long term maintainability.
> > > > > > We are really out of page flags and consuming one for a rather specific
> > > > > > usecase is not good. Look at ZONE_DMA. I am pretty sure that almost
> > > > > > no sane HW needs 16MB zone anymore, yet we have hard time to get rid
> > > > > > of it and so we have that memory laying around unused all the time
> > > > > > and blocking one page flag bit. CMA falls into a similar category
> > > > > > AFAIU. I wouldn't be all that surprised if a future HW will not need CMA
> > > > > > allocations in few years, yet we will have to fight to get rid of it
> > > > > > like we do with ZONE_DMA. And not only that. We will also have to fight
> > > > > > finding page flags for other more general usecases in the meantime.
> > > > 
> > > > This maintenance problem is inherent. This problem exists even if we
> > > > uses MIGRATETYPE approach. We cannot remove many hooks for CMA if a
> > > > future HW will not need CMA allocation in few years. The only
> > > > difference is that one takes single zone bit only for CMA user and the
> > > > other approach takes many hooks that we need to take care about it all
> > > > the time.
> > > 
> > > And I consider this a big difference. Because while hooks are not nice
> > > they will affect CMA users (in a sense of bugs/performance etc.). While
> > > an additional bit consumed will affect potential future and more generic
> > > features.
> > 
> > Because these hooks are so tricky and are spread on many places,
> > bugs about these hooks can be made by *non-CMA* user and they hurt
> > *CMA* user. These hooks could also delay non-CMA user's development speed
> > since there are many hooks about CMA and understanding how CMA is managed
> > is rather difficult.
> 
> > Then make those hooks easier to maintain. Seriously this is a
> > non-argument.

I can't understand what you said here. With the zone approach, someone who
isn't related to CMA doesn't need to understand how CMA is managed.

> 
> [...]
> 
> > > And all this can be isolated to CMA specific hooks with mostly minimum
> > > impact to most users. I hear you saying that zone approach is more natural
> > > and I would agree if we wouldn't have to care about the number of zones.
> > 
> > I attach a solution about one more bit in page flags although I don't
> > agree with your opinion that additional bit is no-go approach. Just
> > note that we have already used three bits for zone encoding in some
> > configuration due to ZONE_DEVICE.
> 
> I am absolutely not happy about ZONE_DEVICE but there is _no_ other
> viable solution right now. I know that people behind this are still
> considering struct page over direct pfn usage but they are not in the
> same situation as CMA which _can_ work without additional zone.

IIUC, ZONE_DEVICE could reuse another zone plus a migratetype. What
they need is just struct page; a separate zone is not strictly needed.
The other thing they want is to distinguish whether a page belongs to
ZONE_DEVICE memory or not, so they could use an approach similar to CMA's.

IMHO, there is almost nothing that _cannot_ work in the S/W world. What we
need to consider is just the trade-off. So, please don't argue from
impossibility again.

> 
> If you _really_ insist on using zone for CMA then reuse ZONE_MOVABLE.
> I absolutely miss why do you push a specialized zone so hard.

As I said before, there is no fundamental issue to reuse ZONE_MOVABLE.
I just don't want to reuse it because they have different
characteristics. In the MM subsystem context, their characteristics are
the same. However, CMA memory is used by devices at runtime, so a
stronger allocation guarantee is needed. See offline_pages() in
mm/memory_hotplug.c. Memory hotplug can bear up to 120 sec to offline
the memory but a CMA allocation can't.

And, this is a design issue. I don't want to talk about why we should
pursue good design. Originally, ZONE exists to manage different
types of memory. Migratetype is not for that purpose. Using a separate
zone fits the original purpose. Mixing them would be a bad design and
we would easily encounter unexpected problems in the future.

> 
> [...]
> > > No, but I haven't heard any single argument that those bugs are
> > > impossible to fix with the current approach. They might be harder to fix
> > > but if I can chose between harder for CMA and harder for other more
> > > generic HW independent features I will go for the 

Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-05-11 Thread Michal Hocko
On Thu 11-05-17 11:12:43, Joonsoo Kim wrote:
> Sorry for the late response. I was on a vacation.
> 
> On Tue, May 02, 2017 at 03:32:29PM +0200, Michal Hocko wrote:
> > On Tue 02-05-17 13:01:32, Joonsoo Kim wrote:
> > > On Thu, Apr 27, 2017 at 05:06:36PM +0200, Michal Hocko wrote:
> > [...]
> > > > I see this point and I agree that using a specific zone might be a
> > > > _nicer_ solution in the end but you have to consider another aspects as
> > > > well. The main one I am worried about is a long term maintainability.
> > > > We are really out of page flags and consuming one for a rather specific
> > > > usecase is not good. Look at ZONE_DMA. I am pretty sure that almost
> > > > no sane HW needs 16MB zone anymore, yet we have hard time to get rid
> > > > of it and so we have that memory laying around unused all the time
> > > > and blocking one page flag bit. CMA falls into a similar category
> > > > AFAIU. I wouldn't be all that surprised if a future HW will not need CMA
> > > > allocations in few years, yet we will have to fight to get rid of it
> > > > like we do with ZONE_DMA. And not only that. We will also have to fight
> > > > finding page flags for other more general usecases in the meantime.
> > > 
> > > This maintenance problem is inherent. This problem exists even if we
> > > uses MIGRATETYPE approach. We cannot remove many hooks for CMA if a
> > > future HW will not need CMA allocation in few years. The only
> > > difference is that one takes single zone bit only for CMA user and the
> > > other approach takes many hooks that we need to take care about it all
> > > the time.
> > 
> > And I consider this a big difference. Because while hooks are not nice
> > they will affect CMA users (in a sense of bugs/performance etc.). While
> > an additional bit consumed will affect potential future and more generic
> > features.
> 
> Because these hooks are so tricky and are spread on many places,
> bugs about these hooks can be made by *non-CMA* user and they hurt
> *CMA* user. These hooks could also delay non-CMA user's development speed
> since there are many hooks about CMA and understanding how CMA is managed
> is rather difficult.

Then make those hooks easier to maintain. Seriously this is a
non-argument.

[...]

> > And all this can be isolated to CMA specific hooks with mostly minimum
> > impact to most users. I hear you saying that zone approach is more natural
> > and I would agree if we wouldn't have to care about the number of zones.
> 
> I attach a solution about one more bit in page flags although I don't
> agree with your opinion that additional bit is no-go approach. Just
> note that we have already used three bits for zone encoding in some
> configuration due to ZONE_DEVICE.

I am absolutely not happy about ZONE_DEVICE but there is _no_ other
viable solution right now. I know that people behind this are still
considering struct page over direct pfn usage but they are not in the
same situation as CMA which _can_ work without additional zone.

If you _really_ insist on using zone for CMA then reuse ZONE_MOVABLE.
I absolutely miss why do you push a specialized zone so hard.

[...]
> > No, but I haven't heard any single argument that those bugs are
> > impossible to fix with the current approach. They might be harder to fix
> > but if I can chose between harder for CMA and harder for other more
> > generic HW independent features I will go for the first one. And do not
> > take me wrong, I have nothing against CMA as such. It solves a real life
> > problem. I just believe it doesn't deserve to consume a new bit in page
> > flags because that is just too scarce resource.
> 
> As I mentioned above, I think that maintenance overhead due to CMA
> deserves to consume a new bit in page flags. It also provide us
> extendability to introduce more zones in the future.
> 
> Anyway, this value-judgement is subjective so I guess that we
> cannot agree with each other. To solve your concern,
> I make following solution. Please let me know your opinion about this.
> This patch can be applied on top of my ZONE_CMA series.

I do not think this makes the situation any easier or more acceptable for
merging.

But I feel we are looping without much progress. So let me NAK this
until it is _proven_ that the current code is unfixable and that
ZONE_MOVABLE cannot be reused.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-05-11 Thread Vlastimil Babka
On 05/04/2017 02:46 PM, Michal Hocko wrote:
> On Thu 04-05-17 14:33:24, Vlastimil Babka wrote:
>>>
>>> I am pretty sure s390 and ppc support NUMA and aim at supporting really
>>> large systems. 
>>
>> I don't see ppc there,
> 
> config KVM_BOOK3S_64_HV
>         tristate "KVM for POWER7 and later using hypervisor mode in host"
>         depends on KVM_BOOK3S_64 && PPC_POWERNV
>         select KVM_BOOK3S_HV_POSSIBLE
>         select MMU_NOTIFIER
>         select CMA
> 
> fa61a4e376d21 tries to explain some more

Uh, that's unfortunate then.

> [...]
>>> Are we really ready to add another thing like that? How are distribution
>>> kernels going to handle that?
>>
>> I still hope that generic enterprise/desktop distributions can disable
>> it, and it's only used for small devices with custom kernels.
>>
>> The config burden is already there in any case, it just translates to
>> extra migratetype and fastpath hooks, not extra zone and potentially
>> less nodes.
> 
> AFAIU the extra migrate type costs nothing when there are no cma
> reservations. And those hooks can be made noop behind static branch
> as well. So distribution kernels do not really have to be afraid of
> enabling CMA currently.

The tradeoff is unfortunate :/
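
To make the "noop behind static branch" point above concrete, here is a
minimal userspace sketch. It is only an analogue: in the kernel the flag
would be a static key (for example DEFINE_STATIC_KEY_FALSE() tested with
static_branch_unlikely()), which patches the branch out while the key is
off. The names cma_used, cma_mark_used() and usable_free_pages() are
hypothetical and do not come from the patch series.

```c
#include <stdbool.h>

/*
 * Userspace stand-in for a kernel static key. A plain bool still costs a
 * load and a compare; a real static key would turn the check into a
 * patched-out branch while no CMA area exists.
 */
static bool cma_used;

/* Hypothetical hook: called once when the first CMA area is reserved. */
static void cma_mark_used(void)
{
	cma_used = true;
}

/*
 * Hypothetical fast-path helper: number of free pages that an allocation
 * which must not use CMA pageblocks may count on.
 */
static long usable_free_pages(long free_pages, long free_cma_pages)
{
	if (!cma_used)
		return free_pages;	/* no reservations: hook is a no-op */

	return free_pages - free_cma_pages;
}
```

Until cma_mark_used() runs, usable_free_pages() returns free_pages
unchanged, which is the "costs nothing without reservations" case.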


Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-05-10 Thread Joonsoo Kim
Sorry for the late response. I was on a vacation.

On Tue, May 02, 2017 at 03:32:29PM +0200, Michal Hocko wrote:
> On Tue 02-05-17 13:01:32, Joonsoo Kim wrote:
> > On Thu, Apr 27, 2017 at 05:06:36PM +0200, Michal Hocko wrote:
> [...]
> > > I see this point and I agree that using a specific zone might be a
> > > _nicer_ solution in the end but you have to consider another aspects as
> > > well. The main one I am worried about is a long term maintainability.
> > > We are really out of page flags and consuming one for a rather specific
> > > usecase is not good. Look at ZONE_DMA. I am pretty sure that almost
> > > no sane HW needs 16MB zone anymore, yet we have hard time to get rid
> > > of it and so we have that memory laying around unused all the time
> > > and blocking one page flag bit. CMA falls into a similar category
> > > AFAIU. I wouldn't be all that surprised if a future HW will not need CMA
> > > allocations in few years, yet we will have to fight to get rid of it
> > > like we do with ZONE_DMA. And not only that. We will also have to fight
> > > finding page flags for other more general usecases in the meantime.
> > 
> > This maintenance problem is inherent. This problem exists even if we
> > uses MIGRATETYPE approach. We cannot remove many hooks for CMA if a
> > future HW will not need CMA allocation in few years. The only
> > difference is that one takes single zone bit only for CMA user and the
> > other approach takes many hooks that we need to take care about it all
> > the time.
> 
> And I consider this a big difference. Because while hooks are not nice
> they will affect CMA users (in a sense of bugs/performance etc.). While
> an additional bit consumed will affect potential future and more generic
> features.

Because these hooks are so tricky and are spread across many places,
bugs in these hooks can be introduced by *non-CMA* users and they hurt
*CMA* users. These hooks could also slow down non-CMA users' development
since there are many hooks for CMA and understanding how CMA is managed
is rather difficult. I think that this is a big maintenance overhead
not only for CMA users but also for non-CMA users. So, I think that it
can justify consuming an additional bit.

> 
> [...]
> > > I believe that the overhead in the hot path is not such a big deal. We
> > > have means to make it 0 when CMA is not used by jumplabels. I assume
> > > that the vast majority of systems will not use CMA. Those systems which
> > > use CMA should be able to cope with some slight overhead IMHO.
> > 
> > Please don't underestimate number of CMA user. Most of android device
> > uses CMA. So, there would be more devices using CMA than the server
> > not using CMA. They also have a right to experience the best performance.
> 
> This is not a fair comparison, though. Android development model is much
> more faster and tend to not care about future maintainability at all. I
> do not know about any android device that would run on a clean vanilla
> kernel because vendors simply do not care enough (or have time) to put
> the code into a proper shape to upstream it. I understand that this
> model might work quite well for rapidly changing and moving mobile or
> IoT segment but it is not the greatest fit to motivate the core kernel
> subsystem development. We are not in the drivers space!
> 
> [...]
> > > This looks like a nice clean up. Those ifdefs are ugly as hell. One
> > > could argue that some of that could be cleaned up by simply adding some
> > > helpers (with a jump label to reduce the overhead), though. But is this
> > > really strong enough reason to bring the whole zone in? I am not really
> > > convinced to be honest.
> > 
> > Please don't underestimate the benefit of this patchset.
> > I have said that we need *more* hooks to fix all the problems.
> > 
> > And, please think that this code removal is not only code removal but
> > also concept removal. With this removing, we don't need to consider
> > ALLOC_CMA for alloc_flags when calling zone_watermark_ok(). There are
> > many bugs on it and it still remains. We don't need to consider
> > pageblock migratetype when handling pageblock migratetype. We don't
> > need to take a great care about calculating the number of CMA
> > freepages.
> 
> And all this can be isolated to CMA specific hooks with mostly minimum
> impact to most users. I hear you saying that zone approach is more natural
> and I would agree if we wouldn't have to care about the number of zones.

I attach a solution for the one-more-bit-in-page-flags concern, although I
don't agree with your opinion that an additional bit is a no-go. Just
note that we have already used three bits for zone encoding in some
configurations due to ZONE_DEVICE.
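
The bit arithmetic behind this can be sketched as follows. This is only an
illustration of how many page-flag bits a given zone count needs; the
helper zone_bits() is hypothetical and is not the kernel's ZONES_SHIFT
macro, although it follows the same idea.

```c
/*
 * Hypothetical helper: smallest number of bits able to encode nr_zones
 * distinct zone indices (0 bits when there is a single zone).
 */
static unsigned int zone_bits(unsigned int nr_zones)
{
	unsigned int bits = 0;

	while ((1u << bits) < nr_zones)
		bits++;

	return bits;
}
```

With four zones (say DMA, DMA32, NORMAL and MOVABLE) zone_bits(4) is 2,
and a fifth zone pushes it to 3. Once three bits are in use, as with
ZONE_DEVICE, up to eight zones fit without another bit: zone_bits(8) is
still 3.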

> 
> > > [...]
> > > 
> > > > > > Please do _not_ take this as a NAK from me. At least not at this time. I
> > > > > > am still trying to understand all the consequences but my intuition
> > > > > > tells me that building on top of highmem like approach will turn out 
> > 

Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-05-04 Thread Michal Hocko
On Thu 04-05-17 14:33:24, Vlastimil Babka wrote:
> On 05/02/2017 03:03 PM, Michal Hocko wrote:
> > On Tue 02-05-17 10:06:01, Vlastimil Babka wrote:
> >> On 04/27/2017 05:06 PM, Michal Hocko wrote:
> >>> On Tue 25-04-17 12:42:57, Joonsoo Kim wrote:
>  On Mon, Apr 24, 2017 at 03:09:36PM +0200, Michal Hocko wrote:
> > On Mon 17-04-17 11:02:12, Joonsoo Kim wrote:
> >> On Thu, Apr 13, 2017 at 01:56:15PM +0200, Michal Hocko wrote:
> >>> On Wed 12-04-17 10:35:06, Joonsoo Kim wrote:
> >>> [...]
> > not for free. For most common configurations where we have ZONE_DMA,
> > ZONE_DMA32, ZONE_NORMAL and ZONE_MOVABLE all the 3 bits are already
> > consumed so a new zone will need a new one AFAICS.
> 
>  Yes, it requires one more bit for a new zone and it's handled by the patch.
> >>>
> >>> I am pretty sure that you are aware that consuming new page flag bits
> >>> is usually a no-go and something we try to avoid as much as possible
> >>> because we are in a great shortage there. So there really have to be a
> >>> _strong_ reason if we go that way. My current understanding that the
> >>> whole zone concept is more about a more convenient implementation rather
> >>> than a fundamental change which will solve unsolvable problems with the
> >>> current approach. More on that below.
> >>
> >> I don't see it as such a big issue. It's behind a CONFIG option (so we
> >> also don't need the jump labels you suggest later) and enabling it
> >> reduces the number of possible NUMA nodes (not page flags). So either
> >> you are building a kernel for android phone that needs CMA but will have
> >> a single NUMA node, or for a large server with many nodes that won't
> >> have CMA. As long as there won't be large servers that need CMA, we
> >> should be fine (yes, I know some HW vendors can be very creative, but
> >> then it's their problem?).
> > 
> > Is this really about Android/UMA systems only? My quick grep seems to disagree
> > $ git grep CONFIG_CMA=y
> > arch/arm/configs/exynos_defconfig:CONFIG_CMA=y
> > arch/arm/configs/imx_v6_v7_defconfig:CONFIG_CMA=y
> > arch/arm/configs/keystone_defconfig:CONFIG_CMA=y
> > arch/arm/configs/multi_v7_defconfig:CONFIG_CMA=y
> > arch/arm/configs/omap2plus_defconfig:CONFIG_CMA=y
> > arch/arm/configs/tegra_defconfig:CONFIG_CMA=y
> > arch/arm/configs/vexpress_defconfig:CONFIG_CMA=y
> > arch/arm64/configs/defconfig:CONFIG_CMA=y
> > arch/mips/configs/ci20_defconfig:CONFIG_CMA=y
> > arch/mips/configs/db1xxx_defconfig:CONFIG_CMA=y
> > arch/s390/configs/default_defconfig:CONFIG_CMA=y
> > arch/s390/configs/gcov_defconfig:CONFIG_CMA=y
> > arch/s390/configs/performance_defconfig:CONFIG_CMA=y
> > arch/s390/defconfig:CONFIG_CMA=y
> > 
> > I am pretty sure s390 and ppc support NUMA and aim at supporting really
> > large systems. 
> 
> I don't see ppc there,

config KVM_BOOK3S_64_HV
	tristate "KVM for POWER7 and later using hypervisor mode in host"
	depends on KVM_BOOK3S_64 && PPC_POWERNV
	select KVM_BOOK3S_HV_POSSIBLE
	select MMU_NOTIFIER
	select CMA

Commit fa61a4e376d21 tries to explain some more.

[...]
> > Are we really ready to add another thing like that? How are distribution
> > kernels going to handle that?
> 
> I still hope that generic enterprise/desktop distributions can disable
> it, and it's only used for small devices with custom kernels.
> 
> The config burden is already there in any case, it just translates to
> extra migratetype and fastpath hooks, not extra zone and potentially
> less nodes.

AFAIU the extra migratetype costs nothing when there are no CMA
reservations. And those hooks can be made a no-op behind a static branch
as well. So distribution kernels do not really have to be afraid of
enabling CMA currently.
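
The static-branch idea can be sketched roughly as follows. This is a
userspace emulation only: a plain bool stands in for what the kernel would
presumably express with a static key (DEFINE_STATIC_KEY_FALSE() and
static_branch_unlikely()), and cma_register_area() and usable_free_pages()
are hypothetical names, not existing kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a static key: stays false until a CMA area is reserved. */
static bool cma_key_enabled;
static unsigned long nr_free_cma_pages;

/* Hypothetical: called once for every CMA reservation. */
void cma_register_area(unsigned long pages)
{
	assert(pages > 0);
	nr_free_cma_pages += pages;
	cma_key_enabled = true;
}

/*
 * Hypothetical fast-path hook: free pages available to an allocation
 * that must not be served from CMA memory.
 */
unsigned long usable_free_pages(unsigned long free_pages)
{
	/* No reservations: the hook does nothing beyond this test. */
	if (!cma_key_enabled)
		return free_pages;
	return free_pages > nr_free_cma_pages ?
	       free_pages - nr_free_cma_pages : 0;
}
```

With a real static key the disabled case would not even pay for the test,
since the branch is patched out at runtime; the bool above only models the
control flow.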

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-05-04 Thread Vlastimil Babka
On 05/02/2017 03:03 PM, Michal Hocko wrote:
> On Tue 02-05-17 10:06:01, Vlastimil Babka wrote:
>> On 04/27/2017 05:06 PM, Michal Hocko wrote:
>>> On Tue 25-04-17 12:42:57, Joonsoo Kim wrote:
>>>> On Mon, Apr 24, 2017 at 03:09:36PM +0200, Michal Hocko wrote:
>>>>> On Mon 17-04-17 11:02:12, Joonsoo Kim wrote:
>>>>>> On Thu, Apr 13, 2017 at 01:56:15PM +0200, Michal Hocko wrote:
>>>>>>> On Wed 12-04-17 10:35:06, Joonsoo Kim wrote:
>>>>>>> [...]
>>>>> not for free. For most common configurations where we have ZONE_DMA,
>>>>> ZONE_DMA32, ZONE_NORMAL and ZONE_MOVABLE all the 3 bits are already
>>>>> consumed so a new zone will need a new one AFAICS.
>>>>
>>>> Yes, it requires one more bit for a new zone and it's handled by the patch.
>>>
>>> I am pretty sure that you are aware that consuming new page flag bits
>>> is usually a no-go and something we try to avoid as much as possible
>>> because we are in a great shortage there. So there really have to be a
>>> _strong_ reason if we go that way. My current understanding that the
>>> whole zone concept is more about a more convenient implementation rather
>>> than a fundamental change which will solve unsolvable problems with the
>>> current approach. More on that below.
>>
>> I don't see it as such a big issue. It's behind a CONFIG option (so we
>> also don't need the jump labels you suggest later) and enabling it
>> reduces the number of possible NUMA nodes (not page flags). So either
>> you are building a kernel for android phone that needs CMA but will have
>> a single NUMA node, or for a large server with many nodes that won't
>> have CMA. As long as there won't be large servers that need CMA, we
>> should be fine (yes, I know some HW vendors can be very creative, but
>> then it's their problem?).
> 
> Is this really about Android/UMA systems only? My quick grep seems to disagree
> $ git grep CONFIG_CMA=y
> arch/arm/configs/exynos_defconfig:CONFIG_CMA=y
> arch/arm/configs/imx_v6_v7_defconfig:CONFIG_CMA=y
> arch/arm/configs/keystone_defconfig:CONFIG_CMA=y
> arch/arm/configs/multi_v7_defconfig:CONFIG_CMA=y
> arch/arm/configs/omap2plus_defconfig:CONFIG_CMA=y
> arch/arm/configs/tegra_defconfig:CONFIG_CMA=y
> arch/arm/configs/vexpress_defconfig:CONFIG_CMA=y
> arch/arm64/configs/defconfig:CONFIG_CMA=y
> arch/mips/configs/ci20_defconfig:CONFIG_CMA=y
> arch/mips/configs/db1xxx_defconfig:CONFIG_CMA=y
> arch/s390/configs/default_defconfig:CONFIG_CMA=y
> arch/s390/configs/gcov_defconfig:CONFIG_CMA=y
> arch/s390/configs/performance_defconfig:CONFIG_CMA=y
> arch/s390/defconfig:CONFIG_CMA=y
> 
> I am pretty sure s390 and ppc support NUMA and aim at supporting really
> large systems. 

I don't see ppc there, and the s390 commit adding CMA as default provides no
info. Heiko/Martin, could you share what s390 uses CMA for? Thanks.

> I can imagine that we could make ZONE_CMA configurable in a way that
> only very well defined use cases would be supported so that we can save
> page flags space. But this alone sounds like a maintainability nightmare
> to me. Especially when I consider ZONE_DMA situation. There is simply
> not an easy way to find out whether my HW really needs DMA zone or
> not. Most probably not but it still is configured and hidden behind
> config ZONE_DMA
> 	bool "DMA memory allocation support" if EXPERT
> 	default y
> 	help
> 	  DMA memory allocation support allows devices with less than 32-bit
> 	  addressing to allocate within the first 16MB of address space.
> 	  Disable if no such devices will be used.
> 
> 	  If unsure, say Y.
> 
> Are we really ready to add another thing like that? How are distribution
> kernels going to handle that?

I still hope that generic enterprise/desktop distributions can disable
it, and it's only used for small devices with custom kernels.

The config burden is already there in any case; it just translates to
an extra migratetype and fastpath hooks, not an extra zone and potentially
fewer nodes.
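
The zone-bits versus node-bits trade-off can be sketched as follows. This is
a simplified userspace model, assuming the zone id and node id share a
fixed-width slice of page flags; max_nodes() and the 12-bit width used in the
note after the code are hypothetical, and real layouts depend on the
configuration.

```c
#include <assert.h>

/*
 * Largest number of NUMA nodes that can be encoded when the zone and
 * node fields share field_bits bits of page->flags. The helper and
 * any widths passed to it are illustrative only.
 */
unsigned long max_nodes(unsigned int field_bits, unsigned int zone_bits)
{
	assert(zone_bits <= field_bits);
	assert(field_bits - zone_bits < 8 * sizeof(unsigned long));
	/* Every bit taken by the zone field halves the node id space. */
	return 1UL << (field_bits - zone_bits);
}
```

With a hypothetical 12-bit slice, going from 2 to 3 zone bits halves the
encodable node count from 1024 to 512.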


Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-05-02 Thread Igor Stoppa
On 02/05/17 16:03, Michal Hocko wrote:

> I can imagine that we could make ZONE_CMA configurable in a way that
> only very well defined use cases would be supported so that we can save
> page flags space. But this alone sounds like a maintainability nightmare
> to me. Especially when I consider ZONE_DMA situation. There is simply
> not an easy way to find out whether my HW really needs DMA zone or
> not. Most probably not but it still is configured and hidden behind
> config ZONE_DMA
> 	bool "DMA memory allocation support" if EXPERT
> 	default y
> 	help
> 	  DMA memory allocation support allows devices with less than 32-bit
> 	  addressing to allocate within the first 16MB of address space.
> 	  Disable if no such devices will be used.
> 
> 	  If unsure, say Y.
> 
> Are we really ready to add another thing like that? How are distribution
> kernels going to handle that?

In practice there are 2 quite opposite scenarios:

- distros that try to cater to (almost) everyone and are constrained in
what they can leave out

- ad-hoc builds (like Android, but also IoT) where the HW is *very* well
known upfront, because it's probably even impossible to make any change
that doesn't involve a rework station.

So maybe the answer is to not have only EXPERT, but rather DISTRO/CUSTOM
with the implications these can bring.

A generic build would be assumed to be a DISTRO type, but something else, of
more embedded persuasion, could do otherwise.

ZONE_DMA / ZONE_DMA32 actually seem to be perfect candidates for being
replaced by something else, when unused, as I proposed on Friday:

http://marc.info/?l=linux-mm&m=149337033630993&w=2


It might still be that only some cases would be upstreamable, even after
these changes.

But at least some of those might be useful also for non-Android/ non-IoT
scenarios.


---
igor


Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-05-02 Thread Michal Hocko
On Tue 02-05-17 13:01:32, Joonsoo Kim wrote:
> On Thu, Apr 27, 2017 at 05:06:36PM +0200, Michal Hocko wrote:
[...]
> > I see this point and I agree that using a specific zone might be a
> > _nicer_ solution in the end but you have to consider another aspects as
> > well. The main one I am worried about is a long term maintainability.
> > We are really out of page flags and consuming one for a rather specific
> > usecase is not good. Look at ZONE_DMA. I am pretty sure that almost
> > no sane HW needs 16MB zone anymore, yet we have hard time to get rid
> > of it and so we have that memory laying around unused all the time
> > and blocking one page flag bit. CMA falls into a similar category
> > AFAIU. I wouldn't be all that surprised if a future HW will not need CMA
> > allocations in few years, yet we will have to fight to get rid of it
> > like we do with ZONE_DMA. And not only that. We will also have to fight
> > finding page flags for other more general usecases in the meantime.
> 
> This maintenance problem is inherent. This problem exists even if we
> uses MIGRATETYPE approach. We cannot remove many hooks for CMA if a
> future HW will not need CMA allocation in few years. The only
> difference is that one takes single zone bit only for CMA user and the
> other approach takes many hooks that we need to take care about it all
> the time.

And I consider this a big difference, because while hooks are not nice,
they will affect only CMA users (in the sense of bugs, performance, etc.),
whereas an additional consumed bit will affect potential future and more
generic features.

[...]
> > I believe that the overhead in the hot path is not such a big deal. We
> > have means to make it 0 when CMA is not used by jumplabels. I assume
> > that the vast majority of systems will not use CMA. Those systems which
> > use CMA should be able to cope with some slight overhead IMHO.
> 
> Please don't underestimate number of CMA user. Most of android device
> uses CMA. So, there would be more devices using CMA than the server
> not using CMA. They also have a right to experience the best performance.

This is not a fair comparison, though. The Android development model is
much faster and tends not to care about future maintainability at all. I
do not know of any Android device that runs on a clean vanilla
kernel, because vendors simply do not care enough (or have the time) to put
the code into proper shape to upstream it. I understand that this
model might work quite well for the rapidly changing mobile or
IoT segment, but it is not the greatest fit to motivate core kernel
subsystem development. We are not in the drivers space!

[...]
> > This looks like a nice clean up. Those ifdefs are ugly as hell. One
> > could argue that some of that could be cleaned up by simply adding some
> > helpers (with a jump label to reduce the overhead), though. But is this
> > really strong enough reason to bring the whole zone in? I am not really
> > convinced to be honest.
> 
> Please don't underestimate the benefit of this patchset.
> I have said that we need *more* hooks to fix all the problems.
> 
> And, please think that this code removal is not only code removal but
> also concept removal. With this removing, we don't need to consider
> ALLOC_CMA for alloc_flags when calling zone_watermark_ok(). There are
> many bugs on it and it still remains. We don't need to consider
> pageblock migratetype when handling pageblock migratetype. We don't
> need to take a great care about calculating the number of CMA
> freepages.

And all this can be isolated to CMA-specific hooks with minimal
impact on most users. I hear you saying that the zone approach is more natural
and I would agree if we didn't have to care about the number of zones.
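
For readers following the ALLOC_CMA point, the kind of hook under discussion
looks roughly like this. It is a simplified userspace sketch, assuming the
watermark check subtracts free CMA pages for allocations that cannot use CMA
pageblocks. The struct, the function name and the flag value are made up for
illustration; this is not the actual zone_watermark_ok() code.

```c
#include <assert.h>
#include <stdbool.h>

#define ALLOC_CMA 0x80	/* flag value chosen for this sketch only */

struct zone_counters {
	long free_pages;	/* all free pages in the zone */
	long free_cma_pages;	/* subset sitting in CMA pageblocks */
};

/*
 * Simplified watermark check: an allocation that may not use CMA
 * pageblocks must not count free CMA pages as available memory.
 */
bool watermark_ok(const struct zone_counters *z, long mark,
		  unsigned int alloc_flags)
{
	long free = z->free_pages;

	assert(z->free_cma_pages <= z->free_pages);
	if (!(alloc_flags & ALLOC_CMA))
		free -= z->free_cma_pages;
	return free > mark;
}
```

Every caller has to pass the right alloc_flags for this to be correct, which
is the class of bug the zone approach is said to avoid and the helper
approach would try to confine to CMA-specific code.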

> > [...]
> > 
> > > > Please do _not_ take this as a NAK from me. At least not at this time. I
> > > > am still trying to understand all the consequences but my intuition
> > > > tells me that building on top of highmem like approach will turn out to
> > > > be problematic in future (as we have already seen with the highmem and
> > > > movable zones) so this needs a very prudent consideration.
> > > 
> > > > I can understand that you are prudent to this issue. However, it takes more
> > > than two years and many people already expressed that ZONE approach is the
> > > way to go.
> > 
> > I can see a single Acked-by and one Reviewed-by. It would be much more
> > convincing to see much larger support. Do not take me wrong I am not
> > trying to undermine the feedback so far but we should be clear about one
> > thing. CMA is mostly motivated by the industry which tries to overcome
> > HW limitations which can change in future very easily. I would rather
> > see good enough solution for something like that than a nicer solution
> > which is pushing additional burden on more general usecases.
> 
> First of all, current MIGRATETYPE approach isn't good enough to me.
> They caused too many problems and 

Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-05-02 Thread Michal Hocko
On Tue 02-05-17 10:06:01, Vlastimil Babka wrote:
> On 04/27/2017 05:06 PM, Michal Hocko wrote:
> > On Tue 25-04-17 12:42:57, Joonsoo Kim wrote:
> >> On Mon, Apr 24, 2017 at 03:09:36PM +0200, Michal Hocko wrote:
> >>> On Mon 17-04-17 11:02:12, Joonsoo Kim wrote:
> >>>> On Thu, Apr 13, 2017 at 01:56:15PM +0200, Michal Hocko wrote:
> >>>>> On Wed 12-04-17 10:35:06, Joonsoo Kim wrote:
> >>>>> [...]
> >>> not for free. For most common configurations where we have ZONE_DMA,
> >>> ZONE_DMA32, ZONE_NORMAL and ZONE_MOVABLE all the 3 bits are already
> >>> consumed so a new zone will need a new one AFAICS.
> >>
> >> Yes, it requires one more bit for a new zone and it's handled by the patch.
> > 
> > I am pretty sure that you are aware that consuming new page flag bits
> > is usually a no-go and something we try to avoid as much as possible
> > because we are in a great shortage there. So there really have to be a
> > _strong_ reason if we go that way. My current understanding that the
> > whole zone concept is more about a more convenient implementation rather
> > than a fundamental change which will solve unsolvable problems with the
> > current approach. More on that below.
> 
> I don't see it as such a big issue. It's behind a CONFIG option (so we
> also don't need the jump labels you suggest later) and enabling it
> reduces the number of possible NUMA nodes (not page flags). So either
> you are building a kernel for android phone that needs CMA but will have
> a single NUMA node, or for a large server with many nodes that won't
> have CMA. As long as there won't be large servers that need CMA, we
> should be fine (yes, I know some HW vendors can be very creative, but
> then it's their problem?).

Is this really about Android/UMA systems only? My quick grep seems to disagree
$ git grep CONFIG_CMA=y
arch/arm/configs/exynos_defconfig:CONFIG_CMA=y
arch/arm/configs/imx_v6_v7_defconfig:CONFIG_CMA=y
arch/arm/configs/keystone_defconfig:CONFIG_CMA=y
arch/arm/configs/multi_v7_defconfig:CONFIG_CMA=y
arch/arm/configs/omap2plus_defconfig:CONFIG_CMA=y
arch/arm/configs/tegra_defconfig:CONFIG_CMA=y
arch/arm/configs/vexpress_defconfig:CONFIG_CMA=y
arch/arm64/configs/defconfig:CONFIG_CMA=y
arch/mips/configs/ci20_defconfig:CONFIG_CMA=y
arch/mips/configs/db1xxx_defconfig:CONFIG_CMA=y
arch/s390/configs/default_defconfig:CONFIG_CMA=y
arch/s390/configs/gcov_defconfig:CONFIG_CMA=y
arch/s390/configs/performance_defconfig:CONFIG_CMA=y
arch/s390/defconfig:CONFIG_CMA=y

I am pretty sure s390 and ppc support NUMA and aim at supporting really
large systems. 

I can imagine that we could make ZONE_CMA configurable in a way that
only very well defined use cases would be supported so that we can save
page flags space. But this alone sounds like a maintainability nightmare
to me, especially when I consider the ZONE_DMA situation. There is simply
no easy way to find out whether my HW really needs the DMA zone or
not. Most probably not, but it is still configured and hidden behind
config ZONE_DMA
	bool "DMA memory allocation support" if EXPERT
	default y
	help
	  DMA memory allocation support allows devices with less than 32-bit
	  addressing to allocate within the first 16MB of address space.
	  Disable if no such devices will be used.

	  If unsure, say Y.

Are we really ready to add another thing like that? How are distribution
kernels going to handle that?
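
How the zone count follows from such config options can be sketched like
this. It is a minimal userspace illustration loosely modelled on the kernel's
enum zone_type, assuming only the DMA, DMA32, NORMAL and MOVABLE zones; the
CONFIG_ defines and zone_name() are local to the sketch.

```c
#include <assert.h>

/* Config switches for this sketch; remove one to drop that zone. */
#define CONFIG_ZONE_DMA 1
#define CONFIG_ZONE_DMA32 1

enum zone_type {
#ifdef CONFIG_ZONE_DMA
	ZONE_DMA,
#endif
#ifdef CONFIG_ZONE_DMA32
	ZONE_DMA32,
#endif
	ZONE_NORMAL,
	ZONE_MOVABLE,
	MAX_NR_ZONES	/* number of zones actually built in */
};

/* Human-readable name of a configured zone. */
const char *zone_name(enum zone_type zt)
{
	static const char *const names[MAX_NR_ZONES] = {
#ifdef CONFIG_ZONE_DMA
		[ZONE_DMA] = "DMA",
#endif
#ifdef CONFIG_ZONE_DMA32
		[ZONE_DMA32] = "DMA32",
#endif
		[ZONE_NORMAL] = "Normal",
		[ZONE_MOVABLE] = "Movable",
	};

	assert((unsigned int)zt < MAX_NR_ZONES);
	return names[zt];
}
```

A distribution kernel that cannot know the target HW has to keep both
switches on, so the zone, and the encoding space it occupies, exists even
where nothing ever allocates from it.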
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-05-02 Thread Vlastimil Babka
On 04/27/2017 05:06 PM, Michal Hocko wrote:
> On Tue 25-04-17 12:42:57, Joonsoo Kim wrote:
>> On Mon, Apr 24, 2017 at 03:09:36PM +0200, Michal Hocko wrote:
>>> On Mon 17-04-17 11:02:12, Joonsoo Kim wrote:
>>>> On Thu, Apr 13, 2017 at 01:56:15PM +0200, Michal Hocko wrote:
>>>>> On Wed 12-04-17 10:35:06, Joonsoo Kim wrote:
> [...]
>>> not for free. For most common configurations where we have ZONE_DMA,
>>> ZONE_DMA32, ZONE_NORMAL and ZONE_MOVABLE all the 3 bits are already
>>> consumed so a new zone will need a new one AFAICS.
>>
>> Yes, it requires one more bit for a new zone and it's handled by the patch.
> 
> I am pretty sure that you are aware that consuming new page flag bits
> is usually a no-go and something we try to avoid as much as possible
> because we are in a great shortage there. So there really have to be a
> _strong_ reason if we go that way. My current understanding that the
> whole zone concept is more about a more convenient implementation rather
> than a fundamental change which will solve unsolvable problems with the
> current approach. More on that below.

I don't see it as such a big issue. It's behind a CONFIG option (so we
also don't need the jump labels you suggest later) and enabling it
reduces the number of possible NUMA nodes (not page flags). So either
you are building a kernel for android phone that needs CMA but will have
a single NUMA node, or for a large server with many nodes that won't
have CMA. As long as there won't be large servers that need CMA, we
should be fine (yes, I know some HW vendors can be very creative, but
then it's their problem?).
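
The trade-off described here (an extra zone bit is paid for with fewer
encodable NUMA nodes) can be sketched in a few lines of C. This is only an
illustration, assuming the zone id is stored as a plain binary index and that
zone id and node id share a fixed bit budget. ZONE_PLUS_NODE_BITS and both
helper names are hypothetical and do not correspond to kernel symbols.

```c
#include <assert.h>

/* Hypothetical budget: bits of page->flags left for zone id plus node id
 * once the real page flags are accounted for. Illustrative value only. */
enum { ZONE_PLUS_NODE_BITS = 12 };

/* Smallest number of bits able to encode nr_zones distinct zone ids. */
static unsigned int zone_bits_needed(unsigned int nr_zones)
{
	unsigned int bits = 0;

	while ((1u << bits) < nr_zones)
		bits++;
	return bits;
}

/* Number of NUMA nodes that can still be encoded next to the zone id. */
static unsigned int max_numa_nodes(unsigned int nr_zones)
{
	unsigned int zone_bits = zone_bits_needed(nr_zones);

	assert(zone_bits <= ZONE_PLUS_NODE_BITS);
	return 1u << (ZONE_PLUS_NODE_BITS - zone_bits);
}
```

Under these assumptions, going from four zones to five crosses a power of
two, so the zone field grows by one bit and the number of encodable nodes is
halved.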

> [...]
>> MOVABLE allocation will fallback as following sequence.
>>
>> ZONE_CMA -> ZONE_MOVABLE -> ZONE_HIGHMEM -> ZONE_NORMAL -> ...

Hmm, so this in effect resembles some of the aggressive CMA utilization
efforts that were never merged due to issues. Joonsoo, could you
summarize/expand the cover letter part on what were the issues with
aggressive CMA utilization, and why they no longer apply with ZONE_CMA,
especially given the current node-lru reclaim? Thanks.
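
For readers following the fallback sequence quoted above, the walk can be
sketched roughly as below. This is a simplified userspace model, not the
kernel's zonelist code: the enum, the free_pages array and pick_zone() are all
hypothetical names, and real allocations also consult watermarks and nodes.

```c
#include <assert.h>

/* Illustrative zone order; the CMA zone sits above the movable zone, as in
 * the quoted fallback sequence. Not the kernel's real enum zone_type. */
enum zone_idx {
	Z_NORMAL,
	Z_HIGHMEM,
	Z_MOVABLE,
	Z_CMA,
	Z_COUNT
};

/*
 * Return the first zone, walking down from highest_idx, that has at
 * least nr_pages free, or -1 if no zone can satisfy the request.
 */
static int pick_zone(const unsigned long free_pages[Z_COUNT],
		     enum zone_idx highest_idx, unsigned long nr_pages)
{
	for (int idx = (int)highest_idx; idx >= 0; idx--) {
		if (free_pages[idx] >= nr_pages)
			return idx;
	}
	return -1;
}
```

A movable request would start at Z_CMA and fall through to lower zones, while
a request that starts at Z_NORMAL never sees the CMA zone at all. That is the
"no explicit check needed" property argued for in the thread.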




Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-05-01 Thread Joonsoo Kim
On Thu, Apr 27, 2017 at 05:06:36PM +0200, Michal Hocko wrote:
> On Tue 25-04-17 12:42:57, Joonsoo Kim wrote:
> > On Mon, Apr 24, 2017 at 03:09:36PM +0200, Michal Hocko wrote:
> > > On Mon 17-04-17 11:02:12, Joonsoo Kim wrote:
> > > > On Thu, Apr 13, 2017 at 01:56:15PM +0200, Michal Hocko wrote:
> > > > > On Wed 12-04-17 10:35:06, Joonsoo Kim wrote:
> [...]
> > > not for free. For most common configurations where we have ZONE_DMA,
> > > ZONE_DMA32, ZONE_NORMAL and ZONE_MOVABLE all the 3 bits are already
> > > consumed so a new zone will need a new one AFAICS.
> > 
> > Yes, it requires one more bit for a new zone and it's handled by the patch.
> 
> I am pretty sure that you are aware that consuming new page flag bits
> is usually a no-go and something we try to avoid as much as possible
> because we are in a great shortage there. So there really have to be a
> _strong_ reason if we go that way. My current understanding that the
> whole zone concept is more about a more convenient implementation rather
> than a fundamental change which will solve unsolvable problems with the
> current approach. More on that below.

If there is a consensus that adding a new zone and one more bit in the
page flags is unreasonable, I will try to find a way to use
ZONE_MOVABLE. As mentioned before, that's not a fundamental issue to me.
However, it will have many potential problems, as I mentioned, so I
*really* don't prefer that way.

> 
> [...]
> > MOVABLE allocation will fallback as following sequence.
> > 
> > ZONE_CMA -> ZONE_MOVABLE -> ZONE_HIGHMEM -> ZONE_NORMAL -> ...
> > 
> > I don't understand what you mean CMA allocation. In MM's context,
> > there is no CMA allocation. That is just MOVABLE allocation.
> > 
> > For device's context, there is CMA allocation. It is range specific
> > allocation so it should be succeed for requested range. No fallback is
> > allowed in this case.
> 
> OK. that answers my question. I guess... My main confusion comes from
> __alloc_gigantic_page which shares alloc_contig_range with the cma
> allocation. But from what you wrote above and my quick glance over the
> code __alloc_gigantic_page simply changes the migrate type of the pfn
> range and it doesn't move it to the zone CMA. Right?

Yes, it doesn't move it to the zone CMA.

> 
> [...]
> > > > At a glance, special migratetype sound natural. I also did. However,
> > > > it's not natural in implementation POV. Zone consists of the same type
> > > > of memory (by definition ?) and MM subsystem is implemented with that
> > > > assumption. If difference type of memory shares the same zone, it easily
> > > > causes the problem and CMA problems are the such case.
> > > 
> > > But this is not any different from the highmem vs. lowmem problems we
> > > already have, no? I have looked at your example in the cover where you
> > > mention utilization and the reclaim problems. With the node reclaim we
> > > will have pages from all zones on the same LRU(s). isolate_lru_pages
> > > will skip those from ZONE_CMA because their zone_idx is higher than
> > > gfp_idx(GFP_KERNEL). The same could be achieved by an explicit check for
> > > the pageblock migrate type. So the zone doesn't really help much. Or is
> > > there some aspect that I am missing?
> > 
> > > Your understanding is correct. It can be achieved by an explicit check
> > for migratetype. And, this is the main reason that we should avoid
> > such approach.
> > 
> > With ZONE approach, all these things are done naturally. We don't need
> > any explicit check to anywhere. We already have a code to skip to
> > reclaim such pages by checking zone_idx.
> 
> Yes, and as we have to filter pages anyway doing so for cma blocks
> doesn't sound overly burdensome from the maintenance point of view.
>  
> > However, with MIGRATETYPE approach, all these things *cannot* be done
> > naturally. We need extra checks to all the places (allocator fast
> > path, reclaim path, compaction, etc...). It is really error-prone and
> > it already causes many problems due to this aspect. For the
> > performance wise, this approach is also bad since it requires to check
> > migratetype for each pages.
> > 
> > Moreover, even if we adds extra checks, things cannot be easily
> > perfect.
> 
> I see this point and I agree that using a specific zone might be a
> _nicer_ solution in the end but you have to consider another aspects as
> well. The main one I am worried about is a long term maintainability.
> We are really out of page flags and consuming one for a rather specific
> usecase is not good. Look at ZONE_DMA. I am pretty sure that almost
> no sane HW needs 16MB zone anymore, yet we have hard time to get rid
> of it and so we have that memory laying around unused all the time
> and blocking one page flag bit. CMA falls into a similar category
> AFAIU. I wouldn't be all that surprised if a future HW will not need CMA
> allocations in few years, yet we will have to fight to get rid of it
> like we do with ZONE_DMA. And not only 

Re: Generic approach to customizable zones - was: Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-04-28 Thread Igor Stoppa


On 28/04/17 11:36, Michal Hocko wrote:
> I didn't read this thoroughly yet because I will be travelling shortly

ok, thanks for bearing with me =)

> but
> this point alone just made me ask, because it seems there is some
> misunderstanding

It is possible, so far I did some changes, but I have not completed the
whole conversion.

> On Fri 28-04-17 11:04:27, Igor Stoppa wrote:
> [...]
>> * if one is happy to have a 64bits type, allow for as many zones as
>>   it's possible to fit, or anyway more than what is possible with
>>   the 32 bit mask.
> 
> zones are currently placed in struct page::flags. And that already is
> 64b size on 64b arches. 

Ok, the issues I had so far were related to the enum for zones being
treated as 32b.

> And we do not really have any room spare there.
> We encode page flags, zone id, numa_nid/sparse section_nr there. How can
> you add more without enlarging the struct page itself or using external
> means to store the same information (page_ext comes to mind)?

Then I'll be conservative and assume I can't, unless I can prove otherwise.

There is still the possibility I mentioned of loosely coupling DMA,
DMA32 and HIGHMEM with the bits currently reserved for them, right?

If my system doesn't use those zones as such, because it doesn't
have/need them, those bits are wasted for me. Otoh someone else is
probably not interested in what I'm after but needs one or more of those
zones.

Making the meaning of the bits configurable should still be a viable
option. It's not altering their amount, just their purpose on a specific
build.

> Even if
> the latter were possible, note that page_zone() is used in many
> performance sensitive paths and making it perform well with special
> casing would be far from trivial.


If the solution I propose is acceptable, I'm willing to bite the bullet
and go for implementing the conversion.

In my case I would really like to be able to use kmalloc, because it
would provide an easy path to also convert other portions of the kernel,
besides SELinux.

I suspect I would encounter overall far less resistance if the type of
change I propose is limited to:

s/GFP_KERNEL/GFP_LOCKABLE/

And if I can guarantee that GFP_LOCKABLE falls back to GFP_KERNEL when
the "lockable" feature is not enabled.
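
A minimal sketch of that fallback, written as a userspace mock-up: only
GFP_KERNEL and GFP_LOCKABLE come from the discussion, while __GFP_LOCKABLE,
CONFIG_LOCKABLE_ZONE, wants_lockable() and all numeric values are
hypothetical placeholders chosen for illustration.

```c
#include <assert.h>

/* Stand-in values for this userspace sketch; the kernel's real gfp_t bits
 * are different. __GFP_LOCKABLE and CONFIG_LOCKABLE_ZONE are hypothetical
 * names, not existing kernel symbols. */
typedef unsigned int gfp_t;

#define GFP_KERNEL	((gfp_t)0x01u)

#ifdef CONFIG_LOCKABLE_ZONE
#define __GFP_LOCKABLE	((gfp_t)0x10u)
#else
#define __GFP_LOCKABLE	((gfp_t)0)	/* feature off: contributes nothing */
#endif

/* With the feature disabled this collapses to plain GFP_KERNEL. */
#define GFP_LOCKABLE	(GFP_KERNEL | __GFP_LOCKABLE)

/* True if the mask asks for the (hypothetical) lockable zone. */
static int wants_lockable(gfp_t flags)
{
	return (flags & __GFP_LOCKABLE) != 0;
}
```

Callers would be converted once with s/GFP_KERNEL/GFP_LOCKABLE/ and would
behave exactly as before on builds where the option is off.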


--
thanks, igor


Re: Generic approach to customizable zones - was: Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-04-28 Thread Michal Hocko
I didn't read this thoroughly yet because I will be travelling shortly, but
this point alone just made me ask, because it seems there is some
misunderstanding.

On Fri 28-04-17 11:04:27, Igor Stoppa wrote:
[...]
> * if one is happy to have a 64bits type, allow for as many zones as
>   it's possible to fit, or anyway more than what is possible with
>   the 32 bit mask.

zones are currently placed in struct page::flags. And that already is
64b size on 64b arches. And we do not really have any room spare there.
We encode page flags, zone id, numa_nid/sparse section_nr there. How can
you add more without enlarging the struct page itself or using external
means to store the same information (page_ext comes to mind)? Even if
the latter were possible, note that page_zone() is used in many
performance sensitive paths and making it perform well with special
casing would be far from trivial.
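
To illustrate why the lookup is cheap today, here is a rough userspace sketch
of a zone id packed into a flags word. The macro names mirror the kernel's
style, but the definitions, the 3-bit width and the helpers set_zone() and
zone_of() are simplified assumptions rather than the real implementation.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical layout: the zone id occupies the top ZONES_WIDTH bits of an
 * unsigned long flags word. Real kernels derive these values from the
 * configuration; the width here is only an example. */
#define FLAGS_BITS	(sizeof(unsigned long) * CHAR_BIT)
#define ZONES_WIDTH	3u
#define ZONES_PGSHIFT	(FLAGS_BITS - ZONES_WIDTH)
#define ZONES_MASK	((1ul << ZONES_WIDTH) - 1)

/* Store zone id zid in the flags word, leaving the other bits untouched. */
static unsigned long set_zone(unsigned long flags, unsigned int zid)
{
	flags &= ~(ZONES_MASK << ZONES_PGSHIFT);
	return flags | (((unsigned long)zid & ZONES_MASK) << ZONES_PGSHIFT);
}

/* One shift and one mask: no branches and no extra memory access. */
static unsigned int zone_of(unsigned long flags)
{
	return (unsigned int)((flags >> ZONES_PGSHIFT) & ZONES_MASK);
}
```

Any scheme that keeps the zone outside this word, or special-cases some
zones, would have to compete with a decode of this size on hot paths.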
-- 
Michal Hocko
SUSE Labs


Generic approach to customizable zones - was: Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-04-28 Thread Igor Stoppa
On 27/04/17 18:06, Michal Hocko wrote:
> On Tue 25-04-17 12:42:57, Joonsoo Kim wrote:

[...]

>> Yes, it requires one more bit for a new zone and it's handled by the patch.
> 
> I am pretty sure that you are aware that consuming new page flag bits
> is usually a no-go and something we try to avoid as much as possible
> because we are in a great shortage there. So there really have to be a
> _strong_ reason if we go that way. My current understanding that the
> whole zone concept is more about a more convenient implementation rather
> than a fundamental change which will solve unsolvable problems with the
> current approach. More on that below.

Since I am in a similar situation, I think it's better if I join this
conversation instead of going through the same in a separate thread.

In this regard, I have a few observations (are they correct?):

* not everyone seems to be interested in having all the current
  zones active simultaneously

* some zones are even not so meaningful on certain architectures or
  platforms

* some architectures/platforms that are 64 bits would have no penalty
  in dealing with a larger data type.

So I wonder, would anybody be against this:

* within the 32bits constraint, define some optional zones

* decouple the specific position of a bit from the zone it represents;
  iow: if the zone is enabled, ensure that it gets a bit in the mask,
  but do not make promises about which one it is, provided that the
  corresponding macros work properly

* ensure that if one selects more optional zones than there are bits
  available (in the case of a 32bits mask), an error is produced at
  compile time

* if one is happy to have a 64bits type, allow for as many zones as
  it's possible to fit, or anyway more than what is possible with
  the 32 bit mask.

I think I can re-factor the code so that there is no runtime performance
degradation, if there is no immediate objection to what I described. Or
maybe I failed to notice some obvious pitfall?

From what I see, there seems to be a lot of interest in using functions
like kmalloc / vmalloc, with the ability of specifying pseudo-custom
areas from where they should tap into.

Why not, as long as those who do not need it are not negatively impacted?

I understand that if the association between bits and zones is fixed,
then suddenly bits become very precious stuff, but if they could be used
in a more efficient way, then maybe they could be used more liberally.

The alternative is to keep getting requests about new zones and turning
them away because they do not pass the bar of being extremely critical,
even if indeed they would simplify people's life.


The change shouldn't be too ugly, if I do something along the lines of
the pseudo code below.
Note: the #ifdefs would be mainly concentrated in the declaration part.

enum gfp_zone_shift {
#if IS_ENABLED(CONFIG_ZONE_DMA)
	/* I haven't checked if this is the correct name, but it gives the idea */
	ZONE_DMA_SHIFT = 0,
#endif
#if IS_ENABLED(CONFIG_ZONE_HIGHMEM)
	ZONE_HIGHMEM_SHIFT,
#endif
#if IS_ENABLED(CONFIG_ZONE_DMA32)
	ZONE_DMA32_SHIFT,
#endif
#if IS_ENABLED(CONFIG_ZONE_xxx)
	ZONE_xxx,
#endif
	NON_OPTIONAL_ZONE_SHIFT,
	...
	USED_ZONES_NUMBER,
	ZONE_MOVABLE_SHIFT = USED_ZONES_NUMBER,
	...
};

#if USED_ZONES_NUMBER < MAX_ZONES_32BITS
typedef uint32_t gfp_zones_t;
#elif IS_ENABLED(CONFIG_ZONES_64BITS)
typedef uint64_t gfp_zones_t;
#else
#error "too many zones selected for a 32-bit mask"
#endif

The type should be adjusted in other places where it is used, but I
didn't find too many occurrences.

#define __ZONE_DMA \
  (((gfp_zones_t)IS_ENABLED(CONFIG_ZONE_DMA)) << \
   (ZONE_DMA_SHIFT - 0))

[rinse and repeat]

Code referring to these optional zones can be sandboxed in

#if IS_ENABLED(CONFIG_ZONE_DMA)

static inline void do_something_dma(void)
{
	/* DMA-specific handling goes here */
}

#else
#define do_something_dma() do { } while (0)
#endif


Or equivalent, effectively removing many #ifdefs from the main code of
functions like those called by kmalloc.
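
One detail of the pseudo code is worth spelling out: enum constants are not
visible to the preprocessor, so the "too many zones" check cannot be a plain
#if on USED_ZONES_NUMBER. A possible alternative, sketched here under the
assumption that C11 is available, is a static assertion. The HAVE_ZONE_*
switches and the *_BIT macros are hypothetical stand-ins for the Kconfig
options and zone masks, and the enum is reduced to three optional zones.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical switches standing in for Kconfig options; set one to 0 to
 * drop a zone and watch the remaining shifts close the gap. */
#define HAVE_ZONE_DMA		1
#define HAVE_ZONE_HIGHMEM	0
#define HAVE_ZONE_DMA32		1

enum gfp_zone_shift {
#if HAVE_ZONE_DMA
	ZONE_DMA_SHIFT,
#endif
#if HAVE_ZONE_HIGHMEM
	ZONE_HIGHMEM_SHIFT,
#endif
#if HAVE_ZONE_DMA32
	ZONE_DMA32_SHIFT,
#endif
	ZONE_MOVABLE_SHIFT,
	USED_ZONES_NUMBER
};

typedef uint32_t gfp_zones_t;

/* Checked by the compiler proper, where enum constants have their value. */
_Static_assert(USED_ZONES_NUMBER <= 32, "too many zones for a 32-bit mask");

/* Each enabled zone gets one bit; a disabled zone gets no macro at all. */
#if HAVE_ZONE_DMA
#define ZONE_DMA_BIT		((gfp_zones_t)1 << ZONE_DMA_SHIFT)
#endif
#if HAVE_ZONE_DMA32
#define ZONE_DMA32_BIT		((gfp_zones_t)1 << ZONE_DMA32_SHIFT)
#endif
#define ZONE_MOVABLE_BIT	((gfp_zones_t)1 << ZONE_MOVABLE_SHIFT)
```

With HIGHMEM switched off as above, the enabled zones are numbered densely,
which is the "do not promise which bit a zone gets" idea from the proposal.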


So, would this approach stand a chance?


thanks, igor


Generic approach to customizable zones - was: Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-04-28 Thread Igor Stoppa
On 27/04/17 18:06, Michal Hocko wrote:
> On Tue 25-04-17 12:42:57, Joonsoo Kim wrote:

[...]

>> Yes, it requires one more bit for a new zone and it's handled by the patch.
> 
> I am pretty sure that you are aware that consuming new page flag bits
> is usually a no-go and something we try to avoid as much as possible
> because we are in a great shortage there. So there really have to be a
> _strong_ reason if we go that way. My current understanding that the
> whole zone concept is more about a more convenient implementation rather
> than a fundamental change which will solve unsolvable problems with the
> current approach. More on that below.

Since I am in a similar situation, I think it's better if I join this
conversation instead of going through the same in a separate thread.

In this regard, I have a few observations (are they correct?):

* not everyone seems to be interested in having all the current
  zones active simultaneously

* some zones are even not so meaningful on certain architectures or
  platforms

* some architectures/platforms that are 64 bits would have no penalty
  in dealing with a larger data type.

So I wonder, would anybody be against this:

* within the 32bits constraint, define some optional zones

* decouple the specific position of a bit from the zone it represents;
  iow: if the zone is enabled, ensure that it gets a bit in the mask,
  but do not make promises about which one it is, provided that the
  corresponding macros work properly

* ensure that if one selects more optional zones than there are bits
  available (in the case of a 32bits mask), an error is produced at
  compile time

* if one is happy to have a 64bits type, allow for as many zones as
  it's possible to fit, or anyway more than what is possible with
  the 32 bit mask.

I think I can re-factor the code so that there is no runtime performance
degradation, if there is no immediate objection to what I described. Or
maybe I failed to notice some obvious pitfall?

>From what I see, there seems to be a lot of interest in using functions
like Kmalloc / vmalloc, with the ability of specifying pseudo-custom
areas from where they should tap into.

Why not, as long as those who do not need it are not negatively impacted?

I understand that if the association between bits and zones is fixed,
then suddenly bits become very precious stuff, but if they could be used
in a more efficient way, then maybe they could be used more liberally.

The alternative is to keep getting requests about new zones and turning
them away because they do not pass the bar of being extremely critical,
even if indeed they would simplify people's life.


The change shouldn't be too ugly, if I do something along these lines of
the pseudo code below.
Note: the #ifdefs would be mainly concentrated in the declaration part.

enum gfp_zone_shift {
#if IS_ENABLED(CONFIG_ZONE_DMA)
/*I haven't checked if this is the correct name, but it gives the idea*/
ZONE_DMA_SHIFT = 0,
#endif
#if IS_ENABLED(CONFIG_ZONE_HIGHMEM)
ZONE_HIGHMEM_SHIFT,
#endif
#if IS_ENABLED(CONFIG_ZONE_DMA32)
ZONE_DMA32_SHIFT,
#endif
#if IS_ENABLED(CONFIG_ZONE_xxx)
ZONE_xxx,
#endif
   NON_OPTIONAL_ZONE_SHIFT,
   ...
   USED_ZONES_NUMBER,
   ZONE_MOVABLE_SHIFT = USED_ZONES_NUMBER,
   ...
};

#if USED_ZONES_NUMBER < MAX_ZONES_32BITS
typedef gfp_zones_t uint32_t
#elif IS_ENABLED(CONFIG_ZONES_64BITS
typedef gfp_zones_t uint64_t
#else
#error
#endif

The type should be adjusted in other places where it is used, but I
didn't find too many occurrences.

#define __ZONE_DMA \
  (((gfp_zones_t)IS_ENABLED(CONFIG_ZONE_DMA)) << \
   (ZONE_DMA_SHIFT - 0))

[rinse and repeat]
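
A minimal, self-contained sketch of the idea above, under stated
assumptions: the HAVE_ZONE_* switches, ALLOW_ZONES_64BITS, the ZBIT_* names
and the budget of 4 are hypothetical stand-ins, not existing kernel symbols.
One detail differs from the pseudo code: the zone count is built from macros
rather than enum members, because #if replaces any identifier that is not a
macro (enum constants included) with 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Kconfig options: 1 = built in, 0 = not. */
#define HAVE_ZONE_DMA     1
#define HAVE_ZONE_HIGHMEM 0
#define HAVE_ZONE_DMA32   1

/* Shifts are packed: a disabled zone does not consume a bit position. */
#define ZONE_DMA_SHIFT     0
#define ZONE_HIGHMEM_SHIFT (ZONE_DMA_SHIFT + HAVE_ZONE_DMA)
#define ZONE_DMA32_SHIFT   (ZONE_HIGHMEM_SHIFT + HAVE_ZONE_HIGHMEM)
#define ZONE_MOVABLE_SHIFT (ZONE_DMA32_SHIFT + HAVE_ZONE_DMA32)
#define USED_ZONES_NUMBER  (ZONE_MOVABLE_SHIFT + 1)

#define MAX_ZONES_32BITS 4 /* assumed budget, for illustration only */

/* Pick the mask type at compile time, or refuse to build. */
#if USED_ZONES_NUMBER < MAX_ZONES_32BITS
typedef uint32_t gfp_zones_t;
#elif defined(ALLOW_ZONES_64BITS) /* hypothetical opt-in */
typedef uint64_t gfp_zones_t;
#else
#error "too many zones for the 32 bit mask"
#endif

/* A disabled zone yields an all-zero mask, so testing it is always false. */
#define ZONE_BIT(have, shift) (((gfp_zones_t)(have)) << (shift))
#define ZBIT_DMA     ZONE_BIT(HAVE_ZONE_DMA, ZONE_DMA_SHIFT)
#define ZBIT_HIGHMEM ZONE_BIT(HAVE_ZONE_HIGHMEM, ZONE_HIGHMEM_SHIFT)
#define ZBIT_DMA32   ZONE_BIT(HAVE_ZONE_DMA32, ZONE_DMA32_SHIFT)
#define ZBIT_MOVABLE ZONE_BIT(1, ZONE_MOVABLE_SHIFT)

/* Enabled zones must never share a bit. */
static_assert((ZBIT_DMA & ZBIT_DMA32) == 0, "zone bits overlap");
static_assert((ZBIT_DMA32 & ZBIT_MOVABLE) == 0, "zone bits overlap");
```

With the switches as set here, the disabled highmem zone takes no bit and
its mask collapses to 0, so code that tests it compiles away without an
#ifdef at the call site.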

Code referring to these optional zones can be sandboxed in

#if IS_ENABLED(CONFIG_ZONE_DMA)

static inline void do_something_dma(void)
{
	/* DMA-specific handling goes here */
}

#else
#define do_something_dma() do { } while (0)
#endif


Or equivalent, effectively removing many #ifdefs from the main code of
functions like those called by kmalloc.
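
As a rough illustration of the sandboxing pattern, here is a small
compilable sketch. HAVE_ZONE_DMA, struct toy_stats and toy_alloc_path() are
made-up names for this example only. It uses an empty static inline stub
instead of an empty macro, which is one possible variant: the arguments stay
type-checked even when the zone is compiled out.

```c
/* Hypothetical stand-in for the Kconfig option; 0 = zone compiled out. */
#define HAVE_ZONE_DMA 0

struct toy_stats {
	int dma_hits; /* how often the DMA-specific hook actually ran */
};

#if HAVE_ZONE_DMA
static inline void do_something_dma(struct toy_stats *s)
{
	s->dma_hits++;
}
#else
/* Empty stub: callers need no #ifdef and the call optimizes to nothing. */
static inline void do_something_dma(struct toy_stats *s)
{
	(void)s;
}
#endif

/* The main path calls the hook unconditionally. */
static inline int toy_alloc_path(struct toy_stats *s)
{
	do_something_dma(s);
	return s->dma_hits;
}
```

With HAVE_ZONE_DMA set to 0, toy_alloc_path() leaves dma_hits untouched;
flipping the switch to 1 enables the hook without changing the caller.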


So, would this approach stand a chance?


thanks, igor


Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-04-27 Thread Michal Hocko
On Tue 25-04-17 12:42:57, Joonsoo Kim wrote:
> On Mon, Apr 24, 2017 at 03:09:36PM +0200, Michal Hocko wrote:
> > On Mon 17-04-17 11:02:12, Joonsoo Kim wrote:
> > > On Thu, Apr 13, 2017 at 01:56:15PM +0200, Michal Hocko wrote:
> > > > On Wed 12-04-17 10:35:06, Joonsoo Kim wrote:
[...]
> > not for free. For most common configurations where we have ZONE_DMA,
> > ZONE_DMA32, ZONE_NORMAL and ZONE_MOVABLE all the 3 bits are already
> > consumed so a new zone will need a new one AFAICS.
> 
> Yes, it requires one more bit for a new zone and it's handled by the patch.

I am pretty sure that you are aware that consuming new page flag bits
is usually a no-go and something we try to avoid as much as possible
because we are in a great shortage there. So there really have to be a
_strong_ reason if we go that way. My current understanding is that the
whole zone concept is more about a more convenient implementation rather
than a fundamental change which will solve unsolvable problems with the
current approach. More on that below.

[...]
> MOVABLE allocation will fallback as following sequence.
> 
> ZONE_CMA -> ZONE_MOVABLE -> ZONE_HIGHMEM -> ZONE_NORMAL -> ...
> 
> I don't understand what you mean CMA allocation. In MM's context,
> there is no CMA allocation. That is just MOVABLE allocation.
> 
> For device's context, there is CMA allocation. It is range specific
> allocation so it should be succeed for requested range. No fallback is
> allowed in this case.

OK, that answers my question, I guess... My main confusion comes from
__alloc_gigantic_page which shares alloc_contig_range with the cma
allocation. But from what you wrote above and my quick glance over the
code __alloc_gigantic_page simply changes the migrate type of the pfn
range and it doesn't move it to the zone CMA. Right?

[...]
> > > At a glance, special migratetype sound natural. I also did. However,
> > > it's not natural in implementation POV. Zone consists of the same type
> > > of memory (by definition ?) and MM subsystem is implemented with that
> > > assumption. If difference type of memory shares the same zone, it easily
> > > causes the problem and CMA problems are the such case.
> > 
> > But this is not any different from the highmem vs. lowmem problems we
> > already have, no? I have looked at your example in the cover where you
> > mention utilization and the reclaim problems. With the node reclaim we
> > will have pages from all zones on the same LRU(s). isolate_lru_pages
> > will skip those from ZONE_CMA because their zone_idx is higher than
> > gfp_idx(GFP_KERNEL). The same could be achieved by an explicit check for
> > the pageblock migrate type. So the zone doesn't really help much. Or is
> > there some aspect that I am missing?
> 
> Your understanding is correct. It can be achieved by an explicit check
> for migratetype. And, this is the main reason that we should avoid
> such approach.
> 
> With ZONE approach, all these things are done naturally. We don't need
> any explicit check to anywhere. We already have a code to skip to
> reclaim such pages by checking zone_idx.

Yes, and as we have to filter pages anyway doing so for cma blocks
doesn't sound overly burdensome from the maintenance point of view.
 
> However, with MIGRATETYPE approach, all these things *cannot* be done
> naturally. We need extra checks to all the places (allocator fast
> path, reclaim path, compaction, etc...). It is really error-prone and
> it already causes many problems due to this aspect. For the
> performance wise, this approach is also bad since it requires to check
> migratetype for each pages.
> 
> Moreover, even if we adds extra checks, things cannot be easily
> perfect.

I see this point and I agree that using a specific zone might be a
_nicer_ solution in the end but you have to consider other aspects as
well. The main one I am worried about is long term maintainability.
We are really out of page flags and consuming one for a rather specific
usecase is not good. Look at ZONE_DMA. I am pretty sure that almost
no sane HW needs a 16MB zone anymore, yet we have a hard time getting rid
of it and so we have that memory lying around unused all the time
and blocking one page flag bit. CMA falls into a similar category
AFAIU. I wouldn't be all that surprised if future HW does not need CMA
allocations in a few years, yet we will have to fight to get rid of it
like we do with ZONE_DMA. And not only that. We will also have to fight
to find page flags for other more general usecases in the meantime.

> See 3) Atomic allocation failure problem. It's inherent
> problem if we have different types of memory in a single zone.
> We possibly can make things perfect even with MIGRATETYPE approach,
> however, it requires additional checks in hotpath than current. It's
> expensive and undesirable. It will make future maintenance of MM code
> much difficult.

I believe that the overhead in the hot path is not such a big deal. We
have means to make it 0 when CMA is not used by 

Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-04-24 Thread Joonsoo Kim
On Mon, Apr 24, 2017 at 03:09:36PM +0200, Michal Hocko wrote:
> On Mon 17-04-17 11:02:12, Joonsoo Kim wrote:
> > On Thu, Apr 13, 2017 at 01:56:15PM +0200, Michal Hocko wrote:
> > > On Wed 12-04-17 10:35:06, Joonsoo Kim wrote:
> [...]
> > > > ZONE_CMA is conceptually the same with ZONE_MOVABLE. There is a software
> > > > constraint to guarantee the success of future allocation request from
> > > > the device. If the device requests the specific range of the memory in CMA
> > > > area at the runtime, page that allocated by MM will be migrated to
> > > > the other page and it will be returned to the device. To guarantee it,
> > > > ZONE_CMA only takes the allocation request with GFP_MOVABLE.
> > > 
> > > The immediate follow up question is. Why cannot we reuse ZONE_MOVABLE
> > > for that purpose?
> > 
> > I can make CMA reuses the ZONE_MOVABLE but I don't want it. Reasons
> > are that
> > 
> > 1. If ZONE_MOVABLE has two different types of memory, hotpluggable and
> > CMA, it may need special handling for each type. This would lead to a new
> > migratetype again (to distinguish them) and easy to be error-prone. I
> > don't want that case.
> 
> Hmm, I see your motivation. I believe that we could find a way
> around this. Anyway, movable zones are quite special and configuring
> overlapping CMA and hotplug movable regions could be refused. So I am
> not even sure this is a real problem in practice.
> 
> > 2. CMA users want to see usage stat separately since CMA often causes
> > the problems and separate stat would helps to debug it.
> 
> That could be solved by a per-zone/node counter.
> 
> Anyway, these reasons should be mentioned as well. Adding a new zone is

Okay.

> not for free. For most common configurations where we have ZONE_DMA,
> ZONE_DMA32, ZONE_NORMAL and ZONE_MOVABLE all the 3 bits are already
> consumed so a new zone will need a new one AFAICS.

Yes, it requires one more bit for a new zone and it's handled by the patch.

> 
> [...]
> > > > Other things are completely the same with other zones. For MM POV, there is
> > > > no difference in allocation process except that it only takes
> > > > GFP_MOVABLE request. In reclaim, pages that are allocated by MM will
> > > > be reclaimed by the same policy of the MM. So, no difference.
> > > 
> > > OK, so essentially this is yet another "highmem" zone. We already know
> > > that only GFP_MOVABLE are allowed to fallback to ZONE_CMA but do CMA
> > > allocations fallback to other zones and punch new holes? In which zone
> > > order?
> > 
> > Hmm... I don't understand your question. Could you elaborate it more?
> 
> Well, my question was about the zone fallback chain. MOVABLE allocation
> can fallback to lower zones and also to the ZONE_CMA with your patch. If
> there is a CMA allocation it doesn't fall back to any other zone - in
> other words no new holes are punched to other zones. Is this correct?

Hmm... I still don't get the meaning of "no new holes are punched to
other zones". I will try to answer with my current understanding of your
question.

A MOVABLE allocation will fall back in the following sequence.

ZONE_CMA -> ZONE_MOVABLE -> ZONE_HIGHMEM -> ZONE_NORMAL -> ...

I don't understand what you mean by CMA allocation. In MM's context,
there is no CMA allocation. That is just a MOVABLE allocation.

In the device's context, there is a CMA allocation. It is a range-specific
allocation so it must succeed for the requested range. No fallback is
allowed in this case.
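
The fallback order for a MOVABLE allocation can be sketched as a walk over
an ordered zone list. This is only a toy model: the TOY_* names, the
per-zone free counters and toy_alloc_movable() are invented for
illustration and do not correspond to kernel code. The device-side range
allocation, which has no fallback, is not modelled here.

```c
#include <stddef.h>

/* Toy zone identifiers; not the kernel's enum zone_type. */
enum toy_zone { TOY_NORMAL, TOY_HIGHMEM, TOY_MOVABLE, TOY_CMA, TOY_NR_ZONES };

/* Fallback order for a MOVABLE request, as described in the mail. */
static const enum toy_zone toy_movable_order[] = {
	TOY_CMA, TOY_MOVABLE, TOY_HIGHMEM, TOY_NORMAL,
};

/*
 * Take nr pages from the first zone in the fallback order that has enough
 * free pages. Returns the zone used, or -1 if no zone can satisfy it.
 */
static int toy_alloc_movable(unsigned long free_pages[TOY_NR_ZONES],
			     unsigned long nr)
{
	size_t n = sizeof(toy_movable_order) / sizeof(toy_movable_order[0]);

	for (size_t i = 0; i < n; i++) {
		enum toy_zone z = toy_movable_order[i];

		if (free_pages[z] >= nr) {
			free_pages[z] -= nr;
			return (int)z;
		}
	}
	return -1;
}
```

In this model a request is served from the CMA zone first and only moves
down the list once the earlier zones cannot cover it.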

> > > > This 'no difference' is a strong point of this approach. ZONE_CMA is
> > > > naturally handled by MM subsystem unlike as before (special handling is
> > > > required for MIGRATE_CMA).
> > > > 
> > > > 3. Controversial Point
> > > > 
> > > > Major concern from Mel is that zone concept is abused. ZONE is originally
> > > > introduced to solve some issues due to H/W addressing limitation.
> > > 
> > > Yes, very much agreed on that. You basically want to punch holes into
> > > other zones to guarantee an allocation progress. Marking those holes
> > > with special migrate type sounds quite natural but I will have to study
> > > the current code some more to see whether issues you mention are
> > > inherently unfixable. This might very well turn out to be the case.
> > 
> > At a glance, special migratetype sound natural. I also did. However,
> > it's not natural in implementation POV. Zone consists of the same type
> > of memory (by definition ?) and MM subsystem is implemented with that
> > assumption. If difference type of memory shares the same zone, it easily
> > causes the problem and CMA problems are the such case.
> 
> But this is not any different from the highmem vs. lowmem problems we
> already have, no? I have looked at your example in the cover where you
> mention utilization and the reclaim problems. With the node reclaim we
> will have pages from all zones on the same LRU(s). isolate_lru_pages
> will skip those from ZONE_CMA because their zone_idx is higher than
> 

Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-04-24 Thread Michal Hocko
On Mon 17-04-17 11:02:12, Joonsoo Kim wrote:
> On Thu, Apr 13, 2017 at 01:56:15PM +0200, Michal Hocko wrote:
> > On Wed 12-04-17 10:35:06, Joonsoo Kim wrote:
[...]
> > > ZONE_CMA is conceptually the same with ZONE_MOVABLE. There is a software
> > > constraint to guarantee the success of future allocation request from
> > > the device. If the device requests the specific range of the memory in CMA
> > > area at the runtime, page that allocated by MM will be migrated to
> > > the other page and it will be returned to the device. To guarantee it,
> > > ZONE_CMA only takes the allocation request with GFP_MOVABLE.
> > 
> > The immediate follow up question is. Why cannot we reuse ZONE_MOVABLE
> > for that purpose?
> 
> I can make CMA reuses the ZONE_MOVABLE but I don't want it. Reasons
> are that
> 
> 1. If ZONE_MOVABLE has two different types of memory, hotpluggable and
> CMA, it may need special handling for each type. This would lead to a new
> migratetype again (to distinguish them) and easy to be error-prone. I
> don't want that case.

Hmm, I see your motivation. I believe that we could find a way
around this. Anyway, movable zones are quite special and configuring
overlapping CMA and hotplug movable regions could be refused. So I am
not even sure this is a real problem in practice.

> 2. CMA users want to see usage stat separately since CMA often causes
> the problems and separate stat would helps to debug it.

That could be solved by a per-zone/node counter.

Anyway, these reasons should be mentioned as well. Adding a new zone is
not for free. For most common configurations where we have ZONE_DMA,
ZONE_DMA32, ZONE_NORMAL and ZONE_MOVABLE all the 3 bits are already
consumed so a new zone will need a new one AFAICS.

[...]
> > > Other things are completely the same with other zones. For MM POV, there is
> > > no difference in allocation process except that it only takes
> > > GFP_MOVABLE request. In reclaim, pages that are allocated by MM will
> > > be reclaimed by the same policy of the MM. So, no difference.
> > 
> > OK, so essentially this is yet another "highmem" zone. We already know
> > that only GFP_MOVABLE are allowed to fallback to ZONE_CMA but do CMA
> > allocations fallback to other zones and punch new holes? In which zone
> > order?
> 
> Hmm... I don't understand your question. Could you elaborate it more?

Well, my question was about the zone fallback chain. MOVABLE allocation
can fallback to lower zones and also to the ZONE_CMA with your patch. If
there is a CMA allocation it doesn't fall back to any other zone - in
other words no new holes are punched to other zones. Is this correct?

> > > This 'no difference' is a strong point of this approach. ZONE_CMA is
> > > naturally handled by MM subsystem unlike as before (special handling is
> > > required for MIGRATE_CMA).
> > > 
> > > 3. Controversial Point
> > > 
> > > Major concern from Mel is that zone concept is abused. ZONE is originally
> > > introduced to solve some issues due to H/W addressing limitation.
> > 
> > Yes, very much agreed on that. You basically want to punch holes into
> > other zones to guarantee an allocation progress. Marking those holes
> > with special migrate type sounds quite natural but I will have to study
> > the current code some more to see whether issues you mention are
> > inherently unfixable. This might very well turn out to be the case.
> 
> At a glance, special migratetype sound natural. I also did. However,
> it's not natural in implementation POV. Zone consists of the same type
> of memory (by definition ?) and MM subsystem is implemented with that
> assumption. If difference type of memory shares the same zone, it easily
> causes the problem and CMA problems are the such case.

But this is not any different from the highmem vs. lowmem problems we
already have, no? I have looked at your example in the cover where you
mention utilization and the reclaim problems. With the node reclaim we
will have pages from all zones on the same LRU(s). isolate_lru_pages
will skip those from ZONE_CMA because their zone_idx is higher than
gfp_idx(GFP_KERNEL). The same could be achieved by an explicit check for
the pageblock migrate type. So the zone doesn't really help much. Or is
there some aspect that I am missing?
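
To make the comparison concrete, here is a toy sketch of the two filters
being discussed. Everything in it (struct toy_page, the TOY_ZONE_* indices,
both helpers) is hypothetical and only mirrors the shape of the argument,
not the real isolate_lru_pages() code.

```c
#include <stdbool.h>

/* Toy zone indices; a higher index means a more restricted zone. */
enum { TOY_ZONE_NORMAL = 0, TOY_ZONE_MOVABLE = 1, TOY_ZONE_CMA = 2 };

struct toy_page {
	int zone_idx;      /* zone the page belongs to */
	bool in_cma_block; /* pageblock carries the CMA migratetype */
};

/* Zone approach: the generic index comparison already filters CMA pages. */
static bool skip_with_zone(const struct toy_page *p, int reclaim_idx)
{
	return p->zone_idx > reclaim_idx;
}

/*
 * Migratetype approach: CMA pages sit inside an ordinary zone, so an
 * additional explicit test is needed when the request cannot use them.
 */
static bool skip_with_migratetype(const struct toy_page *p, int reclaim_idx,
				  bool movable_ok)
{
	if (p->zone_idx > reclaim_idx)
		return true;
	return p->in_cma_block && !movable_ok;
}
```

Both variants end up skipping the same pages for a non-movable request; the
difference is only whether the CMA case is covered by the existing index
comparison or by one more condition.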

Another worry I would have with the zone approach is that there is a
risk of reintroducing issues we used to have with small zones in the
past. Just consider that the CMA will get depleted by CMA users almost
completely. Now that zone will not get balanced with only a few pages.
wakeup_kswapd/pgdat_balanced already has measures to prevent such wake
ups but I cannot say I would be sure everything will work smoothly.

I have glanced through the cumulative diff and to be honest I am not
really sure the result is a great simplification in the end. There is
still quite a lot of special casing. It is true that the page allocator
path is cleaned up and some CMA specific checks are moved away. This is
definitely 

Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-04-23 Thread Bob Liu
On 2017/4/11 11:17, js1...@gmail.com wrote:
> From: Joonsoo Kim 
> 
> Changed from v6
> o Rebase on next-20170405
> o Add a fix for lowmem mapping on ARM (last patch)
> o Re-organize the cover letter
> 
> Changes from v5
> o Rebase on next-20161013
> o Cosmetic change on patch 1
> o Optimize span of ZONE_CMA on multiple node system
> 
> Changes from v4
> o Rebase on next-20160825
> o Add general fix patch for lowmem reserve
> o Fix lowmem reserve ratio
> o Fix zone span optimization per Vlastimil
> o Fix pageset initialization
> o Change invocation timing on cma_init_reserved_areas()
> 
> Changes from v3
> o Rebase on next-20160805
> o Split first patch per Vlastimil
> o Remove useless function parameter per Vlastimil
> o Add code comment per Vlastimil
> o Add following description on cover-letter
> 
> Changes from v2
> o Rebase on next-20160525
> o No other changes except following description
> 
> Changes from v1
> o Separate some patches which deserve to submit independently
> o Modify description to reflect current kernel state
> (e.g. high-order watermark problem disappeared by Mel's work)
> o Don't increase SECTION_SIZE_BITS to make a room in page flags
> (detailed reason is on the patch that adds ZONE_CMA)
> o Adjust ZONE_CMA population code
> 
> 
> Hello,
> 
> This is the 7th version of ZONE_CMA patchset. One patch is added
> to fix potential problem on ARM. Other changes are just due to rebase.
> 
> This patchset has a long history and has been reviewed before. This
> cover letter has the summary and my opinion on those reviews. The content
> order is confusing, so I made a simple index. If anyone wants to
> understand the history properly, please read the parts in reverse order.
> 
> PART 1. Strong points of the zone approach
> PART 2. Summary in LSF/MM 2016 discussion
> PART 3. Original motivation of this patchset
> 
> * PART 1 *
> 
> CMA has many problems, and I describe them at the bottom of the
> cover letter. These problems come from the limitation that CMA memory
> should always be migratable for device usage. I think that introducing
> a new zone is the best approach to solve them. Here are the reasons.
> 
> Zones were introduced to solve some issues due to H/W addressing limitations.
> The MM subsystem is implemented to work efficiently with these zones.
> Allocation/reclaim logic in MM takes this limitation into account heavily.
> What I did in this patchset is introduce a new zone and extend the zone
> concept slightly. The new concept is that a zone can have not only a H/W
> addressing limitation but also a S/W limitation to guarantee page migration.
> This concept originated with ZONE_MOVABLE and it has worked well
> for a long time. So, ZONE_CMA should not be special at this moment.
> 
> There is a major concern from Mel that ZONE_MOVABLE, which has a
> S/W limitation, causes the highmem/lowmem problem. The highmem/lowmem problem
> is that some memory cannot be used for kernel memory due to the limitation
> of the zone. It breaks LRU ordering and makes it hard to find kernel-usable
> memory under memory pressure.
> 
> However, the important point is that this problem doesn't come from an
> implementation detail (ZONE_MOVABLE/MIGRATETYPE). Even if we implement it
> by MIGRATETYPE instead of by ZONE_MOVABLE, we cannot use that type of
> memory for kernel allocation because kernel allocations aren't migratable.
> So, it will break LRU ordering, too. We cannot avoid the problem in any case.
> Therefore, we should focus on which solution is better for maintenance
> and less intrusive for the MM subsystem.
> 
> From this viewpoint, I think that the zone approach is better. As mentioned
> earlier, the MM subsystem already has a lot of infrastructure to deal with
> a zone's H/W addressing limitation. Adding a S/W limitation to the zone
> concept and adding a new zone doesn't change anything. It will work by itself.
> My patchset can remove many hooks related to CMA area management in MM
> while solving the problems. More hooks are required to solve the problems
> if we choose the MIGRATETYPE approach.
> 

Agreed, there are already too many hooks, and they are a pain to maintain
and bugfix. Choosing this ZONE_CMA approach looks better.

--
Regards,
Bob Liu




Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-04-21 Thread Michal Hocko
On Fri 21-04-17 10:35:03, Joonsoo Kim wrote:
[...]
> Hello, Michal.
> 
> If you don't have any more question, I will send next version with
> updated cover-letter.

I am sorry but I am busy as hell this week and didn't get to your email
yet. I will try as soon as possible.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-04-20 Thread Joonsoo Kim
On Mon, Apr 17, 2017 at 11:02:12AM +0900, Joonsoo Kim wrote:
> On Thu, Apr 13, 2017 at 01:56:15PM +0200, Michal Hocko wrote:
> > On Wed 12-04-17 10:35:06, Joonsoo Kim wrote:
> > > On Tue, Apr 11, 2017 at 08:15:20PM +0200, Michal Hocko wrote:
> > > > Hi,
> > > > > I didn't get to read through the patches yet but the cover letter didn't
> > > > really help me to understand the basic concepts to have a good starting
> > > > point before diving into implementation details. It contains a lot of
> > > > history remarks which is not bad but IMHO too excessive here. I would
> > > > appreciate the following information (some of that is already provided
> > > > in the cover but could benefit from some rewording/text reorganization).
> > > > 
> > > > - what is ZONE_CMA and how it is configured (from admin POV)
> > > > - how does ZONE_CMA compare to other zones
> > > > - who is allowed to allocate from this zone and what are the
> > > >   guarantees/requirements for successful allocation
> > > > > - how does the zone compare to a preallocated allocation pool
> > > > - how is ZONE_CMA balanced/reclaimed due to internal memory pressure
> > > >   (from CMA users)
> > > > - is this zone reclaimable for the global memory reclaim
> > > > - why this was/is controversial
> > > 
> > > Hello,
> > > 
> > > I hope that following summary helps you to understand this patchset.
> > > I skip some basic things about CMA. I will attach this description to
> > > the cover-letter if re-spin is needed.
> > 
> > I believe that sorting out these questions is more important than what
> > you have in the current cover letter. Andrew tends to fold the cover
> > into the first patch so I think you should update.
> 
> Okay.
>  
> > > 2. How does ZONE_CMA compare to other zones
> > > 
> > > ZONE_CMA is conceptually the same with ZONE_MOVABLE. There is a software
> > > constraint to guarantee the success of future allocation request from
> > > the device. If the device requests the specific range of the memory in CMA
> > > area at the runtime, page that allocated by MM will be migrated to
> > > the other page and it will be returned to the device. To guarantee it,
> > > ZONE_CMA only takes the allocation request with GFP_MOVABLE.
> > 
> > The immediate follow up question is. Why cannot we reuse ZONE_MOVABLE
> > for that purpose?
> 
> I can make CMA reuses the ZONE_MOVABLE but I don't want it. Reasons
> are that
> 
> 1. If ZONE_MOVABLE has two different types of memory, hotpluggable and
> CMA, it may need special handling for each type. This would lead to a new
> migratetype again (to distinguish them) and easy to be error-prone. I
> don't want that case.
> 
> 2. CMA users want to see usage stat separately since CMA often causes
> the problems and separate stat would helps to debug it.
> 
> > > The other important point about ZONE_CMA is that span of ZONE_CMA would be
> > > overlapped with the other zone. This is not new to MM subsystem and
> > > MM subsystem has enough logic to handle such situation
> > > so there would be no problem.
> > 
> > I am not really sure this is actually true. Zones are disjoint from the
> > early beginning. I remember that we had something like numa nodes
> > interleaving but that is such a rare configuration that I wouldn't be
> > surprised if it wasn't very well tested and actually broken in some
> > subtle ways.
> 
> I agree with your concern however if something is broken for them, it
> just shows that we need to fix it. MM should handle this situation
> since we already know that such architecture exists.
> 
> > 
> > There are many page_zone(page) != zone checks sprinkled in the code but
> > I do not see anything consistent there. Similarly pageblock_pfn_to_page
> > is only used by compaction but there are other pfn walkers which do
> > ad-hoc checking. I was staring into that code these days due to my
> > hotplug patches.
> >
> > That being said, I think that interleaving zones are an interesting
> > concept but I would be rather nervous to consider this as working
> > currently without a deeper review.
> 
> I have tried to audit all the pfn walkers before and have added above
> mentioned check. Perhaps, I missed something however I believe not
> that much. Our production already have used ZONE_CMA and I haven't get
> the report about such problem.
> 
> > 
> > > > Other things are completely the same with other zones. For MM POV, there is
> > > no difference in allocation process except that it only takes
> > > GFP_MOVABLE request. In reclaim, pages that are allocated by MM will
> > > be reclaimed by the same policy of the MM. So, no difference.
> > 
> > OK, so essentially this is yet another "highmem" zone. We already know
> > that only GFP_MOVABLE are allowed to fallback to ZONE_CMA but do CMA
> > allocations fallback to other zones and punch new holes? In which zone
> > order?
> 
> Hmm... I don't understand your question. Could you elaborate it more?
> 
> > > This 'no difference' is a strong point of this approach. 

Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-04-16 Thread Joonsoo Kim
On Thu, Apr 13, 2017 at 01:56:15PM +0200, Michal Hocko wrote:
> On Wed 12-04-17 10:35:06, Joonsoo Kim wrote:
> > On Tue, Apr 11, 2017 at 08:15:20PM +0200, Michal Hocko wrote:
> > > Hi,
> > > I didn't get to read through the patches yet but the cover letter didn't
> > > really help me to understand the basic concepts to have a good starting
> > > point before diving into implementation details. It contains a lot of
> > > history remarks which is not bad but IMHO too excessive here. I would
> > > appreciate the following information (some of that is already provided
> > > in the cover but could benefit from some rewording/text reorganization).
> > > 
> > > - what is ZONE_CMA and how it is configured (from admin POV)
> > > - how does ZONE_CMA compare to other zones
> > > - who is allowed to allocate from this zone and what are the
> > >   guarantees/requirements for successful allocation
> > > - how does the zone compare to a preallocated allocation pool
> > > - how is ZONE_CMA balanced/reclaimed due to internal memory pressure
> > >   (from CMA users)
> > > - is this zone reclaimable for the global memory reclaim
> > > - why this was/is controversial
> > 
> > Hello,
> > 
> > I hope that following summary helps you to understand this patchset.
> > I skip some basic things about CMA. I will attach this description to
> > the cover-letter if re-spin is needed.
> 
> I believe that sorting out these questions is more important than what
> you have in the current cover letter. Andrew tends to fold the cover
> into the first patch so I think you should update.

Okay.
 
> > 2. How does ZONE_CMA compare to other zones
> > 
> > ZONE_CMA is conceptually the same with ZONE_MOVABLE. There is a software
> > constraint to guarantee the success of future allocation request from
> > the device. If the device requests the specific range of the memory in CMA
> > area at the runtime, page that allocated by MM will be migrated to
> > the other page and it will be returned to the device. To guarantee it,
> > ZONE_CMA only takes the allocation request with GFP_MOVABLE.
> 
> The immediate follow up question is. Why cannot we reuse ZONE_MOVABLE
> for that purpose?

I can make CMA reuse ZONE_MOVABLE but I don't want to. The reasons
are:

1. If ZONE_MOVABLE has two different types of memory, hotpluggable and
CMA, it may need special handling for each type. This would lead to a new
migratetype again (to distinguish them) and would be error-prone. I
don't want that.

2. CMA users want to see usage stats separately, since CMA often causes
problems and separate stats would help to debug them.

> > The other important point about ZONE_CMA is that span of ZONE_CMA would be
> > overlapped with the other zone. This is not new to MM subsystem and
> > MM subsystem has enough logic to handle such situation
> > so there would be no problem.
> 
> I am not really sure this is actually true. Zones are disjoint from the
> early beginning. I remember that we had something like numa nodes
> interleaving but that is such a rare configuration that I wouldn't be
> surprised if it wasn't very well tested and actually broken in some
> subtle ways.

I agree with your concern. However, if something is broken for them, it
just shows that we need to fix it. MM should handle this situation
since we already know that such architectures exist.

> 
> There are many page_zone(page) != zone checks sprinkled in the code but
> I do not see anything consistent there. Similarly pageblock_pfn_to_page
> is only used by compaction but there are other pfn walkers which do
> ad-hoc checking. I was staring into that code these days due to my
> hotplug patches.
>
> That being said, I think that interleaving zones are an interesting
> concept but I would be rather nervous to consider this as working
> currently without a deeper review.

I have tried to audit all the pfn walkers before and have added the
above-mentioned check. Perhaps I missed something, but I believe not
that much. Our production systems already use ZONE_CMA and I haven't
received any reports of such a problem.

> 
> > Other things are completely the same with other zones. For MM POV, there is
> > no difference in allocation process except that it only takes
> > GFP_MOVABLE request. In reclaim, pages that are allocated by MM will
> > be reclaimed by the same policy of the MM. So, no difference.
> 
> OK, so essentially this is yet another "highmem" zone. We already know
> that only GFP_MOVABLE are allowed to fallback to ZONE_CMA but do CMA
> allocations fallback to other zones and punch new holes? In which zone
> order?

Hmm... I don't understand your question. Could you elaborate?

> > This 'no difference' is a strong point of this approach. ZONE_CMA is
> > naturally handled by MM subsystem unlike as before (special handling is
> > required for MIGRATE_CMA).
> > 
> > 3. Controversial Point
> > 
> > Major concern from Mel is that zone concept is abused. ZONE is originally
> > introduced to solve some issues 

Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-04-13 Thread Michal Hocko
On Wed 12-04-17 10:35:06, Joonsoo Kim wrote:
> On Tue, Apr 11, 2017 at 08:15:20PM +0200, Michal Hocko wrote:
> > Hi,
> > I didn't get to read through the patches yet but the cover letter didn't
> > really help me to understand the basic concepts to have a good starting
> > point before diving into implementation details. It contains a lot of
> > history remarks which is not bad but IMHO too excessive here. I would
> > appreciate the following information (some of that is already provided
> > in the cover but could benefit from some rewording/text reorganization).
> > 
> > - what is ZONE_CMA and how it is configured (from admin POV)
> > - how does ZONE_CMA compare to other zones
> > - who is allowed to allocate from this zone and what are the
> >   guarantees/requirements for successful allocation
> > - how does the zone compare to a preallocated allocation pool
> > - how is ZONE_CMA balanced/reclaimed due to internal memory pressure
> >   (from CMA users)
> > - is this zone reclaimable for the global memory reclaim
> > - why this was/is controversial
> 
> Hello,
> 
> I hope that following summary helps you to understand this patchset.
> I skip some basic things about CMA. I will attach this description to
> the cover-letter if re-spin is needed.

I believe that sorting out these questions is more important than what
you have in the current cover letter. Andrew tends to fold the cover
into the first patch so I think you should update.

> 2. How does ZONE_CMA compare to other zones
> 
> ZONE_CMA is conceptually the same with ZONE_MOVABLE. There is a software
> constraint to guarantee the success of future allocation request from
> the device. If the device requests the specific range of the memory in CMA
> area at the runtime, page that allocated by MM will be migrated to
> the other page and it will be returned to the device. To guarantee it,
> ZONE_CMA only takes the allocation request with GFP_MOVABLE.

The immediate follow-up question is: why can't we reuse ZONE_MOVABLE
for that purpose?

> The other important point about ZONE_CMA is that span of ZONE_CMA would be
> overlapped with the other zone. This is not new to MM subsystem and
> MM subsystem has enough logic to handle such situation
> so there would be no problem.

I am not really sure this is actually true. Zones are disjoint from the
early beginning. I remember that we had something like numa nodes
interleaving but that is such a rare configuration that I wouldn't be
surprised if it wasn't very well tested and actually broken in some
subtle ways.

There are many page_zone(page) != zone checks sprinkled in the code but
I do not see anything consistent there. Similarly pageblock_pfn_to_page
is only used by compaction but there are other pfn walkers which do
ad-hoc checking. I was staring into that code these days due to my
hotplug patches.

That being said, I think that interleaving zones are an interesting
concept but I would be rather nervous to consider this as working
currently without a deeper review.

> Other things are completely the same with other zones. For MM POV, there is
> no difference in allocation process except that it only takes
> GFP_MOVABLE request. In reclaim, pages that are allocated by MM will
> be reclaimed by the same policy of the MM. So, no difference.

OK, so essentially this is yet another "highmem" zone. We already know
that only GFP_MOVABLE are allowed to fallback to ZONE_CMA but do CMA
allocations fallback to other zones and punch new holes? In which zone
order?

> This 'no difference' is a strong point of this approach. ZONE_CMA is
> naturally handled by MM subsystem unlike as before (special handling is
> required for MIGRATE_CMA).
> 
> 3. Controversial Point
> 
> The major concern from Mel is that the zone concept is abused. Zones were
> originally introduced to solve issues caused by H/W addressing limitations.

Yes, very much agreed on that. You basically want to punch holes into
other zones to guarantee allocation progress. Marking those holes
with a special migrate type sounds quite natural but I will have to study
the current code some more to see whether the issues you mention are
inherently unfixable. This might very well turn out to be the case.

> However, since the introduction of ZONE_MOVABLE, zones have also been used
> to solve issues caused by S/W limitations.

Copying the ZONE_MOVABLE pattern doesn't sound all that great to me, to be
honest.

> This S/W limitation causes the highmem/lowmem problem: some memory cannot
> be used for kernel allocations and LRU ordering is easily broken. My major
> objection to this point is that the problem isn't related to an
> implementation detail such as ZONE.

Yes, agreed on that.

> The problems come purely from the S/W limitation that we cannot use this
> memory for kernel allocations, in order to guarantee future memory
> offlining (ZONE_MOVABLE) or future allocation by the device (ZONE_CMA).
> See PART 1 for more information.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-04-11 Thread Joonsoo Kim
On Tue, Apr 11, 2017 at 12:17:13PM +0900, js1...@gmail.com wrote:
> From: Joonsoo Kim 
> 
> Changed from v6
> o Rebase on next-20170405
> o Add a fix for lowmem mapping on ARM (last patch)

Hello, Russell and Will.

In this 7th version of the patchset, I added a new patch for ARM.
Could you review it?

Thanks.



Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-04-11 Thread Joonsoo Kim
On Tue, Apr 11, 2017 at 08:15:20PM +0200, Michal Hocko wrote:
> Hi,
> I didn't get to read through the patches yet but the cover letter didn't
> really help me to understand the basic concepts to have a good starting
> point before diving into implementation details. It contains a lot of
> history remarks which is not bad but IMHO too excessive here. I would
> appreciate the following information (some of that is already provided
> in the cover but could benefit from some rewording/text reorganization).
> 
> - what is ZONE_CMA and how it is configured (from admin POV)
> - how does ZONE_CMA compare to other zones
> - who is allowed to allocate from this zone and what are the
>   guarantees/requirements for successful allocation
> - how does the zone compare to a preallocated allocation pool
> - how is ZONE_CMA balanced/reclaimed due to internal memory pressure
>   (from CMA users)
> - is this zone reclaimable for the global memory reclaim
> - why this was/is controversial

Hello,

I hope the following summary helps you understand this patchset.
I skip some basics about CMA. I will add this description to
the cover letter if a re-spin is needed.

1. What is ZONE_CMA

ZONE_CMA is a newly introduced zone that manages freepages in CMA areas.
Previously, freepages in CMA areas were kept in the ordinary zones and
managed/distinguished by a special migratetype, MIGRATE_CMA.
However, this causes too many subtle problems, and fixing all of them
seems to be impossible and too intrusive to the MM subsystem.
Therefore, a different solution was requested and this patchset is the
outcome of that request. Problem details are described in PART 3.

There is no change from the admin POV. It is just an implementation detail.
If the kernel is configured to use CMA, the memory is managed by the MM as
before, except that the pages now belong to a separate zone, ZONE_CMA.

2. How does ZONE_CMA compare to other zones

ZONE_CMA is conceptually the same as ZONE_MOVABLE. There is a software
constraint to guarantee the success of future allocation requests from
the device. If the device requests a specific range of memory in the CMA
area at runtime, pages allocated by the MM are migrated elsewhere and the
range is handed to the device. To guarantee this, ZONE_CMA only accepts
allocation requests with GFP_MOVABLE.

The other important point about ZONE_CMA is that the span of ZONE_CMA can
overlap with other zones. This is not new to the MM subsystem, which
already has enough logic to handle such a situation, so there should be
no problem.

Everything else is the same as for other zones. From the MM POV, there is
no difference in the allocation process except that it only takes
GFP_MOVABLE requests. In reclaim, pages allocated by the MM are
reclaimed by the same policy as elsewhere in the MM. So, no difference.
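
The allocation rule described above could be sketched as follows. This
is a toy user-space model, not the patchset's code: `TOY_GFP_MOVABLE`,
`zone_allows()` and `toy_alloc_page()` are hypothetical names, and trying
the toy CMA zone before the normal zone is an assumption made for the
example, not something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for the "movable" allocation hint. */
#define TOY_GFP_MOVABLE 0x1u

enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_CMA, TOY_NR_ZONES };

struct toy_zone_state {
    size_t free_pages;  /* pages still available in this zone */
};

/* The toy CMA zone only accepts movable requests; other zones accept all. */
bool zone_allows(enum toy_zone zone, unsigned int gfp)
{
    if (zone == TOY_ZONE_CMA)
        return (gfp & TOY_GFP_MOVABLE) != 0;
    return true;
}

/*
 * Try zones from the highest index down (an assumed order) and take one
 * page from the first eligible zone that has free pages.
 * Returns the zone index used, or -1 if no eligible zone has a free page.
 */
int toy_alloc_page(struct toy_zone_state zones[TOY_NR_ZONES],
                   unsigned int gfp)
{
    int zone;

    assert(zones != NULL);
    for (zone = TOY_NR_ZONES - 1; zone >= 0; zone--) {
        if (!zone_allows((enum toy_zone)zone, gfp))
            continue;   /* e.g. a non-movable request skips the CMA zone */
        if (zones[zone].free_pages == 0)
            continue;   /* nothing left here, fall back to the next zone */
        zones[zone].free_pages--;
        return zone;
    }
    return -1;
}
```

In this model a non-movable request never lands in the toy CMA zone even
when it is the only zone with free pages, while a movable request may use
it and falls back to the normal zone once it is empty.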

This 'no difference' is a strong point of this approach. ZONE_CMA is
handled naturally by the MM subsystem, unlike before (MIGRATE_CMA
requires special handling).

3. Controversial Point

The major concern from Mel is that the zone concept is abused. Zones were
originally introduced to solve issues caused by H/W addressing limitations.
However, since the introduction of ZONE_MOVABLE, zones have also been used
to solve issues caused by S/W limitations. This S/W limitation causes the
highmem/lowmem problem: some memory cannot be used for kernel allocations
and LRU ordering is easily broken. My major objection to this point is that
the problem isn't related to an implementation detail such as ZONE.
The problems come purely from the S/W limitation that we cannot use this
memory for kernel allocations, in order to guarantee future memory
offlining (ZONE_MOVABLE) or future allocation by the device (ZONE_CMA).
See PART 1 for more information.

Thanks.


Re: [PATCH v7 0/7] Introduce ZONE_CMA

2017-04-11 Thread Michal Hocko
Hi,
I didn't get to read through the patches yet but the cover letter didn't
really help me to understand the basic concepts to have a good starting
point before diving into implementation details. It contains a lot of
history remarks which is not bad but IMHO too excessive here. I would
appreciate the following information (some of that is already provided
in the cover but could benefit from some rewording/text reorganization).

- what is ZONE_CMA and how it is configured (from admin POV)
- how does ZONE_CMA compare to other zones
- who is allowed to allocate from this zone and what are the
  guarantees/requirements for successful allocation
- how does the zone compare to a preallocated allocation pool
- how is ZONE_CMA balanced/reclaimed due to internal memory pressure
  (from CMA users)
- is this zone reclaimable for the global memory reclaim
- why this was/is controversial
-- 
Michal Hocko
SUSE Labs


[PATCH v7 0/7] Introduce ZONE_CMA

2017-04-10 Thread js1304
From: Joonsoo Kim 

Changes from v6
o Rebase on next-20170405
o Add a fix for lowmem mapping on ARM (last patch)
o Re-organize the cover letter

Changes from v5
o Rebase on next-20161013
o Cosmetic change on patch 1
o Optimize span of ZONE_CMA on multiple node system

Changes from v4
o Rebase on next-20160825
o Add general fix patch for lowmem reserve
o Fix lowmem reserve ratio
o Fix zone span optimization per Vlastimil
o Fix pageset initialization
o Change invocation timing on cma_init_reserved_areas()

Changes from v3
o Rebase on next-20160805
o Split first patch per Vlastimil
o Remove useless function parameter per Vlastimil
o Add code comment per Vlastimil
o Add following description on cover-letter

Changes from v2
o Rebase on next-20160525
o No other changes except following description

Changes from v1
o Separate out some patches which deserve to be submitted independently
o Modify description to reflect current kernel state
(e.g. high-order watermark problem disappeared by Mel's work)
o Don't increase SECTION_SIZE_BITS to make room in page flags
(detailed reason is on the patch that adds ZONE_CMA)
o Adjust ZONE_CMA population code


Hello,

This is the 7th version of the ZONE_CMA patchset. One patch is added
to fix a potential problem on ARM. Other changes are just due to the rebase.

This patchset has a long history and has received some reviews before. This
cover letter contains a summary of those reviews and my opinion on them.
The content order is confusing, so I made a simple index. If anyone wants to
understand the history properly, please read the parts in reverse order.

PART 1. Strong points of the zone approach
PART 2. Summary of the LSF/MM 2016 discussion
PART 3. Original motivation of this patchset

* PART 1 *

CMA has many problems and I describe them at the bottom of this
cover letter. These problems come from the limitation that CMA memory
must always be migratable for device usage. I think that introducing
a new zone is the best approach to solve them. Here are the reasons.

Zones were introduced to solve issues caused by H/W addressing limitations.
The MM subsystem is implemented to work efficiently with these zones, and
its allocation/reclaim logic takes this limitation into account.
What I did in this patchset is introduce a new zone and extend the zone
concept slightly. The new concept is that a zone can have not only a H/W
addressing limitation but also a S/W limitation to guarantee page migration.
This concept originated with ZONE_MOVABLE and it has worked well
for a long time. So, ZONE_CMA should not be special at this point.

There is a major concern from Mel that ZONE_MOVABLE, which has a
S/W limitation, causes the highmem/lowmem problem. The highmem/lowmem
problem is that some memory cannot be used for kernel allocations due to
the limitation of the zone. This breaks LRU ordering and makes it hard to
find kernel-usable memory under memory pressure.

However, the important point is that this problem doesn't come from the
implementation detail (ZONE_MOVABLE/MIGRATETYPE). Even if we implement it
by MIGRATETYPE instead of by ZONE_MOVABLE, we cannot use that type of
memory for kernel allocations because kernel allocations aren't migratable.
So, it will break LRU ordering, too. We cannot avoid the problem in either
case. Therefore, we should focus on which solution is easier to maintain
and less intrusive to the MM subsystem.

From this viewpoint, I think that the zone approach is better. As mentioned
earlier, the MM subsystem already has a lot of infrastructure to deal with
a zone's H/W addressing limitation. Adding a S/W limitation to the zone
concept and adding a new zone doesn't change anything. It will just work.
My patchset can remove many hooks related to CMA area management in MM
while solving the problems. More hooks would be required to solve the
problems if we chose the MIGRATETYPE approach.

Although Mel withdrew the review, Vlastimil expressed agreement with this
new zone approach [6].

 "I realize I differ here from much more experienced mm guys, and will
 probably deservingly regret it later on, but I think that the ZONE_CMA
 approach could work indeed better than current MIGRATE_CMA pageblocks."

If anyone has a different opinion, please let me know.

Thanks.

* PART 2 *

There was a discussion with Mel [5] after LSF/MM 2016. I could summarise
it to help the merge decision but it's better to read it yourself, since
any summary I write would be biased towards my view. But, if anyone wants
a summary, I will write one. :)

Anyway, Mel's position on this patchset seems to be neutral. He said:
"I'm not going to outright NAK your series but I won't ACK it either"

We can fix the problems with either approach but I would like to go with
the new zone approach because it is less error-prone. It reduces some
corner case handling now and removes the need for potential corner case
handling to fix the problems.

Note that our company is already using ZONE_CMA and has seen no problems.

If anyone has a different opinion, please let me know and let's discuss
together.

Andrew, if there is something to do for merge,