Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Tue, 30 Jan 2007, Peter Zijlstra wrote:

> I'm guessing this will involve page migration.

Not necessarily. The approach also works without page migration. It depends on an intelligent allocation scheme that stays away from the areas of interest to allocations restricted to low memory as much as possible, and that is then able to reclaim from a section of a zone if necessary. The implementation of alloc_pages_range() that I did way back did not rely on page migration.

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Mon, 29 Jan 2007, Andrew Morton wrote:

> On Mon, 29 Jan 2007 15:37:29 -0800 (PST)
> Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
> > With an alloc_pages_range() one would be able to specify upper and lower
> > boundaries.
>
> Is there a proposal anywhere regarding how this would be implemented?

Yes, it was discussed a while back in August. Look for alloc_pages_range. Sadly I have not been able to work on it since there are too many other issues.
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Mon, 2007-01-29 at 16:09 -0800, Andrew Morton wrote:

> On Mon, 29 Jan 2007 15:37:29 -0800 (PST)
> Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
> > With an alloc_pages_range() one would be able to specify upper and lower
> > boundaries.
>
> Is there a proposal anywhere regarding how this would be implemented?

I'm guessing this will involve page migration. Still, would we need to place bounds on non-movable pages, or will it be best effort? It seems the current zone approach is best effort too, although it does try to keep allocations away from the lower zones as much as possible. But I guess we could make a single-zone allocator prefer high addresses too.

So then we'd end up with a single zone, and each allocation would give a range. Try to pick a free page with as high an address as possible in the given range. If no pages are available in the given range, try to move some movable pages out of it.

This does of course involve finding free pages in a given range, and identifying pages as movable. And a gazillion trivial but tedious things I've forgotten.

Christoph, is this what you were getting at?
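The scheme sketched above can be illustrated in miniature over a toy bitmap of page frames. This is only an illustration of the idea, not code from any posted patch; the names `pick_highest_free` and `need_migrate` are made up for the example.

```c
#include <stddef.h>
#include <stdbool.h>

/* Toy single-zone allocator step: pick the highest free pfn inside
 * [low, high]; if none is free, signal that movable pages would have
 * to be moved out of the range before the allocation can succeed. */
static long pick_highest_free(const bool *free_map, size_t npages,
                              size_t low, size_t high, bool *need_migrate)
{
    size_t top = high < npages - 1 ? high : npages - 1;

    *need_migrate = false;
    for (size_t pfn = top + 1; pfn-- > low; )   /* scan downwards */
        if (free_map[pfn])
            return (long)pfn;
    *need_migrate = true;   /* range exhausted: try emptying it of movable pages */
    return -1;
}
```

Unconstrained callers pass the full range and so naturally stay away from the low pages, which is what leaves them available for the callers that actually need them.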
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Mon, 29 Jan 2007 15:37:29 -0800 (PST) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> With an alloc_pages_range() one would be able to specify upper and lower
> boundaries.

Is there a proposal anywhere regarding how this would be implemented?
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Mon, 29 Jan 2007, Russell King wrote:

> This sounds like it could help ARM where we have some weird DMA areas.

Some ARM platforms have no need for a ZONE_DMA. The code in mm allows you to not compile ZONE_DMA support into these kernels.

> What will help even more is if the block layer can also be persuaded that
> a device dma mask is precisely that - a mask - and not a set of leading
> ones followed by a set of zeros, then we could eliminate the really ugly
> dmabounce code.

With an alloc_pages_range() one would be able to specify upper and lower boundaries. The device dma mask can be translated to a fitting boundary. Maybe we can then also get rid of the device mask and specify a boundary there. There is a lot of ugly code all around that works around the existing issues with dma masks. That would all go away.
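The mask-to-boundary translation mentioned above is simple arithmetic for a conventional leading-ones mask. A hedged sketch (the function name and the hypothetical alloc_pages_range() signature it would feed are illustrative, not from any posted patch):

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4 KiB pages, as on x86 and most ARM configs */

/* Translate a conventional leading-ones DMA mask into the upper pfn
 * boundary a range allocator could take, e.g. as the high end of a
 * hypothetical alloc_pages_range(low_pfn, high_pfn, order, gfp). */
static uint64_t dma_mask_to_high_pfn(uint64_t dma_mask)
{
    return dma_mask >> PAGE_SHIFT;
}
```

For example, the classic 24-bit ISA mask 0x00ffffff maps to pfn 0xfff, i.e. the first 16 MiB.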
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Mon, Jan 29, 2007 at 02:45:06PM -0800, Christoph Lameter wrote:

> On Mon, 29 Jan 2007, Andrew Morton wrote:
>
> > > All 64-bit machines will only have a single zone if we have such a range
> > > alloc mechanism. The 32-bit ones with HIGHMEM won't be able to avoid it,
> > > true. But all arches that do not need gymnastics to access their memory
> > > will be able to run with a single zone.
> >
> > What is "such a range alloc mechanism"?
>
> As I mentioned above: a function that allows an allocation to specify
> which physical memory ranges are permitted.
>
> > So please stop telling me what a wonderful world it is to not have multiple
> > zones. It just isn't going to happen for a long long time. The
> > multiple-zone kernel is the case we need to care about most by a very large
> > margin indeed. Single-zone is an infinitesimal corner-case.
>
> We can still reduce the number of zones for those that require highmem to
> two, which may allow us to avoid ZONE_DMA/DMA32 issues and let dma devices
> that can do I/O to memory ranges not compatible with the current
> boundaries of DMA/DMA32 avoid bounce buffers. And I am also repeating
> myself.

This sounds like it could help ARM where we have some weird DMA areas.

What will help even more is if the block layer can also be persuaded that a device dma mask is precisely that - a mask - and not a set of leading ones followed by a set of zeros; then we could eliminate the really ugly dmabounce code.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of:
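The mask-versus-boundary distinction above is the crux: a true mask test accepts any address whose set bits all fall inside the mask, while the boundary interpretation is only correct when the mask is an unbroken run of leading ones. A small illustration, with a made-up device that cannot drive address bit 24:

```c
#include <stdint.h>
#include <stdbool.h>

/* Treat the dma mask as a genuine bit mask: every set bit of the
 * address must also be set in the mask. */
static bool dma_addr_ok_mask(uint64_t addr, uint64_t mask)
{
    return (addr & ~mask) == 0;
}

/* The "leading ones" interpretation: only valid for masks of the
 * form 0x00...0ff...f, i.e. a simple upper boundary. */
static bool dma_addr_ok_boundary(uint64_t addr, uint64_t mask)
{
    return addr <= mask;
}
```

With mask 0xfeffffff (bit 24 clear), the address 0x01000000 fails the real mask test but passes the boundary test, so boundary-only logic would wrongly skip bouncing it.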
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Mon, 29 Jan 2007, Andrew Morton wrote:

> > All 64-bit machines will only have a single zone if we have such a range
> > alloc mechanism. The 32-bit ones with HIGHMEM won't be able to avoid it,
> > true. But all arches that do not need gymnastics to access their memory
> > will be able to run with a single zone.
>
> What is "such a range alloc mechanism"?

As I mentioned above: a function that allows an allocation to specify which physical memory ranges are permitted.

> So please stop telling me what a wonderful world it is to not have multiple
> zones. It just isn't going to happen for a long long time. The
> multiple-zone kernel is the case we need to care about most by a very large
> margin indeed. Single-zone is an infinitesimal corner-case.

We can still reduce the number of zones for those that require highmem to two, which may allow us to avoid ZONE_DMA/DMA32 issues and let dma devices that can do I/O to memory ranges not compatible with the current boundaries of DMA/DMA32 avoid bounce buffers. And I am also repeating myself.
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Mon, 29 Jan 2007 13:54:38 -0800 (PST) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Fri, 26 Jan 2007, Andrew Morton wrote:
>
> > > The main benefit is a significant simplification of the VM, leading to
> > > robust and reliable operation and a reduction of the maintenance
> > > headaches coming with the additional zones.
> > >
> > > If we were to introduce the ability of allocating from a range of
> > > physical addresses, then the need for DMA zones would go away, allowing
> > > flexibility for device driver DMA allocations, and at the same time we
> > > get rid of special casing in the VM.
> >
> > None of this is valid. The great majority of machines out there will
> > continue to have the same number of zones. Nothing changes.
>
> All 64-bit machines will only have a single zone if we have such a range
> alloc mechanism. The 32-bit ones with HIGHMEM won't be able to avoid it,
> true. But all arches that do not need gymnastics to access their memory
> will be able to run with a single zone.

What is "such a range alloc mechanism"?

> > That's all a real cost, so we need to see *good* benefits to outweigh that
> > cost. Thus far I don't think we've seen that.
>
> The real saving is the simplicity of VM design, robustness and
> efficiency. We lose on all these fronts if we keep or add useless zones.
>
> The main reason for the recent problems with dirty handling seems to be
> exactly such multizone balancing issues, involving ZONE_NORMAL and
> HIGHMEM. Those problems cannot occur on single-zone arches (right now that
> means a series of embedded arches, UML and IA64).
>
> Multiple zones are a recipe for VM fragility and result in complexity
> that is difficult to manage.

Why do I have to keep repeating myself? 90% of known FC6-running machines are x86-32. 90% of vendor-shipped kernels need all three zones. And the remaining 10% ship with multiple nodes as well.

So please stop telling me what a wonderful world it is to not have multiple zones. It just isn't going to happen for a long long time. The multiple-zone kernel is the case we need to care about most by a very large margin indeed. Single-zone is an infinitesimal corner-case.
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007, Andrew Morton wrote:

> > The main benefit is a significant simplification of the VM, leading to
> > robust and reliable operation and a reduction of the maintenance
> > headaches coming with the additional zones.
> >
> > If we were to introduce the ability of allocating from a range of
> > physical addresses, then the need for DMA zones would go away, allowing
> > flexibility for device driver DMA allocations, and at the same time we
> > get rid of special casing in the VM.
>
> None of this is valid. The great majority of machines out there will
> continue to have the same number of zones. Nothing changes.

All 64-bit machines will only have a single zone if we have such a range alloc mechanism. The 32-bit ones with HIGHMEM won't be able to avoid it, true. But all arches that do not need gymnastics to access their memory will be able to run with a single zone.

> That's all a real cost, so we need to see *good* benefits to outweigh that
> cost. Thus far I don't think we've seen that.

The real saving is the simplicity of VM design, robustness and efficiency. We lose on all these fronts if we keep or add useless zones.

The main reason for the recent problems with dirty handling seems to be exactly such multizone balancing issues, involving ZONE_NORMAL and HIGHMEM. Those problems cannot occur on single-zone arches (right now that means a series of embedded arches, UML and IA64).

Multiple zones are a recipe for VM fragility and result in complexity that is difficult to manage.
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007 11:58:18 -0800 (PST) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> > If the only demonstrable benefit is a saving of a few k of text on a small
> > number of machines then things are looking very grim, IMO.
>
> The main benefit is a significant simplification of the VM, leading to
> robust and reliable operation and a reduction of the maintenance
> headaches coming with the additional zones.
>
> If we were to introduce the ability of allocating from a range of
> physical addresses, then the need for DMA zones would go away, allowing
> flexibility for device driver DMA allocations, and at the same time we
> get rid of special casing in the VM.

None of this is valid. The great majority of machines out there will continue to have the same number of zones. Nothing changes.

What will happen is that a small number of machines will have different runtime behaviour. So they don't benefit from the majority's testing, they don't contribute to it, and they potentially have unique-to-them problems which we need to worry about.

That's all a real cost, so we need to see *good* benefits to outweigh that cost. Thus far I don't think we've seen that.
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007, Andrew Morton wrote:

> As Mel points out, distros will ship with CONFIG_ZONE_DMA=y, so the number
> of machines which will actually benefit from this change is really small.
> And the benefit to those few machines will also, I suspect, be small.
>
> > > - We kicked around some quite different ways of implementing the same
> > >   things, but nothing came of it. iirc, one was to remove the hard-coded
> > >   zones altogether and rework all the MM to operate in terms of
> > >
> > >   for (idx = 0; idx < NUMBER_OF_ZONES; idx++)
> > >           ...
> >
> > Hmmm.. How would that be simpler?
>
> Replace a sprinkle of open-coded ifdefs with a regular code sequence which
> everyone uses. Pretty obvious, I'd thought.

We do use such loops in many places. However, stuff like array initialization and special casing cannot use a loop. I am not sure what we could change there. The hard coding is necessary because each zone currently has invariant characteristics that we need to consider. Reducing the number of zones reduces the amount of special casing in the VM that needs to be considered at run time, and that is a potential source of trouble.

> Plus it becomes straightforward to extend this from the present four zones
> to a complete 12 zones, which gives us the full set of
> ZONE_DMA20,ZONE_DMA21,...,ZONE_DMA32 for those funny devices.

I just hope we can handle the VM complexity of load balancing etc. that this will introduce. Also, each zone has management overhead and will cause the touching of additional cachelines on many VM operations. Much of that management overhead becomes unnecessary if we reduce zones.

> If the only demonstrable benefit is a saving of a few k of text on a small
> number of machines then things are looking very grim, IMO.

The main benefit is a significant simplification of the VM, leading to robust and reliable operation and a reduction of the maintenance headaches coming with the additional zones.

If we were to introduce the ability of allocating from a range of physical addresses, then the need for DMA zones would go away, allowing flexibility for device driver DMA allocations, and at the same time we get rid of special casing in the VM.
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007 07:56:09 -0800 (PST) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Fri, 26 Jan 2007, Andrew Morton wrote:
>
> > - They add zillions of ifdefs
>
> They just add a few for ZONE_DMA, where we already have similar ifdefs for
> ZONE_DMA32 and ZONE_HIGHMEM.

I refreshed my memory. It remains awful.

> > - They make the VM's behaviour diverge between different platforms and
> >   between different configs on the same platforms, and hence degrade
> >   maintainability and increase complexity.
>
> They avoid unnecessary complexity on platforms. They could be made to work
> on more platforms with measures to deal with what ZONE_DMA provides in
> different ways. There are 6 or so platforms that do not need ZONE_DMA at
> all.

As Mel points out, distros will ship with CONFIG_ZONE_DMA=y, so the number of machines which will actually benefit from this change is really small. And the benefit to those few machines will also, I suspect, be small.

> > - We kicked around some quite different ways of implementing the same
> >   things, but nothing came of it. iirc, one was to remove the hard-coded
> >   zones altogether and rework all the MM to operate in terms of
> >
> >   for (idx = 0; idx < NUMBER_OF_ZONES; idx++)
> >           ...
>
> Hmmm.. How would that be simpler?

Replace a sprinkle of open-coded ifdefs with a regular code sequence which everyone uses. Pretty obvious, I'd thought.

Plus it becomes straightforward to extend this from the present four zones to a complete 12 zones, which gives us the full set of ZONE_DMA20,ZONE_DMA21,...,ZONE_DMA32 for those funny devices.

> > - I haven't seen any hard numbers to justify the change.
>
> I have sent you numbers showing significant reductions in code size.

If it isn't in the changelog it doesn't exist. I guess I didn't copy it into the changelog.

If the only demonstrable benefit is a saving of a few k of text on a small number of machines then things are looking very grim, IMO.
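The hypothetical ZONE_DMA20..ZONE_DMA32 set floated above amounts to picking, per device, the smallest power-of-two zone that covers its mask. A sketch of that selection, purely illustrative since no such zones exist:

```c
#include <stdint.h>

/* Map a device dma mask to the index of the smallest hypothetical
 * ZONE_DMAnn that covers it, where zone nn holds pages below 2^nn. */
static int dma_zone_for_mask(uint64_t dma_mask)
{
    int bits = 0;

    while (dma_mask) {      /* count significant address bits */
        bits++;
        dma_mask >>= 1;
    }
    if (bits < 20)
        bits = 20;          /* smallest zone in the proposed set */
    if (bits > 32)
        bits = 32;          /* anything wider is served from ZONE_DMA32 up */
    return bits;            /* e.g. 24 -> "ZONE_DMA24" */
}
```

This also makes the load-balancing worry concrete: every distinct mask width in the machine would populate another zone that the VM has to balance.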
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007, Christoph Lameter wrote:

> On Fri, 26 Jan 2007, Mel Gorman wrote:
> > > For arches that do not have HIGHMEM other zones would be okay too
> > > it seems.
> >
> > It would, but it'd obscure the code to take advantage of that.
>
> No MOVABLE memory for 64 bit platforms that do not have HIGHMEM right now?

err, no, I misinterpreted what you meant by "other zones would be ok..". I
thought you were suggesting the reuse of zone names for some reason.

The zone used for ZONE_MOVABLE is the highest populated zone on the
architecture. On some architectures, that will be ZONE_HIGHMEM. On others,
it will be ZONE_DMA. See the function find_usable_zone_for_movable().
ZONE_MOVABLE never spans zones. For example, it will not use some
ZONE_HIGHMEM and some ZONE_NORMAL memory.

> > The anti-fragmentation code could potentially be used to have subzone
> > groups that kept movable and unmovable allocations as far apart as
> > possible and at opposite ends of a zone. That approach has been kicked
> > around a few times because of complexity.
>
> Hmm... But this patch also introduces additional complexity, plus it's
> difficult to handle for the end user.

It's harder for the user to set up, all right. But it works within limits
that are known well in advance and doesn't add additional code to the main
allocator path. Once it's set up, it acts like any other zone, and zone
behaviour is better understood than anti-fragmentation's behaviour.

> > > There are some NUMA architectures that are not that symmetric.
> >
> > I know, it's why find_zone_movable_pfns_for_nodes() is as complex as it
> > is. The mechanism spreads the unmovable memory evenly throughout all
> > nodes. In the event some nodes are too small to hold their share, the
> > remaining unmovable memory is divided between the nodes that are larger.
>
> I would have expected a percentage of a node. If equal amounts of
> unmovable memory are assigned to all nodes at first then there will be
> large disparities in the amount of movable memory, e.g. between a node
> with 8GB of memory and a node with 1GB.

On the other hand, percentages make it harder for the administrator to know
in advance how much unmovable memory will be available when the system
starts if the machine changes configuration. The absolute figure is easier
to understand. If there was a requirement, an alternative configuration
option could be made available that takes a fixed percentage of each node
with memory.

> How do you handle headless nodes? I.e. memory nodes with no processors?

The code only cares about memory, not processors.

> Those may be particularly large compared to the rest but these are mainly
> used for movable pages since unmovable things like device driver buffers
> have to be kept near the processors that take the interrupt.

Then what I'd do is specify kernelcore to be (number_of_nodes_with_processors
* largest_amount_of_memory_on_node_with_processors). That would have all
memory near processors available as unmovable memory (which movable
allocations will still use so they don't always go remote) while keeping a
large amount of memory on the headless nodes for movable allocations only.
If requirements demanded, a configuration option could be made that allows
the administrator to specify exactly how much unmovable memory he wants on a
specific node.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
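The even-spread policy described in the exchange above (divide the requested
kernelcore across nodes, cap the nodes that are too small, and push the
shortfall onto the larger nodes) can be sketched as a small userspace model.
This is an illustrative sketch only: the function and variable names are
invented here, and the real logic lives in find_zone_movable_pfns_for_nodes():

```c
/* Toy model of spreading "kernelcore" (unmovable) pages across NUMA nodes.
 * Each round gives every node an equal share of what remains; nodes that
 * fill up drop out, and later rounds redistribute the shortfall among the
 * nodes that still have room. */
void spread_kernelcore(const unsigned long node_pages[], int nr_nodes,
                       unsigned long kernelcore,
                       unsigned long node_kernelcore[])
{
    int i, usable = nr_nodes;
    unsigned long remaining = kernelcore;

    for (i = 0; i < nr_nodes; i++)
        node_kernelcore[i] = 0;

    while (remaining && usable) {
        unsigned long share = remaining / usable;
        int progressed = 0;

        if (!share)
            share = 1;
        for (i = 0; i < nr_nodes && remaining; i++) {
            unsigned long room = node_pages[i] - node_kernelcore[i];
            unsigned long take = share < room ? share : room;

            if (!take)
                continue;
            if (take > remaining)
                take = remaining;
            node_kernelcore[i] += take;
            remaining -= take;
            progressed = 1;
        }
        /* Recount the nodes that can still accept unmovable pages. */
        usable = 0;
        for (i = 0; i < nr_nodes; i++)
            if (node_kernelcore[i] < node_pages[i])
                usable++;
        if (!progressed)
            break;
    }
}
```

With two nodes of 2048 and 256 pages and kernelcore equivalent to 1024
pages, the small node is capped at its full 256 pages and the large node
absorbs the remaining 768, illustrating the disparity Christoph raises.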
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007, Mel Gorman wrote:

> > For arches that do not have HIGHMEM other zones would be okay too
> > it seems.
>
> It would, but it'd obscure the code to take advantage of that.

No MOVABLE memory for 64 bit platforms that do not have HIGHMEM right now?

> The anti-fragmentation code could potentially be used to have subzone
> groups that kept movable and unmovable allocations as far apart as
> possible and at opposite ends of a zone. That approach has been kicked
> around a few times because of complexity.

Hmm... But this patch also introduces additional complexity, plus it's
difficult to handle for the end user.

> > There are some NUMA architectures that are not that symmetric.
>
> I know, it's why find_zone_movable_pfns_for_nodes() is as complex as it
> is. The mechanism spreads the unmovable memory evenly throughout all
> nodes. In the event some nodes are too small to hold their share, the
> remaining unmovable memory is divided between the nodes that are larger.

I would have expected a percentage of a node. If equal amounts of unmovable
memory are assigned to all nodes at first then there will be large
disparities in the amount of movable memory, e.g. between a node with 8GB
of memory and a node with 1GB.

How do you handle headless nodes? I.e. memory nodes with no processors?
Those may be particularly large compared to the rest but these are mainly
used for movable pages since unmovable things like device driver buffers
have to be kept near the processors that take the interrupt.
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007, Christoph Lameter wrote:

> On Thu, 25 Jan 2007, Mel Gorman wrote:
> > The following 8 patches against 2.6.20-rc4-mm1 create a zone called
> > ZONE_MOVABLE that is only usable by allocations that specify both
> > __GFP_HIGHMEM and __GFP_MOVABLE. This has the effect of keeping all
> > non-movable pages within a single memory partition while allowing
> > movable allocations to be satisfied from either partition.
>
> For arches that do not have HIGHMEM other zones would be okay too it seems.

It would, but it'd obscure the code to take advantage of that.

> > The size of the zone is determined by a kernelcore= parameter specified
> > at boot-time. This specifies how much memory is usable by non-movable
> > allocations and the remainder is used for ZONE_MOVABLE. Any range of
> > pages within ZONE_MOVABLE can be released by migrating the pages or by
> > reclaiming.
>
> The user has to manually fiddle around with the size of the unmovable
> partition until it works?

They have to fiddle with the size of the unmovable partition if their
workload uses more unmovable kernel allocations than expected. This was
always going to be the restriction with using zones for partitioning memory.
Resizing zones on the fly is not really an option because the resizing would
only work reliably in one direction.

The anti-fragmentation code could potentially be used to have subzone groups
that kept movable and unmovable allocations as far apart as possible and at
opposite ends of a zone. That approach has been kicked around a few times
because of complexity.

> > When selecting a zone to take pages from for ZONE_MOVABLE, there are two
> > things to consider. First, only memory from the highest populated zone
> > is used for ZONE_MOVABLE. On the x86, this is probably going to be
> > ZONE_HIGHMEM but it would be ZONE_DMA on ppc64 or possibly ZONE_DMA32
> > on x86_64. Second, the amount of memory usable by the kernel will be
> > spread evenly throughout NUMA nodes where possible. If the nodes are
> > not of equal size, the amount of memory usable by the kernel on some
> > nodes may be greater than others.
>
> So how is the amount of movable memory on a node calculated?

Subtle difference. The amount of unmovable memory is calculated per node.

> Evenly distributed?

As evenly as possible.

> There are some NUMA architectures that are not that symmetric.

I know, it's why find_zone_movable_pfns_for_nodes() is as complex as it is.
The mechanism spreads the unmovable memory evenly throughout all nodes. In
the event some nodes are too small to hold their share, the remaining
unmovable memory is divided between the nodes that are larger.

> > By default, the zone is not as useful for hugetlb allocations because
> > they are pinned and non-migratable (currently at least). A sysctl is
> > provided that allows huge pages to be allocated from that zone. This
> > means that the huge page pool can be resized to the size of
> > ZONE_MOVABLE during the lifetime of the system assuming that pages are
> > not mlocked. Despite huge pages being non-movable, we do not introduce
> > additional external fragmentation of note as huge pages are always the
> > largest contiguous block we care about.
>
> The user already has to specify the partitioning of the system at bootup
> and could take the huge page sizes into account.

Not in all cases. Some systems will not know how many huge pages they need
in advance because the system is used as a batch system running jobs as
requested. The zone allows an amount of memory to be set aside that can be
*optionally* used for hugepages if desired or base pages if not. Between
jobs, the hugepage pool can be resized up to the size of ZONE_MOVABLE.

The other case is if we ever support memory hot-remove. Any memory within
ZONE_MOVABLE can potentially be removed by migrating pages and off-lined.

> Also huge pages may have variable sizes that can be specified on bootup
> for IA64. The assumption that a huge page is always the largest
> contiguous block is *not true*.

I didn't say they were the largest supported contiguous block, I said they
were the largest contiguous block we *care* about. Right now, it is assumed
that variable huge page sizes are not supported at runtime. If they were,
some smarts would be needed to keep huge pages of the same size together to
control external fragmentation, but that's about it.

> The huge page sizes on i386 and x86_64 platforms are contingent on their
> page table structure. This can be completely different on other platforms.

The size doesn't really make much difference to the mechanism.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
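The zone-selection rule discussed above (ZONE_MOVABLE always takes its pages
from the highest populated zone, whatever that is on a given architecture)
is simple enough to sketch. The enum and function below are illustrative
stand-ins, not the kernel's actual find_usable_zone_for_movable():

```c
/* Illustrative zone indices, ordered low to high as in the discussion:
 * on x86 the highest populated zone is typically HIGHMEM; on an arch
 * with only a DMA zone populated, it would be DMA. */
enum my_zone { MY_ZONE_DMA, MY_ZONE_NORMAL, MY_ZONE_HIGHMEM, MY_MAX_ZONES };

/* Scan from the top of the zone list; the first zone with any pages in it
 * is the one ZONE_MOVABLE would be carved out of. ZONE_MOVABLE never spans
 * two zones. */
int find_usable_zone(const unsigned long zone_pages[MY_MAX_ZONES])
{
    int z;

    for (z = MY_MAX_ZONES - 1; z >= 0; z--)
        if (zone_pages[z])
            return z;
    return -1; /* no populated zone at all */
}
```

With HIGHMEM unpopulated (a 64-bit-style layout) the function falls back to
the next populated zone down, matching the behaviour Mel describes.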
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Thu, 25 Jan 2007, Mel Gorman wrote:

> The following 8 patches against 2.6.20-rc4-mm1 create a zone called
> ZONE_MOVABLE that is only usable by allocations that specify both
> __GFP_HIGHMEM and __GFP_MOVABLE. This has the effect of keeping all
> non-movable pages within a single memory partition while allowing movable
> allocations to be satisfied from either partition.

For arches that do not have HIGHMEM other zones would be okay too it seems.

> The size of the zone is determined by a kernelcore= parameter specified
> at boot-time. This specifies how much memory is usable by non-movable
> allocations and the remainder is used for ZONE_MOVABLE. Any range of
> pages within ZONE_MOVABLE can be released by migrating the pages or by
> reclaiming.

The user has to manually fiddle around with the size of the unmovable
partition until it works?

> When selecting a zone to take pages from for ZONE_MOVABLE, there are two
> things to consider. First, only memory from the highest populated zone is
> used for ZONE_MOVABLE. On the x86, this is probably going to be
> ZONE_HIGHMEM but it would be ZONE_DMA on ppc64 or possibly ZONE_DMA32 on
> x86_64. Second, the amount of memory usable by the kernel will be spread
> evenly throughout NUMA nodes where possible. If the nodes are not of
> equal size, the amount of memory usable by the kernel on some nodes may
> be greater than others.

So how is the amount of movable memory on a node calculated? Evenly
distributed? There are some NUMA architectures that are not that symmetric.

> By default, the zone is not as useful for hugetlb allocations because
> they are pinned and non-migratable (currently at least). A sysctl is
> provided that allows huge pages to be allocated from that zone. This
> means that the huge page pool can be resized to the size of ZONE_MOVABLE
> during the lifetime of the system assuming that pages are not mlocked.
> Despite huge pages being non-movable, we do not introduce additional
> external fragmentation of note as huge pages are always the largest
> contiguous block we care about.

The user already has to specify the partitioning of the system at bootup
and could take the huge page sizes into account.

Also huge pages may have variable sizes that can be specified on bootup for
IA64. The assumption that a huge page is always the largest contiguous
block is *not true*.

The huge page sizes on i386 and x86_64 platforms are contingent on their
page table structure. This can be completely different on other platforms.
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007, Mel Gorman wrote:

> I haven't thought about it much so I probably am missing something. The
> major difference I see is when only one zone is present. In that case, a
> number of loops presumably get optimised away and the behavior is very
> different (presumably better although you point out no figures exist to
> prove it). Where there are two or more zones, the code paths should be
> similar whether there are 2, 3 or 4 zones present.

The balancing of allocations between zones is becoming unnecessary. Also in
a NUMA system we then have zone == node, which allows for a series of
simplifications.

> As the common platforms will always have more than one zone, it'll be
> heavily tested and I'm guessing that distros are always going to have to
> ship kernels with ZONE_DMA for the devices that require it. The only
> platform I see that may have problems at the moment is IA64 which looks
> like the only platform that can have one and only one zone. I am guessing
> that Christoph will catch problems here fairly quickly although a
> non-optional ZONE_MOVABLE would throw a spanner into the works somewhat.

There are 6 platforms that have only one zone. These are not major
platforms. In order for major platforms to go to a single zone in general,
we would have to implement a generic mechanism to do an allocation where
one can specify the memory boundaries. Many DMA engines have different
limitations from what ZONE_DMA and ZONE_DMA32 can provide. If such a scheme
were implemented then those would be able to utilize memory better and the
amount of bounce buffers would be reduced.
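Christoph's point that many DMA engines have limits that neither ZONE_DMA
(16MB on x86) nor ZONE_DMA32 (4GB) expresses exactly can be made concrete.
The helpers below are purely hypothetical and are not the interface from
the earlier alloc_pages_range() discussion; they only illustrate the
pfn-boundary arithmetic such a boundary-aware allocator would perform,
assuming 4KB pages:

```c
/* A boundary-aware allocator would test candidate pages against an
 * explicit pfn window rather than a fixed zone. */
int pfn_in_range(unsigned long pfn, unsigned long low_pfn,
                 unsigned long high_pfn)
{
    return pfn >= low_pfn && pfn < high_pfn;
}

/* Convert a device DMA mask into the first pfn the device cannot reach,
 * assuming 4KB (1 << 12 byte) pages. A 30-bit mask, for example, covers
 * the first 1GB of memory: a limit neither ZONE_DMA nor ZONE_DMA32 can
 * describe, which is exactly the case the discussion raises. */
unsigned long mask_to_high_pfn(unsigned long long dma_mask)
{
    return (unsigned long)((dma_mask + 1) >> 12);
}
```

A device with mask 0x3fffffff would get high_pfn = 262144 (1GB / 4KB),
letting the allocator use all memory below 1GB instead of bouncing through
a much smaller zone.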
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007, Andrew Morton wrote:

> - They add zillions of ifdefs

They just add a few for ZONE_DMA where we already have similar ifdefs for
ZONE_DMA32 and ZONE_HIGHMEM.

> - They make the VM's behaviour diverge between different platforms and
>   between different configs on the same platforms, and hence degrade
>   maintainability and increase complexity.

They avoid unnecessary complexity on platforms. They could be made to work
on more platforms with measures to deal with what ZONE_DMA provides in
different ways. There are 6 or so platforms that do not need ZONE_DMA at
all.

> - We kicked around some quite different ways of implementing the same
>   things, but nothing came of it. iirc, one was to remove the hard-coded
>   zones altogether and rework all the MM to operate in terms of
>
>	for (idx = 0; idx < NUMBER_OF_ZONES; idx++)
>		...

Hmmm.. How would that be simpler?

> - I haven't seen any hard numbers to justify the change.

I have sent you numbers showing significant reductions in code size.
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007, Andrew Morton wrote:

> On Thu, 25 Jan 2007 23:44:58 +0000 (GMT)
> Mel Gorman <[EMAIL PROTECTED]> wrote:
>
> > The following 8 patches against 2.6.20-rc4-mm1 create a zone called
> > ZONE_MOVABLE
>
> Argh. These surely get all tangled up with the
> make-zones-optional-by-adding-zillions-of-ifdef patches:

There may be some entertainment there all right. I didn't see any obvious
way of avoiding collisions with those patches but for what it's worth,
ZONE_MOVABLE could also be made optional. In this patchset, I made no
assumptions about the number of zones other than the value of MAX_NR_ZONES.
There should be no critical collisions but I'll look through this patch
list and see what I can spot.

> deal-with-cases-of-zone_dma-meaning-the-first-zone.patch

This patch looks ok and looks like it stands on its own.

> introduce-config_zone_dma.patch

ok, no collisions here but obviously this patch does not stand on its own.

> optional-zone_dma-in-the-vm.patch

There are collisions here with the __ZONE_COUNT stuff but it's not
difficult to work around.

> optional-zone_dma-in-the-vm-no-gfp_dma-check-in-the-slab-if-no-config_zone_dma-is-set.patch
> optional-zone_dma-in-the-vm-no-gfp_dma-check-in-the-slab-if-no-config_zone_dma-is-set-reduce-config_zone_dma-ifdefs.patch

There is no cross-over here with the ZONE_MOVABLE patches. They are messing
around with slab.

> optional-zone_dma-for-ia64.patch

No collision here.

> remove-zone_dma-remains-from-parisc.patch
> remove-zone_dma-remains-from-sh-sh64.patch

No collisions here either. I see that there were discussions about Power
potentially doing something similar.

> set-config_zone_dma-for-arches-with-generic_isa_dma.patch

No collisions.

> zoneid-fix-up-calculations-for-zoneid_pgshift.patch

Fun, but no collisions.

To my surprise, I only spotted one major conflict point with
optional-zone_dma-in-the-vm.patch and that should be easy enough to
resolve. What I could do is break up one of my patches into
most-of-the-patch and the-part-that-may-conflict-with-optional-dma-zone.
The smaller part would then change depending on whether the optional DMA
zone work is present. Would that be any help?

> My objections to those patches:
>
> - They add zillions of ifdefs
>
> - They make the VM's behaviour diverge between different platforms and
>   between different configs on the same platforms, and hence degrade
>   maintainability and increase complexity.

I haven't thought about it much so I probably am missing something. The
major difference I see is when only one zone is present. In that case, a
number of loops presumably get optimised away and the behavior is very
different (presumably better although you point out no figures exist to
prove it). Where there are two or more zones, the code paths should be
similar whether there are 2, 3 or 4 zones present.

As the common platforms will always have more than one zone, it'll be
heavily tested and I'm guessing that distros are always going to have to
ship kernels with ZONE_DMA for the devices that require it. The only
platform I see that may have problems at the moment is IA64 which looks
like the only platform that can have one and only one zone. I am guessing
that Christoph will catch problems here fairly quickly although a
non-optional ZONE_MOVABLE would throw a spanner into the works somewhat.

> - We kicked around some quite different ways of implementing the same
>   things, but nothing came of it. iirc, one was to remove the hard-coded
>   zones altogether and rework all the MM to operate in terms of
>
>	for (idx = 0; idx < NUMBER_OF_ZONES; idx++)
>		...

hmm. Assuming the aim is to have a situation where all zone-related loops
are optimised away at compile-time, it's hard to see an alternative that
works. Any dynamic way of creating zones at boot time will not have the
compile-time optimizations, and any API that is page-range aware will
eventually hit the problems zones were made to solve (i.e. unmovable pages
locked in the lower address ranges).

> - I haven't seen any hard numbers to justify the change.
>
> So I want to drop them all.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Thu, 25 Jan 2007 23:44:58 +0000 (GMT)
Mel Gorman <[EMAIL PROTECTED]> wrote:

> The following 8 patches against 2.6.20-rc4-mm1 create a zone called
> ZONE_MOVABLE

Argh. These surely get all tangled up with the
make-zones-optional-by-adding-zillions-of-ifdef patches:

deal-with-cases-of-zone_dma-meaning-the-first-zone.patch
introduce-config_zone_dma.patch
optional-zone_dma-in-the-vm.patch
optional-zone_dma-in-the-vm-no-gfp_dma-check-in-the-slab-if-no-config_zone_dma-is-set.patch
optional-zone_dma-in-the-vm-no-gfp_dma-check-in-the-slab-if-no-config_zone_dma-is-set-reduce-config_zone_dma-ifdefs.patch
optional-zone_dma-for-ia64.patch
remove-zone_dma-remains-from-parisc.patch
remove-zone_dma-remains-from-sh-sh64.patch
set-config_zone_dma-for-arches-with-generic_isa_dma.patch
zoneid-fix-up-calculations-for-zoneid_pgshift.patch

My objections to those patches:

- They add zillions of ifdefs

- They make the VM's behaviour diverge between different platforms and
  between different configs on the same platforms, and hence degrade
  maintainability and increase complexity.

- We kicked around some quite different ways of implementing the same
  things, but nothing came of it. iirc, one was to remove the hard-coded
  zones altogether and rework all the MM to operate in terms of

	for (idx = 0; idx < NUMBER_OF_ZONES; idx++)
		...

- I haven't seen any hard numbers to justify the change.

So I want to drop them all.
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Thu, 25 Jan 2007 23:44:58 + (GMT) Mel Gorman [EMAIL PROTECTED] wrote: The following 8 patches against 2.6.20-rc4-mm1 create a zone called ZONE_MOVABLE Argh. These surely get all tangled up with the make-zones-optional-by-adding-zillions-of-ifdef patches: deal-with-cases-of-zone_dma-meaning-the-first-zone.patch introduce-config_zone_dma.patch optional-zone_dma-in-the-vm.patch optional-zone_dma-in-the-vm-no-gfp_dma-check-in-the-slab-if-no-config_zone_dma-is-set.patch optional-zone_dma-in-the-vm-no-gfp_dma-check-in-the-slab-if-no-config_zone_dma-is-set-reduce-config_zone_dma-ifdefs.patch optional-zone_dma-for-ia64.patch remove-zone_dma-remains-from-parisc.patch remove-zone_dma-remains-from-sh-sh64.patch set-config_zone_dma-for-arches-with-generic_isa_dma.patch zoneid-fix-up-calculations-for-zoneid_pgshift.patch My objections to those patches: - They add zillions of ifdefs - They make the VM's behaviour diverge between different platforms and between differen configs on the same platforms, and hence degrade maintainability and increase complexity. - We kicked around some quite different ways of implementing the same things, but nothing came of it. iirc, one was to remove the hard-coded zones altogether and rework all the MM to operate in terms of for (idx = 0; idx NUMBER_OF_ZONES; idx++) ... - I haven't seen any hard numbers to justify the change. So I want to drop them all. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007, Andrew Morton wrote: On Thu, 25 Jan 2007 23:44:58 + (GMT) Mel Gorman [EMAIL PROTECTED] wrote: The following 8 patches against 2.6.20-rc4-mm1 create a zone called ZONE_MOVABLE Argh. These surely get all tangled up with the make-zones-optional-by-adding-zillions-of-ifdef patches: There may be some entertainment there all right. I didn't see any obvious way of avoiding collisions with those patches but for what it's worth, ZONE_MOVABLE could also be made optional. In this patchset, I made no assumptions about the number of zones other than the value of MAX_NR_ZONES. There should be no critical collisions but I'll look through this patch list and see what I can spot. deal-with-cases-of-zone_dma-meaning-the-first-zone.patch This patch looks ok and looks like it stands on it's own. introduce-config_zone_dma.patch ok, no collisions here but obviously this patch does not stand on it's own. optional-zone_dma-in-the-vm.patch There are collisions here with the __ZONE_COUNT stuff but it's not difficult to work around. optional-zone_dma-in-the-vm-no-gfp_dma-check-in-the-slab-if-no-config_zone_dma-is-set.patch optional-zone_dma-in-the-vm-no-gfp_dma-check-in-the-slab-if-no-config_zone_dma-is-set-reduce-config_zone_dma-ifdefs.patch There is no cross-over here with the ZONE_MOVABLE patches. They are messing around with slab optional-zone_dma-for-ia64.patch No collision here remove-zone_dma-remains-from-parisc.patch remove-zone_dma-remains-from-sh-sh64.patch No collisions here either. I see that there were discussions about Power potentially doing something similar. set-config_zone_dma-for-arches-with-generic_isa_dma.patch No collisions zoneid-fix-up-calculations-for-zoneid_pgshift.patch Fun, but no collisions. To my suprise, I only spotted one major conflict point with optional-zone_dma-in-the-vm.patch and that should be easy enough to resolve. 
What I could do is break up one of my patches into most-of-the-patch and the-part-that-may-conflict-with-optional-dma-zone . The smaller part would then change depending on whether the optional DMA zone work is present. Would that be any help? My objections to those patches: - They add zillions of ifdefs - They make the VM's behaviour diverge between different platforms and between differen configs on the same platforms, and hence degrade maintainability and increase complexity. I haven't thought about it much so I probably am missing something. The major difference I see is when only one zone is present. In that case, a number of loops presumably get optimised away and the behavior is very different (presumably better although you point out no figures exist to prove it). Where there are two or more zones, the code paths should be similar whether there are 2, 3 or 4 zones present. As the common platforms will always have more than one zone, it'll be heavily tested and I'm guessing that distros are always going to have to ship kernels with ZONE_DMA for the devices that require it. The only platform I see that may have problems at the moment is IA64 which looks like the only platform that can have one and only one zone. I am guessing that Christoph will catch problems here fairly quickly although a non-optional ZONE_MOVABLE would throw a spanner into the works somewhat. - We kicked around some quite different ways of implementing the same things, but nothing came of it. iirc, one was to remove the hard-coded zones altogether and rework all the MM to operate in terms of for (idx = 0; idx NUMBER_OF_ZONES; idx++) ... hmm. Assuming the aim is to have a situation where all zone-related loops are optimised away at compile-time, it's hard to see an alternative that works. Any dynamic way of creating zone at boot time will not have the compile-time optimizations and any API that is page-range aware will eventually hit the problems zones were made to solve (i.e. 
unmovable pages locked in the lower address ranges). - I haven't seen any hard numbers to justify the change. So I want to drop them all. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007, Andrew Morton wrote: - They add zillions of ifdefs They just add a few for ZONE_DMA where we alreaday have similar ifdefs for ZONE_DMA32 and ZONE_HIGHMEM. - They make the VM's behaviour diverge between different platforms and between differen configs on the same platforms, and hence degrade maintainability and increase complexity. They avoid unecessary complexity on platforms. They could be made to work on more platforms with measures to deal with what ZONE_DMA provides in different ways. There are 6 or so platforms that do not need ZONE_DMA at all. - We kicked around some quite different ways of implementing the same things, but nothing came of it. iirc, one was to remove the hard-coded zones altogether and rework all the MM to operate in terms of for (idx = 0; idx NUMBER_OF_ZONES; idx++) ... Hmmm.. How would that be simpler? - I haven't seen any hard numbers to justify the change. I have send you numbers showing significant reductions in code size. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007, Mel Gorman wrote: I haven't thought about it much so I probably am missing something. The major difference I see is when only one zone is present. In that case, a number of loops presumably get optimised away and the behavior is very different (presumably better although you point out no figures exist to prove it). Where there are two or more zones, the code paths should be similar whether there are 2, 3 or 4 zones present. The balancing of allocations between zones is becoming unnecessary. Also in a NUMA system we then have zone == node which allows for a series of simplifications. As the common platforms will always have more than one zone, it'll be heavily tested and I'm guessing that distros are always going to have to ship kernels with ZONE_DMA for the devices that require it. The only platform I see that may have problems at the moment is IA64 which looks like the only platform that can have one and only one zone. I am guessing that Christoph will catch problems here fairly quickly although a non-optional ZONE_MOVABLE would throw a spanner into the works somewhat. There are 6 platforms that have only one zone. These are not major platforms. In order for major platforms to go to a single zone in general we would have to implement a generic mechanism to do an allocation where one can specify the memory boundaries. Many DMA engines have different limitations from what ZONE_DMA and ZONE_DMA32 can provide. If such a scheme would be implemented then those would be able to utilize memory better and the amount of bounce buffers would be reduced. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Thu, 25 Jan 2007, Mel Gorman wrote:

> The following 8 patches against 2.6.20-rc4-mm1 create a zone called
> ZONE_MOVABLE that is only usable by allocations that specify both
> __GFP_HIGHMEM and __GFP_MOVABLE. This has the effect of keeping all
> non-movable pages within a single memory partition while allowing
> movable allocations to be satisfied from either partition.

For arches that do not have HIGHMEM, other zones would seem to be okay
too.

> The size of the zone is determined by a kernelcore= parameter
> specified at boot-time. This specifies how much memory is usable by
> non-movable allocations and the remainder is used for ZONE_MOVABLE.
> Any range of pages within ZONE_MOVABLE can be released by migrating
> the pages or by reclaiming.

The user has to manually fiddle around with the size of the unmovable
partition until it works?

> When selecting a zone to take pages from for ZONE_MOVABLE, there are
> two things to consider. First, only memory from the highest populated
> zone is used for ZONE_MOVABLE. On the x86, this is probably going to
> be ZONE_HIGHMEM, but it would be ZONE_DMA on ppc64 or possibly
> ZONE_DMA32 on x86_64. Second, the amount of memory usable by the
> kernel will be spread evenly throughout NUMA nodes where possible. If
> the nodes are not of equal size, the amount of memory usable by the
> kernel on some nodes may be greater than on others.

So how is the amount of movable memory on a node calculated? Evenly
distributed? There are some NUMA architectures that are not that
symmetric.

> By default, the zone is not as useful for hugetlb allocations because
> they are pinned and non-migratable (currently at least). A sysctl is
> provided that allows huge pages to be allocated from that zone. This
> means that the huge page pool can be resized to the size of
> ZONE_MOVABLE during the lifetime of the system, assuming that pages
> are not mlocked. Despite huge pages being non-movable, we do not
> introduce additional external fragmentation of note, as huge pages are
> always the largest contiguous block we care about.

The user already has to specify the partitioning of the system at
bootup and could take the huge page sizes into account. Also, huge
pages may have variable sizes that can be specified at bootup on IA64.
The assumption that a huge page is always the largest contiguous block
is *not true*. The huge page sizes on i386 and x86_64 platforms are
contingent on their page table structure. This can be completely
different on other platforms.
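The zone-selection rule quoted above (ZONE_MOVABLE takes pages only
from the highest populated zone) boils down to a scan from the top of
the zone table. This is a simplified userspace model, not the kernel's
actual implementation; the zone list shown is illustrative:

```c
#include <stddef.h>

/* Zone indices in low-to-high order, as on x86 with highmem. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM,
                 NR_ZONES };

/*
 * Pick the highest zone that actually has pages; ZONE_MOVABLE is then
 * carved out of that single zone, so it never spans two zones.
 * Returns -1 if nothing is populated.
 */
static int highest_populated_zone(const size_t zone_pages[NR_ZONES])
{
    int idx;

    for (idx = NR_ZONES - 1; idx >= 0; idx--)
        if (zone_pages[idx] > 0)
            return idx;
    return -1;
}
```

On a 32-bit x86 box with highmem the scan stops at ZONE_HIGHMEM; on a
machine whose memory all sits in ZONE_DMA, it stops there instead.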
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007, Christoph Lameter wrote:

> On Thu, 25 Jan 2007, Mel Gorman wrote:
>
> > The following 8 patches against 2.6.20-rc4-mm1 create a zone called
> > ZONE_MOVABLE that is only usable by allocations that specify both
> > __GFP_HIGHMEM and __GFP_MOVABLE. This has the effect of keeping all
> > non-movable pages within a single memory partition while allowing
> > movable allocations to be satisfied from either partition.
>
> For arches that do not have HIGHMEM, other zones would be okay too it
> seems.

It would, but it'd obscure the code to take advantage of that.

> > The size of the zone is determined by a kernelcore= parameter
> > specified at boot-time. This specifies how much memory is usable by
> > non-movable allocations and the remainder is used for ZONE_MOVABLE.
> > Any range of pages within ZONE_MOVABLE can be released by migrating
> > the pages or by reclaiming.
>
> The user has to manually fiddle around with the size of the unmovable
> partition until it works?

They have to fiddle with the size of the unmovable partition if their
workload uses more unmovable kernel allocations than expected. This was
always going to be the restriction with using zones for partitioning
memory. Resizing zones on the fly is not really an option because the
resizing would only work reliably in one direction. The
anti-fragmentation code could potentially be used to have subzone
groups that kept movable and unmovable allocations as far apart as
possible and at opposite ends of a zone. That approach has been kicked
a few times because of complexity.

> > When selecting a zone to take pages from for ZONE_MOVABLE, there are
> > two things to consider. First, only memory from the highest
> > populated zone is used for ZONE_MOVABLE. On the x86, this is
> > probably going to be ZONE_HIGHMEM, but it would be ZONE_DMA on ppc64
> > or possibly ZONE_DMA32 on x86_64. Second, the amount of memory
> > usable by the kernel will be spread evenly throughout NUMA nodes
> > where possible. If the nodes are not of equal size, the amount of
> > memory usable by the kernel on some nodes may be greater than on
> > others.
>
> So how is the amount of movable memory on a node calculated?

Subtle difference: the amount of *unmovable* memory is calculated per
node.

> Evenly distributed?

As evenly as possible.

> There are some NUMA architectures that are not that symmetric.

I know, it's why find_zone_movable_pfns_for_nodes() is as complex as it
is. The mechanism spreads the unmovable memory evenly throughout all
nodes. In the event some nodes are too small to hold their share, the
remaining unmovable memory is divided between the nodes that are
larger.

> > By default, the zone is not as useful for hugetlb allocations
> > because they are pinned and non-migratable (currently at least). A
> > sysctl is provided that allows huge pages to be allocated from that
> > zone. This means that the huge page pool can be resized to the size
> > of ZONE_MOVABLE during the lifetime of the system, assuming that
> > pages are not mlocked. Despite huge pages being non-movable, we do
> > not introduce additional external fragmentation of note, as huge
> > pages are always the largest contiguous block we care about.
>
> The user already has to specify the partitioning of the system at
> bootup and could take the huge page sizes into account.

Not in all cases. Some systems will not know how many huge pages they
need in advance because the machine is used as a batch system running
jobs as requested. The zone allows an amount of memory to be set aside
that can be *optionally* used for hugepages if desired, or base pages
if not. Between jobs, the hugepage pool can be resized up to the size
of ZONE_MOVABLE. The other case is eventually supporting memory
hot-remove: any memory within ZONE_MOVABLE can potentially be removed
by migrating the pages away and off-lining the range.

> Also, huge pages may have variable sizes that can be specified at
> bootup on IA64. The assumption that a huge page is always the largest
> contiguous block is *not true*.

I didn't say they were the largest *supported* contiguous block, I said
they were the largest contiguous block we *care* about. Right now, it
is assumed that variable page sizes are not supported at runtime. If
they were, some smarts would be needed to keep huge pages of the same
size together to control external fragmentation, but that's about it.

> The huge page sizes on i386 and x86_64 platforms are contingent on
> their page table structure. This can be completely different on other
> platforms.

The size doesn't really make much difference to the mechanism.

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
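The even spread with redistribution that Mel describes can be modelled
as repeated rounds of dividing the remainder among nodes that still
have room. This is a simplified userspace sketch of the idea behind
find_zone_movable_pfns_for_nodes(), not the kernel code itself:

```c
#include <stddef.h>

/*
 * Split 'kernelcore' pages of unmovable memory across nodes as evenly
 * as possible.  A node smaller than its even share contributes all of
 * its memory; the shortfall is then re-divided among the remaining
 * larger nodes until everything is placed (or all nodes are full).
 */
static void spread_kernelcore(const size_t node_pages[], int nr_nodes,
                              size_t kernelcore, size_t core_per_node[])
{
    int i, usable = nr_nodes;
    size_t left = kernelcore;

    for (i = 0; i < nr_nodes; i++)
        core_per_node[i] = 0;

    while (left && usable) {
        size_t share = left / usable;   /* even share this round */
        int progress = 0;

        if (!share)
            share = 1;
        for (i = 0; i < nr_nodes && left; i++) {
            size_t room = node_pages[i] - core_per_node[i];
            size_t take = share < room ? share : room;

            if (!take)
                continue;               /* node already full */
            if (take > left)
                take = left;
            core_per_node[i] += take;
            left -= take;
            progress = 1;
        }
        /* recount nodes that still have room for the next round */
        usable = 0;
        for (i = 0; i < nr_nodes; i++)
            if (core_per_node[i] < node_pages[i])
                usable++;
        if (!progress)
            break;      /* kernelcore exceeds total memory */
    }
}
```

With nodes of 500, 4000 and 4000 pages and kernelcore=3000, the small
node contributes all 500 pages and the 500-page shortfall is split
between the two larger nodes, giving 500/1250/1250.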
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007, Mel Gorman wrote:

> > For arches that do not have HIGHMEM, other zones would be okay too
> > it seems.
>
> It would, but it'd obscure the code to take advantage of that.

No MOVABLE memory for 64-bit platforms that do not have HIGHMEM right
now?

> The anti-fragmentation code could potentially be used to have subzone
> groups that kept movable and unmovable allocations as far apart as
> possible and at opposite ends of a zone. That approach has been kicked
> a few times because of complexity.

Hmm... but this patch also introduces additional complexity, plus it is
difficult for the end user to handle.

> > There are some NUMA architectures that are not that symmetric.
>
> I know, it's why find_zone_movable_pfns_for_nodes() is as complex as
> it is. The mechanism spreads the unmovable memory evenly throughout
> all nodes. In the event some nodes are too small to hold their share,
> the remaining unmovable memory is divided between the nodes that are
> larger.

I would have expected a percentage of a node. If equal amounts of
unmovable memory are assigned to all nodes at first, then there will be
large disparities in the amount of movable memory, e.g. between a node
with 8GB of memory and a node with 1GB.

How do you handle headless nodes, i.e. memory nodes with no processors?
Those may be particularly large compared to the rest, but they are
mainly used for movable pages, since unmovable things like device
driver buffers have to be kept near the processors that take the
interrupts.
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007, Christoph Lameter wrote:

> > > For arches that do not have HIGHMEM, other zones would be okay too
> > > it seems.
> >
> > It would, but it'd obscure the code to take advantage of that.
>
> No MOVABLE memory for 64-bit platforms that do not have HIGHMEM right
> now?

Err, no, I misinterpreted what you meant by "other zones would be
ok..."; I thought you were suggesting the reuse of zone names for some
reason. The zone used for ZONE_MOVABLE is the highest populated zone on
the architecture. On some architectures, that will be ZONE_HIGHMEM; on
others, it will be ZONE_DMA. See the function
find_usable_zone_for_movable(). ZONE_MOVABLE never spans zones. For
example, it will not use some ZONE_HIGHMEM and some ZONE_NORMAL memory.

> > The anti-fragmentation code could potentially be used to have
> > subzone groups that kept movable and unmovable allocations as far
> > apart as possible and at opposite ends of a zone. That approach has
> > been kicked a few times because of complexity.
>
> Hmm... but this patch also introduces additional complexity, plus it
> is difficult for the end user to handle.

It's harder for the user to set up, all right. But it works within
limits that are known well in advance and doesn't add additional code
to the main allocator path. Once it's set up, it acts like any other
zone, and zone behavior is better understood than anti-fragmentation's
behavior.

> > > There are some NUMA architectures that are not that symmetric.
> >
> > I know, it's why find_zone_movable_pfns_for_nodes() is as complex as
> > it is. The mechanism spreads the unmovable memory evenly throughout
> > all nodes. In the event some nodes are too small to hold their
> > share, the remaining unmovable memory is divided between the nodes
> > that are larger.
>
> I would have expected a percentage of a node. If equal amounts of
> unmovable memory are assigned to all nodes at first, then there will
> be large disparities in the amount of movable memory, e.g. between a
> node with 8GB of memory and a node with 1GB.

On the other hand, percentages make it harder for the administrator to
know in advance how much unmovable memory will be available when the
system starts, even if the machine changes configuration. The absolute
figure is easier to understand. If there were a requirement, an
alternative configuration option could be made available that takes a
fixed percentage of each node with memory.

> How do you handle headless nodes, i.e. memory nodes with no
> processors?

The code only cares about memory, not processors.

> Those may be particularly large compared to the rest, but they are
> mainly used for movable pages, since unmovable things like device
> driver buffers have to be kept near the processors that take the
> interrupts.

Then what I'd do is specify kernelcore to be

	number_of_nodes_with_processors *
		largest_amount_of_memory_on_node_with_processors

That would have all memory near processors available as unmovable
memory (which movable allocations will still use, so they don't always
go remote) while keeping the large amount of memory on the headless
nodes for movable allocations only. If requirements demanded it, a
configuration option could be made that allows the administrator to
specify exactly how much unmovable memory he wants on a specific node.

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
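Mel's rule of thumb for headless-node machines is simple arithmetic. A
sketch, assuming per-node page counts and a CPU-presence flag (both
parameter names are hypothetical, for illustration only):

```c
#include <stddef.h>

/*
 * Suggested kernelcore for a machine with headless memory nodes:
 * the number of nodes with processors times the largest amount of
 * memory on any node with processors.  All memory near CPUs then
 * remains usable for unmovable allocations, while the headless nodes
 * end up movable-only.
 */
static size_t suggest_kernelcore(const size_t node_pages[],
                                 const int node_has_cpu[], int nr_nodes)
{
    size_t largest = 0;
    int i, with_cpu = 0;

    for (i = 0; i < nr_nodes; i++) {
        if (!node_has_cpu[i])
            continue;                   /* headless: skip */
        with_cpu++;
        if (node_pages[i] > largest)
            largest = node_pages[i];
    }
    return (size_t)with_cpu * largest;
}
```

For two CPU nodes of 4096 and 2048 pages plus a 16384-page headless
node, this suggests kernelcore covering 2 * 4096 = 8192 pages, leaving
the headless node entirely movable.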
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007 07:56:09 -0800 (PST)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Fri, 26 Jan 2007, Andrew Morton wrote:
>
> > - They add zillions of ifdefs
>
> They just add a few for ZONE_DMA, where we already have similar ifdefs
> for ZONE_DMA32 and ZONE_HIGHMEM.

I refreshed my memory. It remains awful.

> > - They make the VM's behaviour diverge between different platforms
> >   and between different configs on the same platforms, and hence
> >   degrade maintainability and increase complexity.
>
> They avoid unnecessary complexity on platforms. They could be made to
> work on more platforms with measures to deal with what ZONE_DMA
> provides in different ways. There are 6 or so platforms that do not
> need ZONE_DMA at all.

As Mel points out, distros will ship with CONFIG_ZONE_DMA=y, so the
number of machines which will actually benefit from this change is
really small. And the benefit to those few machines will also, I
suspect, be small.

> > - We kicked around some quite different ways of implementing the
> >   same things, but nothing came of it. iirc, one was to remove the
> >   hard-coded zones altogether and rework all the MM to operate in
> >   terms of
> >
> >	for (idx = 0; idx < NUMBER_OF_ZONES; idx++)
> >		...
>
> Hmmm.. How would that be simpler?

Replace a sprinkle of open-coded ifdefs with a regular code sequence
which everyone uses. Pretty obvious, I'd have thought. Plus it becomes
straightforward to extend this from the present four zones to a
complete 12 zones, which gives us the full set of ZONE_DMA20,
ZONE_DMA21, ..., ZONE_DMA32 for those funny devices.

> > - I haven't seen any hard numbers to justify the change.
>
> I have sent you numbers showing significant reductions in code size.

If it isn't in the changelog it doesn't exist. I guess I didn't copy it
into the changelog. If the only demonstrable benefit is a saving of a
few k of text on a small number of machines then things are looking
very grim, IMO.
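The for-loop style Andrew sketches replaces one #ifdef block per zone
with a single piece of code that runs over every configured zone. A
minimal illustration of the pattern (the zone set and struct layout
here are simplified, not the kernel's):

```c
#include <stddef.h>

enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM,
                 MAX_NR_ZONES };

struct zone { size_t present_pages; };

/*
 * Instead of open-coding
 *     #ifdef CONFIG_ZONE_DMA   total += ...;  #endif
 *     #ifdef CONFIG_ZONE_DMA32 total += ...;  #endif
 * and so on for every zone, iterate the same statement over all of
 * them: a zone that is not configured or not populated simply has no
 * pages and contributes nothing.
 */
static size_t total_present_pages(const struct zone zones[MAX_NR_ZONES])
{
    size_t total = 0;
    int idx;

    for (idx = 0; idx < MAX_NR_ZONES; idx++)
        total += zones[idx].present_pages;
    return total;
}
```

The same shape would absorb Andrew's hypothetical ZONE_DMA20 through
ZONE_DMA32 set without touching any caller, which is the point of his
argument.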
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007, Andrew Morton wrote:

> As Mel points out, distros will ship with CONFIG_ZONE_DMA=y, so the
> number of machines which will actually benefit from this change is
> really small. And the benefit to those few machines will also, I
> suspect, be small.
>
> > > - We kicked around some quite different ways of implementing the
> > >   same things, but nothing came of it. iirc, one was to remove the
> > >   hard-coded zones altogether and rework all the MM to operate in
> > >   terms of
> > >
> > >	for (idx = 0; idx < NUMBER_OF_ZONES; idx++)
> > >		...
> >
> > Hmmm.. How would that be simpler?
>
> Replace a sprinkle of open-coded ifdefs with a regular code sequence
> which everyone uses. Pretty obvious, I'd have thought.

We do use such loops in many places. However, stuff like array
initialization and special casing cannot use a loop. I am not sure what
we could change there. The hard coding is necessary because each zone
currently has invariant characteristics that we need to consider.
Reducing the number of zones reduces the amount of special casing in
the VM that needs to be considered at run time; such special casing is
a potential source of trouble.

> Plus it becomes straightforward to extend this from the present four
> zones to a complete 12 zones, which gives us the full set of
> ZONE_DMA20, ZONE_DMA21, ..., ZONE_DMA32 for those funny devices.

I just hope we can handle the VM complexity of load balancing etc. that
this will introduce. Also, each zone has management overhead and will
cause the touching of additional cachelines on many VM operations. Much
of that management overhead becomes unnecessary if we reduce the number
of zones.

> If the only demonstrable benefit is a saving of a few k of text on a
> small number of machines then things are looking very grim, IMO.

The main benefit is a significant simplification of the VM, leading to
robust and reliable operation and a reduction of the maintenance
headaches that come with the additional zones. If we introduced the
ability to allocate from a range of physical addresses, then the need
for DMA zones would go away, allowing flexibility for device driver DMA
allocations while at the same time getting rid of special casing in the
VM.
Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
On Fri, 26 Jan 2007 11:58:18 -0800 (PST)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> > If the only demonstrable benefit is a saving of a few k of text on a
> > small number of machines then things are looking very grim, IMO.
>
> The main benefit is a significant simplification of the VM, leading to
> robust and reliable operation and a reduction of the maintenance
> headaches that come with the additional zones. If we introduced the
> ability to allocate from a range of physical addresses, then the need
> for DMA zones would go away, allowing flexibility for device driver
> DMA allocations while at the same time getting rid of special casing
> in the VM.

None of this is valid. The great majority of machines out there will
continue to have the same number of zones; nothing changes for them.
What will happen is that a small number of machines will have different
runtime behaviour. So they don't benefit from the majority's testing,
they don't contribute to it, and they potentially have unique-to-them
problems which we need to worry about. That's all a real cost, so we
need to see *good* benefits to outweigh that cost. Thus far I don't
think we've seen that.
[PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages
The following 8 patches against 2.6.20-rc4-mm1 create a zone called
ZONE_MOVABLE that is only usable by allocations that specify both
__GFP_HIGHMEM and __GFP_MOVABLE. This has the effect of keeping all
non-movable pages within a single memory partition while allowing
movable allocations to be satisfied from either partition. The size of
the zone is determined by a kernelcore= parameter specified at
boot-time. This specifies how much memory is usable by non-movable
allocations and the remainder is used for ZONE_MOVABLE. Any range of
pages within ZONE_MOVABLE can be released by migrating the pages or by
reclaiming.

When selecting a zone to take pages from for ZONE_MOVABLE, there are
two things to consider. First, only memory from the highest populated
zone is used for ZONE_MOVABLE. On the x86, this is probably going to be
ZONE_HIGHMEM, but it would be ZONE_DMA on ppc64 or possibly ZONE_DMA32
on x86_64. Second, the amount of memory usable by the kernel will be
spread evenly throughout NUMA nodes where possible. If the nodes are
not of equal size, the amount of memory usable by the kernel on some
nodes may be greater than on others.

By default, the zone is not as useful for hugetlb allocations because
they are pinned and non-migratable (currently at least). A sysctl is
provided that allows huge pages to be allocated from that zone. This
means that the huge page pool can be resized to the size of
ZONE_MOVABLE during the lifetime of the system, assuming that pages are
not mlocked. Despite huge pages being non-movable, we do not introduce
additional external fragmentation of note, as huge pages are always the
largest contiguous block we care about.

A lot of credit goes to Andy Whitcroft for catching a large variety of
problems during review of the patches.
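The gating condition stated above, that an allocation must carry both
__GFP_HIGHMEM and __GFP_MOVABLE before it may be satisfied from
ZONE_MOVABLE, is a two-bit mask test. The flag values below are
illustrative placeholders, not the kernel's actual bit assignments:

```c
/*
 * Illustrative flag bits -- the real kernel values differ.  An
 * allocation may dip into ZONE_MOVABLE only when it both promises to
 * be movable AND is allowed to use highmem.
 */
#define __GFP_HIGHMEM  0x01u
#define __GFP_MOVABLE  0x02u

static int may_use_zone_movable(unsigned int gfp_flags)
{
    const unsigned int need = __GFP_HIGHMEM | __GFP_MOVABLE;

    return (gfp_flags & need) == need;
}
```

Requiring both bits means an ordinary highmem allocation (user page
cache without the movable hint, for instance) still cannot land in the
movable partition by accident.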
-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab