Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Fri, Sep 03, 2010 at 07:29:43PM +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 2 Sep 2010 17:54:24 +0900
> KAMEZAWA Hiroyuki <kamezawa.hir...@jp.fujitsu.com> wrote:
>
> > Here is a rough code for this.
>
> here is a _tested_ one.
> If I tested correctly, I allocated 40MB of contiguous pages by the new
> function. I'm glad this can be some hints for people.

Great! I haven't looked into the details yet, but the concept seems good.
If someone doesn't need complex intelligence (e.g. shared, private,
[first|best] fit, buddy), this is enough for that.
So I think this will be good regardless of CMA.

I will look into this in more detail and think about ideas to improve it.

Thanks, Kame. :)

> Thanks,
> -Kame
> ==
> From: KAMEZAWA Hiroyuki <kamezawa.hir...@jp.fujitsu.com>
>
> This patch adds a memory allocator for contiguous memory larger than
> MAX_ORDER.
>
>   alloc_contig_pages(hint, size, list);
>
> This function allocates 'size' of contiguous pages, whose physical
> address is higher than 'hint'. size is specified in byte unit.

size is byte, hint is pfn?

> Allocated pages are all linked into the list and all of their
> page_count() are set to 1. Return value is the top page.
>
>   free_contig_pages(list)
>
> returns all pages in the list.
>
> This patch does
>   - find an area which can be ISOLATED.
>   - migrate remaining pages in the area.

Migrate from there to where?

>   - steal chunk of pages from allocator.
>
> Limitation is:
>   - returned pages will be aligned to MAX_ORDER.
>   - returned length of page will be aligned to MAX_ORDER.
>     (so, the caller may have to return tails of pages by itself.)

What do you mean tail?

>   - may allocate contiguous pages which overlap node/zones.

Hmm.. Do we really need this?

> This is fully experimental and written as example.
> (Maybe need more patches to make this complete.)

Yes. But first impression of this patch is good to me.

> This patch moves some amount of code from memory_hotplug.c to
> page_isolation.c and is based on the page-offline technique used by
> memory_hotplug.c
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hir...@jp.fujitsu.com>
> ---
>  include/linux/page-isolation.h |   10 +
>  mm/memory_hotplug.c            |   84 --
>  mm/page_alloc.c                |   32 +
>  mm/page_isolation.c            |  244 +
>  4 files changed, 287 insertions(+), 83 deletions(-)
>
> Index: mmotm-0827/mm/page_isolation.c
> ===================================================================
> --- mmotm-0827.orig/mm/page_isolation.c
> +++ mmotm-0827/mm/page_isolation.c
> @@ -3,8 +3,11 @@
>   */
>
>  #include <linux/mm.h>
> +#include <linux/swap.h>
>  #include <linux/page-isolation.h>
>  #include <linux/pageblock-flags.h>
> +#include <linux/mm_inline.h>
> +#include <linux/migrate.h>
>  #include "internal.h"
>
>  static inline struct page *
> @@ -140,3 +143,244 @@ int test_pages_isolated(unsigned long st
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  	return ret ? 0 : -EBUSY;
>  }
> +
> +#define CONTIG_ALLOC_MIGRATION_RETRY	(5)
> +
> +/*
> + * Scanning pfn is much easier than scanning lru list.
> + * Scan pfn from start to end and Find LRU page.
> + */
> +unsigned long scan_lru_pages(unsigned long start, unsigned long end)
> +{
> +	unsigned long pfn;
> +	struct page *page;
> +	for (pfn = start; pfn < end; pfn++) {
> +		if (pfn_valid(pfn)) {
> +			page = pfn_to_page(pfn);
> +			if (PageLRU(page))
> +				return pfn;
> +		}
> +	}
> +	return 0;
> +}
> +
> +/* Migrate all LRU pages in the range to somewhere else */
> +static struct page *
> +hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
> +{
> +	/* This should be improved!! */

Yeb.

> +	return alloc_page(GFP_HIGHUSER_MOVABLE);
> +}

snip

> +struct page *alloc_contig_pages(unsigned long long hint,
> +		unsigned long size, struct list_head *list)
> +{
> +	unsigned long base, found, end, pages, start;
> +	struct page *ret = NULL;
> +	int nid, retry;
> +
> +	if (hint)
> +		hint = ALIGN(hint, MAX_ORDER_NR_PAGES);
> +	/* request size should be aligned to pageblock */
> +	size >>= PAGE_SHIFT;
> +	pages = ALIGN(size, MAX_ORDER_NR_PAGES);
> +	found = 0;
> +retry:
> +	for_each_node_state(nid, N_HIGH_MEMORY) {
> +		unsigned long node_end;
> +		pg_data_t *node = NODE_DATA(nid);
> +
> +		node_end = node->node_start_pfn + node->node_spanned_pages;
> +		/* does this node have proper range of memory ? */
> +		if (node_end < hint + pages)
> +			continue;
> +		base = hint;
> +		if (base < node->node_start_pfn)
> +			base = node->node_start_pfn;
> +
> +		base = ALIGN(base, MAX_ORDER_NR_PAGES);
> +		found = 0;
> +		end = node_end & ~(MAX_ORDER_NR_PAGES -1);
> +		/* Maybe we can use this Node */
> +		if (base + pages < end)
> +			found
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Mon, 6 Sep 2010 00:57:53 +0900
Minchan Kim <minchan@gmail.com> wrote:

> > Thanks,
> > -Kame
> > ==
> > From: KAMEZAWA Hiroyuki <kamezawa.hir...@jp.fujitsu.com>
> >
> > This patch adds a memory allocator for contiguous memory larger than
> > MAX_ORDER.
> >
> >   alloc_contig_pages(hint, size, list);
> >
> > This function allocates 'size' of contiguous pages, whose physical
> > address is higher than 'hint'. size is specified in byte unit.
>
> size is byte, hint is pfn?

hint is physical address. What's annoying me is x86-32: should I use
physaddr_t or pfn?

> > Allocated pages are all linked into the list and all of their
> > page_count() are set to 1. Return value is the top page.
> >
> >   free_contig_pages(list)
> >
> > returns all pages in the list.
> >
> > This patch does
> >   - find an area which can be ISOLATED.
> >   - migrate remaining pages in the area.
>
> Migrate from there to where?

somewhere.

> >   - steal chunk of pages from allocator.
> >
> > Limitation is:
> >   - returned pages will be aligned to MAX_ORDER.
> >   - returned length of page will be aligned to MAX_ORDER.
> >     (so, the caller may have to return tails of pages by itself.)
>
> What do you mean tail?

Ah, the allocator returns MAX_ORDER aligned pages, then,

  [y]
  x+y = allocated
  x = will be used.
  y = will be unused.

I call 'y' the tail here.

> >   - may allocate contiguous pages which overlap node/zones.
>
> Hmm.. Do we really need this?

Unnecessary. Please consider this a BUG.

This code just checks the pfn of the allocated area but doesn't check
which zone/node the pfn is tied to.

For example, I hear IBM has the following kind of memory layout.

  | Node0 | Node1 | Node2 | Node0 | Node2 | Node1| .

So, some check should be added to avoid allocating a chunk of pages that
spreads out over multiple nodes.

(I hope walk_page_range() can do enough of the job for us, but I'm not
sure. I need to add a zone check, at least.)

> > This is fully experimental and written as example.
> > (Maybe need more patches to make this complete.)
>
> Yes. But first impression of this patch is good to me.

> > This patch moves some amount of code from memory_hotplug.c to
> > page_isolation.c and is based on the page-offline technique used by
> > memory_hotplug.c
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hir...@jp.fujitsu.com>
> > ---
> >  include/linux/page-isolation.h |   10 +
> >  mm/memory_hotplug.c            |   84 --
> >  mm/page_alloc.c                |   32 +
> >  mm/page_isolation.c            |  244 +
> >  4 files changed, 287 insertions(+), 83 deletions(-)
> >
> > Index: mmotm-0827/mm/page_isolation.c
> > ===================================================================
> > --- mmotm-0827.orig/mm/page_isolation.c
> > +++ mmotm-0827/mm/page_isolation.c
> > @@ -3,8 +3,11 @@
> >   */
> >
> >  #include <linux/mm.h>
> > +#include <linux/swap.h>
> >  #include <linux/page-isolation.h>
> >  #include <linux/pageblock-flags.h>
> > +#include <linux/mm_inline.h>
> > +#include <linux/migrate.h>
> >  #include "internal.h"
> >
> >  static inline struct page *
> > @@ -140,3 +143,244 @@ int test_pages_isolated(unsigned long st
> >  	spin_unlock_irqrestore(&zone->lock, flags);
> >  	return ret ? 0 : -EBUSY;
> >  }
> > +
> > +#define CONTIG_ALLOC_MIGRATION_RETRY	(5)
> > +
> > +/*
> > + * Scanning pfn is much easier than scanning lru list.
> > + * Scan pfn from start to end and Find LRU page.
> > + */
> > +unsigned long scan_lru_pages(unsigned long start, unsigned long end)
> > +{
> > +	unsigned long pfn;
> > +	struct page *page;
> > +	for (pfn = start; pfn < end; pfn++) {
> > +		if (pfn_valid(pfn)) {
> > +			page = pfn_to_page(pfn);
> > +			if (PageLRU(page))
> > +				return pfn;
> > +		}
> > +	}
> > +	return 0;
> > +}
> > +
> > +/* Migrate all LRU pages in the range to somewhere else */
> > +static struct page *
> > +hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
> > +{
> > +	/* This should be improved!! */
>
> Yeb.
>
> > +	return alloc_page(GFP_HIGHUSER_MOVABLE);
> > +}
>
> snip
>
> > +struct page *alloc_contig_pages(unsigned long long hint,
> > +		unsigned long size, struct list_head *list)
> > +{
> > +	unsigned long base, found, end, pages, start;
> > +	struct page *ret = NULL;
> > +	int nid, retry;
> > +
> > +	if (hint)
> > +		hint = ALIGN(hint, MAX_ORDER_NR_PAGES);
> > +	/* request size should be aligned to pageblock */
> > +	size >>= PAGE_SHIFT;
> > +	pages = ALIGN(size, MAX_ORDER_NR_PAGES);
> > +	found = 0;
> > +retry:
> > +	for_each_node_state(nid, N_HIGH_MEMORY) {
> > +		unsigned long node_end;
> > +		pg_data_t *node = NODE_DATA(nid);
> > +
> > +		node_end = node->node_start_pfn + node->node_spanned_pages;
> > +		/* does this node have proper range of memory ? */
> > +		if (node_end < hint + pages)
> > +			continue;
> > +		base = hint;
> > +		if (base < node->node_start_pfn)
> > +			base =
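
The "tail" (x, y) arithmetic explained in this message can be sketched in userspace as follows. This is only an illustration, not code from the patch: the page size (4 KiB), the block size (1024 pages, i.e. MAX_ORDER = 11) and all names such as contig_request_for() are assumptions. Unlike the patch, which shifts the byte size right by PAGE_SHIFT, the sketch rounds a partial page up.

```c
#include <assert.h>

/* Assumed values for illustration only: 4 KiB pages and MAX_ORDER = 11,
 * so one MAX_ORDER block is 1024 pages (4 MiB). */
#define SKETCH_PAGE_SHIFT		12
#define SKETCH_MAX_ORDER_NR_PAGES	1024UL

struct contig_request {
	unsigned long used_pages;	/* x: pages the caller asked for */
	unsigned long allocated_pages;	/* x + y: rounded up to whole blocks */
	unsigned long tail_pages;	/* y: surplus the caller may give back */
};

/* Round a byte size up to whole pages, then up to a MAX_ORDER multiple. */
static struct contig_request contig_request_for(unsigned long size_bytes)
{
	struct contig_request r;
	unsigned long page_size = 1UL << SKETCH_PAGE_SHIFT;

	assert(size_bytes > 0);
	r.used_pages = (size_bytes + page_size - 1) >> SKETCH_PAGE_SHIFT;
	r.allocated_pages = (r.used_pages + SKETCH_MAX_ORDER_NR_PAGES - 1)
			    / SKETCH_MAX_ORDER_NR_PAGES
			    * SKETCH_MAX_ORDER_NR_PAGES;
	r.tail_pages = r.allocated_pages - r.used_pages;
	return r;
}
```

Under these assumptions a 40 MiB request is already block aligned and has no tail, while a 41 MiB request is rounded up to 44 MiB and leaves a 768-page tail.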
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, 2 Sep 2010 17:54:24 +0900
KAMEZAWA Hiroyuki <kamezawa.hir...@jp.fujitsu.com> wrote:

> Here is a rough code for this.

here is a _tested_ one.
If I tested correctly, I allocated 40MB of contiguous pages by the new
function. I'm glad this can be some hints for people.

Thanks,
-Kame
==
From: KAMEZAWA Hiroyuki <kamezawa.hir...@jp.fujitsu.com>

This patch adds a memory allocator for contiguous memory larger than
MAX_ORDER.

  alloc_contig_pages(hint, size, list);

This function allocates 'size' of contiguous pages, whose physical
address is higher than 'hint'. size is specified in byte unit.
Allocated pages are all linked into the list and all of their
page_count() are set to 1. Return value is the top page.

  free_contig_pages(list)

returns all pages in the list.

This patch does
  - find an area which can be ISOLATED.
  - migrate remaining pages in the area.
  - steal chunk of pages from allocator.

Limitation is:
  - returned pages will be aligned to MAX_ORDER.
  - returned length of page will be aligned to MAX_ORDER.
    (so, the caller may have to return tails of pages by itself.)
  - may allocate contiguous pages which overlap node/zones.

This is fully experimental and written as example.
(Maybe need more patches to make this complete.)

This patch moves some amount of code from memory_hotplug.c to
page_isolation.c and is based on the page-offline technique used by
memory_hotplug.c

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hir...@jp.fujitsu.com>
---
 include/linux/page-isolation.h |   10 +
 mm/memory_hotplug.c            |   84 --
 mm/page_alloc.c                |   32 +
 mm/page_isolation.c            |  244 +
 4 files changed, 287 insertions(+), 83 deletions(-)

Index: mmotm-0827/mm/page_isolation.c
===================================================================
--- mmotm-0827.orig/mm/page_isolation.c
+++ mmotm-0827/mm/page_isolation.c
@@ -3,8 +3,11 @@
  */

 #include <linux/mm.h>
+#include <linux/swap.h>
 #include <linux/page-isolation.h>
 #include <linux/pageblock-flags.h>
+#include <linux/mm_inline.h>
+#include <linux/migrate.h>
 #include "internal.h"

 static inline struct page *
@@ -140,3 +143,244 @@ int test_pages_isolated(unsigned long st
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return ret ? 0 : -EBUSY;
 }
+
+#define CONTIG_ALLOC_MIGRATION_RETRY	(5)
+
+/*
+ * Scanning pfn is much easier than scanning lru list.
+ * Scan pfn from start to end and Find LRU page.
+ */
+unsigned long scan_lru_pages(unsigned long start, unsigned long end)
+{
+	unsigned long pfn;
+	struct page *page;
+	for (pfn = start; pfn < end; pfn++) {
+		if (pfn_valid(pfn)) {
+			page = pfn_to_page(pfn);
+			if (PageLRU(page))
+				return pfn;
+		}
+	}
+	return 0;
+}
+
+/* Migrate all LRU pages in the range to somewhere else */
+static struct page *
+hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
+{
+	/* This should be improved!! */
+	return alloc_page(GFP_HIGHUSER_MOVABLE);
+}
+
+#define NR_MOVE_AT_ONCE_PAGES	(256)
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+	struct page *page;
+	int move_pages = NR_MOVE_AT_ONCE_PAGES;
+	int not_managed = 0;
+	int ret = 0;
+	LIST_HEAD(source);
+
+	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+		if (!pfn_valid(pfn))
+			continue;
+		page = pfn_to_page(pfn);
+		if (!page_count(page))
+			continue;
+		/*
+		 * We can skip free pages. And we can only deal with pages on
+		 * LRU.
+		 */
+		ret = isolate_lru_page(page);
+		if (!ret) { /* Success */
+			list_add_tail(&page->lru, &source);
+			move_pages--;
+			inc_zone_page_state(page, NR_ISOLATED_ANON +
+					    page_is_file_cache(page));
+
+		} else {
+			/* Because we don't have big zone->lock. we should
+			   check this again here. */
+			if (page_count(page))
+				not_managed++;
+#ifdef CONFIG_DEBUG_VM
+			printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
+			       pfn);
+			dump_page(page);
+#endif
+		}
+	}
+	ret = -EBUSY;
+	if (not_managed) {
+		if (!list_empty(&source))
+			putback_lru_pages(&source);
+		goto out;
+	}
+	ret = 0;
+	if (list_empty(&source))
+		goto out;
+	/* this function returns # of failed pages */
+	ret = migrate_pages(&source, hotremove_migrate_alloc, 0, 1);
+
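
The batching behaviour of do_migrate_range() above (skip free pages, collect at most 256 LRU pages per call, give up if an in-use page that is not on the LRU is seen) can be modelled in userspace roughly like this. The page states and the name migrate_batch() are hypothetical stand-ins for the kernel's page flags and migration machinery; the model only mirrors the control flow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page states standing in for page_count()/PageLRU(). */
enum sk_state { SK_FREE, SK_LRU, SK_UNMOVABLE };

#define SK_MOVE_AT_ONCE 256	/* mirrors NR_MOVE_AT_ONCE_PAGES */

/*
 * Model of one do_migrate_range() call over pages[start, end).
 * Returns the number of pages "migrated" (they become SK_FREE), or -1
 * (standing in for -EBUSY) if an in-use page that is not on the LRU was
 * met; in that case nothing is changed.
 */
static int migrate_batch(enum sk_state *pages, size_t start, size_t end)
{
	size_t pfn, scanned_end;
	int budget = SK_MOVE_AT_ONCE, picked = 0, not_managed = 0;

	assert(pages != NULL && start <= end);
	for (pfn = start; pfn < end && budget > 0; pfn++) {
		if (pages[pfn] == SK_FREE)
			continue;		/* free pages are skipped */
		if (pages[pfn] == SK_LRU) {
			budget--;		/* isolated for migration */
			picked++;
		} else {
			not_managed++;		/* in use, but not on the LRU */
		}
	}
	scanned_end = pfn;
	if (not_managed)
		return -1;			/* "putback" and give up */
	for (pfn = start; pfn < scanned_end; pfn++)
		if (pages[pfn] == SK_LRU)
			pages[pfn] = SK_FREE;	/* migrated elsewhere */
	return picked;
}
```

A caller would invoke it repeatedly until it returns 0, much as the hot-remove code retries until the range is empty.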
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Fri, 27 Aug 2010 17:16:39 +0900
KAMEZAWA Hiroyuki <kamezawa.hir...@jp.fujitsu.com> wrote:

> On Thu, 26 Aug 2010 18:36:24 +0900
> Minchan Kim <minchan@gmail.com> wrote:
>
> > On Thu, Aug 26, 2010 at 1:30 PM, KAMEZAWA Hiroyuki
> > <kamezawa.hir...@jp.fujitsu.com> wrote:
> > > On Thu, 26 Aug 2010 13:06:28 +0900
> > > Minchan Kim <minchan@gmail.com> wrote:
> > >
> > > > On Thu, Aug 26, 2010 at 12:44 PM, KAMEZAWA Hiroyuki
> > > > <kamezawa.hir...@jp.fujitsu.com> wrote:
> > > > > On Thu, 26 Aug 2010 11:50:17 +0900
> > > > > KAMEZAWA Hiroyuki <kamezawa.hir...@jp.fujitsu.com> wrote:
> > > > >
> > > > > > 128MB...too big ? But it depends on config.
> > > > > >
> > > > > > IBM's ppc guys used 16MB sections, and recently a new
> > > > > > interface to shrink the number of /sys files was added,
> > > > > > maybe usable.
> > > > > >
> > > > > > Something good with this approach will be you can create
> > > > > > cma memory before installing driver.
> > > > > >
> > > > > > But yes, complicated and need some works.
> > > > >
> > > > > Ah, I need to clarify what I want to say.
> > > > >
> > > > > With compaction, it's helpful, but you can't get contiguous
> > > > > memory larger than MAX_ORDER, I think. To get memory larger
> > > > > than MAX_ORDER on demand, memory hot-plug code has almost all
> > > > > necessary things.
> > > >
> > > > True. Doesn't the idea of Christoph's patch help this?
> > > > http://lwn.net/Articles/200699/
> > >
> > > yes, I think so. But, IIRC, the purpose of Christoph's work is
> > > removing zones. please be careful what's really necessary.
> >
> > Ahh. Sorry for missing the point.
> > You're right. The patch can't help our problem.
> >
> > How about changing it as follows?
> > The thing is MAX_ORDER is static. But we want to avoid too big a
> > MAX_ORDER for all zones just to support devices which require a big
> > allocation chunk.
> > So let's add MAX_ORDER into each zone and then each zone can have a
> > different max order.
> > For example, while DMA[32], NORMAL, HIGHMEM can have the normal size
> > 11, the MOVABLE zone could have 15.
> >
> > Does this approach have a big side effect?
>
> Hm...need to check hard coded MAX_ORDER usages...I don't think
> side-effect is big. Hmm. But I think enlarging MAX_ORDER isn't an
> important thing. A code which strips contiguous chunks of pages from
> buddy allocator is a necessary thing, as..
>
> What I can think of at 1st is...
> ==
> int steal_pages(unsigned long start_pfn, unsigned long end_pfn)
> {
> 	/* Be careful about mutual exclusion with memory hotplug,
> 	   because this reuses its code */
>
> 	split [start_pfn, end_pfn) to pageblock_order
>
> 	for each pageblock in the range {
> 		Mark this block as MIGRATE_ISOLATE
> 		try-to-free pages in the range or
> 		migrate pages in the range to somewhere.
> 		/* Here all pages in the range are on buddy allocator
> 		   and free and never be allocated by anyone else. */
> 	}
>
> 	please see __rmqueue_fallback(). it selects migration-type at 1st.
> 	Then, if you can pass start_migratetype of MIGRATE_ISOLATE,
> 	you can automatically strip all MIGRATE_ISOLATE pages from
> 	free_area[].
>
> 	return chunk of pages.
> }
> ==

Here is a rough code for this. I'm sorry I don't have time to show good
enough code. Maybe this cannot be compiled. But you may be able to see
what can be done with memory hotplug or compaction code.

I'll brush this up if someone is interested.

==
This is code for creating an isolated memory block of contiguous pages.

  find_isolate_contig_block(unsigned long hint, unsigned long size)

will return [start, start+size] of isolated pages
  - start > hint,
  - no memory holes within it.
  - page allocator will never touch pages within the range.

Of course, this can fail. This code makes use of memory-hotunplug's code.
But yes, you can think of reusing compaction code. This is an example.

Not compiled at all...please don't look at the details.
---
 mm/isolation.c |  236 +
 1 file changed, 236 insertions(+)

Index: kametest/mm/isolation.c
===================================================================
--- /dev/null
+++ kametest/mm/isolation.c
@@ -0,0 +1,233 @@
+struct page_range {
+	unsigned long base, end, pages;
+};
+
+int __get_contig_block(unsigned long pfn, unsigned long nr_pages, void *arg)
+{
+	struct page_range *blockinfo = arg;
+
+	if (nr_pages >= blockinfo->pages) {
+		blockinfo->base = pfn;
+		blockinfo->end = pfn + nr_pages;
+		return 1;
+	}
+	return 0;
+}
+
+
+unsigned long __find_contig_block(unsigned long base,
+		unsigned long end, unsigned long pages)
+{
+	unsigned long pfn, tmp, index;
+	struct page_range blockinfo;
+	int ret;
+
+	/* Skip memory holes */
+retry:
+	blockinfo.base = base;
+	blockinfo.end = end;
+	blockinfo.pages = pages;
+	ret = walk_system_ram_range(base, end - base, &blockinfo,
+
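
The search that __find_contig_block() starts here (walk the System RAM ranges, skip holes, take the first range that can hold the request) can be sketched in userspace as below. The ram_range table, the alignment parameter and the name find_contig_block() are assumptions made for the illustration; the kernel code walks real resource ranges through walk_system_ram_range() instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one hole-free run of RAM, in pfns. */
struct ram_range {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/*
 * Find the first aligned start pfn at or above 'hint' such that
 * [start, start + pages) lies inside a single RAM range.
 * Returns 1 and stores the pfn in *out on success, 0 otherwise.
 */
static int find_contig_block(const struct ram_range *ranges, size_t n,
			     unsigned long hint, unsigned long pages,
			     unsigned long align, unsigned long *out)
{
	size_t i;

	assert(ranges != NULL && out != NULL && align > 0 && pages > 0);
	for (i = 0; i < n; i++) {
		unsigned long start = ranges[i].start_pfn;
		unsigned long end = ranges[i].start_pfn + ranges[i].nr_pages;

		if (start < hint)
			start = hint;
		start = (start + align - 1) / align * align; /* round up */
		if (start < end && end - start >= pages) {
			*out = start;
			return 1;
		}
	}
	return 0;
}
```

Because each candidate block must fit inside one table entry, a block can never straddle a memory hole, which is the property the patch description asks for.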
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
Andrew Morton wrote:
> It would help (a lot) if we could get more attention and buyin and
> feedback from the potential clients of this code. rmk's feedback is
> valuable. Have we heard from the linux-media people? What other
> subsystems might use it? ieee1394 perhaps?

All FireWire controllers are OHCI and use scatter-gather lists.

Most USB controllers require contiguous memory for USB packets; the USB
framework has its own DMA buffer cache.

Some sound cards have no IOMMU; the ALSA framework preallocates buffers
for those.

Regards,
Clemens
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thursday, August 26, 2010 00:58:14 Andrew Morton wrote:
> On Fri, 20 Aug 2010 15:15:10 +0200
> Peter Zijlstra <pet...@infradead.org> wrote:
>
> > On Fri, 2010-08-20 at 11:50 +0200, Michal Nazarewicz wrote:
> > > Hello everyone,
> > >
> > > The following patchset implements a Contiguous Memory Allocator.
> > > For those who have not yet stumbled across CMA an excerpt from
> > > documentation:
> > >
> > >    The Contiguous Memory Allocator (CMA) is a framework, which
> > >    allows setting up a machine-specific configuration for
> > >    physically-contiguous memory management. Memory for devices is
> > >    then allocated according to that configuration.
> > >
> > >    The main role of the framework is not to allocate memory, but
> > >    to parse and manage memory configurations, as well as to act as
> > >    an in-between between device drivers and pluggable allocators.
> > >    It is thus not tied to any memory allocation method or strategy.
> > >
> > > For more information please refer to the second patch from the
> > > patchset which contains the documentation.
> >
> > So the idea is to grab a large chunk of memory at boot time and then
> > later allow some device to use it?
> >
> > I'd much rather we'd improve the regular page allocator to be smarter
> > about this. We recently added a lot of smarts to it like memory
> > compaction, which allows large gobs of contiguous memory to be freed
> > for things like huge pages.
> >
> > If you want guarantees you can free stuff, why not add constraints to
> > the page allocation type and only allow MIGRATE_MOVABLE pages inside
> > a certain region, those pages are easily freed/moved aside to satisfy
> > large contiguous allocations.
>
> That would be good. Although I expect that the allocation would need
> to be 100% rock-solid reliable, otherwise the end user has a
> non-functioning device.

Yes, indeed. And you have to be careful as well how you move pages
around. Say that you have a capture and an output v4l device: the first
one needs 64 MB contiguous memory and so it allocates that amount, moving
pages around as needed. Once allocated that memory is pinned in place
since it is needed for DMA. So if the output device also needs 64 MB,
then you must have a guarantee that the first allocation didn't fragment
the available contiguous memory.

I also wonder how expensive it is to move all the pages around. E.g. if
you have a digital camera and want to make a hires picture, then it
wouldn't do if it takes a second to move all the pages around making room
for the captured picture. The CPUs in many SoCs are not very powerful
compared to your average desktop.

And how would memory allocations in specific memory ranges (e.g. memory
banks) work?

Note also that these issues are not limited to embedded systems, also
PCI(e) boards can sometimes require massive amounts of DMA-able memory.
I have had this happen in the past with the ivtv driver with customers
that had 15 or so capture cards in one box. And I'm sure it will happen
in the future as well, esp. with upcoming 4k video formats.

Video is a major memory consumer, particularly in embedded systems. And
there usually is no room for failure.

> Could generic core VM provide the required level of service?
>
> Anyway, these patches are going to be hard to merge but not
> impossible. Keep going. Part of the problem is cultural, really: the
> consumers of this interface are weird dinky little devices which the
> core MM guys tend not to work with a lot, and it adds code which they
> wouldn't use.

It's not really that weird. The same problems can actually occur as well
with the more 'mainstream' consumer level video boards, although you need
more extreme environments for these problems to surface.

> I agree that having two contiguous memory allocators floating about on
> the list is distressing. Are we really all 100% diligently certain
> that there is no commonality here with Zach's work?
>
> I agree that Peter's above suggestion would be the best thing to do.
> Please let's take a look at that without getting into sunk cost
> fallacies with existing code!
>
> It would help (a lot) if we could get more attention and buyin and
> feedback from the potential clients of this code. rmk's feedback is
> valuable. Have we heard from the linux-media people?

I'm doing the reviewing for linux-media. It would be really nice to have
a good system for this in place. For example, the current TI davinci
capture driver will only work reliably (memory-wise) if you also use the
out-of-tree TI cmem module. Hardly a desirable situation.

Basically a fair amount of custom hacks is required at the memory to have
reliable video streaming on embedded systems due to the lack of a
cma-type framework.

> What other subsystems might use it? ieee1394 perhaps? Please help
> identify specific subsystems and I can perhaps help to wake people up.

The video subsystem is the other candidate. Probably not for the current
generation of GPUs (these all have hardware IOMMUs I suspect), but
definitely for the framebuffer based devices
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Sat, 2010-08-28 at 15:08 +0200, Hans Verkuil wrote:

> > That would be good. Although I expect that the allocation would need
> > to be 100% rock-solid reliable, otherwise the end user has a
> > non-functioning device.
>
> Yes, indeed. And you have to be careful as well how you move pages
> around. Say that you have a capture and an output v4l device: the
> first one needs 64 MB contiguous memory and so it allocates that
> amount, moving pages around as needed. Once allocated that memory is
> pinned in place since it is needed for DMA. So if the output device
> also needs 64 MB, then you must have a guarantee that the first
> allocation didn't fragment the available contiguous memory.

Isn't the proposed CMA thing vulnerable to the exact same problem? If you
allow sharing of regions and plug some allocator in there you get the
same problem. If you can solve it there, you can solve it for any kind of
reservation scheme.

> I also wonder how expensive it is to move all the pages around. E.g.
> if you have a digital camera and want to make a hires picture, then it
> wouldn't do if it takes a second to move all the pages around making
> room for the captured picture. The CPUs in many SoCs are not very
> powerful compared to your average desktop.

Well, that's a trade-off, if you want to have the memory be usable for
anything else (which I understood people did want) then you have to pay
for cleaning it up when you need to use it.

As for the cost of compaction vs regular page-out of random page-cache
memory, compaction is actually cheaper, since it doesn't need to write
out dirty data, and page-out driven writeback sucks due to the non-linear
nature of it.

> And how would memory allocations in specific memory ranges (e.g.
> memory banks) work?

Make sure you reserve pageblocks in the desired range.

> Note also that these issues are not limited to embedded systems, also
> PCI(e) boards can sometimes require massive amounts of DMA-able
> memory. I have had this happen in the past with the ivtv driver with
> customers that had 15 or so capture cards in one box. And I'm sure it
> will happen in the future as well, esp. with upcoming 4k video
> formats.

I would sincerely hope PCI(e) devices come with an IOMMU (and all memory
lines wired up), really, any hardware that doesn't isn't worth the
silicon it's engraved in. Just don't buy it.
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Saturday, August 28, 2010 15:34:46 Peter Zijlstra wrote:
> On Sat, 2010-08-28 at 15:08 +0200, Hans Verkuil wrote:
>
> > > That would be good. Although I expect that the allocation would
> > > need to be 100% rock-solid reliable, otherwise the end user has a
> > > non-functioning device.
> >
> > Yes, indeed. And you have to be careful as well how you move pages
> > around. Say that you have a capture and an output v4l device: the
> > first one needs 64 MB contiguous memory and so it allocates that
> > amount, moving pages around as needed. Once allocated that memory is
> > pinned in place since it is needed for DMA. So if the output device
> > also needs 64 MB, then you must have a guarantee that the first
> > allocation didn't fragment the available contiguous memory.
>
> Isn't the proposed CMA thing vulnerable to the exact same problem? If
> you allow sharing of regions and plug some allocator in there you get
> the same problem. If you can solve it there, you can solve it for any
> kind of reservation scheme.

Since with cma you can assign a region exclusively to a driver you can
ensure that this problem does not occur. Of course, if you allow sharing
then you will end up with the same type of problem unless you know that
there is only one driver at a time that will use that memory.

> > I also wonder how expensive it is to move all the pages around. E.g.
> > if you have a digital camera and want to make a hires picture, then
> > it wouldn't do if it takes a second to move all the pages around
> > making room for the captured picture. The CPUs in many SoCs are not
> > very powerful compared to your average desktop.
>
> Well, that's a trade-off, if you want to have the memory be usable for
> anything else (which I understood people did want) then you have to
> pay for cleaning it up when you need to use it.
>
> As for the cost of compaction vs regular page-out of random page-cache
> memory, compaction is actually cheaper, since it doesn't need to write
> out dirty data, and page-out driven writeback sucks due to the
> non-linear nature of it.

There is obviously a trade-off. I was just wondering how costly it is.
E.g. would it be a noticeable delay making 64 MB memory available in this
way on a, say, 600 MHz ARM.

> > And how would memory allocations in specific memory ranges (e.g.
> > memory banks) work?
>
> Make sure you reserve pageblocks in the desired range.
>
> > Note also that these issues are not limited to embedded systems,
> > also PCI(e) boards can sometimes require massive amounts of DMA-able
> > memory. I have had this happen in the past with the ivtv driver with
> > customers that had 15 or so capture cards in one box. And I'm sure
> > it will happen in the future as well, esp. with upcoming 4k video
> > formats.
>
> I would sincerely hope PCI(e) devices come with an IOMMU (and all
> memory lines wired up), really, any hardware that doesn't isn't worth
> the silicon it's engraved in. Just don't buy it.

In the case of the ivtv driver the PCI device had a broken scatter-gather
DMA engine, which is the underlying reason for these issues. Since I was
maintainer of this driver for a few years I would love to have a reliable
solution for the memory issues. It's not a big deal, 99.99% of all users
will never notice anything, but still... And I don't think there are any
affordable or easily obtainable alternatives to this hardware with
similar feature sets, even after all these years.

Anyway, I agree with your sentiment, but reality can be disappointingly
different :-(

And especially with regards to video hardware the creativity of the
hardware designers is boundless -- to the dismay of us linux-media
developers.

Regards,

	Hans

--
Hans Verkuil - video4linux developer - sponsored by TANDBERG, part of Cisco
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Sat, 2010-08-28 at 15:58 +0200, Hans Verkuil wrote:
> > Isn't the proposed CMA thing vulnerable to the exact same problem?
> > If you allow sharing of regions and plug some allocator in there you
> > get the same problem. If you can solve it there, you can solve it
> > for any kind of reservation scheme.
>
> Since with cma you can assign a region exclusively to a driver you can
> ensure that this problem does not occur. Of course, if you allow
> sharing then you will end up with the same type of problem unless you
> know that there is only one driver at a time that will use that
> memory.

I think you could do the same thing, the proposed page allocator
solutions still need to manage pageblock state, you can manage those the
same as you would your cma regions -- the difference is that you get the
option of letting the rest of the system use the memory in a transparent
manner if you don't need it.

> There is obviously a trade-off. I was just wondering how costly it is.
> E.g. would it be a noticeable delay making 64 MB memory available in
> this way on a, say, 600 MHz ARM.

Right, dunno really, rather depends on the memory bandwidth of your arm
device I suspect. It is something you'd have to test.

In case the machine isn't fast enough, there really isn't anything you
can do but keep the memory empty at all times; unless of course the
device in question needs it.
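
The bandwidth dependence mentioned above can be turned into a back-of-the-envelope estimate. The sketch below is not a measurement: the 4 KiB page size and any bandwidth figure passed in are assumptions, and the helper names pages_for() and copy_seconds() are made up for the illustration. It gives a rough lower bound on the copy time if every page in the range has to be moved, and ignores per-page work such as unmapping, TLB flushes and allocating the destination pages.

```c
#include <assert.h>

/* Number of pages needed to hold 'bytes', rounding a partial page up. */
static unsigned long pages_for(unsigned long bytes, unsigned long page_size)
{
	assert(page_size > 0);
	return (bytes + page_size - 1) / page_size;
}

/*
 * Time to copy 'bytes' at an assumed effective copy bandwidth.
 * The bandwidth is an input the reader has to measure on the target;
 * nothing in the thread states what it is for a 600 MHz ARM.
 */
static double copy_seconds(unsigned long bytes, double bytes_per_second)
{
	assert(bytes_per_second > 0.0);
	return (double)bytes / bytes_per_second;
}
```

With 4 KiB pages, 64 MiB is 16384 pages; at an assumed effective bandwidth of 128 MiB/s the copy alone would take half a second, which shows why the question can only be settled by testing on the actual hardware.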
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, 26 Aug 2010 18:36:24 +0900 Minchan Kim minchan@gmail.com wrote: On Thu, Aug 26, 2010 at 1:30 PM, KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote: On Thu, 26 Aug 2010 13:06:28 +0900 Minchan Kim minchan@gmail.com wrote: On Thu, Aug 26, 2010 at 12:44 PM, KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote: On Thu, 26 Aug 2010 11:50:17 +0900 KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote: 128MB...too big ? But it's depend on config. IBM's ppc guys used 16MB section, and recently, a new interface to shrink the number of /sys files are added, maybe usable. Something good with this approach will be you can create cma memory before installing driver. But yes, complicated and need some works. Ah, I need to clarify what I want to say. With compaction, it's helpful, but you can't get contiguous memory larger than MAX_ORDER, I think. To get memory larger than MAX_ORDER on demand, memory hot-plug code has almost all necessary things. True. Doesn't patch's idea of Christoph helps this ? http://lwn.net/Articles/200699/ yes, I think so. But, IIRC, it's own purpose of Chirstoph's work is for removing zones. please be careful what's really necessary. Ahh. Sorry for missing point. You're right. The patch can't help our problem. How about changing following this? The thing is MAX_ORDER is static. But we want to avoid too big MAX_ORDER of whole zones to support devices which requires big allocation chunk. So let's add MAX_ORDER into each zone and then, each zone can have different max order. For example, while DMA[32], NORMAL, HIGHMEM can have normal size 11, MOVABLE zone could have a 15. This approach has a big side effect? Hm...need to check hard coded MAX_ORDER usages...I don't think side-effect is big. Hmm. But I think enlarging MAX_ORDER isn't an important thing. A code which strips contiguous chunks of pages from buddy allocator is a necessaty thing, as.. What I can think of at 1st is... 
==
int steal_pages(unsigned long start_pfn, unsigned long end_pfn)
{
	/* Be careful about mutual exclusion with memory hotplug,
	 * because this reuses its code */

	split [start_pfn, end_pfn) to pageblock_order

	for each pageblock in the range {
		Mark this block as MIGRATE_ISOLATE
		try-to-free pages in the range or
		migrate pages in the range to somewhere.
		/* Here, all pages in the range are on the buddy allocator,
		 * are free, and will never be allocated by anyone else. */
	}

	please see __rmqueue_fallback(). it selects migration-type at 1st.
	Then, if you can pass start_migratetype of MIGRATE_ISOLATE,
	you can automatically strip all MIGRATE_ISOLATE pages from
	free_area[].

	return chunk of pages.
}
==

Thanks,
-Kame
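The isolate-then-empty flow in the pseudocode above can be modelled in userspace as a rough sketch. Nothing here is kernel code: `toy_pageblock`, `toy_steal_blocks()` and the `TOY_*` migrate types are hypothetical names made up for illustration, and "migration" is reduced to zeroing a counter. It only shows the ordering: isolate first so no new allocations land in the range, then migrate or free, and back out if something in the range cannot be moved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, not kernel code: one entry per pageblock. */
enum toy_migratetype { TOY_MOVABLE, TOY_UNMOVABLE, TOY_ISOLATE };

struct toy_pageblock {
    enum toy_migratetype type;  /* current migrate type of the block */
    unsigned int used_movable;  /* in-use pages that could be migrated away */
    unsigned int used_pinned;   /* in-use pages that cannot be moved */
};

#define TOY_MAX_RANGE 64

/*
 * Try to take blocks [start, end) out of the toy allocator.
 * On success every block in the range is isolated and empty.
 * On failure the original migrate types are restored; pages that were
 * already "migrated" out of earlier blocks stay migrated.
 */
bool toy_steal_blocks(struct toy_pageblock *blocks, size_t start, size_t end)
{
    enum toy_migratetype saved[TOY_MAX_RANGE];
    size_t i, n;

    assert(blocks != NULL && start <= end && end - start <= TOY_MAX_RANGE);
    n = end - start;

    /* Pass 1: isolate, so nothing new is allocated from the range. */
    for (i = 0; i < n; i++) {
        saved[i] = blocks[start + i].type;
        blocks[start + i].type = TOY_ISOLATE;
    }

    /* Pass 2: empty each block; give up if a block holds pinned pages. */
    for (i = start; i < end; i++) {
        if (blocks[i].used_pinned != 0)
            goto undo;
        blocks[i].used_movable = 0;  /* stands in for migration/reclaim */
    }
    return true;

undo:
    for (i = 0; i < n; i++)
        blocks[start + i].type = saved[i];
    return false;
}
```

Calling `toy_steal_blocks(blocks, 0, 2)` on a range that holds only movable pages succeeds and leaves both blocks isolated, while a range containing an unmovable block fails and gets its migrate types back.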
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Fri, 2010-08-27 at 17:16 +0900, KAMEZAWA Hiroyuki wrote:
> > How about changing it as follows?
> > The thing is that MAX_ORDER is static. But we want to avoid a too big
> > MAX_ORDER for all zones just to support devices which require a big
> > allocation chunk. So let's add MAX_ORDER into each zone, and then
> > each zone can have a different max order. For example, while DMA[32],
> > NORMAL and HIGHMEM can have the normal size 11, the MOVABLE zone
> > could have 15.
> >
> > Does this approach have a big side effect?

The side effect of increasing MAX_ORDER is that page allocations get
more expensive since the buddy tree gets larger, yielding more
splits/merges.

> Hm...need to check hard-coded MAX_ORDER usages...I don't think the
> side effect is big.
>
> Hmm. But I think enlarging MAX_ORDER isn't the important thing. Code
> which strips contiguous chunks of pages from the buddy allocator is
> the necessary thing.

Right, once we can explicitly free the pages we want, crossing
MAX_ORDER isn't too hard like you say, we can simply continue with
freeing the next in-order page.
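To put numbers on the MAX_ORDER discussion above, here is a small arithmetic sketch. The helper names are made up for illustration; the only convention assumed is the one in kernels of this era, where valid orders run from 0 to MAX_ORDER-1, so the largest buddy block is 2^(MAX_ORDER-1) pages. A 4 KiB page size (page shift 12) is an assumption of the usage note, not something the thread states.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest block the buddy allocator can hand out, in bytes.
 * Assumes valid orders are 0..max_order-1, as in 2010-era kernels.
 */
uint64_t largest_buddy_block_bytes(unsigned int max_order, unsigned int page_shift)
{
    assert(max_order >= 1 && (max_order - 1) + page_shift < 64);
    return (uint64_t)1 << ((max_order - 1) + page_shift);
}

/*
 * Number of splits needed to carve one block of the given order out of
 * a single free block of the top order. Each split halves the block.
 */
unsigned int splits_from_top_order(unsigned int max_order, unsigned int order)
{
    assert(max_order >= 1 && order < max_order);
    return (max_order - 1) - order;
}
```

With 4 KiB pages, `largest_buddy_block_bytes(11, 12)` is 4 MiB and `largest_buddy_block_bytes(15, 12)` is 64 MiB. In the worst case a single order-0 allocation then needs `splits_from_top_order(15, 0)` = 14 splits instead of 10, which is the extra split/merge work mentioned above.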
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, 2010-08-26 at 03:28 +0200, Michał Nazarewicz wrote:
> On Fri, 20 Aug 2010 15:15:10 +0200, Peter Zijlstra
> <pet...@infradead.org> wrote:
> > So the idea is to grab a large chunk of memory at boot time and then
> > later allow some device to use it?
> >
> > I'd much rather we'd improve the regular page allocator to be smarter
> > about this. We recently added a lot of smarts to it like memory
> > compaction, which allows large gobs of contiguous memory to be freed
> > for things like huge pages.
> >
> > If you want guarantees you can free stuff, why not add constraints to
> > the page allocation type and only allow MIGRATE_MOVABLE pages inside
> > a certain region, those pages are easily freed/moved aside to satisfy
> > large contiguous allocations.
>
> I'm aware that grabbing a large chunk at boot time is a bit of a waste
> of space, and because of it I'm hoping to come up with a way of reusing
> the space when it's not used by CMA-aware devices. My current idea was
> to use it for easily discardable data (page cache?).

Right, so to me that looks like going at the problem backwards. That
will complicate the page-cache instead of your bad hardware drivers
(really, hardware should use IOMMUs already).

So why not work on the page allocator to improve its contiguous
allocation behaviour. If you look at the thing you'll find pageblocks
and migration types. If you change it so that you pin the migration
type of one or a number of contiguous pageblocks to say
MIGRATE_MOVABLE, so that they cannot be used for anything but movable
pages you're pretty much there.
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, 2010-08-26 at 04:40 +0200, Michał Nazarewicz wrote:
> I think that the biggest problem is fragmentation here. For instance,
> I think that a situation where there is enough free space but it's
> fragmented so no single contiguous chunk can be allocated is a serious
> problem. However, I would argue that if there's simply no space left,
> a multimedia device could fail and even though it's not desirable, it
> would not be such a big issue in my eyes.
>
> So, if only movable or discardable pages are allocated in CMA managed
> regions all should work well. When a device needs memory discardable
> pages would get freed and movable moved unless there is no space left
> on the device in which case allocation would fail.

If you'd actually looked at the page allocator you'd see it's capable
of doing exactly that!

It has the notion of movable pages, it can defragment free space
(called compaction).

Use it!
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, 2010-08-26 at 11:49 +0900, Minchan Kim wrote:
> But one of the problems is anonymous pages, which can act as pinned
> pages on a system without swap.

Well, compaction can move those around, but if you've got too many of
them it's a simple matter of over-commit and for that we've got the
OOM-killer ;-)
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, Aug 26, 2010 at 5:20 PM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Thu, 2010-08-26 at 11:49 +0900, Minchan Kim wrote:
> > But one of the problems is anonymous pages, which can act as pinned
> > pages on a system without swap.
>
> Well, compaction can move those around, but if you've got too many of
> them it's a simple matter of over-commit and for that we've got the
> OOM-killer ;-)

As I said in the following mail, I was talking about the free space
problem. Of course, compaction could move anon pages somewhere. But
where is "somewhere"? In the end, it's the same zone. That can solve
the fragmentation problem but not the amount of free space. So I mean
it would be better to move them into another zone (e.g. HIGHMEM)
rather than OOM kill.

--
Kind regards,
Minchan Kim
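The point that compaction within one zone cures fragmentation but cannot create free space can be shown with a toy model. This is only an illustrative sketch: the "zone" is a plain byte array, the function names are invented here, and packing all used pages toward the end of the array is a simplification of what real compaction does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy zone: zone[i] != 0 means page i is in use by a movable page. */

/* Length of the longest run of free (zero) entries in the zone. */
size_t longest_free_run(const unsigned char *zone, size_t n)
{
    size_t best = 0, cur = 0, i;

    assert(zone != NULL);
    for (i = 0; i < n; i++) {
        cur = zone[i] ? 0 : cur + 1;
        if (cur > best)
            best = cur;
    }
    return best;
}

/*
 * Pack every used page toward the end of the same zone. The number of
 * used pages is unchanged, so the free run can grow at most to
 * n - used, never beyond it.
 */
void compact_zone(unsigned char *zone, size_t n)
{
    size_t used = 0, i;

    assert(zone != NULL);
    for (i = 0; i < n; i++)
        if (zone[i])
            used++;
    for (i = 0; i < n; i++)
        zone[i] = (i >= n - used);
}
```

For a zone of 8 pages with 4 in use, compaction can raise the longest free run to 4 pages, but a request for 6 contiguous pages still fails unless some pages leave the zone (or are reclaimed), which is the case for cross-zone movement made above.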
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, Aug 26, 2010 at 1:30 PM, KAMEZAWA Hiroyuki
<kamezawa.hir...@jp.fujitsu.com> wrote:
> On Thu, 26 Aug 2010 13:06:28 +0900
> Minchan Kim <minchan@gmail.com> wrote:
>
> > On Thu, Aug 26, 2010 at 12:44 PM, KAMEZAWA Hiroyuki
> > <kamezawa.hir...@jp.fujitsu.com> wrote:
> > > On Thu, 26 Aug 2010 11:50:17 +0900
> > > KAMEZAWA Hiroyuki <kamezawa.hir...@jp.fujitsu.com> wrote:
> > >
> > > > 128MB...too big? But it depends on config.
> > > >
> > > > IBM's ppc guys used 16MB sections, and recently a new interface
> > > > to shrink the number of /sys files was added, maybe usable.
> > > >
> > > > Something good with this approach is that you can create cma
> > > > memory before installing the driver.
> > > >
> > > > But yes, it is complicated and needs some work.
> > >
> > > Ah, I need to clarify what I want to say.
> > >
> > > Compaction is helpful, but you can't get contiguous memory larger
> > > than MAX_ORDER with it, I think. To get memory larger than
> > > MAX_ORDER on demand, the memory hot-plug code has almost all the
> > > necessary things.
> >
> > True. Doesn't the idea of Christoph's patch help here?
> > http://lwn.net/Articles/200699/
>
> Yes, I think so. But, IIRC, the purpose of Christoph's work is
> removing zones. Please be careful about what's really necessary.

Ahh. Sorry for missing the point.
You're right. The patch can't help our problem.

How about changing it as follows?
The thing is that MAX_ORDER is static. But we want to avoid a too big
MAX_ORDER for all zones just to support devices which require a big
allocation chunk. So let's add MAX_ORDER into each zone, and then each
zone can have a different max order. For example, while DMA[32], NORMAL
and HIGHMEM can have the normal size 11, the MOVABLE zone could have 15.

Does this approach have a big side effect?

--
Kind regards,
Minchan Kim
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, 2010-08-26 at 18:29 +0900, Minchan Kim wrote:
> As I said in the following mail, I was talking about the free space
> problem. Of course, compaction could move anon pages somewhere. But
> where is "somewhere"? In the end, it's the same zone. That can solve
> the fragmentation problem but not the amount of free space. So I mean
> it would be better to move them into another zone (e.g. HIGHMEM)
> rather than OOM kill.

Real machines don't have highmem, highmem sucks!! /me runs

Does cross zone movement really matter, I thought these crappy devices
were mostly used on crappy hardware with very limited memory, so pretty
much everything would be in zone_normal.. no?

But sure, if there's really a need we can look at maybe doing cross
zone movement.
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Fri, Aug 20, 2010 at 03:15:10PM +0200, Peter Zijlstra wrote:
> On Fri, 2010-08-20 at 11:50 +0200, Michal Nazarewicz wrote:
> > Hello everyone,
> >
> > The following patchset implements a Contiguous Memory Allocator. For
> > those who have not yet stumbled across CMA an excerpt from
> > documentation:
> >
> >    The Contiguous Memory Allocator (CMA) is a framework, which allows
> >    setting up a machine-specific configuration for
> >    physically-contiguous memory management. Memory for devices is
> >    then allocated according to that configuration.
> >
> >    The main role of the framework is not to allocate memory, but to
> >    parse and manage memory configurations, as well as to act as an
> >    in-between between device drivers and pluggable allocators. It is
> >    thus not tied to any memory allocation method or strategy.
> >
> > For more information please refer to the second patch from the
> > patchset which contains the documentation.

I'm only taking a quick look at this - slow as ever so pardon me if I
missed anything.

> So the idea is to grab a large chunk of memory at boot time and then
> later allow some device to use it?
>
> I'd much rather we'd improve the regular page allocator to be smarter
> about this. We recently added a lot of smarts to it like memory
> compaction, which allows large gobs of contiguous memory to be freed
> for things like huge pages.

Quick glance tells me that buffer sizes of 20MB are being thrown about
which the core page allocator doesn't handle very well (and couldn't
without major modification). Fragmentation avoidance only works well on
sizes < MAX_ORDER_NR_PAGES which likely will be 2MB or 4MB.

That said, there are things the core VM can do to help. One is related
to ZONE_MOVABLE and the second is on the use of MIGRATE_ISOLATE.

ZONE_MOVABLE is set up when the command line has kernelcore= or
movablecore= specified. In ZONE_MOVABLE only pages that can be migrated
are allocated (or huge pages if specifically configured to be allowed).
The zone is set up during initialisation by slicing pieces from the end
of existing zones and for various reasons, it would be best to maintain
that behaviour unless CMA had a specific requirement for memory in the
middle of an existing zone.

So let's say the maximum amount of contiguous memory required by all
devices is 64M and ZONE_MOVABLE is 64M. During normal operation, normal
order-0 pages can be allocated from this zone meaning the memory is not
pinned and unusable by anybody else. This avoids wasting memory. When a
device needs a new buffer, compaction would need some additional smarts
to compact or reclaim the size of memory needed by the driver but
because all the pages in the zone are movable, it should be possible.
Ideally it would have swap to reclaim because if not, compaction needs
to know how to move pages outside a zone (something it currently
avoids).

Essentially, cma_alloc() would be a normal alloc_pages that uses
ZONE_MOVABLE for buffers < MAX_ORDER_NR_PAGES but would need additional
compaction smarts for the larger buffers. I think it would reuse as
much of the existing VM as possible but without reviewing the code, I
don't know for sure how useful the suggestion is.

> If you want guarantees you can free stuff, why not add constraints to
> the page allocation type and only allow MIGRATE_MOVABLE pages inside a
> certain region, those pages are easily freed/moved aside to satisfy
> large contiguous allocations.

Relatively handy to do something like this. It can also be somewhat
constrained by doing something similar to MIGRATE_ISOLATE to have
contiguous regions of memory in a zone unusable by non-movable
allocations. It would be a lot trickier when interacting with reclaim
though so using ZONE_MOVABLE would have fewer gotchas.
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
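The split Mel describes, ordinary allocation for small buffers and extra compaction work for large ones, comes down to one size check. The following is a hedged sketch of that check only. The enum and function names are hypothetical, the strict "<" comparison is an assumption about the boundary case, and nothing here allocates memory.

```c
#include <assert.h>

/* Hypothetical names; this models the size check, not a real allocator. */
enum toy_alloc_path {
    TOY_PATH_ALLOC_PAGES,      /* ordinary buddy allocation from ZONE_MOVABLE */
    TOY_PATH_COMPACT_AND_TAKE  /* needs extra compaction/reclaim of a range */
};

/*
 * Pick a path for a request of nr_pages pages, assuming the largest
 * buddy block is 2^(max_order - 1) pages (MAX_ORDER_NR_PAGES).
 */
enum toy_alloc_path toy_cma_alloc_path(unsigned long nr_pages, unsigned int max_order)
{
    unsigned long max_order_nr_pages;

    assert(max_order >= 1 && max_order <= 20);
    max_order_nr_pages = 1UL << (max_order - 1);

    return nr_pages < max_order_nr_pages ? TOY_PATH_ALLOC_PAGES
                                         : TOY_PATH_COMPACT_AND_TAKE;
}
```

Assuming 4 KiB pages, a 20MB buffer is 5120 pages, so with MAX_ORDER 11 (1024 pages per top-order block) it falls on the compaction path, while a 1 MiB buffer of 256 pages can go through the ordinary allocator.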
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, Aug 26, 2010 at 04:40:46AM +0200, Michał Nazarewicz wrote:
> Hello Andrew,
>
> I think Pawel has replied to most of your comments, so I'll just add
> my own 0.02 KRW. ;)
>
> > > Peter Zijlstra <pet...@infradead.org> wrote:
> > > > So the idea is to grab a large chunk of memory at boot time and
> > > > then later allow some device to use it?
> > > >
> > > > I'd much rather we'd improve the regular page allocator to be
> > > > smarter about this. We recently added a lot of smarts to it like
> > > > memory compaction, which allows large gobs of contiguous memory
> > > > to be freed for things like huge pages.
> > > >
> > > > If you want guarantees you can free stuff, why not add
> > > > constraints to the page allocation type and only allow
> > > > MIGRATE_MOVABLE pages inside a certain region, those pages are
> > > > easily freed/moved aside to satisfy large contiguous allocations.
>
> On Thu, 26 Aug 2010 00:58:14 +0200, Andrew Morton
> <a...@linux-foundation.org> wrote:
> > That would be good. Although I expect that the allocation would need
> > to be 100% rock-solid reliable, otherwise the end user has a
> > non-functioning device. Could generic core VM provide the required
> > level of service?
>
> I think that the biggest problem is fragmentation here. For instance,
> I think that a situation where there is enough free space but it's
> fragmented so no single contiguous chunk can be allocated is a serious
> problem. However, I would argue that if there's simply no space left,
> a multimedia device could fail and even though it's not desirable, it
> would not be such a big issue in my eyes.

For handling fragmentation, there is the option of ZONE_MOVABLE so it's
usable by normal allocations but the CMA can take action to get it
cleared out if necessary. Another option that is trickier but less
disruptive would be to select a range of memory in a normal zone for
CMA and mark it MIGRATE_MOVABLE so that movable pages are allocated
from it. The trickier part is you need to make that bit stick so that
non-movable pages are never allocated from that range.
That would be trickish to implement but possible and it would avoid the
fragmentation problem without pinning memory.

> So, if only movable or discardable pages are allocated in CMA managed
> regions all should work well. When a device needs memory discardable
> pages would get freed and movable moved unless there is no space left
> on the device in which case allocation would fail.
>
> Critical devices (just a hypothetical entities) could have separate
> regions on which only discardable pages can be allocated so that
> memory can always be allocated for them.
>
> > I agree that having two contiguous memory allocators floating about
> > on the list is distressing. Are we really all 100% diligently
> > certain that there is no commonality here with Zach's work?
>
> As Pawel said, I think Zach's trying to solve a different problem. No
> matter, as I've said in response to Konrad's message, I have thought
> about unifying Zach's IOMMU and CMA in such a way that devices could
> work on both systems with and without IOMMU if only they would limit
> the usage of the API to some subset which always works.
>
> > Please cc me on future emails on this topic?
>
> Not a problem.
>
> --
> Best regards,                                         _     _
> | Humble Liege of Serenely Enlightened Majesty of   o' \,=./ `o
> | Computer Science,  Michał "mina86" Nazarewicz       (o o)
> +----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, Aug 26, 2010 at 7:06 PM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Thu, 2010-08-26 at 18:29 +0900, Minchan Kim wrote:
> > As I said in the following mail, I was talking about the free space
> > problem. Of course, compaction could move anon pages somewhere. But
> > where is "somewhere"? In the end, it's the same zone. That can solve
> > the fragmentation problem but not the amount of free space. So I
> > mean it would be better to move them into another zone (e.g.
> > HIGHMEM) rather than OOM kill.
>
> Real machines don't have highmem, highmem sucks!! /me runs

That's another topic.
I agree highmem isn't gorgeous. But isn't my desktop a real machine?
The important thing is that we already have highmem, and many guys,
including you (kmap stacking patch :)), try to improve highmem
problems. :)

> Does cross zone movement really matter, I thought these crappy devices
> were mostly used on crappy hardware with very limited memory, so
> pretty much everything would be in zone_normal.. no?

No. Until now, many embedded devices have used small memory. In that
case, there is only a DMA zone in the system. But as far as I know,
mobile phones will start to use big(?) memory like 1G or above sooner
or later. So they will start to use HIGHMEM, or otherwise a 2G/2G
address space configuration. Some embedded devices use a many-thread
model to port easily from an RTOS. In that case, they don't have enough
address space for applications if they use the 2G/2G model.

So we should take care of HIGHMEM in embedded systems from now on.

> But sure, if there's really a need we can look at maybe doing cross
> zone movement.

--
Kind regards,
Minchan Kim
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
Even more offtopic ;-)

On Thu, 2010-08-26 at 19:21 +0900, Minchan Kim wrote:
> I agree highmem isn't gorgeous. But isn't my desktop a real machine?
> The important thing is that we already have highmem, and many guys,
> including you (kmap stacking patch :)), try to improve highmem
> problems. :)

I have exactly 0 machines in daily use that use highmem, I had to test
that kmap stuff in a 32bit qemu.

Sadly some hardware folks still think it's a sane thing to do, like ARM
announcing 40bit PAE, I mean really?!

At least AMD announced a 64bit tiny-chip and hopefully Intel Atom will
soon be all 64bit too (please?!).
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, 26 Aug 2010 10:17:07 +0200, Peter Zijlstra <pet...@infradead.org> wrote:
> So why not work on the page allocator to improve its contiguous
> allocation behaviour. If you look at the thing you'll find pageblocks
> and migration types. If you change it so that you pin the migration
> type of one or a number of contiguous pageblocks to say
> MIGRATE_MOVABLE, so that they cannot be used for anything but movable
> pages you're pretty much there.

And that's exactly where I'm headed. I've created an API that seems to
be usable and meets my and others' requirements (note that I'm not
saying it cannot be improved -- I'm always happy to hear comments) and
now I'm starting to concentrate on reusing the grabbed memory. At first
I wasn't sure how this could be managed but thanks to many comments
(including yours, thanks!) I have an idea of how the thing should work
and what I should do from now on.

--
Best regards,                                         _     _
| Humble Liege of Serenely Enlightened Majesty of   o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz       (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Fri, 20 Aug 2010 15:15:10 +0200 Peter Zijlstra pet...@infradead.org wrote: On Fri, 2010-08-20 at 11:50 +0200, Michal Nazarewicz wrote: Hello everyone, The following patchset implements a Contiguous Memory Allocator. For those who have not yet stumbled across CMA an excerpt from documentation: The Contiguous Memory Allocator (CMA) is a framework, which allows setting up a machine-specific configuration for physically-contiguous memory management. Memory for devices is then allocated according to that configuration. The main role of the framework is not to allocate memory, but to parse and manage memory configurations, as well as to act as an in-between between device drivers and pluggable allocators. It is thus not tied to any memory allocation method or strategy. For more information please refer to the second patch from the patchset which contains the documentation. So the idea is to grab a large chunk of memory at boot time and then later allow some device to use it? I'd much rather we'd improve the regular page allocator to be smarter about this. We recently added a lot of smarts to it like memory compaction, which allows large gobs of contiguous memory to be freed for things like huge pages. If you want guarantees you can free stuff, why not add constraints to the page allocation type and only allow MIGRATE_MOVABLE pages inside a certain region, those pages are easily freed/moved aside to satisfy large contiguous allocations. That would be good. Although I expect that the allocation would need to be 100% rock-solid reliable, otherwise the end user has a non-functioning device. Could generic core VM provide the required level of service? Anyway, these patches are going to be hard to merge but not impossible. Keep going. Part of the problem is cultural, really: the consumers of this interface are weird dinky little devices which the core MM guys tend not to work with a lot, and it adds code which they wouldn't use. 
I agree that having two contiguous memory allocators floating about on
the list is distressing. Are we really all 100% diligently certain
that there is no commonality here with Zach's work?

I agree that Peter's above suggestion would be the best thing to do.
Please let's take a look at that without getting into sunk cost
fallacies with existing code!

It would help (a lot) if we could get more attention and buy-in and
feedback from the potential clients of this code. rmk's feedback is
valuable. Have we heard from the linux-media people? What other
subsystems might use it? ieee1394 perhaps? Please help identify
specific subsystems and I can perhaps help to wake people up.

And I agree that this code (or one of its alternatives!) would benefit
from having a core MM person take a close interest. Any volunteers?

Please cc me on future emails on this topic?
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Wed, 2010-08-25 at 15:58 -0700, Andrew Morton wrote:
> That would be good. Although I expect that the allocation would need
> to be 100% rock-solid reliable, otherwise the end user has a
> non-functioning device. Could generic core VM provide the required
> level of service?
>
> Anyway, these patches are going to be hard to merge but not
> impossible. Keep going. Part of the problem is cultural, really: the
> consumers of this interface are weird dinky little devices which the
> core MM guys tend not to work with a lot, and it adds code which they
> wouldn't use.
>
> I agree that having two contiguous memory allocators floating about on
> the list is distressing. Are we really all 100% diligently certain
> that there is no commonality here with Zach's work?

There is some commonality with Zach's work, but Zach should be
following all of this development .. So presumably he has no issues
with Michal's changes. I think Zach's solution has a similar direction
to this.

If Michal is active (he seems more so than Zach), and follows community
comments (including Zach's, but I haven't seen any) then we can defer
to that solution ..

Daniel

--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Wed, 25 Aug 2010 15:58:14 -0700 Andrew Morton a...@linux-foundation.org wrote: On Fri, 20 Aug 2010 15:15:10 +0200 Peter Zijlstra pet...@infradead.org wrote: On Fri, 2010-08-20 at 11:50 +0200, Michal Nazarewicz wrote: Hello everyone, The following patchset implements a Contiguous Memory Allocator. For those who have not yet stumbled across CMA an excerpt from documentation: The Contiguous Memory Allocator (CMA) is a framework, which allows setting up a machine-specific configuration for physically-contiguous memory management. Memory for devices is then allocated according to that configuration. The main role of the framework is not to allocate memory, but to parse and manage memory configurations, as well as to act as an in-between between device drivers and pluggable allocators. It is thus not tied to any memory allocation method or strategy. For more information please refer to the second patch from the patchset which contains the documentation. So the idea is to grab a large chunk of memory at boot time and then later allow some device to use it? I'd much rather we'd improve the regular page allocator to be smarter about this. We recently added a lot of smarts to it like memory compaction, which allows large gobs of contiguous memory to be freed for things like huge pages. If you want guarantees you can free stuff, why not add constraints to the page allocation type and only allow MIGRATE_MOVABLE pages inside a certain region, those pages are easily freed/moved aside to satisfy large contiguous allocations. That would be good. Although I expect that the allocation would need to be 100% rock-solid reliable, otherwise the end user has a non-functioning device. Could generic core VM provide the required level of service? Anyway, these patches are going to be hard to merge but not impossible. Keep going. 
> Part of the problem is cultural, really: the consumers of this
> interface are weird dinky little devices which the core MM guys tend
> not to work with a lot, and it adds code which they wouldn't use.
>
> I agree that having two contiguous memory allocators floating about on
> the list is distressing. Are we really all 100% diligently certain
> that there is no commonality here with Zach's work?
>
> I agree that Peter's above suggestion would be the best thing to do.
> Please let's take a look at that without getting into sunk cost
> fallacies with existing code!
>
> It would help (a lot) if we could get more attention and buy-in and
> feedback from the potential clients of this code. rmk's feedback is
> valuable. Have we heard from the linux-media people? What other
> subsystems might use it? ieee1394 perhaps? Please help identify
> specific subsystems and I can perhaps help to wake people up.
>
> And I agree that this code (or one of its alternatives!) would benefit
> from having a core MM person take a close interest. Any volunteers?
>
> Please cc me on future emails on this topic?

Hmm, you may not like this..but how about the following kind of
interface?

Now, memory hotplug supports the following operation to free and
_isolate_ a memory region.

	# echo offline > /sys/devices/system/memory/memoryX/state

Then, a region of memory will be isolated. (This succeeds if there is
free memory.)

Add a new interface.

	% echo offline > /sys/devices/system/memory/memoryX/state
	# extract memory from System RAM and make it invisible to the
	# buddy allocator.

	% echo cma > /sys/devices/system/memory/memoryX/state
	# move invisible memory to cma.

Then, a chunk of memory will be moved into the
contiguous-memory-allocator.
To move a cma region back to a usual region,

	# echo offline > /sys/devices/system/memory/memoryX/state
	# echo online > /sys/devices/system/memory/memoryX/state

Maybe used-for-cma memory can be shown in /proc/iomem, as

	1-63fff : System RAM
	64000-8 : Contiguous RAM (Used for drivers)

(And you have to skip small memory holes by looking at this file.)

Of course, cma guys can continue to use their own boot option.
With memory hotplug, the kernelcore=xxxM interface can be used for
creating ZONE_MOVABLE.

Some complicated work may be needed, such as

	# echo movable > /sys/devices/system/memory/memoryX/state
	(online pages and move them into ZONE_MOVABLE)

If anyone is interested, I may be able to offer some help.

Thanks,
-Kame
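A program would drive the proposed interface the same way the shell examples do: build the per-block path and write a keyword into its state file. The sketch below assumes only the path layout quoted in the mail. The helper names are invented for illustration, and while "offline" and "online" are keywords the mail describes as already supported, "cma" and "movable" are only proposed there, so writing them would be rejected unless such an interface were actually implemented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/devices/system/memory/memory<N>/state" into buf.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int memory_state_path(char *buf, size_t len, unsigned int block)
{
    int n = snprintf(buf, len, "/sys/devices/system/memory/memory%u/state", block);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write a state keyword (e.g. "offline" or "online") to the given file.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int write_memory_state(const char *path, const char *state)
{
    FILE *f;
    int ok;

    assert(path != NULL && state != NULL && strlen(state) > 0);

    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fputs(state, f) >= 0;
    if (fclose(f) != 0)   /* write errors may only surface at close */
        ok = 0;
    return ok ? 0 : -1;
}
```

Offlining block 8 would then be `memory_state_path(path, sizeof path, 8)` followed by `write_memory_state(path, "offline")`, run as root on a kernel with memory hotplug enabled. The result must be checked because offlining fails when the pages in the block cannot be freed or migrated.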
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
Hi Andrew,

Thank you for your comments and interest in this!

On 08/26/2010 07:58 AM, Andrew Morton wrote:
> On Fri, 20 Aug 2010 15:15:10 +0200 Peter Zijlstra pet...@infradead.org wrote:
>> On Fri, 2010-08-20 at 11:50 +0200, Michal Nazarewicz wrote:
>>> Hello everyone,
>>>
>>> The following patchset implements a Contiguous Memory Allocator. For those who have not yet stumbled across CMA an excerpt from documentation:
>>>
>>> The Contiguous Memory Allocator (CMA) is a framework, which allows setting up a machine-specific configuration for physically-contiguous memory management. Memory for devices is then allocated according to that configuration.
>>>
>>> The main role of the framework is not to allocate memory, but to parse and manage memory configurations, as well as to act as an in-between between device drivers and pluggable allocators. It is thus not tied to any memory allocation method or strategy.
>>>
>>> For more information please refer to the second patch from the patchset which contains the documentation.
>>
>> So the idea is to grab a large chunk of memory at boot time and then later allow some device to use it?
>>
>> I'd much rather we'd improve the regular page allocator to be smarter about this. We recently added a lot of smarts to it like memory compaction, which allows large gobs of contiguous memory to be freed for things like huge pages.
>>
>> If you want guarantees you can free stuff, why not add constraints to the page allocation type and only allow MIGRATE_MOVABLE pages inside a certain region, those pages are easily freed/moved aside to satisfy large contiguous allocations.
>
> That would be good. Although I expect that the allocation would need to be 100% rock-solid reliable, otherwise the end user has a non-functioning device. Could generic core VM provide the required level of service?
>
> Anyway, these patches are going to be hard to merge but not impossible. Keep going. Part of the problem is cultural, really: the consumers of this interface are weird dinky little devices which the core MM guys tend not to work with a lot, and it adds code which they wouldn't use.

This is encouraging, thanks. Merging a contiguous allocator seems like a lost cause: on the one hand because of the relative disinterest of non-embedded people, and on the other because of the difficulty of satisfying those who are actually interested. With virtually everybody having their own custom solutions, agreeing on one is nearly impossible.

> I agree that having two contiguous memory allocators floating about on the list is distressing. Are we really all 100% diligently certain that there is no commonality here with Zach's work?

I think Zach's work is more focused on IOMMU and on unifying virtual memory handling. As far as I understand, any physical allocator can be plugged into it, including CMA. CMA solves a different set of problems.

> I agree that Peter's above suggestion would be the best thing to do. Please let's take a look at that without getting into sunk cost fallacies with existing code!
>
> It would help (a lot) if we could get more attention and buy-in and feedback from the potential clients of this code. rmk's feedback is valuable. Have we heard from the linux-media people? What other subsystems might use it? ieee1394 perhaps? Please help identify specific subsystems and I can perhaps help to wake people up.

As a media developer myself, I talked with people and many have expressed their interest. Among them were developers from ST-Ericsson, Intel and TI, to name a few. Their SoCs, like ours at Samsung, require contiguous memory allocation schemes as well.

I am working on a driver framework for media for memory management (on the logical, not physical level). One of the goals is to allow plugging in custom allocators and memory handling functions (cache management, etc.). CMA is intended to be used as one of the pluggable allocators for it. Right now, many media drivers have to provide their own, more or less complicated, memory handling, which is of course undesirable. Some of those make it to the kernel, many are maintained outside the mainline.

The problem is that, as far as I am aware, there have already been quite a few proposals for such allocators and none made it to the mainline. So companies develop their own solutions and maintain them outside the mainline.

I think that the interest is definitely there, but people have their deadlines and assume that it is close to impossible to have a contiguous allocator merged.

Your help and support would be very much appreciated. Working in embedded Linux for some time now, I feel that the need is definitely there and is quite substantial.

--
Best regards,
Pawel Osciak
Linux Platform Group
Samsung Poland R&D Center
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Fri, 20 Aug 2010 15:15:10 +0200, Peter Zijlstra pet...@infradead.org wrote:
> So the idea is to grab a large chunk of memory at boot time and then later allow some device to use it?
>
> I'd much rather we'd improve the regular page allocator to be smarter about this. We recently added a lot of smarts to it like memory compaction, which allows large gobs of contiguous memory to be freed for things like huge pages.
>
> If you want guarantees you can free stuff, why not add constraints to the page allocation type and only allow MIGRATE_MOVABLE pages inside a certain region, those pages are easily freed/moved aside to satisfy large contiguous allocations.

I'm aware that grabbing a large chunk at boot time is a bit of a waste of space, and because of that I'm hoping to come up with a way of reusing the space when it's not used by CMA-aware devices. My current idea was to use it for easily discardable data (page cache?).

> Also, please remove --chain-reply-to from your git config. You're using 1.7 which should do the right thing (--no-chain-reply-to) by default.

OK.

--
Best regards,                                         _     _
| Humble Liege of Serenely Enlightened Majesty of   o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz        (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, 26 Aug 2010 01:26:34 +0200, Daniel Walker dwal...@codeaurora.org wrote:
> If Michal is active, and follows community comments (including Zach's, but I haven't seen any) then we can defer to that solution ..

Comments are always welcome. :)

--
Best regards,                                         _     _
| Humble Liege of Serenely Enlightened Majesty of   o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz        (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On 08/26/2010 08:31 AM, Jonathan Corbet wrote:
> On Wed, 25 Aug 2010 15:58:14 -0700 Andrew Morton a...@linux-foundation.org wrote:
>>> If you want guarantees you can free stuff, why not add constraints to the page allocation type and only allow MIGRATE_MOVABLE pages inside a certain region, those pages are easily freed/moved aside to satisfy large contiguous allocations.
>>
>> That would be good. Although I expect that the allocation would need to be 100% rock-solid reliable, otherwise the end user has a non-functioning device. Could generic core VM provide the required level of service?
>
> The original OLPC has a camera controller which requires three contiguous, image-sized buffers in memory. That system is a little memory constrained (OK, it's desperately short of memory), so, in the past, the chances of being able to allocate those buffers anytime some kid decides to start taking pictures was poor. Thus, cafe_ccic.c has an option to snag the memory at initialization time and never let go even if you threaten its family. Hell hath no fury like a little kid whose new toy^W educational tool stops taking pictures.
>
> That, of course, is not a hugely efficient use of memory on a memory-constrained system. If the VM could reliably satisfy those allocation requests, life would be wonderful. Seems difficult. But it would be a nicer solution than CMA, which, to a great extent, is really just a standardized mechanism for grabbing memory and never letting go.

The main problem is of course fragmentation, and for this there is no solution in CMA. It does have a feature intended to at least reduce memory usage, if only a little bit: region sharing. It allows platform architects to define regions shared by more than one driver, as explained by Michal in the RFC. So we can at least try to reuse each chunk of memory as much as possible and not hold separate regions for each driver when they are not intended to work simultaneously. Not a silver bullet, but is there any?

>> It would help (a lot) if we could get more attention and buy-in and feedback from the potential clients of this code. rmk's feedback is valuable. Have we heard from the linux-media people? What other subsystems might use it? ieee1394 perhaps? Please help identify specific subsystems and I can perhaps help to wake people up.
>
> If this code had been present when I did the Cafe driver, I would have used it. I think it could be made useful to a number of low-end camera drivers if the videobuf layer were made to talk to it in a way which Just Works.

I am working on a new videobuf which will (hopefully) Just Work. CMA is intended to be pluggable into it, as should be any other allocator for that matter.

--
Best regards,
Pawel Osciak
Linux Platform Group
Samsung Poland R&D Center
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, 26 Aug 2010 01:31:25 +0200, Jonathan Corbet cor...@lwn.net wrote:
> The original OLPC has a camera controller which requires three contiguous, image-sized buffers in memory. That system is a little memory constrained (OK, it's desperately short of memory), so, in the past, the chances of being able to allocate those buffers anytime some kid decides to start taking pictures was poor. Thus, cafe_ccic.c has an option to snag the memory at initialization time and never let go even if you threaten its family. Hell hath no fury like a little kid whose new toy^W educational tool stops taking pictures.
>
> That, of course, is not a hugely efficient use of memory on a memory-constrained system. If the VM could reliably satisfy those allocation requests, life would be wonderful. Seems difficult. But it would be a nicer solution than CMA, which, to a great extent, is really just a standardized mechanism for grabbing memory and never letting go.

At this moment it seems nothing more than that, but the way I see it, with a common, standardised, centrally-managed mechanism for grabbing memory we can start thinking about ways to reuse the memory. If each driver were to grab its own memory in a way known only to itself, the memory is truly lost. With CMA, not only can regions be reused among devices, but the framework can also manage the unallocated memory and try to utilize it in other ways (movable pages? cache? buffers? some kind of compressed memory swap?).

What I'm trying to say is that I totally agree with your and others' comments about CMA essentially grabbing memory and never releasing it, but I believe this can be addressed in time, once the overall idea of how the CMA API should look is agreed upon.

--
Best regards,                                         _     _
| Humble Liege of Serenely Enlightened Majesty of   o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz        (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, 26 Aug 2010 02:58:57 +0200, KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote:
> Hmm, you may not like this..but how about following kind of interface ?
>
> Now, memory hotplug supports following operation to free and _isolate_ memory region.
>
> # echo offline > /sys/devices/system/memory/memoryX/state
>
> Then, a region of memory will be isolated. (This succeeds if there are free memory.)
>
> Add a new interface.
>
> % echo offline > /sys/devices/system/memory/memoryX/state
> # extract memory from System RAM and make them invisible from buddy allocator.
>
> % echo cma > /sys/devices/system/memory/memoryX/state
> # move invisible memory to cma.

At this point I need to say that I have no experience with hotplug memory, but I think that for this to make sense the regions of memory would have to be smaller. Unless I'm misunderstanding something, the above would convert regions with sizes on the order of GiBs for use by CMA.

--
Best regards,                                         _     _
| Humble Liege of Serenely Enlightened Majesty of   o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz        (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
Hello Andrew,

I think Pawel has replied to most of your comments, so I'll just add my own 0.02 KRW. ;)

Peter Zijlstra pet...@infradead.org wrote:
>> So the idea is to grab a large chunk of memory at boot time and then later allow some device to use it?
>>
>> I'd much rather we'd improve the regular page allocator to be smarter about this. We recently added a lot of smarts to it like memory compaction, which allows large gobs of contiguous memory to be freed for things like huge pages.
>>
>> If you want guarantees you can free stuff, why not add constraints to the page allocation type and only allow MIGRATE_MOVABLE pages inside a certain region, those pages are easily freed/moved aside to satisfy large contiguous allocations.

On Thu, 26 Aug 2010 00:58:14 +0200, Andrew Morton a...@linux-foundation.org wrote:
> That would be good. Although I expect that the allocation would need to be 100% rock-solid reliable, otherwise the end user has a non-functioning device. Could generic core VM provide the required level of service?

I think that the biggest problem here is fragmentation. For instance, a situation where there is enough free space but it's fragmented, so that no single contiguous chunk can be allocated, is a serious problem. However, I would argue that if there's simply no space left, a multimedia device could fail, and even though that's not desirable, it would not be such a big issue in my eyes.

So, if only movable or discardable pages are allocated in CMA-managed regions, all should work well. When a device needs memory, discardable pages would get freed and movable pages moved, unless there is no space left on the device, in which case the allocation would fail.

Critical devices (just hypothetical entities) could have separate regions on which only discardable pages can be allocated, so that memory can always be allocated for them.

> I agree that having two contiguous memory allocators floating about on the list is distressing. Are we really all 100% diligently certain that there is no commonality here with Zach's work?

As Pawel said, I think Zach's trying to solve a different problem. No matter: as I've said in response to Konrad's message, I have thought about unifying Zach's IOMMU and CMA in such a way that devices could work on systems both with and without an IOMMU, provided they limit their usage of the API to a subset which always works.

> Please cc me on future emails on this topic?

Not a problem.

--
Best regards,                                         _     _
| Humble Liege of Serenely Enlightened Majesty of   o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz        (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, Aug 26, 2010 at 8:31 AM, Jonathan Corbet cor...@lwn.net wrote:
> On Wed, 25 Aug 2010 15:58:14 -0700 Andrew Morton a...@linux-foundation.org wrote:
>>> If you want guarantees you can free stuff, why not add constraints to the page allocation type and only allow MIGRATE_MOVABLE pages inside a certain region, those pages are easily freed/moved aside to satisfy large contiguous allocations.
>>
>> That would be good. Although I expect that the allocation would need to be 100% rock-solid reliable, otherwise the end user has a non-functioning device. Could generic core VM provide the required level of service?
>
> The original OLPC has a camera controller which requires three contiguous, image-sized buffers in memory. That system is a little memory constrained (OK, it's desperately short of memory), so, in the past, the chances of being able to allocate those buffers anytime some kid decides to start taking pictures was poor. Thus, cafe_ccic.c has an option to snag the memory at initialization time and never let go even if you threaten its family. Hell hath no fury like a little kid whose new toy^W educational tool stops taking pictures.
>
> That, of course, is not a hugely efficient use of memory on a memory-constrained system. If the VM could reliably satisfy those allocation requests, life would be wonderful. Seems difficult. But it would be a nicer solution than CMA, which, to a great extent, is really just a standardized mechanism for grabbing memory and never letting go.
>
>> It would help (a lot) if we could get more attention and buy-in and feedback from the potential clients of this code. rmk's feedback is valuable. Have we heard from the linux-media people? What other subsystems might use it? ieee1394 perhaps? Please help identify specific subsystems and I can perhaps help to wake people up.
>
> If this code had been present when I did the Cafe driver, I would have used it. I think it could be made useful to a number of low-end camera drivers if the videobuf layer were made to talk to it in a way which Just Works.
>
> With a bit of tweaking, I think it could be made useful in other situations: the viafb driver, for example, really needs an allocator for framebuffer memory and it seems silly to create one from scratch. Of course, there might be other possible solutions, like adding a zones concept to LMB^W memblock.
>
> The problem which is being addressed here is real.
>
> That said, the complexity of the solution still bugs me a bit, and the core idea is still to take big chunks of memory out of service for specific needs. It would be far better if the VM could just provide big chunks on demand. Perhaps compaction and the pressures of making transparent huge pages work will get us there, but I'm not sure we're there yet.
>
> jon

I agree. Compaction and a movable zone would be one good solution. If a driver needs a big contiguous chunk to work, it should make sure the required amount of memory is available before it starts. To ensure that, we have to consider compaction of ZONE_MOVABLE. One problem is anonymous pages, which effectively act as pinned pages on a system without swap, and most embedded systems have no swap. But it's not hard to solve.

We need Mel's opinion, too.

--
Kind regards,
Minchan Kim
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, 26 Aug 2010 04:12:10 +0200 Michał Nazarewicz m.nazarew...@samsung.com wrote:
> On Thu, 26 Aug 2010 02:58:57 +0200, KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote:
>> Hmm, you may not like this..but how about following kind of interface ?
>>
>> Now, memory hotplug supports following operation to free and _isolate_ memory region.
>>
>> # echo offline > /sys/devices/system/memory/memoryX/state
>>
>> Then, a region of memory will be isolated. (This succeeds if there are free memory.)
>>
>> Add a new interface.
>>
>> % echo offline > /sys/devices/system/memory/memoryX/state
>> # extract memory from System RAM and make them invisible from buddy allocator.
>>
>> % echo cma > /sys/devices/system/memory/memoryX/state
>> # move invisible memory to cma.
>
> At this point I need to say that I have no experience with hotplug memory but I think that for this to make sense the regions of memory would have to be smaller. Unless I'm misunderstanding something, the above would convert a region of sizes in order of GiBs to use for CMA.

Now, x86's section size is
==
#ifdef CONFIG_X86_32
# ifdef CONFIG_X86_PAE
#  define SECTION_SIZE_BITS	29
#  define MAX_PHYSADDR_BITS	36
#  define MAX_PHYSMEM_BITS	36
# else
#  define SECTION_SIZE_BITS	26
#  define MAX_PHYSADDR_BITS	32
#  define MAX_PHYSMEM_BITS	32
# endif
#else /* CONFIG_X86_32 */
# define SECTION_SIZE_BITS	27 /* matt - 128 is convenient right now */
# define MAX_PHYSADDR_BITS	44
# define MAX_PHYSMEM_BITS	46
#endif
==

128MB...too big ? But it depends on config.

IBM's ppc guys used a 16MB section, and recently a new interface to shrink the number of /sys files was added, which may be usable.

One good thing about this approach is that you can create cma memory before installing the driver.

But yes, it's complicated and needs some work.

Bye,
-Kame
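
Editorial note on the arithmetic above: a section covers 2^SECTION_SIZE_BITS bytes, which is where the 128MB figure for x86-64 comes from. The sketch below only restates that calculation in plain user-space C. The function names `section_size_mib` and `sections_needed` are made up for illustration and are not kernel APIs, and the 24-bit value used for a 16MB section is simply log2(16 MiB), not a value quoted in the thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Size in MiB of one memory section for a given SECTION_SIZE_BITS.
 * Illustrative helper only; not part of the kernel.
 */
uint64_t section_size_mib(unsigned int section_size_bits)
{
	/* 2^20 bytes is 1 MiB, so fewer than 20 bits would be under 1 MiB. */
	assert(section_size_bits >= 20 && section_size_bits < 64);
	return UINT64_C(1) << (section_size_bits - 20);
}

/*
 * Hypothetical helper: number of whole sections that must be taken
 * offline to cover a request of req_mib MiB (rounded up).
 */
uint64_t sections_needed(uint64_t req_mib, unsigned int section_size_bits)
{
	uint64_t sec = section_size_mib(section_size_bits);

	return (req_mib + sec - 1) / sec;
}
```

With these definitions, 27 bits gives 128 MiB, 26 bits gives 64 MiB and 29 bits gives 512 MiB, so a 40 MiB buffer would tie up a full 128 MiB section on x86-64 but only three sections (48 MiB) at a 16 MiB section size.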
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, Aug 26, 2010 at 11:49 AM, Minchan Kim minchan@gmail.com wrote:
> On Thu, Aug 26, 2010 at 8:31 AM, Jonathan Corbet cor...@lwn.net wrote:
>> On Wed, 25 Aug 2010 15:58:14 -0700 Andrew Morton a...@linux-foundation.org wrote:
>>>> If you want guarantees you can free stuff, why not add constraints to the page allocation type and only allow MIGRATE_MOVABLE pages inside a certain region, those pages are easily freed/moved aside to satisfy large contiguous allocations.
>>>
>>> That would be good. Although I expect that the allocation would need to be 100% rock-solid reliable, otherwise the end user has a non-functioning device. Could generic core VM provide the required level of service?
>>
>> The original OLPC has a camera controller which requires three contiguous, image-sized buffers in memory. That system is a little memory constrained (OK, it's desperately short of memory), so, in the past, the chances of being able to allocate those buffers anytime some kid decides to start taking pictures was poor. Thus, cafe_ccic.c has an option to snag the memory at initialization time and never let go even if you threaten its family. Hell hath no fury like a little kid whose new toy^W educational tool stops taking pictures.
>>
>> That, of course, is not a hugely efficient use of memory on a memory-constrained system. If the VM could reliably satisfy those allocation requests, life would be wonderful. Seems difficult. But it would be a nicer solution than CMA, which, to a great extent, is really just a standardized mechanism for grabbing memory and never letting go.
>>
>>> It would help (a lot) if we could get more attention and buy-in and feedback from the potential clients of this code. rmk's feedback is valuable. Have we heard from the linux-media people? What other subsystems might use it? ieee1394 perhaps? Please help identify specific subsystems and I can perhaps help to wake people up.
>>
>> If this code had been present when I did the Cafe driver, I would have used it. I think it could be made useful to a number of low-end camera drivers if the videobuf layer were made to talk to it in a way which Just Works.
>>
>> With a bit of tweaking, I think it could be made useful in other situations: the viafb driver, for example, really needs an allocator for framebuffer memory and it seems silly to create one from scratch. Of course, there might be other possible solutions, like adding a zones concept to LMB^W memblock.
>>
>> The problem which is being addressed here is real.
>>
>> That said, the complexity of the solution still bugs me a bit, and the core idea is still to take big chunks of memory out of service for specific needs. It would be far better if the VM could just provide big chunks on demand. Perhaps compaction and the pressures of making transparent huge pages work will get us there, but I'm not sure we're there yet.
>>
>> jon
>
> I agree. Compaction and a movable zone would be one good solution. If a driver needs a big contiguous chunk to work, it should make sure the required amount of memory is available before it starts. To ensure that, we have to consider compaction of ZONE_MOVABLE. One problem is anonymous pages, which effectively act as pinned pages on a system without swap, and most embedded systems have no swap. But it's not hard to solve.
>
> We need Mel's opinion, too.

Let me elaborate, to avoid confusion over my use of _pinned page_. I mean that anon pages are not a fragmentation problem but a space problem for the devices. It would be better to move those pages into a !ZONE_MOVABLE zone.

--
Kind regards,
Minchan Kim
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, 26 Aug 2010 11:50:17 +0900 KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote:
> 128MB...too big ? But it's depend on config.
>
> IBM's ppc guys used 16MB section, and recently, a new interface to shrink the number of /sys files are added, maybe usable.
>
> Something good with this approach will be you can create cma memory before installing driver.
>
> But yes, complicated and need some works.

Ah, I need to clarify what I want to say.

Compaction is helpful, but I think you can't get contiguous memory larger than MAX_ORDER with it. To get memory larger than MAX_ORDER on demand, the memory hot-plug code has almost all the necessary pieces.

You may be able to add

# echo 0xa000-0xa8000 > /sys/devices/system/memory/cma

to get contiguous isolated memory.

BTW, just curious...does the memory for cma not need to be saved at hibernation? Or do drivers have to write their own hibernation ops by driver suspend udev or some ?

Thanks,
-Kame
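
Editorial note on the MAX_ORDER limit mentioned above: the buddy allocator hands out blocks of 2^order pages, so the largest single allocation is bounded by the highest order. The sketch below works that bound out in user-space C. It assumes 4 KiB pages (a page shift of 12) and MAX_ORDER = 11 with valid orders 0..MAX_ORDER-1, which were common defaults at the time but are configuration-dependent; `max_buddy_bytes` and `fits_in_buddy` are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest single buddy allocation in bytes, assuming orders run from
 * 0 to max_order - 1 (an assumption about the kernel configuration).
 */
uint64_t max_buddy_bytes(unsigned int page_shift, unsigned int max_order)
{
	assert(max_order >= 1 && page_shift + max_order - 1 < 64);
	return UINT64_C(1) << (page_shift + max_order - 1);
}

/* Returns 1 if a request of 'bytes' could come from one buddy block. */
int fits_in_buddy(uint64_t bytes, unsigned int page_shift,
		  unsigned int max_order)
{
	return bytes <= max_buddy_bytes(page_shift, max_order);
}
```

Under those assumptions the ceiling is 4 MiB, so a buffer of tens of megabytes cannot be satisfied by a single page-allocator request no matter how well compaction works, which is why the thread turns to the hot-plug isolation code.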
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote:
> 128MB...too big ? But it's depend on config.

On embedded systems it may be like half of the RAM. Or a quarter. So finer granularity could be desired on some platforms.

> IBM's ppc guys used 16MB section, and recently, a new interface to shrink the number of /sys files are added, maybe usable.
>
> Something good with this approach will be you can create cma memory before installing driver.

That's how CMA works at the moment. But if I understand you correctly, what you are proposing would allow reserving memory *at* *runtime*, long after the system has booted. This would be a nice feature as well though.

> But yes, complicated and need some works.
>
> Ah, I need to clarify what I want to say.
>
> With compaction, it's helpful, but you can't get contiguous memory larger than MAX_ORDER, I think. To get memory larger than MAX_ORDER on demand, memory hot-plug code has almost all necessary things.

I'll try to look at it then.

> BTW, just curious...the memory for cma need not to be saved at hibernation ? Or drivers has to write its own hibernation ops by driver suspend udev or some ?

Hibernation was not considered as of yet, but I think it's the device driver's responsibility more than CMA's, especially since it may make little sense to save some of the buffers -- i.e. no need to keep a frame from a camera since it'll be overwritten just after the system wakes up from hibernation. It may also be better to stop playback and resume it later on rather than trying to save the decoder's state. Again though, I haven't thought about hibernation as of yet.

--
Best regards,                                         _     _
| Humble Liege of Serenely Enlightened Majesty of   o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz        (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, Aug 26, 2010 at 12:44 PM, KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote:
> On Thu, 26 Aug 2010 11:50:17 +0900 KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote:
>> 128MB...too big ? But it's depend on config.
>>
>> IBM's ppc guys used 16MB section, and recently, a new interface to shrink the number of /sys files are added, maybe usable.
>>
>> Something good with this approach will be you can create cma memory before installing driver.
>>
>> But yes, complicated and need some works.
>
> Ah, I need to clarify what I want to say.
>
> With compaction, it's helpful, but you can't get contiguous memory larger than MAX_ORDER, I think. To get memory larger than MAX_ORDER on demand, memory hot-plug code has almost all necessary things.

True. Doesn't the idea in Christoph's patch help with this?
http://lwn.net/Articles/200699/

--
Kind regards,
Minchan Kim
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, Aug 26, 2010 at 1:06 PM, Minchan Kim minchan@gmail.com wrote:
> On Thu, Aug 26, 2010 at 12:44 PM, KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote:
>> On Thu, 26 Aug 2010 11:50:17 +0900 KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote:
>>> 128MB...too big ? But it's depend on config.
>>>
>>> IBM's ppc guys used 16MB section, and recently, a new interface to shrink the number of /sys files are added, maybe usable.
>>>
>>> Something good with this approach will be you can create cma memory before installing driver.
>>>
>>> But yes, complicated and need some works.
>>
>> Ah, I need to clarify what I want to say.
>>
>> With compaction, it's helpful, but you can't get contiguous memory larger than MAX_ORDER, I think. To get memory larger than MAX_ORDER on demand, memory hot-plug code has almost all necessary things.
>
> True. Doesn't patch's idea of Christoph helps this ?
> http://lwn.net/Articles/200699/

Of course, it can't meet our requirement by itself, but the idea of range allocation seems good. I think it can be a starting point.

--
Kind regards,
Minchan Kim
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, 26 Aug 2010 13:06:28 +0900 Minchan Kim minchan@gmail.com wrote:
> On Thu, Aug 26, 2010 at 12:44 PM, KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote:
>> On Thu, 26 Aug 2010 11:50:17 +0900 KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote:
>>> 128MB...too big ? But it's depend on config.
>>>
>>> IBM's ppc guys used 16MB section, and recently, a new interface to shrink the number of /sys files are added, maybe usable.
>>>
>>> Something good with this approach will be you can create cma memory before installing driver.
>>>
>>> But yes, complicated and need some works.
>>
>> Ah, I need to clarify what I want to say.
>>
>> With compaction, it's helpful, but you can't get contiguous memory larger than MAX_ORDER, I think. To get memory larger than MAX_ORDER on demand, memory hot-plug code has almost all necessary things.
>
> True. Doesn't patch's idea of Christoph helps this ?
> http://lwn.net/Articles/200699/

Yes, I think so. But, IIRC, the actual purpose of Christoph's work is removing zones. Please be careful about what's really necessary.

Thanks,
-Kame
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, 26 Aug 2010 06:01:56 +0200 Michał Nazarewicz m.nazarew...@samsung.com wrote:

KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote:

128MB... too big? But it depends on the config.

On embedded systems it may be like half of the RAM. Or a quarter. So bigger granularity could be desired on some platforms.

IBM's ppc guys used 16MB sections, and recently a new interface to shrink the number of /sys files was added; maybe it is usable.

Something good with this approach will be that you can create cma memory before installing the driver.

That's how CMA works at the moment. But if I understand you correctly, what you are proposing would allow reserving memory *at* *runtime*, long after the system has booted. This would be a nice feature as well, though.

Yes, my proposal is that. But yes, it is complicated and needs some work.

Ah, I need to clarify what I want to say. With compaction, it's helpful, but you can't get contiguous memory larger than MAX_ORDER, I think. To get memory larger than MAX_ORDER on demand, the memory hot-plug code has almost all the necessary things.

I'll try to look at it then.

mm/memory_hotplug.c::offline_pages() does

1. disallow new allocation of memory in [start_pfn...end_pfn)
2. move all LRU pages to other regions than [start_pfn...end_pfn)
3. finally, mark all pages as PG_reserved (see __offline_isolated_pages())

What's required for cma will be

a. remove the _section_ limitation, which is done as BUG_ON().
b. replace 'step 3' with cma code.

Maybe you can do something similar just using compaction logic. The biggest difference will be 'step 1'.

BTW, just curious... does the memory for cma not need to be saved at hibernation? Or do drivers have to write their own hibernation ops via driver suspend, udev or something?

Hibernation was not considered as of yet, but I think it's the device driver's responsibility more than CMA's, especially since it may make little sense to save some of the buffers -- i.e. no need to keep a frame from a camera since it'll be overwritten just after the system wakes up from hibernation. It may also be better to stop playback and resume it later on rather than trying to save the decoder's state. Again though, I haven't thought about hibernation as of yet.

Hmm, ok, use-case dependent and it's a job of a driver.

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
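The three offline_pages() steps listed in the message above, with 'step 3' swapped for "hand the range to the caller", can be sketched as a toy userspace model. This is only an illustration under stated assumptions: every name here (toy_page, toy_alloc_contig, NR_PAGES and so on) is hypothetical, "memory" is a small static array, and none of it is kernel code or the API proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 64 /* size of the toy "physical memory", in pages (assumption) */

/* Hypothetical page states; PAGE_PINNED stands for a page that cannot move. */
enum page_state { PAGE_FREE, PAGE_LRU, PAGE_PINNED, PAGE_CLAIMED };

struct toy_page {
    enum page_state state;
    bool isolated; /* set in step 1: the allocator must not hand this page out */
};

static struct toy_page mem[NR_PAGES]; /* zero-initialised: all free, none isolated */

/* Step 1: forbid (or re-allow) new allocations in [start, end). */
static void toy_set_isolated(size_t start, size_t end, bool on)
{
    for (size_t pfn = start; pfn < end; pfn++)
        mem[pfn].isolated = on;
}

/* Find a free, non-isolated page outside [start, end) to migrate into. */
static long toy_find_target(size_t start, size_t end)
{
    for (size_t pfn = 0; pfn < NR_PAGES; pfn++) {
        if (pfn >= start && pfn < end)
            continue;
        if (mem[pfn].state == PAGE_FREE && !mem[pfn].isolated)
            return (long)pfn;
    }
    return -1;
}

/* Step 2: move every in-use movable page out of [start, end). */
static bool toy_migrate_range(size_t start, size_t end)
{
    for (size_t pfn = start; pfn < end; pfn++) {
        if (mem[pfn].state == PAGE_PINNED)
            return false; /* unmovable page: the range cannot be emptied */
        if (mem[pfn].state != PAGE_LRU)
            continue;
        long dst = toy_find_target(start, end);
        if (dst < 0)
            return false; /* nowhere left to migrate to */
        mem[dst].state = PAGE_LRU;
        mem[pfn].state = PAGE_FREE;
    }
    return true;
}

/*
 * Steps 1-3 combined. Instead of marking the pages reserved (what the
 * hot-remove path does), the last step claims them for the caller.
 * Returns 0 on success, -1 if the range could not be emptied.
 */
int toy_alloc_contig(size_t start, size_t end)
{
    assert(start < end && end <= NR_PAGES);

    toy_set_isolated(start, end, true);
    if (!toy_migrate_range(start, end)) {
        toy_set_isolated(start, end, false); /* undo step 1 on failure */
        return -1;
    }
    for (size_t pfn = start; pfn < end; pfn++)
        mem[pfn].state = PAGE_CLAIMED;
    toy_set_isolated(start, end, false);
    return 0;
}
```

Calling toy_alloc_contig(8, 16) on a range that holds a few PAGE_LRU pages moves them to free pages elsewhere and claims the whole range; a single PAGE_PINNED page in the range makes the call fail and leaves the range un-isolated. The sketch ignores locking, retries and zone boundaries, which the real hot-remove path has to deal with.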
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Thu, Aug 26, 2010 at 06:01:56AM +0200, Michał Nazarewicz wrote:

KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote:

128MB... too big? But it depends on the config.

On embedded systems it may be like half of the RAM. Or a quarter. So bigger granularity could be desired on some platforms.

IBM's ppc guys used 16MB sections, and recently a new interface to shrink the number of /sys files was added; maybe it is usable.

Something good with this approach will be that you can create cma memory before installing the driver.

That's how CMA works at the moment. But if I understand you correctly, what you are proposing would allow reserving memory *at* *runtime*, long after the system has booted. This would be a nice feature as well, though.

Yeah, if we can do this, that will avoid rebooting for kdump to reserve memory.

Thanks.
Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework
On Fri, 2010-08-20 at 11:50 +0200, Michal Nazarewicz wrote:

Hello everyone,

The following patchset implements a Contiguous Memory Allocator. For those who have not yet stumbled across CMA, an excerpt from the documentation:

The Contiguous Memory Allocator (CMA) is a framework, which allows setting up a machine-specific configuration for physically-contiguous memory management. Memory for devices is then allocated according to that configuration.

The main role of the framework is not to allocate memory, but to parse and manage memory configurations, as well as to act as an in-between between device drivers and pluggable allocators. It is thus not tied to any memory allocation method or strategy.

For more information please refer to the second patch from the patchset, which contains the documentation.

So the idea is to grab a large chunk of memory at boot time and then later allow some device to use it?

I'd much rather we'd improve the regular page allocator to be smarter about this. We recently added a lot of smarts to it, like memory compaction, which allows large gobs of contiguous memory to be freed for things like huge pages.

If you want guarantees you can free stuff, why not add constraints to the page allocation type and only allow MIGRATE_MOVABLE pages inside a certain region? Those pages are easily freed/moved aside to satisfy large contiguous allocations.

Also, please remove --chain-reply-to from your git config. You're using 1.7, which should do the right thing (--no-chain-reply-to) by default.
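The suggestion above, to allow only MIGRATE_MOVABLE pages inside a certain region, amounts to a placement constraint in the allocator. A minimal userspace sketch of that constraint follows. It assumes a toy zone with one "movable-only" window; toy_zone, toy_alloc_page and the TOY_* names are all hypothetical and do not correspond to the kernel's page allocator or to any interface from this thread.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PAGES 128 /* capacity of the toy zone (assumption) */

/* Hypothetical stand-ins for the kernel's migrate types. */
enum toy_migratetype { TOY_UNMOVABLE, TOY_MOVABLE };

struct toy_zone {
    size_t nr_pages;                  /* pages actually present, <= TOY_MAX_PAGES */
    size_t movable_start;             /* window [movable_start, movable_end) ... */
    size_t movable_end;               /* ... accepts movable allocations only */
    unsigned char used[TOY_MAX_PAGES]; /* 0 = free, 1 = in use */
};

static bool toy_in_movable_window(const struct toy_zone *z, size_t pfn)
{
    return pfn >= z->movable_start && pfn < z->movable_end;
}

/*
 * Allocate one page of the given type using first fit.
 * Unmovable requests skip the movable-only window, so everything that
 * ends up inside the window can later be migrated away to free it.
 * Returns the page frame number, or -1 if no suitable page is free.
 */
long toy_alloc_page(struct toy_zone *z, enum toy_migratetype mt)
{
    for (size_t pfn = 0; pfn < z->nr_pages && pfn < TOY_MAX_PAGES; pfn++) {
        if (z->used[pfn])
            continue;
        if (mt == TOY_UNMOVABLE && toy_in_movable_window(z, pfn))
            continue; /* constraint: no unmovable pages in the window */
        z->used[pfn] = 1;
        return (long)pfn;
    }
    return -1;
}
```

With a 16-page zone and a window of [4, 12), unmovable requests are served from pages 0-3 and 12-15 only, while movable requests may land inside the window. The trade-off this models is that unmovable allocations can fail even though free pages remain, which is the price of keeping the window evacuable.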