Re: some cleanup of of uvm_map.c
So, can I remind this simple cleanup? 2010/1/31 Anton Maksimenkov anton...@gmail.com: Here is some cleanup of uvm_map.c code. At first, in uvm_map_advice() function. Let it looks as simple as uvm_map_inherit() - no need to lock/unlock map just to realize that sanity check fail. Also no need to do it in every loop. Second, remove redundant temp_entry variable from both functions. One entry variable is enough to do the job. $ cvs diff -Nup sys/uvm/uvm_map.c Index: sys/uvm/uvm_map.c === RCS file: /cvs/src/sys/uvm/uvm_map.c,v retrieving revision 1.123 diff -u -p -r1.123 uvm_map.c --- sys/uvm/uvm_map.c 28 Aug 2009 00:40:03 - 1.123 +++ sys/uvm/uvm_map.c 31 Jan 2010 11:35:56 - @@ -2473,7 +2473,7 @@ int uvm_map_inherit(struct vm_map *map, vaddr_t start, vaddr_t end, vm_inherit_t new_inheritance) { - struct vm_map_entry *entry, *temp_entry; + struct vm_map_entry *entry; UVMHIST_FUNC(uvm_map_inherit); UVMHIST_CALLED(maphist); UVMHIST_LOG(maphist,(map=%p,start=0x%lx,end=0x%lx,new_inh=0x%lx), map, start, end, new_inheritance); @@ -2492,11 +2492,10 @@ uvm_map_inherit(struct vm_map *map, vadd VM_MAP_RANGE_CHECK(map, start, end); - if (uvm_map_lookup_entry(map, start, temp_entry)) { - entry = temp_entry; + if (uvm_map_lookup_entry(map, start, entry)) { UVM_MAP_CLIP_START(map, entry, start); } else { - entry = temp_entry-next; + entry = entry-next; } while ((entry != map-header) (entry-start end)) { @@ -2519,18 +2518,29 @@ uvm_map_inherit(struct vm_map *map, vadd int uvm_map_advice(struct vm_map *map, vaddr_t start, vaddr_t end, int new_advice) { - struct vm_map_entry *entry, *temp_entry; + struct vm_map_entry *entry; UVMHIST_FUNC(uvm_map_advice); UVMHIST_CALLED(maphist); UVMHIST_LOG(maphist,(map=%p,start=0x%lx,end=0x%lx,new_adv=0x%lx), map, start, end, new_advice); + switch (new_advice) { + case MADV_NORMAL: + case MADV_RANDOM: + case MADV_SEQUENTIAL: + /* nothing special here */ + break; + + default: + UVMHIST_LOG(maphist,- done (INVALID ARG),0,0,0,0); + return (EINVAL); + } + vm_map_lock(map); VM_MAP_RANGE_CHECK(map, start, end); - if (uvm_map_lookup_entry(map, start, temp_entry)) { - entry = temp_entry; + if (uvm_map_lookup_entry(map, start, entry)) { UVM_MAP_CLIP_START(map, entry, start); } else { - entry = temp_entry-next; + entry = entry-next; } /* @@ -2539,19 +2549,6 @@ uvm_map_advice(struct vm_map *map, vaddr while ((entry != map-header) (entry-start end)) { UVM_MAP_CLIP_END(map, entry, end); - - switch (new_advice) { - case MADV_NORMAL: - case MADV_RANDOM: - case MADV_SEQUENTIAL: - /* nothing special here */ - break; - - default: - vm_map_unlock(map); - UVMHIST_LOG(maphist,- done (INVALID ARG),0,0,0,0); - return (EINVAL); - } entry-advice = new_advice; entry = entry-next; } -- antonvm -- antonvm
Re: some cleanup of of uvm_map.c
On Thu, Apr 22, 2010 at 11:56 PM, Anton Maksimenkov anton...@gmail.com wrote: So, can I remind this simple cleanup? Thanks, applied, but next time watch the tabs vs spaces when mailing a diff. 2010/1/31 Anton Maksimenkov anton...@gmail.com: Here is some cleanup of uvm_map.c code.
Re: some cleanup of of uvm_map.c
On Thu, Feb 04, 2010 at 11:51:11AM +0500, Anton Maksimenkov wrote: 2010/2/3 Ariane van der Steldt ari...@stack.nl: On Tue, Feb 02, 2010 at 01:21:55AM +0500, Anton Maksimenkov wrote: uvm_map_lookup_entry()... in uvm_map_findspace()... I'm pretty sure that function is bugged: I think it doesn't do wrap-around on the search for an entry. ? Both functions are rather complex. And both functions pre-date random allocation. All this map-hint business has no reason to remain, since random allocation is here to stay and the hint has a high failure rate since then (if it is used, which it usually isn't anyway). The RB_TREE code in uvm_map_findspace is a pretty good solution given the map entry struct. You will need to understand the RB_AUGMENT hack that is only used in this place of the kernel though. 2010/2/3 Owain Ainsworth zer...@googlemail.com: On Wed, Feb 03, 2010 at 04:33:50PM +0100, Ariane van der Steldt wrote: I'm pretty sure that function is bugged... Yes. Henning has a bgpd core router where bgpd occasionally dies due to Let me introduce my idea. In fact, we have only two functions in uvm_map.c which make searching in maps. These are uvm_map_lookup_entry() and uvm_map_findspace(). The uvm_map_lookup_entry() do search by address (VA), while uvm_map_findspace() do search by free space. uvm_map_findspace() is also used when searching by addr: mmap(..., MAP_FIXED) requires a search by addr. uvm_map_findspace() should also be able to take an addr constraint: like Oga said, i386 has a segmented view of memory wrt W^X. And we have a RB_HEAD(uvm_tree, vm_map_entry) rbhead, which is indexed by address (see uvm_compare()). Since that this RB_TREE provide a searching by address option to us. This is what the RB_TREE used for. But when we try to use that RB_TREE for track free space and to SEARCH by free space ? we do dirty and ugly things! Then we add a RB_AUGMENT hack (it is so ugly and hard to use right with others that you can't find it anywhere in source tree... exept the uvm_map.c) and other tricks... Nah, RB_AUGMENT is easy. When an item in the tree is deleted, each node in the tree that is altered (position or insertion/removal) has each of its children and each of its parents RB_AUGMENTed. The RB_AUGMENT calls are done in such a way that each node is process prior to RB_PARENT(node) being processed. In the end we got very unclear uvm_map.c code in these places, which is very hard to understand and track, and which is overloaded and overcomplexed by that tricks and crutches. We must stop it, because it keeps hold our hands and brains. We can do things very clear and easy. Let me show how exactly. Let's just add second RB_TREE, which is indexed by space, let's call it RB_HEAD(uvm_tree_by_space, vm_map_entry) rbhead_by_space, for example. That uvm_tree_by_space will contain vm_map_entry'es sorted by free space. And that uvm_tree_by_space will be used only in uvm_map_findspace() to search for vm_map_entry with needed free space. RB_NFIND() is our best friend here! Simple, reasonable fast, clearly and easy to understand. I use that trick in uvm_pmemrange. It's a nice algorithm, easy to implement. Keep in mind however: you'll need to support random allocation, or the diff will be rejected. I was actually thinking along the same lines: rb-tree of free space, ordered by size. Let: start = first rb-entry with enough space for allocation Let: end = rb_max(free tree) Let: chosen = random chunk in [start, end] You'll want indices on the tree, so the random generator can be given a number to use. (If the number is too large, you'll get bias on the largest segments, if too small, you won't consider them.) Once you have 'chosen', you need to generate a random address inside. Let: offset = random number between 0 and (end-start) Now you have a random address inside chosen: chosen.start + offset. Disadvantage of the algorithm is that it requires 2 random numbers (the current algorithm only requires 1). Advantage is that this algorithm always produces a valid address, so no forward searching is necessary anymore. The address may have to be shifted because pmap_prefer says so. If that's the case, do so whenever possible: it removes aliasing problems on some architectures (afaik pmap has a way of dealing with it, but it's hideously expensive when it has to). Actually, since we can have two or more vm_map_entry'es with equal space, so each RB_TREE element will be the list of vm_map_entry'es with that equal space. We can use some additional logic here when we push/pop vm_map_entry from that list ? we can pop vm_map_entry with smallest start address, for example. You can't simply take the lowest start address. It would make addresses completely predictable. Then we must free the first RB_TREE (uvm_tree, indexed by address) from tracking space/ownspace and other nasty tricks, and stop using it in uvm_map_findspace(). This uvm_tree must
Re: some cleanup of of uvm_map.c
2010/2/3 Owain Ainsworth zer...@googlemail.com: ...you can't just go straight to the start of the map. Say i386 where W^X is done using segments. if you dump something right at the other end of the segment, that pretty much screws it. Perhaps going back to the min hint for the protection and trying to push just a little bit down may work? 2010/2/4 Ariane van der Steldt ari...@stack.nl: mmap(..., MAP_FIXED) requires a search by addr. uvm_map_findspace() should also be able to take an addr constraint: like Oga said, i386 has a segmented view of memory wrt W^X. ... One more problem you may face, is uvm_km_getpage starvation. If it happens, you'll be unable to allocate a vm_map_entry. In your design, you'll need 2 per allocation most of the time. If that's too much pressure, the kernel may starve itself and be unable to create new entries, becoming completely jammed. This is a problem on any non-pmap_direct architectures (like i386). I remember about MAP_FIXED, just not mentioned it in my not-so-short message ;) And I don't want 2 vm_map_entry per allocation, I only need to keep each vm_map_entry in both trees. One vm_map_entry can contain 2 separate RB_TREE entries (for both trees), so each tree can work with that vm_map_entry independantly. Can anyone explain me what is the problem with i386 segments? Or better supply some links to docs which explain it. I don't understand what you mean - MAP_FIXED flag is the problem or it used as workaround for some problem? -- antonvm
Re: some cleanup of of uvm_map.c
On Sun, Jan 31, 2010 at 05:43:02PM +0500, Anton Maksimenkov wrote: Here is some cleanup of uvm_map.c code. At first, in uvm_map_advice() function. Let it looks as simple as uvm_map_inherit() - no need to lock/unlock map just to realize that sanity check fail. Also no need to do it in every loop. The functional change is that a wrong advice on an empty range will now return EINVAL instead of 0. I actually like that. Second, remove redundant temp_entry variable from both functions. One entry variable is enough to do the job. I agree temp_entry is a redundant variable in both functions. I think the compiler may already optimize that away (otoh, we are talking gcc here...) $ cvs diff -Nup sys/uvm/uvm_map.c Index: sys/uvm/uvm_map.c === RCS file: /cvs/src/sys/uvm/uvm_map.c,v retrieving revision 1.123 diff -u -p -r1.123 uvm_map.c --- sys/uvm/uvm_map.c 28 Aug 2009 00:40:03 - 1.123 +++ sys/uvm/uvm_map.c 31 Jan 2010 11:35:56 - @@ -2473,7 +2473,7 @@ int uvm_map_inherit(struct vm_map *map, vaddr_t start, vaddr_t end, vm_inherit_t new_inheritance) { - struct vm_map_entry *entry, *temp_entry; + struct vm_map_entry *entry; UVMHIST_FUNC(uvm_map_inherit); UVMHIST_CALLED(maphist); UVMHIST_LOG(maphist,(map=%p,start=0x%lx,end=0x%lx,new_inh=0x%lx), map, start, end, new_inheritance); @@ -2492,11 +2492,10 @@ uvm_map_inherit(struct vm_map *map, vadd VM_MAP_RANGE_CHECK(map, start, end); - if (uvm_map_lookup_entry(map, start, temp_entry)) { - entry = temp_entry; + if (uvm_map_lookup_entry(map, start, entry)) { UVM_MAP_CLIP_START(map, entry, start); } else { - entry = temp_entry-next; + entry = entry-next; } while ((entry != map-header) (entry-start end)) { @@ -2519,18 +2518,29 @@ uvm_map_inherit(struct vm_map *map, vadd int uvm_map_advice(struct vm_map *map, vaddr_t start, vaddr_t end, int new_advice) { - struct vm_map_entry *entry, *temp_entry; + struct vm_map_entry *entry; UVMHIST_FUNC(uvm_map_advice); UVMHIST_CALLED(maphist); UVMHIST_LOG(maphist,(map=%p,start=0x%lx,end=0x%lx,new_adv=0x%lx), map, start, end, new_advice); + switch (new_advice) { + case MADV_NORMAL: + case MADV_RANDOM: + case MADV_SEQUENTIAL: + /* nothing special here */ + break; + + default: + UVMHIST_LOG(maphist,- done (INVALID ARG),0,0,0,0); + return (EINVAL); + } + vm_map_lock(map); VM_MAP_RANGE_CHECK(map, start, end); - if (uvm_map_lookup_entry(map, start, temp_entry)) { - entry = temp_entry; + if (uvm_map_lookup_entry(map, start, entry)) { UVM_MAP_CLIP_START(map, entry, start); } else { - entry = temp_entry-next; + entry = entry-next; } /* @@ -2539,19 +2549,6 @@ uvm_map_advice(struct vm_map *map, vaddr while ((entry != map-header) (entry-start end)) { UVM_MAP_CLIP_END(map, entry, end); - - switch (new_advice) { - case MADV_NORMAL: - case MADV_RANDOM: - case MADV_SEQUENTIAL: - /* nothing special here */ - break; - - default: - vm_map_unlock(map); - UVMHIST_LOG(maphist,- done (INVALID ARG),0,0,0,0); - return (EINVAL); - } entry-advice = new_advice; entry = entry-next; } Ok with me. -- Ariane
Re: some cleanup of of uvm_map.c
On Tue, Feb 02, 2010 at 01:21:55AM +0500, Anton Maksimenkov wrote: 2010/2/2 Anton Maksimenkov anton...@gmail.com: Overcomplicated using of RB_TREE in uvm_map_lookup_entry() making it hard to understand. I thinking about drop this RB_TREE out from uvm_map.c or completely rewrite using of this RB_TREE. Oh, sorry, I mean in uvm_map_findspace() of course. I'm pretty sure that function is bugged: I think it doesn't do wrap-around on the search for an entry. (Should be easily provable, just hint=vm_map_max(map) when entering uvm_map_findspace().) You won't notice in kernel space, since every caller will set hint=vm_map_min(map), but I suspect a number of failed allocations in userspace are actually caused by that. In uvm_map_lookup_entry() the use of RB_TREE is understandable. Both functions are rather complex. And both functions pre-date random allocation. All this map-hint business has no reason to remain, since random allocation is here to stay and the hint has a high failure rate since then (if it is used, which it usually isn't anyway). The RB_TREE code in uvm_map_findspace is a pretty good solution given the map entry struct. You will need to understand the RB_AUGMENT hack that is only used in this place of the kernel though. Ciao, -- Ariane
Re: some cleanup of of uvm_map.c
2010/2/3 Ariane van der Steldt ari...@stack.nl: On Tue, Feb 02, 2010 at 01:21:55AM +0500, Anton Maksimenkov wrote: uvm_map_lookup_entry()... in uvm_map_findspace()... I'm pretty sure that function is bugged: I think it doesn't do wrap-around on the search for an entry. Both functions are rather complex. And both functions pre-date random allocation. All this map-hint business has no reason to remain, since random allocation is here to stay and the hint has a high failure rate since then (if it is used, which it usually isn't anyway). The RB_TREE code in uvm_map_findspace is a pretty good solution given the map entry struct. You will need to understand the RB_AUGMENT hack that is only used in this place of the kernel though. 2010/2/3 Owain Ainsworth zer...@googlemail.com: On Wed, Feb 03, 2010 at 04:33:50PM +0100, Ariane van der Steldt wrote: I'm pretty sure that function is bugged... Yes. Henning has a bgpd core router where bgpd occasionally dies due to Let me introduce my idea. In fact, we have only two functions in uvm_map.c which make searching in maps. These are uvm_map_lookup_entry() and uvm_map_findspace(). The uvm_map_lookup_entry() do search by address (VA), while uvm_map_findspace() do search by free space. And we have a RB_HEAD(uvm_tree, vm_map_entry) rbhead, which is indexed by address (see uvm_compare()). Since that this RB_TREE provide a searching by address option to us. This is what the RB_TREE used for. But when we try to use that RB_TREE for track free space and to SEARCH by free space we do dirty and ugly things! Then we add a RB_AUGMENT hack (it is so ugly and hard to use right with others that you can't find it anywhere in source tree... exept the uvm_map.c) and other tricks... In the end we got very unclear uvm_map.c code in these places, which is very hard to understand and track, and which is overloaded and overcomplexed by that tricks and crutches. We must stop it, because it keeps hold our hands and brains. We can do things very clear and easy. Let me show how exactly. Let's just add second RB_TREE, which is indexed by space, let's call it RB_HEAD(uvm_tree_by_space, vm_map_entry) rbhead_by_space, for example. That uvm_tree_by_space will contain vm_map_entry'es sorted by free space. And that uvm_tree_by_space will be used only in uvm_map_findspace() to search for vm_map_entry with needed free space. RB_NFIND() is our best friend here! Simple, reasonable fast, clearly and easy to understand. Actually, since we can have two or more vm_map_entry'es with equal space, so each RB_TREE element will be the list of vm_map_entry'es with that equal space. We can use some additional logic here when we push/pop vm_map_entry from that list we can pop vm_map_entry with smallest start address, for example. Then we must free the first RB_TREE (uvm_tree, indexed by address) from tracking space/ownspace and other nasty tricks, and stop using it in uvm_map_findspace(). This uvm_tree must be used only in uvm_map_lookup_entry(). Again, RB_NFIND() is our best friend here (of course, in couple with RB_LEFT and other logic, etc). And again, simple, reasonable fast, clearly and easy to understand! And we can (and must) COMPLETELY STOP using RB_AUGMENT hack and forget about it as a bad dream, at last. If we decide that using of RB_TREE at each lookup/findspace is heavy when there are little of vm_map_entry'es in map, we can just use linear search in this case. And use RB_FIND/NFIND only when number of vm_map_entry'es will be bigger than some number. We can forget about hint and probably drop it's usage here in uvm_map.c (I'll review it later). RB_FIND/RB_NFIND will do their best, especially with random allocation, especially when there will be many vm_map_entry'es in the map. So in the end. I have some pieces of patch in my mind, and I'll create diff with ideas introduced here. If you have any positives and negatives so please, say it here. -- antonvm
Re: some cleanup of of uvm_map.c
On 31 January 2010 05:43, Anton Maksimenkov anton...@gmail.com wrote: Here is some cleanup of uvm_map.c code. Second, remove redundant temp_entry variable from both functions. One entry variable is enough to do the job. I don't think you want to do that.. uvm_map_lookup_entry can screw with that pointer even when it fails. bad juju
Re: some cleanup of of uvm_map.c
2010/2/1 Bob Beck b...@ualberta.ca: On 31 January 2010 05:43, Anton Maksimenkov anton...@gmail.com wrote: Second, remove redundant temp_entry variable from both functions. One entry variable is enough to do the job. I don't think you want to do that.. uvm_map_lookup_entry can screw with that pointer even when it fails. Sorry, I don't clearly understand what you mean. What is bad? I think that screw with that pointer even when it fails is exactly what uvm_map_lookup_entry must do and what we need here. uvm_map_protect() exploit exactly same trick. And uvm_map_extract() do the same: ... if (uvm_map_lookup_entry(srcmap, start, entry)) { ...do some work with entry... } else { /* start is not within an entry ... skip to next entry */ if (flags UVM_EXTRACT_CONTIG) { error = EINVAL; goto bad;/* definite hole here ... */ } entry = entry-next; fudge = 0; } AFAIS, uvm_map_lookup_entry() set the entry either to the vm_map_entry which contain start address (success) or to the last vm_map_entry in the map (fail). If it fail and set entry to the last vm_map_entry then entry = entry-next move entry to the map-header. So loop with while ((entry != map-header) (entry-start end)) { will never be executed (it is right thing). Overcomplicated using of RB_TREE in uvm_map_lookup_entry() making it hard to understand. I thinking about drop this RB_TREE out from uvm_map.c or completely rewrite using of this RB_TREE. -- antonvm