On Wed, Feb 27, 2008 at 06:17:02PM -0800, Nishanth Aravamudan wrote:
> On 28.02.2008 [13:00:34 +1100], David Gibson wrote:
> > On Tue, Feb 26, 2008 at 12:11:56AM -0800, Nishanth Aravamudan wrote:
> > > On 26.02.2008 [16:16:01 +1100], David Gibson wrote:
> > > > On Mon, Feb 25, 2008 at 09:05:42PM -0800, Nishanth Aravamudan wrote:
> > > > > On 26.02.2008 [15:33:55 +1100], David Gibson wrote:
> > > > > > On Mon, Feb 25, 2008 at 08:23:48PM -0800, Nishanth Aravamudan wrote:
> > > > > > > On 15.02.2008 [11:52:39 +1100], David Gibson wrote:
> > > > > > > > On Thu, Feb 14, 2008 at 11:48:42AM -0600, Andrew Hastings wrote:
> > [snip]
> > > > > If we keep trimming disabled, and allow the hugepage heap to jump in
> > > > > the address space (presumably because we're on power and something
> > > > > (/lib/ld-2.4.so for a 32-bit binary seems to be the common case)
> > > > > backed by small pages is in the way of a large mmap() [512M or so]),
> > > > > will that break applications?
> > > >
> > > > Um.. I don't see how the hugepage heap can jump. That's the whole
> > > > point of that else case, if we can't allocate following on from our
> > > > existing heap, we fail the morecore.
> > >
> > > The "if" applied to all of that first sentence. Sorry, I see the source
> > > of the confusion. I am not asking if we can currently violate brk()
> > > semantics. I am asking if we could change the code as I indicate below
> > > and whether a) that would violate brk() semantics and b) whether
> > > violating brk() semantics matters if we don't trim the heap.
> >
> > Ah, right, I think I understand now.
>
> Sorry for all the confusion.
>
> > Um.. a) yes, I think it would violate brk() semantics and b) I'm not
> > sure. I suspect violating brk() semantics won't work though, because
> > I think glibc's allocator will assume it can use everything between
> > the previous morecore value and the new one. And will probably barf
> > if the new value is less than the old, which it could be with this
> > change.
>
> Hrm, I guess I hadn't thought about it that way. I really am curious
> about these semantics. We already have cases where a small page [heap]
> exists in a process's address space, say at 0x10000000 for a 32-bit
> process and we start the hugepage heap in the morecore constructor at
> 0x20000000. If glibc thought it could use that entire area, but we
> didn't actually allocate any pages for it to use there, does it just
> take a bunch of page faults?

We really have such cases? I wouldn't have expected that to work.
When I wrote this, I thought from what I'd read of the glibc
documentation and code that it was only going to work reliably if we
got the very first morecore call, and every one after that.
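
To make the contract in question concrete, here is a toy sketch of an
sbrk()-style morecore hook. It is not the libhugetlbfs code. It assumes
the convention that the hook returns the old top of the heap, moves the
top by the requested increment, and returns NULL on failure. The names
toy_morecore, arena and ARENA_SIZE are made up for illustration, and a
static buffer stands in for the hugepage mapping.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SIZE (1 << 16)

static char arena[ARENA_SIZE];  /* stand-in for the hugepage-backed region */
static size_t arena_used;       /* current "break", as an offset into arena */

/*
 * Hypothetical sbrk()-style hook: return the old top of the heap and move
 * the top by `increment` bytes. Return NULL, leaving the top unchanged,
 * if the request cannot be satisfied contiguously.
 */
void *toy_morecore(ptrdiff_t increment)
{
    char *old_top = arena + arena_used;

    assert(arena_used <= ARENA_SIZE);

    if (increment > 0) {
        if ((size_t)increment > ARENA_SIZE - arena_used)
            return NULL;                /* cannot grow in place: fail */
        arena_used += (size_t)increment;
    } else if (increment < 0) {
        /* Negative increment is a trim request. */
        size_t shrink = (size_t)0 - (size_t)increment;
        if (shrink > arena_used)
            return NULL;
        arena_used -= shrink;
    }
    return old_top;
}
```

With this shape, every successful call hands back memory that starts
exactly where the previous call's memory ended, which is the property
the allocator seems to depend on.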

> I'm not sure how the new value can be less than the old? I think the
> only way that happens is with trimming in response to a negative
> allocation request... But I'm suggesting we have a situation like:
>
> 0x10000000 [heap]
> 0x20000000 /hugetlbfs/tmp.file
> 0x30000000 /lib/ld-2.4.so
> 0x40000000 /hugetlbfs/tmp.file
>
> I think we'll still be returning higher addresses every time. I don't
> know, maybe it's not worth pursuing, but I was just curious.

In practice, we probably will, because of the way the kernel scans for
mapping addresses. But if we don't get our hinted address, there's no
guarantee where the mmap() will be - it could be in any free address
space, including before the executable itself.
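
A minimal sketch of the "take the hinted address or fail" approach,
assuming Linux mmap() called without MAP_FIXED. map_exactly_at is a
hypothetical helper, not a libhugetlbfs function, and it uses an
anonymous mapping rather than a hugetlbfs file to stay self-contained.
The point is only that the returned address has to be compared against
the hint, since the kernel is free to place the mapping elsewhere.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes at `hint`, or not at all.
 * Returns the mapped address on success. Returns NULL if mmap() fails
 * or if the kernel placed the mapping somewhere other than `hint`.
 * A NULL hint means "anywhere is fine".
 */
void *map_exactly_at(void *hint, size_t len)
{
    void *p;

    assert(len > 0);

    /* Without MAP_FIXED the address argument is only a hint. */
    p = mmap(hint, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (hint != NULL && p != hint) {
        /* Hint not honoured: undo the mapping and report failure. */
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

Asking for an address that is already occupied shows the failure path:
the kernel picks a different spot, the helper unmaps it and returns
NULL, and the caller can then fail the morecore request instead of
letting the heap jump.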

> Do you happen to have a reference to the definitions for brk()
> semantics? Or do you recommend I just look at glibc's source [not
> something I often hear recommended :)]?

Alas, I think you will need to look at the glibc source. Good luck.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
_______________________________________________
Libhugetlbfs-devel mailing list
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel