On Fri, Feb 15, 2008 at 08:50:53AM -0800, Adam Litke wrote:
> 
> The current remapping algorithm assumes that each PT_LOAD ELF
> segment has a starting address (vaddr) that is aligned to the huge
> page size.  This condition was enforced by the custom linker scripts
> that are packaged with libhugetlbfs.
> 
> Unaligned starting addresses can be corrected simply by aligning
> them down to a huge page boundary and tracking the number of bytes
> by which the address was adjusted (offset).  When copying segment
> data into huge pages, the offset is used to ensure the data is
> copied to the proper offset within the huge page mapping.  An
> algorithmic solution to this issue eliminates a dependency on our
> custom linker scripts.
> 
> Unaligned segments have increased the complexity of segment
> preparation and remapping.  As a result, I felt it necessary to
> update the code comments to (hopefully) more clearly represent the
> nature of the calculations.
> 
> Signed-off-by: Adam Litke <[EMAIL PROTECTED]>

Seems sound in principle.

It seems to me, though, that already here you need the checks for
sufficiently large gaps between segments.  i.e. when you align the
segment edges outwards you should check that they don't collide with
something else.


>  elflink.c |   63 +++++++++++++++++++++++++++++++++++++++++++++----------------
>  1 files changed, 46 insertions(+), 17 deletions(-)
> 
> diff --git a/elflink.c b/elflink.c
> index d805e13..2b18620 100644
> --- a/elflink.c
> +++ b/elflink.c
> @@ -705,11 +705,27 @@ static void parse_elf_normal(void)
>  static int prepare_segment(struct seg_info *seg)
>  {
>       int hpage_size = gethugepagesize();
> -     void *p;
> -     unsigned long size;
> +     void *start, *p;
> +     unsigned long size, offset;
>  
> -     size = ALIGN(seg->filesz + seg->extrasz, hpage_size);
> +     /*
> +      * mmaps must begin at an address modulo the page size.  If the

Nitpick: "an address modulo the page size" is meaningless; "an address
congruent to zero modulo the page size" is what you mean, but "an
address aligned to the page size" would be clearer than both.

> +      * vaddr of this segment is not hpage_size aligned, align it downward
> +      * and begin the mmap there.  Note the offset so we can copy data to
> +      * the correct starting address within the temporary mmap.
> +      */
> +     start = (void *) ALIGN_DOWN((unsigned long)seg->vaddr, hpage_size);
> +     offset = (unsigned long)(seg->vaddr - start);

Unnecessary cast.

> +     /*
> +      * Calculate the size of the temporary mapping we must create.
> +      * This includes the offset (described above) and the filesz and
> +      * extrasz portions of the segment (described below).  We must align
> +      * this total to the huge page size so it will be valid for mmap.
> +      */
> +     size = ALIGN(offset + seg->filesz + seg->extrasz, hpage_size);
>  
> +     /* Create the temporary huge page mmap */
>       p = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, seg->fd, 0);
>       if (p == MAP_FAILED) {
>               ERROR("Couldn't map hugepage segment to copy data: %s\n",
> @@ -718,15 +734,17 @@ static int prepare_segment(struct seg_info *seg)
>       }
>  
>       /* 
> -      * Subtle, copying only filesz bytes of the segment
> -      * allows for much better performance than copying all of
> -      * memsz but it requires that all data (such as the plt)
> -      * be contained in the filesz portion of the segment.
> +      * Minimizing the amount of data copied will maximize performance.
> +      * By definition, the filesz portion of the segment contains
> +      * initialized data and must be copied.  If part of the memsz portion
> +      * is known to be initialized already, extrasz will be non-zero and
> +      * that many additional bytes will be copied from the beginning of the
> +      * memsz region.  The rest of the memsz is understood to be zeroes and
> +      * need not be copied.
>        */
> -
>       DEBUG("Mapped hugeseg at %p. Copying %#0lx bytes and %#0lx extra bytes"
>               " from %p...", p, seg->filesz, seg->extrasz, seg->vaddr);
> -     memcpy(p, seg->vaddr, seg->filesz + seg->extrasz);
> +     memcpy(p + offset, seg->vaddr, seg->filesz + seg->extrasz);
>       DEBUG_CONT("done\n");
>  
>       munmap(p, size);
> @@ -913,8 +931,10 @@ static int obtain_prepared_file(struct seg_info *htlb_seg_info)
>  static void remap_segments(struct seg_info *seg, int num)
>  {
>       long hpage_size = gethugepagesize();
> +     long page_size = getpagesize();
>       int i;
> -     void *p;
> +     void *start, *p;
> +     unsigned long offset, mapsize;
>  
>       /*
>        * XXX: The bogus call to mmap below forces ld.so to resolve the
> @@ -929,23 +949,32 @@ static void remap_segments(struct seg_info *seg, int num)
>        * black hole.  We can't call anything which uses static data
>        * (ie. essentially any library function...)
>        */
> -     for (i = 0; i < num; i++)
> -             munmap(seg[i].vaddr, seg[i].memsz);
> +     for (i = 0; i < num; i++) {
> +             /* Handle unaligned segment vaddrs (see prepare_segment) */
> +             start = (void *) ALIGN_DOWN((unsigned long)seg[i].vaddr,
> +                                             page_size);
> +             offset = (unsigned long)(seg[i].vaddr - start);
> +             mapsize = ALIGN(offset + seg[i].memsz, page_size);
> +             munmap(start, mapsize);

This looks wrong.  You shouldn't need to align the addresses outwards
for the *unmap*.  This is the already mapped, normalpage segment, so
the unmodified addresses should be fine.

> +     }
>  
>       /* Step 4.  Rebuild the address space with hugetlb mappings */
>       /* NB: we can't do the remap as hugepages within the main loop
>        * because of PowerPC: we may need to unmap all the normal
>        * segments before the MMU segment is ok for hugepages */
>       for (i = 0; i < num; i++) {
> -             unsigned long mapsize = ALIGN(seg[i].memsz, hpage_size);
> +             start = (void *) ALIGN_DOWN((unsigned long)seg[i].vaddr,
> +                                             hpage_size);
> +             offset = (unsigned long)(seg[i].vaddr - start);
> +             mapsize = ALIGN(offset + seg[i].memsz, hpage_size);

This is the third time you've calculated these.  They should probably
go into the seg_info structure.

>  
> -             p = mmap(seg[i].vaddr, mapsize, seg[i].prot,
> +             p = mmap(start, mapsize, seg[i].prot,
>                        MAP_PRIVATE|MAP_FIXED, seg[i].fd, 0);
>               if (p == MAP_FAILED)
>                       unmapped_abort("Failed to map hugepage segment %u: "
> -                                    "%p-%p (errno=%u)\n", i, seg[i].vaddr,
> -                                    seg[i].vaddr+mapsize, errno);
> -             if (p != seg[i].vaddr)
> +                                     "%p-%p (errno=%u)\n", i, start,
> +                                     start + mapsize, errno);
> +             if (p != start)
>                       unmapped_abort("Mapped hugepage segment %u (%p-%p) at "
>                                      "wrong address %p\n", i, seg[i].vaddr,
>                                      seg[i].vaddr+mapsize, p);

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
