Re: [PATCH] MM: Support more pagesizes for MAP_HUGETLB/SHM_HUGETLB v3

Andrew Morton Tue, 09 Oct 2012 15:19:18 -0700

On Wed,  3 Oct 2012 15:24:23 -0700
Andi Kleen <[email protected]> wrote:


> From: Andi Kleen <[email protected]>
> 
> There was some desire in large applications using MAP_HUGETLB/SHM_HUGETLB
> to use 1GB huge pages on some mappings, and stay with 2MB on others. This
> is useful together with NUMA policy: use 2MB interleaving on some mappings,
> but 1GB on local mappings.
> 
> This patch extends the IPC/SHM syscall interfaces slightly to allow specifying
> the page size.
> 
> It borrows some upper bits in the existing flag arguments and allows encoding
> the log of the desired page size in addition to the *_HUGETLB flag.
> When 0 is specified the default size is used, this makes the change fully
> compatible.
> 
> Extending the internal hugetlb code to handle this is straight forward. 
> Instead
> of a single mount it just keeps an array of them and selects the right
> mount based on the specified page size.
> 
> I also exported the new flags to the user headers
> (they were previously under __KERNEL__). Right now only symbols
> for x86 and some other architecture for 1GB and 2MB are defined.
> The interface should already work for all other architectures
> though.

So some manpages need updating.  I'm not sure which - mmap(2) surely,
but which for the IPC change?

> v2: Port to new tree. Fix unmount.
> v3: Ported to latest tree.
> Acked-by: Rik van Riel <[email protected]>
> Acked-by: KAMEZAWA Hiroyuki <[email protected]>
> Signed-off-by: Andi Kleen <[email protected]>
> ---
>  arch/x86/include/asm/mman.h |    3 ++
>  fs/hugetlbfs/inode.c        |   63 
> ++++++++++++++++++++++++++++++++++---------
>  include/asm-generic/mman.h  |   13 +++++++++
>  include/linux/hugetlb.h     |   12 +++++++-
>  include/linux/shm.h         |   19 +++++++++++++
>  ipc/shm.c                   |    3 +-
>  mm/mmap.c                   |    5 ++-

Alas, include/asm-generic/mman.h doesn't exist now.

Does this change touch all the hugetlb-capable architectures?

z:/usr/src/linux-3.6> grep -rl MAP_HUGETLB arch
arch/alpha/include/asm/mman.h
arch/xtensa/include/asm/mman.h
arch/parisc/include/asm/mman.h
arch/tile/include/asm/mman.h
arch/sparc/include/asm/mman.h
arch/powerpc/include/asm/mman.h
arch/mips/include/asm/mman.h

>
> ...
>
> @@ -933,9 +933,22 @@ static int can_do_hugetlb_shm(void)
>       return capable(CAP_IPC_LOCK) || in_group_p(shm_group);
>  }
>  
> +static int get_hstate_idx(int page_size_log)

nitlet: "page_size_order" would be more kernely.  Or just "page_order".

> +{
> +     struct hstate *h;
> +
> +     if (!page_size_log)
> +             return default_hstate_idx;
> +     h = size_to_hstate(1 << page_size_log);
> +     if (!h)
> +             return -1;
> +     return h - hstates;
> +}
>
> ...
>
>  static int __init init_hugetlbfs_fs(void)
>  {
> +     struct hstate *h;
>       int error;
> -     struct vfsmount *vfsmount;
> +     int i;
>  
>       error = bdi_init(&hugetlbfs_backing_dev_info);
>       if (error)
> @@ -1030,14 +1049,26 @@ static int __init init_hugetlbfs_fs(void)
>       if (error)
>               goto out;
>  
> -     vfsmount = kern_mount(&hugetlbfs_fs_type);
> +     i = 0;
> +     for_each_hstate (h) {
> +             char buf[50];
> +             unsigned ps_kb = 1U << (h->order + PAGE_SHIFT - 10);
>  
> -     if (!IS_ERR(vfsmount)) {
> -             hugetlbfs_vfsmount = vfsmount;
> -             return 0;
> -     }
> +             snprintf(buf, sizeof buf, "pagesize=%uK", ps_kb);
> +             hugetlbfs_vfsmount[i] = kern_mount_data(&hugetlbfs_fs_type,
> +                                                     buf);
>  
> -     error = PTR_ERR(vfsmount);
> +             if (IS_ERR(hugetlbfs_vfsmount[i])) {
> +                             pr_err(
> +                     "hugetlb: Cannot mount internal hugetlbfs for page size 
> %uK",
> +                            ps_kb);
> +                     error = PTR_ERR(hugetlbfs_vfsmount[i]);
> +             }
> +             i++;
> +     }
> +     /* Non default hstates are optional */
> +     if (hugetlbfs_vfsmount[default_hstate_idx])
> +             return 0;

hm, so if I'm understanding this, the patch mounts hugetlbfs N times,
once for each page size.  And presumably the shm code somehow selects
one of these mounts, based on incoming flags.  And presumably if those
flags are all-zero, the behaviour is unaltered.

Please update the changelog to describe all this - the overview of how
the patch actually operates.

Also, all this affects the /proc/mounts contents, yes?  Let's changelog
that very-slightly-non-back-compatible user-visible change as well.

There's some overhead to doing all those additional mounts.  Can we
quantify it?

>
> ...
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] MM: Support more pagesizes for MAP_HUGETLB/SHM_HUGETLB v3

Reply via email to