On Mon 02-03-15 17:18:14, Mike Kravetz wrote:
> On 03/02/2015 03:10 PM, Andrew Morton wrote:
> >On Fri, 27 Feb 2015 14:58:08 -0800 Mike Kravetz <mike.krav...@oracle.com> wrote:
> >
> >>hugetlbfs allocates huge pages from the global pool as needed.  Even if
> >>the global pool contains a sufficient number of pages for the filesystem
> >>size at mount time, those global pages could be grabbed for some other
> >>use.  As a result, filesystem huge page allocations may fail due to lack
> >>of pages.
> >
> >Well OK, but why is this a sufficiently serious problem to justify
> >kernel changes?  Please provide enough info for others to be able
> >to understand the value of the change.
> >
>
> Thanks for taking a look.
>
> Applications such as a database want to use huge pages for performance
> reasons.  hugetlbfs filesystem semantics with ownership and modes work
> well to manage access to a pool of huge pages.  However, the application
> would like some reasonable assurance that allocations will not fail due
> to a lack of huge pages.  Before starting, the application will ensure
> that enough huge pages exist on the system in the global pools.  What
> the application wants is exclusive use of a pool of huge pages.
>
> One could argue that this is a system administration issue.  The global
> huge page pools are only available to users with root privilege.
> Therefore, exclusive use of a pool of huge pages can be obtained by
> limiting access.  However, many applications are installed to run with
> elevated privilege to take advantage of resources like huge pages.  It
> is quite possible for one application to interfere with another,
> especially in the case of something like huge pages where the pool size
> is mostly fixed.
>
> Suggestions for other ways to approach this situation are appreciated.
> I saw the existing support for "reservations" within hugetlbfs and
> thought of extending this to cover the size of the filesystem.

Maybe I do not understand your use case properly, but wouldn't the hugetlb
cgroup (CONFIG_CGROUP_HUGETLB) help to guarantee the same? Just configure
limits for different users/applications (inside different groups) so that
they never overcommit the existing pool. Would that work for you?
-- 
Michal Hocko
SUSE Labs
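
To make the cgroup suggestion concrete, here is a rough sketch of how the per-group limits could be planned so that they never add up to more than the pool. It is only an illustration under assumptions, not a tested setup: the mount point /sys/fs/cgroup/hugetlb, the group names "database" and "other", the 2MB page size label, and the helper names are all hypothetical. The control file name follows the cgroup v1 hugetlb controller convention, hugetlb.<hugepagesize>.limit_in_bytes. Note that a limit only caps a group's usage, so this keeps the pool from being overcommitted only if every huge page user runs inside one of the limited groups.

```python
from pathlib import Path

# Assumed cgroup v1 mount point for the hugetlb controller; adjust as needed.
HUGETLB_CGROUP_ROOT = Path("/sys/fs/cgroup/hugetlb")


def plan_hugetlb_limits(pool_pages, page_size_bytes, pages_per_group,
                        size_label="2MB"):
    """Return {limit file path: limit in bytes} for each group.

    Raises ValueError if the requested limits together exceed the pool,
    because the groups could then overcommit it.
    """
    if any(pages < 0 for pages in pages_per_group.values()):
        raise ValueError("page counts must be non-negative")

    requested = sum(pages_per_group.values())
    if requested > pool_pages:
        raise ValueError(
            f"groups request {requested} pages, pool has only {pool_pages}"
        )

    # File name pattern used by the cgroup v1 hugetlb controller.
    limit_file = f"hugetlb.{size_label}.limit_in_bytes"
    return {
        HUGETLB_CGROUP_ROOT / group / limit_file: pages * page_size_bytes
        for group, pages in pages_per_group.items()
    }


def apply_limits(plan):
    """Write each limit to its control file.

    Needs root, and the group directories must already exist.
    """
    for path, limit in plan.items():
        path.write_text(str(limit))


if __name__ == "__main__":
    # Hypothetical split of a 1024-page pool of 2MB pages between two groups.
    plan = plan_hugetlb_limits(
        pool_pages=1024,
        page_size_bytes=2 * 1024 * 1024,
        pages_per_group={"database": 768, "other": 256},
    )
    for path, limit in plan.items():
        print(path, limit)
```

The planning step is kept separate from apply_limits() so the split can be checked without touching the cgroup filesystem.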