> Hi,
> My AMD Opteron supports 4KB, 2MB and 1GB page sizes.
> I observed that there is performance improvement
> (reduced elapsed time) for some multi-threaded
> applications when I used 2MB page-size for heap.
> These applications need around 650MB heap (it reads a
> huge file of around 650MB size). However, when I used
> 1GB pages for heap, there is performance degradation
> for these programs. If we use 1 GB, then one page is
> enough to satisfy the heap space, right? Then, why we
> see performance degradation in this case. Please let
> me know.
> 
> $ pagesize -a
> 4096
> 2097152
> 1073741824

I'm guessing that a page has to be contiguous in RAM
as well as in VM.  1GB is a big chunk of contiguous RAM
to ask for, and the system may have to page out a lot
of other stuff to be able to make available a 1GB page.

In addition, with smaller pages, only the mappings and
a VM reservation need to be made right away; RAM to
correspond to those pages doesn't need to be made available
until they're accessed.  So with smaller pages, you're only
tying up RAM as you're actually using it (or have used it
recently enough that there hasn't been a reason to page it out).
With a large page, the whole page has to be brought in at once
(and zeroed the first time it's made available, too).
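
To put rough numbers on that, here's a toy model (my own sketch, not how any particular kernel accounts for memory) that assumes RAM is committed one whole page at a time as a mapping is touched from the start. The function name resident_bytes is made up for illustration:

```python
def resident_bytes(touched_bytes, page_size):
    """RAM committed after touching the first `touched_bytes` of a mapping.

    Toy model: memory is committed in whole pages, on first touch.
    """
    pages = -(-touched_bytes // page_size)  # ceiling division
    return pages * page_size


# The three page sizes reported by `pagesize -a` in the question.
for size in (4096, 2097152, 1073741824):
    one_byte = resident_bytes(1, size)
    ten_mib = resident_bytes(10 * 2**20, size)
    print(f"page={size:>10}  after 1 byte: {one_byte:>10}  after 10 MiB: {ten_mib:>10}")
```

Under this model, touching a single byte commits 4KB with the smallest page but the full 1GB with the largest one.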

If one obtained all one's large pages before memory became fragmented,
and there wasn't much else running on the system, and there was
never a need to page out, then large pages _might_ be faster,
except maybe not so much given the initialization time for them,
esp. if one was only using 65% of 1G.

So...first, you're not running in isolation; how your program
interacts with the system and everything else on it is also an issue.
Second, having to allocate an entire 1GB contiguous chunk (maybe
even aligned on a 1GB boundary?) is a big deal even if little else
is running on the system, and having to zero roughly 350MB that you'll
never use also takes a little time.
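
As a back-of-the-envelope check on the unused part, here's a small sketch that counts the pages needed for the heap and the slack left in the last page. It assumes "650MB" means 650 * 2**20 bytes, which is my guess; with that reading the slack for a 1GB page comes out in the same ballpark as the rough figure I gave. The name pages_and_slack is hypothetical:

```python
HEAP_BYTES = 650 * 2**20  # assumption: "650MB" read as 650 MiB


def pages_and_slack(heap_bytes, page_size):
    """Return (pages needed, unused bytes in the last page)."""
    pages = -(-heap_bytes // page_size)  # ceiling division
    slack = pages * page_size - heap_bytes
    return pages, slack


for size in (4096, 2097152, 1073741824):
    pages, slack = pages_and_slack(HEAP_BYTES, size)
    print(f"page={size:>10}  pages={pages:>6}  unused={slack // 2**20} MiB")
```

So a single 1GB page does cover the heap, but over a third of it is memory that gets allocated and zeroed without ever being used.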

The page sizes you mention seem a little extreme in their spread
(4K, then 4K*512, then 4K*512*512).  On a particular SPARC I'm looking
at, they're much closer together, but there are four sizes instead of three:
8K, 64K (8K*8), 512K (8K*8*8), and 4M (8K*8*8*8).  There, even the largest
isn't crazy large.  Now maybe AMD will be right with the larger spread and
much larger maximum size when everybody has 1TB or more of RAM, and
the RAM, MMU, and so on are much faster.  But for now, I can think of
relatively few cases where a 1G page is likely to be useful...
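
For what it's worth, the spread is easy to compare as the ratio between consecutive page sizes. A quick sketch, using only the sizes quoted in this thread (step_ratios is just an illustrative helper):

```python
def step_ratios(sizes):
    """Ratio between each page size and the next larger one."""
    return [bigger // smaller for smaller, bigger in zip(sizes, sizes[1:])]


amd_sizes = [4096, 2097152, 1073741824]  # from `pagesize -a` in the question
sparc_sizes = [8 * 1024, 64 * 1024, 512 * 1024, 4 * 1024 * 1024]  # 8K, 64K, 512K, 4M

print("AMD steps:  ", step_ratios(amd_sizes))
print("SPARC steps:", step_ratios(sparc_sizes))
```

Each step is a factor of 512 on the Opteron versus a factor of 8 on that SPARC.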
-- 
This message posted from opensolaris.org