> Hi,
> My AMD Opteron supports 4KB, 2MB and 1GB page sizes.
> I observed that there is performance improvement
> (reduced elapsed time) for some multi-threaded
> applications when I used 2MB page-size for heap.
> These applications need around 650MB heap (it reads a
> huge file of around 650MB size). However, when I used
> 1GB pages for heap, there is performance degradation
> for these programs. If we use 1 GB, then one page is
> enough to satisfy the heap space, right? Then, why we
> see performance degradation in this case. Please let
> me know.
>
> $ pagesize -a
> 4096
> 2097152
> 1073741824
I'm guessing that a page has to be contiguous in RAM as well as in VM. 1GB is a big chunk of contiguous RAM to ask for, and the system may have to page out a lot of other stuff to make a 1GB page available.

In addition, with smaller pages, only the mappings and a VM reservation need to be made right away; RAM to back those pages doesn't need to be made available until they're accessed. So with smaller pages, you're only tying up RAM as you actually use it (or have used it recently enough that there hasn't been a reason to page it out). With a large page, the whole page has to be brought in at once (and zeroed the first time it's made available, too).

If one obtained all one's large pages before memory became fragmented, and there wasn't much else running on the system, and there was never a need to page out, then large pages _might_ be faster, except maybe not so much given their initialization time, especially if one was only using about 63% of 1GB (650MB out of 1024MB).

So, first: you're not running in isolation; how your program interacts with the system and everything else on it is also an issue. Second: having to allocate an entire 1GB contiguous chunk (maybe even aligned on a 1GB boundary?) is a big deal even if little else is running on the system, and having to zero roughly 375MB that you'll never use also takes a little time.

The page sizes you mention seem a little extreme in their spread (4K, then 4K*512, then 4K*512*512). On a particular SPARC I'm looking at, they're much closer together, but there are four sizes instead of three: 8K, 64K (8K*8), 512K (8K*8*8), and 4M (8K*8*8*8). There, even the largest isn't crazy large.

Now maybe AMD will turn out to be right about the larger spread and much larger maximum size once everybody has 1TB or more of RAM, and the RAM, MMU, and so on are much faster. But for now, I can think of relatively few cases where a 1G page is likely to be useful...
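
To put rough numbers on the argument above, here is a minimal sketch of the arithmetic. It assumes the heap is exactly 650MB (the question only says "around 650MB") and uses the three sizes from the quoted `pagesize -a` output; the helper names `pages_needed` and `footprint` are made up for illustration and are not part of any system API.

```python
# Page sizes in bytes, as reported by `pagesize -a` in the quoted message.
PAGE_SIZES = [4096, 2097152, 1073741824]

# Assumption: the "around 650MB" heap is treated as exactly 650 MiB.
HEAP_BYTES = 650 * 1024 * 1024


def pages_needed(heap_bytes, page_size):
    """Number of whole pages required to cover the heap (ceiling division)."""
    return -(-heap_bytes // page_size)


def footprint(heap_bytes, page_size):
    """Return (page count, bytes mapped but never used by the heap)."""
    pages = pages_needed(heap_bytes, page_size)
    slack = pages * page_size - heap_bytes
    return pages, slack


if __name__ == "__main__":
    for size in PAGE_SIZES:
        pages, slack = footprint(HEAP_BYTES, size)
        print(f"{size:>10}-byte pages: {pages:>6} pages, "
              f"{slack // 2**20} MB mapped but unused")
```

Under that assumption, 4K and 2M pages cover the heap with no slack (166400 and 325 pages respectively), while a single 1G page leaves 374MB that must still be found contiguously and zeroed but is never touched by the program.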