Re: [perf-discuss] file system cache / segmap tuning

Nicolas Michael Wed, 30 Apr 2008 12:32:22 -0700

Hi Jim,

Thanks for your quick reply. My comments are inline.


Jim Mauro wrote:

>> - SPARC, 64 GB memory
>> - UFS, PxFS file systems
>>
>> Our application is writing some logs to disk (4 GB / hour), flushing 
>> some mmapped files from time to time (4 GB each 15 min), but is not 
>> doing much disk I/O.
>> Once our application is started and "warm", it doesn't allocate any 
>> further memory.  At this point, we have 3-4 GB of free memory (vmstat) 
>> and nothing paged out to disk (swap -l).   
> Well, something is certainly consuming memory, because you indicate this 
> is a 64GB system, and you show
> 3-4GB free. Who/what is consuming 60GB of RAM?

Our application! ;-)
I don't want to go into the details here, but there's nothing wrong 
with that. We know where all this memory is coming from (there are some 
processes with large heaps, some large shm segments and so on).

Steve has some slides on our application, in case you're really 
interested...

>> Since those memory requests are not coming from our application, I 
>> assume that those 5 GB (3 GB less free memory plus 2 GB paged-out 
>> data) are used for the file system cache. I always thought the fs 
>> cache would never grow any more once memory gets short, so it should 
>> never cause paging activity (since the cache list is part of the free 
>> list). Reading Solaris Internals, I just learned that there's not only 
>> a cache list, but also a segmap cache. As I understand this, the 
>> segmap cache may very well grow up to 12% of main memory and may even 
>> cause application pages to be paged out, correct? So, this might be 
>> what's happening here. Can I somehow monitor the segmap cache (since 
>> it is kernel memory, is it reported as "Kernel" in ::memstat?)?
>>   
> Think of UFS as having an L1 and L2 cache (like processors). segmap is 
> the L1 cache; when segmap fills up, pages get pushed
> out to the cache list (the "L2" cache), where they can be reclaimed back 
> into segmap if they are referenced again via read/write
> before they are reused.

Ok, thanks.

> The 12% of memory being consumed by segmap is not what's hurting you 
> here (at least, I would be very
> surprised if it is).

We easily consume ~ 60 GB of memory just with our application (including 
kernel, libs etc.). This doesn't allow us to spend 12% of total memory 
on the segmap cache in addition to that. If we really used all of the 
segmap cache that's possible (12% of 64 GB = 7.68 GB), we would exceed 
our physical memory, and I think this is what is happening.
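
To make that arithmetic explicit, here is a rough sketch in Python. It 
assumes the segmap ceiling is simply segmap_percent of the 64 GB of 
physical memory and uses the approximate 60 GB figure from above; the 
function name is only for illustration.

```python
def segmap_max_gb(physical_gb, segmap_percent):
    """Upper bound of the segmap cache, assuming it is a plain
    percentage of physical memory (a simplification)."""
    return physical_gb * segmap_percent / 100.0

PHYSICAL_GB = 64.0   # installed memory
APP_GB = 60.0        # approximate demand of application, kernel and libs

for pct in (12, 1):
    segmap = segmap_max_gb(PHYSICAL_GB, pct)
    # Positive shortfall means demand plus a full segmap exceeds RAM.
    shortfall = max(APP_GB + segmap - PHYSICAL_GB, 0.0)
    print(f"segmap_percent={pct}: max segmap {segmap:.2f} GB, "
          f"demand exceeds RAM by {shortfall:.2f} GB")
```

With 12% the worst case overshoots physical memory by several GB; with 
1% it fits.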

We can't reduce our application's demand for memory (in fact, we already 
did reduce it by something like 20 GB to fit into 64 GB of memory), so 
we need to reduce the max segmap cache size. Otherwise we would need to 
install more memory in the system (which we don't want).

>> My idea is now to set segmap_percent=1 to decrease the max size of the 
>> segmap cache and this way avoid having pages paged out due to growing 
>> fs cache. In a testrun with this configuration, my free memory doesn't 
>> fall below 3.5 GB any more and nothing is being paged out -- saving me 
>> 4.5 GB of memory!
>>   
> Does this machine really have 64GB of RAM (as indicated above)?

Yep!

>> Since we don't do much disk I/O, I would assume that we don't gain 
>> much from the segmap cache anyway, so I would like to configure it to 
>> 1%. File system pages will still be cached in the cache list as long 
>> as memory is available, right? The advantage is that the cache list 
>> is basically "free" memory and would never cause other pages to be 
>> paged out.   
> Generally, yes.

Ok.

>> I'm not sure, but as I understand it the segmap cache is still used 
>> during read and write operations, right? So, every time we write a 
>> file, we always write into the segmap cache. If this cache is small 
>> (let's say: 1% = 640 MB), we might be slowed down when writing more 
>> than 640 MB all at once. However, if we only wrote 64 MB every 
>> minute, pages from the segmap cache would migrate to the cache list 
>> and make room for more pages in the segmap cache, so next time we 
>> write 64 MB, would there again be enough space in the segmap cache for 
>> the write operation?
>>   
> Generally, yes, assuming the writes are not to files with 
> O_SYNC/O_DSYNC, in which case every write must go through
> the cache anyway.

Thanks.

>> Also, just to be sure: memory mapped files are never read or written 
>> through the segmap cache, so shrinking that cache has no effect on 
>> memory mapped files, right?
>>   
> That is correct. mmap()'d files are not cached in segmap.

Ok, that's good to know.

> Something is missing here, or the 64GB value is wrong.
> 
> You need to figure out who/what is consuming 60GB of RAM.
> Use ' echo "::memstat" | mdb -k' for a high-order profile.

As I said above, there's really nothing wrong with our application 
consuming 60 GB... ;-)

But here it is:

Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     291570              2277    4%
Anon                      6892465             53847   83%
Exec and libs               45966               359    1%
Page cache                 751264              5869    9%
Free (cachelist)           137777              1076    2%
Free (freelist)            179173              1399    2%

Total                     8298215             64829
Physical                  8166070             63797

This snapshot was taken before I reconfigured the system, so this is 
with segmap_percent=12. It was taken 2 hours after a long testrun. As 
I wrote above, free memory jumped from 1 GB to 2.5 GB 1 hour after we 
stopped the load. The only explanation I have for this is pages being 
freed from the segmap cache.
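
For comparing snapshots like this one before and after a testrun, a 
small script can pull the numbers out of the ::memstat output. This is 
only a sketch: the rows are pasted from the snapshot above, the 8 KB 
page size is an assumption (it is consistent with the Pages and MB 
columns), and parse_memstat is a made-up helper name.

```python
import re

# Rows copied from the ::memstat snapshot above.
MEMSTAT = """\
Kernel                     291570              2277    4%
Anon                      6892465             53847   83%
Exec and libs               45966               359    1%
Page cache                 751264              5869    9%
Free (cachelist)           137777              1076    2%
Free (freelist)            179173              1399    2%
"""

PAGE_KB = 8  # assumed page size in KB

def parse_memstat(text):
    """Return {category: (pages, mb)} for each data row."""
    rows = {}
    for line in text.splitlines():
        m = re.match(r"^(.*?)\s+(\d+)\s+(\d+)\s+(\d+)%$", line.strip())
        if m:
            rows[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return rows

rows = parse_memstat(MEMSTAT)
for name, (pages, mb) in rows.items():
    # Recompute the size from the page count as a sanity check.
    print(f"{name:18s} {pages:9d} pages  {pages * PAGE_KB / 1024:9.0f} MB")
```

Running it on two snapshots and diffing the "Page cache" and "Free" 
rows would show where the memory moved.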

Steve wrote that the segmap cache is part of "Page cache". Assuming 
there was 1.5 GB more data in the segmap cache during the testrun, this 
would make 7.4 GB of Page cache. 4 GB of it are memory mapped files. 
This leaves 3.4 GB for the segmap cache. That seems to be a bit less 
than 50% of its possible max size, but it is still too much for our 
system.
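
The estimate can be written down as a few lines of Python. All inputs 
are the figures from this mail; converting the ::memstat MB value to GB 
by dividing by 1000 follows the rounding used in the text and is an 
assumption, as is the 12% of 64 GB ceiling.

```python
PAGE_CACHE_NOW_GB = 5869 / 1000.0  # "Page cache" row, rounded as in the text
FREED_AFTER_LOAD_GB = 1.5          # free memory rose from 1 GB to 2.5 GB
MMAP_GB = 4.0                      # memory mapped files
SEGMAP_MAX_GB = 64 * 12 / 100.0    # assumed ceiling at segmap_percent=12

# Page cache during the testrun, if the freed pages all came from segmap.
page_cache_peak = PAGE_CACHE_NOW_GB + FREED_AFTER_LOAD_GB
# What remains after subtracting the memory mapped files.
segmap_est = page_cache_peak - MMAP_GB
fraction = segmap_est / SEGMAP_MAX_GB

print(f"page cache during testrun: {page_cache_peak:.1f} GB")
print(f"estimated segmap usage:    {segmap_est:.1f} GB")
print(f"share of segmap ceiling:   {fraction:.0%}")
```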

I believe we don't need that much for segmap. All we are doing on the 
file system (except for the mmapped files) is write a large logfile 
sequentially, close it and copy it to a different location, and later on 
ftp it somewhere. This shouldn't require much segmap cache...

Thanks a lot,
Nick.
