Thanks for the explanation! I noticed during my earlier investigation
that the source of the data size is struct vmspace::vm_dused. This is
updated mostly in uvm_mapanon() and uvm_map(); the second function is
the more general case. I think during my file mapping uvm_map() is
called and the part that updates vm_dused is skipped.
I'm setting up a VM on my machine with a debug kernel (this should be an
interesting exercise) to do some exploratory kernel debugging, just to
understand the process. So far this seems to make sense. I never doubted
the system is doing the "right" thing :)

I also had some questions about the page and the buffer caches. So far I
gathered the following facts (please correct me if I am wrong):

OpenBSD has separate page and buffer caches, i.e., no UBC.
Q: Is this done for security reasons or due to some complication? Just
curious.

The Inactive pages seem to back the page cache. I ran my file mapping
code a few times mapping/releasing a large file (about 300 MB) with
systat running in the uvm view, and saw the page counts for Active and
Inactive swing back and forth, keeping the total fixed.

Then I ran md5 on another 100 MB file, and this time the Cache number in
top grew by about 100 MB, with some brief disk activity (I'm on SSD so
things are zippy). I next ran my file mapping program on it. This time
the Active pages grew by about 100 MB, raising the total by the same
amount. When the program ended, those pages moved to Inactive, keeping
the total fixed. There was no disk activity during this and Cache
remained unchanged.

This seems to indicate that the data for the new file was copied from
the buffer cache to the page cache during the mapping, and both copies
were maintained.

Regards,
Anindya

From: Otto Moerbeek <[email protected]>
Sent: January 22, 2021 12:01 AM
To: Anindya Mukherjee <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: Understanding memory statistics 
 
On Thu, Jan 21, 2021 at 10:38:59PM +0000, Anindya Mukherjee wrote:

> Hi,
> 
> Just to follow up, I was playing with allocating memory from a test
> program in various ways in order to produce a situation when SIZE is
> less than RES. The following program causes this to happen. If I mmap a
> large file, the SIZE remains tiny, indicating that the mapped region is
> not counted as part of text + data + stack. Then when I go ahead and
> touch all the memory, SIZE remains tiny but RES grows to the size of the
> file. Very interesting.

So SIZE does not include mappings backed by a file system object, but
RES does. RES only grows once the pages are touched, this is demand
paging in action (anon pages act the same way).

Nice. I already suspected it would be something like that, but never
took the time to find out by experimenting or code study.

Now the next question is whether SIZE *should* include non-anonymous
pages. getrlimit(2) explicitly says RLIMIT_DATA (which limits
SIZE) only includes anonymous data. So that hints SIZE indeed should not
include those file-backed pages.

To back this up:

ps(1) lists several size related stats:

Desc            Keyw    Function        Value
Data            dsiz    dsize           p_vm_dsize
Resident 1      rss     p_rssize        p_vm_rssize
Resident 2      rsz     rssize          p_vm_rssize
Stack           ssiz    ssize           p_vm_ssize
Text (code)     tsiz    tsize           p_vm_tsize
Virtual         vsiz    vsize           p_vm_dsize + p_vm_ssize + p_vm_tsize


top(1) uses the equivalent of vsiz for SIZE and rss for RES. So this
is consistent with your observations.

I note that the rss vs rsz distinction ps(1) mentions does not
actually seem to be implemented in ps(1).

BTW: the proper way to get the size is to open the file and use fstat(2).

        -Otto


> 
> Quick and very dirty code below:
> 
> /* This demonstrates why the SIZE column in OpenBSD top can be less than
>  * the RES column. This is because mmapped areas of virtual memory are
>  * not counted as text, data, or stack, but are counted among the
>  * resident pages once touched. The program maps a (preferably large)
>  * file and then waits for the user to examine the process memory
>  * statistics.
>  */
> 
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <unistd.h>
> 
> int main(int argc, char **argv)
> {
>        volatile char ch;       /* volatile so the reads are not optimized away */
>        char *pch;
>        void *result;
>        FILE *fp;
>        int fd;
>        size_t i;
>        long fileSize;
>        size_t mapSize;
>        int current = 0;
>        const int increment = 10;
>        double percent;
>        double mapRatio;
> 
>        if (argc < 2)
>        {
>                printf("No file name supplied.\n");
>                exit(1);
>        }
> 
>        printf("About to mmap. Press Enter... ");
>        getchar();
>        fp = fopen(argv[1], "r");
>        if (fp == NULL)
>        {
>                perror(NULL);
>                exit(1);
>        }
>        fd = fileno(fp);
> 
>        if (fseek(fp, 0, SEEK_END) == -1)
>        {
>                perror(NULL);
>                exit(1);
>        }
>        /* ftell returns long; keep it signed so the error check works. */
>        fileSize = ftell(fp);
>        if (fileSize == -1)
>        {
>                perror(NULL);
>                exit(1);
>        }
>        mapSize = (size_t)fileSize;
>        if (fseek(fp, 0, SEEK_SET) == -1)
>        {
>                perror(NULL);
>                exit(1);
>        }
>        result = mmap(NULL, mapSize, PROT_READ, MAP_PRIVATE, fd, 0);
>        if (result == MAP_FAILED)
>        {
>                perror(NULL);
>                exit(1);
>        }
>        /* The mapping remains valid after the stream (and fd) is closed. */
>        if (fclose(fp) == EOF)
>        {
>                perror(NULL);
>                exit(1);
>        }
>        printf("%zu bytes mmapped at %p. Press Enter... ", mapSize, result);
>        getchar();
> 
>        pch = (char *)result;
>        printf("Touching mapped memory... ");
>        mapRatio = 100.0 / mapSize;
>        for (i = 0; i < mapSize; i++)
>        {
>                ch = pch[i];
>                percent = (i + 1) * mapRatio;
>                if (current < percent)
>                {
>                        while (current < percent)
>                                current+= increment;
>                        if (current > percent)
>                                current -=increment;
>                        if (current < 100)
>                        {
>                                printf("%d%%... ", current);
>                                fflush(stdout);
>                                current+= increment;
>                        }
>                }
>        }
>        printf("100%%\nRead done. Press Enter... ");
>        getchar();
>        if(munmap(result, mapSize) == -1)
>        {
>                perror(NULL);
>                exit(1);
>        }
>        return 0;
> }
> 
> Anindya
> 
> 
> 
> From: Anindya Mukherjee <[email protected]>
> Sent: January 17, 2021 6:35 PM
> To: Otto Moerbeek <[email protected]>
> Cc: [email protected] <[email protected]>
> Subject: Re: Understanding memory statistics 
>  
> Hi,
> 
> I had a look at the code for top and some of the VM code. I think I have
> a few more answers.
> 
> The easiest one is the calculation for the tot number in top:
> https://github.com/openbsd/src/blob/d098acee57f5a5eacb13200c49034ecb8cbd8c29/usr.bin/top/machine.c#L293
> Here we see that it is calculated as the total page count - free page
> count.
> 
> If we calculate delta = tot - active - inactive - cache - wired, we see
> that it is still not zero. Typically it is a few hundred MB on my
> system. This might be some "dynamic" memory being allocated by the
> kernel? I don't know what that means :)
> 
> I am not totally sure, but from looking at the code, I suspect that the
> SIZE which ultimately comes from struct vmspace does not take into
> account shared memory mappings, or at least not all of them. The text,
> data, and stack sizes are added up here:
> https://github.com/openbsd/src/blob/d098acee57f5a5eacb13200c49034ecb8cbd8c29/usr.bin/top/machine.c#L70
> 
> However, I think the RES parameter also takes into account shared memory
> mappings. This can explain why it is often higher than SIZE,
> particularly for large programs.
> 
> Anindya
> 
> 
> 
> From: Anindya Mukherjee <[email protected]>
> Sent: January 12, 2021 3:22 PM
> To: Otto Moerbeek <[email protected]>
> Cc: [email protected] <[email protected]>
> Subject: Re: Understanding memory statistics 
>  
> Hi Otto,
> 
> Thank you for your kind reply and explanations. They helped me
> understand a few more things. I have some basic familiarity with the
> concepts but not so much with OpenBSD internals, although I have been
> using it. I need to research a bit more, but in my next reply I'll try
> to answer my questions, with some examples.
> 
> I love OpenBSD and can program in C, so I think given time I'll be able
> to make some contributions to it. I have been working on tmux and it's
> been a lot of fun. I got a lot of help and encouragement from Nicholas
> Marriott.
> 
> Best,
> Anindya
> 
> From: Otto Moerbeek <[email protected]>
> Sent: January 10, 2021 11:42 PM
> To: Anindya Mukherjee <[email protected]>
> Cc: [email protected] <[email protected]>
> Subject: Re: Understanding memory statistics 
>  
> On Sun, Jan 10, 2021 at 09:34:49PM +0000, Anindya Mukherjee wrote:
> 
> > Hi, I'm trying to understand the various numbers reported for memory
> > usage from top, vmstat, and systat. I'm running OpenBSD 6.8 on a Dell
> > Optiplex 7040 with an i7 6700, 3.4 Ghz and 32 GB RAM. The GPU is an
> > Intel HD Graphics 530, integrated. Everything is running smoothly. For
> > my own edification, I have a few questions. I searched the mailing lists
> > for similar questions in the past, and found some, but they did not
> > fully satisfy my curiosity.
> > 
> > dmesg reports:
> > real mem = 34201006080 (32616MB)
> > avail mem = 33149427712 (31613MB)
> > I think the difference is due to the GPU reserving some memory.
> 
> That might be, I think it at least includes mem used by the kernel
> for its code and static data.
> 
> > Q: Is there a way to view the total amount of video memory, the amount
> > currently being used, and the GPU usage?
> 
> AFAIK not. Some BIOSes have settings for the video memory used (if it
> is shared with main memory).
> 
> > 
> > When I run top, it reports the following memory usage:
> > Memory: Real: 1497M/4672M act/tot Free: 26G Cache: 2236M Swap: 0K/11G
> > If I sum up the RES numbers for all the processes, it is close to the
> > act number = 1497 M (this is mostly due to Firefox). I read that the
> > cache number is included in tot, but even if I subtract cache and act
> > from tot there is 939 MB left.
> > Q: What is this 939 MB being used for, assuming the above makes sense?
> 
> inactive pages?
> 
> > Q: What is the cache number indicating exactly?
> 
> memory used for file system caching.
> 
> > 
> > If I sum up tot + free * 1024 I get 31296 MB, which is less than the 31613
> > MB of available memory reported by dmesg. I initially assumed that the
> > difference might be kernel wired memory. However the uvm view of systat
> > shows 7514 wired pages = approx 30 MB which is very small.
> > Q: What is the remaining memory being used for?
> 
> I think you are looking at dynamic allocations done by the kernel.
> 
> > Q: What is in kernel wired memory? In particular, is the file system
> > cache in kernel wired memory or in the cache number?
> 
> Kernel wired means data pages allocated by the kernel that will not be
> paged out. The file system memory will also not be paged out (when
> evicting those pages they are discarded if not dirty, or written to the
> file if dirty), but the file system cache pages are not in the wired count.
> 
> > In the man page for systat(1) the active memory is described as being
> > used by processes active in the last 20 seconds (recently), while the
> > total is for all processes. These are the same two numbers as act and
> > tot in top, and act = avm as reported by vmstat. This confused me
> > because adding up the RES sizes of all the processes I get nowhere near
> > to tot (even after subtracting cache).
> 
> Accounting of shared pages is hard and ambiguous. To illustrate: if
> you switch on S in top, you'll see a bunch of kernel space processes all
> at SIZE 0 and the same RES size. They do share the same (kernel)
> memory.
> 
> > 
> > There is another thing that confused me in the top output. At first I
> > assumed that SIZE is the total virtual memory size of the process
> > (allocated), while RES is the resident size. For example, this is so on
> > Linux and hence in that case by definition SIZE should always be greater
> > than RES. However here in many cases SIZE < RES.
> 
> I am unsure how that is caused. It is possibly a shared pages thing.
> 
> > 
> > I read in the man page for top that SIZE is actually text + data + stack
> > for the process. However this did not clear up my confusion or
> > misunderstanding. Perhaps something to do with shared memory not being
> > counted?
> > Q: How can SIZE be less than RES? An example showing how this could
> > happen would be really helpful.
> 
> I suggest doing some experimenting and code analysis, and sharing your findings.
> 
> > 
> > Q: Finally, where can I find documentation about the classification for
> > memory pages (active, inactive, wired, etc.)? I suspect some digging
> > around in the source is in order, but could use some pointers.
> 
> The start would be man uvm_init. But the rest is code.
> 
> > 
> > I hope these make sense and are not too pedantic. Looking forward to
> > comments from the experts, thanks!
> > 
> > Anindya Mukherjee
> > 
> 
>         -Otto
