Re: mlock() issues
On Fri, 22 Oct 2010 15:53:04 -0400 Matthew Mondor wrote:

> Anyway, I like this kind of discussion and have nothing against NIH
> personally (it fuels variety and competition, in fact), so thanks for
> sharing your custom cache experiments and performance numbers. If you
> happen to achieve interesting performance along the above lines with
> mmap(2) as well, I'd also like to know how it went.
>
> Thanks,
> --
> Matt

Hi, the application cache I've developed uses anonymous memory mappings. It defines an abstract data type, mpb_t, which is a multi-page buffer. The cache uses 20 different buffer page sizes (increasing in powers of 2, from 512B to 256M) to provide memory segments for a multi-page buffer. For example, an mpb_t object of size 1.2K would be allocated 1K and 512B buffer pages. An mpb_t object of size 1.8K would be allocated a single 2K buffer page.

I ran some benchmarks to compare the NetBSD kernel file cache and the application cache I've developed. This was run on a dual Pentium 3 1.13GHz, with 2G of RAM.

Kernel file cache test:

    uint64_t time1, time2;
    void *buffer = malloc(8M);
    time1 = get current time;
    for each file under /usr/src {
        open file;
        read file into buffer;
        close file;
    }
    time2 = get current time;
    print time2 - time1;

Application cache test:

    uint64_t time1, time2;
    for each file under /usr/src {
        load file into application cache;
    }
    time1 = get current time;
    for each file in application cache {
        fd = open("/dev/null", ...);
        write(fd, cache_buffer, ...);
        close(fd);
    }
    time2 = get current time;
    print time2 - time1;

To be fair, I kept the number of open/close system calls in each test loop the same. The kernel file cache test was run about 4 times, to make sure all files under /usr/src were loaded into the cache, and then the lowest time difference was taken.

The results are:

Kernel file cache time difference - 15253 msec.
Application cache time difference - 2784 msec.

Copying data from the application cache was about 5.5 times faster.
On Solaris (default installation, i.e. no tuning), the time difference for the kernel file cache test was so huge I didn't even bother recording the results.
Re: mlock() issues
On Fri, 22 Oct 2010 12:06:37 +0100 Sad Clouds wrote:

> Well if you're allocating memory yourself, then you've just created your
> own application cache.

Say many files were mapped in the process's address space; the OS would still be responsible for keeping frequently used pages active, possibly swapping out long-unused ones, unless of course MAP_WIRED was used. A syscall per access, i.e. read(2), would be eliminated however, and I think that zero-copy could be used (with page loaning) when writing 64KB blocks out to a socket from a memory-mapped file.

> On the other hand if you mmap() those files
> directly, what happens if another process truncates some of those files
> while you're reading them?

I didn't do a test (it's definitely worth testing), but I think that a SIGSEGV could occur if a previously available page disappeared, unless MAP_COPY was used, and the file would need to be remapped. I could see a problem there: the siginfo-provided address would need to be matched with the file efficiently, so that the process knows which file to remap... and for many files the current kqueue(2) EVFILT_VNODE isn't very useful either for detecting that a file was recently modified, as it'd require too many open file descriptors :(

There was some discussion years ago about a kqueue(2) filter that could be set on a directory (possibly on the whole involved filesystem, for the superuser), under which any modified file would generate an event identifying it by inode, but this seems non-trivial and hasn't been implemented yet. There are also issues with inode-to-filename lookup (multiple names could point to a common inode, and a reverse name cache is needed).

Anyway, I like this kind of discussion and have nothing against NIH personally (it fuels variety and competition, in fact), so thanks for sharing your custom cache experiments and performance numbers.
If you happen to achieve interesting performance along the above lines with mmap(2) as well, I'd also like to know how it went.

Thanks,
--
Matt
Re: mlock() issues
On Fri, 22 Oct 2010 05:54:48 -0400 Matthew Mondor wrote:

> On Fri, 22 Oct 2010 10:18:52 +0100
> Sad Clouds wrote:
>
> > A pipelined request, say for 10 small files, can be served with a
> > single writev() system call (provided those files are cached in
> > RAM); if you rely on the kernel file cache, you need to issue 10
> > write() system calls.
>
> Is this also true if the 10 iovecs point to mmap(2)ed files/buffers
> whose pages were recently accessed?

Well if you're allocating memory yourself, then you've just created your own application cache. On the other hand, if you mmap() those files directly, what happens if another process truncates some of those files while you're reading them?
Re: mlock() issues
On Fri, 22 Oct 2010 10:18:52 +0100 Sad Clouds wrote:

> A pipelined request, say for 10 small files, can be served with a single
> writev() system call (provided those files are cached in RAM); if you
> rely on the kernel file cache, you need to issue 10 write() system calls.

Is this also true if the 10 iovecs point to mmap(2)ed files/buffers whose pages were recently accessed?
--
Matt
Re: mlock() issues
On Fri, 22 Oct 2010 08:13:34 +0200 Michael van Elst wrote:

> On Thu, Oct 21, 2010 at 10:40:15PM +0100, Sad Clouds wrote:
>
> > I do realise this reinvents kernel file cache, but it gives you a
> > lot more flexibility over what files get cached in memory and you
> > can plug custom algorithms over how files get evicted from cache.
>
> NIH is the driving force for many such decisions.

You make it sound like it's a really bad thing. My opinion - it's good to invent or even reinvent, because sometimes a "one wheel fits all" solution is not as optimal or flexible as a custom-made solution.

For example, take the HTTP protocol, which allows file requests to be pipelined. A pipelined request, say for 10 small files, can be served with a single writev() system call (provided those files are cached in RAM); if you rely on the kernel file cache, you need to issue 10 write() system calls.

I ran some simple benchmarks and they showed that on NetBSD, copying data from an application file cache was 2 to 4 times faster than relying on the kernel file cache. On Linux, copying data from an application file cache was 35 times faster than using sendfile(). This result looks a bit bogus, but I ran it a few times and got the same results...

Also, as far as I know the only way to tell if the kernel cache has a file cached in memory is to call the mincore() system call, which is expensive. With an application cache that locks file pages, a simple hash table lookup will indicate whether a file is present in memory.
Re: mlock() issues
On Thu, Oct 21, 2010 at 10:40:15PM +0100, Sad Clouds wrote:

> I do realise this reinvents kernel file cache, but it gives you a lot
> more flexibility over what files get cached in memory and you can plug
> custom algorithms over how files get evicted from cache.

NIH is the driving force for many such decisions.
--
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."
Re: mlock() issues
On Thu, 21 Oct 2010 19:13:13 +0100 David Laight wrote:

> A non-root user can then increase its own limit to 1/3 physmem, and
> root can change its own 'hard' and 'soft' limits to any value it
> cares.

I think for some applications, having control over locking the entire physical memory can be a significant advantage. For example, for a network file server you can design a caching subsystem which caches frequently accessed files and locks them in memory. On Linux or Solaris, you can mmap() those memory segments with larger pages to reduce TLB misses.

The main benefit of locking is that it guarantees those memory pages have not been paged out to disk, so your main threads never need to block. Any access to files not in your cache is done asynchronously, via a threaded I/O subsystem.

I do realise this reinvents the kernel file cache, but it gives you a lot more flexibility over what files get cached in memory, and you can plug in custom algorithms for how files get evicted from the cache.
Re: mlock() issues
On Wed, Oct 20, 2010 at 11:17:17PM +0100, Sad Clouds wrote:

> On Thu, 21 Oct 2010 00:02:53 +0200
> Michael van Elst wrote:
>
> > The UVM limit is global to all processes, obviously there should
> > be such a limit to keep the system provided with unlocked pages.
> >
> > You could probably make this configurable, but so far nobody had
> > the need to lock a large part of memory and to adjust that limit.
>
> Well I accept that you need some unlocked pages to keep things running,
> however hardcoding the limit to 1/3 is a bit extreme. I thought this
> was the whole point of sysctl/rlimit settings, i.e. I'm running as
> root, I know what I'm doing, if I want to lock 95% of physical memory,
> then let me do it.
>
> Anyway, thanks for demystifying the issue.

As always, the rlimit values are fubar. For a normal user the 'hard' limit should (well, to match current expectations) be set to physmem/3, with the 'soft' limit probably set to a relatively small value so that mistakes are detected soon. A non-root user can then increase its own limit to 1/3 physmem, and root can change its own 'hard' and 'soft' limits to any value it cares to.

Even for the simple case of fds, the default 'hard' limit is far too big (it matches an internal kernel limit).

David
--
David Laight: da...@l8s.co.uk
Re: mlock() issues
On Thu, 21 Oct 2010 00:02:53 +0200 Michael van Elst wrote:

> The UVM limit is global to all processes, obviously there should
> be such a limit to keep the system provided with unlocked pages.
>
> You could probably make this configurable, but so far nobody had
> the need to lock a large part of memory and to adjust that limit.

Well I accept that you need some unlocked pages to keep things running; however, hardcoding the limit to 1/3 is a bit extreme. I thought this was the whole point of sysctl/rlimit settings, i.e. I'm running as root, I know what I'm doing, if I want to lock 95% of physical memory, then let me do it.

Anyway, thanks for demystifying the issue.
Re: mlock() issues
On Wed, Oct 20, 2010 at 10:55:46PM +0100, Sad Clouds wrote:

> On Wed, 20 Oct 2010 20:06:41 +0000 (UTC)
> mlel...@serpens.de (Michael van Elst) wrote:
>
> > The soft rlimit and the UVM limit happen to be the same size,
> > which is one third of the real memory.
> >
> > uvm_pdaemon.c:
> >     uvmexp.wiredmax = uvmexp.npages / 3;
> >
> > kern_proc.c:
> >     lim = MIN(VM_MAXUSER_ADDRESS, ctob((rlim_t)uvmexp.free));
> >     ...
> >     limit0.pl_rlimit[RLIMIT_MEMLOCK].rlim_cur = lim / 3;
>
> OK, if I understand you correctly, there is a hardcoded limit on how
> much memory can be locked, it is set to 1/3 of total memory and cannot
> be changed.
>
> If that's the case, then why supplement the soft and hard rlimits with
> a UVM limit?

The UVM limit is global to all processes; obviously there should be such a limit to keep the system provided with unlocked pages.

You could probably make this configurable, but so far nobody had the need to lock a large part of memory and to adjust that limit.
--
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."
Re: mlock() issues
On Wed, 20 Oct 2010 20:06:41 +0000 (UTC) mlel...@serpens.de (Michael van Elst) wrote:

> The soft rlimit and the UVM limit happen to be the same size,
> which is one third of the real memory.
>
> uvm_pdaemon.c:
>     uvmexp.wiredmax = uvmexp.npages / 3;
>
> kern_proc.c:
>     lim = MIN(VM_MAXUSER_ADDRESS, ctob((rlim_t)uvmexp.free));
>     ...
>     limit0.pl_rlimit[RLIMIT_MEMLOCK].rlim_cur = lim / 3;

OK, if I understand you correctly, there is a hardcoded limit on how much memory can be locked; it is set to 1/3 of total memory and cannot be changed.

If that's the case, then why supplement the soft and hard rlimits with a UVM limit?
Re: mlock() issues
cryintotheblue...@googlemail.com (Sad Clouds) writes:

> Hi, I've been trying to figure out why it's not possible to lock more
> than 666MB of memory, and I'm beginning to think it might be a kernel
> issue.
>
> This is what I'm doing:
>
> Run program as root.
> Lock only memory segments that are multiples of system page size.
> ulimit -l is set to unlimited.
> proc.curproc.rlimit.memorylocked.soft = 697976149
> proc.curproc.rlimit.memorylocked.hard = 2093928448
>
> With all of the above set, for some reason it's not possible to lock
> more than 666MB.

That's what your soft limit is set to. On my -current/amd64 system I have:

    memorylocked 2704186 kbytes
    proc.curproc.rlimit.memorylocked.soft = 2769087146
    proc.curproc.rlimit.memorylocked.hard = 8307261440

and a program running under mlockall(MCL_CURRENT|MCL_FUTURE) can allocate about 2.6GB. So far, so good. However, when I set the limit to 6GB (6144m):

    memorylocked 6291456 kbytes
    proc.curproc.rlimit.memorylocked.soft = 6442450944
    proc.curproc.rlimit.memorylocked.hard = 8307261440

this has no effect and the program can still only allocate 2.6GB. The reason for this is that there is a global UVM limit:

    % vmstat -s | grep wired
        3177 pages wired
      676717 maximum wired pages

The soft rlimit and the UVM limit happen to be the same size, which is one third of the real memory.

uvm_pdaemon.c:
    uvmexp.wiredmax = uvmexp.npages / 3;

kern_proc.c:
    lim = MIN(VM_MAXUSER_ADDRESS, ctob((rlim_t)uvmexp.free));
    ...
    limit0.pl_rlimit[RLIMIT_MEMLOCK].rlim_cur = lim / 3;
--
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."
Re: mlock() issues
In article <20101020182953.752bfd63.cryintotheblue...@googlemail.com>, Sad Clouds wrote:

> Hi, I've been trying to figure out why it's not possible to lock more
> than 666MB of memory, and I'm beginning to think it might be a kernel
> issue.
>
> This is what I'm doing:
>
> Run program as root.
> Lock only memory segments that are multiples of system page size.
> ulimit -l is set to unlimited.
> proc.curproc.rlimit.memorylocked.soft = 697976149
> proc.curproc.rlimit.memorylocked.hard = 2093928448
>
> With all of the above set, for some reason it's not possible to lock
> more than 666MB.
>
> Can anyone shed some light on this?

ulimit -S -l 2093928448

christos
Re: mlock() issues
> proc.curproc.rlimit.memorylocked.soft = 697976149

> With all of the above set, for some reason it's not possible to lock
> more than 666MB.

Well, 697976149 bytes is 665.6419+ MB, so it sounds to me as though it's doing exactly what it should be. Unless you're a disk manufacturer, in which case 697976149 bytes is 697+ "MB", but I suspect you're locking 666 MB, not 666 "MB".

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: mlock() issues
On Wed, Oct 20, 2010 at 06:29:53PM +0100, Sad Clouds wrote:

> Hi, I've been trying to figure out why it's not possible to lock more
> than 666MB of memory, and I'm beginning to think it might be a kernel
> issue.
>
> This is what I'm doing:
>
> Run program as root.
> Lock only memory segments that are multiples of system page size.
> ulimit -l is set to unlimited.
> proc.curproc.rlimit.memorylocked.soft = 697976149
> proc.curproc.rlimit.memorylocked.hard = 2093928448
>
> With all of the above set, for some reason it's not possible to lock
> more than 666MB.
>
> Can anyone shed some light on this?

IIRC:

- Those two (soft/hard) numbers are calculated at process 0 initialization.
- setrlimit(2) doesn't seem to handle RLIMIT_MEMLOCK.
- mlock(2)/mlockall(2) refer to the "soft" counterpart.

What if you change the "proc.curproc.rlimit.memorylocked.soft" sysctl value from within the program?
mlock() issues
Hi, I've been trying to figure out why it's not possible to lock more than 666MB of memory, and I'm beginning to think it might be a kernel issue.

This is what I'm doing:

Run program as root.
Lock only memory segments that are multiples of system page size.
ulimit -l is set to unlimited.
proc.curproc.rlimit.memorylocked.soft = 697976149
proc.curproc.rlimit.memorylocked.hard = 2093928448

With all of the above set, for some reason it's not possible to lock more than 666MB.

Can anyone shed some light on this?