Re: mlock() issues

2010-10-24 Thread Sad Clouds
On Fri, 22 Oct 2010 15:53:04 -0400
Matthew Mondor  wrote:

> Anyway, I like this kind of discussion and have nothing against NIH
> personally (it fuels variety and competition, in fact), so thanks for
> sharing your custom cache experiments and performance numbers.  If you
> happen to achieve interesting performance along the above
> lines with mmap(2) as well, I'd also like to know how it went.
> 
> Thanks,
> -- 
> Matt

Hi, the application cache I've developed uses anonymous memory
mappings. It defines an abstract data type, mpb_t, which is a
multi-page buffer. The cache uses 20 different buffer page sizes
(increasing in powers of 2, from 512B to 256M) in order to provide
memory segments for multi-page buffers.

For example, an mpb_t object of size 1.2K would be allocated one 1K
and one 512B buffer page. An mpb_t object of size 1.8K would be
allocated a single 2K buffer page.
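
A minimal sketch of one allocation policy consistent with both
examples (round the request up to the 512B granularity, then split the
rounded size into its power-of-2 components); the mpb_t internals were
not posted, so the function and constants below are hypothetical:

#include <stddef.h>
#include <stdio.h>

#define MPB_MIN_SHIFT 9     /* 512B minimum buffer page */
#define MPB_MAX_SHIFT 28    /* 256M maximum buffer page */

/*
 * Round the request up to the 512B granularity, then emit one buffer
 * page per set bit of the rounded size, largest first.  1.2K rounds
 * up to 1.5K = 1K + 512B (two pages); 1.8K rounds up to 2K (one page).
 */
static void
mpb_decompose(size_t size)
{
    int shift;

    size = (size + 511) & ~(size_t)511;
    for (shift = MPB_MAX_SHIFT; shift >= MPB_MIN_SHIFT; shift--) {
        size_t page = (size_t)1 << shift;
        if (size & page)
            printf("allocate %zu byte buffer page\n", page);
    }
}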

I ran some benchmarks to compare the NetBSD kernel file cache with the
application cache I've developed. This was run on a dual Pentium 3 at
1.13GHz, with 2G of RAM.


Kernel file cache test:

uint64_t time1, time2;
void *buffer = malloc(8M);

time1 = get current time;
for each file under /usr/src
{
    open file;
    read file into buffer;
    close file;
}
time2 = get current time;
print time2 - time1;
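
For reference, a runnable approximation of this loop is sketched below
(the tree walk uses nftw(3), which the pseudocode above leaves
unspecified; error handling is minimal and the 8M buffer size is taken
from the pseudocode):

#include <sys/stat.h>
#include <sys/time.h>
#include <fcntl.h>
#include <ftw.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUFSIZE (8 * 1024 * 1024)

static char *buffer;

static uint64_t
msec_now(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000 + tv.tv_usec / 1000;
}

/* nftw(3) callback: read one regular file through the kernel cache. */
static int
read_file(const char *path, const struct stat *sb, int type, struct FTW *ftwb)
{
    if (type == FTW_F) {
        int fd = open(path, O_RDONLY);
        if (fd != -1) {
            while (read(fd, buffer, BUFSIZE) > 0)
                continue;
            close(fd);
        }
    }
    return 0;
}

int
main(void)
{
    uint64_t time1, time2;

    if ((buffer = malloc(BUFSIZE)) == NULL)
        return 1;
    time1 = msec_now();
    nftw("/usr/src", read_file, 64, FTW_PHYS);
    time2 = msec_now();
    printf("%llu msec\n", (unsigned long long)(time2 - time1));
    return 0;
}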



Application cache test:

uint64_t time1, time2;

for each file under /usr/src
{
    load file into application cache;
}

time1 = get current time;
for each file in application cache
{
    fd = open("/dev/null", ...);
    write(fd, cache_buffer, ...);
    close(fd);
}
time2 = get current time;
print time2 - time1;


In order to be fair, I kept the number of open/close system calls in
each test loop the same. The kernel file cache test was run about 4
times, to make sure all files under /usr/src were loaded into the
cache, and then the lowest time difference was taken.

The results are:

Kernel file cache time difference - 15253 msec.
Application cache time difference - 2784 msec.

Copying data from the application cache was about 5.5 times faster. On
Solaris (default installation, i.e. no tuning) the time difference for
the kernel file cache test was so huge that I didn't even bother
writing down the results.


Re: mlock() issues

2010-10-22 Thread Matthew Mondor
On Fri, 22 Oct 2010 12:06:37 +0100
Sad Clouds  wrote:

> Well if you're allocating memory yourself, then you've just created your
> own application cache.

Say many files were mapped in the process's address space; the OS
would still be responsible for keeping frequently used pages active,
possibly swapping out long-unused ones, unless of course MAP_WIRED was
used.  A syscall per access, i.e. read(2), would be eliminated
however, and I think that zero-copy could be used (with page loaning)
when writing 64KB blocks out to a socket from a memory-mapped file.
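
A minimal sketch of that approach (serve a file straight from its
mapping, no read(2) per access; error handling is omitted, and whether
the kernel actually loans pages for the write(2) is up to the
implementation):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Serve one file from an mmap(2)ed region: no read(2) per access,
 * and write(2) works straight out of the mapping.  On NetBSD,
 * MAP_WIRED could be added to the flags to wire the pages, as
 * mentioned above.
 */
static void
serve_mapped(int out_fd, const char *path)
{
    struct stat sb;
    char *p;
    off_t off;
    int fd;

    fd = open(path, O_RDONLY);
    fstat(fd, &sb);
    p = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping survives the close */

    for (off = 0; off < sb.st_size; off += 64 * 1024) {
        size_t n = (size_t)(sb.st_size - off);
        if (n > 64 * 1024)
            n = 64 * 1024;
        write(out_fd, p + off, n);  /* 64KB blocks, as above */
    }
    munmap(p, (size_t)sb.st_size);
}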

> On the other hand if you mmap() those files
> directly, what happens if another process truncates some of those files
> while you're reading them?

I didn't do a test (it's definitely worth testing), but I think that a
SIGSEGV could occur if a previously available page disappeared, unless
MAP_COPY was used, and the file would need to be remapped.

I could see a problem where a siginfo-provided address would need to
be efficiently matched with the file, so that the process knows which
file to remap...  and for many files the current kqueue(2)
EVFILT_VNODE isn't very useful for detecting that a file was recently
modified either, as it'd require too many open file descriptors :(
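
A minimal sketch of the matching half, under the assumption that the
server keeps a lookup structure over its mappings; lookup_mapping()
and schedule_remap() are hypothetical helpers, and depending on the
platform the fault may arrive as SIGBUS rather than SIGSEGV, so both
are caught:

#include <signal.h>

struct mapping;                                 /* hypothetical */
extern struct mapping *lookup_mapping(void *);  /* hypothetical */
extern void schedule_remap(struct mapping *);   /* hypothetical */

/*
 * Match the faulting address (si_addr) against the process's table
 * of file mappings so the right file can be remapped.  Real work in
 * a handler is constrained by async-signal-safety, so a production
 * server would only flag the mapping here and remap elsewhere.
 */
static void
fault_handler(int sig, siginfo_t *si, void *ctx)
{
    struct mapping *m = lookup_mapping(si->si_addr);

    if (m != NULL)
        schedule_remap(m);
    /* else: a genuine crash; fall through to default handling */
}

static void
install_fault_handler(void)
{
    struct sigaction sa;

    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, NULL);   /* truncation faults are often SIGBUS */
    sigaction(SIGSEGV, &sa, NULL);
}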

There was some discussion made years ago about a kqueue(2) filter that
could be set on a directory under which any modified file (possibly for
the whole involved filesystem for the superuser) would generate an
event with information about which file is modified by inode, but this
seems non-trivial and wasn't yet implemented.  There also are issues
with inode to file string lookup (multiple files could point to a
common destination, and a reverse name cache is needed).

Anyway, I like this kind of discussion and have nothing against NIH
personally (it fuels variety and competition, in fact), so thanks for
sharing your custom cache experiments and performance numbers.  If you
happen to achieve interesting performance along the above
lines with mmap(2) as well, I'd also like to know how it went.

Thanks,
-- 
Matt


Re: mlock() issues

2010-10-22 Thread Sad Clouds
On Fri, 22 Oct 2010 05:54:48 -0400
Matthew Mondor  wrote:

> On Fri, 22 Oct 2010 10:18:52 +0100
> Sad Clouds  wrote:
> 
> > A pipelined request, say for 10 small files, can be served with a
> > single writev() system call (provided those files are cached in
> > RAM); if you rely on the kernel file cache, you need to issue 10
> > write() system calls.
> 
> Is this also true if the 10 iovecs point to mmap(2)ed files/buffers
> whose pages were recently accessed?

Well if you're allocating memory yourself, then you've just created your
own application cache. On the other hand if you mmap() those files
directly, what happens if another process truncates some of those files
while you're reading them?


Re: mlock() issues

2010-10-22 Thread Matthew Mondor
On Fri, 22 Oct 2010 10:18:52 +0100
Sad Clouds  wrote:

> A pipelined request, say for 10 small files, can be served with a single
> writev() system call (provided those files are cached in RAM); if you
> rely on the kernel file cache, you need to issue 10 write() system calls.

Is this also true if the 10 iovecs point to mmap(2)ed files/buffers
whose pages were recently accessed?
-- 
Matt


Re: mlock() issues

2010-10-22 Thread Sad Clouds
On Fri, 22 Oct 2010 08:13:34 +0200
Michael van Elst  wrote:

> On Thu, Oct 21, 2010 at 10:40:15PM +0100, Sad Clouds wrote:
> 
> > I do realise this reinvents the kernel file cache, but it gives you
> > a lot more flexibility over what files get cached in memory, and you
> > can plug in custom algorithms for how files get evicted from the
> > cache.
> 
> NIH is the driving force for many such decisions.

You make it sound like it's a really bad thing. My opinion - it's good
to invent or even reinvent, because sometimes a "one wheel fits all"
solution is not as optimal or flexible as a custom-made solution. For
example, take the HTTP protocol, which allows file requests to be
pipelined. A pipelined request, say for 10 small files, can be served
with a single writev() system call (provided those files are cached in
RAM); if you rely on the kernel file cache, you need to issue 10
write() system calls.
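
A minimal sketch of that single-call path, assuming the files are
already resident in the application cache (cache_lookup() is a
hypothetical lookup returning the locked in-memory buffer for a file):

#include <sys/uio.h>
#include <unistd.h>

struct cache_entry {
    void   *buf;
    size_t  len;
};
extern struct cache_entry *cache_lookup(const char *);  /* hypothetical */

/* Answer a pipeline of up to 10 small requests with one writev(2). */
static ssize_t
serve_pipeline(int sock, const char *names[], int n)
{
    struct iovec iov[10];
    int i;

    for (i = 0; i < n && i < 10; i++) {
        struct cache_entry *e = cache_lookup(names[i]);
        iov[i].iov_base = e->buf;
        iov[i].iov_len = e->len;
    }
    /* One system call instead of one write(2) per file. */
    return writev(sock, iov, i);
}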

I ran some simple benchmarks and they showed that on NetBSD, copying
data from the application file cache was 2 to 4 times faster than
relying on the kernel file cache.

On Linux, copying data from the application file cache was 35 times
faster than using sendfile(). This result looks a bit bogus, but I ran
it a few times and got the same results...

Also, as far as I know, the only way to tell if the kernel has a file
cached in memory is to call the mincore() system call, which is
expensive.
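
For reference, a sketch of that check; note the file must already be
mmap(2)ed, and one vector byte is filled in per page, which is part of
what makes this an expensive way to answer "is this file cached?":

#include <sys/mman.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Ask mincore(2) which pages of an existing mapping are resident.
 * Returns 1 if all pages are in core, 0 if not, -1 on error.
 */
static int
pages_resident(void *addr, size_t len)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    size_t i, npages = (len + pagesize - 1) / pagesize;
    char *vec = malloc(npages);
    int resident = 1;

    if (vec == NULL || mincore(addr, len, vec) == -1) {
        free(vec);
        return -1;
    }
    for (i = 0; i < npages; i++) {
        if ((vec[i] & 1) == 0) {
            resident = 0;
            break;
        }
    }
    free(vec);
    return resident;
}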

With an application cache that locks file pages, a simple hash table
lookup will indicate whether the file is present in memory.


Re: mlock() issues

2010-10-21 Thread Michael van Elst
On Thu, Oct 21, 2010 at 10:40:15PM +0100, Sad Clouds wrote:

> I do realise this reinvents the kernel file cache, but it gives you a
> lot more flexibility over what files get cached in memory, and you can
> plug in custom algorithms for how files get evicted from the cache.

NIH is the driving force for many such decisions.


-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: mlock() issues

2010-10-21 Thread Sad Clouds
On Thu, 21 Oct 2010 19:13:13 +0100
David Laight  wrote:

> A non-root user can then increase its own limit to 1/3 physmem, and
> root can change its own 'hard' and 'soft' limits to any value it
> cares.

I think for some applications, having control over locking the entire
physical memory can be a significant advantage. For example, for a
network file server you can design a caching subsystem which caches
frequently accessed files and locks them in memory. On Linux or
Solaris, you can mmap() those memory segments with larger pages to
reduce TLB misses.
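
A minimal sketch of the Linux variant; MAP_HUGETLB is Linux-specific
and needs hugepages reserved by the administrator (e.g. via
vm.nr_hugepages), and Solaris would achieve the same effect through
memcntl(2) instead:

#define _GNU_SOURCE
#include <sys/mman.h>
#include <stddef.h>

/*
 * Back a cache segment with huge pages to reduce TLB misses, then
 * wire it with mlock(2).  mlock() is subject to RLIMIT_MEMLOCK,
 * which is the limit under discussion in this thread.
 */
static void *
alloc_wired_segment(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    if (mlock(p, len) == -1) {
        munmap(p, len);
        return NULL;
    }
    return p;
}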

The main benefit of locking is that it guarantees those memory pages have
not been flushed to disk, so your main threads never need to block. Any
access to files not in your cache is done asynchronously, via a
threaded I/O subsystem.

I do realise this reinvents the kernel file cache, but it gives you a
lot more flexibility over what files get cached in memory, and you can
plug in custom algorithms for how files get evicted from the cache.


Re: mlock() issues

2010-10-21 Thread David Laight
On Wed, Oct 20, 2010 at 11:17:17PM +0100, Sad Clouds wrote:
> On Thu, 21 Oct 2010 00:02:53 +0200
> Michael van Elst  wrote:
> 
> > The UVM limit is global to all processes; obviously there should
> > be such a limit to keep the system provided with unlocked pages.
> > 
> > You could probably make this configurable, but so far nobody has
> > had the need to lock a large part of memory and to adjust that
> > limit.
> 
> Well, I accept that you need some unlocked pages to keep things
> running; however, hardcoding the limit to 1/3 is a bit extreme. I
> thought this was the whole point of sysctl/rlimit settings, i.e. I'm
> running as root, I know what I'm doing, if I want to lock 95% of
> physical memory, then let me do it.
> 
> Anyway, thanks for demystifying the issue.

As always, the rlimit values are fubar.
For a normal user the 'hard' limit should (well, to match current
expectations) be set to physmem/3, with the 'soft' limit probably
set to a relatively small value so that mistakes are detected soon.

A non-root user can then increase its own limit to 1/3 of physmem, and
root can change its own 'hard' and 'soft' limits to any values it
cares to.
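
For reference, a sketch of the usual getrlimit/setrlimit dance for
raising the soft limit up to the hard limit from within a process
(note the observation elsewhere in this thread that NetBSD's
setrlimit(2) may not actually apply RLIMIT_MEMLOCK changes):

#include <sys/resource.h>

/*
 * Raise the RLIMIT_MEMLOCK soft limit to the hard limit.  A non-root
 * process can only go up to rlim_max; root may also raise rlim_max.
 */
static int
raise_memlock_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) == -1)
        return -1;
    rl.rlim_cur = rl.rlim_max;      /* soft := hard */
    return setrlimit(RLIMIT_MEMLOCK, &rl);
}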

Even for the simple case of fds, the default 'hard' limit is far too
big (it matches an internal kernel limit).

David

-- 
David Laight: da...@l8s.co.uk


Re: mlock() issues

2010-10-20 Thread Sad Clouds
On Thu, 21 Oct 2010 00:02:53 +0200
Michael van Elst  wrote:

> The UVM limit is global to all processes; obviously there should
> be such a limit to keep the system provided with unlocked pages.
> 
> You could probably make this configurable, but so far nobody has had
> the need to lock a large part of memory and to adjust that limit.

Well, I accept that you need some unlocked pages to keep things
running; however, hardcoding the limit to 1/3 is a bit extreme. I
thought this was the whole point of sysctl/rlimit settings, i.e. I'm
running as root, I know what I'm doing, if I want to lock 95% of
physical memory, then let me do it.

Anyway, thanks for demystifying the issue.


Re: mlock() issues

2010-10-20 Thread Michael van Elst
On Wed, Oct 20, 2010 at 10:55:46PM +0100, Sad Clouds wrote:
> On Wed, 20 Oct 2010 20:06:41 + (UTC)
> mlel...@serpens.de (Michael van Elst) wrote:
> 
> > The soft rlimit and the UVM limit happen to be the same size,
> > which is one third of the real memory.
> > 
> > uvm_pdaemon.c:
> >uvmexp.wiredmax = uvmexp.npages / 3;
> > 
> > kern_proc.c:
> >lim = MIN(VM_MAXUSER_ADDRESS, ctob((rlim_t)uvmexp.free));
> >...
> >limit0.pl_rlimit[RLIMIT_MEMLOCK].rlim_cur = lim / 3;
> 
> OK, if I understand you correctly, there is a hardcoded limit on how
> much memory can be locked; it is set to 1/3 of total memory and cannot
> be changed.
> 
> If that's the case, then why supplement the soft and hard rlimits with
> the UVM limit?


The UVM limit is global to all processes; obviously there should
be such a limit to keep the system provided with unlocked pages.

You could probably make this configurable, but so far nobody has had
the need to lock a large part of memory and to adjust that limit.


-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: mlock() issues

2010-10-20 Thread Sad Clouds
On Wed, 20 Oct 2010 20:06:41 + (UTC)
mlel...@serpens.de (Michael van Elst) wrote:

> The soft rlimit and the UVM limit happen to be the same size,
> which is one third of the real memory.
> 
> uvm_pdaemon.c:
>uvmexp.wiredmax = uvmexp.npages / 3;
> 
> kern_proc.c:
>lim = MIN(VM_MAXUSER_ADDRESS, ctob((rlim_t)uvmexp.free));
>...
>limit0.pl_rlimit[RLIMIT_MEMLOCK].rlim_cur = lim / 3;

OK, if I understand you correctly, there is a hardcoded limit on how
much memory can be locked; it is set to 1/3 of total memory and cannot
be changed.

If that's the case, then why supplement the soft and hard rlimits with
the UVM limit?


Re: mlock() issues

2010-10-20 Thread Michael van Elst
cryintotheblue...@googlemail.com (Sad Clouds) writes:

>Hi, I've been trying to figure out why it's not possible to lock more
>than 666MB of memory, and I'm beginning to think it might be a kernel
>issue.

>This is what I'm doing:

>Run program as root.
>Lock only memory segments that are multiples of system page size.
>ulimit -l is set to unlimited.
>proc.curproc.rlimit.memorylocked.soft = 697976149
>proc.curproc.rlimit.memorylocked.hard = 2093928448

>With all of the above set, for some reason it's not possible to lock
>more than 666MB.

That's what your soft limit is set to.

On my -current/amd64 system I have:

memorylocked 2704186 kbytes
proc.curproc.rlimit.memorylocked.soft = 2769087146
proc.curproc.rlimit.memorylocked.hard = 8307261440

and a program running under mlockall(MCL_CURRENT|MCL_FUTURE)
can allocate about 2.6GB. So far, so good.
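
A sketch of such a test program (allocate and touch memory until the
wired limit is hit; with MCL_FUTURE set, new allocations fail once
the kernel refuses to wire more pages, though the exact failure mode
may vary):

#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
    size_t total = 0;

    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
        perror("mlockall");
        return 1;
    }
    for (;;) {
        void *p = malloc(1024 * 1024);
        if (p == NULL)
            break;
        memset(p, 0, 1024 * 1024);  /* touch so the pages are wired */
        total += 1024 * 1024;
    }
    printf("locked about %zu MB\n", total / (1024 * 1024));
    return 0;
}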

However, when I set the limit to 6GB (6144m):

memorylocked 6291456 kbytes
proc.curproc.rlimit.memorylocked.soft = 6442450944
proc.curproc.rlimit.memorylocked.hard = 8307261440

this has no effect and the program can still only allocate 2.6GB.

The reason for this is that there is a global UVM limit:

% vmstat -s|grep wired
 3177 pages wired
   676717 maximum wired pages

The soft rlimit and the UVM limit happen to be the same size,
which is one third of the real memory.

uvm_pdaemon.c:
   uvmexp.wiredmax = uvmexp.npages / 3;

kern_proc.c:
   lim = MIN(VM_MAXUSER_ADDRESS, ctob((rlim_t)uvmexp.free));
   ...
   limit0.pl_rlimit[RLIMIT_MEMLOCK].rlim_cur = lim / 3;

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: mlock() issues

2010-10-20 Thread Christos Zoulas
In article <20101020182953.752bfd63.cryintotheblue...@googlemail.com>,
Sad Clouds   wrote:
>Hi, I've been trying to figure out why it's not possible to lock more
>than 666MB of memory, and I'm beginning to think it might be a kernel
>issue.
>
>This is what I'm doing:
>
>Run program as root.
>Lock only memory segments that are multiples of system page size.
>ulimit -l is set to unlimited.
>proc.curproc.rlimit.memorylocked.soft = 697976149
>proc.curproc.rlimit.memorylocked.hard = 2093928448
>
>With all of the above set, for some reason it's not possible to lock
>more than 666MB.
>
>Can anyone shed some light on this?
>

ulimit -S -l 2093928448

christos



Re: mlock() issues

2010-10-20 Thread der Mouse
> proc.curproc.rlimit.memorylocked.soft = 697976149

> With all of the above set, for some reason it's not possible to lock
> more than 666MB.

Well, 697976149 bytes is 665.6419+ MB, so it sounds to me as though
it's doing exactly what it should be.

Unless you're a disk manufacturer, in which case 697976149 bytes is
697+ "MB", but I suspect you're locking 666 MB, not 666 "MB".

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!              7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: mlock() issues

2010-10-20 Thread Masao Uebayashi
On Wed, Oct 20, 2010 at 06:29:53PM +0100, Sad Clouds wrote:
> Hi, I've been trying to figure out why it's not possible to lock more
> than 666MB of memory, and I'm beginning to think it might be a kernel
> issue.
> 
> This is what I'm doing:
> 
> Run program as root.
> Lock only memory segments that are multiples of system page size.
> ulimit -l is set to unlimited.
> proc.curproc.rlimit.memorylocked.soft = 697976149
> proc.curproc.rlimit.memorylocked.hard = 2093928448
> 
> With all of the above set, for some reason it's not possible to lock
> more than 666MB.
> 
> Can anyone shed some light on this?

IIRC:

- Those two (soft/hard) numbers are calculated at process 0
  initialization.

- setrlimit(2) doesn't seem to handle RLIMIT_MEMLOCK.

- mlock(2)/mlockall(2) refer to the "soft" counterpart.

What if you change the "proc.curproc.rlimit.memorylocked.soft" sysctl
value from within the program?
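
A sketch of that suggestion, assuming NetBSD's sysctlbyname(3) accepts
a write to this node and that the value is a 64-bit quantity (worth
checking against sysctl(7) before relying on it):

#include <sys/param.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>

/* Raise this process's own memorylocked soft limit via sysctl. */
static int
raise_soft_memlock(int64_t newlim)
{
    if (sysctlbyname("proc.curproc.rlimit.memorylocked.soft",
        NULL, NULL, &newlim, sizeof(newlim)) == -1) {
        perror("sysctlbyname");
        return -1;
    }
    return 0;
}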


mlock() issues

2010-10-20 Thread Sad Clouds
Hi, I've been trying to figure out why it's not possible to lock more
than 666MB of memory, and I'm beginning to think it might be a kernel
issue.

This is what I'm doing:

Run program as root.
Lock only memory segments that are multiples of system page size.
ulimit -l is set to unlimited.
proc.curproc.rlimit.memorylocked.soft = 697976149
proc.curproc.rlimit.memorylocked.hard = 2093928448

With all of the above set, for some reason it's not possible to lock
more than 666MB.

Can anyone shed some light on this?