Re: problems with mmap() and disk caching

2012-04-09 Thread Andrey Zonov

On 06.04.2012 12:13, Konstantin Belousov wrote:

On Thu, Apr 05, 2012 at 11:54:53PM +0400, Andrey Zonov wrote:

On 05.04.2012 23:41, Konstantin Belousov wrote:

On Thu, Apr 05, 2012 at 11:33:46PM +0400, Andrey Zonov wrote:

On 05.04.2012 19:54, Alan Cox wrote:

On 04/04/2012 02:17, Konstantin Belousov wrote:

On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

[snip]

This is what I expect. But why doesn't this work without reading the
file manually?

Issue seems to be in some change of the behaviour of the reserv or
phys allocator. I Cc:ed Alan.


I'm pretty sure that the behavior here hasn't significantly changed in
about twelve years. Otherwise, I agree with your analysis.

On more than one occasion, I've been tempted to change:

pmap_remove_all(mt);
if (mt->dirty != 0)
        vm_page_deactivate(mt);
else
        vm_page_cache(mt);

to:

vm_page_dontneed(mt);



Thanks Alan!  Now it works as I expect!

But I have more questions to you and kib@.  They are in my test below.

So, prepare file as earlier, and take information about memory usage
from top(1).  After preparation, but before test:

Mem: 80M Active, 55M Inact, 721M Wired, 215M Buf, 46G Free

First run:
$ ./mmap /mnt/random
mmap:  1 pass took:   7.462865 (none:  0; res: 262144; super:
0; other:  0)

No super pages after first run, why?..

Mem: 79M Active, 1079M Inact, 722M Wired, 216M Buf, 45G Free

Now the file is in inactive memory, that's good.

Second run:
$ ./mmap /mnt/random
mmap:  1 pass took:   0.004191 (none:  0; res: 262144; super:
511; other:  0)

All super pages are here, nice.

Mem: 1103M Active, 55M Inact, 722M Wired, 216M Buf, 45G Free

Wow, all inactive pages moved to active and sit there even after process
was terminated, that's not good, what do you think?

Why do you think this is 'not good' ? You have plenty of free memory,
there is no memory pressure, and all pages were referenced recently.
There is no reason for them to be deactivated.



I always thought that active memory is the sum of the resident memory
of all processes, inactive shows the disk cache, and wired shows the
kernel itself.

So you are wrong. Both active and inactive memory can be mapped and
not mapped, both can belong to vnode or to anonymous objects etc.
Active/inactive distinction is only the amount of references that was
noted by pagedaemon, or some other page history like the way it was
unwired.

Wired does not necessarily mean kernel-used pages; user processes can
wire their pages as well.


Let's talk about that in detail.

My understanding is the following:

Active memory: the memory which is referenced by application.  An 
application may get memory only through mmap() (the allocator doesn't
use brk()/sbrk() any more).  The resident memory of an application is
the sum of physically used memory.  So, the sum of RSS is active memory.


Inactive memory: the memory which has no references.  Once we call 
read() on the file, the file is in inactive memory, because we have no 
references to this object, we just read it.  This is also released 
memory by free().


Cache memory: I don't know what it is.  It's always small enough not to
think about it.


Wired memory: kernel memory and yes, application may get wired memory 
through mlock()/mlockall(), but I haven't seen any real application 
which calls mlock().
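
(As an aside for anyone following along: the counters top(1) summarizes
come from the vm.stats.vm sysctl tree.  A quick sketch for watching them
directly -- assuming the standard vm.stats.vm.v_*_count names, values in
pages -- could look like this:)

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

/* Sketch: print the page-queue counters that top(1) summarizes. */
static u_int
getcnt(const char *name)
{
        u_int val;
        size_t len = sizeof(val);

        if (sysctlbyname(name, &val, &len, NULL, 0) == -1)
                return (0);
        return (val);
}

int
main(void)
{
        printf("active %u inactive %u wired %u cache %u free %u (pages)\n",
            getcnt("vm.stats.vm.v_active_count"),
            getcnt("vm.stats.vm.v_inactive_count"),
            getcnt("vm.stats.vm.v_wire_count"),
            getcnt("vm.stats.vm.v_cache_count"),
            getcnt("vm.stats.vm.v_free_count"));
        return (0);
}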






Read the file:
$ cat /mnt/random > /dev/null

Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free

Now the file is in wired memory.  I do not understand why so.

You do use UFS, right ?


Yes.


There are enough buffer headers and buffer KVA to have buffers
allocated for the whole file content. Since buffers wire the
corresponding pages, you get pages migrated to wired.

When there appears a buffer pressure (i.e., any other i/o started),
the buffers will be repurposed and pages moved to inactive.



OK, how can I get amount of disk cache?

You cannot. At least I am not aware of any counter that keeps track
of the resident pages belonging to vnode pager.

Buffers should not be thought of as the disk cache; pages cache disk
content. Instead, VMIO buffers only provide a bread()/bwrite()-compatible
interface to the page cache (*) for filesystems.
(*) - The term cache is used here in a generic sense, not to be confused
with the cached pages counter from top etc.



Yes, I know that.  Let me try once again to ask my question about
buffers: is it reasonable to use 10% of physical memory for them, or
could we set a rational upper limit automatically?
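
(For reference, the buffer cache's current and maximum sizes are
themselves exported as sysctls; a sketch assuming the long-standing
vfs.bufspace and vfs.maxbufspace names and their historical long width --
both report bytes, not pages:)

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
        long bufspace = 0, maxbufspace = 0;
        size_t len;

        len = sizeof(bufspace);
        sysctlbyname("vfs.bufspace", &bufspace, &len, NULL, 0);
        len = sizeof(maxbufspace);
        sysctlbyname("vfs.maxbufspace", &maxbufspace, &len, NULL, 0);
        printf("buffer cache: %ld of %ld bytes\n", bufspace, maxbufspace);
        return (0);
}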






Could you please give me an explanation of active/inactive/wired memory?



because I suspect that the current code does more harm than good. In
theory, it saves activations of the page daemon. However, more often
than not, I suspect that we are spending more on page reactivations than
we are saving on page daemon activations. The sequential access
detection heuristic is just too easily triggered. For example, I've seen
it triggered by demand paging of the gcc text segment. Also, I think
that pmap_remove_all() and especially 

Re: problems with mmap() and disk caching

2012-04-09 Thread Konstantin Belousov
On Mon, Apr 09, 2012 at 11:17:41AM +0400, Andrey Zonov wrote:
 On 06.04.2012 12:13, Konstantin Belousov wrote:
 On Thu, Apr 05, 2012 at 11:54:53PM +0400, Andrey Zonov wrote:
 On 05.04.2012 23:41, Konstantin Belousov wrote:
 On Thu, Apr 05, 2012 at 11:33:46PM +0400, Andrey Zonov wrote:
 On 05.04.2012 19:54, Alan Cox wrote:
 On 04/04/2012 02:17, Konstantin Belousov wrote:
 On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:
 [snip]
 This is what I expect. But why doesn't this work without reading the
 file manually?
 Issue seems to be in some change of the behaviour of the reserv or
 phys allocator. I Cc:ed Alan.
 
 I'm pretty sure that the behavior here hasn't significantly changed in
 about twelve years. Otherwise, I agree with your analysis.
 
 On more than one occasion, I've been tempted to change:
 
 pmap_remove_all(mt);
 if (mt->dirty != 0)
         vm_page_deactivate(mt);
 else
         vm_page_cache(mt);
 
 to:
 
 vm_page_dontneed(mt);
 
 
 Thanks Alan!  Now it works as I expect!
 
 But I have more questions to you and kib@.  They are in my test below.
 
 So, prepare file as earlier, and take information about memory usage
 from top(1).  After preparation, but before test:
 Mem: 80M Active, 55M Inact, 721M Wired, 215M Buf, 46G Free
 
 First run:
 $ ./mmap /mnt/random
 mmap:  1 pass took:   7.462865 (none:  0; res: 262144; super:
 0; other:  0)
 
 No super pages after first run, why?..
 
 Mem: 79M Active, 1079M Inact, 722M Wired, 216M Buf, 45G Free
 
 Now the file is in inactive memory, that's good.
 
 Second run:
 $ ./mmap /mnt/random
 mmap:  1 pass took:   0.004191 (none:  0; res: 262144; super:
 511; other:  0)
 
 All super pages are here, nice.
 
 Mem: 1103M Active, 55M Inact, 722M Wired, 216M Buf, 45G Free
 
 Wow, all inactive pages moved to active and sit there even after process
 was terminated, that's not good, what do you think?
 Why do you think this is 'not good' ? You have plenty of free memory,
 there is no memory pressure, and all pages were referenced recently.
 There is no reason for them to be deactivated.
 
 
 I always thought that active memory is the sum of the resident memory
 of all processes, inactive shows the disk cache, and wired shows the
 kernel itself.
 So you are wrong. Both active and inactive memory can be mapped and
 not mapped, both can belong to vnode or to anonymous objects etc.
 Active/inactive distinction is only the amount of references that was
 noted by pagedaemon, or some other page history like the way it was
 unwired.
 
 Wired does not necessarily mean kernel-used pages; user processes can
 wire their pages as well.
 
 Let's talk about that in detail.
 
 My understanding is the following:
 
 Active memory: the memory which is referenced by application.  An 
Assuming the part 'by application' is removed, this sentence is almost right.
Any managed mapping of the page participates in the active references.

 application may get memory only through mmap() (allocator don't use 
 brk()/sbrk() any more).  The resident memory of an application is the 
 sum of physical used memory.  So, sum of RSS is active memory.
First, brk/sbrk is still used. Second, there is no requirement that
resident pages are referenced. E.g. page could have participated in the
buffer, and unwiring on the buffer dissolve put it into inactive state.
Or pagedaemon cleared the reference and moved the page to inactive queue.
Or the page was prefaulted by different optimizations.

More, there is subtle difference between 'resident' and 'not causing fault
on access'. Page may be resident, but pte was not preinstalled, or pte
was flushed etc.
 
 Inactive memory: the memory which has no references.  Once we call 
 read() on the file, the file is in inactive memory, because we have no 
 references to this object, we just read it.  This is also released 
 memory by free().
On buffer dissolve, the buffer cache explicitly puts the pages
constituting the buffer into the inactive queue. In fact, this is not
quite right, e.g. if the same pages are mapped and actively referenced,
then the pagedaemon now has slightly more work to move the page from
inactive to active.

And, free(3) operates at a much higher level than the vm subsystem, so
describing the interaction between the two is impossible in any
definitive way. Old naive mallocs put the block description at the
beginning of the block, actually causing free() to reference at least
the first page of the block. Jemalloc often does madvise(MADV_FREE) for
large freed allocations. MADV_FREE moves pages between queues
probabilistically.
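
(A minimal sketch of that MADV_FREE pattern -- hypothetical size, error
handling trimmed: the mapping is kept, but the VM is told the contents
are disposable and may be reclaimed lazily.)

#include <sys/mman.h>
#include <string.h>

int
main(void)
{
        size_t sz = 16 * 1024 * 1024;   /* hypothetical large allocation */
        char *p;

        p = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE,
            -1, 0);
        if (p == MAP_FAILED)
                return (1);
        memset(p, 0xa5, sz);            /* dirty the pages */
        madvise(p, sz, MADV_FREE);      /* contents now disposable */
        return (0);
}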

 
 Cache memory: I don't know what it is.  It's always small enough not to
 think about it.
This was the bug you reported, and which Alan fixed on Sunday.

 
 Wired memory: kernel memory and yes, application may get wired memory 
 through mlock()/mlockall(), but I haven't seen any real application 
 which calls mlock().
ntpd, amd from the base system. gpg and similar programs try to mlock
key store to avoid sensitive material leakage to the swap. cdrecord(8)
tried to mlock itself to avoid indefinite stalls during write.

Re: problems with mmap() and disk caching

2012-04-09 Thread Andrey Zonov
On Mon, Apr 9, 2012 at 1:18 PM, Konstantin Belousov kostik...@gmail.com wrote:
 On Mon, Apr 09, 2012 at 11:17:41AM +0400, Andrey Zonov wrote:
 On 06.04.2012 12:13, Konstantin Belousov wrote:
 On Thu, Apr 05, 2012 at 11:54:53PM +0400, Andrey Zonov wrote:
[snip]
 I always thought that active memory is the sum of the resident memory
 of all processes, inactive shows the disk cache, and wired shows the
 kernel itself.
 So you are wrong. Both active and inactive memory can be mapped and
 not mapped, both can belong to vnode or to anonymous objects etc.
 Active/inactive distinction is only the amount of references that was
 noted by pagedaemon, or some other page history like the way it was
 unwired.
 
 Wired does not necessarily mean kernel-used pages; user processes can
 wire their pages as well.

 Let's talk about that in detail.

 My understanding is the following:

 Active memory: the memory which is referenced by application.  An
 Assuming the part 'by application' is removed, this sentence is almost right.
 Any managed mapping of the page participates in the active references.

 application may get memory only through mmap() (the allocator doesn't
 use brk()/sbrk() any more).  The resident memory of an application is
 the sum of physically used memory.  So, the sum of RSS is active memory.
 First, brk/sbrk is still used. Second, there is no requirement that
 resident pages are referenced. E.g. page could have participated in the
 buffer, and unwiring on the buffer dissolve put it into inactive state.
 Or pagedaemon cleared the reference and moved the page to inactive queue.
 Or the page was prefaulted by different optimizations.

 More, there is subtle difference between 'resident' and 'not causing fault
 on access'. Page may be resident, but pte was not preinstalled, or pte
 was flushed etc.

From the user point of view: how can the memory be active if no-one (I
mean the application) uses it?

What I actually saw, though not right away, is that a program which had
worked for a long time with a big mmap()'ed file could not work well
(many page faults) with a new version of the file until I manually
flushed active memory by re-mounting the FS.  The new version couldn't
force out the old one.  In my opinion, if the VM moved cached objects to
the inactive queue after program termination, I wouldn't see this
problem.


 Inactive memory: the memory which has no references.  Once we call
 read() on the file, the file is in inactive memory, because we have no
 references to this object, we just read it.  This is also released
 memory by free().
 On buffer dissolve, the buffer cache explicitly puts the pages
 constituting the buffer into the inactive queue. In fact, this is not
 quite right, e.g. if the same pages are mapped and actively referenced,
 then the pagedaemon now has slightly more work to move the page from
 inactive to active.


Yes, sure, if someone else uses the object it should be active, and it
would be even better to introduce a new SHARED counter, like the one in
MacOSX and Linux.

 And, free(3) operates at a much higher level than the vm subsystem, so
 describing the interaction between the two is impossible in any
 definitive way. Old naive mallocs put the block description at the
 beginning of the block, actually causing free() to reference at least
 the first page of the block. Jemalloc often does madvise(MADV_FREE) for
 large freed allocations. MADV_FREE moves pages between queues
 probabilistically.


That's exactly what I meant by free().  We drop act_count to 0 and
move the page to the inactive queue via vm_page_dontneed().


 Cache memory: I don't know what it is.  It's always small enough not to
 think about it.
 This was the bug you reported, and which Alan fixed on Sunday.


I've tested this patch under 9.0-STABLE and I should say that it
introduces problems with interactivity on heavily disk-loaded machines.
With the patch that I tested before, I didn't observe such problems.


 Wired memory: kernel memory and yes, application may get wired memory
 through mlock()/mlockall(), but I haven't seen any real application
 which calls mlock().
 ntpd, amd from the base system. gpg and similar programs try to mlock
 key store to avoid sensitive material leakage to the swap. cdrecord(8)
 tried to mlock itself to avoid indefinite stalls during write.


Nice catch ;-)



 
 
 Read the file:
 $ cat /mnt/random > /dev/null
 
 Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free
 
 Now the file is in wired memory.  I do not understand why so.
 You do use UFS, right ?
 
 Yes.
 
 There are enough buffer headers and buffer KVA to have buffers
 allocated for the whole file content. Since buffers wire the
 corresponding pages, you get pages migrated to wired.
 
 When there appears a buffer pressure (i.e., any other i/o started),
 the buffers will be repurposed and pages moved to inactive.
 
 
 OK, how can I get amount of disk cache?
 You cannot. At least I am not aware of any counter that keeps track
 of the resident pages belonging to vnode pager.
 
 Buffers should not be thought of as the disk cache; pages cache disk
 content.

Re: problems with mmap() and disk caching

2012-04-09 Thread Konstantin Belousov
On Mon, Apr 09, 2012 at 03:35:30PM +0400, Andrey Zonov wrote:
 On Mon, Apr 9, 2012 at 1:18 PM, Konstantin Belousov kostik...@gmail.com 
 wrote:
  On Mon, Apr 09, 2012 at 11:17:41AM +0400, Andrey Zonov wrote:
  On 06.04.2012 12:13, Konstantin Belousov wrote:
  On Thu, Apr 05, 2012 at 11:54:53PM +0400, Andrey Zonov wrote:
 [snip]
  I always thought that active memory is the sum of the resident memory
  of all processes, inactive shows the disk cache, and wired shows the
  kernel itself.
  So you are wrong. Both active and inactive memory can be mapped and
  not mapped, both can belong to vnode or to anonymous objects etc.
  Active/inactive distinction is only the amount of references that was
  noted by pagedaemon, or some other page history like the way it was
  unwired.
  
  Wired does not necessarily mean kernel-used pages; user processes can
  wire their pages as well.
 
  Let's talk about that in detail.
 
  My understanding is the following:
 
  Active memory: the memory which is referenced by application.  An
  Assuming the part 'by application' is removed, this sentence is almost 
  right.
  Any managed mapping of the page participates in the active references.
 
  application may get memory only through mmap() (the allocator doesn't
  use brk()/sbrk() any more).  The resident memory of an application is
  the sum of physically used memory.  So, the sum of RSS is active memory.
  First, brk/sbrk is still used. Second, there is no requirement that
  resident pages are referenced. E.g. page could have participated in the
  buffer, and unwiring on the buffer dissolve put it into inactive state.
  Or pagedaemon cleared the reference and moved the page to inactive queue.
  Or the page was prefaulted by different optimizations.
 
  More, there is subtle difference between 'resident' and 'not causing fault
  on access'. Page may be resident, but pte was not preinstalled, or pte
  was flushed etc.
 
 From the user point of view: how can the memory be active if no-one (I
 mean the application) uses it?
 
 What I actually saw, though not right away, is that a program which had
 worked for a long time with a big mmap()'ed file could not work well
 (many page faults) with a new version of the file until I manually
 flushed active memory by re-mounting the FS.  The new version couldn't
 force out the old one.  In my opinion, if the VM moved cached objects to
 the inactive queue after program termination, I wouldn't see this
 problem.
Moving pages to inactive just because some mapping was destroyed is plain
silly. The pages migrate between active/inactive/cache/free by the
pagedaemon algorithms.

BTW, you do not need to actually remount the filesystem to flush pages
of its vnodes. It is enough to try to unmount it while cd'ed to the
filesystem root.
 
 
  Inactive memory: the memory which has no references.  Once we call
  read() on the file, the file is in inactive memory, because we have no
  references to this object, we just read it.  This is also released
  memory by free().
  On buffer dissolve, the buffer cache explicitly puts the pages
  constituting the buffer into the inactive queue. In fact, this is not
  quite right, e.g. if the same pages are mapped and actively referenced,
  then the pagedaemon now has slightly more work to move the page from
  inactive to active.
 
 
 Yes, sure, if someone else uses the object it should be active, and it
 would be even better to introduce a new SHARED counter, like the one in
 MacOSX and Linux.
Counter for what ? There is already the ref counter for a vm object.

 
  And, free(3) operates at a much higher level than the vm subsystem, so
  describing the interaction between the two is impossible in any
  definitive way. Old naive mallocs put the block description at the
  beginning of the block, actually causing free() to reference at least
  the first page of the block. Jemalloc often does madvise(MADV_FREE) for
  large freed allocations. MADV_FREE moves pages between queues
  probabilistically.
 
 
 That's exactly what I meant by free().  We drop act_count to 0 and
 move the page to the inactive queue via vm_page_dontneed().
 
 
  Cache memory: I don't know what it is.  It's always small enough not
  to think about it.
  This was the bug you reported, and which Alan fixed on Sunday.
 
 
 I've tested this patch under 9.0-STABLE and I should say that it
 introduces problems with interactivity on heavily disk-loaded machines.
 With the patch that I tested before, I didn't observe such problems.
 
 
  Wired memory: kernel memory and yes, application may get wired memory
  through mlock()/mlockall(), but I haven't seen any real application
  which calls mlock().
  ntpd, amd from the base system. gpg and similar programs try to mlock
  key store to avoid sensitive material leakage to the swap. cdrecord(8)
  tried to mlock itself to avoid indefinite stalls during write.
 
 
 Nice catch ;-)
 
 
 
  
  
  Read the file:
  $ cat /mnt/random > /dev/null
  
  Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free
  
  Now the file is in wired memory.  I do not understand why so.
  You do use UFS, right ?

Re: Graphical Terminal Environment

2012-04-09 Thread Brandon Falk
I'm still avidly trying to work on this idea, but right now the issue 
seems to be with AMD and NVIDIA not documenting their protocols. Intel 
does a good job, but I don't have any Intel chips with graphics laying 
around.


Right now I've targeted what I think is the main issue, and that is the 
closed-protocol GPU. I'm working on a minimal GPU right now on an FPGA. 
Not sure if it will actually end up going anywhere, but I would really 
like to see an open-hardware GPU out on the market. Certainly it would 
not be an NVIDIA or AMD killer, but it would be a good card for people 
who just want to watch videos, browse the web, run terminals, etc.


The main focus of this GPU would be to maximize resolutions and 
monitors, and minimize cost. Currently it looks like I could run 4 
monitors at 1080p for about $50 (that's not taking into account 
bulk-order costs).


I could try to work with nouveau (as I did before) but I'll just never 
feel ok with using a system that uses 'blobs' (the nouveau term for the 
bits that are sent to the card without knowing what they really are).


-Brandon

On 4/8/2012 3:45 PM, Michael Cardell Widerkrantz wrote:

Since Brandon started this in a sort of rambling mood I'm keeping up
with the tradition... This is just what's on top of my mind right now.

per...@pluto.rain.com, 2012-03-06 17:05 (+0100):


I _think_ SunTools/SunView were proprietary,

Absolutely.


although it's possible that Sun released the source code at some
point.

Much of the actual window system in SunView was implemented in the
kernel, IIRC. That might not be interesting in this case.

Another system I used on quite memory-starved Sun 3/50s (as little as 4
meg) and 3/60s and later on SPARCstations, was the Bellcore MGR window
system:

   http://hack.org/mc/mgr/

   http://en.wikipedia.org/wiki/ManaGeR

Many users in the Lysator academic computing society where I first met
MGR preferred it to SunView. It was really nice on monochrome monitors
at 1152x900. It's also network transparent so you can run remote
graphics applications.

MGR was ported to a lot of systems, including FreeBSD. It might still
compile, but it's unlikely to support anything higher than 640x480 on
FreeBSD. If anyone tries to compile it and runs into problems I might be
able to help. Just e-mail me.

To support higher resolutions on FreeBSD Brandon would have to rewrite
the functions in libbitblit. One way to do it would be to use vgl(3) to
implement the libbitblit functions. Should be pretty straightforward, I
think, and not too much work.

On the other hand vgl(3) probably only supports VESA so Brandon will
still have to write a special libbitblit for the nvidia card he
mentions.

MGR doesn't tile windows but Brandon might want to add a mode to do
that.

MGR has a slightly bothersome license, though, forbidding commercial
sales so this might not be the best way forward.

On Sun SPARCs under SunOS it was also possible to run a tiling window
system called Oberon. It shares its name with a programming language and
a complete native operating system. Oberon is a complete environment
using the Oberon programming language so it might not be what Brandon
wants but it might be interesting to look at nonetheless.

I believe Oberon is still available and can run either as a native
operating system or as an environment under other systems. The SPARC
port I used many years ago was running under SunOS but was running
directly on the console. I don't know if there are any modern Oberon
systems that can do that.

Incidentally, Oberon was one of the inspirations behind Rob Pike's acme
editor on Plan 9. Acme, however, just handles text. Oberon does graphics
as well.

I've been thinking something along the same lines as Brandon for several
years now: to write a lightweight window system. For many years I
resisted X and kept using MGR, even going so far as porting MGR to
Solaris and to Linux/SPARC just to be able to keep using MGR on more
modern systems. I gave up, I think, around 1994.

If I would do it again I would probably not work on MGR but I might use
it for some ideas. One thing that MGR does that I wouldn't do was to
force all graphics operations to be done through escape codes in
terminal windows. While it might be great for network transparency it's
not so great for the speed of local programs.

The Wayland project is interesting but seems very Linux oriented. On the
other hand work on KMS/GEM support on FreeBSD is coming along. It might
be possible to get Wayland running on FreeBSD. I haven't looked into it
myself (yet).

James Gosling, who wrote both the Andrew window system and Sun's NeWS
(not SunView, the *other* Sun window system, the one with a Postscript
interpreter) has written an interesting paper about window system
design. I have a copy here:

   http://hack.org/mc/texts/gosling-wsd.pdf

Some people have mentioned Plan 9's 8 1/2 and rio. They are both very
interesting window systems. While I think they have a very clean design
I think 

Re: time stops in vmware

2012-04-09 Thread Mark Felder
On Sun, 08 Apr 2012 02:11:25 -0500, Daniel Braniss da...@cs.huji.ac.il  
wrote:



Hi All
There was some mention before that time stops under vmware, and now
it's happened to me :-)

the clock stopped now, the system is responsive, but eg
sleep 1
never finishes.
Is there a solution?
btw, I'm running 8.2-stable, i'll try 8.3 soon.



Can you recreate it? Does it go away if you use kern.hz=200 in  
loader.conf? We used to have to do that.
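
(For reference, kern.hz is a loader tunable, so the setting goes into
/boot/loader.conf and takes effect on the next boot:)

# /boot/loader.conf
kern.hz="200"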



Re: problems with mmap() and disk caching

2012-04-09 Thread John Baldwin
On Thursday, April 05, 2012 11:54:31 am Alan Cox wrote:
 On 04/04/2012 02:17, Konstantin Belousov wrote:
  On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:
  Hi,
 
  I open the file, then call mmap() on the whole file and get pointer,
  then I work with this pointer.  I expect that a page should be touched
  only once to get it into memory (disk cache?), but this doesn't work!
 
  I wrote the test (attached) and ran it for the 1G file generated from
  /dev/random, the result is the following:
 
  Prepare file:
  # swapoff -a
  # newfs /dev/ada0b
  # mount /dev/ada0b /mnt
  # dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024
 
  Purge cache:
  # umount /mnt
  # mount /dev/ada0b /mnt
 
  Run test:
  $ ./mmap /mnt/random-1024 30
  mmap:  1 pass took:   7.431046 (none: 262112; res: 32; super:
  0; other:  0)
  mmap:  2 pass took:   7.356670 (none: 261648; res:496; super:
  0; other:  0)
  mmap:  3 pass took:   7.307094 (none: 260521; res:   1623; super:
  0; other:  0)
  mmap:  4 pass took:   7.350239 (none: 258904; res:   3240; super:
  0; other:  0)
  mmap:  5 pass took:   7.392480 (none: 257286; res:   4858; super:
  0; other:  0)
  mmap:  6 pass took:   7.292069 (none: 255584; res:   6560; super:
  0; other:  0)
  mmap:  7 pass took:   7.048980 (none: 251142; res:  11002; super:
  0; other:  0)
  mmap:  8 pass took:   6.899387 (none: 247584; res:  14560; super:
  0; other:  0)
  mmap:  9 pass took:   7.190579 (none: 242992; res:  19152; super:
  0; other:  0)
  mmap: 10 pass took:   6.915482 (none: 239308; res:  22836; super:
  0; other:  0)
  mmap: 11 pass took:   6.565909 (none: 232835; res:  29309; super:
  0; other:  0)
  mmap: 12 pass took:   6.423945 (none: 226160; res:  35984; super:
  0; other:  0)
  mmap: 13 pass took:   6.315385 (none: 208555; res:  53589; super:
  0; other:  0)
  mmap: 14 pass took:   6.760780 (none: 192805; res:  69339; super:
  0; other:  0)
  mmap: 15 pass took:   5.721513 (none: 174497; res:  87647; super:
  0; other:  0)
  mmap: 16 pass took:   5.004424 (none: 155938; res: 106206; super:
  0; other:  0)
  mmap: 17 pass took:   4.224926 (none: 135639; res: 126505; super:
  0; other:  0)
  mmap: 18 pass took:   3.749608 (none: 117952; res: 144192; super:
  0; other:  0)
  mmap: 19 pass took:   3.398084 (none:  99066; res: 163078; super:
  0; other:  0)
  mmap: 20 pass took:   3.029557 (none:  74994; res: 187150; super:
  0; other:  0)
  mmap: 21 pass took:   2.379430 (none:  55231; res: 206913; super:
  0; other:  0)
  mmap: 22 pass took:   2.046521 (none:  40786; res: 221358; super:
  0; other:  0)
  mmap: 23 pass took:   1.152797 (none:  30311; res: 231833; super:
  0; other:  0)
  mmap: 24 pass took:   0.972617 (none:  16196; res: 245948; super:
  0; other:  0)
  mmap: 25 pass took:   0.577515 (none:   8286; res: 253858; super:
  0; other:  0)
  mmap: 26 pass took:   0.380738 (none:   3712; res: 258432; super:
  0; other:  0)
  mmap: 27 pass took:   0.253583 (none:   1193; res: 260951; super:
  0; other:  0)
  mmap: 28 pass took:   0.157508 (none:  0; res: 262144; super:
  0; other:  0)
  mmap: 29 pass took:   0.156169 (none:  0; res: 262144; super:
  0; other:  0)
  mmap: 30 pass took:   0.156550 (none:  0; res: 262144; super:
  0; other:  0)
 
  If I ran this:
   $ cat /mnt/random-1024 > /dev/null
  before test, when result is the following:
 
  $ ./mmap /mnt/random-1024 5
  mmap:  1 pass took:   0.337657 (none:  0; res: 262144; super:
  0; other:  0)
  mmap:  2 pass took:   0.186137 (none:  0; res: 262144; super:
  0; other:  0)
  mmap:  3 pass took:   0.186132 (none:  0; res: 262144; super:
  0; other:  0)
  mmap:  4 pass took:   0.186535 (none:  0; res: 262144; super:
  0; other:  0)
  mmap:  5 pass took:   0.190353 (none:  0; res: 262144; super:
  0; other:  0)
 
   This is what I expect.  But why doesn't this work without reading the
   file manually?
  Issue seems to be in some change of the behaviour of the reserv or
  phys allocator. I Cc:ed Alan.
 
 I'm pretty sure that the behavior here hasn't significantly changed in 
 about twelve years.  Otherwise, I agree with your analysis.
 
 On more than one occasion, I've been tempted to change:
 
  pmap_remove_all(mt);
  if (mt->dirty != 0)
          vm_page_deactivate(mt);
  else
          vm_page_cache(mt);
 
 to:
 
  vm_page_dontneed(mt);
 
 because I suspect that the current code does more harm than good.  In 
 theory, it saves activations of the page daemon.  However, more often 
 than not, I suspect that we are spending more on page reactivations than 
 we are saving on page daemon activations.
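
(For readers without the attached test: the none/res/super columns above
look like per-page mincore(2) classifications.  A cut-down sketch of one
such pass -- hypothetical, not the attached program, and counting the 4K
pages that sit inside superpages via FreeBSD's MINCORE_SUPER flag --
might be:)

#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
        struct stat st;
        char *p, *vec;
        size_t i, npages, none, res, super;
        long psz;
        volatile char c;
        int fd;

        fd = open(argv[1], O_RDONLY);
        fstat(fd, &st);
        p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        psz = getpagesize();
        npages = (st.st_size + psz - 1) / psz;
        vec = malloc(npages);

        for (i = 0; i < (size_t)st.st_size; i += psz)
                c = p[i];                       /* touch one byte per page */

        mincore(p, st.st_size, vec);            /* classify each page */
        none = res = super = 0;
        for (i = 0; i < npages; i++) {
                if ((vec[i] & MINCORE_INCORE) == 0)
                        none++;
                else {
                        res++;
                        if (vec[i] & MINCORE_SUPER)
                                super++;
                }
        }
        printf("none: %zu; res: %zu; 4K pages inside superpages: %zu\n",
            none, res, super);
        return (0);
}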

Re: Starvation of realtime priority threads

2012-04-09 Thread John Baldwin
On Thursday, April 05, 2012 9:08:24 pm Sushanth Rai wrote:
 I understand the downside of badly written realtime app.  In my case 
application runs in userspace without making much syscalls and by all means it 
is a well behaved application. Yes, I can wire memory, change the application 
to use mutex instead of spinlock and those changes should help but they are 
still working around the problem. I still believe kernel should not lower the 
realtime priority when blocking on resources. This can lead to priority 
inversion, especially since these threads run at fixed priorities and kernel 
doesn't muck with them.
  
 As you suggested _sleep() should not adjust the priorities for realtime 
threads. 

Hmm, sched_sleep() for both SCHED_4BSD and SCHED_ULE already does the right
thing here in HEAD.

if (PRI_BASE(td->td_pri_class) != PRI_TIMESHARE)
        return;

Which OS version did you see this on?

-- 
John Baldwin
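
(For context, such fixed-priority applications usually enter the
realtime class via rtprio(2); a minimal sketch of that setup:)

#include <sys/types.h>
#include <sys/rtprio.h>
#include <err.h>

int
main(void)
{
        struct rtprio rtp;

        rtp.type = RTP_PRIO_REALTIME;
        rtp.prio = 0;                   /* highest realtime priority */
        if (rtprio(RTP_SET, 0, &rtp) == -1)
                err(1, "rtprio");
        /* ... time-critical work runs here ... */
        return (0);
}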


Re: Starvation of realtime priority threads

2012-04-09 Thread Sushanth Rai
I'm on 7.2. sched_sleep() on 7.2 just records the sleep time. That's
why I thought _sleep might be the right place to do the check.

Thanks,
Sushanth

--- On Mon, 4/9/12, John Baldwin j...@freebsd.org wrote:

 From: John Baldwin j...@freebsd.org
 Subject: Re: Starvation of realtime priority threads
 To: Sushanth Rai sushanth_...@yahoo.com
 Cc: freebsd-hackers@freebsd.org
 Date: Monday, April 9, 2012, 9:17 AM
 On Thursday, April 05, 2012 9:08:24
 pm Sushanth Rai wrote:
  I understand the downside of badly written realtime
 app.  In my case 
 application runs in userspace without making much syscalls
 and by all means it 
 is a well behaved application. Yes, I can wire memory,
 change the application 
 to use mutex instead of spinlock and those changes should
 help but they are 
 still working around the problem. I still believe kernel
 should not lower the 
 realtime priority when blocking on resources. This can lead
 to priority 
 inversion, especially since these threads run at fixed
 priorities and kernel 
 doesn't muck with them.
   
  As you suggested _sleep() should not adjust the
 priorities for realtime 
 threads. 
 
 Hmm, sched_sleep() for both SCHED_4BSD and SCHED_ULE already
 does the right
 thing here in HEAD.
 
     if (PRI_BASE(td->td_pri_class) != PRI_TIMESHARE)
         return;
 
 Which OS version did you see this on?
 
 -- 
 John Baldwin



Re: Starvation of realtime priority threads

2012-04-09 Thread John Baldwin
On Monday, April 09, 2012 2:08:50 pm Sushanth Rai wrote:
 I'm on 7.2. sched_sleep() on 7.2 just records the sleep time. That's
 why I thought _sleep might be the right place to do the check.

Nah, sched_sleep() is more accurate since the sleep priority can have other 
side effects.

Hmm, in stock 7.2, the rtprio range is below things like PVM, etc., so that
shouldn't actually be buggy in that regard.  I fixed this in 9.0 and HEAD
when I moved the rtprio range up above the kernel sleep priorities.  Are
you using local patches to 7.2 to raise the priority of rtprio threads?

 Thanks,
 Sushanth
 
 --- On Mon, 4/9/12, John Baldwin j...@freebsd.org wrote:
 
  From: John Baldwin j...@freebsd.org
  Subject: Re: Starvation of realtime priority threads
  To: Sushanth Rai sushanth_...@yahoo.com
  Cc: freebsd-hackers@freebsd.org
  Date: Monday, April 9, 2012, 9:17 AM
  On Thursday, April 05, 2012 9:08:24
  pm Sushanth Rai wrote:
   I understand the downside of badly written realtime
  app.  In my case 
  application runs in userspace without making much syscalls
  and by all means it 
  is a well behaved application. Yes, I can wire memory,
  change the application 
  to use mutex instead of spinlock and those changes should
  help but they are 
  still working around the problem. I still believe kernel
  should not lower the 
  realtime priority when blocking on resources. This can lead
  to priority 
  inversion, especially since these threads run at fixed
  priorities and kernel 
  doesn't muck with them.

   As you suggested _sleep() should not adjust the
  priorities for realtime 
  threads. 
  
  Hmm, sched_sleep() for both SCHED_4BSD and SCHED_ULE already
  does the right
  thing here in HEAD.
  
  if (PRI_BASE(td->td_pri_class) != PRI_TIMESHARE)
          return;
  
  Which OS version did you see this on?
  
  -- 
  John Baldwin
  
 

-- 
John Baldwin


Re: [RFT][patch] Scheduling for HTT and not only

2012-04-09 Thread Alexander Motin

On 04/05/12 21:45, Alexander Motin wrote:

On 05.04.2012 21:12, Arnaud Lacombe wrote:

Hi,

[Sorry for the delay, I got a bit sidetrack'ed...]

2012/2/17 Alexander Motin m...@freebsd.org:

On 17.02.2012 18:53, Arnaud Lacombe wrote:


On Fri, Feb 17, 2012 at 11:29 AM, Alexander Motin m...@freebsd.org
wrote:


On 02/15/12 21:54, Jeff Roberson wrote:


On Wed, 15 Feb 2012, Alexander Motin wrote:


I've decided to stop those cache black magic practices and focus on
things that really exist in this world -- SMT and CPU load. I've
dropped most of cache related things from the patch and made the
rest
of things more strict and predictable:
http://people.freebsd.org/~mav/sched.htt34.patch



This looks great. I think there is value in considering the other
approach further but I would like to do this part first. It would be
nice to also add priority as a greater influence in the load
balancing
as well.



I haven't got a good idea yet about balancing priorities, but I've
rewritten the balancer itself. As sched_lowest() / sched_highest() are
more intelligent now, they allowed topology traversal to be removed
from the balancer itself. That should fix the double-swapping problem,
allow some affinity to be kept while moving threads, and make balancing
more fair. I did a number of tests running 4, 8, 9 and 16 CPU-bound
threads on 8 CPUs. With 4, 8 and 16 threads everything is stationary as
it should be. With 9 threads I see regular and random load movement
between all 8 CPUs. Measurements over a 5 minute run show a deviation
of only about 5 seconds. It is the same deviation as I see caused only
by the scheduling of 16 threads on 8 cores without any balancing needed
at all. So I believe this code works as it should.

Here is the patch: http://people.freebsd.org/~mav/sched.htt40.patch

I plan this to be the final patch of this series (more to come :)) and
if there are no problems or objections, I am going to commit it (except
some debugging KTRs) in about ten days. So now is a good time for
reviews and testing. :)


is there a place where all the patches are available ?



All my scheduler patches are cumulative, so all you need is only the
last one mentioned here, sched.htt40.patch.


You may want to have a look to the result I collected in the
`runs/freebsd-experiments' branch of:

https://github.com/lacombar/hackbench/

and compare them with vanilla FreeBSD 9.0 and -CURRENT results
available in `runs/freebsd'. On the dual package platform, your patch
is not a definite win.


But in some cases, especially for multi-socket systems, to let it show
its best, you may want to apply additional patch from avg@ to better
detect CPU topology:
https://gitorious.org/~avg/freebsd/avgbsd/commit/6bca4a2e4854ea3fc275946a023db65c483cb9dd



The test I conducted specifically for this patch did not show much
improvement...


If I understand right, this test runs thousands of threads sending and
receiving data over the pipes. It is quite likely that all CPUs will be
always busy and so load balancing is not really important in this test.
What looks good is that the more complicated new code is not slower
than the old one.

While this test seems very scheduler-intensive, it may depend on many
other factors, such as syscall performance, context switch, etc. I'll
try to play more with it.


My profiling on an 8-core Core i7 system shows that the code from
sched_ule.c, while staying in the first places, still consumes only 13%
of kernel CPU time while doing a million context switches per second.
cpu_search(), affected by this patch, even less -- only 8%. The rest of
the time is spread between many other small functions. I did some
optimizations in r234066 to reduce cpu_search() time to 6%, but looking
at how unstable the results of this test are, hardly any difference
there can really be measured by it.


I have a strong feeling that while this test may be interesting for
profiling, its own results depend in the first place not on how fast
the scheduler is, but on the pipe capacity and other such things. Can
somebody hint to me what, except the pipe capacity and the context
switch to the unblocked receiver, prevents the sender from sending all
data in a batch and then the receiver from receiving it all in a batch?
If different OSes have different policies there, I think the results
could be incomparable.


--
Alexander Motin
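
(For readers unfamiliar with it, a hackbench-style run is essentially
many sender/receiver pairs pushing small messages through pipes.  A
single-pair sketch -- hypothetical message count and size -- shows why
pipe capacity matters: roughly speaking, the sender keeps writing until
the pipe buffer fills, at which point it blocks and the receiver runs.)

#include <sys/wait.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MSGS    10000
#define MSGSZ   100

int
main(void)
{
        char buf[MSGSZ];
        int fds[2], i;

        if (pipe(fds) == -1)
                exit(1);
        if (fork() == 0) {                      /* receiver */
                close(fds[1]);
                while (read(fds[0], buf, sizeof(buf)) > 0)
                        ;
                _exit(0);
        }
        close(fds[0]);                          /* sender */
        memset(buf, 'x', sizeof(buf));
        for (i = 0; i < MSGS; i++)
                write(fds[1], buf, sizeof(buf));
        close(fds[1]);
        wait(NULL);
        return (0);
}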


Re: Graphical Terminal Environment

2012-04-09 Thread Dieter BSD
Brandon writes:
 I'm still avidly trying to work on this idea, but right now the issue
 seems to be with AMD and NVIDIA not documenting their protocols. Intel
 does a good job, but I don't have any Intel chips with graphics laying
 around.

I thought that AMD had documented most of it by now, with the
major exception of the UVD?

 I'm working on a minimal GPU right now on an FPGA.

 Currently it looks like I could run 4 monitors at 1080p for about $50

Have FPGA prices come down that much?  The OGP-D1 was quite a bit
more than that, last time I looked.  Or would that be the price for
a production version with an ASIC?


Re: Starvation of realtime priority threads

2012-04-09 Thread Sushanth Rai
I'm using stock 7.2. The priorities as defined in priority.h are in this range:

/*
 * Priorities range from 0 to 255, but differences of less then 4 (RQ_PPQ)
 * are insignificant.  Ranges are as follows:
 *
 * Interrupt threads:   0 - 63
 * Top half kernel threads: 64 - 127
 * Realtime user threads:   128 - 159
 * Time sharing user threads:   160 - 223
 * Idle user threads:   224 - 255
 *
 * XXX If/When the specific interrupt thread and top half thread ranges
 * disappear, a larger range can be used for user processes.
 */

The trouble is with vm_waitpfault(), which explicitly sleeps at PUSER.


Sushanth

--- On Mon, 4/9/12, John Baldwin j...@freebsd.org wrote:

 From: John Baldwin j...@freebsd.org
 Subject: Re: Starvation of realtime priority threads
 To: Sushanth Rai sushanth_...@yahoo.com
 Cc: freebsd-hackers@freebsd.org
 Date: Monday, April 9, 2012, 11:37 AM
 On Monday, April 09, 2012 2:08:50 pm
 Sushanth Rai wrote:
  I'm on 7.2. sched_sleep() on 7.2 just records the sleep time. That's
  why I thought _sleep might be the right place to do the check.
 
 Nah, sched_sleep() is more accurate since the sleep priority
 can have other 
 side effects.
 
 Hmm, in stock 7.2, the rtprio range is below things like
 PVM, etc., so that
 shouldn't actually be buggy in that regard.  I fixed
 this in 9.0 and HEAD
 when I moved the rtprio range up above the kernel sleep
 priorities.  Are
 you using local patches to 7.2 to raise the priority of
 rtprio threads?
 
  Thanks,
  Sushanth
  
  --- On Mon, 4/9/12, John Baldwin j...@freebsd.org
 wrote:
  
   From: John Baldwin j...@freebsd.org
   Subject: Re: Starvation of realtime priority threads
   To: Sushanth Rai sushanth_...@yahoo.com
   Cc: freebsd-hackers@freebsd.org
   Date: Monday, April 9, 2012, 9:17 AM
   On Thursday, April 05, 2012 9:08:24
   pm Sushanth Rai wrote:
I understand the downside of badly written
 realtime
   app.  In my case 
   application runs in userspace without making much
 syscalls
   and by all means it 
   is a well behaved application. Yes, I can wire
 memory,
   change the application 
   to use mutex instead of spinlock and those changes
 should
   help but they are 
   still working around the problem. I still believe
 kernel
   should not lower the 
   realtime priority when blocking on resources. This
 can lead
   to priority 
   inversion, especially since these threads run at
 fixed
   priorities and kernel 
   doesn't muck with them.
     
As you suggested _sleep() should not adjust
 the
   priorities for realtime 
   threads. 
   
   Hmm, sched_sleep() for both SCHED_4BSD and
 SCHED_ULE already
   does the right
   thing here in HEAD.
   
      if (PRI_BASE(td->td_pri_class) != PRI_TIMESHARE)
          return;
   
   Which OS version did you see this on?
   
   -- 
   John Baldwin
   
  
 
 -- 
 John Baldwin



mlockall() on freebsd 7.2 + amd64 returns EAGAIN

2012-04-09 Thread Sushanth Rai
Hello,

I have a simple program that links with the math library. The only thing
the program does is call mlockall(MCL_CURRENT | MCL_FUTURE). This call to
mlockall fails with EAGAIN. I figured out that the kernel's vm_fault() is
returning KERN_PROTECTION_FAILURE when it tries to fault in the mmap'ed
math library address. But I can't figure out why.

The /proc/mypid/map returns the following for the process:

0x800634000 0x80064c000 24 0 0xff0025571510 r-x 104 52 0x1000 COW NC vnode 
/lib/libm.so.5
0x80064c000 0x80064d000 1 0 0xff016f11c5e8 r-x 1 0 0x3100 COW NNC vnode 
/lib/libm.so.5
0x80064d000 0x80074c000 4 0 0xff0025571510 r-x 104 52 0x1000 COW NC vnode 
/lib/libm.so.5

Since ntpd calls mlockall with the same option and links with the math
library too, I looked at the map output of ntpd, which shows a slightly
different resident column (3rd column) on the 3rd line:
0x800682000 0x80069a000 8 0 0xff0025571510 r-x 100 50 0x1000 COW NC vnode 
/lib/libm.so.5
0x80069a000 0x80069b000 1 0 0xff0103b85870 r-x 1 0 0x3100 COW NNC vnode 
/lib/libm.so.5
0x80069b000 0x80079a000 0 0 0xff0025571510 r-x 100 50 0x1000 COW NC vnode 
/lib/libm.so.5

I don't know if that has anything to do with failure. The snippet of code that 
returns failure in vm_fault() is the following:

if (fs.pindex >= fs.object->size) {
  unlock_and_deallocate(&fs);
  return (KERN_PROTECTION_FAILURE);
}

Any help would be appreciated.

Thanks,
Sushanth
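
(A minimal reproduction sketch of the report above -- hypothetical file
name, not Sushanth's actual program; build with something like
"cc -o locktest locktest.c -lm" so that libm.so.5 gets mapped:)

#include <sys/mman.h>
#include <math.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
        double d = sqrt((double)argc);  /* force a real libm reference */

        if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
                perror("mlockall");     /* reportedly fails with EAGAIN */
                return (1);
        }
        printf("mlockall succeeded (sqrt = %f)\n", d);
        return (0);
}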
