Re: problems with mmap() and disk caching

2012-04-10 Thread Andrey Zonov

On 10.04.2012 20:19, Alan Cox wrote:

On 04/09/2012 10:26, John Baldwin wrote:

On Thursday, April 05, 2012 11:54:31 am Alan Cox wrote:

On 04/04/2012 02:17, Konstantin Belousov wrote:

On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

Hi,

I open the file, then call mmap() on the whole file and get pointer,
then I work with this pointer. I expect that page should be only once
touched to get it into the memory (disk cache?), but this doesn't
work!

I wrote the test (attached) and ran it for the 1G file generated from
/dev/random, the result is the following:

Prepare file:
# swapoff -a
# newfs /dev/ada0b
# mount /dev/ada0b /mnt
# dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024

Purge cache:
# umount /mnt
# mount /dev/ada0b /mnt

Run test:
$ ./mmap /mnt/random-1024 30
mmap: 1 pass took: 7.431046 (none: 262112; res: 32; super: 0; other: 0)
mmap: 2 pass took: 7.356670 (none: 261648; res: 496; super: 0; other: 0)
mmap: 3 pass took: 7.307094 (none: 260521; res: 1623; super: 0; other: 0)
mmap: 4 pass took: 7.350239 (none: 258904; res: 3240; super: 0; other: 0)
mmap: 5 pass took: 7.392480 (none: 257286; res: 4858; super: 0; other: 0)
mmap: 6 pass took: 7.292069 (none: 255584; res: 6560; super: 0; other: 0)
mmap: 7 pass took: 7.048980 (none: 251142; res: 11002; super: 0; other: 0)
mmap: 8 pass took: 6.899387 (none: 247584; res: 14560; super: 0; other: 0)
mmap: 9 pass took: 7.190579 (none: 242992; res: 19152; super: 0; other: 0)
mmap: 10 pass took: 6.915482 (none: 239308; res: 22836; super: 0; other: 0)
mmap: 11 pass took: 6.565909 (none: 232835; res: 29309; super: 0; other: 0)
mmap: 12 pass took: 6.423945 (none: 226160; res: 35984; super: 0; other: 0)
mmap: 13 pass took: 6.315385 (none: 208555; res: 53589; super: 0; other: 0)
mmap: 14 pass took: 6.760780 (none: 192805; res: 69339; super: 0; other: 0)
mmap: 15 pass took: 5.721513 (none: 174497; res: 87647; super: 0; other: 0)
mmap: 16 pass took: 5.004424 (none: 155938; res: 106206; super: 0; other: 0)
mmap: 17 pass took: 4.224926 (none: 135639; res: 126505; super: 0; other: 0)
mmap: 18 pass took: 3.749608 (none: 117952; res: 144192; super: 0; other: 0)
mmap: 19 pass took: 3.398084 (none: 99066; res: 163078; super: 0; other: 0)
mmap: 20 pass took: 3.029557 (none: 74994; res: 187150; super: 0; other: 0)
mmap: 21 pass took: 2.379430 (none: 55231; res: 206913; super: 0; other: 0)
mmap: 22 pass took: 2.046521 (none: 40786; res: 221358; super: 0; other: 0)
mmap: 23 pass took: 1.152797 (none: 30311; res: 231833; super: 0; other: 0)
mmap: 24 pass took: 0.972617 (none: 16196; res: 245948; super: 0; other: 0)
mmap: 25 pass took: 0.577515 (none: 8286; res: 253858; super: 0; other: 0)
mmap: 26 pass took: 0.380738 (none: 3712; res: 258432; super: 0; other: 0)
mmap: 27 pass took: 0.253583 (none: 1193; res: 260951; super: 0; other: 0)
mmap: 28 pass took: 0.157508 (none: 0; res: 262144; super: 0; other: 0)
mmap: 29 pass took: 0.156169 (none: 0; res: 262144; super: 0; other: 0)
mmap: 30 pass took: 0.156550 (none: 0; res: 262144; super: 0; other: 0)

If I ran this:
$ cat /mnt/random-1024 > /dev/null
before test, when result is the following:

$ ./mmap /mnt/random-1024 5
mmap: 1 pass took: 0.337657 (none: 0; res: 262144; super: 0; other: 0)
mmap: 2 pass took: 0.186137 (none: 0; res: 262144; super: 0; other: 0)
mmap: 3 pass took: 0.186132 (none: 0; res: 262144; super: 0; other: 0)
mmap: 4 pass took: 0.186535 (none: 0; res: 262144; super: 0; other: 0)
mmap: 5 pass took: 0.190353 (none: 0; res: 262144; super: 0; other: 0)

This is what I expect. But why this doesn't work without reading file
manually?
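
The attached test program is not reproduced in the archive, but a minimal sketch of this kind of measurement, assuming the none/res/super/other counters come from mincore(2) and that every page is touched once per pass, could look roughly like this (the real attachment may differ):

#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
        struct stat st;
        struct timespec t0, t1;
        char *p, *vec;
        size_t i, npages, none, res, super, other;
        long pagesz;
        volatile char c;
        int fd, pass, passes;

        if (argc != 3)
                errx(1, "usage: mmap <file> <passes>");
        passes = atoi(argv[2]);
        pagesz = sysconf(_SC_PAGESIZE);
        if ((fd = open(argv[1], O_RDONLY)) == -1 || fstat(fd, &st) == -1)
                err(1, "%s", argv[1]);
        npages = (st.st_size + pagesz - 1) / pagesz;
        if ((p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0)) == MAP_FAILED)
                err(1, "mmap");
        if ((vec = malloc(npages)) == NULL)
                err(1, "malloc");

        for (pass = 1; pass <= passes; pass++) {
                clock_gettime(CLOCK_MONOTONIC, &t0);
                for (i = 0; i < npages; i++)
                        c = p[i * pagesz];              /* touch every page once */
                clock_gettime(CLOCK_MONOTONIC, &t1);

                /* Classify residency after the pass (guessed mapping of the counters). */
                none = res = super = other = 0;
                if (mincore(p, st.st_size, vec) == -1)
                        err(1, "mincore");
                for (i = 0; i < npages; i++) {
                        if (vec[i] == 0)
                                none++;                 /* not resident */
                        else if (vec[i] & MINCORE_SUPER)
                                super++;                /* superpage mapping */
                        else if (vec[i] & MINCORE_INCORE)
                                res++;                  /* resident */
                        else
                                other++;
                }
                printf("mmap: %d pass took: %f (none: %zu; res: %zu; super: %zu; other: %zu)\n",
                    pass, (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9,
                    none, res, super, other);
        }
        return (0);
}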

Issue seems to be in some change of the behaviour of the reserv or
phys allocator. I Cc:ed Alan.

I'm pretty sure that the behavior here hasn't significantly changed in
about twelve years. Otherwise, I agree with your analysis.

On more than one occasion, I've been tempted to change:

pmap_remove_all(mt);
if (mt->dirty != 0)
vm_page_deactivate(mt);
else
vm_page_cache(mt);

to:

vm_page_dontneed(mt);

because I suspect that the current code does more harm than good. In
theory, it saves activations of the page daemon. However, more often
than not, I suspect that we are spending more on page reactivations than
we are saving on page daemon activations. The sequential access
detection heuristic is just too easily triggered. For example, I've
seen it triggered by demand paging of the gcc text segment. Also, I
think that pmap_remove_all() and especially vm_page_cache() are too
severe for a detection heuristic that is so easily triggered.
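
As a practical aside, the sequential-access detection that the quoted code belongs to appears to be skipped for map entries whose behavior has been set to random, so an application that wants to keep a mapping resident can presumably opt out of the heuristic itself with madvise(2). A hedged sketch:

#include <sys/types.h>
#include <sys/mman.h>
#include <err.h>

/*
 * Sketch (assumption, not a committed fix): MADV_RANDOM sets the map
 * entry's behavior to "random", which the fault handler checks before
 * running the sequential-access detection and the cache-behind code
 * quoted above, so pages of this mapping should stay on the normal
 * active/inactive path.
 */
static void
advise_random(void *base, size_t len)
{
        if (madvise(base, len, MADV_RANDOM) == -1)
                warn("madvise(MADV_RANDOM)");
}

That only works around the behaviour per application, of course; it does not answer whether the default heuristic is too aggressive.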

Are you planning to commit this?



Not yet. I did some tests with a file that was several times larger than
DRAM, and I didn't like what I saw. Initially, everything behaved as
expected, but about halfway through the test the bulk of the pages were
active. Despite the call to pmap_clear_reference() in
vm_page_dontneed(), the page daemon is finding the pages to be
referenced and reactivating them. The net result is that the time it
takes to 

Re: mlockall() on freebsd 7.2 + amd64 returns EAGAIN

2012-04-10 Thread Konstantin Belousov
On Tue, Apr 10, 2012 at 06:33:44PM -0700, Sushanth Rai wrote:
> 
> > > I don't know if that has anything to do with failure. The snippet of code
> > > that returns failure in vm_fault() is the following:
> > > 
> > > if (fs.pindex >= fs.object->size) {
> > >         unlock_and_deallocate(&fs);
> > >         return (KERN_PROTECTION_FAILURE);
> > > }
> > > 
> > > Any help would be appreciated.
> > 
> > This might be a bug fixed in r191810, but I am not sure.
> > 
> 
> I tried that fix but it didn't work. What seems to happen is that libm is 
> mmap'ed beyond the size of the file. From the truss output, I see the following:
> 
> open("/lib/libm.so.5",O_RDONLY,030577200)  = 3 (0x3)
> fstat(3,{ mode=-r--r--r-- ,inode=918533,size=115560,blksize=4096 }) = 0 (0x0)
> read(3,"\^?ELF\^B\^A\^A\t\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
> mmap(0x0,1155072,PROT_READ|PROT_EXEC,MAP_PRIVATE|MAP_NOCORE,3,0x0) = 
> 34366242816 (0x800634000)
> 
> So the size of the file is 115560 but the mmap() length is 1155072. The memory 
> map of the file corresponding to libm, as seen from running 'cat 
> /proc/<pid>/map', is the following:
> 
> 0x800634000 0x80064c000 24 0 0xff002553eca8 r-x 108 54 0x0 COW NC vnode 
> /lib/libm.so.5
> 0x80064c000 0x80064d000 1 0 0xff01d79b0a20 r-x 1 0 0x3100 COW NNC vnode 
> /lib/libm.so.5
> 0x80064d000 0x80074c000 3 0 0xff002553eca8 r-x 108 54 0x0 COW NC vnode 
> /lib/libm.so.5
> 0x80074c000 0x80074e000 2 0 0xff01d79f1288 rw- 1 0 0x3100 COW NNC vnode 
> /lib/libm.so.5
> 
> 
> When the program tries to fault in all the pages as part of the call to 
> mlockall(), the following check in vm_fault() fails when trying to fault in 
> 0x800651000.
> 
> if (fs.pindex >= fs.object->size) {
>  unlock_and_deallocate(&fs);
>  return (KERN_PROTECTION_FAILURE);
> }
> 
> since the object size corresponds to the size of libm and the fault address is 
> one page beyond the object size. Is this a bug?

Then it should be fixed in r190885.

Could you use something less antique, please?




Re: mlockall() on freebsd 7.2 + amd64 returns EAGAIN

2012-04-10 Thread Sushanth Rai

> > I don't know if that has anything to do with failure. The snippet of code
> > that returns failure in vm_fault() is the following:
> > 
> > if (fs.pindex >= fs.object->size) {
> >         unlock_and_deallocate(&fs);
> >         return (KERN_PROTECTION_FAILURE);
> > }
> > 
> > Any help would be appreciated.
> 
> This might be a bug fixed in r191810, but I am not sure.
> 

I tried that fix but it didn't work. What seems to happen is that libm is 
mmap'ed beyond the size of the file. From the truss output, I see the following:

open("/lib/libm.so.5",O_RDONLY,030577200)= 3 (0x3)
fstat(3,{ mode=-r--r--r-- ,inode=918533,size=115560,blksize=4096 }) = 0 (0x0)
read(3,"\^?ELF\^B\^A\^A\t\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
mmap(0x0,1155072,PROT_READ|PROT_EXEC,MAP_PRIVATE|MAP_NOCORE,3,0x0) = 
34366242816 (0x800634000)

So the size of the file is 115560 but the mmap() length is 1155072. The memory map 
of the file corresponding to libm, as seen from running 'cat /proc/<pid>/map', 
is the following:

0x800634000 0x80064c000 24 0 0xff002553eca8 r-x 108 54 0x0 COW NC vnode 
/lib/libm.so.5
0x80064c000 0x80064d000 1 0 0xff01d79b0a20 r-x 1 0 0x3100 COW NNC vnode 
/lib/libm.so.5
0x80064d000 0x80074c000 3 0 0xff002553eca8 r-x 108 54 0x0 COW NC vnode 
/lib/libm.so.5
0x80074c000 0x80074e000 2 0 0xff01d79f1288 rw- 1 0 0x3100 COW NNC vnode 
/lib/libm.so.5


When the program tries to fault in all the pages as part of the call to mlockall(), 
the following check in vm_fault() fails when trying to fault in 0x800651000.

if (fs.pindex >= fs.object->size) {
 unlock_and_deallocate(&fs);
 return (KERN_PROTECTION_FAILURE);
}

since the object size corresponds to the size of libm and the fault address is one 
page beyond the object size. Is this a bug?

Thanks,
Sushanth
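
For what it is worth, the numbers above are self-consistent: 115560 bytes rounds up to 0x1d (29) pages, and 0x800634000 + 0x1d000 = 0x800651000, exactly the address the fault rejects. A minimal reproducer along the lines described (assumed; the original program is not shown) is just:

/* cc -o mlocktest mlocktest.c -lm */
#include <sys/mman.h>
#include <err.h>
#include <math.h>
#include <stdio.h>

int
main(void)
{
        /* Wire all current and future mappings, libm.so.5 included. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
                err(1, "mlockall");             /* reported to fail with EAGAIN */
        printf("wired; sqrt(2) = %f\n", sqrt(2.0));     /* keep the libm dependency */
        return (0);
}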




Re: time stops in vmware

2012-04-10 Thread Mark Felder
On Tue, 10 Apr 2012 03:04:10 -0500, Daniel Braniss   
wrote:



no, I can't recreate it, could you?


We haven't seen this problem since FreeBSD 6.x. I can't recall how  
reproducible it was.



Re: [RFT][patch] Scheduling for HTT and not only

2012-04-10 Thread Mike Meyer
On Tue, 10 Apr 2012 16:50:39 -0400
Arnaud Lacombe  wrote:
> On Tue, Apr 10, 2012 at 4:05 PM, Mike Meyer  wrote:
> > On Tue, 10 Apr 2012 12:58:00 -0400
> > Arnaud Lacombe  wrote:
> >> Let me disagree on your conclusion. If OS A does a task in X seconds,
> >> and OS B does the same task in Y seconds, if Y > X, then OS B is just
> >> not performing good enough.
> >
> > Others have pointed out one problem with this statement. Let me point
> > out another:
[elided]
> You are discussing implementations in both cases. If the implementation
> is not good enough, let's improve it, but do not discard the numbers
> on false claims.

No, I was discussing goals. You need to know what the goals of the
system are before you can declare that it's "just not performing good
enough" simply because another system can perform the same task
faster. That may well be true, and you may be able to get the same performance
without an adverse effect on other goals.  But it may also be the case
that you can't reach that higher performance goal for your task
without unacceptable effects on more important goals which aren't
shared by the OS that's outperforming yours.

One set of numbers is merely an indication that there may be an issue
that needs to be addressed. They shouldn't be discarded out of
hand. But they shouldn't be used to justify changes until you've
verified that the changes aren't having an adverse effect on more
important goals.

http://www.mired.org/
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org


Re: [RFT][patch] Scheduling for HTT and not only

2012-04-10 Thread Arnaud Lacombe
Hi,

On Tue, Apr 10, 2012 at 4:05 PM, Mike Meyer  wrote:
> On Tue, 10 Apr 2012 12:58:00 -0400
> Arnaud Lacombe  wrote:
>> Let me disagree on your conclusion. If OS A does a task in X seconds,
>> and OS B does the same task in Y seconds, if Y > X, then OS B is just
>> not performing good enough.
>
> Others have pointed out one problem with this statement. Let me point
> out another:
>
> It ignores the purpose of the system. If you change the task to doing
> N concurrent versions of the task, and OS A time increases linearly
> with the number of tasks (say it's time X*N) but OS B stair-steps at
> the number of processors in the system (i.e. Y*ceil(N/P)), then OS A
> is just not performing good enough.
>
> A more concrete example: if OS B spends a couple of microseconds
> optimizing disk access order and OS A doesn't, then a single process
> writing to disk on OS A could well run faster than the same on OS
> B. However, the maximum throughput on OS B as you add processes will be
> higher than it is on OS A. Which one you want will depend on what
> you're using the system for.
>
You are discussing implementations in both cases. If the implementation
is not good enough, let's improve it, but do not discard the numbers
on false claims.

 - Arnaud


Re: [RFT][patch] Scheduling for HTT and not only

2012-04-10 Thread Mike Meyer
On Tue, 10 Apr 2012 12:58:00 -0400
Arnaud Lacombe  wrote:
> Let me disagree on your conclusion. If OS A does a task in X seconds,
> and OS B does the same task in Y seconds, if Y > X, then OS B is just
> not performing good enough.

Others have pointed out one problem with this statement. Let me point
out another:

It ignores the purpose of the system. If you change the task to doing
N concurrent versions of the task, and OS A time increases linearly
with the number of tasks (say it's time X*N) but OS B stair-steps at
the number of processors in the system (i.e. Y*ceil(N/P)), then OS A
is just not performing good enough.

A more concrete example: if OS B spends a couple of microseconds
optimizing disk access order and OS A doesn't, then a single process
writing to disk on OS A could well run faster than the same on OS
B. However, the maximum throughput on OS B as you add processes will be
higher than it is on OS A. Which one you want will depend on what
you're using the system for.

http://www.mired.org/
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org


Re: [RFT][patch] Scheduling for HTT and not only

2012-04-10 Thread Alexander Motin

On 04/10/12 21:46, Arnaud Lacombe wrote:

On Tue, Apr 10, 2012 at 1:53 PM, Alexander Motin  wrote:

On 04/10/12 20:18, Alexander Motin wrote:

On 04/10/12 19:58, Arnaud Lacombe wrote:

2012/4/9 Alexander Motin:

I have strong feeling that while this test may be interesting for
profiling,
it's own results in first place depend not from how fast scheduler
is, but
from the pipes capacity and other alike things. Can somebody hint me
what
except pipe capacity and context switch to unblocked receiver prevents
sender from sending all data in batch and then receiver from
receiving them
all in batch? If different OSes have different policies there, I think
results could be incomparable.


Let me disagree on your conclusion. If OS A does a task in X seconds,
and OS B does the same task in Y seconds, if Y>  X, then OS B is just
not performing good enough. Internal implementation's difference for
the task can not be waived as an excuse for result's comparability.



Sure, numbers are always numbers, but the question is what are they
showing? Understanding of the test results is even more important for
purely synthetic tests like this. Especially when one test run gives 25
seconds, while another gives 50. This test is not completely clear to me
and that is what I've told.


A small illustration of my point: simple scheduler tuning affects thread
preemption policy and changes this test's results by a factor of three:

mav@test:/test/hackbench# ./hackbench 30 process 1000
Running with 30*40 (== 1200) tasks.
Time: 9.568

mav@test:/test/hackbench# sysctl kern.sched.interact=0
kern.sched.interact: 30 ->  0
mav@test:/test/hackbench# ./hackbench 30 process 1000
Running with 30*40 (== 1200) tasks.
Time: 5.163

mav@test:/test/hackbench# sysctl kern.sched.interact=100
kern.sched.interact: 0 ->  100
mav@test:/test/hackbench# ./hackbench 30 process 1000
Running with 30*40 (== 1200) tasks.
Time: 3.190

I think it affects the balance between pipe latency and bandwidth, while the
test measures only the latter. It is clear that the conclusion from these
numbers depends on what we want to have.


I don't really care about this point; I'm only testing default values, or
more precisely, whatever the developers thought good default values would be.

Btw, you are testing 3 different configurations. Different results are
expected. What worries me more is rather the huge instability on the
*same* configuration, say on a pipe/thread/70 groups/600 iterations run,
where results range from 2.7s[0] to 7.4s, or a socket/thread/20 groups/1400
iterations run, where results range from 2.4s to 4.5s.


For the reason I pointed out in my first message, this test is _extremely_
sensitive to the context switch interval. The more aggressively the scheduler
switches threads, the smaller the pipe latency will be, but the smaller the
bandwidth will be as well. During the test run the scheduler constantly
recalculates the interactivity index for each thread, trying to balance
latency against switching overhead. With hundreds of threads running
simultaneously and interfering with each other, it is quite an unpredictable
process.
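
For readers who have not looked at hackbench, the capacity limit mentioned here is easy to observe in isolation. A small sketch, not hackbench itself, that measures how much a sender can queue into a pipe before it would block:

#include <sys/types.h>
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
        char buf[1024] = { 0 };
        ssize_t n;
        size_t total = 0;
        int fds[2];

        if (pipe(fds) == -1)
                err(1, "pipe");
        /* Non-blocking, so we can see where a real sender would have to sleep. */
        if (fcntl(fds[1], F_SETFL, O_NONBLOCK) == -1)
                err(1, "fcntl");
        for (;;) {
                n = write(fds[1], buf, sizeof(buf));
                if (n == -1) {
                        if (errno == EAGAIN)
                                break;          /* buffer full: receiver must run now */
                        err(1, "write");
                }
                total += (size_t)n;
        }
        printf("pipe accepted %zu bytes before the writer would block\n", total);
        return (0);
}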


--
Alexander Motin


Re: [RFT][patch] Scheduling for HTT and not only

2012-04-10 Thread Arnaud Lacombe
Hi,

On Tue, Apr 10, 2012 at 1:53 PM, Alexander Motin  wrote:
> On 04/10/12 20:18, Alexander Motin wrote:
>>
>> On 04/10/12 19:58, Arnaud Lacombe wrote:
>>>
>>> 2012/4/9 Alexander Motin:

 [...]

 I have strong feeling that while this test may be interesting for
 profiling,
 it's own results in first place depend not from how fast scheduler
 is, but
 from the pipes capacity and other alike things. Can somebody hint me
 what
 except pipe capacity and context switch to unblocked receiver prevents
 sender from sending all data in batch and then receiver from
 receiving them
 all in batch? If different OSes have different policies there, I think
 results could be incomparable.

>>> Let me disagree on your conclusion. If OS A does a task in X seconds,
>>> and OS B does the same task in Y seconds, if Y> X, then OS B is just
>>> not performing good enough. Internal implementation's difference for
>>> the task can not be waived as an excuse for result's comparability.
>>
>>
>> Sure, numbers are always numbers, but the question is what are they
>> showing? Understanding of the test results is even more important for
>> purely synthetic tests like this. Especially when one test run gives 25
>> seconds, while another gives 50. This test is not completely clear to me
>> and that is what I've told.
>
> A small illustration of my point: simple scheduler tuning affects thread
> preemption policy and changes this test's results by a factor of three:
>
> mav@test:/test/hackbench# ./hackbench 30 process 1000
> Running with 30*40 (== 1200) tasks.
> Time: 9.568
>
> mav@test:/test/hackbench# sysctl kern.sched.interact=0
> kern.sched.interact: 30 -> 0
> mav@test:/test/hackbench# ./hackbench 30 process 1000
> Running with 30*40 (== 1200) tasks.
> Time: 5.163
>
> mav@test:/test/hackbench# sysctl kern.sched.interact=100
> kern.sched.interact: 0 -> 100
> mav@test:/test/hackbench# ./hackbench 30 process 1000
> Running with 30*40 (== 1200) tasks.
> Time: 3.190
>
> I think it affects the balance between pipe latency and bandwidth, while the
> test measures only the latter. It is clear that the conclusion from these
> numbers depends on what we want to have.
>
I don't really care about this point; I'm only testing default values, or
more precisely, whatever the developers thought good default values would be.

Btw, you are testing 3 different configurations. Different results are
expected. What worries me more is rather the huge instability on the
*same* configuration, say on a pipe/thread/70 groups/600 iterations run,
where results range from 2.7s[0] to 7.4s, or a socket/thread/20 groups/1400
iterations run, where results range from 2.4s to 4.5s.

 - Arnaud

[0]: numbers extracted from a recent run of 9.0-RELEASE on a Xeon
E5-1650 platform.


Re: [RFT][patch] Scheduling for HTT and not only

2012-04-10 Thread Alexander Motin

On 04/10/12 20:18, Alexander Motin wrote:

On 04/10/12 19:58, Arnaud Lacombe wrote:

2012/4/9 Alexander Motin:

[...]

I have strong feeling that while this test may be interesting for
profiling,
it's own results in first place depend not from how fast scheduler
is, but
from the pipes capacity and other alike things. Can somebody hint me
what
except pipe capacity and context switch to unblocked receiver prevents
sender from sending all data in batch and then receiver from
receiving them
all in batch? If different OSes have different policies there, I think
results could be incomparable.


Let me disagree on your conclusion. If OS A does a task in X seconds,
and OS B does the same task in Y seconds, if Y> X, then OS B is just
not performing good enough. Internal implementation's difference for
the task can not be waived as an excuse for result's comparability.


Sure, numbers are always numbers, but the question is what are they
showing? Understanding of the test results is even more important for
purely synthetic tests like this. Especially when one test run gives 25
seconds, while another gives 50. This test is not completely clear to me
and that is what I've told.


A small illustration of my point: simple scheduler tuning affects thread 
preemption policy and changes this test's results by a factor of three:


mav@test:/test/hackbench# ./hackbench 30 process 1000
Running with 30*40 (== 1200) tasks.
Time: 9.568

mav@test:/test/hackbench# sysctl kern.sched.interact=0
kern.sched.interact: 30 -> 0
mav@test:/test/hackbench# ./hackbench 30 process 1000
Running with 30*40 (== 1200) tasks.
Time: 5.163

mav@test:/test/hackbench# sysctl kern.sched.interact=100
kern.sched.interact: 0 -> 100
mav@test:/test/hackbench# ./hackbench 30 process 1000
Running with 30*40 (== 1200) tasks.
Time: 3.190

I think it affects the balance between pipe latency and bandwidth, while the 
test measures only the latter. It is clear that the conclusion from these 
numbers depends on what we want to have.
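
Incidentally, the same knob can be read and flipped from a test harness programmatically via sysctlbyname(3); a sketch (setting it requires root):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>

int
main(void)
{
        int interact, newval = 0;               /* 0, as in the second run above */
        size_t len = sizeof(interact);

        /* Assumes SCHED_ULE, which is what exposes kern.sched.interact. */
        if (sysctlbyname("kern.sched.interact", &interact, &len,
            &newval, sizeof(newval)) == -1)
                err(1, "sysctlbyname");
        printf("kern.sched.interact: %d -> %d\n", interact, newval);
        return (0);
}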


--
Alexander Motin


Re: Starvation of realtime priority threads

2012-04-10 Thread Sushanth Rai
Thanks. I'll try to back port locally.

Sushanth

--- On Tue, 4/10/12, John Baldwin  wrote:

> From: John Baldwin 
> Subject: Re: Starvation of realtime priority threads
> To: "Sushanth Rai" 
> Cc: freebsd-hackers@freebsd.org
> Date: Tuesday, April 10, 2012, 6:57 AM
> On Monday, April 09, 2012 4:32:24 pm Sushanth Rai wrote:
> > I'm using stock 7.2. The priorities as defined in priority.h are in this range:
> > 
> > /*
> >  * Priorities range from 0 to 255, but differences of less then 4 (RQ_PPQ)
> >  * are insignificant.  Ranges are as follows:
> >  *
> >  * Interrupt threads:           0 - 63
> >  * Top half kernel threads:     64 - 127
> >  * Realtime user threads:       128 - 159
> >  * Time sharing user threads:   160 - 223
> >  * Idle user threads:           224 - 255
> >  *
> >  * XXX If/When the specific interrupt thread and top half thread ranges
> >  * disappear, a larger range can be used for user processes.
> >  */
> > 
> > The trouble is with vm_waitpfault(), which explicitly sleeps at PUSER.
> 
> Ah, yes, PUSER is the one Pxxx not in "top half kernel threads".  You can
> patch that locally, but you may have better luck using 9.0 (or backporting
> my fixes in 9.0 back to 7 or 8).  They were too invasive to backport to
> FreeBSD 7/8, but you could still do it locally (I've used them at work on
> both 7 and 8).
> 
> -- 
> John Baldwin


Re: [RFT][patch] Scheduling for HTT and not only

2012-04-10 Thread Alexander Motin

On 04/10/12 19:58, Arnaud Lacombe wrote:

2012/4/9 Alexander Motin:

[...]

I have strong feeling that while this test may be interesting for profiling,
it's own results in first place depend not from how fast scheduler is, but
from the pipes capacity and other alike things. Can somebody hint me what
except pipe capacity and context switch to unblocked receiver prevents
sender from sending all data in batch and then receiver from receiving them
all in batch? If different OSes have different policies there, I think
results could be incomparable.


Let me disagree on your conclusion. If OS A does a task in X seconds,
and OS B does the same task in Y seconds, if Y>  X, then OS B is just
not performing good enough. Internal implementation's difference for
the task can not be waived as an excuse for result's comparability.


Sure, numbers are always numbers, but the question is what are they 
showing? Understanding of the test results is even more important for 
purely synthetic tests like this. Especially when one test run gives 25 
seconds, while another gives 50. This test is not completely clear to me 
and that is what I've told.


--
Alexander Motin


mlock/mlockall (was: Re: problems with mmap() and disk caching)

2012-04-10 Thread Dieter BSD
Andrey writes:
> Wired memory: kernel memory and yes, application may get wired memory
> through mlock()/mlockall(), but I haven't seen any real application
> which calls mlock().

Apps with real time considerations may need to lock memory to prevent
having to wait for page/swap.
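
A typical pattern in such applications is to wire only the latency-critical buffers rather than the whole address space. A minimal sketch (the helper name is illustrative, not from any of the programs discussed here):

#include <sys/mman.h>
#include <err.h>
#include <stdlib.h>

/*
 * Illustrative helper (not from any program in this thread): allocate a
 * buffer and wire it so the real-time path never waits on page-in or swap.
 * Subject to RLIMIT_MEMLOCK and the system's wired-page limits.
 */
static void *
alloc_wired(size_t len)
{
        void *p;

        if ((p = malloc(len)) == NULL)
                err(1, "malloc");
        if (mlock(p, len) == -1)
                err(1, "mlock");
        return (p);
}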


Re: [RFT][patch] Scheduling for HTT and not only

2012-04-10 Thread Arnaud Lacombe
Hi,

2012/4/9 Alexander Motin :
> [...]
>
> I have strong feeling that while this test may be interesting for profiling,
> it's own results in first place depend not from how fast scheduler is, but
> from the pipes capacity and other alike things. Can somebody hint me what
> except pipe capacity and context switch to unblocked receiver prevents
> sender from sending all data in batch and then receiver from receiving them
> all in batch? If different OSes have different policies there, I think
> results could be incomparable.
>
Let me disagree on your conclusion. If OS A does a task in X seconds,
and OS B does the same task in Y seconds, if Y > X, then OS B is just
not performing good enough. Internal implementation's difference for
the task can not be waived as an excuse for result's comparability.

 - Arnaud


Re: problems with mmap() and disk caching

2012-04-10 Thread Alan Cox

On 04/09/2012 10:26, John Baldwin wrote:

On Thursday, April 05, 2012 11:54:31 am Alan Cox wrote:

On 04/04/2012 02:17, Konstantin Belousov wrote:

On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

Hi,

I open the file, then call mmap() on the whole file and get pointer,
then I work with this pointer.  I expect that page should be only once
touched to get it into the memory (disk cache?), but this doesn't work!

I wrote the test (attached) and ran it for the 1G file generated from
/dev/random, the result is the following:

Prepare file:
# swapoff -a
# newfs /dev/ada0b
# mount /dev/ada0b /mnt
# dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024

Purge cache:
# umount /mnt
# mount /dev/ada0b /mnt

Run test:
$ ./mmap /mnt/random-1024 30
mmap:  1 pass took:   7.431046 (none: 262112; res: 32; super:
0; other:  0)
mmap:  2 pass took:   7.356670 (none: 261648; res:496; super:
0; other:  0)
mmap:  3 pass took:   7.307094 (none: 260521; res:   1623; super:
0; other:  0)
mmap:  4 pass took:   7.350239 (none: 258904; res:   3240; super:
0; other:  0)
mmap:  5 pass took:   7.392480 (none: 257286; res:   4858; super:
0; other:  0)
mmap:  6 pass took:   7.292069 (none: 255584; res:   6560; super:
0; other:  0)
mmap:  7 pass took:   7.048980 (none: 251142; res:  11002; super:
0; other:  0)
mmap:  8 pass took:   6.899387 (none: 247584; res:  14560; super:
0; other:  0)
mmap:  9 pass took:   7.190579 (none: 242992; res:  19152; super:
0; other:  0)
mmap: 10 pass took:   6.915482 (none: 239308; res:  22836; super:
0; other:  0)
mmap: 11 pass took:   6.565909 (none: 232835; res:  29309; super:
0; other:  0)
mmap: 12 pass took:   6.423945 (none: 226160; res:  35984; super:
0; other:  0)
mmap: 13 pass took:   6.315385 (none: 208555; res:  53589; super:
0; other:  0)
mmap: 14 pass took:   6.760780 (none: 192805; res:  69339; super:
0; other:  0)
mmap: 15 pass took:   5.721513 (none: 174497; res:  87647; super:
0; other:  0)
mmap: 16 pass took:   5.004424 (none: 155938; res: 106206; super:
0; other:  0)
mmap: 17 pass took:   4.224926 (none: 135639; res: 126505; super:
0; other:  0)
mmap: 18 pass took:   3.749608 (none: 117952; res: 144192; super:
0; other:  0)
mmap: 19 pass took:   3.398084 (none:  99066; res: 163078; super:
0; other:  0)
mmap: 20 pass took:   3.029557 (none:  74994; res: 187150; super:
0; other:  0)
mmap: 21 pass took:   2.379430 (none:  55231; res: 206913; super:
0; other:  0)
mmap: 22 pass took:   2.046521 (none:  40786; res: 221358; super:
0; other:  0)
mmap: 23 pass took:   1.152797 (none:  30311; res: 231833; super:
0; other:  0)
mmap: 24 pass took:   0.972617 (none:  16196; res: 245948; super:
0; other:  0)
mmap: 25 pass took:   0.577515 (none:   8286; res: 253858; super:
0; other:  0)
mmap: 26 pass took:   0.380738 (none:   3712; res: 258432; super:
0; other:  0)
mmap: 27 pass took:   0.253583 (none:   1193; res: 260951; super:
0; other:  0)
mmap: 28 pass took:   0.157508 (none:  0; res: 262144; super:
0; other:  0)
mmap: 29 pass took:   0.156169 (none:  0; res: 262144; super:
0; other:  0)
mmap: 30 pass took:   0.156550 (none:  0; res: 262144; super:
0; other:  0)

If I ran this:
$ cat /mnt/random-1024 > /dev/null
before test, when result is the following:

$ ./mmap /mnt/random-1024 5
mmap:  1 pass took:   0.337657 (none:  0; res: 262144; super:
0; other:  0)
mmap:  2 pass took:   0.186137 (none:  0; res: 262144; super:
0; other:  0)
mmap:  3 pass took:   0.186132 (none:  0; res: 262144; super:
0; other:  0)
mmap:  4 pass took:   0.186535 (none:  0; res: 262144; super:
0; other:  0)
mmap:  5 pass took:   0.190353 (none:  0; res: 262144; super:
0; other:  0)

This is what I expect.  But why this doesn't work without reading file
manually?

Issue seems to be in some change of the behaviour of the reserv or
phys allocator. I Cc:ed Alan.

I'm pretty sure that the behavior here hasn't significantly changed in
about twelve years.  Otherwise, I agree with your analysis.

On more than one occasion, I've been tempted to change:

  pmap_remove_all(mt);
  if (mt->dirty != 0)
  vm_page_deactivate(mt);
  else
  vm_page_cache(mt);

to:

  vm_page_dontneed(mt);

because I suspect that the current code does more harm than good.  In
theory, it saves activations of the page daemon.  However, more often
than not, I suspect that we are spending more on page reactivations than
we are saving on page daemon activations.  The sequential access
detection heuristic is just too easily triggered.  For example, I've
seen it triggered by demand paging of the gcc text segment.  Also, I
think that pmap_remove_all() and especially vm_page_cache() are too
severe for a detection heuristic that is so easily triggered.

Re: Starvation of realtime priority threads

2012-04-10 Thread John Baldwin
On Monday, April 09, 2012 4:32:24 pm Sushanth Rai wrote:
> I'm using stock 7.2. The priorities as defined in priority.h are in this 
> range:
> 
> /*
>  * Priorities range from 0 to 255, but differences of less then 4 (RQ_PPQ)
>  * are insignificant.  Ranges are as follows:
>  *
>  * Interrupt threads:   0 - 63
>  * Top half kernel threads: 64 - 127
>  * Realtime user threads:   128 - 159
>  * Time sharing user threads:   160 - 223
>  * Idle user threads:   224 - 255
>  *
>  * XXX If/When the specific interrupt thread and top half thread ranges
>  * disappear, a larger range can be used for user processes.
>  */
> 
> The trouble is with vm_waitpfault(), which explicitly sleeps at PUSER.

Ah, yes, PUSER is the one Pxxx not in "top half kernel threads".  You can patch
that locally, but you may have better luck using 9.0 (or backporting my
fixes in 9.0 back to 7 or 8).  They were too invasive to backport to FreeBSD
7/8, but you could still do it locally (I've used them at work on both 7 and 8).
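
For completeness, a process lands in the realtime band (128 - 159) quoted above via rtprio(2); a minimal sketch, which requires root:

#include <sys/types.h>
#include <sys/rtprio.h>
#include <err.h>

int
main(void)
{
        struct rtprio rtp;

        rtp.type = RTP_PRIO_REALTIME;   /* the 128 - 159 band from priority.h */
        rtp.prio = 0;                   /* 0 .. RTP_PRIO_MAX within that band */
        if (rtprio(RTP_SET, 0, &rtp) == -1)     /* pid 0: this process; needs root */
                err(1, "rtprio");
        /* ... latency-sensitive work; note the PUSER sleep issue discussed above ... */
        return (0);
}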

-- 
John Baldwin


Re: mlockall() on freebsd 7.2 + amd64 returns EAGAIN

2012-04-10 Thread Konstantin Belousov
On Mon, Apr 09, 2012 at 07:37:11PM -0700, Sushanth Rai wrote:
> Hello,
> 
> I have a simple program that links with the math library. The only thing that 
> program does is to call mlockall(MCL_CURRENT | MCL_FUTURE). This call to 
> mlockall fails with EAGAIN. I figured out that kernel vm_fault() is returning 
> KERN_PROTECTION_FAILURE when it tries to fault-in the mmap'ed math library 
> address. But I can't figure out why.
> 
> The /proc/<pid>/map output for the process is the following:
> 
> 0x800634000 0x80064c000 24 0 0xff0025571510 r-x 104 52 0x1000 COW NC 
> vnode /lib/libm.so.5
> 0x80064c000 0x80064d000 1 0 0xff016f11c5e8 r-x 1 0 0x3100 COW NNC vnode 
> /lib/libm.so.5
> 0x80064d000 0x80074c000 4 0 0xff0025571510 r-x 104 52 0x1000 COW NC vnode 
> /lib/libm.so.5
> 
> Since ntpd calls mlockall() with the same options and links with the math 
> library too, I looked at the map output of ntpd, which shows a slightly 
> different "resident" column (3rd column) on the 3rd line:
> 0x800682000 0x80069a000 8 0 0xff0025571510 r-x 100 50 0x1000 COW NC vnode 
> /lib/libm.so.5
> 0x80069a000 0x80069b000 1 0 0xff0103b85870 r-x 1 0 0x3100 COW NNC vnode 
> /lib/libm.so.5
> 0x80069b000 0x80079a000 0 0 0xff0025571510 r-x 100 50 0x1000 COW NC vnode 
> /lib/libm.so.5
> 
> I don't know if that has anything to do with failure. The snippet of code 
> that returns failure in vm_fault() is the following:
> 
> if (fs.pindex >= fs.object->size) {
>   unlock_and_deallocate(&fs);
>   return (KERN_PROTECTION_FAILURE);
> }
> 
> Any help would be appreciated.

This might be a bug fixed in r191810, but I am not sure.




Re: time stops in vmware

2012-04-10 Thread Daniel Braniss
> On Sun, 08 Apr 2012 02:11:25 -0500, Daniel Braniss   
> wrote:
> 
> > Hi All
> > There was some mention before that time stops under vmware, and now it's
> > happened to me :-)
> >
> > the clock stopped now, the system is responsive, but eg
> > sleep 1
> > never finishes.
> > Is there a solution?
> > btw, I'm running 8.2-stable, i'll try 8.3 soon.
> >
> 
> Can you recreate it? Does it go away if you use "kern.hz=200" in  
> loader.conf? We used to have to do that.

no, I can't recreate it, could you?


