Re: amrd disk performance drop after running under high load

2007-10-27 Thread Kris Kennaway

Alexey Popov wrote:
> Hi
>
> Kris Kennaway wrote:
> > > So I can conclude that FreeBSD has a long standing bug in VM that 
> > > could be triggered when serving large amount of static data (much 
> > > bigger than memory size) on high rates. Possibly this only applies 
> > > to large files like mp3 or video. 
> >
> > It is possible, we have further work to do to conclude this though.
> >
> > > I forgot to mention I have pmc and kgmon profiling for good and bad 
> > > times. But I have not enough knowledge to interpret it right and not 
> > > sure if it can help.
> >
> > pmc would be useful.
>
> pmc profiling attached.


Sorry for the delay, I was travelling last weekend and it took a few 
days to catch up.


OK, the pmc traces do seem to show that it's not a lock contention 
issue.  That being the case I don't think the fact that different 
servers perform better is directly related.  In my tests multithreaded 
web servers don't seem to perform well anyway.


There is also no evidence of a VM problem.  What your vmstat and pmc 
traces show is that your system really isn't doing much work at all, 
relatively speaking.


There is also still no evidence of a disk problem.  In fact your disk 
seems to be almost idle in both cases you provided, only doing between 1 
and 10 operations per second, which is trivial.


In the "good" case you are getting a much higher interrupt rate but with 
the data you provided I can't tell where from.  You need to run vmstat 
-i at regular intervals (e.g. every 10 seconds for a minute) during the 
"good" and "bad" times, since it only provides counters and an average 
rate over the uptime of the system.
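
Because the counters are cumulative, the rate during a short window has to be computed from two successive samples rather than read off the since-boot average. A minimal sketch of that arithmetic in C; the struct and function names are hypothetical, and the numbers in the note below are invented for illustration, not taken from the attached traces:

```c
#include <assert.h>
#include <stdint.h>

/* One sample of a cumulative interrupt counter, e.g. the "total"
 * column of `vmstat -i` for a single irq line (hypothetical type). */
struct irq_sample {
    uint64_t total;   /* interrupts counted since boot */
    double   when_s;  /* time the sample was taken, in seconds */
};

/* Interrupts per second between two samples of the same counter. */
static double
irq_rate(struct irq_sample earlier, struct irq_sample later)
{
    assert(later.when_s > earlier.when_s);
    assert(later.total >= earlier.total);
    return (double)(later.total - earlier.total) /
        (later.when_s - earlier.when_s);
}
```

For example, two samples taken 10 seconds apart with totals of 1000000 and 1085000 give a rate of 8500 interrupts per second, however long the machine has been up.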


What there is evidence of is an interrupt aliasing problem between em 
and USB:


irq16: uhci0  1464547796   1870
irq64: em0    1463513610   1869

This is a problem on some Intel systems.  Basically each em0 interrupt 
is also causing a bogus interrupt to the uhci0 device.  This will be 
causing some overhead and might be contributing to the UMA problems.  I 
am not sure if it is the main issue, although it could be.  It is most 
serious when both irqs run under Giant, because they will both fight for 
it every time one of them interrupts.  That is not the case here, but 
there could be other bad scenarios too.  You could try disabling USB 
support in your kernel since you don't seem to be using it.
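
The uhci0 and em0 totals quoted above are nearly identical, which is what suggests aliasing. A rough sketch of that comparison in C; the helper names and the idea of a tolerance threshold are only an illustration, not an established diagnostic, and closely matching counters alone do not prove aliasing:

```c
#include <assert.h>
#include <stdint.h>

/* Relative difference between two cumulative interrupt counters,
 * as a fraction of the larger one (hypothetical helper). */
static double
irq_rel_diff(uint64_t a, uint64_t b)
{
    uint64_t hi = a > b ? a : b;
    uint64_t lo = a > b ? b : a;

    assert(hi > 0);
    return (double)(hi - lo) / (double)hi;
}

/* Nonzero if the two counters track each other within `tolerance`. */
static int
irq_counts_track(uint64_t a, uint64_t b, double tolerance)
{
    return irq_rel_diff(a, b) <= tolerance;
}
```

With the totals above (1464547796 and 1463513610) the difference is 1034186 interrupts, about 0.07% of the total, so the counters track each other within a tolerance of 0.001.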


Kris
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Floating point in interrupt handler

2007-10-27 Thread M. Warner Losh
In message: <[EMAIL PROTECTED]>
"Daan Vreeken [PA4DAN]" <[EMAIL PROTECTED]> writes:
: Hi Warner,
: 
: On Wednesday 24 October 2007 23:15:13 you wrote:
: > In message: <[EMAIL PROTECTED]>
: >
: > "Daan Vreeken [PA4DAN]" <[EMAIL PROTECTED]> writes:
: > : But what I haven't found is a description of exactly what the kernel is
: > : missing to allow floating point operations to be done there.
: >
: > FPU context is assumed to only change in user processes.  You'd have
: > to fix the FPU state saving code to cope with it changing everywhere,
: > or you'd have to explicitly put the goo to save/restore it around the
: > FP you want to do in the kernel.
:  
: Issei Suzuki pointed me in the right direction in his reply. (The following 
: text is an exact copy of the reply I sent to Issei, but I'll just copy it 
: here to show the code)
: If I understand the npx code correctly, there are 2 options when the kernel 
: arrives at hardclock() :
: o The current process is using the FPU (fpcurthread != NULL)
: o The current process hasn't used the FPU (yet) since it has been switched to
:    (fpcurthread == NULL)
: In the first case, FPU instructions can be used and will not result in a trap, 
: but we should save/restore the FPU state before using them so userland 
: doesn't get confused. In the last case FPU instructions result in a trap, so 
: we need stop/start_emulating(), but as no one is using the FPU, there is no 
: need to save/restore its state.
: 
: With this in mind I've come up with the following code :
: 
: At the start of the function :
:     // check FPU state on entry
:     if (PCPU_GET(fpcurthread) != NULL) {
:         // someone is using the FPU
:         // save its state and remember to put it back later
:         restore = 1;
:         fpusave(&fpu_state);
:     } else {
:         // no one is using the FPU
:         // enable use of FPU instructions, no need to save its state
:         restore = 0;
:         stop_emulating();
:     }
:     // init FPU state every time we get here, as we don't know who has
:     // been playing with it in between calls
:     fninit();
:     control = __INITIAL_NPXCW__;
:     fldcw(&control);
: 
: Then we do some floating point arithmetic.
: 
: And at the end of the function :
:     // restore FPU state before we leave
:     if (restore) {
:         // restore FPU registers to what they were
:         fpurstor(&fpu_state);
:     } else {
:         // no one was using the FPU, so re-enable the FPU trap
:         start_emulating();
:     }
: 
: With this code trap-22 no longer triggers within my function. The FPU 
: instructions still seem to be executed correctly in my function, and after 
: adding a couple of printf()'s I can see it call fpusave() and fpurstor() when 
: interrupting a userland process that uses the FPU.
: Does this look reasonable to everyone?

My concern here would be to make sure that your code doesn't migrate
from one CPU to another.  Other than that, I think it is OK.

Warner


Re: Floating point in interrupt handler

2007-10-27 Thread Daan Vreeken [PA4DAN]
Hi Warner,

On Wednesday 24 October 2007 23:15:13 you wrote:
> In message: <[EMAIL PROTECTED]>
>
> "Daan Vreeken [PA4DAN]" <[EMAIL PROTECTED]> writes:
> : But what I haven't found is a description of exactly what the kernel is
> : missing to allow floating point operations to be done there.
>
> FPU context is assumed to only change in user processes.  You'd have
> to fix the FPU state saving code to cope with it changing everywhere,
> or you'd have to explicitly put the goo to save/restore it around the
> FP you want to do in the kernel.
 
Issei Suzuki pointed me in the right direction in his reply. (The following 
text is an exact copy of the reply I sent to Issei, but I'll just copy it 
here to show the code)
If I understand the npx code correctly, there are 2 options when the kernel 
arrives at hardclock() :
o The current process is using the FPU (fpcurthread != NULL)
o The current process hasn't used the FPU (yet) since it has been switched to
   (fpcurthread == NULL)
In the first case, FPU instructions can be used and will not result in a trap, 
but we should save/restore the FPU state before using them so userland 
doesn't get confused. In the last case FPU instructions result in a trap, so 
we need stop/start_emulating(), but as no one is using the FPU, there is no 
need to save/restore its state.

With this in mind I've come up with the following code :

At the start of the function :
    // check FPU state on entry
    if (PCPU_GET(fpcurthread) != NULL) {
        // someone is using the FPU
        // save its state and remember to put it back later
        restore = 1;
        fpusave(&fpu_state);
    } else {
        // no one is using the FPU
        // enable use of FPU instructions, no need to save its state
        restore = 0;
        stop_emulating();
    }
    // init FPU state every time we get here, as we don't know who has
    // been playing with it in between calls
    fninit();
    control = __INITIAL_NPXCW__;
    fldcw(&control);

Then we do some floating point arithmetic.

And at the end of the function :
    // restore FPU state before we leave
    if (restore) {
        // restore FPU registers to what they were
        fpurstor(&fpu_state);
    } else {
        // no one was using the FPU, so re-enable the FPU trap
        start_emulating();
    }

With this code trap-22 no longer triggers within my function. The FPU 
instructions still seem to be executed correctly in my function, and after 
adding a couple of printf()'s I can see it call fpusave() and fpurstor() when 
interrupting a userland process that uses the FPU.
Does this look reasonable to everyone?

> You'd also have to make sure that the 
> floating point exceptions never trap, since trapping inside the kernel
> has very limited support.  You'll have to fix the problems that this
> would cause, or force the FPU into a state where it never traps.

According to 'man feenableexcept', all exceptions are masked by default.

> Sure, maybe you can make it happen.  However, you are in for much pain
> and suffering.  The kernel isn't a general purpose computing
> environment, and trying to pretend it is will lead to suffering.
> Especially inside of an interrupt handler.  It is less bad if this is
> an ithread, but could be quite painful if you want to do this inside
> of a fast interrupt handler to reduce latency.
>
> I'd recommend strongly against trying this and reevaluate your true
> need for FP in the kernel.  From your other mail, you don't seem open
> to this answer.  If you don't take it, you are setting yourself up for
> a lot of pain and suffering.  It is your choice, however.

I'm building a system much like Matlab xPC : 
http://www.techsource.com.sg/pdts/pdt_details.asp?pcid=2&pscid=19

This system will be used by multiple people for various different control 
systems. All of them know how to write basic C code, but none of them has 
enough programming knowledge to write decent integer math. I'm not a big fan 
of using floating point operations inside of the kernel, but if I can't offer 
this, I'll have to write all of their control loop routines myself. (And I 
really don't have time for that.)

The machines this will run on are dedicated to the control system. I can live 
with the machines spending 99% of their CPU time in kernel mode. Userland 
will only be used as a nice way to interface with the control loop and to 
transfer data and logs in/out.
Userland will almost never use FP instructions. In the tests I've done so far 
with the above code, an FPU save/restore is only needed at most a couple of 
times a second.

> If you do manage to pull it off, I'd be very interested in seeing what things I
> didn't know to warn you about...

Feel free to punch holes in the code I've written. I'm still rather new to the 
FPU internals, so it's highly possible that I've overse

Re: dump problems

2007-10-27 Thread Danny Braniss
to recap:
dump will get stuck/deadlock.

found the problem, but no solution in sight :-(
dump creates 3 processes and syncs with them via kill(2) - don't want
to go into the merits of this - and sometimes:
- the signal is received even if it's blocked.
- the signal is not received.
- it happens mainly on > 2way multicore hosts.

all this seems to be a problem in the kernel.

need some insight please!

thanks,
danny
PS: sorry for the shortness of the message, anything else will get me to
start [EMAIL PROTECTED] dump, specially tape.c.


