Re: [Emc-developers] SMP kernel latency test results

EBo Mon, 09 Feb 2009 16:55:33 -0800

John,

Thank you for your detailed reply.


> > I was finally able to get it to configure similar to Eric's setup
> > (although I am running 2.6.27-magma due to problems with the 2.6.24
> > kernel not playing nicely with my CPU fan -- a known problem), but have
> > an odd thing which seems to require a CPU hog process in another
> > terminal.  If the process is not there, then the RT thread seems to
> > stall waiting for the scheduler to give it a time slice.
> 
> How long of a stall?

Here is my experimental setup.  Open two terminal windows.

In the first, run either the configuration program or the latency test.  The
test writes out its header block and then just sits there; the thread does
not appear to run at all.  Now go and run a CPU hog in the other terminal.
My hog of choice is "top -d 0", which grabs process info with a 0 second
period, so the display just flashes.
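
For anyone wanting to reproduce the setup without top, a minimal sketch of a
stand-in hog follows.  It is an assumption-laden substitute, not what was used
here: "top -d 0" also reads /proc and redraws the screen, while this just
spins.  The name busy_spin and its parameters are hypothetical, and the CPU
pinning only applies where os.sched_setaffinity exists (Linux).

```python
import os
import time

def busy_spin(duration_s, cpu=None):
    """Spin for duration_s seconds and return the number of loop iterations.

    If cpu is given and the platform supports it (Linux), pin the calling
    process to that CPU first so the hog stays on one core.
    """
    if cpu is not None and hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {cpu})  # 0 means "the calling process"
    deadline = time.monotonic() + duration_s
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1  # pure busy work, no sleeping and almost no memory
    return iterations
```

Calling busy_spin(60, cpu=0) in the second terminal would keep CPU 0 busy for
a minute; which CPU to pick is left open, since it depends on where the RT
thread runs.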

Once I start the second process, the first (RT) starts chunking out the
latency info at 1 second intervals.  Now for the really interesting part:
kill the second process (the CPU hog).

Killing the CPU hog process causes the RT thread to stall again until I
restart the CPU hog.  Every once in a while it will give another tick, but
not continuously as expected.  So, while the 1 second wait does check the
latency, control never goes back to the calling process to get ready for the
next tick.  This completely disrupts the actual flow of the latency code, so
the test does not really check the overall smoothness of the requested
ticks.  That has huge implications for the tool velocities of a machine
driven by a processing thread on a CPU with this problem (and I would call it
a bug).  It may just be some weirdness with my kernel config, but I
personally consider this a warning sign...
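
One way to put numbers on "not continuously as expected" is to timestamp each
line of output and look for gaps much longer than the 1 second period.  The
sketch below assumes the timestamps have already been collected as seconds in
a list; find_stalls, its default period, and its tolerance are hypothetical
names and values chosen for illustration, not part of the latency test.

```python
def find_stalls(tick_times_s, period_s=1.0, tolerance_s=0.5):
    """Return (index, gap) pairs for ticks that arrived late.

    A tick counts as stalled when the time since the previous tick exceeds
    period_s + tolerance_s.  The index is that of the late tick.
    """
    stalls = []
    for i in range(1, len(tick_times_s)):
        gap = tick_times_s[i] - tick_times_s[i - 1]
        if gap > period_s + tolerance_s:
            stalls.append((i, gap))
    return stalls
```

For example, ticks at 0.0, 1.0, 2.0, 7.5 and 8.5 seconds would report a single
stall of 5.5 seconds at index 3.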

> In my experience, the "cpu hog" is able to reduce latencies from 10-20 
> microseconds down to perhaps 5-7uS.  If your stalls are much longer than 
> that you must be seeing something new.

The stall is not happening inside the RT loop, but outside it in the calling
process.  As I was playing with various configurations (like Steve's?
suggestion of isolcpus=1, turning off hyperthreading, etc.) I did see a
similar reduction.  There are some interesting patterns in the actual
results.  I was watching not only the ovr_max, but also the lat_max.  For me
the lat_max will bounce around between, say, -200ns and maybe 300ns, then
jump to blocks of 2000ns to 4000ns, and then sometimes settle back to near 0.

> My own theory (and it is only a theory) about why the cpu hog works is 
> related to cache.  The hog uses very little memory, and since it keeps 
> one CPU busy, that CPU never runs any other code.  So the RT code 
> doesn't get flushed out of cache, and doesn't have to get fetched back 
> into cache later.

The cache theory (which makes sense) would explain the jumping blocks seen
above, but it does not explain my current problem with the non-RT side of the
latency test stalling the way it does.

Maybe what is needed is another latency test which uses a continuous/periodic
interrupt.  This would have caught my problem.  Or, if it is already written
that way, then something hinky is definitely going on: it is not only the I/O
which is getting backed up, but the process appears to be actually stalling,
since it resumes at 1 second intervals (similar to putting the process to
sleep and then resuming it).
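
As a rough illustration of the periodic idea, here is a minimal user-space
sketch that sleeps to absolute deadlines and records how late each wakeup is.
It is not the existing latency test and not real-time code: it uses ordinary
Python sleeps rather than a hardware interrupt or an RT thread, and the name
measure_wakeup_lateness and its parameters are hypothetical.  It only shows
the shape of a test in which a stalled process shows up as a large lateness
value instead of going unnoticed.

```python
import time

def measure_wakeup_lateness(period_ns, count):
    """Wake up every period_ns and return each wakeup's lateness in ns.

    Deadlines are computed from the original schedule rather than from the
    time of the previous wakeup, so one late wakeup does not shift the rest.
    """
    lateness = []
    next_deadline = time.monotonic_ns() + period_ns
    for _ in range(count):
        remaining = next_deadline - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)  # sleep until the deadline
        lateness.append(time.monotonic_ns() - next_deadline)
        next_deadline += period_ns  # advance along the fixed schedule
    return lateness
```

Something like max(measure_wakeup_lateness(1_000_000, 1000)) would give the
worst wakeup over one second of 1 ms ticks.  Absolute values from a
non-RT Python process will be far larger than the nanosecond figures above;
the point is only whether they stay bounded with and without the hog running.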

> I saw some other cache related behavior a long time ago when doing some 
> latency testing.  The latency results improved noticeably when I lowered 
> the thread period below some threshold.  (I don't remember the threshold 
> period, it was at least a year ago.)
> 
> I eventually realized that when I was running the thread very 
> frequently, the RT code never got pushed out of cache.  When I increased 
> the period, other processes had enough time to replace the RT code in 
> cache between invocations of the thread.

John,  thanks again for your reply, and I will keep this in mind while I
trudge along.  As a note, I am going to go back and completely reconfigure my
machine from scratch and see if I can sort this out.

  Best regards,

  EBo --


_______________________________________________
Emc-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/emc-developers
