On Fri, Dec 9, 2011 at 1:09 PM, Win Htin <win.h...@gmail.com> wrote:
> >> Hi folks,
> >>
> >> I have a particular process which is very latency sensitive (in the
> >> milliseconds range). The network team has determined the
> >> response/round-trip time between their monitoring device and that
> >> particular process can go up to 4x above the norm. Looking through all
> >> sorts of standard linux commands and output from monitoring tool
> >> (Nagios), I can see the CPU usage never exceeds more than 20% and
> >> app/user memory usage was below 50% during the so-called slowness
> >> windows.
> >>
> >> This issue pops up only a handful of times each day and since it is
> >> very transient in nature, it is extremely difficult to determine what
> >> is causing this intermittent slowness. Any idea what sort of tools
> >> might be able to help me pinpoint the issue?
> >>
> >> The users are not complaining about other processes running on the
> >> same server because those other processes are comparatively not that
> >> latency sensitive.
> >>
> >> I am running RHEL 5.7 (64bit) with kernel revision 2.6.18-274.3.1.el5.
> >
> > When the process is responding slowly, is it paged out? Have you tuned
> > vm.swappiness for your server load to reduce the chances of processes
> > getting paged out in favor of disk caching? The default vm.swappiness
> > may not be ideal for your workload.
>
> With the standard tools I have, it is not possible to actively monitor
> this in real time since the latency issue is only part of the multiple
> stages of the application and is in the milliseconds range. Saying
> that, I don't think it is paged out.
>
> > What about disk I/O? I suggest you run "iostat -xnd 1" to monitor your
> > disk use during this time.
>
> The disk I/O is very little. The app is mostly in memory. Again, even
> if it was disk I/O related, the iostat command most likely will not be
> able to capture it due to the issue lasting less than 30 milliseconds.
>
> The users are expecting e.g. max X millisecond response and if it is
> anything above, it is no good for them. Anyhow, I did a tcpdump and
> the app guys turned on debug mode while the tcpdump was going on. I
> guess we can correlate some useful info out of that and go from there.
>
Maybe you need a realtime Linux kernel or similar, with network QoS configured?
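
For correlating the tcpdump capture with the app's debug log, something like the following might help pick out the slow exchanges first. It is only a rough sketch: it assumes the capture was printed as text with epoch timestamps (`tcpdump -tt -n`), and it naively pairs each packet sent to the server with the next packet coming back from it, so pure ACKs will make the numbers look better than they are. The function name `slow_round_trips`, the server address, and the threshold are made up for illustration.

```python
import re

# Matches the start of a `tcpdump -tt -n` text line: "<epoch> IP <src> > <dst>:"
LINE_RE = re.compile(r"^(\d+\.\d+) IP (\S+) > (\S+):")


def slow_round_trips(lines, server, threshold_ms):
    """Return (request_epoch, rtt_ms) for exchanges slower than threshold_ms.

    `server` is the "ip.port" string exactly as tcpdump prints it,
    e.g. "10.0.0.9.7000" (a hypothetical address).
    """
    slow = []
    pending = None  # timestamp of the oldest unanswered packet sent to the server
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip non-IPv4 lines and anything that does not parse
        ts, src, dst = float(m.group(1)), m.group(2), m.group(3)
        if dst == server and pending is None:
            pending = ts
        elif src == server and pending is not None:
            rtt_ms = (ts - pending) * 1000.0
            if rtt_ms > threshold_ms:
                slow.append((pending, rtt_ms))
            pending = None
    return slow
```

Feeding it the lines from `tcpdump -tt -n -r capture.pcap` (the file name is a placeholder) gives a list of epoch timestamps that can be looked up in the application's debug log for the same moments.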
_______________________________________________
rhelv5-list mailing list
rhelv5-list@redhat.com
https://www.redhat.com/mailman/listinfo/rhelv5-list