Re: [OpenSIPS-Users] MediaProxy loading issues - I think I need some tuning here
Hi,

On Apr 5, 2012, at 10:51 PM, Jock McKechnie wrote:

> Thank you for your suggestions;
>
> I have noticed a very strange symptom but I've yet to determine if it
> actually affects call handling or not. When the -relay is heavily
> loaded it'll have the load spread out across the cores and then,
> suddenly, the CPU usage appears to drift over to a single core and max
> it out for a bit, with the other cores doing nothing and then it
> all spreads out again. The heavier loaded it is, the more time it
> spends on one core. Very strange. I installed irqbalance but it
> appears not to make a difference.
>

If you are bombarding the server with calls continuously, you could see a CPU spike, since the call setup is done in a single thread, but after the conntrack rule has been created the kernel takes care and load is shared across all cores. Though I have never experienced this.

> I'm hoping the worst this may cause is a slight delay in a call
> starting up with an allocated pair of media ports in iptables
> forwarding, rather than call audio distortion. Have you ever seen
> anything like this?
>

Since MediaProxy doesn't 'touch' the actual media I don't think it can cause distortion. Now, if the system is so overloaded that packets can't leave the server 'on time', you may have jitter issues. This is just a hypothesis, because it would mean that your server is so overloaded that even sending some UDP data is a problem...

Regards,

--
Saúl Ibarra Corretgé
AG Projects

___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users
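One way to put numbers on the "load drifts to a single core" symptom is to compare two snapshots of /proc/stat taken a few seconds apart. The following is a minimal sketch, assuming the standard Linux /proc/stat layout; the function name `per_core_busy` is illustrative and is not part of MediaProxy.

```python
def per_core_busy(before, after):
    """Return {cpuN: busy fraction} between two snapshots of /proc/stat text.

    Assumes the Linux layout 'cpuN user nice system idle iowait irq softirq
    steal ...'. Idle and iowait are counted as not busy.
    """
    def parse(text):
        stats = {}
        for line in text.splitlines():
            parts = line.split()
            # Keep per-core rows (cpu0, cpu1, ...); skip the aggregate 'cpu' row.
            if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
                # Only the first 8 fields: guest time is already part of user.
                values = [int(v) for v in parts[1:9]]
                idle = values[3] + (values[4] if len(values) > 4 else 0)
                stats[parts[0]] = (sum(values), idle)
        return stats

    first, second = parse(before), parse(after)
    usage = {}
    for cpu, (total, idle) in second.items():
        if cpu not in first:
            continue
        d_total = total - first[cpu][0]
        d_idle = idle - first[cpu][1]
        usage[cpu] = (d_total - d_idle) / d_total if d_total > 0 else 0.0
    return usage
```

To use it, read /proc/stat once, wait a few seconds, read it again, and pass both strings in. A result where one core sits near 1.0 while the others stay near 0.0 would match the behaviour described above.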
Re: [OpenSIPS-Users] MediaProxy loading issues - I think I need some tuning here
Thank you for your suggestions;

I have noticed a very strange symptom but I've yet to determine if it actually affects call handling or not. When the -relay is heavily loaded it'll have the load spread out across the cores and then, suddenly, the CPU usage appears to drift over to a single core and max it out for a bit, with the other cores doing nothing and then it all spreads out again. The heavier loaded it is, the more time it spends on one core. Very strange. I installed irqbalance but it appears not to make a difference.

I'm hoping the worst this may cause is a slight delay in a call starting up with an allocated pair of media ports in iptables forwarding, rather than call audio distortion. Have you ever seen anything like this?

- JP

On Thu, Apr 5, 2012 at 6:58 AM, Saúl Ibarra Corretgé wrote:
> Hi,
>
>>
>> Saúl,
>>
>> You called it. Complete turn around in load-out - no more port
>> complaints and at 900 calls I'm seeing around 20% usage across four
>> cores. And the 'apt-get update' updates -relay for me, which I was
>> expecting to have to build (like I did initially), so I'm very
>> pleased.
>>
>
> I'm happy it works for you now :-)
>
>> When the system is humming along at 900 calls I started noticing these pop
>> up:
>> warning: Aggregate speed calculation time exceeded 10ms: 10401us for
>> 431 sessions
>> Googling shows that this is related to some statistics, so I've turned
>> it off for now to see how far I can push media-proxy.
>>
>
> It's just an indicator of how much data MediaProxy is relaying, of course as
> number of sessions goes up, this calculation takes time, and while this
> calculation is in progress the relay can't accept new sessions. You can
> either sample at longer intervals, thus having a less accurate measurement,
> or just disable it, in case you don't care about it.
>
>> Would you be able to recommend some other settings that I should make
>> to help mediaproxy-relay push as many calls as possible?
>>
>
> Given the fact that the relayed traffic is UDP I'm not sure if something can
> be tweaked in the kernel, but one thing you can check is interrupts. See if
> they are hitting a single CPU (check /proc/interrupts) and if so install
> irqbalance so that interrupts are balanced among the cores. Good network
> cards also help, of course :-)
>
> Given the fact that MediaProxy doesn't do much while the kernel is relaying
> the traffic, you'll probably hit limits related to networking first.
> Nevertheless, MediaProxy was designed with horizontal scalability in mind, so
> if you need to handle more calls, you can add more relays :-) Also don't
> forget that a relay can be connected to several dispatchers.
>
>
> Regards,
>
> --
> Saúl Ibarra Corretgé
> AG Projects
Re: [OpenSIPS-Users] MediaProxy loading issues - I think I need some tuning here
Hi,

>
> Saúl,
>
> You called it. Complete turn around in load-out - no more port
> complaints and at 900 calls I'm seeing around 20% usage across four
> cores. And the 'apt-get update' updates -relay for me, which I was
> expecting to have to build (like I did initially), so I'm very
> pleased.
>

I'm happy it works for you now :-)

> When the system is humming along at 900 calls I started noticing these pop up:
> warning: Aggregate speed calculation time exceeded 10ms: 10401us for
> 431 sessions
> Googling shows that this is related to some statistics, so I've turned
> it off for now to see how far I can push media-proxy.
>

It's just an indicator of how much data MediaProxy is relaying, of course as number of sessions goes up, this calculation takes time, and while this calculation is in progress the relay can't accept new sessions. You can either sample at longer intervals, thus having a less accurate measurement, or just disable it, in case you don't care about it.

> Would you be able to recommend some other settings that I should make
> to help mediaproxy-relay push as many calls as possible?
>

Given the fact that the relayed traffic is UDP I'm not sure if something can be tweaked in the kernel, but one thing you can check is interrupts. See if they are hitting a single CPU (check /proc/interrupts) and if so install irqbalance so that interrupts are balanced among the cores. Good network cards also help, of course :-)

Given the fact that MediaProxy doesn't do much while the kernel is relaying the traffic, you'll probably hit limits related to networking first. Nevertheless, MediaProxy was designed with horizontal scalability in mind, so if you need to handle more calls, you can add more relays :-) Also don't forget that a relay can be connected to several dispatchers.

Regards,

--
Saúl Ibarra Corretgé
AG Projects
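The /proc/interrupts check suggested above can be summarised per CPU rather than read by eye. A minimal sketch, assuming the usual Linux layout (a header row naming the CPUs, then one row per IRQ with one count column per CPU); the function name `irq_totals_per_cpu` is illustrative only.

```python
def irq_totals_per_cpu(text):
    """Sum interrupt counts per CPU from the text of /proc/interrupts."""
    lines = text.splitlines()
    if not lines:
        return {}
    cpus = lines[0].split()              # e.g. ['CPU0', 'CPU1', ...]
    totals = dict.fromkeys(cpus, 0)
    for line in lines[1:]:
        fields = line.split()[1:]        # drop the 'NN:' row label
        # zip stops at the number of CPUs, so trailing device names are ignored;
        # rows such as ERR: that have fewer columns are handled as well.
        for cpu, value in zip(cpus, fields):
            if value.isdigit():
                totals[cpu] += int(value)
    return totals
```

Calling it on the contents of /proc/interrupts and comparing the totals shows whether one CPU is taking nearly all interrupts. Looking only at the rows for the network card (for example a row ending in `eth0`, an assumed interface name) is usually more telling than the grand total.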
Re: [OpenSIPS-Users] MediaProxy loading issues - I think I need some tuning here
>> I'm running a Deb Wheezy/Sid (unstable) release to keep up with the
>> latest dependencies for MediaProxy's build - which, I admit, I'm using
>> a build package from a few months ago.
>>
>> I've got iptables v1.4.12.2 running, with MediaProxy 2.5.1 (according
>> to the dpkg information after the debuild), so slightly behind that
>> fixed descriptor leak release.
>>
>> The loading on the box was clearly not right with whatever seems to be
>> going wrong, so my making any kind of assumptions on how well
>> MediaProxy works is unfair until I've got this sorted out.
>>
>
> The problem is indeed the file descriptor leak, which was fixed between 2.5.1
> and 2.5.2. In case you want to verify this yourself, just use lsof on the
> media-relay PID and start a call: 4 new descriptors will show up, but after
> the call is ended they are not released.
>
> Please do upgrade to MediaProxy 2.5.2 and test again :-)
>
> FYI, we do have a public Debian repository with MediaProxy built for several
> Debian and Ubuntu versions, check it out:
> http://mediaproxy.ag-projects.com/projects/mediaproxy/wiki/InstallationGuide

Saúl,

You called it. Complete turn around in load-out - no more port complaints and at 900 calls I'm seeing around 20% usage across four cores. And the 'apt-get update' updates -relay for me, which I was expecting to have to build (like I did initially), so I'm very pleased.

When the system is humming along at 900 calls I started noticing these pop up:
warning: Aggregate speed calculation time exceeded 10ms: 10401us for 431 sessions
Googling shows that this is related to some statistics, so I've turned it off for now to see how far I can push media-proxy.

Would you be able to recommend some other settings that I should make to help mediaproxy-relay push as many calls as possible?

I'm very grateful for your help;

- Jock
Re: [OpenSIPS-Users] MediaProxy loading issues - I think I need some tuning here
Hi Jock,

On Apr 4, 2012, at 3:10 PM, Jock McKechnie wrote:

> Thank you, Saúl, for your swift reply.
>
> I'm running a Deb Wheezy/Sid (unstable) release to keep up with the
> latest dependencies for MediaProxy's build - which, I admit, I'm using
> a build package from a few months ago.
>
> I've got iptables v1.4.12.2 running, with MediaProxy 2.5.1 (according
> to the dpkg information after the debuild), so slightly behind that
> fixed descriptor leak release.
>
> The loading on the box was clearly not right with whatever seems to be
> going wrong, so my making any kind of assumptions on how well
> MediaProxy works is unfair until I've got this sorted out.
>

The problem is indeed the file descriptor leak, which was fixed between 2.5.1 and 2.5.2. In case you want to verify this yourself, just use lsof on the media-relay PID and start a call: 4 new descriptors will show up, but after the call is ended they are not released.

Please do upgrade to MediaProxy 2.5.2 and test again :-)

FYI, we do have a public Debian repository with MediaProxy built for several Debian and Ubuntu versions, check it out:
http://mediaproxy.ag-projects.com/projects/mediaproxy/wiki/InstallationGuide

Regards,

--
Saúl Ibarra Corretgé
AG Projects
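The lsof check described above can also be scripted by counting entries under /proc. A minimal sketch, assuming a Linux /proc filesystem; the function name `count_open_fds` and the `proc_root` parameter are illustrative and not part of MediaProxy.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Return the number of open file descriptors of process `pid`.

    Assumption: Linux exposes one entry per open descriptor in
    <proc_root>/<pid>/fd. Inspecting another user's process usually
    requires root.
    """
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))
```

Call it with the media-relay PID before a test call and again after the call has ended. On a version with the leak, the second count would be expected to stay higher than the first instead of returning to the starting value.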
Re: [OpenSIPS-Users] MediaProxy loading issues - I think I need some tuning here
Thank you, Saúl, for your swift reply.

I'm running a Deb Wheezy/Sid (unstable) release to keep up with the latest dependencies for MediaProxy's build - which, I admit, I'm using a build package from a few months ago.

I've got iptables v1.4.12.2 running, with MediaProxy 2.5.1 (according to the dpkg information after the debuild), so slightly behind that fixed descriptor leak release.

The loading on the box was clearly not right with whatever seems to be going wrong, so my making any kind of assumptions on how well MediaProxy works is unfair until I've got this sorted out.

Thank you, again.

- JP

On Wed, Apr 4, 2012 at 1:46 AM, Saúl Ibarra Corretgé wrote:
> Hi Jock,
>
> What MediaProxy version are you running?
>
> On Apr 3, 2012, at 10:50 PM, Jock McKechnie wrote:
>
>> Greetings all;
>>
>> We have several mediaproxy systems running in small scale production
>> (~50-100 calls concurrently) and have been very pleased with the
>> results. We find that we have to restart the relay/dispatcher machines
>> daily to keep them ticking over (they tend to get lost on their own
>> after a few days runtime), but this is a minor inconvenience.
>>
>
> What do you mean by "get lost on their own"?
>
>> Until today. Today I tried moving one of our small carrier circuits
>> over to it and gee whiz did all sorts of exciting things happen. I
>> have our systems set up with an initial OpenSIPS/media-dispatcher
>> running on a VM (public IP). This dispatcher speaks to a blade server
>> which is running a single media-relay instance.
>>
>> Under light load all is well. When the load starts ramping up (800+
>> calls) things start going a bit pear-shaped, however. I end up with
>> massive numbers of entries like this in the syslog of the relay:
>> Cannot use port pair 53378/53379
>> Which appears to bog the whole relay down to the point where it's
>> using 100% of the core. Even after turning the calls back off, the
>> -relay remains at 100% and continues to dump more 'Cannot use port
>> pair' notices into rsyslog and is impossible to stop normally due to
>> it being so tied up. rsyslog was not loaded out in the 'top', so
>> although it was clearly being hammered by -relay, I don't think
>> rsyslog was the bottleneck here.
>>
>
> There was a very nasty bug after an API change in iptables which caused
> socket descriptors to be leaked, which led to this situation. What version of
> iptables are you using? (iptables -V).
>
>> I guess my first question is, what am I doing wrong here to cause it
>> to be pushing literally tens of thousands of these errors?
>>
>> And then, next, how do I best tune mediaproxy to handle larger loads?
>> I was thinking I could run several -relays on a single blade as they
>> appear to be single-threaded and, therefore, multiple forks will load
>> across the machine properly... but I'm not even sure if -relay can use
>> a different conf file to the default.
>>
>
> Yes, MediaProxy is single threaded, but the actual relaying of packets happens
> in *kernel space*, not in that single thread. Thus, you shouldn't run more
> than one relay in a single box, and that's why it's not even supported. If
> one box is not enough, just add another one with another instance of
> MediaProxy relay :-)
>
>> The dispatcher, which as I said lives on the OpenSIPS vm, looks like this:
>> [Dispatcher]
>> socket_path=/tmp/dispatcher.sock
>> listen=dispatcher.public.ip.address
>> management_use_tls=no
>> log_level=WARNING
>>
>> The relay, on a Dell M610 blade, looks like:
>> [Relay]
>> dispatchers=dispatcher.public.ip.address
>> relay_ip=relay.public.ip.address
>> port_range=5:6
>> log_level=WARNING
>>
>> Any suggestions would be gratefully received;
>>
>
>
> Regards,
>
> --
> Saúl Ibarra Corretgé
> AG Projects
Re: [OpenSIPS-Users] MediaProxy loading issues - I think I need some tuning here
Hi Jock,

What MediaProxy version are you running?

On Apr 3, 2012, at 10:50 PM, Jock McKechnie wrote:

> Greetings all;
>
> We have several mediaproxy systems running in small scale production
> (~50-100 calls concurrently) and have been very pleased with the
> results. We find that we have to restart the relay/dispatcher machines
> daily to keep them ticking over (they tend to get lost on their own
> after a few days runtime), but this is a minor inconvenience.
>

What do you mean by "get lost on their own"?

> Until today. Today I tried moving one of our small carrier circuits
> over to it and gee whiz did all sorts of exciting things happen. I
> have our systems set up with an initial OpenSIPS/media-dispatcher
> running on a VM (public IP). This dispatcher speaks to a blade server
> which is running a single media-relay instance.
>
> Under light load all is well. When the load starts ramping up (800+
> calls) things start going a bit pear-shaped, however. I end up with
> massive numbers of entries like this in the syslog of the relay:
> Cannot use port pair 53378/53379
> Which appears to bog the whole relay down to the point where it's
> using 100% of the core. Even after turning the calls back off, the
> -relay remains at 100% and continues to dump more 'Cannot use port
> pair' notices into rsyslog and is impossible to stop normally due to
> it being so tied up. rsyslog was not loaded out in the 'top', so
> although it was clearly being hammered by -relay, I don't think
> rsyslog was the bottleneck here.
>

There was a very nasty bug after an API change in iptables which caused socket descriptors to be leaked, which led to this situation. What version of iptables are you using? (iptables -V).

> I guess my first question is, what am I doing wrong here to cause it
> to be pushing literally tens of thousands of these errors?
>
> And then, next, how do I best tune mediaproxy to handle larger loads?
> I was thinking I could run several -relays on a single blade as they
> appear to be single-threaded and, therefore, multiple forks will load
> across the machine properly... but I'm not even sure if -relay can use
> a different conf file to the default.
>

Yes, MediaProxy is single threaded, but the actual relaying of packets happens in *kernel space*, not in that single thread. Thus, you shouldn't run more than one relay in a single box, and that's why it's not even supported. If one box is not enough, just add another one with another instance of MediaProxy relay :-)

> The dispatcher, which as I said lives on the OpenSIPS vm, looks like this:
> [Dispatcher]
> socket_path=/tmp/dispatcher.sock
> listen=dispatcher.public.ip.address
> management_use_tls=no
> log_level=WARNING
>
> The relay, on a Dell M610 blade, looks like:
> [Relay]
> dispatchers=dispatcher.public.ip.address
> relay_ip=relay.public.ip.address
> port_range=5:6
> log_level=WARNING
>
> Any suggestions would be gratefully received;
>

Regards,

--
Saúl Ibarra Corretgé
AG Projects
[OpenSIPS-Users] MediaProxy loading issues - I think I need some tuning here
Greetings all;

We have several mediaproxy systems running in small scale production (~50-100 calls concurrently) and have been very pleased with the results. We find that we have to restart the relay/dispatcher machines daily to keep them ticking over (they tend to get lost on their own after a few days runtime), but this is a minor inconvenience.

Until today. Today I tried moving one of our small carrier circuits over to it and gee whiz did all sorts of exciting things happen. I have our systems set up with an initial OpenSIPS/media-dispatcher running on a VM (public IP). This dispatcher speaks to a blade server which is running a single media-relay instance.

Under light load all is well. When the load starts ramping up (800+ calls) things start going a bit pear-shaped, however. I end up with massive numbers of entries like this in the syslog of the relay:
Cannot use port pair 53378/53379
Which appears to bog the whole relay down to the point where it's using 100% of the core. Even after turning the calls back off, the -relay remains at 100% and continues to dump more 'Cannot use port pair' notices into rsyslog and is impossible to stop normally due to it being so tied up. rsyslog was not loaded out in the 'top', so although it was clearly being hammered by -relay, I don't think rsyslog was the bottleneck here.

I guess my first question is, what am I doing wrong here to cause it to be pushing literally tens of thousands of these errors?

And then, next, how do I best tune mediaproxy to handle larger loads? I was thinking I could run several -relays on a single blade as they appear to be single-threaded and, therefore, multiple forks will load across the machine properly... but I'm not even sure if -relay can use a different conf file to the default.

The dispatcher, which as I said lives on the OpenSIPS vm, looks like this:
[Dispatcher]
socket_path=/tmp/dispatcher.sock
listen=dispatcher.public.ip.address
management_use_tls=no
log_level=WARNING

The relay, on a Dell M610 blade, looks like:
[Relay]
dispatchers=dispatcher.public.ip.address
relay_ip=relay.public.ip.address
port_range=5:6
log_level=WARNING

Any suggestions would be gratefully received;

- Jock
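To gauge how many 'Cannot use port pair' entries the relay is logging, and whether they cluster on particular ports, the syslog lines can be tallied. A minimal sketch, assuming the message text appears exactly as quoted in this post; the function name `count_port_pair_errors` is illustrative, and the log file location depends on the local rsyslog setup.

```python
import re
from collections import Counter

# Matches the message as quoted in the post, e.g. "Cannot use port pair 53378/53379".
PORT_PAIR_RE = re.compile(r"Cannot use port pair (\d+)/(\d+)")


def count_port_pair_errors(lines):
    """Count 'Cannot use port pair' messages per (port, port) pair."""
    counts = Counter()
    for line in lines:
        match = PORT_PAIR_RE.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts
```

Passing an open log file object works, since iterating a file yields its lines. `sum(counts.values())` gives the total number of errors and `counts.most_common(10)` lists the pairs reported most often.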