Re: [pfSense Support] Network Device pooling
At 01:31 PM 11/1/2005, you wrote:

> Can we please let this thread die already? I'm tired of hearing about
> benchmarking the *WRONG* way.

"Must. Control. The. Fist. Of. Death."

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
RE: [pfSense Support] Network Device pooling
I think the first rule of testing applies: start at the beginning and
work your way backwards. Pleased you solved your problems.
Re: [pfSense Support] Network Device pooling
Can we please let this thread die already? I'm tired of hearing about
benchmarking the *WRONG* way.

Scott
RE: [pfSense Support] Network Device pooling
On Tue, 2005-11-01 at 10:43 -0600, Fleming, John (ZeroChaos) wrote:
> > Also I wrote when the stall happens I can't telnet to port 80 on the
> > web server host - which means it is not just the program causing the
> > stall.
>
> Are you trying this from the same host as the benchmark program? I
> wonder if a 2nd host would have the same problem.

I did not have an extra host for the test.

I've finally figured out that the client looks to be running out of
local ports, as increasing ip_local_port_range allowed it to get to a
different point.

Two things confused me here:

1) For some reason it does not fail if the firewall is disabled.
Probably something is different with connection closure.

2) The error code reported by "ab" is a connect timeout. For this kind
of error it should be "Can't assign requested address" or something
similar. I guess the Apache runtime abstraction library does not report
this error well enough.
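Widening ip_local_port_range, together with the tcp_fin_timeout reduction mentioned elsewhere in the thread, amounts to two client-side Linux sysctls. A sketch of that tuning with illustrative values only; the thread never states the exact numbers that were used:

```shell
# Client-side tuning along the lines described in the thread: widen the
# ephemeral port range and shorten the FIN/TIME-WAIT hold. The values
# below are illustrative - the thread does not give the actual settings.
sysctl -w net.ipv4.ip_local_port_range="15000 65000"
sysctl -w net.ipv4.tcp_fin_timeout=30

# Confirm the running values:
sysctl net.ipv4.ip_local_port_range net.ipv4.tcp_fin_timeout
```

Both changes attack the same symptom: the benchmark client recycling its local ports faster than closed connections leave TIME_WAIT.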
RE: [pfSense Support] Network Device pooling
> Also I wrote when the stall happens I can't telnet to port 80 on the
> web server host - which means it is not just the program causing the
> stall.

Are you trying this from the same host as the benchmark program? I
wonder if a 2nd host would have the same problem.
Re: [pfSense Support] Network Device pooling
On Mon, 2005-10-31 at 16:56 -0500, Scott Ullrich wrote:
> Have you seen this?
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=110887
>
> Looks like an apachebench problem to me.

That is a different bug - it fails instantly in that case, and it was
fixed in 2.0.48. I'm testing with 2.0.54.
Re: [pfSense Support] Network Device pooling
On Mon, 2005-10-31 at 16:31 -0500, Scott Ullrich wrote:
> Are we absolutely sure this program works as intended? Personally I
> wouldn't trust anything like this but smartbits.

Well...

It works if filtering is disabled on pfSense - this is what worries me.
If the program were broken, it should not work in either case.

Also, as I wrote, when the stall happens I can't telnet to port 80 on
the web server host - which means it is not just the program causing
the stall.

If it is protection on the FreeBSD side against too much activity from
the same IP (i.e. the way it limits responses to flood ping), this
would be good to know.

I hope this problem is actually something like that - I know there are
a lot of FreeBSD-based routers out there - if it were broken for real
workloads, somebody would have screamed already.

One more interesting thing I noticed:

Percentage of the requests served within a certain time (ms)
  50%     32
  66%     33
  75%     33
  80%     33
  90%     44
  95%    295
  98%    324
  99%    330
 100%  21285 (longest request)

Even when apache benchmark does not time out, it often shows a very
long response time - 21 seconds in this case.

What I've noticed is that it can be 3, 9 or 21 seconds - these really
look like the times at which SYN packets are resent by TCP/IP stacks
when no reply to the previous one arrives.

Doing more experiments, I also discovered I can increase the chance of
passing the benchmark (still not to 100%) if I reduce tcp_fin_timeout
and increase the ip_local_port_range variables on my test driver host.

This still leaves the question of why behavior differs with filtering
and without, but it makes me worry less :)
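The 3/9/21-second outliers do line up with classic SYN retransmission: with a 3-second initial retransmission timeout that doubles on every retry, the cumulative stall before the Nth retransmitted SYN gets through is 3, 9, 21, 45... seconds. A quick sketch; the 3 s initial RTO is an assumption based on common stack defaults of that era, not a value from the thread:

```shell
# Cumulative stall before each SYN retransmission, assuming a 3-second
# initial retransmission timeout that doubles on every retry (an
# assumption - the exact initial RTO varies by TCP stack and version).
rto=3
total=0
stalls=""
for attempt in 1 2 3 4; do
    total=$(( total + rto ))   # wait out a full timeout, then retransmit
    stalls="$stalls $total"
    rto=$(( rto * 2 ))         # exponential backoff
done
echo "cumulative stalls (s):$stalls"   # 3 9 21 45
```

The observed 3, 9 and 21 s request times are exactly the first three entries, i.e. requests whose first one, two or three SYNs were dropped.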
Re: [pfSense Support] Network Device pooling
Have you seen this?

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=110887

Looks like an apachebench problem to me.

Scott
Re: [pfSense Support] Network Device pooling
Are we absolutely sure this program works as intended? Personally I
wouldn't trust anything like this but smartbits.

Scott
Re: [pfSense Support] Network Device pooling
On Mon, 2005-10-31 at 16:25 -0500, Scott Ullrich wrote:
> > apr_poll: The timeout specified has expired (70007)
>
> What is the above from? Your benchmark testing box?

Yes. This is output from the apache benchmark program.

Benchmarking 111.111.111.158 (be patient)
Completed 1 requests
Completed 2 requests
Completed 3 requests
apr_poll: The timeout specified has expired (70007)
Total of 30517 requests completed

> On 10/31/05, Peter Zaitsev <[EMAIL PROTECTED]> wrote:
> > On Mon, 2005-10-31 at 15:48 -0500, Scott Ullrich wrote:
> > > Are you viewing the traffic queue status? This would be normal if
> > > you are...
> >
> > Heh, yes, good guess. Those were running in the other window.
> >
> > So here is the output for the "stalled" case:
> >
> > # pfctl -ss | wc -l
> >    51898
> >
> > I have the number of states set to 100,000 in the advanced page, so
> > it is not the peak number.
> >
> > Note what really surprises me is the number of requests when it
> > fails:
> >
> > apr_poll: The timeout specified has expired (70007)
> > Total of 28217 requests completed
> >
> > This number of 28217 is seen so often... Sometimes it is a bit more
> > or less, but it is very frequently within +/- 100 of it.
> >
> > I was asked if I can connect to the remote box when this problem
> > happens - yes. I can SSH to the same box which runs Apache, but I
> > can't connect to port 80 when this problem happens.
> >
> > So it looks like it does not like to see all these states
> > corresponding to the same target port number.
> >
> > > Scott
> > >
> > > On 10/31/05, Peter Zaitsev <[EMAIL PROTECTED]> wrote:
> > > > On Mon, 2005-10-31 at 14:39 -0500, Scott Ullrich wrote:
> > > > > On 10/31/05, Fleming, John (ZeroChaos) <[EMAIL PROTECTED]> wrote:
> > > > > > I wonder if part of the problem is PF isn't seeing the TCP
> > > > > > tear down. It seems a little odd that the max gets hit and
> > > > > > nothing else gets through. I guess it could be the benchmark
> > > > > > isn't shutting down the session right after it's done
> > > > > > transferring data, but I would think it would kill the
> > > > > > benchmark client to have 10K(ish) open TCP sessions.
> > > > >
> > > > > One way to determine this would be to run pfctl -ss | wc -l
> > > > > once pfSense stops responding?
> > > >
> > > > Very interesting.
> > > >
> > > > I tried running this before the problems, but it looks strange
> > > > already:
> > > >
> > > > # pfctl -ss | wc -l
> > > >     4893
> > > > Killed
> > > > # pfctl -ss | wc -l
> > > >    23245
> > > > Killed
> > > >
> > > > There is nothing in dmesg or the system logs.
Re: [pfSense Support] Network Device pooling
> apr_poll: The timeout specified has expired (70007)

What is the above from? Your benchmark testing box?

On 10/31/05, Peter Zaitsev <[EMAIL PROTECTED]> wrote:
> So here is the output for "stalled" case
>
> # pfctl -ss | wc -l
> 51898
Re: [pfSense Support] Network Device pooling
On Mon, 2005-10-31 at 15:48 -0500, Scott Ullrich wrote:
> Are you viewing the traffic queue status? This would be normal if you
> are...

Heh, yes, good guess. Those were running in the other window.

So here is the output for the "stalled" case:

# pfctl -ss | wc -l
51898

I have the number of states set to 100,000 in the advanced page, so that is not the peak number.

Note what really surprises me is the number of requests when it fails:

apr_poll: The timeout specified has expired (70007)
Total of 28217 requests completed

This number of 28217 comes up very often. Sometimes it is a bit more or less, but it is very frequently within +/- 100 of it.

I was asked if I can connect to the remote box when this problem happens - yes. I can SSH to the box which runs Apache, but I can't connect to port 80 when this problem happens.

So it looks like it does not like to see all these states corresponding to the same target port number.
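Worth noting: 28217 is suspiciously close to the size of the default Linux ephemeral port range (32768-61000, roughly 28k ports), so one hypothesis is that the benchmark client simply runs out of local source ports while old connections are still tracked. A quick check and widening on the Linux client (the range shown is illustrative, not a recommendation; requires root):

```shell
# Show the client's current ephemeral port range (Linux).
cat /proc/sys/net/ipv4/ip_local_port_range

# Widen it so ab can use more distinct source ports.
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
```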
RE: [pfSense Support] Network Device pooling
On Mon, 2005-10-31 at 13:25 -0600, Fleming, John (ZeroChaos) wrote:
> Can you send these while the machine is normal and when the machine is
> choking? (send the output.txt file btw)

Normal:

# cat /tmp/output.txt
Mon Oct 31 07:50:52 PST 2005
564/336/900 mbufs in use (current/cache/total)
555/269/824/17088 mbuf clusters in use (current/cache/total/max)
0/3/4528 sfbufs in use (current/peak/max)
1253K/622K/1875K bytes allocated to network (current/cache/total)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
Name   Mtu    Network            Address             Ipkts    Ierrs  Opkts    Oerrs  Coll
em0    1500   00:14:22:0a:64:4c                      2200575  0      2004248  0      0
em0    1500   fe80:1::214:2      fe80:1::214:22ff:   0        -      4        -      -
em0    1500   111.111.111.152    111.111.111.154     3395     -      0        -      -
em1    1500   00:14:22:0a:64:4d                      2003036  0      2195974  0      0
em1    1500   fe80:2::214:2      fe80:2::214:22ff:   0        -      4        -      -
em1    1500   111.111.111.152    111.111.111.154     0        -      6162     -      -
pfsyn  2020                                          0        0      0        0      0
lo0    16384                                         0        0      0        0      0
lo0    16384  127                127.0.0.1           0        -      0        -      -
lo0    16384  ::1/128            ::1                 0        -      0        -      -
lo0    16384  fe80:4::1/64       fe80:4::1           0        -      0        -      -
pflog  33208                                         0        0      0        0      0
bridg  1500   ac:de:48:e1:dd:5f                      4197981  0      4200265  0      0

Choking:

Mon Oct 31 07:48:44 PST 2005
515/385/900 mbufs in use (current/cache/total)
514/310/824/17088 mbuf clusters in use (current/cache/total/max)
0/3/4528 sfbufs in use (current/peak/max)
1156K/716K/1873K bytes allocated to network (current/cache/total)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
Name   Mtu    Network            Address             Ipkts    Ierrs  Opkts    Oerrs  Coll
em0    1500   00:14:22:0a:64:4c                      2011449  0      1838611  0      0
em0    1500   fe80:1::214:2      fe80:1::214:22ff:   0        -      4        -      -
em0    1500   111.111.111.152    111.111.111.154     2644     -      0        -      -
em1    1500   00:14:22:0a:64:4d                      1835313  0      2007595  0      0
em1    1500   fe80:2::214:2      fe80:2::214:22ff:   0        -      4        -      -
em1    1500   111.111.111.152    111.111.111.154     0        -      5336     -      -
pfsyn  2020                                          0        0      0        0      0
lo0    16384                                         0        0      0        0      0
lo0    16384  127                127.0.0.1           0        -      0        -      -
lo0    16384  ::1/128            ::1                 0        -      0        -      -
lo0    16384  fe80:4::1/64       fe80:4::1           0        -      0        -      -
pflog  33208                                         0        0      0        0      0
bridg  1500   ac:de:48:e1:dd:5f                      3841883  0      3846209  0      0

Some of your advised commands fail:

# sysctl hw.em0.stats=1 >> /tmp/output.txt
sysctl: unknown oid 'hw.em0.stats'
# sysctl hw.em1.stats=1 >> /tmp/output.txt
sysctl: unknown oid 'hw.em1.stats'
# sysctl hw.em2.stats=1 >> /tmp/output.txt
sysctl: unknown oid 'hw.em2.stats'

> Are you able to try this test using routing versus bridging?

I did not try with routing, as this is not what I'm going to use. I did, however, try with the firewall disabled and bridging enabled, which seems to show it is not bridging itself, at least.
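When the kernel rejects an OID name like this, listing the whole sysctl tree and filtering is a quick way to see what the running kernel actually exposes (em(4) OID names vary across FreeBSD releases and driver versions, so treat this as a probing sketch):

```shell
# Search the live sysctl tree for any em(4)-related OIDs.
sysctl -a 2>/dev/null | grep -i 'em[0-9]' | head -n 20
```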
Re: [pfSense Support] Network Device pooling
Are you viewing the traffic queue status? This would be normal if you are...

Scott

On 10/31/05, Peter Zaitsev <[EMAIL PROTECTED]> wrote:
> I tried running this before the problems but it looks strange already:
>
> # pfctl -ss | wc -l
> 4893
> Killed
> # pfctl -ss | wc -l
> 23245
> Killed
>
> There is nothing in dmesg or system logs.
Re: [pfSense Support] Network Device pooling
On Mon, 2005-10-31 at 14:39 -0500, Scott Ullrich wrote:
> One way to determine this would be to run pfctl -ss | wc -l once
> pfSense stops responding?

Very interesting.

I tried running this before the problems hit, but it looks strange already:

# pfctl -ss | wc -l
4893
Killed
# pfctl -ss | wc -l
23245
Killed

There is nothing in dmesg or the system logs.
Re: [pfSense Support] Network Device pooling
On 10/31/05, Fleming, John (ZeroChaos) <[EMAIL PROTECTED]> wrote:
> I wonder if part of the problem is PF isn't seeing the TCP tear-down. It
> seems a little odd that the max gets hit and nothing else gets through.
> I guess it could be the benchmark isn't shutting down the session right
> after it's done transferring data, but I would think it would kill the
> benchmark client to have 10K(ish) open TCP sessions.

One way to determine this would be to run pfctl -ss | wc -l once pfSense stops responding.

Scott
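A throwaway watcher in the same spirit (hypothetical; assumes a shell on the pfSense box with `pfctl` in the PATH) makes the climb toward the state limit visible as it happens, instead of requiring a manual check after the stall:

```shell
#!/bin/sh
# Print the pf state-table size once per second; watch for the count
# pinning at the configured maximum when the stall begins.
while :; do
    printf '%s states=%s\n' "$(date '+%H:%M:%S')" "$(pfctl -ss | wc -l)"
    sleep 1
done
```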
RE: [pfSense Support] Network Device pooling
I wonder if part of the problem is PF isn't seeing the TCP tear-down. It seems a little odd that the max gets hit and nothing else gets through. I guess it could be the benchmark isn't shutting down the session right after it's done transferring data, but I would think it would kill the benchmark client to have 10K(ish) open TCP sessions.
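One way to probe this theory from the client side: `ab` only reuses connections when HTTP keep-alive is requested, so comparing a run with and without `-k` shows whether per-request connection states are the culprit (flags are standard ApacheBench options; URL and counts are the ones from this thread):

```shell
# Without -k: one TCP connection (and one pf state) per request.
ab -n 100000 -c 20 http://111.111.111.158/

# With -k: keep-alive reuses connections, so the state count stays
# near the concurrency level instead of growing with every request.
ab -n 100000 -c 20 -k http://111.111.111.158/
```

If the stall disappears with `-k`, the problem is the accumulation of short-lived connection states rather than throughput as such.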
RE: [pfSense Support] Network Device pooling
On Mon, 2005-10-31 at 13:26 -0600, Fleming, John (ZeroChaos) wrote:
> Benchmarking 111.111.111.158 (be patient) Completed 1 requests <-
> isn't 10,000 the default limit of the state table? That sure would
> explain a lot.

I boosted it to 10 of course
Re: [pfSense Support] Network Device pooling
On 10/31/05, Fleming, John (ZeroChaos) <[EMAIL PROTECTED]> wrote:
> Benchmarking 111.111.111.158 (be patient) Completed 1 requests <-
> isn't 10,000 the default limit of the state table? That sure would
> explain a lot.

Yep. 10K is the default, and it is adjustable from the System -> Advanced screen.

Scott
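A back-of-the-envelope check of why the 10K default is easy to exceed (numbers illustrative): every non-keep-alive request is its own TCP connection, and a closed connection stays in the state table until pf's TCP closing timeouts expire, so the steady-state entry count is roughly the request rate times the hold time:

```shell
# Steady-state pf entries ~= request rate (req/s) * state hold time (s).
# 1500 req/s with states held ~90s after close needs ~135000 entries,
# far beyond a 10K default table.
rate=1500
hold=90
echo $((rate * hold))
# -> 135000
```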
RE: [pfSense Support] Network Device pooling
Benchmarking 111.111.111.158 (be patient)
Completed 1 requests

<- isn't 10,000 the default limit of the state table? That sure would explain a lot.
RE: [pfSense Support] Network Device pooling
Send the output.txt of...

date >> /tmp/output.txt
netstat -m >> /tmp/output.txt
netstat -in >> /tmp/output.txt
sysctl hw.em0.stats=1 >> /tmp/output.txt
sysctl hw.em1.stats=1 >> /tmp/output.txt
sysctl hw.em2.stats=1 >> /tmp/output.txt

Can you send these while the machine is normal and when the machine is choking? (Send the output.txt file, btw.)

Are you able to try this test using routing versus bridging?
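Wrapped into a small script (same commands as above; as a later reply shows, the `hw.emN.stats` OIDs are driver-version dependent, so failures are tolerated rather than aborting the capture):

```shell
#!/bin/sh
# Append one timestamped snapshot of network counters to /tmp/output.txt,
# to be run once while healthy and once while choking.
OUT=/tmp/output.txt
date >> "$OUT"
netstat -m  >> "$OUT"   # mbuf and cluster usage
netstat -in >> "$OUT"   # per-interface packet/error counters
for nic in em0 em1 em2; do
    # These OIDs do not exist on every em(4) revision; ignore failures.
    sysctl "hw.${nic}.stats=1" >> "$OUT" 2>/dev/null || true
done
```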
Re: [pfSense Support] Network Device pooling
On 10/31/05, Peter Zaitsev <[EMAIL PROTECTED]> wrote:
> It does not looks like this is hardware issue for me as if I disable
> firewall it works fine.
>
> I tried turning off scrub and it does not change anything. Still timeout
> after few requests:

And when this timeout occurs do you see anything in the system logs? Can you still telnet into the apache server behind pfsense? This really doesn't make a lot of sense. It should be able to stand up to this.

Scott
Re: [pfSense Support] Network Device pooling
On Mon, 2005-10-31 at 12:03 -0500, Scott Ullrich wrote:
> Please describe the hardware you're using fully. NICs, etc. This is
> not normal behavior.

Sure. It is a Dell PowerEdge 750: 512MB RAM, SATA150 disk, Celeron 2.4GHz.

ACPI APIC Table:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Celeron(R) CPU 2.40GHz (2400.10-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0xf29 Stepping = 9
Features=0xbfebfbff
Features2=0x4400>
real memory = 536608768 (511 MB)
avail memory = 515547136 (491 MB)

NICs are built-in Intel 10/100/1000 NICs:

em0: port 0xece0-0xecff mem 0xfe1e-0xfe1f irq 18 at device 1.0 on pci1
em0: Ethernet address: 00:14:22:0a:64:4c
em0: Speed:N/A Duplex:N/A

It does not look like a hardware issue to me, as it works fine if I disable the firewall.

I tried turning off scrub and it does not change anything. Still a timeout after a few requests:

[EMAIL PROTECTED]:/tmp> ./ab2 -n 10 http://111.111.111.158/
This is ApacheBench, Version 2.0.41-dev <$Revision: 1.121.2.12 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/

Benchmarking 111.111.111.158 (be patient)
Completed 1 requests
apr_poll: The timeout specified has expired (70007)
Re: [pfSense Support] Network Device pooling
Please describe the hardware you're using fully. NICs, etc. This is not normal behavior.

On 10/31/05, Peter Zaitsev <[EMAIL PROTECTED]> wrote:
> What is NOT ok however is the stall (20+ seconds) when CPU usage on
> pfsense drops almost to zero and no traffics come on connections.
> Sometimes it causes apache benchmark to abort sometimes just shows crazy
> response times.
>
> This does not happen in direct benchmark (no pfsense in the middle) or
> with pfsense with disable firewall.
RE: [pfSense Support] Network Device pooling
On Sun, 2005-10-30 at 23:14 +0100, Espen Johansen wrote:
> Hi Peter,
>
> I have seen you have done a lot of testing with Apache benchmarking.
> I find it a little strange to use this as a test. Basically you will
> hit the roof of standing I/O operations because you introduce latency
> with pfSense. The lower the latency, the more finished
> tasks/connections per time unit. Most people don't take this into
> consideration when they tune Apache, although it is one of the most
> important aspects of web-server tuning.

Espen,

If you look back at my earlier emails, you will see that the increased
latency with network polling is not my concern, nor is the drop in
throughput with pfSense in the middle - that is all understandable.

What is NOT ok, however, is the stall (20+ seconds) during which CPU
usage on pfSense drops almost to zero and no traffic moves on the
connections. Sometimes it causes the Apache benchmark to abort,
sometimes it just shows crazy response times. This does not happen in a
direct benchmark (no pfSense in the middle) or with pfSense's firewall
disabled.

Why did I use the Apache benchmark? It is a simple stress test which
generates a lot of traffic and a lot of entries in the state table.

> This is the scenario:
>
> A client with low bandwidth and high latency will generate a standing
> I/O operation because of the way Apache is designed. So if a client
> with 100 ms latency asks for a 100 KB file and has a 3 KB/s transfer
> rate, he will generate a standing I/O operation for "latency +
> transfer time", and the I/O operation will not be finished until the
> transfer completes. You do basically the same thing: because you
> change the amount of time a request takes to process, you will have
> more standing I/O operations than if pfSense does routing only
> (routing alone is faster than routing plus filtering). So if you
> increase latency from 0.4 ms to 2 ms, each standing I/O lasts 5x as
> long, which in turn means your ability to serve connections at 2 ms
> is 1/5 of what it is at 0.4 ms.

Well... this would be the case in a real-life scenario - slow clients
blowing up the number of Apache children. But it is not the case in a
synthetic Apache benchmark: there you set a fixed concurrency, and I
obviously set it low enough for my Apache box to handle.

Furthermore, pfSense locks up even with a single connection (this is
independent of whether device polling is enabled).

> The ones listed below seem to be the ones that have the most effect
> on polling and performance. You will have to play around with these
> settings to find out what works best on your HW, as I can't seem to
> find a common setting that works well for all kinds of HW.
>
> kern.polling.each_burst=80
> kern.polling.burst_max=1000
> kern.polling.user_frac=50

Thanks.
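Espen's latency arithmetic above can be sanity-checked with a one-liner. The 0.4 ms and 2 ms round-trip figures are the ones quoted in the thread; the script simply takes their ratio, which is why a 5x longer standing I/O implies roughly 1/5 the connection-serving capacity at a fixed number of Apache children:

```shell
# Ratio of per-request latency without polling (0.4 ms) vs. with
# polling (2 ms), as measured in the thread:
awk 'BEGIN {
    before = 0.4   # ms round trip, interrupt-driven
    after  = 2.0   # ms round trip, device polling enabled
    ratio  = after / before
    printf "standing I/O lasts %.0fx as long -> capacity ~1/%.0f\n", ratio, ratio
}'
```
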
RE: [pfSense Support] Network Device pooling
Hi Peter,

I have seen you have done a lot of testing with Apache benchmarking.
I find it a little strange to use this as a test. Basically you will
hit the roof of standing I/O operations because you introduce latency
with pfSense. The lower the latency, the more finished
tasks/connections per time unit. Most people don't take this into
consideration when they tune Apache, although it is one of the most
important aspects of web-server tuning.

This is the scenario: a client with low bandwidth and high latency
will generate a standing I/O operation because of the way Apache is
designed. So if a client with 100 ms latency asks for a 100 KB file
and has a 3 KB/s transfer rate, he will generate a standing I/O
operation for "latency + transfer time", and the I/O operation will
not be finished until the transfer completes. You do basically the
same thing: because you change the amount of time a request takes to
process, you will have more standing I/O operations than if pfSense
does routing only (routing alone is faster than routing plus
filtering). So if you increase latency from 0.4 ms to 2 ms, each
standing I/O lasts 5x as long, which in turn means your ability to
serve connections at 2 ms is 1/5 of what it is at 0.4 ms. I hope this
explains the behavior you see.

As for device polling, there are some sysctls that control polling
behavior; "sysctl kern.polling" will list them. The ones listed below
seem to be the ones that have the most effect on polling and
performance. You will have to play around with these settings to find
out what works best on your HW, as I can't seem to find a common
setting that works well for all kinds of HW.

kern.polling.each_burst=80
kern.polling.burst_max=1000
kern.polling.user_frac=50

The info/documentation on these settings seems limited, so you should
do some creative Google searching to find out more.

-lsf

-----Original Message-----
From: Peter Zaitsev [mailto:[EMAIL PROTECTED]
Sent: 30.
oktober 2005 05:35
To: support@pfsense.com
Subject: [pfSense Support] Network Device pooling

Hi,

I tested this feature to see if it helps me with the Apache benchmark
problem - no, it does not.

Also, it looks like it is a firewall-related issue, as everything
works as expected if the firewall is totally disabled (pf fails to
load rules).

Speaking about device polling: in my case it increased the packet
round trip (two Gbit NICs) from 0.4 ms to 2 ms. At the same time it
considerably decreased CPU usage during the tests, so this is
something to consider if CPU performance ever becomes the problem.

On the other hand I was a bit surprised - according to vmstat, the
number of interrupts even on an idle box jumped to some 30,000/sec
(from some 150 without this option set).

I guess these are the timer interrupts used for polling. So why are
the devices polled only about 1000 times per second if we get so many
timer interrupts?

One more thing to note: the system needs to be restarted for this
option to take effect, but it does not say so anywhere.
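The polling behavior discussed above can be inspected directly on the firewall. A sketch of the relevant FreeBSD commands, to be run as root on the pfSense box itself (the tuning values are the ones quoted in the thread, not recommendations):

```shell
# List all polling-related sysctls (FreeBSD with DEVICE_POLLING):
sysctl kern.polling

# Watch the interrupt rate - this is where the ~30,000/sec clock
# interrupts show up once polling is enabled:
vmstat -i

# Apply the tuning values suggested in the thread; persist them in
# /etc/sysctl.conf so they survive the reboot that enabling polling
# itself requires:
sysctl kern.polling.each_burst=80
sysctl kern.polling.burst_max=1000
sysctl kern.polling.user_frac=50
```

These are system-administration commands against a live FreeBSD kernel, so treat them as a config fragment rather than a portable script.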
Re: [pfSense Support] Network Device pooling
On 10/30/05, Peter Zaitsev <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I tested this feature to see if it helps me with the Apache benchmark
> problem - no, it does not.
>
> Also, it looks like it is a firewall-related issue, as everything
> works as expected if the firewall is totally disabled (pf fails to
> load rules).

If it's failing to load the rules you would have an alert in the GUI.
Try running pfctl -f /tmp/rules.debug to see the errors.

> Speaking about device polling: in my case it increased the packet
> round trip (two Gbit NICs) from 0.4 ms to 2 ms. At the same time it
> considerably decreased CPU usage during the tests, so this is
> something to consider if CPU performance ever becomes the problem.
>
> On the other hand I was a bit surprised - according to vmstat, the
> number of interrupts even on an idle box jumped to some 30,000/sec
> (from some 150 without this option set).
>
> I guess these are the timer interrupts used for polling. So why are
> the devices polled only about 1000 times per second if we get so many
> timer interrupts?
>
> One more thing to note: the system needs to be restarted for this
> option to take effect, but it does not say so anywhere.

Try "playing around" with these sysctls:

kern.polling.each_burst: 80
kern.polling.burst_max: 1000

Also note these sysctls (some are read-only):

http://www.pfsense.com/pastebin/276

Scott
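Scott's pfctl suggestion can be taken a step further. A sketch, assuming the pfSense-generated ruleset lives at /tmp/rules.debug as in his message, and run as root on the firewall:

```shell
# Parse the ruleset WITHOUT loading it (-n): syntax errors are
# printed, but the currently running rules are left untouched:
pfctl -nf /tmp/rules.debug

# If it parses cleanly, load it for real:
pfctl -f /tmp/rules.debug

# Show the rules pf actually has loaded, and the current state-table
# entries - the latter is useful when hunting the 20+ second stalls
# Peter describes, since ab fills the state table quickly:
pfctl -sr
pfctl -ss
```

Like the sysctl commands above, these only make sense on a live pf firewall, so this is an administrative sketch rather than a runnable script.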