RE: [pfSense Support] Network Device pooling

2005-11-01 Thread Fleming, John \(ZeroChaos\)
Also, as I wrote, when the stall happens I can't telnet to port 80 on the web
server host - which means it is not just the program causing the stall.
Are you trying this from the same host as the benchmark program? I
wonder if a 2nd host would have the same problem.

-Original Message-
From: Peter Zaitsev [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 31, 2005 3:53 PM
To: support@pfsense.com
Subject: Re: [pfSense Support] Network Device pooling

On Mon, 2005-10-31 at 16:31 -0500, Scott Ullrich wrote:
 Are we absolutely sure this program works as intended?  Personally I
 wouldn't trust anything like this but smartbits.

Well... 

It works if filtering is disabled on pfsense - this is what worries me.
If the program were broken it should not work in either case.

Also, as I wrote, when the stall happens I can't telnet to port 80 on the
web server host - which means it is not just the program causing the stall.

If it is protection on the FreeBSD side against too much activity from the
same IP (i.e. the way it limits responses to flood pings), that would be
good to know.

I hope this problem is actually something like that - I know there are a
lot of FreeBSD-based routers out there; if it were broken for real
workloads, somebody would have screamed already.

One more interesting thing I noticed: 

Percentage of the requests served within a certain time (ms)
  50% 32
  66% 33
  75% 33
  80% 33
  90% 44
  95%    295
  98%    324
  99%    330
 100%  21285 (longest request)

Even when the apache benchmark does not time out, it often shows very long
response times (21 sec in this case).

What I've noticed is that it can be 3, 9 or 21 secs - these really look
like the times at which SYN packets are resent by TCP/IP stacks if no
reply to the previous one arrives.
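One way to check the SYN-retransmit theory would be to watch for repeated SYNs
leaving the test driver; a rough sketch, assuming tcpdump is available there and
that eth0 is its interface name (both assumptions, not details from the thread):

# tcpdump -n -i eth0 'tcp[tcpflags] & tcp-syn != 0 and dst port 80'

Repeated SYNs from the same source port roughly 3, 9 and 21 seconds apart would
match the retransmission pattern described above.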

 
Doing more experiments I also discovered I can increase the chance of
passing the benchmark (still not to 100%) if I reduce the tcp_fin_timeout
and increase the ip_local_port_range variables on my test driver host.
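For reference, that client-side tuning looks roughly like this, assuming a Linux
test driver host (tcp_fin_timeout and ip_local_port_range are Linux sysctls; the
values here are illustrative, not the ones actually used):

# sysctl -w net.ipv4.ip_local_port_range="1024 65535"
# sysctl -w net.ipv4.tcp_fin_timeout=15

Widening the ephemeral port range and timing out FIN-WAIT-2 sockets sooner both
slow down how quickly the benchmark client runs out of usable local ports.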

This still leaves the question of why the behavior differs with and
without filtering, but it makes me worry less :)


 
 Scott
 
 
 On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
  On Mon, 2005-10-31 at 16:25 -0500, Scott Ullrich wrote:
   apr_poll: The timeout specified has expired (70007)
  
   What is the above from?  Your benchmark testing box?
 
  Yes. This is output from apache benchmark program.
 
 
  Benchmarking 111.111.111.158 (be patient)
  Completed 10000 requests
  Completed 20000 requests
  Completed 30000 requests
  apr_poll: The timeout specified has expired (70007)
  Total of 30517 requests completed
 
 
 
  
   On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
On Mon, 2005-10-31 at 15:48 -0500, Scott Ullrich wrote:
 Are you viewing the traffic queue status?   This would be
normal if you are...
   
Heh,
   
Yes, good guess. These were running in the other window.
   
   
So here is the output for the stalled case:
   
# pfctl -ss | wc -l
   51898
   
I have the number of states set to 100,000 in the advanced page, so this is
not the state limit being hit.
   
   
Note that what really surprises me is the number of requests when it
fails:
   
apr_poll: The timeout specified has expired (70007)
Total of 28217 requests completed
   
This number of 28217 is seen so often... Sometimes it is a bit more or
less, but it is very frequently within +/- 100 of it.
   
I was asked if I can connect to the remote box when this problem happens
-  yes.  I can SSH to the same box which runs Apache, but I can't
connect to port 80 when this problem happens.
   
So it looks like it does not like to see all these states
corresponding
to the same target port number.
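A quick way to see how many of those states target port 80 would be something
like this (a sketch; the grep pattern depends on the exact pfctl state-output
format, so treat it as an assumption):

# pfctl -ss | grep -c ':80 '

If nearly all states point at the same destination port, that supports the
observation above.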
   
   
   

 Scott


 On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
  On Mon, 2005-10-31 at 14:39 -0500, Scott Ullrich wrote:
   On 10/31/05, Fleming, John (ZeroChaos)
[EMAIL PROTECTED] wrote:
I wonder if part of the problem is PF isn't seeing the
TCP tear down. It
seems a little odd that the max gets hit and nothing
else gets through.
I guess it could be the benchmark isn't shutting down
the session right
 after it's done transferring data, but I would think it
would kill the
benchmark client to have 10K(ish) of open TCP sessions.
  
   One way to determine this would be to run pfctl -ss | wc
-l once
   pfSense stops responding?
 
  Very interesting
 
  I tried running this before the problems but it looks
strange already:
 
  # pfctl -ss | wc -l
  4893
  Killed
  # pfctl -ss | wc -l
 23245
  Killed
 
  There is nothing in dmesg or  system logs.
 
 
 
 
 
 
-
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail: [EMAIL PROTECTED

RE: [pfSense Support] Network Device pooling

2005-11-01 Thread Peter Zaitsev
On Tue, 2005-11-01 at 10:43 -0600, Fleming, John (ZeroChaos) wrote:
 Also I wrote when stall happens I can't telnet to port 80 on web server
 host - which means it is not just program causing stall. 
 Are you trying this from the same host as the benchmark program? I
 wonder if a 2nd host would have the same problem.

I did not have an extra host for the test.

I've finally figured out that it looks like the client is running out of
local ports, as increasing ip_local_port_range allowed the benchmark to get
to a different point.

Two things confused me here:

1) For some reason it does not fail if the firewall is disabled. Probably
something is different about connection closure.

2) The error code reported by ab is a connect timeout.  For this kind of
error it should be "Can't assign requested address" or something
similar.  I guess it could be that the Apache runtime abstraction library
does not report this error well enough.
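A simple way to confirm the local-port-exhaustion theory on the client is to
count sockets sitting in TIME_WAIT while the benchmark runs; a sketch, assuming
a Linux test driver with netstat available:

# netstat -ant | grep -c TIME_WAIT

If that count approaches the size of the ephemeral port range, new connections
start failing on the client side regardless of what the firewall does.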




 
 -Original Message-
 From: Peter Zaitsev [mailto:[EMAIL PROTECTED] 
 Sent: Monday, October 31, 2005 3:53 PM
 To: support@pfsense.com
 Subject: Re: [pfSense Support] Network Device pooling
 
 On Mon, 2005-10-31 at 16:31 -0500, Scott Ullrich wrote:
  Are we absolutely sure this program works as intended?  Personally I
  wouldn't trust anything like this but smartbits.
 
 Well... 
 
 It works if filtering is disabled on pfsese  - this is what worries me.
 If the program would be broken it should not work in  both cases.
 
 Also I wrote when stall happens I can't telnet to port 80 on web server
 host - which means it is not just program causing stall.
 
 If it is protection on FreeBSD side from too much activity from same IP
 (Ie as it limits response to flood ping) this would be good to know.
 
 I hope this problem is actually something like that - I know there are a
 lot of FreeBSD based routers out where  - if it would be broken for real
 workloads something would scream already.
 
 One more interesting thing I noticed: 
 
 Percentage of the requests served within a certain time (ms)
   50% 32
   66% 33
   75% 33
   80% 33
   90% 44
   95%295
   98%324
   99%330
  100%  21285 (longest request)
 
 Even if apache benchmark does not timeout it often shows too long
 response rate -  (21 sec in this case)
 
 What I've noticed - it can be 3,  9  or  21 secs in this case   - This
 really look like the times at which SYN packets are resent by TCP/IP
 stacks if no reply for previous one arrives. 
 
  
 Doing more experiments I also discovered I can increase chance of
 passing benchmark (still not to 100%)  if i reduce tcp_fin_timeout and
 increase ip_local_port_range   variables ob my test driver host.
 
 This still brings the question why  with filtering and without behavior
 is different but it makes me worry less :)
 
 
  
  Scott
  
  
  On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
   On Mon, 2005-10-31 at 16:25 -0500, Scott Ullrich wrote:
apr_poll: The timeout specified has expired (70007)
   
What is the above from?  Your benchmark testing box?
  
   Yes. This is output from apache benchmark program.
  
  
   Benchmarking 111.111.111.158 (be patient)
   Completed 1 requests
   Completed 2 requests
   Completed 3 requests
   apr_poll: The timeout specified has expired (70007)
   Total of 30517 requests completed
  
  
  
   
On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
 On Mon, 2005-10-31 at 15:48 -0500, Scott Ullrich wrote:
  Are you viewing the traffic queue status?   This would be
 normal if you are...

 Heh,

 yes good quess. These were running in the other window.


 So here is the output for stalled case

 # pfctl -ss | wc -l
51898

 I have number of states set to 100.000 in advanced page so it is
 not
 peak number.


 Note what really surprises me is the number of request when if
 fails:

 apr_poll: The timeout specified has expired (70007)
 Total of 28217 requests completed

 This number of 28217 is seen so often... Sometimes it is a bit
 more ot
 less but it is very frequently withing +/- 100 of it.

 I was asked if I can connect to the remote box when this problem
 happens
 -  yes.  I can SSH to the same box which runs Apache, but I
 can't
 connect to the port 80 when this problem happens.

 So it looks like it does not like to see all these states
 corresponding
 to the same target port number.



 
  Scott
 
 
  On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
   On Mon, 2005-10-31 at 14:39 -0500, Scott Ullrich wrote:
On 10/31/05, Fleming, John (ZeroChaos)
 [EMAIL PROTECTED] wrote:
 I wonder if part of the problem is PF isn't seeing the
 TCP tear down. It
 seems a little odd that the max gets hit and nothing
 else gets through.
 I guess it could be the benchmark isn't shutting down
 the session right
 after its down

Re: [pfSense Support] Network Device pooling

2005-11-01 Thread Scott Ullrich
Can we please let this thread die already?   I'm tired of hearing about
benchmarking the *WRONG* way.

Scott


On 11/1/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
 On Tue, 2005-11-01 at 10:43 -0600, Fleming, John (ZeroChaos) wrote:
  Also I wrote when stall happens I can't telnet to port 80 on web server
  host - which means it is not just program causing stall.
  Are you trying this from the same host as the benchmark program? I
  wonder if a 2nd host would have the same problem.

 I did not have an extra host for test.

 I've finally figured out it looks like client is running out of local
 ports  as increasing ip_local_port_range   allowed to get to the
 different point.

 Two things confused me here

 1) For some reason it does not fail if firewall is disabled. Probably
 something is different with connect closure.

 2) The error code reported by ab is connect timeout.  for this kind of
 error it should be Can't assign requested address or something
 similar.   I guess it could be apache runtime  abstraction library does
 not report this error well enough.




 
  -Original Message-
  From: Peter Zaitsev [mailto:[EMAIL PROTECTED]
  Sent: Monday, October 31, 2005 3:53 PM
  To: support@pfsense.com
  Subject: Re: [pfSense Support] Network Device pooling
 
  On Mon, 2005-10-31 at 16:31 -0500, Scott Ullrich wrote:
   Are we absolutely sure this program works as intended?  Personally I
   wouldn't trust anything like this but smartbits.
 
  Well...
 
  It works if filtering is disabled on pfsese  - this is what worries me.
  If the program would be broken it should not work in  both cases.
 
  Also I wrote when stall happens I can't telnet to port 80 on web server
  host - which means it is not just program causing stall.
 
  If it is protection on FreeBSD side from too much activity from same IP
  (Ie as it limits response to flood ping) this would be good to know.
 
  I hope this problem is actually something like that - I know there are a
  lot of FreeBSD based routers out where  - if it would be broken for real
  workloads something would scream already.
 
  One more interesting thing I noticed:
 
  Percentage of the requests served within a certain time (ms)
50% 32
66% 33
75% 33
80% 33
90% 44
95%295
98%324
99%330
   100%  21285 (longest request)
 
  Even if apache benchmark does not timeout it often shows too long
  response rate -  (21 sec in this case)
 
  What I've noticed - it can be 3,  9  or  21 secs in this case   - This
  really look like the times at which SYN packets are resent by TCP/IP
  stacks if no reply for previous one arrives.
 
 
  Doing more experiments I also discovered I can increase chance of
  passing benchmark (still not to 100%)  if i reduce tcp_fin_timeout and
  increase ip_local_port_range   variables ob my test driver host.
 
  This still brings the question why  with filtering and without behavior
  is different but it makes me worry less :)
 
 
  
   Scott
  
  
   On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
On Mon, 2005-10-31 at 16:25 -0500, Scott Ullrich wrote:
 apr_poll: The timeout specified has expired (70007)

 What is the above from?  Your benchmark testing box?
   
Yes. This is output from apache benchmark program.
   
   
Benchmarking 111.111.111.158 (be patient)
Completed 1 requests
Completed 2 requests
Completed 3 requests
apr_poll: The timeout specified has expired (70007)
Total of 30517 requests completed
   
   
   

 On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
  On Mon, 2005-10-31 at 15:48 -0500, Scott Ullrich wrote:
   Are you viewing the traffic queue status?   This would be
  normal if you are...
 
  Heh,
 
  yes good quess. These were running in the other window.
 
 
  So here is the output for stalled case
 
  # pfctl -ss | wc -l
 51898
 
  I have number of states set to 100.000 in advanced page so it is
  not
  peak number.
 
 
  Note what really surprises me is the number of request when if
  fails:
 
  apr_poll: The timeout specified has expired (70007)
  Total of 28217 requests completed
 
  This number of 28217 is seen so often... Sometimes it is a bit
  more ot
  less but it is very frequently withing +/- 100 of it.
 
  I was asked if I can connect to the remote box when this problem
  happens
  -  yes.  I can SSH to the same box which runs Apache, but I
  can't
  connect to the port 80 when this problem happens.
 
  So it looks like it does not like to see all these states
  corresponding
  to the same target port number.
 
 
 
  
   Scott
  
  
   On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
On Mon, 2005-10-31 at 14:39 -0500, Scott Ullrich wrote:
 On 10/31/05, Fleming, John (ZeroChaos)
  [EMAIL PROTECTED] wrote

RE: [pfSense Support] Network Device pooling

2005-11-01 Thread alan walters
I think the first rule of testing applies: start at the beginning and
work your way backwards.

Pleased you solved your problems.

 -Original Message-
 From: Peter Zaitsev [mailto:[EMAIL PROTECTED]
 Sent: 01 November 2005 18:16
 To: support@pfsense.com
 Subject: RE: [pfSense Support] Network Device pooling
 
 On Tue, 2005-11-01 at 10:43 -0600, Fleming, John (ZeroChaos) wrote:
  Also I wrote when stall happens I can't telnet to port 80 on web
server
  host - which means it is not just program causing stall.
  Are you trying this from the same host as the benchmark program? I
  wonder if a 2nd host would have the same problem.
 
 I did not have an extra host for test.
 
 I've finally figured out it looks like client is running out of local
 ports  as increasing ip_local_port_range   allowed to get to the
 different point.
 
 Two things confused me here
 
 1) For some reason it does not fail if firewall is disabled. Probably
 something is different with connect closure.
 
 2) The error code reported by ab is connect timeout.  for this kind
of
 error it should be Can't assign requested address or something
 similar.   I guess it could be apache runtime  abstraction library
does
 not report this error well enough.
 
 
 
 
 
  -Original Message-
  From: Peter Zaitsev [mailto:[EMAIL PROTECTED]
  Sent: Monday, October 31, 2005 3:53 PM
  To: support@pfsense.com
  Subject: Re: [pfSense Support] Network Device pooling
 
  On Mon, 2005-10-31 at 16:31 -0500, Scott Ullrich wrote:
   Are we absolutely sure this program works as intended?  Personally
I
   wouldn't trust anything like this but smartbits.
 
  Well...
 
  It works if filtering is disabled on pfsese  - this is what worries
me.
  If the program would be broken it should not work in  both cases.
 
  Also I wrote when stall happens I can't telnet to port 80 on web
server
  host - which means it is not just program causing stall.
 
  If it is protection on FreeBSD side from too much activity from same
IP
  (Ie as it limits response to flood ping) this would be good to know.
 
  I hope this problem is actually something like that - I know there
are a
  lot of FreeBSD based routers out where  - if it would be broken for
real
  workloads something would scream already.
 
  One more interesting thing I noticed:
 
  Percentage of the requests served within a certain time (ms)
50% 32
66% 33
75% 33
80% 33
90% 44
95%295
98%324
99%330
   100%  21285 (longest request)
 
  Even if apache benchmark does not timeout it often shows too long
  response rate -  (21 sec in this case)
 
  What I've noticed - it can be 3,  9  or  21 secs in this case   -
This
  really look like the times at which SYN packets are resent by TCP/IP
  stacks if no reply for previous one arrives.
 
 
  Doing more experiments I also discovered I can increase chance of
  passing benchmark (still not to 100%)  if i reduce tcp_fin_timeout
and
  increase ip_local_port_range   variables ob my test driver host.
 
  This still brings the question why  with filtering and without
behavior
  is different but it makes me worry less :)
 
 
  
   Scott
  
  
   On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
On Mon, 2005-10-31 at 16:25 -0500, Scott Ullrich wrote:
 apr_poll: The timeout specified has expired (70007)

 What is the above from?  Your benchmark testing box?
   
Yes. This is output from apache benchmark program.
   
   
Benchmarking 111.111.111.158 (be patient)
Completed 1 requests
Completed 2 requests
Completed 3 requests
apr_poll: The timeout specified has expired (70007)
Total of 30517 requests completed
   
   
   

 On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
  On Mon, 2005-10-31 at 15:48 -0500, Scott Ullrich wrote:
   Are you viewing the traffic queue status?   This would be
  normal if you are...
 
  Heh,
 
  yes good quess. These were running in the other window.
 
 
  So here is the output for stalled case
 
  # pfctl -ss | wc -l
 51898
 
  I have number of states set to 100.000 in advanced page so
it is
  not
  peak number.
 
 
  Note what really surprises me is the number of request when
if
  fails:
 
  apr_poll: The timeout specified has expired (70007)
  Total of 28217 requests completed
 
  This number of 28217 is seen so often... Sometimes it is a
bit
  more ot
  less but it is very frequently withing +/- 100 of it.
 
  I was asked if I can connect to the remote box when this
problem
  happens
  -  yes.  I can SSH to the same box which runs Apache, but I
  can't
  connect to the port 80 when this problem happens.
 
  So it looks like it does not like to see all these states
  corresponding
  to the same target port number.
 
 
 
  
   Scott
  
  
   On 10/31/05, Peter Zaitsev [EMAIL

Re: [pfSense Support] Network Device pooling

2005-11-01 Thread Dan Swartzendruber

At 01:31 PM 11/1/2005, you wrote:

Can we please let this thread die already?   I'm tired of hearing about
benchmarking the *WRONG* way.


Must. Control.  The.  Fist.  Of.  Death.





-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [pfSense Support] Network Device pooling

2005-10-31 Thread Scott Ullrich
Please describe the hardware you're using fully.  NICs, etc.  This is
not normal behavior.

On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
 On Sun, 2005-10-30 at 23:14 +0100, Espen Johansen wrote:
  Hi Peter,
 
  I have seen you have done a lot of testing with apache benchmarking.
  I find it a little strange to use this as a test. Basically you will hit the
  roof of standing I/O operations because you introduce latency with pfsense.
  The lower the latency the more finished tasks/connections per time unit.
  Most people don't take this into consideration when they tune apache.
  Although, this is one of the most important aspects of web-server tuning.

 Espen,

 If you look through my set of emails you will see that the growing
 latency with network polling is not my concern, nor is the dropping
 throughput with pfsense in the middle - that is all understandable.

 What is NOT OK, however, is the stall (20+ seconds) when CPU usage on
 pfsense drops almost to zero and no traffic flows on the connections.
 Sometimes it causes the apache benchmark to abort, and sometimes it just
 shows crazy response times.

 This does not happen in a direct benchmark (no pfsense in the middle) or
 with pfsense with the firewall disabled.

 Why did I use the apache benchmark?  Well, it is a simple stress test which
 results in a lot of traffic and a lot of states in the state tables.
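For context, an apachebench run of the shape described here would look roughly
like the following; the request count, concurrency and URL path are illustrative
assumptions (only the target IP appears in the thread):

# ab -n 100000 -c 50 http://111.111.111.158/

-n sets the total number of requests and -c the fixed concurrency mentioned
below; the large request count is what drives the state-table growth on the
firewall.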

 
  This is the scenario:
 
  Client with low BW and high latency will generate a standing I/O because of
  the way apache is designed. So if a client with 100 ms latency asks for a
  file of 100 Kbyte and he has a 3 KB/s transfer rate, he will generate a
  standing I/O operation for latency + transfer time, and the I/O operation
  will not be finished until he has completed the transfer. So basically you do
  the same: because you change the amount of time the request takes to process,
  you will have more standing I/O operations than if pfsense does routing only
  (faster than routing and filtering). So let's say that you increase latency
  from 0.4 ms to 2 ms; it means that your standing I/O lasts 5x as long. So
  in turn that will mean that your ability to serve connections will be 1/5
  with 2 ms compared to 0.4 ms latency.
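To make the arithmetic concrete (a restatement of the argument above, not new
data from the thread): in the example each request occupies a worker for about
0.1 s of latency plus 100 KB / 3 KB/s ≈ 33 s of transfer, so the slot is held
for roughly 33 s no matter how fast the server itself is. Likewise, going from
0.4 ms to 2 ms of added latency means each slot is held 2 / 0.4 = 5 times as
long, hence roughly 1/5 of the connection-serving capacity at a fixed number of
workers.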

 Well... This would be the case in a real-life scenario - slow clients
 blowing up the number of apache children.  But it is not the case in a
 synthetic apache benchmark test.  In that case you set a fixed
 concurrency, and I obviously set it low enough for my Apache box to
 handle.

 Furthermore, pfsense locks up even with a single connection (this is
 independent of whether device polling is enabled).


 
  The ones listed below seem to be the ones that have the most effect on
  polling and performance. You will have to play around with these settings to
  find out what works best on your HW, as I can't seem to find a common
  setting that works well for all kinds of HW.
 
  kern.polling.each_burst=80
  kern.polling.burst_max=1000
  kern.polling.user_frac=50
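For reference, these would be applied on the FreeBSD/pfSense box roughly like
this; a sketch that simply uses the values quoted above (whether they suit this
particular hardware is exactly what the testing is meant to find out):

# sysctl kern.polling.each_burst=80
# sysctl kern.polling.burst_max=1000
# sysctl kern.polling.user_frac=50

kern.polling.user_frac is the share of CPU reserved for userland while polling,
so lowering it leaves more time for the packet-processing loop.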


 Thanks.




 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [pfSense Support] Network Device pooling

2005-10-31 Thread Scott Ullrich
On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
 On Mon, 2005-10-31 at 12:03 -0500, Scott Ullrich wrote:
  Please describe the hardware your using fully.  NICS, etc.   This is
  not normal behavior.

 Sure. It is a Dell PowerEdge 750:
 512 MB RAM, SATA150 disk, Celeron 2.4 GHz

 ACPI APIC Table: <DELL   PE750  >
 Timecounter "i8254" frequency 1193182 Hz quality 0
 CPU: Intel(R) Celeron(R) CPU 2.40GHz (2400.10-MHz 686-class CPU)
   Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9

 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
   Features2=0x4400<CNTX-ID,b14>
 real memory  = 536608768 (511 MB)
 avail memory = 515547136 (491 MB)



 The NICs are built-in Intel 10/100/1000 NICs:

 em0: Intel(R) PRO/1000 Network Connection, Version - 2.1.7 port
 0xece0-0xecff mem 0xfe1e-0xfe1f irq 18 at device 1.0 on pci1
 em0: Ethernet address: 00:14:22:0a:64:4c
 em0:  Speed:N/A  Duplex:N/A


 It does not look like a hardware issue to me, as it works fine if I
 disable the firewall.

 I tried turning off scrub and it does not change anything. It still times
 out after a few requests:

And when this timeout occurs do you see anything in the system logs?  
Can you still telnet into the apache server behind pfsense?   This
really doesn't make a lot of sense.  It should be able to stand up to
this.

Scott

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: [pfSense Support] Network Device pooling

2005-10-31 Thread Fleming, John \(ZeroChaos\)
Send the output.txt of...

date > /tmp/output.txt

netstat -m >> /tmp/output.txt

netstat -in >> /tmp/output.txt

sysctl hw.em0.stats=1 >> /tmp/output.txt

sysctl hw.em1.stats=1 >> /tmp/output.txt

sysctl hw.em2.stats=1 >> /tmp/output.txt

Can you send these while the machine is normal and when the machine is
choking? (send the output.txt file btw)

Are you able to try this test using routing vs. bridging?


-Original Message-
From: Scott Ullrich [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 31, 2005 1:09 PM
To: support@pfsense.com
Subject: Re: [pfSense Support] Network Device pooling

On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
 On Mon, 2005-10-31 at 12:03 -0500, Scott Ullrich wrote:
  Please describe the hardware your using fully.  NICS, etc.   This is
  not normal behavior.

 Sure It is Dell Poweredge 750
 512MB RAM,  SATA150 disk, Celeron 2.4Ghz

 ACPI APIC Table: DELL   PE750   
 Timecounter i8254 frequency 1193182 Hz quality 0
 CPU: Intel(R) Celeron(R) CPU 2.40GHz (2400.10-MHz 686-class CPU)
   Origin = GenuineIntel  Id = 0xf29  Stepping = 9


Features=0xbfebfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE
,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE
   Features2=0x4400CNTX-ID,b14
 real memory  = 536608768 (511 MB)
 avail memory = 515547136 (491 MB)



 Nics are build in Intel 10/100/1000 NICs:

 em0: Intel(R) PRO/1000 Network Connection, Version - 2.1.7 port
 0xece0-0xecff mem 0xfe1e-0xfe1f irq 18 at device 1.0 on pci1
 em0: Ethernet address: 00:14:22:0a:64:4c
 em0:  Speed:N/A  Duplex:N/A


 It does not looks like this is hardware issue for me as if I disable
 firewall it works fine.

 I tried turning off scrub and it does not change anything. Still
timeout
 after few requests:

And when this timeout occurs do you see anything in the system logs?  
Can you still telnet into the apache server behind pfsense?   This
really doesn't make a lot of sense.  It should be able to stand up to
this.

Scott

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: [pfSense Support] Network Device pooling

2005-10-31 Thread Peter Zaitsev
On Mon, 2005-10-31 at 13:26 -0600, Fleming, John (ZeroChaos) wrote:
 Benchmarking 111.111.111.158 (be patient) Completed 10000 requests -
 isn't 10,000 the default limit of the state table? That sure would
 explain a lot.

I boosted it to 100,000 of course.





-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: [pfSense Support] Network Device pooling

2005-10-31 Thread Fleming, John \(ZeroChaos\)
I wonder if part of the problem is that PF isn't seeing the TCP teardown. It
seems a little odd that the max gets hit and nothing else gets through.
I guess it could be that the benchmark isn't shutting down the session right
after it's done transferring data, but I would think having 10K(ish) open
TCP sessions would kill the benchmark client.

-Original Message-
From: Scott Ullrich [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 31, 2005 1:28 PM
To: support@pfsense.com
Subject: Re: [pfSense Support] Network Device pooling

On 10/31/05, Fleming, John (ZeroChaos) [EMAIL PROTECTED] wrote:
 Benchmarking 111.111.111.158 (be patient) Completed 10000 requests -
 isn't 10,000 the default limit of the state table? That sure would
 explain a lot.

Yep.   10K is the default and it is adjustable from the System ->
Advanced screen.
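To confirm the limit actually in effect on the box, pf can report it directly;
a sketch (pfctl -s memory is a standard pf command, though the output layout
varies a little by version):

# pfctl -s memory

The 'states hard limit' line should show 10000 by default, or whatever was set
on the Advanced screen.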

Scott

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [pfSense Support] Network Device pooling

2005-10-31 Thread Scott Ullrich
On 10/31/05, Fleming, John (ZeroChaos) [EMAIL PROTECTED] wrote:
 I wonder if part of the problem is PF isn't seeing the TCP tear down. It
 seems a little odd that the max gets hit and nothing else gets through.
 I guess it could be the benchmark isn't shutting down the session right
 after its down transferring data, but I would think it would kill the
 benchmark client to have 10K(ish) of open TCP sessions.

One way to determine this would be to run pfctl -ss | wc -l once
pfSense stops responding?
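Watching the state counter directly would work too; a sketch (pfctl -s info is
standard pf, though the exact field wording may differ by version):

# pfctl -s info | grep -i 'current entries'

Running that in a loop during the benchmark shows whether the state table is
still growing or has flattened out when the stall hits.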

Scott

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [pfSense Support] Network Device pooling

2005-10-31 Thread Peter Zaitsev
On Mon, 2005-10-31 at 14:39 -0500, Scott Ullrich wrote:
 On 10/31/05, Fleming, John (ZeroChaos) [EMAIL PROTECTED] wrote:
  I wonder if part of the problem is PF isn't seeing the TCP tear down. It
  seems a little odd that the max gets hit and nothing else gets through.
  I guess it could be the benchmark isn't shutting down the session right
  after its down transferring data, but I would think it would kill the
  benchmark client to have 10K(ish) of open TCP sessions.
 
 One way to deterimine this would be to run pfctl -ss | wc -l once
 pfSense stops responding?

Very interesting.

I tried running this before the problem occurred, but it already looks strange:

# pfctl -ss | wc -l
4893
Killed
# pfctl -ss | wc -l
   23245
Killed

There is nothing in dmesg or  system logs. 





-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [pfSense Support] Network Device pooling

2005-10-31 Thread Scott Ullrich
Are you viewing the traffic queue status?   This would be normal if you are...

Scott


On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
 On Mon, 2005-10-31 at 14:39 -0500, Scott Ullrich wrote:
  On 10/31/05, Fleming, John (ZeroChaos) [EMAIL PROTECTED] wrote:
   I wonder if part of the problem is PF isn't seeing the TCP tear down. It
   seems a little odd that the max gets hit and nothing else gets through.
   I guess it could be the benchmark isn't shutting down the session right
   after its down transferring data, but I would think it would kill the
   benchmark client to have 10K(ish) of open TCP sessions.
 
  One way to deterimine this would be to run pfctl -ss | wc -l once
  pfSense stops responding?

 Very interesting

 I tried running this before the problems but it looks strange already:

 # pfctl -ss | wc -l
 4893
 Killed
 # pfctl -ss | wc -l
23245
 Killed

 There is nothing in dmesg or  system logs.





 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: [pfSense Support] Network Device pooling

2005-10-31 Thread Peter Zaitsev
On Mon, 2005-10-31 at 13:25 -0600, Fleming, John (ZeroChaos) wrote:



 
 Can you send these while the machine is normal and when the machine is
 choking? (send the output.txt file btw)

Normal:

# cat /tmp/output.txt
Mon Oct 31 07:50:52 PST 2005
564/336/900 mbufs in use (current/cache/total)
555/269/824/17088 mbuf clusters in use (current/cache/total/max)
0/3/4528 sfbufs in use (current/peak/max)
1253K/622K/1875K bytes allocated to network (current/cache/total)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
Name   Mtu   Network          Address            Ipkts    Ierrs  Opkts    Oerrs  Coll
em0    1500  Link#1           00:14:22:0a:64:4c  2200575  0      2004248  0      0
em0    1500  fe80:1::214:2    fe80:1::214:22ff:  0        -      4        -      -
em0    1500  111.111.111.152  111.111.111.154    3395     -      0        -      -
em1    1500  Link#2           00:14:22:0a:64:4d  2003036  0      2195974  0      0
em1    1500  fe80:2::214:2    fe80:2::214:22ff:  0        -      4        -      -
em1    1500  111.111.111.152  111.111.111.154    0        -      6162     -      -
pfsyn  2020  Link#3                              0        0      0        0      0
lo0    16384 Link#4                              0        0      0        0      0
lo0    16384 127              127.0.0.1          0        -      0        -      -
lo0    16384 ::1/128          ::1                0        -      0        -      -
lo0    16384 fe80:4::1/64     fe80:4::1          0        -      0        -      -
pflog  33208 Link#5                              0        0      0        0      0
bridg  1500  Link#6           ac:de:48:e1:dd:5f  4197981  0      4200265  0      0




Choking:


Mon Oct 31 07:48:44 PST 2005
515/385/900 mbufs in use (current/cache/total)
514/310/824/17088 mbuf clusters in use (current/cache/total/max)
0/3/4528 sfbufs in use (current/peak/max)
1156K/716K/1873K bytes allocated to network (current/cache/total)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
Name   Mtu   Network          Address            Ipkts    Ierrs  Opkts    Oerrs  Coll
em0    1500  Link#1           00:14:22:0a:64:4c  2011449  0      1838611  0      0
em0    1500  fe80:1::214:2    fe80:1::214:22ff:  0        -      4        -      -
em0    1500  111.111.111.152  111.111.111.154    2644     -      0        -      -
em1    1500  Link#2           00:14:22:0a:64:4d  1835313  0      2007595  0      0
em1    1500  fe80:2::214:2    fe80:2::214:22ff:  0        -      4        -      -
em1    1500  111.111.111.152  111.111.111.154    0        -      5336     -      -
pfsyn  2020  Link#3                              0        0      0        0      0
lo0    16384 Link#4                              0        0      0        0      0
lo0    16384 127              127.0.0.1          0        -      0        -      -
lo0    16384 ::1/128          ::1                0        -      0        -      -
lo0    16384 fe80:4::1/64     fe80:4::1          0        -      0        -      -
pflog  33208 Link#5                              0        0      0        0      0
bridg  1500  Link#6           ac:de:48:e1:dd:5f  3841883  0      3846209  0      0


Some of the commands you suggested fail:


# sysctl hw.em0.stats=1 >> /tmp/output.txt
sysctl: unknown oid 'hw.em0.stats'
#
# sysctl hw.em1.stats=1 >> /tmp/output.txt
sysctl: unknown oid 'hw.em1.stats'
#
# sysctl hw.em2.stats=1 >> /tmp/output.txt
sysctl: unknown oid 'hw.em2.stats'




 
 Are you able to try this test using routing ver bridging?

I did not try with routing, as this is not what I'm going to use.
However, I tried doing this with the firewall disabled and bridging enabled,
which seems to show that it is not bridging itself, at least.


 
 
 -Original Message-
 From: Scott Ullrich [mailto:[EMAIL PROTECTED] 
 Sent: Monday, October 31, 2005 1:09 PM
 To: support@pfsense.com
 Subject: Re: [pfSense Support] Network Device pooling
 
 On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
  On Mon, 2005-10-31 at 12:03 -0500, Scott Ullrich wrote:
   Please describe the hardware your using fully.  NICS, etc.   This is
   not normal behavior.
 
  Sure It is Dell Poweredge 750
  512MB RAM,  SATA150 disk, Celeron 2.4Ghz
 
  ACPI APIC Table: DELL   PE750   
  Timecounter i8254 frequency 1193182 Hz quality 0
  CPU: Intel(R) Celeron(R) CPU 2.40GHz (2400.10-MHz 686-class CPU)
Origin = GenuineIntel  Id = 0xf29  Stepping = 9
 
 
 Features=0xbfebfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE
 ,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE
Features2=0x4400CNTX-ID,b14
  real memory  = 536608768 (511 MB)
  avail memory = 515547136 (491 MB)
 
 
 
  Nics are build in Intel 10/100/1000 NICs:
 
  em0: Intel(R) PRO/1000 Network Connection, Version - 2.1.7 port
  0xece0-0xecff mem 0xfe1e-0xfe1f irq 18 at device 1.0 on pci1
  em0: Ethernet address: 00:14:22:0a:64:4c
  em0:  Speed:N/A  Duplex:N/A
 
 
  It does not looks like this is hardware issue for me as if I disable
  firewall it works fine.
 
  I tried turning off scrub

Re: [pfSense Support] Network Device pooling

2005-10-31 Thread Scott Ullrich
apr_poll: The timeout specified has expired (70007)

What is the above from?  Your benchmark testing box?

On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
 On Mon, 2005-10-31 at 15:48 -0500, Scott Ullrich wrote:
  Are you viewing the traffic queue status?   This would be normal if you 
  are...

 Heh,

 yes good quess. These were running in the other window.


 So here is the output for stalled case

 # pfctl -ss | wc -l
51898

 I have number of states set to 100.000 in advanced page so it is not
 peak number.


 Note what really surprises me is the number of request when if fails:

 apr_poll: The timeout specified has expired (70007)
 Total of 28217 requests completed

 This number of 28217 is seen so often... Sometimes it is a bit more ot
 less but it is very frequently withing +/- 100 of it.

 I was asked if I can connect to the remote box when this problem happens
 -  yes.  I can SSH to the same box which runs Apache, but I can't
 connect to the port 80 when this problem happens.

 So it looks like it does not like to see all these states corresponding
 to the same target port number.



 
  Scott
 
 
  On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
   On Mon, 2005-10-31 at 14:39 -0500, Scott Ullrich wrote:
On 10/31/05, Fleming, John (ZeroChaos) [EMAIL PROTECTED] wrote:
 I wonder if part of the problem is PF isn't seeing the TCP tear down. 
 It
 seems a little odd that the max gets hit and nothing else gets 
 through.
 I guess it could be the benchmark isn't shutting down the session 
 right
 after its down transferring data, but I would think it would kill the
 benchmark client to have 10K(ish) of open TCP sessions.
   
One way to deterimine this would be to run pfctl -ss | wc -l once
pfSense stops responding?
  
   Very interesting
  
   I tried running this before the problems but it looks strange already:
  
   # pfctl -ss | wc -l
   4893
   Killed
   # pfctl -ss | wc -l
  23245
   Killed
  
   There is nothing in dmesg or  system logs.
  
  
  
  
  
   -
   To unsubscribe, e-mail: [EMAIL PROTECTED]
   For additional commands, e-mail: [EMAIL PROTECTED]
  
  
 
  -
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail: [EMAIL PROTECTED]
 


 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [pfSense Support] Network Device pooling

2005-10-31 Thread Scott Ullrich
Have you seen this?  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=110887

Looks like an apachebench problem to me.

Scott

On 10/31/05, Scott Ullrich [EMAIL PROTECTED] wrote:
 Are we absolutely sure this program works as intended?  Personally I
 wouldn't trust anything like this but smartbits.

 Scott


 On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
  On Mon, 2005-10-31 at 16:25 -0500, Scott Ullrich wrote:
   apr_poll: The timeout specified has expired (70007)
  
   What is the above from?  Your benchmark testing box?
 
  Yes. This is output from apache benchmark program.
 
 
  Benchmarking 111.111.111.158 (be patient)
  Completed 1 requests
  Completed 2 requests
  Completed 3 requests
  apr_poll: The timeout specified has expired (70007)
  Total of 30517 requests completed
 
 
 
  
   On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
On Mon, 2005-10-31 at 15:48 -0500, Scott Ullrich wrote:
 Are you viewing the traffic queue status?   This would be normal if 
 you are...
   
Heh,
   
yes good quess. These were running in the other window.
   
   
So here is the output for stalled case
   
# pfctl -ss | wc -l
   51898
   
I have number of states set to 100.000 in advanced page so it is not
peak number.
   
   
Note what really surprises me is the number of request when if fails:
   
apr_poll: The timeout specified has expired (70007)
Total of 28217 requests completed
   
This number of 28217 is seen so often... Sometimes it is a bit more ot
less but it is very frequently withing +/- 100 of it.
   
I was asked if I can connect to the remote box when this problem happens
-  yes.  I can SSH to the same box which runs Apache, but I can't
connect to the port 80 when this problem happens.
   
So it looks like it does not like to see all these states corresponding
to the same target port number.
   
   
   

 Scott


 On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
  On Mon, 2005-10-31 at 14:39 -0500, Scott Ullrich wrote:
   On 10/31/05, Fleming, John (ZeroChaos) [EMAIL PROTECTED] wrote:
I wonder if part of the problem is PF isn't seeing the TCP tear 
down. It
seems a little odd that the max gets hit and nothing else gets 
through.
I guess it could be the benchmark isn't shutting down the 
session right
after its down transferring data, but I would think it would 
kill the
benchmark client to have 10K(ish) of open TCP sessions.
  
   One way to deterimine this would be to run pfctl -ss | wc -l once
   pfSense stops responding?
 
  Very interesting
 
  I tried running this before the problems but it looks strange 
  already:
 
  # pfctl -ss | wc -l
  4893
  Killed
  # pfctl -ss | wc -l
 23245
  Killed
 
  There is nothing in dmesg or  system logs.
 
 
 
 
 
  -
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail: [EMAIL PROTECTED]
 
 

 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]

   
   
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
   
   
  
   -
   To unsubscribe, e-mail: [EMAIL PROTECTED]
   For additional commands, e-mail: [EMAIL PROTECTED]
  
 
 
  -
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail: [EMAIL PROTECTED]
 
 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [pfSense Support] Network Device pooling

2005-10-31 Thread Peter Zaitsev
On Mon, 2005-10-31 at 16:31 -0500, Scott Ullrich wrote:
 Are we absolutely sure this program works as intended?  Personally I
 wouldn't trust anything like this but smartbits.

Well... 

It works if filtering is disabled on pfsense - this is what worries me.
If the program were broken it should not work in either case.

Also, as I wrote, when the stall happens I can't telnet to port 80 on the
web server host - which means it is not just the program causing the stall.

If it is protection on the FreeBSD side against too much activity from the
same IP (i.e. the way it limits responses to flood pings), that would be
good to know.

I hope this problem is actually something like that - I know there are a
lot of FreeBSD-based routers out there; if it were broken for real
workloads, somebody would have screamed already.

One more interesting thing I noticed: 

Percentage of the requests served within a certain time (ms)
  50% 32
  66% 33
  75% 33
  80% 33
  90% 44
  95%    295
  98%    324
  99%    330
 100%  21285 (longest request)

Even when the apache benchmark does not time out, it often shows very long
response times (21 sec in this case).

What I've noticed is that it can be 3, 9 or 21 secs - these really look
like the times at which SYN packets are resent by TCP/IP stacks if no
reply to the previous one arrives.

 
Doing more experiments I also discovered I can increase the chance of
passing the benchmark (still not to 100%) if I reduce the tcp_fin_timeout
and increase the ip_local_port_range variables on my test driver host.

This still leaves the question of why the behavior differs with and
without filtering, but it makes me worry less :)


 
 Scott
 
 
 On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
  On Mon, 2005-10-31 at 16:25 -0500, Scott Ullrich wrote:
   apr_poll: The timeout specified has expired (70007)
  
   What is the above from?  Your benchmark testing box?
 
  Yes. This is output from apache benchmark program.
 
 
  Benchmarking 111.111.111.158 (be patient)
  Completed 1 requests
  Completed 2 requests
  Completed 3 requests
  apr_poll: The timeout specified has expired (70007)
  Total of 30517 requests completed
 
 
 
  
   On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
On Mon, 2005-10-31 at 15:48 -0500, Scott Ullrich wrote:
 Are you viewing the traffic queue status?   This would be normal if 
 you are...
   
Heh,
   
yes good quess. These were running in the other window.
   
   
So here is the output for stalled case
   
# pfctl -ss | wc -l
   51898
   
I have number of states set to 100.000 in advanced page so it is not
peak number.
   
   
Note what really surprises me is the number of request when if fails:
   
apr_poll: The timeout specified has expired (70007)
Total of 28217 requests completed
   
This number of 28217 is seen so often... Sometimes it is a bit more ot
less but it is very frequently withing +/- 100 of it.
   
I was asked if I can connect to the remote box when this problem happens
-  yes.  I can SSH to the same box which runs Apache, but I can't
connect to the port 80 when this problem happens.
   
So it looks like it does not like to see all these states corresponding
to the same target port number.
   
   
   

 Scott


 On 10/31/05, Peter Zaitsev [EMAIL PROTECTED] wrote:
  On Mon, 2005-10-31 at 14:39 -0500, Scott Ullrich wrote:
   On 10/31/05, Fleming, John (ZeroChaos) [EMAIL PROTECTED] wrote:
I wonder if part of the problem is PF isn't seeing the TCP tear 
down. It
seems a little odd that the max gets hit and nothing else gets 
through.
I guess it could be the benchmark isn't shutting down the 
session right
after its down transferring data, but I would think it would 
kill the
benchmark client to have 10K(ish) of open TCP sessions.
  
   One way to deterimine this would be to run pfctl -ss | wc -l once
   pfSense stops responding?
 
  Very interesting
 
  I tried running this before the problems but it looks strange 
  already:
 
  # pfctl -ss | wc -l
  4893
  Killed
  # pfctl -ss | wc -l
 23245
  Killed
 
  There is nothing in dmesg or  system logs.
 
 
 
 
 
  -
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail: [EMAIL PROTECTED]
 
 

 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]

   
   
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
   
   
  
   -