Re: serious networking (em) performance (ggate and NFS) problem

2005-07-02 Thread Matthew Dillon
Polling should not produce any improvement over interrupts for EM.
The EM card will aggregate 8-14+ packets per interrupt, or more, which
at these traffic rates works out to only around 8000 interrupts/sec.
I've got a ton of these cards installed.

# mount_nfs -a 4 dhcp61:/home /mnt
# dd if=/mnt/x of=/dev/null bs=32k
# netstat -in 1
            input        (Total)           output
   packets  errs      bytes    packets  errs      bytes  colls
     66401     0   93668746       5534     0     962920      0
     66426     0   94230092       5537     0    1007108      0
     66424     0   93699848       5536     0     963268      0
     66422     0   94222372       5536     0    1007290      0
     66391     0   93654846       5534     0     962746      0
     66375     0   94154432       5532     0    1006404      0

(systat -vm 1 output)
CPU:        19.2% Sys   0.0% Intr   0.0% User   0.0% Nice   80.8% Idle
Memory:     88864 wire   10404 act   864476 inact   58152 cache   2992 free
Interrupts: 8100 total   7873 mux irq10   227 clk irq0


Note that the interrupt rate is only 7873 interrupts per second
while I am transferring 94 MBytes/sec over NFS (UDP) and receiving
over 66000 packets per second (~8 packets per interrupt).

If I use a TCP mount I get just about the same thing:

# mount_nfs -T -a 4 dhcp61:/home /mnt
# dd if=/mnt/x of=/dev/null bs=32k
# netstat -in 1

            input        (Total)           output
   packets  errs      bytes    packets  errs      bytes  colls
     61752     0   93978800       8091     0     968618      0
     61780     0   93530484       8098     0     904370      0
     61710     0   93917880       8093     0     968128      0
     61754     0   93491260       8095     0     903940      0
     61756     0   93986320       8097     0     968336      0


(systat -vm 1 output)
CPU:        26.4% Sys   0.0% Intr   0.0% User   0.0% Nice   73.6% Idle
Memory:     141556 wire   7800 act   244872 inact   8 cache   630780 free
Interrupts: 8145 total   7917 mux irq10   228 clk irq0

In this case around 8000 interrupts per second with 61700 packets per
second incoming on the interface (around ~8 packets per interrupt).
The extra interrupts are due to the additional outgoing TCP ack traffic.

If I look at the systat -vm 1 output on the NFS server it also sees
only around 8000 interrupts per second, which isn't saying much other
than that its transmit path (61700 pps outgoing) is not creating an undue
interrupt burden relative to the receive path.
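
The same interrupt rate can be cross-checked with vmstat; a quick sketch
using only stock tools:

# vmstat -i
# systat -vm 1

vmstat -i prints per-device interrupt totals and average rates, which
should line up with the ~8000/sec figure above.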

-Matt



Re: serious networking (em) performance (ggate and NFS) problem

2004-11-29 Thread Andre Oppermann
"David G. Lawrence" wrote:
> 
> > >>tests.  With the re driver, no change except placing a 100BT setup with
> > >>no packet loss to a gigE setup (both linksys switches) will cause
> > >>serious packet loss at 20Mbps data rates.  I have discovered the only
> > >>way to get good performance with no packet loss was to
> > >>
> > >>1) Remove interrupt moderation
> > >>2) defrag each mbuf that comes in to the driver.
> > >
> > >Sounds like you're bumping into a queue limit that is made worse by
> > >interrupting less frequently, resulting in bursts of packets that are
> > >relatively large, rather than a trickle of packets at a higher rate.
> > >Perhaps a limit on the number of outstanding descriptors in the driver or
> > >hardware and/or a limit in the netisr/ifqueue queue depth.  You might try
> > >changing the default IFQ_MAXLEN from 50 to 128 to increase the size of the
> > >ifnet and netisr queues.  You could also try setting net.isr.enable=1 to
> > >enable direct dispatch, which in the in-bound direction would reduce the
> > >number of context switches and queueing.  It sounds like the device driver
> > >has a limit of 256 receive and transmit descriptors, which one supposes is
> > >probably derived from the hardware limit, but I have no documentation on
> > >hand so can't confirm that.
> > >
> > >It would be interesting on the send and receive sides to inspect the
> > >counters for drops at various points in the network stack; i.e., are we
> > >dropping packets at the ifq handoff because we're overfilling the
> > >descriptors in the driver, are packets dropped on the inbound path going
> > >into the netisr due to over-filling before the netisr is scheduled, etc.
> > >And, it's probably interesting to look at stats on filling the socket
> > >buffers for the same reason: if bursts of packets come up the stack, the
> > >socket buffers could well be being over-filled before the user thread can
> > >run.
> >
> > I think it's the tcp_output() path that overflows the transmit side of
> > the card.  I take that from the better numbers when he defrags the packets.
> > Once I catch up with my mails I start to put up the code I wrote over the
> > last two weeks. :-)  You can call me Mr. TCP now. ;-)
> 
>He was doing his test with NFS over TCP, right? ...That would be a single
> connection, so how is it possible to 'overflow the transmit side of the
> card'? The TCP window size will prevent more than 64KB to be outstanding.
> Assuming standard size ethernet frames, that would be a maximum of 45 packets
> in-flight at any time (65536/1460=45), well below the 256 available transmit
> descriptors.
>It is also worth pointing out that 45 full-size packets is 540us at
> gig-e speeds. Even when you add up typical switch latencies and interrupt
> overhead and coalesing on both sides, it's hard to imagine that the window
> size (bandwidth * delay) would be a significant limiting factor across a
> gig-e LAN.

For some reason he is getting long mbuf chains and that is why a call to
m_defrag helps.  With long mbuf chains you can easily overflow the transmit
descriptors.

>I too am seeing low NFS performance (both TCP and UDP) with non-SMP
> 5.3, but on the same systems I can measure raw TCP performance (using
> ttcp) of >850Mbps. It looks to me like there is something wrong with
> NFS, perhaps caused by delays with scheduling nfsd?

-- 
Andre


Re: serious networking (em) performance (ggate and NFS) problem

2004-11-25 Thread Claus Guttesen
> > > ifnet and netisr queues.  You could also try setting net.isr.enable=1 to
> > > enable direct dispatch, which in the in-bound direction would reduce the
> > > number of context switches and queueing.  It sounds like the device driver
> > > has a limit of 256 receive and transmit descriptors, which one supposes is
> > > probably derived from the hardware limit, but I have no documentation on
> > > hand so can't confirm that.

It may not cast much light on the issue, but I tried setting
net.isr.enable to 1 on an NFS server using TCP on 5.3-RC3 while copying
between three clients.  When I set net.isr.enable to 0 the CPU usage
went up.  The server is running with this setting now and is being
mounted by 11 webservers, using an em gigabit card.

Claus



Re: serious networking (em) performance (ggate and NFS) problem

2004-11-25 Thread David G. Lawrence
> >>tests.  With the re driver, no change except placing a 100BT setup with
> >>no packet loss to a gigE setup (both linksys switches) will cause
> >>serious packet loss at 20Mbps data rates.  I have discovered the only
> >>way to get good performance with no packet loss was to
> >>
> >>1) Remove interrupt moderation
> >>2) defrag each mbuf that comes in to the driver.
> >
> >Sounds like you're bumping into a queue limit that is made worse by
> >interrupting less frequently, resulting in bursts of packets that are
> >relatively large, rather than a trickle of packets at a higher rate.
> >Perhaps a limit on the number of outstanding descriptors in the driver or
> >hardware and/or a limit in the netisr/ifqueue queue depth.  You might try
> >changing the default IFQ_MAXLEN from 50 to 128 to increase the size of the
> >ifnet and netisr queues.  You could also try setting net.isr.enable=1 to
> >enable direct dispatch, which in the in-bound direction would reduce the
> >number of context switches and queueing.  It sounds like the device driver
> >has a limit of 256 receive and transmit descriptors, which one supposes is
> >probably derived from the hardware limit, but I have no documentation on
> >hand so can't confirm that.
> >
> >It would be interesting on the send and receive sides to inspect the
> >counters for drops at various points in the network stack; i.e., are we
> >dropping packets at the ifq handoff because we're overfilling the
> >descriptors in the driver, are packets dropped on the inbound path going
> >into the netisr due to over-filling before the netisr is scheduled, etc. 
> >And, it's probably interesting to look at stats on filling the socket
> >buffers for the same reason: if bursts of packets come up the stack, the
> >socket buffers could well be being over-filled before the user thread can
> >run.
> 
> I think it's the tcp_output() path that overflows the transmit side of
> the card.  I take that from the better numbers when he defrags the packets.
> Once I catch up with my mails I start to put up the code I wrote over the
> last two weeks. :-)  You can call me Mr. TCP now. ;-)

   He was doing his test with NFS over TCP, right? ...That would be a single
connection, so how is it possible to 'overflow the transmit side of the
card'? The TCP window size will prevent more than 64KB from being
outstanding. Assuming standard-size ethernet frames, that would be a maximum
of 45 packets in-flight at any time (65536/1460=45), well below the 256
available transmit descriptors.
   It is also worth pointing out that 45 full-size packets is 540us at
gig-e speeds. Even when you add up typical switch latencies and interrupt
overhead and coalescing on both sides, it's hard to imagine that the window
size (bandwidth * delay) would be a significant limiting factor across a
gig-e LAN.
   I too am seeing low NFS performance (both TCP and UDP) with non-SMP
5.3, but on the same systems I can measure raw TCP performance (using
ttcp) of >850Mbps. It looks to me like there is something wrong with
NFS, perhaps caused by delays with scheduling nfsd?

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
TeraSolutions, Inc. - http://www.terasolutions.com - (888) 346 7175
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: serious networking (em) performance (ggate and NFS) problem

2004-11-25 Thread Andre Oppermann
Robert Watson wrote:
> On Sun, 21 Nov 2004, Sean McNeil wrote:
> 
> > I have to disagree.  Packet loss is likely according to some of my
> > tests.  With the re driver, no change except placing a 100BT setup with
> > no packet loss to a gigE setup (both linksys switches) will cause
> > serious packet loss at 20Mbps data rates.  I have discovered the only
> > way to get good performance with no packet loss was to
> > 
> > 1) Remove interrupt moderation
> > 2) defrag each mbuf that comes in to the driver.
> 
> Sounds like you're bumping into a queue limit that is made worse by
> interrupting less frequently, resulting in bursts of packets that are
> relatively large, rather than a trickle of packets at a higher rate.
> Perhaps a limit on the number of outstanding descriptors in the driver or
> hardware and/or a limit in the netisr/ifqueue queue depth.  You might try
> changing the default IFQ_MAXLEN from 50 to 128 to increase the size of the
> ifnet and netisr queues.  You could also try setting net.isr.enable=1 to
> enable direct dispatch, which in the in-bound direction would reduce the
> number of context switches and queueing.  It sounds like the device driver
> has a limit of 256 receive and transmit descriptors, which one supposes is
> probably derived from the hardware limit, but I have no documentation on
> hand so can't confirm that.
> 
> It would be interesting on the send and receive sides to inspect the
> counters for drops at various points in the network stack; i.e., are we
> dropping packets at the ifq handoff because we're overfilling the
> descriptors in the driver, are packets dropped on the inbound path going
> into the netisr due to over-filling before the netisr is scheduled, etc.
> And, it's probably interesting to look at stats on filling the socket
> buffers for the same reason: if bursts of packets come up the stack, the
> socket buffers could well be being over-filled before the user thread can
> run.
I think it's the tcp_output() path that overflows the transmit side of
the card.  I take that from the better numbers when he defrags the packets.
Once I catch up with my mails I start to put up the code I wrote over the
last two weeks. :-)  You can call me Mr. TCP now. ;-)
--
Andre


Re: Re[4]: serious networking (em) performance (ggate and NFS) problem

2004-11-22 Thread Matthew Dillon
:Increasing the interrupt moderation frequency worked on the re driver,
:but it only made it marginally better.  Even without moderation,
:however, I could lose packets without m_defrag.  I suspect that there is
:something in the higher level layers that is causing the packet loss.  I
:have no explanation why m_defrag makes such a big difference for me, but
:it does.  I also have no idea why a 20Mbps UDP stream can lose data over
:gigE phy and not lose anything over 100BT... without the above mentioned
:changes that is.

It kinda sounds like the receiver's UDP buffer is not large enough to
handle the burst traffic.  100BT is a much slower transport and the
receiver (userland process) was likely able to drain its buffer before
new packets arrived.

Use netstat -s to observe the drop statistics for udp on both the
sender and receiver sides.  You may also be able to get some useful
information looking at the ip stats on both sides too.

Try bumping up net.inet.udp.recvspace and see if that helps.

In any case, you should be able to figure out where the drops are
occurring by observing netstat -s output.
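
A minimal sketch of both checks (the recvspace value is only an
illustration, not a recommendation):

# netstat -s -p udp | grep -i drop
# sysctl net.inet.udp.recvspace
# sysctl net.inet.udp.recvspace=131072

On the receiver, watch the "dropped due to full socket buffers" counter;
if it climbs during the test the socket buffer is the bottleneck.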

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>


Re: Re[4]: serious networking (em) performance (ggate and NFS) problem

2004-11-22 Thread Sean McNeil
Hi John-Mark,

On Mon, 2004-11-22 at 13:31 -0800, John-Mark Gurney wrote:
> Sean McNeil wrote this message on Mon, Nov 22, 2004 at 12:14 -0800:
> > On Mon, 2004-11-22 at 11:34 +, Robert Watson wrote:
> > > On Sun, 21 Nov 2004, Sean McNeil wrote:
> > > 
> > > > I have to disagree.  Packet loss is likely according to some of my
> > > > tests.  With the re driver, no change except placing a 100BT setup with
> > > > no packet loss to a gigE setup (both linksys switches) will cause
> > > > serious packet loss at 20Mbps data rates.  I have discovered the only
> > > > way to get good performance with no packet loss was to
> > > > 
> > > > 1) Remove interrupt moderation
> > > > 2) defrag each mbuf that comes in to the driver.
> > > 
> > > Sounds like you're bumping into a queue limit that is made worse by
> > > interrupting less frequently, resulting in bursts of packets that are
> > > relatively large, rather than a trickle of packets at a higher rate.
> > > Perhaps a limit on the number of outstanding descriptors in the driver or
> > > hardware and/or a limit in the netisr/ifqueue queue depth.  You might try
> > > changing the default IFQ_MAXLEN from 50 to 128 to increase the size of the
> > > ifnet and netisr queues.  You could also try setting net.isr.enable=1 to
> > > enable direct dispatch, which in the in-bound direction would reduce the
> > > number of context switches and queueing.  It sounds like the device driver
> > > has a limit of 256 receive and transmit descriptors, which one supposes is
> > > probably derived from the hardware limit, but I have no documentation on
> > > hand so can't confirm that.
> > 
> > I've tried bumping IFQ_MAXLEN and it made no difference.  I could rerun
> 
> And the default for if_re is RL_IFQ_MAXLEN which is already 512...  As
> is mentioned below, the card can do 64 segments (which usually means 32
> packets since each packet usually has a header + payload in seperate
> packets)...

It sounds like you believe this is an if_re-only problem.  I had the
feeling that the if_em driver performance problems were related in some
way.  I noticed that if_em does not do anything with m_defrag and
thought it might be a little more than coincidence.

> > this test to be 100% certain I suppose.  It was done a while back.  I
> > haven't tried net.isr.enable=1, but packet loss is in the transmission
> > direction.  The device driver has been modified to have 1024 transmit
> > and receive descriptors each as that is the hardware limitation.  That
> > didn't matter either.  With 1024 descriptors I still lost packets
> > without the m_defrag.
> 
> hmmm...  you know, I wonder if this is a problem with the if_re not
> pulling enough data from memory before starting the transmit...  Though
> we currently have it set for unlimited... so, that doesn't seem like it
> would be it..

Right.  Plus it now has 1024 descriptors on my machine and, like I said,
made little difference.

> > The most difficult thing for me to understand is:  if this is some sort
> > of resource limitation why will it work with a slower phy layer
> > perfectly and not with the gigE?  The only thing I could think of was
> > that the old driver was doing m_defrag calls when it filled the transmit
> > descriptor queues up to a certain point.  Understanding the effects of
> > m_defrag would be helpful in figuring this out I suppose.
> 
> maybe the chip just can't keep the transmit fifo loaded at the higher
> speeds...  is it possible vls is doing a writev for multisegmented UDP
> packet?   I'll have to look at this again...

I suppose.  As I understand it, though, it should be sending out
1316-byte data packets at a metered pace.  Also, wouldn't it behave the
same for 100BT vs. gigE?  Shouldn't I see packet loss with 100BT if this
is the case?

> > > It would be interesting on the send and receive sides to inspect the
> > > counters for drops at various points in the network stack; i.e., are we
> > > dropping packets at the ifq handoff because we're overfilling the
> > > descriptors in the driver, are packets dropped on the inbound path going
> > > into the netisr due to over-filling before the netisr is scheduled, etc. 
> > > And, it's probably interesting to look at stats on filling the socket
> > > buffers for the same reason: if bursts of packets come up the stack, the
> > > socket buffers could well be being over-filled before the user thread can
> > > run.
> > 
> > Yes, this would be very interesting and should point out the problem.  I
> > would do such a thing if I had enough knowledge of the network pathways.
> > Alas, I am very green in this area.  The receive side has no issues,
> > though, so I would focus on transmit counters (with assistance).
> 




Re: Re[4]: serious networking (em) performance (ggate and NFS) problem

2004-11-22 Thread John-Mark Gurney
Sean McNeil wrote this message on Mon, Nov 22, 2004 at 12:14 -0800:
> On Mon, 2004-11-22 at 11:34 +, Robert Watson wrote:
> > On Sun, 21 Nov 2004, Sean McNeil wrote:
> > 
> > > I have to disagree.  Packet loss is likely according to some of my
> > > tests.  With the re driver, no change except placing a 100BT setup with
> > > no packet loss to a gigE setup (both linksys switches) will cause
> > > serious packet loss at 20Mbps data rates.  I have discovered the only
> > > way to get good performance with no packet loss was to
> > > 
> > > 1) Remove interrupt moderation
> > > 2) defrag each mbuf that comes in to the driver.
> > 
> > Sounds like you're bumping into a queue limit that is made worse by
> > interrupting less frequently, resulting in bursts of packets that are
> > relatively large, rather than a trickle of packets at a higher rate.
> > Perhaps a limit on the number of outstanding descriptors in the driver or
> > hardware and/or a limit in the netisr/ifqueue queue depth.  You might try
> > changing the default IFQ_MAXLEN from 50 to 128 to increase the size of the
> > ifnet and netisr queues.  You could also try setting net.isr.enable=1 to
> > enable direct dispatch, which in the in-bound direction would reduce the
> > number of context switches and queueing.  It sounds like the device driver
> > has a limit of 256 receive and transmit descriptors, which one supposes is
> > probably derived from the hardware limit, but I have no documentation on
> > hand so can't confirm that.
> 
> I've tried bumping IFQ_MAXLEN and it made no difference.  I could rerun

And the default for if_re is RL_IFQ_MAXLEN which is already 512...  As
is mentioned below, the card can do 64 segments (which usually means 32
packets since each packet usually has a header + payload in separate
mbufs)...

> this test to be 100% certain I suppose.  It was done a while back.  I
> haven't tried net.isr.enable=1, but packet loss is in the transmission
> direction.  The device driver has been modified to have 1024 transmit
> and receive descriptors each as that is the hardware limitation.  That
> didn't matter either.  With 1024 descriptors I still lost packets
> without the m_defrag.

hmmm...  you know, I wonder if this is a problem with the if_re not
pulling enough data from memory before starting the transmit...  Though
we currently have it set for unlimited... so, that doesn't seem like it
would be it..

> The most difficult thing for me to understand is:  if this is some sort
> of resource limitation why will it work with a slower phy layer
> perfectly and not with the gigE?  The only thing I could think of was
> that the old driver was doing m_defrag calls when it filled the transmit
> descriptor queues up to a certain point.  Understanding the effects of
> m_defrag would be helpful in figuring this out I suppose.

maybe the chip just can't keep the transmit fifo loaded at the higher
speeds...  is it possible vls is doing a writev for multisegmented UDP
packet?   I'll have to look at this again...

> > It would be interesting on the send and receive sides to inspect the
> > counters for drops at various points in the network stack; i.e., are we
> > dropping packets at the ifq handoff because we're overfilling the
> > descriptors in the driver, are packets dropped on the inbound path going
> > into the netisr due to over-filling before the netisr is scheduled, etc. 
> > And, it's probably interesting to look at stats on filling the socket
> > buffers for the same reason: if bursts of packets come up the stack, the
> > socket buffers could well be being over-filled before the user thread can
> > run.
> 
> Yes, this would be very interesting and should point out the problem.  I
> would do such a thing if I had enough knowledge of the network pathways.
> Alas, I am very green in this area.  The receive side has no issues,
> though, so I would focus on transmit counters (with assistance).

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."


Re: Re[4]: serious networking (em) performance (ggate and NFS) problem

2004-11-22 Thread Sean McNeil
On Mon, 2004-11-22 at 11:34 +, Robert Watson wrote:
> On Sun, 21 Nov 2004, Sean McNeil wrote:
> 
> > I have to disagree.  Packet loss is likely according to some of my
> > tests.  With the re driver, no change except placing a 100BT setup with
> > no packet loss to a gigE setup (both linksys switches) will cause
> > serious packet loss at 20Mbps data rates.  I have discovered the only
> > way to get good performance with no packet loss was to
> > 
> > 1) Remove interrupt moderation
> > 2) defrag each mbuf that comes in to the driver.
> 
> Sounds like you're bumping into a queue limit that is made worse by
> interrupting less frequently, resulting in bursts of packets that are
> relatively large, rather than a trickle of packets at a higher rate.
> Perhaps a limit on the number of outstanding descriptors in the driver or
> hardware and/or a limit in the netisr/ifqueue queue depth.  You might try
> changing the default IFQ_MAXLEN from 50 to 128 to increase the size of the
> ifnet and netisr queues.  You could also try setting net.isr.enable=1 to
> enable direct dispatch, which in the in-bound direction would reduce the
> number of context switches and queueing.  It sounds like the device driver
> has a limit of 256 receive and transmit descriptors, which one supposes is
> probably derived from the hardware limit, but I have no documentation on
> hand so can't confirm that.

I've tried bumping IFQ_MAXLEN and it made no difference.  I could rerun
this test to be 100% certain I suppose.  It was done a while back.  I
haven't tried net.isr.enable=1, but packet loss is in the transmission
direction.  The device driver has been modified to have 1024 transmit
and receive descriptors each as that is the hardware limitation.  That
didn't matter either.  With 1024 descriptors I still lost packets
without the m_defrag.

The most difficult thing for me to understand is:  if this is some sort
of resource limitation, why does it work perfectly with a slower phy
layer and not with gigE?  The only thing I could think of was that the
old driver was doing m_defrag calls when it filled the transmit
descriptor queues up to a certain point.  Understanding the effects of
m_defrag would be helpful in figuring this out, I suppose.

> It would be interesting on the send and receive sides to inspect the
> counters for drops at various points in the network stack; i.e., are we
> dropping packets at the ifq handoff because we're overfilling the
> descriptors in the driver, are packets dropped on the inbound path going
> into the netisr due to over-filling before the netisr is scheduled, etc. 
> And, it's probably interesting to look at stats on filling the socket
> buffers for the same reason: if bursts of packets come up the stack, the
> socket buffers could well be being over-filled before the user thread can
> run.

Yes, this would be very interesting and should point out the problem.  I
would do such a thing if I had enough knowledge of the network pathways.
Alas, I am very green in this area.  The receive side has no issues,
though, so I would focus on transmit counters (with assistance).





Re[5]: serious networking (em) performance (ggate and NFS) problem

2004-11-22 Thread Shunsuke SHINOMIYA

 I did a FastEthernet throughput test with Smartbits and SmartApp.
 It's simpler than a TCP throughput measurement. :)
 This Smartbits has several FastEthernet ports but no GbE ports.

 The router consists of a single Xeon 2.4GHz with HTT enabled and
two on-board em interfaces.  The kernel is 5.3-RELEASE with option SMP.
if_em.c is version 1.44.2.3 with the sysctl-able patch.
kern.random.sys.harvest.ethernet is set to 0.

 `Interrupt Moderation on' means the combination of parameters,
hw.em?.rx_int_delay: 0
hw.em?.tx_int_delay: 66
hw.em?.rx_abs_int_delay: 66
hw.em?.tx_abs_int_delay: 66
hw.em?.int_throttle_ceil: 8000.

 `Interrupt Moderation off' means the combination of parameters,
hw.em?.rx_int_delay: 0
hw.em?.tx_int_delay: 0
hw.em?.rx_abs_int_delay: 0
hw.em?.tx_abs_int_delay: 0
hw.em?.int_throttle_ceil: 0.

`on/off' means input side interface(em0)'s moderation is `on' and output
side interface(em1)'s moderation is `off'.

 hmm ..., the results of off/on are unstable.

(A) ... Passed Rate(%)
(B) ... Passed Rate(pps)

Interrupt Moderation on/on
Frame     1st Trial         2nd Trial         3rd Trial
Size      (A)      (B)      (A)      (B)      (A)      (B)
  64    28.14    41876    28.14    41876    28.14    41876
 128    50.08    42301    49.50    41805    49.50    41806
 256    93.24    42229    92.00    41666    92.62    41946
 512   100.00    23496   100.00    23496   100.00    23496
1024   100.00    11973   100.00    11973   100.00    11973
1280   100.00     9615   100.00     9615   100.00     9615
1518   100.00     8127   100.00     8127   100.00     8127

Interrupt Moderation on/off
Frame     1st Trial         2nd Trial         3rd Trial
Size      (A)      (B)      (A)      (B)      (A)      (B)
  64    28.14    41876    27.50    40916    28.14    41876
 128    50.08    42301    48.85    41254    49.50    41806
 256    93.24    42229    92.00    41666    92.62    41946
 512   100.00    23496   100.00    23496   100.00    23496
1024   100.00    11973   100.00    11973   100.00    11973
1280   100.00     9615   100.00     9615   100.00     9615
1518   100.00     8127   100.00     8127   100.00     8127

Interrupt Moderation off/on
Frame     1st Trial         2nd Trial         3rd Trial
Size      (A)      (B)      (A)      (B)      (A)      (B)
  64   100.00   148807   100.00   148807    99.41   147927
 128   100.00    84458    55.74    47080    61.41    51867
 256   100.00    45289    98.75    44722   100.00    45289
 512   100.00    23496   100.00    23496   100.00    23496
1024   100.00    11973   100.00    11973   100.00    11973
1280   100.00     9615   100.00     9615   100.00     9615
1518   100.00     8127   100.00     8127   100.00     8127

Interrupt Moderation off/off
Frame     1st Trial         2nd Trial         3rd Trial
Size      (A)      (B)      (A)      (B)      (A)      (B)
  64    81.55   121358    81.55   121358    82.35   122547
 128   100.00    84458   100.00    84458   100.00    84458
 256   100.00    45289   100.00    45289   100.00    45289
 512   100.00    23496   100.00    23496   100.00    23496
1024   100.00    11973   100.00    11973   100.00    11973
1280   100.00     9615   100.00     9615   100.00     9615
1518   100.00     8127   100.00     8127   100.00     8127

-- 
Shunsuke SHINOMIYA <[EMAIL PROTECTED]>



Re: Re[4]: serious networking (em) performance (ggate and NFS) problem

2004-11-22 Thread Robert Watson

On Sun, 21 Nov 2004, Sean McNeil wrote:

> I have to disagree.  Packet loss is likely according to some of my
> tests.  With the re driver, no change except placing a 100BT setup with
> no packet loss to a gigE setup (both linksys switches) will cause
> serious packet loss at 20Mbps data rates.  I have discovered the only
> way to get good performance with no packet loss was to
> 
> 1) Remove interrupt moderation
> 2) defrag each mbuf that comes in to the driver.

Sounds like you're bumping into a queue limit that is made worse by
interrupting less frequently, resulting in bursts of packets that are
relatively large, rather than a trickle of packets at a higher rate.
Perhaps a limit on the number of outstanding descriptors in the driver or
hardware and/or a limit in the netisr/ifqueue queue depth.  You might try
changing the default IFQ_MAXLEN from 50 to 128 to increase the size of the
ifnet and netisr queues.  You could also try setting net.isr.enable=1 to
enable direct dispatch, which in the in-bound direction would reduce the
number of context switches and queueing.  It sounds like the device driver
has a limit of 256 receive and transmit descriptors, which one supposes is
probably derived from the hardware limit, but I have no documentation on
hand so can't confirm that.

It would be interesting on the send and receive sides to inspect the
counters for drops at various points in the network stack; i.e., are we
dropping packets at the ifq handoff because we're overfilling the
descriptors in the driver, are packets dropped on the inbound path going
into the netisr due to over-filling before the netisr is scheduled, etc. 
And, it's probably interesting to look at stats on filling the socket
buffers for the same reason: if bursts of packets come up the stack, the
socket buffers could well be being over-filled before the user thread can
run.
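
A rough sketch of where to poke at this from userland on 5.x (sysctl
names as I believe they exist there; treat this as a starting point, not
a complete list of the counters mentioned above):

# sysctl net.isr.enable=1                 (direct dispatch)
# sysctl net.inet.ip.intr_queue_maxlen    (netisr/ifqueue depth)
# sysctl net.inet.ip.intr_queue_drops     (drops at the netisr handoff)
# netstat -s | grep -i "full socket"      (socket buffer overflows)

IFQ_MAXLEN itself is a compile-time constant in sys/net/if.h, so raising
it means rebuilding the kernel.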

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Principal Research Scientist, McAfee Research





Re: Re[4]: serious networking (em) performance (ggate and NFS) problem

2004-11-21 Thread Sean McNeil
On Sun, 2004-11-21 at 20:42 -0800, Matthew Dillon wrote:
> : Yes, I knew that adjusting TCP window size is important to use up a link.
> : However I wanted to show adjusting the parameters of Interrupt
> : Moderation affects network performance.
> :
> : And I think a packet loss was occured by enabled Interrupt Moderation.
> : The mechanism of a packet loss in this case is not cleared, but I think
> : inappropriate TCP window size is not the only reason.
> 
> Packet loss is not likely, at least not for the contrived tests we
> are doing because GiGE links have hardware flow control (I'm fairly
> sure).

I have to disagree.  Packet loss is likely according to some of my
tests.  With the re driver, no change except placing a 100BT setup with
no packet loss to a gigE setup (both linksys switches) will cause
serious packet loss at 20Mbps data rates.  I have discovered the only
way to get good performance with no packet loss was to

1) Remove interrupt moderation
2) defrag each mbuf that comes in to the driver.

Doing both of these, I get excellent performance without any packet
loss.  All my testing has been with UDP packets, however, and nothing
was checked for TCP.

> One could calculate the worst case small-packet build up in the receive
> ring.  I'm not sure what the minimum pad for GiGE is, but lets say it's
> 64 bytes.  Then the packet rate would be around 1.9M pps or 244 packets
> per interrupt at a moderation frequency of 8000 hz.  The ring is 256
> packets.  But, don't forget the hardware flow control!  The switch
> has some buffering too.
> 
> hmm... me thinks I now understand why 8000 was chosen as the default :-)
> 
> I would say that this means packet loss due to the interrupt moderation
> is highly unlikely, at least in theory, but if one were paranoid one
> might want to use a higher moderation frequency, say 16000 hz, to be sure.

Your calculations are based on the mbufs being a particular size, no?
What happens if they are seriously defragmented?  Is this what you mean
by "small-packet"?  Are you assuming the mbufs are as small as they get?
How small can they go?  1 byte? 1 MTU?

Increasing the interrupt moderation frequency worked on the re driver,
but it only made it marginally better.  Even without moderation,
however, I could lose packets without m_defrag.  I suspect that there is
something in the higher level layers that is causing the packet loss.  I
have no explanation why m_defrag makes such a big difference for me, but
it does.  I also have no idea why a 20Mbps UDP stream can lose data over
gigE phy and not lose anything over 100BT... without the above mentioned
changes that is.





Re: Re[4]: serious networking (em) performance (ggate and NFS) problem

2004-11-21 Thread Matthew Dillon

: Yes, I knew that adjusting TCP window size is important to use up a link.
: However I wanted to show adjusting the parameters of Interrupt
: Moderation affects network performance.
:
: And I think a packet loss was occured by enabled Interrupt Moderation.
: The mechanism of a packet loss in this case is not cleared, but I think
: inappropriate TCP window size is not the only reason.

Packet loss is not likely, at least not for the contrived tests we
are doing because GiGE links have hardware flow control (I'm fairly
sure).

One could calculate the worst case small-packet build up in the receive
ring.  I'm not sure what the minimum pad for GiGE is, but let's say it's
64 bytes.  Then the packet rate would be around 1.9M pps or 244 packets
per interrupt at a moderation frequency of 8000 hz.  The ring is 256
packets.  But, don't forget the hardware flow control!  The switch
has some buffering too.

hmm... me thinks I now understand why 8000 was chosen as the default :-)

I would say that this means packet loss due to the interrupt moderation
is highly unlikely, at least in theory, but if one were paranoid one
might want to use a higher moderation frequency, say 16000 hz, to be sure.

: I found TCP throuput improvement at disabled Interrupt Moderation is related
: to congestion avoidance phase of TCP. Because these standard deviations are
: decreased when Interrupt Moderation is disabled.
:
: The following two results are outputs of `iperf -P 10'. without TCP
: window size adjustment too. I think, the difference of each throughput
: at same measurement shows congestion avoidance worked.
:
:o with default setting of Interrupt Moderation.
:> [ ID] Interval   Transfer Bandwidth
:> [ 13]  0.0-10.0 sec  80.1 MBytes  67.2 Mbits/sec
:> [ 11]  0.0-10.0 sec   121 MBytes   102 Mbits/sec
:> [ 12]  0.0-10.0 sec  98.9 MBytes  83.0 Mbits/sec
:> [  4]  0.0-10.0 sec  91.8 MBytes  76.9 Mbits/sec
:> [  7]  0.0-10.0 sec   127 MBytes   106 Mbits/sec
:> [  5]  0.0-10.0 sec   106 MBytes  88.8 Mbits/sec
:> [  6]  0.0-10.0 sec   113 MBytes  94.4 Mbits/sec
:> [ 10]  0.0-10.0 sec   117 MBytes  98.2 Mbits/sec
:> [  9]  0.0-10.0 sec   113 MBytes  95.0 Mbits/sec
:> [  8]  0.0-10.0 sec  93.0 MBytes  78.0 Mbits/sec
:> [SUM]  0.0-10.0 sec  1.04 GBytes   889 Mbits/sec

Certainly overall send/response latency will be affected by up to 1/freq,
e.g. 1/8000 = 125 uS (x2 hosts == 250 uS worst case), which is readily
observable by running ping:

[intrate]
[set on both boxes]

max:     64 bytes from 216.240.41.62: icmp_seq=2 ttl=64 time=0.057 ms
100000:  64 bytes from 216.240.41.62: icmp_seq=8 ttl=64 time=0.061 ms
30000:   64 bytes from 216.240.41.62: icmp_seq=5 ttl=64 time=0.078 ms
8000:    64 bytes from 216.240.41.62: icmp_seq=3 ttl=64 time=0.176 ms
         (large stddev too, e.g. 0.188, 0.166, etc).

But this is only relevant for applications that require that sort of
response time == not very many applications.  Note that a large packet
will turn the best case 57 uS round trip into a 140 uS round trip with
the EM card.

It might be interesting to see how interrupt moderation effects a
buildworld over NFS as that certainly results in a huge amount of
synchronous transactional traffic.

: Measureing TCP throughput was not appropriate way to indicate an effect
: of Interrupt Moderation clearly. It's my mistake. TCP is too
: complicated. :)
:
:-- 
:Shunsuke SHINOMIYA <[EMAIL PROTECTED]>

It really just comes down to how sensitive a production system is to
round trip times within the range of effect of the moderation frequency.
Usually the answer is: not very.  That is, the benefit is not sufficient
to warrant the additional interrupt load that turning moderation off
would create.  And even if low latency is desired it is not actually
necessary to turn off moderation.  It could be set fairly high,
e.g. 20000, to reap most of the benefit.

Processing overheads are also important.  If the network is loaded down
you will wind up eating a significant chunk of cpu with moderation turned
off.  This is readily observable by running vmstat during an iperf test.
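
For example, something along these lines (peer address hypothetical):

# vmstat 1 &
# iperf -c 10.0.0.2 -w 63.5K -t 30

and watch the "id" (idle) column while the test runs.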

The iperf test reported ~700 MBits/sec for all tested moderation
frequencies, using iperf -w 63.5K on DragonFly.  I would be interested in
knowing how FreeBSD fares, though SMP might skew the reality too much to
be meaningful.

moderation      cpu
frequency       %idle

100000          2% idle
 30000          7% idle
 20000          35% idle
 10000          60% idle
  8000          66% idle

In other words, if you are doing more than just shoving bits around the
network, for example if you need to read or write the disk or do some
sort of computation or other activity that requires cpu, turning off
moderation could wind up being a very

Re[4]: serious networking (em) performance (ggate and NFS) problem

2004-11-21 Thread Shunsuke SHINOMIYA

 Thank you, Matt.
> 
> Very interesting, but the only reason you get lower results is simply
> because the TCP window is not big enough.  That's it.
> 

 Yes, I knew that adjusting the TCP window size is important to fill a link.
 However, I wanted to show that adjusting the Interrupt Moderation
 parameters affects network performance.

 And I think the packet loss was caused by enabling Interrupt Moderation.
 The mechanism of the packet loss in this case is not clear, but I think
 an inappropriate TCP window size is not the only reason.

 I found that the TCP throughput improvement with Interrupt Moderation
 disabled is related to the congestion avoidance phase of TCP, because the
 standard deviations decrease when Interrupt Moderation is disabled.

 The following two results are outputs of `iperf -P 10', again without TCP
 window size adjustment.  I think the spread between the per-stream
 throughputs within the same measurement shows that congestion avoidance
 was at work.

o with default setting of Interrupt Moderation.
> [ ID] Interval   Transfer Bandwidth
> [ 13]  0.0-10.0 sec  80.1 MBytes  67.2 Mbits/sec
> [ 11]  0.0-10.0 sec   121 MBytes   102 Mbits/sec
> [ 12]  0.0-10.0 sec  98.9 MBytes  83.0 Mbits/sec
> [  4]  0.0-10.0 sec  91.8 MBytes  76.9 Mbits/sec
> [  7]  0.0-10.0 sec   127 MBytes   106 Mbits/sec
> [  5]  0.0-10.0 sec   106 MBytes  88.8 Mbits/sec
> [  6]  0.0-10.0 sec   113 MBytes  94.4 Mbits/sec
> [ 10]  0.0-10.0 sec   117 MBytes  98.2 Mbits/sec
> [  9]  0.0-10.0 sec   113 MBytes  95.0 Mbits/sec
> [  8]  0.0-10.0 sec  93.0 MBytes  78.0 Mbits/sec
> [SUM]  0.0-10.0 sec  1.04 GBytes   889 Mbits/sec

o with disabled Interrupt Moderation.
> [ ID] Interval   Transfer Bandwidth
> [  7]  0.0-10.0 sec   106 MBytes  88.9 Mbits/sec
> [ 10]  0.0-10.0 sec   107 MBytes  89.7 Mbits/sec
> [  8]  0.0-10.0 sec   107 MBytes  89.4 Mbits/sec
> [  9]  0.0-10.0 sec   107 MBytes  90.0 Mbits/sec
> [ 11]  0.0-10.0 sec   106 MBytes  89.2 Mbits/sec
> [ 12]  0.0-10.0 sec   104 MBytes  87.6 Mbits/sec
> [  4]  0.0-10.0 sec   106 MBytes  88.7 Mbits/sec
> [ 13]  0.0-10.0 sec   106 MBytes  88.9 Mbits/sec
> [  5]  0.0-10.0 sec   106 MBytes  88.9 Mbits/sec
> [  6]  0.0-10.0 sec   107 MBytes  89.9 Mbits/sec
> [SUM]  0.0-10.0 sec  1.04 GBytes   891 Mbits/sec


 But it could be avoided by decreasing the TCP window size.
o with default setting of Interrupt Moderation and iperf -P 10 -w 28.3k
> [ ID] Interval   Transfer Bandwidth
> [ 12]  0.0-10.0 sec   111 MBytes  93.0 Mbits/sec
> [  4]  0.0-10.0 sec   106 MBytes  88.8 Mbits/sec
> [ 11]  0.0-10.0 sec   107 MBytes  89.9 Mbits/sec
> [  9]  0.0-10.0 sec   109 MBytes  91.6 Mbits/sec
> [  5]  0.0-10.0 sec   109 MBytes  91.5 Mbits/sec
> [ 13]  0.0-10.0 sec   108 MBytes  90.8 Mbits/sec
> [ 10]  0.0-10.0 sec   107 MBytes  89.7 Mbits/sec
> [  8]  0.0-10.0 sec   110 MBytes  92.3 Mbits/sec
> [  6]  0.0-10.0 sec   111 MBytes  93.2 Mbits/sec
> [  7]  0.0-10.0 sec   108 MBytes  90.6 Mbits/sec
> [SUM]  0.0-10.0 sec  1.06 GBytes   911 Mbits/sec


 Measuring TCP throughput was not an appropriate way to clearly show the
 effect of Interrupt Moderation.  It's my mistake.  TCP is too
 complicated. :)

-- 
Shunsuke SHINOMIYA <[EMAIL PROTECTED]>



Re: Re[2]: serious networking (em) performance (ggate and NFS) problem

2004-11-21 Thread Matthew Dillon
: I did simple benchmark at some settings.
:
: I used two boxes which are single Xeon 2.4GHz with on-boarded em.
: I measured a TCP throughput by iperf.
:
: These results show that the throughput of TCP increased if Interrupt
:Moderation is turned OFF. At least, adjusting these parameters affected
:TCP performance. Other appropriate combination of parameter may exist.

Very interesting, but the only reason you get lower results is simply
because the TCP window is not big enough.  That's it.

8000 ints/sec = ~15KB of backlogged traffic (one 1/8000-second interval
at roughly GiGE wire speed, ~940 Mbit/s, is about 15KB).  x 2 (sender,
receiver)

Multiply by two (both the sender's reception of acks and the receiver's
reception of data) and you get ~30KB.  This is awfully close to the
default 32.5KB window size that iperf uses.

Other than window sizing issues I can think of no rational reason why
throughput would be lower.  Can you?  And, in fact, when I do the same
tests on DragonFly and play with the interrupt throttle rate I get
nearly the results I expect.

* Shuttle Athlon 64 3200+ box, EM card in 32 bit PCI slot 
* 2 machines connected through a GiGE switch
* All other hw.em0 delays set to 0 on both sides
* throttle settings set on both sides
* -w option set on iperf client AND server for 63.5KB window
* software interrupt throttling has been turned off for these tests

throttle        result          result
freq            (32.5KB win)    (63.5KB win)
(default)
--------        ------          -------

maxrate         481 MBit/s      533 MBit/s  (not sure what's going on here)
120000          518 MBit/s      558 MBit/s  (not sure what's going on here)
100000          613 MBit/s      667 MBit/s  (not sure what's going on here)
 70000          679 MBit/s      691 MBit/s
 60000          668 MBit/s      694 MBit/s
 50000          678 MBit/s      684 MBit/s
 40000          694 MBit/s      696 MBit/s
 30000          694 MBit/s      696 MBit/s
 20000          698 MBit/s      703 MBit/s
 10000          707 MBit/s      716 MBit/s
  9000          708 MBit/s      716 MBit/s
  8000          710 MBit/s      717 MBit/s  <--- drop off pt 32.5KB win
  7000          683 MBit/s      716 MBit/s
  6000          680 MBit/s      720 MBit/s
  5000          652 MBit/s      718 MBit/s  <--- drop off pt 63.5KB win
  4000          555 MBit/s      695 MBit/s
  3000          522 MBit/s      533 MBit/s  <--- GiGE throttling likely
  2000          449 MBit/s      384 MBit/s  (256 ring descriptors =
  1000          260 MBit/s      193 MBit/s   2500 hz minimum)

Unless you are in a situation where you need to route small packets
flying around a cluster where low latency is important, it doesn't really
make any sense to turn off interrupt throttling.  It might make sense
to change the default from 8000 to 10000 to handle typical default
TCP window sizes (at least in a LAN situation), but it certainly should
not be turned off.

I got some weird results when I increased the frequency past 100KHz, and
when I turned throttling off entirely.  I'm not sure why.  Maybe setting
the ITR register to 0 is a bad idea.  If I set it to 1 (i.e. 3906250 Hz)
then I get 625 MBit/s.  Setting the ITR to 1 (i.e. 256ns delay) should
amount to the same thing as setting it to 0 but it doesn't.  Very odd.
The maximum interrupt rate as reported by systat is only ~46000 ints/sec
so all the values above 50KHz should read about the same... and they
do until we hit around 100Khz (10uS delay).  Then everything goes to
hell in a handbasket.

Conclusion: 10000 hz would probably be a better default than 8000 hz.

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>


Re: Re[2]: serious networking (em) performance (ggate and NFS) problem

2004-11-21 Thread Sean McNeil
On Sun, 2004-11-21 at 21:27 +0900, Shunsuke SHINOMIYA wrote:
>  Jeremie, thank you for your comment.
> 
>  I did simple benchmark at some settings.
> 
>  I used two boxes which are single Xeon 2.4GHz with on-boarded em.
>  I measured a TCP throughput by iperf.
> 
>  These results show that the throughput of TCP increased if Interrupt
> Moderation is turned OFF. At least, adjusting these parameters affected
> TCP performance. Other appropriate combination of parameter may exist.

I have found interrupt moderation to seriously kill gigE performance.
Another test you can make is to have the driver always defrag packets in
em_encap().  Something like

m_head = m_defrag(*m_headp, M_DONTWAIT);
if (m_head == NULL)
        return ENOBUFS;
*m_headp = m_head;      /* continue with the defragmented chain */





Re[2]: serious networking (em) performance (ggate and NFS) problem

2004-11-21 Thread Shunsuke SHINOMIYA

 Jeremie, thank you for your comment.

 I did a simple benchmark with a few settings.

 I used two boxes, each a single Xeon 2.4GHz with on-board em.
 I measured TCP throughput with iperf.

 These results show that TCP throughput increased when Interrupt
Moderation was turned OFF.  At least, adjusting these parameters affected
TCP performance.  Other appropriate combinations of parameters may exist.


 The settings are some combinations of
hw.em0.rx_int_delay
hw.em0.tx_int_delay
hw.em0.rx_abs_int_delay
hw.em0.tx_abs_int_delay
hw.em0.int_throttle_ceil.

In this mail, a setting,
hw.em0.rx_int_delay: 0
hw.em0.tx_int_delay: 66
hw.em0.rx_abs_int_delay: 66
hw.em0.tx_abs_int_delay: 66
hw.em0.int_throttle_ceil: 8000
is abbreviated to (0, 66, 66, 66, 8000).

The TCP window size was not adjusted via iperf's options; iperf was run
with its default settings.
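
A rough sketch of one such run (the receiver host name is a placeholder;
the hw.em0.* knobs come from the sysctl-able patch mentioned above, not
the stock driver):

# sysctl hw.em0.rx_int_delay=0 hw.em0.tx_int_delay=0
# sysctl hw.em0.rx_abs_int_delay=0 hw.em0.tx_abs_int_delay=0
# sysctl hw.em0.int_throttle_ceil=0       (Interrupt Moderation off)
# iperf -s                                (on the receiver)
# iperf -c receiver-host                  (on the sender)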

sender : default(0, 66, 66, 66, 8000), receiver : default(0, 66, 66, 66,
8000)
1st trial 852Mbps
2nd trial 861Mbps
3rd trial 822Mbps
4th trial 791Mbps
5th trial 826Mbps
average 830.4Mbps, std. dev. 27.6Mbps

sender : (0, 0, 0, 0, 8000), receiver : (0, 0, 0, 0, 8000)
1st trial 787Mbps
2nd trial 793Mbps
3rd trial 843Mbps
4th trial 771Mbps
5th trial 848Mbps
average 808.4Mbps, std. dev. 34.9Mbps

sender : off(0, 0, 0, 0, 0), receiver : off(0, 0, 0, 0, 0)
1st trial 902Mbps
2nd trial 901Mbps
3rd trial 899Mbps
4th trial 894Mbps
5th trial 900Mbps
average 899.2Mbps, std. dev. 3.1Mbps

-- 
Shunsuke SHINOMIYA <[EMAIL PROTECTED]>



Re: serious networking (em) performance (ggate and NFS) problem

2004-11-19 Thread sthaug
> I changed cables and couldn't reproduce that bad results so I changed cables 
> back but also cannot reproduce them, especially the ggate write, formerly 
> with 2,6MB/s now performs at 15MB/s, but I haven't done any polling tests 
> anymore, just interrupt driven, since Matt explained that em doesn't benefit 
> of polling in any way.
> 
> Results don't indicate a serious problem now but are still about a third of 
> what I'd expected with my hardware. Do I really need Gigahertz Class CPUs to 
> transfer 30MB/s over GbE?

I would be highly surprised if you did. When I tested this a while ago
(around FreeBSD 4.8) with a pair of Intel ISP 1100 1U servers using
Pentium 700 and Intel GigE cards (em driver), I was able to get around
700 Mbps using ttcp. This was done with a normal 32 bit 33 MHz PCI bus.
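
For reference, the classic ttcp memory-to-memory test looks roughly like
this (no disk or NFS in the path):

# ttcp -r -s                    (on the receiver)
# ttcp -t -s receiver-host      (on the sender)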

Steinar Haug, Nethelp consulting, [EMAIL PROTECTED]


Re: serious networking (em) performance (ggate and NFS) problem

2004-11-19 Thread M. Warner Losh
In message: "Daniel Eriksson" <[EMAIL PROTECTED]> writes:
: Finally, my question. What would you recommend:
: 1) Run with ACPI disabled and debug.mpsafenet=1 and hope that the mix of
: giant-safe and giant-locked (em and ahc) doesn't trigger any bugs. This is
: what I currently do.
: 2) Run with ACPI disabled and debug.mpsafenet=0 and accept lower network
: performance (it is a high-traffic server, so I'm not sure this is a valid
: option).
: 3) Run with ACPI enabled and debug.mpsafenet=1 and accept that em0
: interrupts "leak" to the atapci1+ ithread. This I have done in the past.

I don't know that I'd call it a 'recommendation' so much as an 'I'd try
your normal configuration with mpsafenet=0' to see if that makes a
difference in the performance that you see.
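
For the record, the knobs being discussed are loader tunables, so they
would go in /boot/loader.conf rather than be set at run time; a sketch of
the combinations Daniel lists:

debug.mpsafenet="0"             # Giant-locked network stack
hint.acpi.0.disabled="1"        # boot with ACPI disabled

The current value of debug.mpsafenet can be checked with sysctl.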

Warner


Re: serious networking (em) performance (ggate and NFS) problem

2004-11-19 Thread Emanuel Strobl
Am Freitag, 19. November 2004 13:56 schrieb Robert Watson:
> On Fri, 19 Nov 2004, Emanuel Strobl wrote:
> > Am Donnerstag, 18. November 2004 13:27 schrieb Robert Watson:
> > > On Wed, 17 Nov 2004, Emanuel Strobl wrote:
> > > > I really love 5.3 in many ways but here're some unbelievable transfer
[...]
> Well, the claim that if_em doesn't benefit from polling is inaccurate in
> the general case, but quite accurate in the specific case.  In a box with
> multiple NIC's, using polling can make quite a big difference, not just by
> mitigating interrupt load, but also by helping to prioritize and manage
> the load, preventing live lock.  As I indicated in my earlier e-mail,

I understand, thanks for the explanation

> It looks like the netperf TCP test is getting just under 27MB/s, or
> 214Mb/s.  That does seem on the low side for the PCI bus, but it's also

Not sure if I understand that sentence correctly; does it mean the "slow"
400MHz PII is causing this limit? (low side for the PCI bus?)

> instructive to look at the netperf UDP_STREAM results, which indicate that
> the box believes it is transmitting 417Mb/s but only 67Mb/s are being
> received or processed fast enough by netserver on the remote box.  This
> means you've achieved a send rate to the card of about 54Mb/s.  Note that
> you can actually do the math on cycles/packet or cycles/byte here -- with
> TCP_STREAM, it looks like some combination of recipient CPU and latency
> overhead is the limiting factor, with netserver running at 94% busy.

Hmm, I can't puzzle a picture out of this. 

>
> Could you try using geom gate to export a malloc-backed md device, and see
> what performance you see there?  This would eliminate the storage round

It's a pleasure:

test2:~#15: dd if=/dev/zero of=/mdgate/testfile bs=16k count=6000
6000+0 records in
6000+0 records out
98304000 bytes transferred in 5.944915 secs (16535812 bytes/sec)
test2:~#17: dd if=/mdgate/testfile of=/dev/null bs=16k
6000+0 records in
6000+0 records out
98304000 bytes transferred in 5.664384 secs (17354755 bytes/sec)

This time there is no difference between the disk and the memory filesystem,
but on another machine with an ICH2 chipset and a 3ware controller (my
current production system, which I am trying to replace with this project)
there was a big difference.  Attached is the corresponding message.

Thanks,

-Harry

> trip and guarantee the source is in memory, eliminating some possible
> sources of synchronous operation (which would increase latency, reducing
> throughput).  Looking at CPU consumption here would also be helpful, as it
> would allow us to reason about where the CPU is going.
>
> > I was aware of that and because of lacking a GbE switch anyway I decided
> > to use a simple cable ;)
>
> Yes, this is my favorite configuration :-).
>
> > > (5) Next, I'd measure CPU consumption on the end box -- in particular,
> > > use top -S and systat -vmstat 1 to compare the idle condition of the
> > > system and the system under load.
> >
> > I additionally added these values to the netperf results.
>
> Thanks for your very complete and careful testing and reporting :-).
>
> Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
> [EMAIL PROTECTED]  Principal Research Scientist, McAfee Research
>
--- Begin Message ---
Am Dienstag, 2. November 2004 19:56 schrieb Doug White:
> On Tue, 2 Nov 2004, Robert Watson wrote:
> > On Tue, 2 Nov 2004, Emanuel Strobl wrote:
> > > It's a IDE Raid controller (3ware 7506-4, a real one) and the file is
> > > indeed huge, but not abnormally. I have a harddisk video recorder, so I
> > > have lots of 700MB files. Also if I copy my photo collection from the
> > > server it takes 5 Minutes but copying _to_ the server it takes almost
> > > 15 Minutes and the average file size is 5 MB. Fast Ethernet isn't
> > > really suitable for my needs, but at least the 10MB/s should be
> > > reached. I can't imagine I get better speeds when I upgrade to GbE,
> > > (which the important boxes are already, just not the switch) because
> > > NFS in it's current state isn't able to saturate a 100baseTX line, at
> > > least in one direction. That's the real anstonishing thing for me. Why
> > > does reading staurate 100BaseTX but writes only a third?
> >
> > Have you tried using tcpdump/ethereal to see if there's any significant
> > packet loss (for good reasons or not) going on?  Lots of RPC retransmits
> > would certainly explain the lower performance, and if that's not it, it
> > would be good to rule out.  The traces might also provide some insight
> > into the specific I/O operations, letting you see what block sizes are in
> > use, etc.  I've found that dumping to a file with tcpdump and reading
> > with ethereal is a really good way to get a picture of what's going on
> > with NFS

Re: serious networking (em) performance (ggate and NFS) problem

2004-11-19 Thread Robert Watson

On Fri, 19 Nov 2004, Emanuel Strobl wrote:

> On Thursday, 18 November 2004 13:27, Robert Watson wrote:
> > On Wed, 17 Nov 2004, Emanuel Strobl wrote:
> > > I really love 5.3 in many ways but here're some unbelievable transfer
> 
> First, thanks a lot to all of you paying attention to my problem again. 
> I'll use this as a cumulative answer to your many postings, first
> answering Robert's questions and at the bottom those of the others. 
> 
> I changed cables and couldn't reproduce those bad results, so I changed the
> cables back, but still cannot reproduce them; in particular the ggate write,
> formerly at 2,6MB/s, now performs at 15MB/s. I haven't done any more polling
> tests, just interrupt-driven ones, since Matt explained that em doesn't
> benefit from polling in any way. 
> 
> Results don't indicate a serious problem now but are still about a third
> of what I'd expected with my hardware. Do I really need Gigahertz Class
> CPUs to transfer 30MB/s over GbE? 

Well, the claim that if_em doesn't benefit from polling is inaccurate in
the general case, but quite accurate in the specific case.  In a box with
multiple NIC's, using polling can make quite a big difference, not just by
mitigating interrupt load, but also by helping to prioritize and manage
the load, preventing live lock.  As I indicated in my earlier e-mail,
however, on your system it shouldn't make much difference -- 4k-8k
interrupts/second is not a big deal, and quite normal for use of an if_em
card in the interrupt-driven configuration.
 
It looks like the netperf TCP test is getting just under 27MB/s, or
214Mb/s.  That does seem on the low side for the PCI bus, but it's also
instructive to look at the netperf UDP_STREAM results, which indicate that
the box believes it is transmitting 417Mb/s but only 67Mb/s are being
received or processed fast enough by netserver on the remote box.  This
means you've achieved a send rate to the card of about 54Mb/s.  Note that
you can actually do the math on cycles/packet or cycles/byte here -- with
TCP_STREAM, it looks like some combination of recipient CPU and latency
overhead is the limiting factor, with netserver running at 94% busy.
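
To make the arithmetic concrete (illustrative numbers only; the 1GHz clock
below is an assumption, not the actual test hardware):

   27 MB/s / ~1448 TCP payload bytes per packet    ~=  18,600 packets/sec
   0.94 * 1,000,000,000 cycles/sec / 18,600 pkt/s  ~=  50,000 cycles/packet
   0.94 * 1,000,000,000 cycles/sec / 27 MB/s       ~=  35 cycles/byte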

Could you try using geom gate to export a malloc-backed md device, and see
what performance you see there?  This would eliminate the storage round
trip and guarantee the source is in memory, eliminating some possible
sources of synchronous operation (which would increase latency, reducing
throughput).  Looking at CPU consumption here would also be helpful, as it
would allow us to reason about where the CPU is going.

> I was aware of that and because of lacking a GbE switch anyway I decided
> to use a simple cable ;) 

Yes, this is my favorite configuration :-).

> > (5) Next, I'd measure CPU consumption on the end box -- in particular, use
> > top -S and systat -vmstat 1 to compare the idle condition of the
> > system and the system under load.
> >
> 
> I additionally added these values to the netperf results.

Thanks for your very complete and careful testing and reporting :-).

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Principal Research Scientist, McAfee Research

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: serious networking (em) performance (ggate and NFS) problem

2004-11-19 Thread Emanuel Strobl
On Thursday, 18 November 2004 13:27, Robert Watson wrote:
> On Wed, 17 Nov 2004, Emanuel Strobl wrote:
> > I really love 5.3 in many ways but here're some unbelievable transfer

First, thanks a lot to all of you paying attention to my problem again.
I'll use this as a cumulative answer to your many postings, first 
answering Robert's questions and at the bottom those of the others.

I changed cables and couldn't reproduce those bad results, so I changed the 
cables back, but still cannot reproduce them; in particular the ggate write, 
formerly at 2,6MB/s, now performs at 15MB/s. I haven't done any more polling 
tests, just interrupt-driven ones, since Matt explained that em doesn't 
benefit from polling in any way.

Results don't indicate a serious problem now but are still about a third of 
what I'd expected with my hardware. Do I really need Gigahertz Class CPUs to 
transfer 30MB/s over GbE?

>
> I think the first thing you want to do is to try and determine whether the
> problem is a link layer problem, network layer problem, or application
> (file sharing) layer problem.  Here's where I'd start looking:
>
> (1) I'd first off check that there wasn't a serious interrupt problem on
> the box, which is often triggered by ACPI problems.  Get the box to be
> as idle as possible, and then use vmstat -i or systat -vmstat to see if
> anything is spewing interrupts.

Everything is fine

>
> (2) Confirm that your hardware is capable of the desired rates: typically
> this involves looking at whether you have a decent card (most if_em
> cards are decent), whether it's 32-bit or 64-bit PCI, and so on.  For
> unidirectional send on 32-bit PCI, be aware that it is not possible to
> achieve gigabit performance because the PCI bus isn't fast enough, for
> example.

I'm aware that my 32bit/33MHz PCI bus is a "bottleneck", but I saw almost 
80MByte/s running over the bus to my test stripe set (over the HPT372). So 
I'm pretty sure the system is good for 40MB/s over the GbE line, which would be 
sufficient for me.

>
> (3) Next, I'd use a tool like netperf (see ports collection) to establish
> three characteristics: round trip latency from user space to user
> space (UDP_RR), TCP throughput (TCP_STREAM), and large packet
> throughput (UDP_STREAM).  With decent boxes on 5.3, you should have no
> trouble at all maxing out a single gig-e with if_em, assuming all is
> working well hardware wise and there's no software problem specific to
> your configuration.

Please find the results on http://www.schmalzbauer.de/document.php?id=21
There is also a lot of additional information and more test results

>
> (4) Note that router latency (and even switch latency) can have a
> substantial impact on gigabit performance, even with no packet loss,
> in part due to stuff like ethernet flow control.  You may want to put
> the two boxes back-to-back for testing purposes.
>

I was aware of that, and since I'm lacking a GbE switch anyway I decided to 
use a simple cable ;)

> (5) Next, I'd measure CPU consumption on the end box -- in particular, use
> top -S and systat -vmstat 1 to compare the idle condition of the
> system and the system under load.
>

I additionally added these values to the netperf results.

> If you determine there is a link layer or IP layer problem, we can start
> digging into things like the error statistics in the card, negotiation
> issues, etc.  If not, you want to move up the stack to try and
> characterize where it is you're hitting the performance issue.

On Thursday, 18 November 2004 17:53, M. Warner Losh wrote:
> In message: <[EMAIL PROTECTED]>
>
>             Robert Watson <[EMAIL PROTECTED]> writes:
> : (1) I'd first off check that there wasn't a serious interrupt problem on
> :     the box, which is often triggered by ACPI problems.  Get the box to
> : be as idle as possible, and then use vmstat -i or systat -vmstat to see if
> : anything is spewing interrupts.
>
> Also, make sure that you aren't sharing interrupts between
> GIANT-LOCKED and non-giant-locked cards.  This might be exposing bugs
> in the network layer that debug.mpsafenet=0 might correct.  Just
> noticed that our setup here has that setup, so I'll be looking into
> that area of things.

As you can see at the link above, there are no shared IRQs





Re: serious networking (em) performance (ggate and NFS) problem

2004-11-19 Thread Jeremie Le Hen
>  Hi, Jeremie, how is this?
>  To disable Interrupt Moderation, sysctl hw.em?.int_throttle_valve=0.

Great, I would have called it "int_throttle_ceil", but that's a detail
and my opinion is totally subjective.

>  However, because this patch is just made now, it is not fully tested.

I'll give it a try this weekend, although I won't be able to make
performance measurements.

-- 
Jeremie Le Hen
[EMAIL PROTECTED]
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re[2]: serious networking (em) performance (ggate and NFS) problem

2004-11-19 Thread Shunsuke SHINOMIYA

 Hi, Jeremie, how is this?
 To disable Interrupt Moderation, sysctl hw.em?.int_throttle_valve=0.

 However, because this patch is just made now, it is not fully tested.

> > if you suppose your computer has sufficient performance, please try to
> > disable or adjust parameters of Interrupt Moderation of em.
> 
> Nice !  It would be even better if there was a boot-time sysctl to
> configure the behaviour of this feature, or something like ifconfig
> link0 option of the fxp(4) driver.

-- 
Shunsuke SHINOMIYA <[EMAIL PROTECTED]>


if_em.diff
Description: Binary data
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: serious networking (em) performance (ggate and NFS) problem

2004-11-19 Thread Jeremie Le Hen
> if you suppose your computer has sufficient performance, please try to
> disable or adjust parameters of Interrupt Moderation of em.

Nice !  It would be even better if there was a boot-time sysctl to
configure the behaviour of this feature, or something like ifconfig
link0 option of the fxp(4) driver.

-- 
Jeremie Le Hen
[EMAIL PROTECTED]
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re[2]: serious networking (em) performance (ggate and NFS) problem

2004-11-18 Thread Shunsuke SHINOMIYA

 Hi list,
if you suppose your computer has sufficient performance, please try to
disable or adjust the parameters of Interrupt Moderation of em.
 
 In my router's case (a Xeon 2.4GHz with two on-board em interfaces), it
improves the router's packet forwarding performance. I think the
interrupt delay introduced by Interrupt Moderation caused the NIC's input
buffer to overflow or its output buffer to underrun in this case.


 In order to disable Interrupt Moderation, modify src/sys/dev/em/if_em.c
as in the following patch and set hw.em.{rx,tx}_{,abs_}int_delay to zero
via sysctl.


 *** if_em.c-1.44.2.3.orig   Fri Nov 19 11:22:48 2004
--- if_em.c Fri Nov 19 11:23:39 2004
*** em_initialize_receive_unit(struct adapte
*** 2611,2618 ****
--- 2611,2622 ----

  /* Set the interrupt throttling rate.  Value is calculated
   * as DEFAULT_ITR = 1/(MAX_INTS_PER_SEC * 256ns) */
+ #if 1
+ #define DEFAULT_ITR 0
+ #else
  #define MAX_INTS_PER_SEC        8000
  #define DEFAULT_ITR             1000000000/(MAX_INTS_PER_SEC * 256)
+ #endif
  E1000_WRITE_REG(&adapter->hw, ITR, DEFAULT_ITR);
  }
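
Spelled out, the sysctl settings referred to above would be the following,
assuming the delay OIDs are exposed under exactly these names on your kernel:

 # sysctl hw.em.rx_int_delay=0
 # sysctl hw.em.tx_int_delay=0
 # sysctl hw.em.rx_abs_int_delay=0
 # sysctl hw.em.tx_abs_int_delay=0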

-- 
Shunsuke SHINOMIYA <[EMAIL PROTECTED]>

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: serious networking (em) performance (ggate and NFS) problem

2004-11-18 Thread Matthew Dillon
Polling should not produce any improvement over interrupts for EM0.
The EM0 card will aggregate 8-14+ packets per interrupt, or more, which
at the packet rates shown below works out to only around 8000
interrupts/sec.  I've got a ton of these 
cards installed.

# mount_nfs -a 4 dhcp61:/home /mnt
# dd if=/mnt/x of=/dev/null bs=32k
# netstat -in 1
            input        (Total)           output
   packets  errs      bytes    packets  errs      bytes colls
     66401     0   93668746       5534     0     962920     0
     66426     0   94230092       5537     0    1007108     0
     66424     0   93699848       5536     0     963268     0
     66422     0   94222372       5536     0    1007290     0
     66391     0   93654846       5534     0     962746     0
     66375     0   94154432       5532     0    1006404     0

  zfod   Interrupts
Proc:r  p  d  s  wCsw  Trp  Sys  Int  Sof  Fltcow8100 total
 19  62117   75 81004   12  88864 wire   7873 mux irq10
10404 act ata0 irq14
19.2%Sys   0.0%Intr  0.0%User  0.0%Nice 80.8%Idl   864476 inact   ata1 irq15
||||||||||  58152 cache   mux irq11
==   2992 free227 clk irq0


Note that the interrupt rate is only 7873 interrupts per second
while I am transferring 94 MBytes/sec over NFS (UDP) and receiving
over 66000 packets per second (~8 packets per interrupt).

If I use a TCP mount I get just about the same thing:

# mount_nfs -T -a 4 dhcp61:/home /mnt
# dd if=/mnt/x of=/dev/null bs=32k
# netstat -in 1

            input        (Total)           output
   packets  errs      bytes    packets  errs      bytes colls
     61752     0   93978800       8091     0     968618     0
     61780     0   93530484       8098     0     904370     0
     61710     0   93917880       8093     0     968128     0
     61754     0   93491260       8095     0     903940     0
     61756     0   93986320       8097     0     968336     0


Proc:r  p  d  s  wCsw  Trp  Sys  Int  Sof  Fltcow8145 total
   5  8 22828   13 5490 8146   13   11 141556 wire   7917 mux irq10
 7800 act ata0 irq14
26.4%Sys   0.0%Intr  0.0%User  0.0%Nice 73.6%Idl   244872 inact   ata1 irq15
||||||||||  8 cache   mux irq11
=  630780 free228 clk irq0

In this case around 8000 interrupts per second with 61700 packet per
second incoming on the interface (around ~8 packets per interrupt).
The extra interrupts are due to the additional outgoing TCP ack traffic.

If I look at the systat -vm 1 output on the NFS server it also sees
only around 8000 interrupts per second, which isn't saying much other
than that its transmit path (61700 pps outgoing) is not creating an undue
interrupt burden relative to the receive path.

-Matt

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


RE: serious networking (em) performance (ggate and NFS) problem

2004-11-18 Thread Robert Watson

On Thu, 18 Nov 2004, Daniel Eriksson wrote:

> I have a Tyan Tiger MPX board (dual AthlonMP) that has two 64bit PCI
> slots.  I have an Adaptec 29160 and a dual port Intel Pro/1000 MT
> plugged into those slots. 
> 
> As can be seen from the vmstat -i output below, em1 shares ithread with
> ahc0. This is with ACPI disabled. With ACPI enabled all devices get
> their own ithread (I think, not 100% sure). However, because of some
> hardware problem (interrupt routing?), em0 interrupts will somehow leak
> into atapci1+, generating a higher interrupt load. I'm not sure how
> expensive this is. 

I see precisely this problem on several motherboards, including the Intel
Westville.  There's some speculation on the source of the problem, but I
see related problems in 4.x as well.  Either I get them on different
interrupts but both fire, or on the same interrupt.  FYI, picking the
right one depends a bit on your configuration, but generally scheduling
multiple ithreads is more expensive than running multiple handlers in the
same ithread, so I think it's generally preferable to run with them on the
same interrupt.  Especially if nothing on the same interrupt is acquiring
Giant.  Acquiring and dropping Giant uncontended is cheaper than context
switching, however.

> Finally, my question. What would you recommend:
> 1) Run with ACPI disabled and debug.mpsafenet=1 and hope that the mix of
> giant-safe and giant-locked (em and ahc) doesn't trigger any bugs. This is
> what I currently do.

This shouldn't cause bugs; the ithread handler is smart and will acquire
Giant around the ahc code.  That will also make it slower due to the extra
mutex operations, however.

> 2) Run with ACPI disabled and debug.mpsafenet=0 and accept lower network
> performance (it is a high-traffic server, so I'm not sure this is a valid
> option).
> 3) Run with ACPI enabled and debug.mpsafenet=1 and accept that em0
> interrupts "leak" to the atapci1+ ithread. This I have done in the past.

I think you want to run the ahc stuff, unfortunately.  The good news is
that the higher the load, the more interrupt mitigation/coalescing will
kick in for if_em, so the fewer you'll see.  Under load, usually my boxes
hang out at 4k-6k interrupts/sec for if_em and don't go much above that. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Principal Research Scientist, McAfee Research



___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


RE: serious networking (em) performance (ggate and NFS) problem

2004-11-18 Thread Daniel Eriksson
M. Warner Losh wrote:

> Also, make sure that you aren't sharing interrupts between
> GIANT-LOCKED and non-giant-locked cards.  This might be exposing bugs
> in the network layer that debug.mpsafenet=0 might correct.  Just
> noticed that our setup here has that setup, so I'll be looking into
> that area of things.

I have a Tyan Tiger MPX board (dual AthlonMP) that has two 64bit PCI slots.
I have an Adaptec 29160 and a dual port Intel Pro/1000 MT plugged into those
slots.

As can be seen from the vmstat -i output below, em1 shares ithread with
ahc0. This is with ACPI disabled. With ACPI enabled all devices get their
own ithread (I think, not 100% sure). However, because of some hardware
problem (interrupt routing?), em0 interrupts will somehow leak into
atapci1+, generating a higher interrupt load. I'm not sure how expensive
this is.

Finally, my question. What would you recommend:
1) Run with ACPI disabled and debug.mpsafenet=1 and hope that the mix of
giant-safe and giant-locked (em and ahc) doesn't trigger any bugs. This is
what I currently do.
2) Run with ACPI disabled and debug.mpsafenet=0 and accept lower network
performance (it is a high-traffic server, so I'm not sure this is a valid
option).
3) Run with ACPI enabled and debug.mpsafenet=1 and accept that em0
interrupts "leak" to the atapci1+ ithread. This I have done in the past.


# vmstat -i
interrupt                          total       rate
irq1: atkbd0                           2          0
irq4: sio0                           710          0
irq6: fdc0                             8          0
irq8: rtc                       11470937        127
irq13: npx0                            1          0
irq14: ata0                      1744545         19
irq15: ata1                      1749617         19
irq16: em0 atapci1+            186062858       2074
irq17: em1 ahc0                 27028088        301
irq18: atapci3                   7393468         82
irq19: atapci4+                  7129446         79
irq0: clk                      179054582       1995
Total                          421634262       4699

/Daniel Eriksson


___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: serious networking (em) performance (ggate and NFS) problem

2004-11-18 Thread M. Warner Losh
In message: <[EMAIL PROTECTED]>
Robert Watson <[EMAIL PROTECTED]> writes:
: (1) I'd first off check that there wasn't a serious interrupt problem on
: the box, which is often triggered by ACPI problems.  Get the box to be
: as idle as possible, and then use vmstat -i or systat -vmstat to see if
: anything is spewing interrupts. 

Also, make sure that you aren't sharing interrupts between
GIANT-LOCKED and non-giant-locked cards.  This might be exposing bugs
in the network layer that debug.mpsafenet=0 might correct.  Just
noticed that our setup here has that setup, so I'll be looking into
that area of things.
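
For reference, debug.mpsafenet is a boot-time tunable, so trying the
Giant-locked network path amounts to a one-line /boot/loader.conf entry
(a sketch):

 debug.mpsafenet="0"     # revert to the Giant-locked network stack at next boot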

Warner
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: serious networking (em) performance (ggate and NFS) problem

2004-11-18 Thread Mike Jakubik
Andreas Braukmann said:
> --On Wednesday, 17 November 2004 20:48 -0500 Mike Jakubik
> <[EMAIL PROTECTED]> wrote:
>
>> I have two PCs connected together, using the em card. One is FreeBSD 6
>> from Fri Nov  5 , the other is Windows XP. I am using the default mtu of
>> 1500, no polling, and I get ~21MB/s transfer rates via FTP. I'm sure this
>> would be higher with jumbo frames.
>
> probably
>
>> Both computers are AMD cpus with Via
>> chipsets.
>
> Which AMD Chipset? VIA did some pretty bad PCI implementations
> in the past. Once I wondered about suspiciously low transfer
> rates in the process of testing 3Ware and Adaptec (2120S, 2200S)
> RAID-Controllers. The transfer rates maxed out at ca. 30 MByte/s.
> Switching the testboxes mainboard from one with VIA chipset to
> one with  an AMD (MP / MPX) chipset was a great success.

The FreeBSD box is KT133A and Windows is KT266A. The VIA chipsets had
bandwidth or latency (I can't remember which) issues with the PCI bus. Perhaps you
are maxing out your PCI bus, or the HDs can't keep up?


___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: serious networking (em) performance (ggate and NFS) problem

2004-11-18 Thread Robert Watson

On Thu, 18 Nov 2004, Wilko Bulte wrote:

> On Thu, Nov 18, 2004 at 12:27:44PM +, Robert Watson wrote..
> > 
> > On Wed, 17 Nov 2004, Emanuel Strobl wrote:
> > 
> > > I really love 5.3 in many ways but here're some unbelievable transfer
> > > rates, after I went out and bought a pair of Intel GigaBit Ethernet
> > > Cards to solve my performance problem (*laugh*): 
> > 
> > I think the first thing you want to do is to try and determine whether the
> > problem is a link layer problem, network layer problem, or application
> > (file sharing) layer problem.  Here's where I'd start looking:
> 
> And you definitely want to look at polling(4) 

He did, but he set HZ to 256, which is sufficiently low as to
guarantee a substantial increase in latency, and likely guarantee
interface and socket queue overruns (although I haven't done the math to
verify that is the case).  Between the very finite sizes of ifnet send
queues, socket buffers, and if_em descriptors, and on-board buffers on the
card, high latency polling can result in lots of packet loss and delay
under load.  Hence the recommendation of a relatively high value of HZ so
that the queues in the driver are drained regularly, and sends
acknowledged so that the sent mbufs can be reclaimed and reused.
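
In kernel-config terms, the polling setup being recommended here is roughly
(a sketch; pick the HZ value to taste):

 options DEVICE_POLLING
 options HZ=1000          # 1000-2000, so the driver queues get drained often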

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Principal Research Scientist, McAfee Research


___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: serious networking (em) performance (ggate and NFS) problem

2004-11-18 Thread Wilko Bulte
On Thu, Nov 18, 2004 at 12:27:44PM +, Robert Watson wrote..
> 
> On Wed, 17 Nov 2004, Emanuel Strobl wrote:
> 
> > I really love 5.3 in many ways but here're some unbelievable transfer
> > rates, after I went out and bought a pair of Intel GigaBit Ethernet
> > Cards to solve my performance problem (*laugh*): 
> 
> I think the first thing you want to do is to try and determine whether the
> problem is a link layer problem, network layer problem, or application
> (file sharing) layer problem.  Here's where I'd start looking:

And you definitely want to look at polling(4)


-- 
Wilko Bulte [EMAIL PROTECTED]




Re: serious networking (em) performance (ggate and NFS) problem

2004-11-18 Thread Robert Watson

On Thu, 18 Nov 2004, Pawel Jakub Dawidek wrote:

> On Wed, Nov 17, 2004 at 11:57:41PM +0100, Emanuel Strobl wrote:
> +> Dear best guys,
> +> 
> +> I really love 5.3 in many ways but here're some unbelievable transfer 
> rates, 
> +> after I went out and bought a pair of Intel GigaBit Ethernet Cards to 
> solve 
> +> my performance problem (*laugh*):
> [...]
> 
> I did some tests in the past with ggate and PCI64/GBit NICs and I got
> ~38MB/s AFAIR. 
> 
> Remember that when using 32-bit PCI you can get transfers of about
> 500Mbit/s. 
> 
> Please run those test with netperf (/usr/ports/benchmarks/netperf) and
> send results. 

Be aware, btw, that while netperf is a pretty decent tool, it performs a
lot of socket select and timing operations, so isn't always a good measure
of maximum capabilities of a system.  I.e., it is not unusual to see that
a netperf send test will only see send() as one in three or one in four
system calls -- as a result, it uses a measurable amount of CPU on things
other than sending.  In an environment with CPU constraints (slower CPU or
faster network), this can impact the performance results substantially.

For example, when measuring maximum packet send performance using minimal
packet sizes from user space, several of my test boxes are constrained
with 64-bit gig-e PCI cards based on CPU.  In particular, the combined
cost of the additional system calls and operations cuts into available CPU
for send.  By eliminating the misc.  overheads of netperf using netsend, I
can improve performance by 20%-30%. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Principal Research Scientist, McAfee Research

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: serious networking (em) performance (ggate and NFS) problem

2004-11-18 Thread Robert Watson

On Wed, 17 Nov 2004, Emanuel Strobl wrote:

> I really love 5.3 in many ways but here're some unbelievable transfer
> rates, after I went out and bought a pair of Intel GigaBit Ethernet
> Cards to solve my performance problem (*laugh*): 

I think the first thing you want to do is to try and determine whether the
problem is a link layer problem, network layer problem, or application
(file sharing) layer problem.  Here's where I'd start looking:

(1) I'd first off check that there wasn't a serious interrupt problem on
the box, which is often triggered by ACPI problems.  Get the box to be
as idle as possible, and then use vmstat -i or systat -vmstat to see if
anything is spewing interrupts. 

(2) Confirm that your hardware is capable of the desired rates: typically
this involves looking at whether you have a decent card (most if_em
cards are decent), whether it's 32-bit or 64-bit PCI, and so on.  For
unidirectional send on 32-bit PCI, be aware that it is not possible to
achieve gigabit performance because the PCI bus isn't fast enough, for
example.

(3) Next, I'd use a tool like netperf (see ports collection) to establish
three characteristics: round trip latency from user space to user
space (UDP_RR), TCP throughput (TCP_STREAM), and large packet
throughput (UDP_STREAM).  With decent boxes on 5.3, you should have no
trouble at all maxing out a single gig-e with if_em, assuming all is
working well hardware wise and there's no software problem specific to
your configuration.

(4) Note that router latency (and even switch latency) can have a
substantial impact on gigabit performance, even with no packet loss,
in part due to stuff like ethernet flow control.  You may want to put
the two boxes back-to-back for testing purposes.

(5) Next, I'd measure CPU consumption on the end box -- in particular, use
top -S and systat -vmstat 1 to compare the idle condition of the
system and the system under load.

If you determine there is a link layer or IP layer problem, we can start
digging into things like the error statistics in the card, negotiation
issues, etc.  If not, you want to move up the stack to try and
characterize where it is you're hitting the performance issue.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Principal Research Scientist, McAfee Research


___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: serious networking (em) performance (ggate and NFS) problem

2004-11-18 Thread Pawel Jakub Dawidek
On Wed, Nov 17, 2004 at 11:57:41PM +0100, Emanuel Strobl wrote:
+> Dear best guys,
+> 
+> I really love 5.3 in many ways but here're some unbelievable transfer rates, 
+> after I went out and bought a pair of Intel GigaBit Ethernet Cards to solve 
+> my performance problem (*laugh*):
[...]

I did some tests in the past with ggate and PCI64/GBit NICs and I got
~38MB/s AFAIR.

Remember that when using 32-bit PCI you can get transfers of about 500Mbit/s.

Please run those test with netperf (/usr/ports/benchmarks/netperf) and
send results.
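
A minimal run of the three tests mentioned in this thread could look like
this (a sketch; 10.0.0.2 is the crossover address used elsewhere in the
thread, and netserver must already be running on that box):

 server# netserver
 client# netperf -H 10.0.0.2 -t TCP_STREAM
 client# netperf -H 10.0.0.2 -t UDP_STREAM
 client# netperf -H 10.0.0.2 -t UDP_RR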

-- 
Pawel Jakub Dawidek   http://www.FreeBSD.org
[EMAIL PROTECTED]   http://garage.freebsd.pl
FreeBSD committer Am I Evil? Yes, I Am!




Re: serious networking (em) performance (ggate and NFS) problem

2004-11-18 Thread Andreas Braukmann
--On Wednesday, 17 November 2004 20:48 -0500 Mike Jakubik <[EMAIL PROTECTED]> wrote:
I have two PCs connected together, using the em card. One is FreeBSD 6
from Fri Nov  5 , the other is Windows XP. I am using the default mtu of
1500, no polling, and I get ~21MB/s transfer rates via FTP. I'm sure this
would be higher with jumbo frames.
probably
Both computers are AMD cpus with Via
chipsets.
Which AMD chipset? VIA did some pretty bad PCI implementations
in the past. Once I wondered about suspiciously low transfer
rates in the process of testing 3ware and Adaptec (2120S, 2200S)
RAID controllers. The transfer rates maxed out at ca. 30 MByte/s.
Switching the test box's mainboard from one with a VIA chipset to
one with an AMD (MP/MPX) chipset was a great success.
-Andreas
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: serious networking (em) performance (ggate and NFS) problem

2004-11-17 Thread Mike Jakubik
Emanuel Strobl said:
    ~ 15MB/s
> .and with 1m blocksize:
>  test2:~#17: dd if=/dev/zero of=/samsung/testfile bs=1m
>  ^C61+0 records in
>  60+0 records out
>  62914560 bytes transferred in 4.608726 secs (13651182 bytes/sec)
> -> ~ 13,6MB/s
>
> I can't imagine why there seems to be an absolute limit of 15MB/s that can be
> transferred over the network.

I have two PCs connected together, using the em card. One is FreeBSD 6
from Fri Nov  5 , the other is Windows XP. I am using the default mtu of
1500, no polling, and I get ~21MB/s transfer rates via FTP. I'm sure this
would be higher with jumbo frames. Both computers are AMD CPUs with VIA
chipsets. Perhaps it's your hard drive that can't keep up?


___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: serious networking (em) performance (ggate and NFS) problem

2004-11-17 Thread Emanuel Strobl
On Thursday, 18 November 2004 01:01, Chuck Swiger wrote:
> Emanuel Strobl wrote:
> [ ... ]
>
> > Tests were done with two Intel GigaBit Ethernet cards (82547EI, 32bit PCI
> > Desktop adapter MT) connected directly without a switch/hub
>
> If filesharing via NFS is your primary goal, it's reasonable to test that,

GEOM_GATE is my primary goal, and I remember that when Pawel wrote this 
great feature, he paid attention to performance and easily outperformed NFS 
(with 100baseTX AFAIK).

> however it would be easier to make sense of your results by testing your
> network hardware at a lower level.  Since you're already running
> portmap/RPC, consider using spray to blast some packets rapidly and see
> what kind of bandwidth you max out using that.  Or use ping with -i & -s
> set to reasonable values depending on whether you're using jumbo frames or
> not.
>
> If the problem is your connection is dropping a few packets, this will show
> up better here.  Using "ping -f" is also a pretty good troubleshooter.  If
> you can dig up a gigabit switch with management capabilities to test with,
> taking a look at the per-port statistics for errors would also be worth
> doing.  A dodgy network cable can still work well enough for the cards to
> have a green link light, but fail to handle high traffic properly.

I'll do some tests regarding these issues to make sure I'm not suffering from 
ill conditions, but I'm quite sure my testbed's feeling fine. I don't have 
one of these nice managed GigaBit switches, just a x-over cable

>
> [ ... ]
>
> > - em seems to have problems with MTU greater than 1500
>
> Have you tried using an MTU of 3K or 7K?
>
> I also seem to recall that there were performance problems with em in 5.3
> and a fix is being tested in -CURRENT.  [I just saw Scott's response to the
> list, and your answer, so maybe nevermind this point.]
>
> > - UDP seems to have performance disadvantages over TCP regarding NFS
> > which should be vice versa AFAIK
>
> Hmm, yeah...again, this makes me wonder whether you are dropping packets.
> NFS over TCP does better than UDP does in lossy network conditions.

Of course, but if I connect two GbE cards (which implies that auto-MDI-X and 
full duplex are mandatory in 1000baseTX mode) I don't expect any UDP packet to 
get lost.
But I'll verify tomorrow.

>
> > - polling and em (GbE) with HZ=256 is definitly no good idea, even
> > 10Base-2 can compete
>
> You should be setting HZ to 1000, 2000, or so when using polling, and a

Yep, I know that HZ set to 256 with polling enabled isn't really useful, but I 
don't want to drive my GbE card in polling mode at all; instead I'm trying to 
prevent my machine from spending time doing nothing, so HZ shouldn't be too 
high.

Thank you,

-Harry

> higher HZ is definitely recommended when you add in jumbo frames and GB
> speeds.




Re: serious networking (em) performance (ggate and NFS) problem

2004-11-17 Thread Wilkinson, Alex

Ping only tests latency, *not* throughput, so it is not really a good test.



 - aW

On Wed, Nov 17, 2004 at 07:01:24PM -0500, Chuck Swiger wrote:

Emanuel Strobl wrote:
[ ... ]
>Tests were done with two Intel GigaBit Ethernet cards (82547EI, 32bit PCI
>Desktop adapter MT) connected directly without a switch/hub

what kind of bandwidth you max out using that.  Or use ping with -i & -s
set to reasonable values depending on whether you're using jumbo frames or
not.

If the problem is your connection is dropping a few packets, this will show
up better here.  Using "ping -f" is also a pretty good troubleshooter.

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: serious networking (em) performance (ggate and NFS) problem

2004-11-17 Thread Chuck Swiger
Emanuel Strobl wrote:
[ ... ]
Tests were done with two Intel GigaBit Ethernet cards (82547EI, 32bit PCI 
Desktop adapter MT) connected directly without a switch/hub
If filesharing via NFS is your primary goal, it's reasonable to test that, 
however it would be easier to make sense of your results by testing your 
network hardware at a lower level.  Since you're already running portmap/RPC, 
consider using spray to blast some packets rapidly and see what kind of 
bandwidth you max out using that.  Or use ping with -i & -s set to reasonable 
values depending on whether you're using jumbo frames or not.
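
Concretely, something along these lines (sizes assume the standard 1500-byte
MTU and are illustrative only):

 # spray 10.0.0.2
 # ping -f 10.0.0.2                  # flood ping; watch for missed packets
 # ping -i 0.01 -s 1472 10.0.0.2     # 1472 data + 8 ICMP + 20 IP = 1500 bytes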

If the problem is your connection is dropping a few packets, this will show up 
better here.  Using "ping -f" is also a pretty good troubleshooter.  If you 
can dig up a gigabit switch with management capabilities to test with, taking 
a look at the per-port statistics for errors would also be worth doing.  A 
dodgy network cable can still work well enough for the cards to have a green 
link light, but fail to handle high traffic properly.

[ ... ]
- em seems to have problems with MTU greater than 1500
Have you tried using an MTU of 3K or 7K?
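
For example (a sketch; both ends of the link have to use the same value):

 # ifconfig em0 mtu 3000     (or mtu 7000)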
I also seem to recall that there were performance problems with em in 5.3 and 
a fix is being tested in -CURRENT.  [I just saw Scott's response to the list, 
and your answer, so maybe nevermind this point.]

- UDP seems to have performance disadvantages over TCP regarding NFS which 
should be vice versa AFAIK
Hmm, yeah...again, this makes me wonder whether you are dropping packets.
NFS over TCP does better than UDP does in lossy network conditions.
- polling and em (GbE) with HZ=256 is definitly no good idea, even 10Base-2 
can compete
You should be setting HZ to 1000, 2000, or so when using polling, and a higher 
HZ is definitely recommended when you add in jumbo frames and GB speeds.

--
-Chuck
PS: followup-to set to reduce crossposting...
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: serious networking (em) performance (ggate and NFS) problem

2004-11-17 Thread Emanuel Strobl
On Thursday, 18 November 2004 00:33, Scott Long wrote:
> Emanuel Strobl wrote:
> > Dear best guys,
> >
> > I really love 5.3 in many ways but here're some unbelievable transfer
> > rates, after I went out and bought a pair of Intel GigaBit Ethernet Cards
> > to solve my performance problem (*laugh*):
> >
> > (In short, see *** below)
[...]
> >
> > - overall network performance (regarding large file transfers) is
> > horrible
> >
> > Please, if anybody has the knowledge to dig into these problems, let me
> > know if I can do any tests to help getting ggate and NFS useful in fast
> > 5.3-stable environments.
>
> if_em in 5.3 has a large performance penalty in the common case due to a
> programming error.  I fixed it in 6-CURRENT and 5.3-STABLE.  You might
> want to try updating to the RELENG_5 branch to see if you get better
> results.

The test machines are running RELENG_5 from today.
When exactly did you merge the fixes?

Thanks a lot,

-Harry

>
> Scott
> ___
> [EMAIL PROTECTED] mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "[EMAIL PROTECTED]"




Re: serious networking (em) performance (ggate and NFS) problem

2004-11-17 Thread Scott Long
Emanuel Strobl wrote:
Dear best guys,
I really love 5.3 in many ways but here're some unbelievable transfer rates, 
after I went out and bought a pair of Intel GigaBit Ethernet Cards to solve 
my performance problem (*laugh*):

(In short, see *** below)
Tests were done with two Intel GigaBit Ethernet cards (82547EI, 32bit PCI 
Desktop adapter MT) connected directly without a switch/hub and "device 
polling" compiled into a custom kernel with HZ set to 256 and 
kern.polling.enabled set to "1":

LOCAL:
(/samsung is ufs2 on /dev/ad4p1, a SAMSUNG SP080N2)
 test3:~#7: dd if=/dev/zero of=/samsung/testfile bs=16k
 ^C10524+0 records in
 10524+0 records out
 172425216 bytes transferred in 3.284735 secs (52492882 bytes/sec)
->
  ~ 52MB/s
NFS(udp,polling):
(/samsung is nfs on test3:/samsung, via em0, x-over, polling enabled)
 test2:/#21: dd if=/dev/zero of=/samsung/testfile bs=16k
 ^C1858+0 records in
 1857+0 records out
 30425088 bytes transferred in 8.758475 secs (3473788 bytes/sec)
->^^^ ~ 3,4MB/s
This example shows that using NFS over GigaBit Ethernet decimates performance 
by the factor of 15, in words fifteen!

GGATE with MTU 16114 and polling:
 test2:/dev#28: ggatec create 10.0.0.2 /dev/ad4p1
 ggate0
 test2:/dev#29: mount /dev/ggate0 /samsung/
 test2:/dev#30: dd if=/dev/zero of=/samsung/testfile bs=16k
 ^C2564+0 records in
 2563+0 records out
 41992192 bytes transferred in 15.908581 secs (2639594 bytes/sec)
-> ^^^ ~ 2,6MB/s
GGATE without polling and MTU 16114:
 test2:~#12: ggatec create 10.0.0.2 /dev/ad4p1
 ggate0
 test2:~#13: mount /dev/ggate0 /samsung/
 test2:~#14: dd if=/dev/zero of=/samsung/testfile bs=128k
 ^C1282+0 records in
 1281+0 records out
 167903232 bytes transferred in 11.274768 secs (14891945 bytes/sec)
->   ~ 15MB/s
.and with 1m blocksize:
 test2:~#17: dd if=/dev/zero of=/samsung/testfile bs=1m
 ^C61+0 records in
 60+0 records out
 62914560 bytes transferred in 4.608726 secs (13651182 bytes/sec)
-> ~ 13,6MB/s
I can't imagine why there seems to be a absolute limit of 15MB/s that can be 
transfered over the network
But it's even worse, here two excerpts of NFS (udp) with jumbo Frames 
(mtu=16114):
 test2:~#23: mount 10.0.0.2:/samsung /samsung/
 test2:~#24: dd if=/dev/zero of=/samsung/testfile bs=1m
 ^C89+0 records in
 88+0 records out
 92274688 bytes transferred in 13.294708 secs (6940708 bytes/sec)
-> ^^^ ~7MB/s
.and with 64k blocksize:
 test2:~#25: dd if=/dev/zero of=/samsung/testfile bs=64k
 ^C848+0 records in
 847+0 records out
 55508992 bytes transferred in 8.063415 secs (6884055 bytes/sec)

And with TCP-NFS (and Jumbo Frames):
 test2:~#30: mount_nfs -T 10.0.0.2:/samsung /samsung/
 test2:~#31: dd if=/dev/zero of=/samsung/testfile bs=64k
 ^C1921+0 records in
 1920+0 records out
 125829120 bytes transferred in 7.461226 secs (16864403 bytes/sec)
->  ~ 17MB/s
Again NFS (udp) but with MTU 1500:
 test2:~#9: mount_nfs 10.0.0.2:/samsung /samsung/
 test2:~#10: dd if=/dev/zero of=/samsung/testfile bs=8k
 ^C12020+0 records in
 12019+0 records out
 98459648 bytes transferred in 10.687460 secs (9212633 bytes/sec)
-> ^^^ ~ 10MB/s
And TCP-NFS with MTU 1500:
 test2:~#12: mount_nfs -T 10.0.0.2:/samsung /samsung/
 test2:~#13: dd if=/dev/zero of=/samsung/testfile bs=8k
 ^C19352+0 records in
 19352+0 records out
 158531584 bytes transferred in 12.093529 secs (13108794 bytes/sec)
->   ~ 13MB/s
GGATE with default MTU of 1500, polling disabled:
 test2:~#14: dd if=/dev/zero of=/samsung/testfile bs=64k
 ^C971+0 records in
 970+0 records out
 63569920 bytes transferred in 6.274578 secs (10131346 bytes/sec)
-> ~ 10M/s
Conclusion:
***
- It seems that GEOM_GATE is less efficient with GigaBit (em) than NFS via TCP 
is.

- em seems to have problems with MTU greater than 1500
- UDP seems to have performance disadvantages over TCP regarding NFS which 
should be vice versa AFAIK

- polling and em (GbE) with HZ=256 is definitly no good idea, even 10Base-2 
can compete
You should be setting HZ to 1000 or higher.
- NFS over TCP with MTU of 16114 gives the maximum transferrate for large 
files over GigaBit Ethernet with a value of 17MB/s, a quarter of what I'd 
expect with my test equipment.

- overall network performance (regarding large file transfers) is horrible
Please, if anybody has the knowledge to dig into these problems, let me know 
if I can do any tests to help getting ggate and NFS useful in fast 5.3-stable 
environments.
if_em in 5.3 has a large performance penalty in the common case due to a
programming error.  I fixed it in 6-CURRENT and 5.3-STABLE.  You might
want to try updating to the RELENG_5 branch to see if you get better
results.
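
For anyone following along, tracking RELENG_5 at the time meant a cvsup run
roughly like this (a sketch; the mirror host, file locations and prefix are
assumptions):

 # cat /root/stable-supfile
 *default host=cvsup.FreeBSD.org
 *default base=/var/db
 *default prefix=/usr
 *default release=cvs tag=RELENG_5
 *default delete use-rel-suffix compress
 src-all
 # cvsup -g -L 2 /root/stable-supfile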

Re: serious networking (em) performance (ggate and NFS) problem

2004-11-17 Thread Emanuel Strobl
On Thursday, 18 November 2004 00:17, Sean McNeil wrote:
> On Wed, 2004-11-17 at 23:57 +0100, Emanuel Strobl wrote:
> > Dear best guys,
> >
> > I really love 5.3 in many ways but here're some unbelievable transfer
> > rates, after I went out and bought a pair of Intel GigaBit Ethernet Cards
> > to solve my performance problem (*laugh*):
> >
> > (In short, see *** below)
[...]
> > Conclusion:
> >
> > ***
> >
> > - It seems that GEOM_GATE is less efficient with GigaBit (em) than NFS
> > via TCP is.
> >
> > - em seems to have problems with MTU greater than 1500
> >
> > - UDP seems to have performance disadvantages over TCP regarding NFS
> > which should be vice versa AFAIK
> >
> > - polling and em (GbE) with HZ=256 is definitly no good idea, even
> > 10Base-2 can compete
> >
> > - NFS over TCP with MTU of 16114 gives the maximum transferrate for large
> > files over GigaBit Ethernet with a value of 17MB/s, a quarter of what I'd
> > expect with my test equipment.
> >
> > - overall network performance (regarding large file transfers) is
> > horrible
> >
> > Please, if anybody has the knowledge to dig into these problems, let me
> > know if I can do any tests to help getting ggate and NFS useful in fast
> > 5.3-stable environments.
>
> I am very interested in this as I have similar issues with the re
> driver.  It it horrible when operating at gigE vs. 100BT.  Have you
> tried plugging the machines into a 100BT instead?

No, because I observed similarly bad performance with my fileserver, which is 
almost the same HW and whose em (Intel GbE) is connected to the local 
100baseTX segment.
I explicitly avoided going via any switch/hub to eliminate further problems.
I wonder if anybody has ever been able to transfer more than 17MB/s via IP 
anyway?
I need this performance for mirroring via ggate, so I'm thinking about fwe (IP 
over FireWire).
Perhaps somebody has tried this already? If fwe gives reasonable transfer rates 
I guess the performance problem won't be found in Ethernet but in IP.

Thanks,

-Harry

>
> Cheers,
> Sean




Re: serious networking (em) performance (ggate and NFS) problem

2004-11-17 Thread Sean McNeil
On Wed, 2004-11-17 at 23:57 +0100, Emanuel Strobl wrote:
> Dear best guys,
> 
> I really love 5.3 in many ways but here're some unbelievable transfer rates, 
> after I went out and bought a pair of Intel GigaBit Ethernet Cards to solve 
> my performance problem (*laugh*):
> 
> (In short, see *** below)
> 
> Tests were done with two Intel GigaBit Ethernet cards (82547EI, 32bit PCI 
> Desktop adapter MT) connected directly without a switch/hub and "device 
> polling" compiled into a custom kernel with HZ set to 256 and 
> kern.polling.enabled set to "1":
> 
> LOCAL:
> (/samsung is ufs2 on /dev/ad4p1, a SAMSUNG SP080N2)
>  test3:~#7: dd if=/dev/zero of=/samsung/testfile bs=16k
>  ^C10524+0 records in
>  10524+0 records out
>  172425216 bytes transferred in 3.284735 secs (52492882 bytes/sec)
> ->
>   ~ 52MB/s
> NFS(udp,polling):
> (/samsung is nfs on test3:/samsung, via em0, x-over, polling enabled)
>  test2:/#21: dd if=/dev/zero of=/samsung/testfile bs=16k
>  ^C1858+0 records in
>  1857+0 records out
>  30425088 bytes transferred in 8.758475 secs (3473788 bytes/sec)
> ->^^^ ~ 3,4MB/s
> 
> This example shows that using NFS over GigaBit Ethernet decimates performance 
> by the factor of 15, in words fifteen!
> 
> GGATE with MTU 16114 and polling:
>  test2:/dev#28: ggatec create 10.0.0.2 /dev/ad4p1
>  ggate0
>  test2:/dev#29: mount /dev/ggate0 /samsung/
>  test2:/dev#30: dd if=/dev/zero of=/samsung/testfile bs=16k
>  ^C2564+0 records in
>  2563+0 records out
>  41992192 bytes transferred in 15.908581 secs (2639594 bytes/sec)
> -> ^^^ ~ 2,6MB/s
> 
> GGATE without polling and MTU 16114:
>  test2:~#12: ggatec create 10.0.0.2 /dev/ad4p1
>  ggate0
>  test2:~#13: mount /dev/ggate0 /samsung/
>  test2:~#14: dd if=/dev/zero of=/samsung/testfile bs=128k
>  ^C1282+0 records in
>  1281+0 records out
>  167903232 bytes transferred in 11.274768 secs (14891945 bytes/sec)
> ->   ~ 15MB/s
> .and with 1m blocksize:
>  test2:~#17: dd if=/dev/zero of=/samsung/testfile bs=1m
>  ^C61+0 records in
>  60+0 records out
>  62914560 bytes transferred in 4.608726 secs (13651182 bytes/sec)
> -> ~ 13,6MB/s
> 
> I can't imagine why there seems to be a absolute limit of 15MB/s that can be 
> transfered over the network
> But it's even worse, here two excerpts of NFS (udp) with jumbo Frames 
> (mtu=16114):
>  test2:~#23: mount 10.0.0.2:/samsung /samsung/
>  test2:~#24: dd if=/dev/zero of=/samsung/testfile bs=1m
>  ^C89+0 records in
>  88+0 records out
>  92274688 bytes transferred in 13.294708 secs (6940708 bytes/sec)
> -> ^^^ ~7MB/s
> .and with 64k blocksize:
>  test2:~#25: dd if=/dev/zero of=/samsung/testfile bs=64k
>  ^C848+0 records in
>  847+0 records out
>  55508992 bytes transferred in 8.063415 secs (6884055 bytes/sec)
> 
> And with TCP-NFS (and Jumbo Frames):
>  test2:~#30: mount_nfs -T 10.0.0.2:/samsung /samsung/
>  test2:~#31: dd if=/dev/zero of=/samsung/testfile bs=64k
>  ^C1921+0 records in
>  1920+0 records out
>  125829120 bytes transferred in 7.461226 secs (16864403 bytes/sec)
> ->  ~ 17MB/s
> 
> Again NFS (udp) but with MTU 1500:
>  test2:~#9: mount_nfs 10.0.0.2:/samsung /samsung/
>  test2:~#10: dd if=/dev/zero of=/samsung/testfile bs=8k
>  ^C12020+0 records in
>  12019+0 records out
>  98459648 bytes transferred in 10.687460 secs (9212633 bytes/sec)
> -> ^^^ ~ 10MB/s
> And TCP-NFS with MTU 1500:
>  test2:~#12: mount_nfs -T 10.0.0.2:/samsung /samsung/
>  test2:~#13: dd if=/dev/zero of=/samsung/testfile bs=8k
>  ^C19352+0 records in
>  19352+0 records out
>  158531584 bytes transferred in 12.093529 secs (13108794 bytes/sec)
> ->   ~ 13MB/s
> 
> GGATE with default MTU of 1500, polling disabled:
>  test2:~#14: dd if=/dev/zero of=/samsung/testfile bs=64k
>  ^C971+0 records in
>  970+0 records out
>  63569920 bytes transferred in 6.274578 secs (10131346 bytes/sec)
> -> ~ 10M/s
> 
> 
> Conclusion:
> 
> ***
> 
> - It seems that GEOM_GATE is less efficient with GigaBit (em) than NFS via 
> TCP 
> is.
> 
> - em seems to have problems with MTU greater than 1500
> 
> - UDP seems to have performance disadvantages over TCP regarding NFS which 
> should be vice versa AFAIK
> 
> - polling and em (GbE) with HZ=256 is definitly no good idea, even 10Base-2 
> can compete
> 
> - NFS over TCP with MTU of 16114 gives the maximum transferrate for large 
> files over GigaBit Ethernet with a value of 17MB/s, a quarter of what I'd 
> expect with my test equipment.
> 
> - overall network performance (regarding large file transfers) is horrible

serious networking (em) performance (ggate and NFS) problem

2004-11-17 Thread Emanuel Strobl
Dear best guys,

I really love 5.3 in many ways but here're some unbelievable transfer rates, 
after I went out and bought a pair of Intel GigaBit Ethernet Cards to solve 
my performance problem (*laugh*):

(In short, see *** below)

Tests were done with two Intel GigaBit Ethernet cards (82547EI, 32bit PCI 
Desktop adapter MT) connected directly without a switch/hub and "device 
polling" compiled into a custom kernel with HZ set to 256 and 
kern.polling.enabled set to "1":

LOCAL:
(/samsung is ufs2 on /dev/ad4p1, a SAMSUNG SP080N2)
 test3:~#7: dd if=/dev/zero of=/samsung/testfile bs=16k
 ^C10524+0 records in
 10524+0 records out
 172425216 bytes transferred in 3.284735 secs (52492882 bytes/sec)
->
  ~ 52MB/s
NFS(udp,polling):
(/samsung is nfs on test3:/samsung, via em0, x-over, polling enabled)
 test2:/#21: dd if=/dev/zero of=/samsung/testfile bs=16k
 ^C1858+0 records in
 1857+0 records out
 30425088 bytes transferred in 8.758475 secs (3473788 bytes/sec)
->^^^ ~ 3,4MB/s

This example shows that using NFS over GigaBit Ethernet decimates performance 
by a factor of 15, in words: fifteen!

GGATE with MTU 16114 and polling:
 test2:/dev#28: ggatec create 10.0.0.2 /dev/ad4p1
 ggate0
 test2:/dev#29: mount /dev/ggate0 /samsung/
 test2:/dev#30: dd if=/dev/zero of=/samsung/testfile bs=16k
 ^C2564+0 records in
 2563+0 records out
 41992192 bytes transferred in 15.908581 secs (2639594 bytes/sec)
-> ^^^ ~ 2,6MB/s

GGATE without polling and MTU 16114:
 test2:~#12: ggatec create 10.0.0.2 /dev/ad4p1
 ggate0
 test2:~#13: mount /dev/ggate0 /samsung/
 test2:~#14: dd if=/dev/zero of=/samsung/testfile bs=128k
 ^C1282+0 records in
 1281+0 records out
 167903232 bytes transferred in 11.274768 secs (14891945 bytes/sec)
->   ~ 15MB/s
.and with 1m blocksize:
 test2:~#17: dd if=/dev/zero of=/samsung/testfile bs=1m
 ^C61+0 records in
 60+0 records out
 62914560 bytes transferred in 4.608726 secs (13651182 bytes/sec)
-> ~ 13,6MB/s

I can't imagine why there seems to be an absolute limit of 15MB/s that can be 
transferred over the network.
But it's even worse, here two excerpts of NFS (udp) with jumbo Frames 
(mtu=16114):
 test2:~#23: mount 10.0.0.2:/samsung /samsung/
 test2:~#24: dd if=/dev/zero of=/samsung/testfile bs=1m
 ^C89+0 records in
 88+0 records out
 92274688 bytes transferred in 13.294708 secs (6940708 bytes/sec)
-> ^^^ ~7MB/s
.and with 64k blocksize:
 test2:~#25: dd if=/dev/zero of=/samsung/testfile bs=64k
 ^C848+0 records in
 847+0 records out
 55508992 bytes transferred in 8.063415 secs (6884055 bytes/sec)

And with TCP-NFS (and Jumbo Frames):
 test2:~#30: mount_nfs -T 10.0.0.2:/samsung /samsung/
 test2:~#31: dd if=/dev/zero of=/samsung/testfile bs=64k
 ^C1921+0 records in
 1920+0 records out
 125829120 bytes transferred in 7.461226 secs (16864403 bytes/sec)
->  ~ 17MB/s

Again NFS (udp) but with MTU 1500:
 test2:~#9: mount_nfs 10.0.0.2:/samsung /samsung/
 test2:~#10: dd if=/dev/zero of=/samsung/testfile bs=8k
 ^C12020+0 records in
 12019+0 records out
 98459648 bytes transferred in 10.687460 secs (9212633 bytes/sec)
-> ^^^ ~ 10MB/s
And TCP-NFS with MTU 1500:
 test2:~#12: mount_nfs -T 10.0.0.2:/samsung /samsung/
 test2:~#13: dd if=/dev/zero of=/samsung/testfile bs=8k
 ^C19352+0 records in
 19352+0 records out
 158531584 bytes transferred in 12.093529 secs (13108794 bytes/sec)
->   ~ 13MB/s

GGATE with default MTU of 1500, polling disabled:
 test2:~#14: dd if=/dev/zero of=/samsung/testfile bs=64k
 ^C971+0 records in
 970+0 records out
 63569920 bytes transferred in 6.274578 secs (10131346 bytes/sec)
-> ~ 10M/s


Conclusion:

***

- It seems that GEOM_GATE is less efficient with GigaBit (em) than NFS via TCP 
is.

- em seems to have problems with MTU greater than 1500

- UDP seems to have performance disadvantages over TCP regarding NFS which 
should be vice versa AFAIK

- polling and em (GbE) with HZ=256 is definitely no good idea; even 10Base-2 
can compete

- NFS over TCP with an MTU of 16114 gives the maximum transfer rate for large 
files over GigaBit Ethernet, with a value of 17MB/s, a quarter of what I'd 
expect with my test equipment.

- overall network performance (regarding large file transfers) is horrible

Please, if anybody has the knowledge to dig into these problems, let me know 
if I can do any tests to help get ggate and NFS usable in fast 5.3-stable 
environments.

Best regards,

-Harry

