Re: FreeBSD boxes as a 'router'...

2012-11-21 Thread Luigi Rizzo
On Tue, Nov 20, 2012 at 11:32 PM, Andre Oppermann  wrote:

> On 21.11.2012 03:16, khatfi...@socllc.net wrote:
>
>> I may be misstating.
>>
>> Specifically under high burst floods either routed or being dropped by pf
>> we would see the system
>> go unresponsive to user-level applications / SSH for example.
>>
>> The system would still function but it was inaccessible. To clarify as
>> well this was any number
>> of floods or attacks to any ports, the behavior remained. (These were not
>> SSH ports being hit)
>>
>
> I'm working on a hybrid interrupt/polling with live-lock prevention
> scheme in my svn branch.  It works with a combination of disabling
> interrupts in interrupt context and then having an ithread loop over
> the RX DMA queue until it reaches the hardware and is done.  Only
> then interrupts are re-enabled again.  On a busy system it may never
> go back to interrupt.  To prevent live-lock the ithread gives up the
> CPU after a normal quantum to let other threads/processes run as well.
> After that it gets immediately re-scheduled again with a sufficiently
> high priority so that it does not get starved out by userspace.
>
>
very cool. this seems similar to NAPI.
The only adjustment i'd suggest to your scheme, if possible, is to add
some control (as it existed in the old polling architecture) to make sure
that userspace is not starved by the ithreads and other kernel tasks
(otherwise you can have livelock, as it happens with NAPI).
I am afraid that simple priorities do not work, you either need some
kind of fair scheduler, or put some hard limit on the cpu fraction used
by the kernel tasks when there are userspace processes around.

cheers
luigi
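
(For readers following the thread: a minimal sketch of the drain loop being
discussed, i.e. an ithread that services the RX ring under a packet budget
and reschedules itself when more work remains.  This is not Andre's branch
code; the xx_* names and helper functions are hypothetical.)

/*
 * Hypothetical ithread handler: drain up to 'budget' frames, then either
 * re-enable interrupts (ring empty) or reschedule so that userspace and
 * other threads get a chance to run before the next pass.
 */
static void
xx_rx_ithread(void *arg)
{
    struct xx_softc *sc = arg;
    struct mbuf *m;
    int budget = 256;                            /* frames per pass, tunable */

    while (budget-- > 0 && (m = xx_rxeof(sc)) != NULL)
        (*sc->xx_ifp->if_input)(sc->xx_ifp, m);  /* hand frame to the stack */

    if (xx_rx_ring_empty(sc))
        xx_enable_intr(sc);     /* caught up: back to interrupt mode */
    else
        xx_sched_ithread(sc);   /* more work: run again after yielding */
}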

With multiple RX queues and MSI-X interrupts as many ithreads as available
> cores can be run and none of them will live-lock.  I'm also looking at
> using the CoDel algorithm for totally maxed out systems to prevent long
> FIFO packet drop chains in the NIC.  Think of it as RED queue management
> but for the input queue.  That way we can use distributed single packet
> loss as a signalling mechanism for the senders to slow down.  For a
> misbehaving sender blasting away this obviously doesn't help much.  It
> improves the chance of good packets making it through though.
>
> While live-lock prevention is good you still won't be able to log in
> via ssh through an overloaded interface.  Any other interface will
> work w/o packet loss instead.
>
> So far I've fully converted fxp(4) to this new scheme because it is one
> of the simpler drivers with sufficient documentation.  And 100Mbit is
> easy to saturate.
>
> The bge(4) driver is mostly converted but not tested due to lack of
> hardware, which should arrive later this week though.
>
> The em(4), and with it due to similarity igb(4) and ixgbe(4) family,
> is in the works as well.  Again hardware is on the way for testing.
>
> When this work has stabilized I'm looking for testers to put it through
> the paces.  If you're interested and have a suitable test bed then drop
> me an email to get notified.
>
> --
> Andre
>
>
>  Now we did a lot of sysctl resource tuning to correct this with some
>> floods but high rate would
>> still cause the behavior. Other times the system would simply drop all
>> traffic (like a buffer
>> filled or max connections) but it was not either case.
>>
>> The attacks were also well within bandwidth capabilities for the pipe and
>> network gear.
>>
>> All of these issues stopped upon adding polling or the overall threshold
>> was increased
>> tremendously with polling.
>>
>> Yet, polling has some downsides not necessarily due to FreeBSD but
>> application issues. Haproxy is
>> one example where we had handshake/premature connections terminated with
>> polling. Those issues
>> were not present with polling disabled.
>>
>> So that is my reasoning for saying that it was perfect for some things
>> and not for others.
>>
>> In the end, we spent years tinkering and it was always satisfactory but
>> never perfect. Finally we
>> grew to the point of replacing the edge with MX80's and left BSD to load
>> balancing and the like.
>> This finally resolved all issues for us.
>>
>> Albeit, we were a DDoS mitigation company running high PPS and lots of
>> bursting. BSD was
>> beautiful until we ended up needing 10Gbps+ on the edge and it was time to
>> go Juniper.
>>
>> I still say BSD took us from nothing to a $30M company. So despite
>> some things requiring
>> tinkering with I think it is still worth the effort to put in the testing
>> to find what is best
>> for your gear and environment.
>>
>> I got off-track but we did find one other thing. We found ipfw did seem
>> to reduce load on the
>> interrupts (likely because we couldn't do nearly the same scrubbing with it vs
>> pf). At any rate, less
>> filtering may also fix the issue for the OP.
>>
>> Your forwarding - we found doing forwarding via a simple pf rule and a
>> GRE tunnel to an app
>> server or by using a tool like haproxy on the router itself seemed to
>> r

Re: FreeBSD boxes as a 'router'...

2012-11-21 Thread Andre Oppermann

On 21.11.2012 08:55, Adrian Chadd wrote:

Something that has popped up a few times, even recently, is breaking
out of an RX loop after you service a number of frames.


That is what I basically described.


During stupidly high levels of RX, you may find the NIC happily
receiving frames faster than you can service the RX queue. If this
occurs, you could end up just plain being stuck there.


That's the live-lock.


So what I've done in the past is to loop over a certain number of
frames, then schedule a taskqueue to service whatever's left over.


Taskqueues shouldn't be used anymore.  We've got ithreads now.
Contrary to popular belief (and due to poor documentation) an
ithread does not run at interrupt level.  Only the fast interrupt
handler does that.  The ithread is a normal kernel thread tied to
a fast interrupt handler, trailing it whenever the filter returns
FILTER_SCHEDULE_THREAD.
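
(As an illustration of that split: a driver registers both handlers with
bus_setup_intr(9); only the filter runs in interrupt context, and the
trailing handler runs afterwards as a kernel thread.  This is only a sketch,
not code from any particular driver; the xx_* names are hypothetical.)

/* Filter: runs at interrupt level, must not sleep. */
static int
xx_intr_filter(void *arg)
{
    struct xx_softc *sc = arg;

    xx_disable_intr(sc);                 /* mask the NIC */
    return (FILTER_SCHEDULE_THREAD);     /* have the ithread handler run */
}

/* In attach: register the filter/ithread handler pair. */
error = bus_setup_intr(dev, sc->xx_irq_res, INTR_TYPE_NET | INTR_MPSAFE,
    xx_intr_filter, xx_rx_ithread, sc, &sc->xx_intr_cookie);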


I've also had to do some proactive frame dropping at the driver layer
when under ridiculous levels of RX load, but that's a different story.


That's the CoDel stuff I mentioned where it tries to drop frames in
a fair manner.
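
(CoDel itself is Nichols' and Jacobson's published algorithm; the fragment
below only illustrates applying its sojourn-time test to a driver input
queue and is not Andre's implementation.  The 5 ms target and 100 ms
interval are the published defaults; the drop-spacing control law,
interval/sqrt(count), is omitted for brevity, and the xx_iq structure is
hypothetical.)

#define CODEL_TARGET_US     5000        /* 5 ms  */
#define CODEL_INTERVAL_US   100000      /* 100 ms */

static int
codel_should_drop(struct xx_iq *q, uint64_t enqueue_us, uint64_t now_us)
{
    uint64_t sojourn_us = now_us - enqueue_us;   /* time spent queued */

    if (sojourn_us < CODEL_TARGET_US) {
        q->first_above_us = 0;          /* delay acceptable: reset state */
        return (0);
    }
    if (q->first_above_us == 0) {
        q->first_above_us = now_us;     /* start timing the bad period */
        return (0);
    }
    /* Delay has stayed above target for a full interval: drop this one. */
    return (now_us - q->first_above_us >= CODEL_INTERVAL_US);
}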


I wonder how many drivers break out of an RX loop after a fixed amount of time..


Most.  The problem seems to be that the taskqueue runs at high priority
effectively starving out other threads/processes again.

--
Andre



Re: FreeBSD boxes as a 'router'...

2012-11-21 Thread Andre Oppermann

On 21.11.2012 09:04, Luigi Rizzo wrote:

On Tue, Nov 20, 2012 at 11:32 PM, Andre Oppermann  wrote:


On 21.11.2012 03:16, khatfi...@socllc.net wrote:


I may be misstating.

Specifically under high burst floods either routed or being dropped by pf
we would see the system
go unresponsive to user-level applications / SSH for example.

The system would still function but it was inaccessible. To clarify as
well this was any number
of floods or attacks to any ports, the behavior remained. (These were not
SSH ports being hit)



I'm working on a hybrid interrupt/polling with live-lock prevention
scheme in my svn branch.  It works with a combination of disabling
interrupts in interrupt context and then having an ithread loop over
the RX DMA queue until it reaches the hardware and is done.  Only
then interrupts are re-enabled again.  On a busy system it may never
go back to interrupt.  To prevent live-lock the ithread gives up the
CPU after a normal quantum to let other threads/processes run as well.
After that it gets immediately re-scheduled again with a sufficiently
high priority so that it does not get starved out by userspace.



very cool. this seems similar to NAPI.


I've heard about NAPI but haven't looked at it.  So I'm not sure how it
works internally.  In this case no special logic or kernel API support
is required.  Every driver can be converted and immediately use it.


The only adjustment i'd suggest to your scheme, if possible, is to add
some control (as it existed in the old polling architecture) to make sure
that userspace is not starved by the ithreads and other kernel tasks
(otherwise you can have livelock, as it happens with NAPI).
I am afraid that simple priorities do not work, you either need some
kind of fair scheduler, or put some hard limit on the cpu fraction used
by the kernel tasks when there are userspace processes around.


That's the quantum stuff I talked about.  When the ithread has used up
its CPU quantum it gives up the CPU but schedules itself again at the
same time.  Also contrary to its name the ithread does not run in
interrupt context but as a normal kernel thread with elevated priority.

I have to thank Attilio though for a detailed explanation of what is
going on behind the scenes in the interrupt and scheduler code to get
a grip on the optimal approach for network drivers.  The bus_setup_intr(9)
and ithread documentation is very poor and misleading at the moment.
I intend to fix that soon.

--
Andre



Re: FreeBSD boxes as a 'router'...

2012-11-21 Thread Luigi Rizzo
On Wed, Nov 21, 2012 at 12:36 AM, Andre Oppermann  wrote:

> On 21.11.2012 09:04, Luigi Rizzo wrote:
>
>> On Tue, Nov 20, 2012 at 11:32 PM, Andre Oppermann 
>> wrote:
>>
>>> ...
>>
>> very cool. this seems similar to NAPI.
>>
>
> I've heard about NAPI but haven't looked at it.  So I'm not sure how it
>

have a look at some of the intel nic drivers in linux, they are very similar
to the bsd ones so you can easily compare.
NAPI is basically the same thing that you describe,
there is no special logic and the only
kernel support is (i believe) some device independent code that probably
you'll end up writing too after the first 2-3 drivers, to avoid boring
replications.

works internally.  In this case no special logic or kernel API support
> is required.  Every driver can be converted and immediately use it.
>
>
>  The only adjustment i'd suggest to your scheme, if possible, is to add
>> some control (as it existed in the old polling architecture) to make sure
>> that userspace is not starved by the ithreads and other kernel tasks
>> (otherwise you can have livelock, as it happens with NAPI).
>> I am afraid that simple priorities do not work, you either need some
>> kind of fair scheduler, or put some hard limit on the cpu fraction used
>> by the kernel tasks when there are userspace processes around.
>>
>
> That's the quantum stuff I talked about.  When the ithread has used up
> its CPU quantum it gives up the CPU but schedules itself again at the
> same time.  Also contrary to its name the ithread does not run in
> interrupt context but as a normal kernel thread with elevated priority.
>

it is the elevated priority that worries me, because it has potential
to preempt userspace and cause livelock.
Again, lacking a proper fair process scheduler, i think the only reliable
way is
to try and track the (approximate) cpu cycles consumed overall by the
various ithreads over, say, the current tick, and once you go above
the threshold drop the priority of those threads just above IDLEPRI.
Then when the next tick arrives you raise the priorities again.

Compared to when i did polling in 2000 (when i wanted to support the i486
soekris), now we can probably assume that a cheap timecounter is available
on all architectures (even ARM and MIPS should have some tsc-like thing,
right?) and the cycles used vs cycles per tick can be accounted with a
relatively fine grain.
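
(A sketch of the per-tick accounting Luigi describes; the actual priority
manipulation is left as comments because it depends on scheduler internals.
The xx_ names are hypothetical; 'ticks' is the kernel tick counter and the
cycle counts would come from something like get_cyclecount().)

static void
xx_account_cycles(struct xx_ithr_acct *a, uint64_t start, uint64_t end)
{
    /* Allow the ithread at most xx_kern_share percent of each tick. */
    uint64_t budget = xx_cycles_per_tick * xx_kern_share / 100;

    if (a->last_tick != ticks) {        /* new tick: reset the budget */
        a->last_tick = ticks;
        a->used = 0;
        /* ... restore the ithread's normal elevated priority here ... */
    }
    a->used += end - start;
    if (a->used > budget) {
        /* ... drop this thread's priority to just above idle here ... */
    }
}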

cheers
luigi


Re: FreeBSD boxes as a 'router'...

2012-11-21 Thread Andre Oppermann

On 21.11.2012 09:53, Luigi Rizzo wrote:

On Wed, Nov 21, 2012 at 12:36 AM, Andre Oppermann  wrote:

On 21.11.2012 09:04, Luigi Rizzo wrote:
  The only adjustment i'd suggest to your scheme, if possible, is to add

some control (as it existed in the old polling architecture) to make sure
that userspace is not starved by the ithreads and other kernel tasks
(otherwise you can have livelock, as it happens with NAPI).
I am afraid that simple priorities do not work, you either need some
kind of fair scheduler, or put some hard limit on the cpu fraction used
by the kernel tasks when there are userspace processes around.



That's the quantum stuff I talked about.  When the ithread has used up
its CPU quantum it gives up the CPU but schedules itself again at the
same time.  Also contrary to its name the ithread does not run in
interrupt context but as a normal kernel thread with elevated priority.



it is the elevated priority that worries me, because it has potential
to preempt userspace and cause livelock.
Again, lacking a proper fair process scheduler, i think the only reliable
way is
to try and track the (approximate) cpu cycles consumed overall by the
various ithreads over, say, the current tick, and once you go above
the threshold drop the priority of those threads just above IDLEPRI.
Then when the next tick arrives you raise the priorities again.


That is what happens.  The first time the ithread is scheduled it is
run at high priority.  When it yields after consuming its quantum it
drops much lower but above heavily nice'd processes.  Have to verify
the exact level though.


Compared to when i did polling in 2000 (when i wanted to support the i486
soekris),
now we can probably assume that a cheap timecounter is available on
all architectures (even ARM and MIPS should have some tsc-like thing, right
?)
and the cycles used vs cycles per tick can be accounted with a relatively
fine grain.


I hope to use as much existing kernel infrastructure as possible and
to keep it simple.  I certainly don't want to do cross-core load analysis.

--
Andre



Re: Looking for bge(4) , bce(4) and igb(4) cards

2012-11-21 Thread Mark Saad
Andre
  I'll try to do it today or next Monday when I get back from vacation. They
are all HP-branded NICs. I ordered them within the last few years to use in
place of bce NICs on the main boards of HP servers.

---


On Nov 14, 2012, at 5:31 AM, Andre Oppermann  wrote:

> Hello
> 
> I'm currently working on a number of drivers for popular network
> cards, extending them with automatic hybrid interrupt/polling
> ithread processing with live-lock prevention (so that the driver
> can't consume all CPU when under heavy load or attack).
> 
> To properly test this I need the proper hardware as PCIe network
> cards:
> 
> bge(4) Broadcom BCM57xx/BCM590x
> bce(4) Broadcom NetXtreme II (BCM5706/5708/5709/5716)
> igb(4) Intel PRO/1000 i82575, i82576, i82580, i210, i350
> 
> If you have one of these and can spare it I'd be very glad if
> you could send it to me.  I'm located in Switzerland/Europe.
> I can reply to you privately to give you my shipping address.
> 
> 
> Of course if you have any other PCIe Gigabit Ethernet cards
> with a driver in FreeBSD I'm interested in receiving one as
> well.  Of particular interest are:
> 
> em(4)  Intel i82571 to i82573
> lem(4) Intel i82540 to i82546
> age(4) Atheros L1 GigE
> ???anything else 1GigE with PCIe
> 
> 
> The same goes for 10 Gigabit Ethernet but the setup is a bit
> more involved and I haven't done that yet, but will do soon
> (the issue being expensive SFP+ optics):
> 
> bxe(4)   Broadcom BCM5771x 10GigE
> cxgbe(4) Chelsio T4 10GigE
> ixgbe(4) Intel i82598 and i82599 10GigE
> mxge(4)  Myricom Myri10G
> qlxgb(4) QLogic 3200 and 8200 10GigE
> sfxge(4) Solarflare
> 
> Many thanks for your support!
> 
> -- 
> Andre


Low Bandwidth on intercontinental connections

2012-11-21 Thread Marc Peters
Hi list,

we are experiencing low throughput on intercontinental connections with
our FreeBSD Servers. We made several tests and are wondering, why this
would be. The first tests were on an IPSEC VPN between our datacenter in
DE and Santa Clara, CA. We are connected with two gigabit uplinks in
each DC. Pushing data by scp between our FreeBSD servers takes ages.
Starting with several MB/s it drops to 60-70KB/s:

[root@freebsd ~]# ls -alh test.tgz
-rw-r-  1 root  wheel58M Oct  5  2010 test.tgz
[root@freebsd ~]# scp test.tgz 172.16.3.10:.
Password:
test.tgz   28%   17MB  75.3KB/s   09:32 ETA


For comparison, we did a similar test with Linux, which didn't show
this behaviour:

root@linux:~# scp jdk-6u33-linux-x64.bin 172.16.4.50:
root@172.16.4.50's password:
jdk-6u33-linux-x64.bin 100%
  69MB   3.4MB/s   00:20
root@linux:~#


Otherwise, the servers are really fast, when copying data to a machine
nearby:

[root@freebsd ~]# ls -alh test
-rw-r--r--  1 root  wheel 1G Nov 21 13:43 test
[root@freebsd ~]# scp test 172.16.3.11:
Password:
test 100% 1000MB  38.5MB/s   00:26


Intercontinental ftp downloads are the same:

[root@freebsd ~]# fetch
ftp://ftp1.us.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/9.1/FreeBSD-9.1-RC3-amd64-bootonly.iso
FreeBSD-9.1-RC3-amd64-bootonly.iso   100% of  146 MB   46 MBps

[root@freebsd ~]# fetch
ftp://ftp1.us.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/9.1/FreeBSD-9.1-RC3-amd64-disc1.iso

FreeBSD-9.1-RC3-amd64-disc1.iso 100% of  685 MB   36 MBps 00m00s

[root@freebsd ~]# fetch
ftp://ftp1.de.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/9.1/FreeBSD-9.1-RC3-amd64-disc1.iso
FreeBSD-9.1-RC3-amd64-disc1.iso 0% of  685 MB   13 kBps 14h49m^C


Linux:

root@linux:~# wget
ftp://ftp1.de.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/9.1/FreeBSD-9.1-RC3-amd64-disc1.iso
--2012-11-21 15:07:57--
ftp://ftp1.de.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/9.1/FreeBSD-9.1-RC3-amd64-disc1.iso
   => `FreeBSD-9.1-RC3-amd64-disc1.iso'
Resolving ftp1.de.freebsd.org... 137.226.34.42
Connecting to ftp1.de.freebsd.org|137.226.34.42|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1)
/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/9.1 ... done.
==> SIZE FreeBSD-9.1-RC3-amd64-disc1.iso ... 718800896
==> PASV ... done.==> RETR FreeBSD-9.1-RC3-amd64-disc1.iso ... done.
Length: 718800896 (686M) (unauthoritative)

100%[=>]
718,800,896 19.1M/s   in 61s

2012-11-21 15:09:01 (11.2 MB/s) - `FreeBSD-9.1-RC3-amd64-disc1.iso'
saved [718800896]


Doing some googling brought up a lot of tuning hints, but nothing worked
for us. We tweaked some sysctls:

kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_inc=16384
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.hostcache.expire=1

but to no avail. Disabling MSI and TSO4 for the card didn't change
anything either.
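
(For scale, with an assumed ~150 ms transatlantic RTT a 16 MB socket buffer
is enough for roughly 16 MB / 0.15 s = ~850 Mbit/s, so the buffer caps above
should not themselves be the bottleneck for a single stream.)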

The machines are all HP DL360G7 with bce cards (find dmesg, ifconfig and
pciconf -lvc at the end of this mail).

Can someone hit me with a cluestick to get the BSDs on speed?

marc

PS: The version is FreeBSD-RC2 amd64, because we need the patch for
process migration on the CPUs which didn't make it into 9.0 or an errata, as
we were the only ones hitting this bug (so kib@ said).

ifconfig:
[root@freebsd ~]# ifconfig
bce0: flags=8843 metric 0 mtu 1500

options=c01bb
ether ac:16:2d:b7:00:f4
inet 172.16.3.10 netmask 0xff00 broadcast 172.16.3.255
inet6 fe80::ae16:2dff:feb7:f4%bce0 prefixlen 64 scopeid 0x1
nd6 options=29
media: Ethernet autoselect (1000baseT )
status: active
bce1: flags=8843 metric 0 mtu 1500

options=c01bb
ether ac:16:2d:b7:00:f6
inet 172.17.3.10 netmask 0xf800 broadcast 172.17.7.255
inet6 fe80::ae16:2dff:feb7:f6%bce1 prefixlen 64 scopeid 0x2
nd6 options=29
media: Ethernet autoselect (1000baseT )
status: active
bce2: flags=8802 metric 0 mtu 1500

options=c01bb
ether ac:16:2d:b7:00:fc
nd6 options=29
media: Ethernet autoselect
bce3: flags=8802 metric 0 mtu 1500

options=c01bb
ether ac:16:2d:b7:00:fe
nd6 options=29
media: Ethernet autoselect
lo0: flags=8049 metric 0 mtu 16384
options=63
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0xb
inet 127.0.0.1 netmask 0xff00
nd6 options=21

pciconf -lvc:
[root@freebsd ~]# pciconf -lvc
hostb0@pci0:0:0:0:  class=0x06 card=0x330b103c chip=0x34068086
rev=0x13 hdr=0x00
vendor = 'Intel Corporation'
device = '5520 I/O Hub to ESI Port'
class  = bridge
subclass   = H

Re: Low Bandwidth on intercontinental connections

2012-11-21 Thread Mehmet Erol Sanliturk
On Wed, Nov 21, 2012 at 7:41 AM, Marc Peters  wrote:

> Hi list,
>
> we are experiencing low throughput on interncontinental connections with
> our FreeBSD Servers. We made several tests and are wondering, why this
> would be. The first tests were on an IPSEC VPN between our datacenter in
> DE and Santa Clara, CA. We are connected with two gigabit uplinks in
> each DC. Pushing data by scp between our FreeBSD servers takes ages.
> Starting with several MB/s it drops to 60-70KB/s:
>



.


I do not have an answer to your question, but I want to share one of my
experiences.

In Linux (KDE) I was copying a hard disk's contents to another drive
using Dolphin.
At the beginning it was very fast, but over time its speed dropped to a
few kilobytes per second.
It listed the completion time left as months.

I investigated why this was the case.

The reason was the following:

For each file it copied, Dolphin was producing an approximately 1
kilobyte memory leak.
After copying more than one million files, all of the memory was exhausted
and it started to swap
to the hard disk swap space, which reduced the copy speed to a few kilobytes
per second.


I stopped Dolphin and copied small directory groups, restarting Dolphin
each time. This cured the problem because on each exit all of the memory
leaked by Dolphin was released (the "Undo" item of the Dolphin menu was
disabled, meaning memory is not reserved for undo).


Please study your data transfer software for such a possibility. It may
not be problematic in Linux but the FreeBSD version may have some trouble
spots.


There is another possibility: graceful degradation.

http://en.wikipedia.org/wiki/Graceful_degradation
http://en.wikipedia.org/wiki/Fail_soft

A program component may degrade gracefully over time or with the amount of
data processed:

For example, assume a list is searched sequentially. As the list length
grows, search times
also grow linearly and produce a degradation, although there is no
error in the process.

You may examine your system for such a process.


These are the possibilities which come to my mind.


Thank you very much .

Mehmet Erol Sanliturk


Re: Low Bandwidth on intercontinental connections

2012-11-21 Thread Benjamin Villain
I don't think this is about a disk or memory leak, as transferring files locally
seems to work fine.


Can you test transferring files from (and to) your Linux boxes to (and from) the
FreeBSD servers to check that it is not a network issue inside your DCs?


Kind regards,

--
Ben



Re: igb driver crashes in head@241037

2012-11-21 Thread Karim Fodil-Lemelin

Hi Gleb,

On 21/11/2012 1:26 AM, Gleb Smirnoff wrote:

   Jack,

On Tue, Nov 20, 2012 at 09:19:54AM -0800, Jack Vogel wrote:
J> > I'd suggest the following code:
J> >
J> > if (m)
J> > drbr_enqueue(ifp, txr->br, m);
J> > err = igb_mq_start_locked(ifp, txr, NULL);
J> >
J> > Which eventually leads us to all invocations of igb_mq_start_locked()
J> > called
J> > with third argument as NULL. This allows us to simplify this function.
J> >
J> > Patch for review attached.
J> >
J> >
J> Yes Gleb, I already have code in my internal tree which simply removes an
J> mbuf
J> pointer from the start_locked call and ALWAYS does a dequeue, start
J> similarly
J> will always enqueue. I just have been busy with ixgbe for a bit and have
J> not gotten
J> it committed yet.

   Since the ixgbe work is performance tuning and this patch fixes a kernel crash,
I'd ask to preempt the ixgbe job with this patch. :)

   Or you can approve my patch and I will check it in.

What about protecting the em driver from the same out of order problem 
that was fixed in igb?


Wouldn't it make sense to commit a fix for both drivers?

Thanks,

Karim.


Re: Low Bandwidth on intercontinental connections

2012-11-21 Thread Kevin Oberman
If you have not done so, I suggest you use SIFTR to capture data on
what is happening in TCP. It can often tell you a great deal and is
very easy to work with. Just load the kernel module and use sysctls to
control it. I have used it in conjunction with tcpdump and wireshark
to find performance problems.
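
(For anyone who has not used it, enabling SIFTR looks roughly like the
following; siftr(4) documents the authoritative knob names and the log
format:)

[root@freebsd ~]# kldload siftr
[root@freebsd ~]# sysctl net.inet.siftr.logfile=/var/log/siftr.log
[root@freebsd ~]# sysctl net.inet.siftr.enabled=1
... run the slow transfer ...
[root@freebsd ~]# sysctl net.inet.siftr.enabled=0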

Also, for high performance on bulk data transfers over long, fat
pipes, take a look at http://fasterdata.es.net. It is a detailed guide
on moving data developed by the people who have to deal with the huge
volumes of Large Hadron Collider data moving across the Atlantic from
CERN to researchers in the US. (Note that this is not FreeBSD
specific.)
-- 
R. Kevin Oberman, Network Engineer
E-mail: kob6...@gmail.com


Re: Low Bandwidth on intercontinental connections

2012-11-21 Thread Marc Peters
On 11/21/2012 05:58 PM, Benjamin Villain wrote:
> I don't think this is about disk or memory leak as transfering files
> locally seem to work fine.
> 
> Can you test transferring files from (and to) your Linux boxes to (and
> from) the FreeBSD servers to check that it is not a network issue inside
> your DCs.
> 
> Kind regards,
> 
> -- 
> Ben

Hi Ben,

I don't think this is memory related either. We used plain CLI scp or ftp
from base, both times.

Here is the requested data:

Linux ---> FreeBSD:

root@linux:~# scp jdk-6u33-linux-x64.bin 172.16.3.10:
Password:
jdk-6u33-linux-x64.bin 89%   61MB  59.0KB/s

FreeBSD ---> Linux:

[root@freebsd ~]# scp test.tgz 172.16.4.50:
Password:
test.tgz   100%   59MB   1.1MB/s   00:55
[root@freebsd ~]#

From BSD to Linux is not as fast as L <--> L.

I don't think this is network related in any way.

Marc




Re: FreeBSD boxes as a 'router'...

2012-11-21 Thread Barney Cordoba


--- On Wed, 11/21/12, John Fretby  wrote:

> From: John Fretby 
> Subject: Re: FreeBSD boxes as a 'router'...
> To: "Victor Balada Diaz" 
> Cc: freebsd-...@freebsd.org
> Date: Wednesday, November 21, 2012, 11:40 AM
> On 21 November 2012 14:57, Victor
> Balada Diaz 
> wrote:
> 
> 
> > I think you forgot to CC the list. I'll add it so you
> can get
> > more answers.
> >
> 
> I did forget, thanks for that! :)
> 
> 
> > em(4) and igb(4) are both drivers for Intel NICs. They
> just have
> > different capabilities. The sysctl you're asking for
> controls behavior
> > of adaptive interrupt moderation. It's a recommended
> tuning for end hosts
> > more than routers. You can read more about interrupt
> moderation on this
> > document:
> >
> > http://www.intel.com/design/network/applnots/ap450.htm
> >
> > em(4) NICs don't have all the capabilities of igb(4)
> ones. Some em(4) NICs
> > have
> > interrupt moderation (eg: 82574L) but not all of them
> do. If your em(4)
> > card does
> > have interrupt moderation you can tune it with:
> >
> > hw.em.rx_int_delay
> > hw.em.rx_abs_int_delay
> > hw.em.tx_int_delay
> > hw.em.tx_abs_int_delay
> >
> > Exchanging latency to get more throughput.
> >
> > You can take a look at this document explaining
> capabilities of different
> > NICs:
> >
> >
> > http://www.intel.com/content/dam/doc/brochure/ethernet-controllers-phys-brochure.pdf
> >
> > You should ask supermicro what's the exact model
> they'll put on your server
> > and then decide if it's OK for you.
> 
> 
> They are apparently:
> 
> em0:  port
> 0xf020-0xf03f mem
> 0xdfa0-0xdfa1,0xdfa25000-0xdfa25fff irq 20 at device
> 25.0 on pci0
> em0: Using an MSI interrupt
> ...
> em0: flags=8c02
> metric 0 mtu 1500
> options=4219b
> 
> 
> > About the interrupt storm: We've had various interrupt
> storms that were
> > caused by
> > different problems. The most common was a software bug
> with interrupts.
> > After
> > reporting on the lists it was fixed and we didn't have
> problems again.
> >
> > If you have a problem with high interrupts because too
> many small packets
> > (eg a DoS),
> > getting a card with interrupt moderation should help a
> lot. Most probably
> > your problem
> > with interrupt storms was caused by something else like
> a shared interrupt
> > with other
> > device or software bug. Without more analysis it's
> impossible to really
> > say.
> >
> 
> I have some details from when it happened - it doesn't look
> like it was a
> shared interrupt issue - it just literally looks like the
> host came up,
> with a stampeding herd of "other" hosts hitting it for
> services that
> weren't yet running, and it folded :(
> 
> That's why I was wondering if there was a similar sysctl for
> the em driver
> - in order to raise the number of interrupts the system
> allows, before
> declaring it "a storm".
> 
> 
> >
> > Keep in mind that i'm not an expert on this area, so
> you might get better
> > answers
> > on frebsd-net@ :)
> >
> > Hope it helps.
> >
> 
> It has - half the problem is there are *so* many options,
> combinations -
> and no matter what you pick, if you look them up enough
> you'll find someone
> finding fault with them, or casting doubts on their
> performance.
> 
> Doesn't really help when all you want is something that has
> a good chance
> of "working" :)
> 

The road to mediocrity is listening to people who are not experts. The
admission that you don't understand something is a good disclosure. But the
argument that understanding how to tune a system is too complicated only
means that I don't want to listen to what you have to say. 

Polling implies that there are unnatural intervals between processing 
packets. So you're introducing delay into your network. The more often
you poll, the less the gap. 

Suppose you poll every 1ms. Each received packet will have from 0-1ms
of delay. Packets that are arriving as you ended your last poll will be
delayed 1ms. 
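
(To put illustrative numbers on that: with a 1 ms poll interval the added
delay averages interval/2 = 0.5 ms, and at an assumed 500 kpps each poll
must drain roughly 500,000 pps x 0.001 s = 500 packets, which also bounds
how large the RX ring and per-poll budget have to be.)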

The more knobs you have, the  more you can determine what the system 
can do. An interrupt for every packet would have the least delay, but
that's not practical for systems managing 100s of 1000s of pps. You make
trade offs between delays and cpu usage. On a small network with a fast
cpu, you can process every packet. On a very large network you cant.

The idea that there is one way to tune a system for every environment is
simply wrong. A bridge/filter has completely different requirements than
a "router". And a "server" which only uses 1 NIC has different requirements
than a system that forwards traffic and has to manage more locks at the
hardware level.

BC


Re: Low Bandwidth on intercontinental connections

2012-11-21 Thread Mehmet Erol Sanliturk
On Wed, Nov 21, 2012 at 9:20 AM, Kevin Oberman  wrote:

> On Wed, Nov 21, 2012 at 8:58 AM, Benjamin Villain
>  wrote:
> > I don't think this is about disk or memory leak as transfering files
> locally
> > seem to work fine.
> >
> > Can you test transferring files from (and to) your Linux boxes to (and
> from)
> > the FreeBSD servers to check that it is not a network issue inside your
> DCs.
> >
> > King regards,
> >
> > --
> > Ben
> >
> >
> > Mehmet Erol Sanliturk writes:
> >
> >> On Wed, Nov 21, 2012 at 7:41 AM, Marc Peters  wrote:
> >>
> >> > Hi list,
> >> >
> >> > we are experiencing low throughput on interncontinental connections
> with
> >> > our FreeBSD Servers. We made several tests and are wondering, why this
> >> > would be. The first tests were on an IPSEC VPN between our datacenter
> in
> >> > DE and Santa Clara, CA. We are connected with two gigabit uplinks in
> >> > each DC. Pushing data by scp between our FreeBSD servers takes ages.
> >> > Starting with several MB/s it drops to 60-70KB/s:
> >> >
> >>
> >>
> >>
> >> .
> >>
> >>
> >> I do not have any answer to your question , but I want to share one my
> >> experiences .
> >>
> >> I Linux ( KDE ) I was copying a hard disk contents to another drive by
> >> using Dolphin .
> >> At the beginning it was very fast , but over time its speed reduced to a
> >> few kilobytes per second .
> >> It listed completion time left as months .
> >>
> >> I inspected why this is the case .
> >>
> >> The reason was the following :
> >>
> >> On each file it is copied , the Dolphin was producing approximately 1
> >> Kilobyte  memory leak .
> >> After copying more than one million file , all of the memory exhausted
> and
> >> it started to swap
> >> memory to hard disk swap space which reduced copy speed to a few
> kilobytes
> >> per second .
> >>
> >>
> >> I stopped the Dolphin and copied small directory groups by restarting
> the
> >> Dolphin . This cured the problem because on each exit , all of the
> leaked
> >> memory by Dolphin has been disposed ( where "Undo" item of Dolphin menu
> >> was
> >> disabled means memory is not reserved for undo ).
> >>
> >>
> >> Please study your data transfer software for such a possibility . It may
> >> not be problematic in Linux but FreeBSD version may have some trouble
> >> points .
> >>
> >>
> >> There is another possibility : Graceful degradation .
> >>
> >> http://en.wikipedia.org/wiki/Graceful_degradation
> >> http://en.wikipedia.org/wiki/Fail_soft
> >>
> >> A program part may produce graceful degradation over time or processed
> >> data
> >> :
> >>
> >> For example , assume a list is searched by sequentially . When list
> length
> >> grows , search times
> >> also grows linearly and produces a degradation although there is no any
> >> error in the process .
> >>
> >> You may study your system with respect to such a process .
> >>
> >>
> >> These are the possibilities which come to my mind .
>
> If you have not done so, I suggest you use SIFTR to capture data on
> what is happening in TCP. It can often tell you a great deal and is
> very easy to work with. Just load the kernel module and use sysctls to
> control it. I have used it in conjunction with tcpdump and wireshark
> to find performance problems.
>
> Also, for high performance on bulk data transfers over long, fat
> pipes, take a look at http://fasterdata.es.net. It is a detailed guide
> on moving data developed by the people who have to deal with the huge
> volumes of Large Hadron Collider data moving across the Atlantic from
> CERN to researchers in the US. (Note that this is not FreeBSD
> specific.)
> --
> R. Kevin Oberman, Network Engineer
> E-mail: kob6...@gmail.com
>

A very good link.

On the above site, please see the following especially:

http://fasterdata.es.net/data-transfer-tools/say-no-to-scp/
Say No to scp
Why you should avoid scp over a WAN

and

http://fasterdata.es.net/data-transfer-tools/scp-and-sftp/
scp and sftp


Thank you very much .


Mehmet Erol Sanliturk


Re: Low Bandwidth on intercontinental connections

2012-11-21 Thread Ingo Flaschberger

Am 21.11.2012 18:32, schrieb Marc Peters:

Hi Ben,

i don't think this is memory related, too. We used plain CLI scp ot ftp
from base, both times.

Here is the requested data:

Linux ---> FreeBSD:

root@linux:~# scp jdk-6u33-linux-x64.bin 172.16.3.10:
Password:
jdk-6u33-linux-x64.bin 89%   61MB  59.0KB/s

FreeBSD ---> Linux:

[root@freebsd ~]# scp test.tgz 172.16.4.50:
Password:
test.tgz   100%   59MB   1.1MB/s   00:55
[root@freebsd ~]#

 From BSD to Linux is not as fast as L <--> L.

I don't think, this is network related in some sort.


hm - sounds like a duplex problem?

*) what's the distance between the Linux and FreeBSD box?
*) check network counters:
   linux: ifconfig
   FreeBSD: netstat -nia
 look for errors
   check switches between (or as far as possible) for full duplex, also 
FreeBSD (ifconfig)

*) check and compare tcpdump

Kind regards,
   Ingo Flaschberger



net.inet6.icmp6.nd6_useloopback - what is it supposed to do?

2012-11-21 Thread Garrett Cooper
Hi,
I've been TAHI testing FreeBSD 7.x sources for the past couple
months and over the course of my testing via the TAHI IPv6 conformance
test, I changed the knob value from net.inet6.icmp6.nd6_useloopback=1
-> net.inet6.icmp6.nd6_useloopback=0 and ran into a slew of errors
with the addr.p2 phase-1 TAHI tests.
I was wondering if someone could describe what the aforementioned
sysctl is supposed to do from a functional perspective, how it might
tie into other IPv6 RFCs (if applicable), and if disabled how it would
affect a system with IPv6 enabled.
Thanks,
-Garrett


Re: FreeBSD boxes as a 'router'...

2012-11-21 Thread Adrian Chadd
On 21 November 2012 00:30, Andre Oppermann  wrote:
> On 21.11.2012 08:55, Adrian Chadd wrote:
>>
>> Something that has popped up a few times, even recently, is breaking
>> out of an RX loop after you service a number of frames.
>
> That is what I basically described.

Right, and this can be done right now without too much reworking,
right? I mean, people could begin by doing a drive-by on drivers for
this.
The RX path for a driver shouldn't be too difficult to do; the TX path
is the racy one.

>> During stupidly high levels of RX, you may find the NIC happily
>> receiving frames faster than you can service the RX queue. If this
>> occurs, you could end up just plain being stuck there.

> That's the live-lock.

And again you can solve this without having to devolve into polling.
Again, polling to me feels like a bludgeon beating around a system
that isn't really designed for the extreme cases it's facing.
Maybe your work in the tcp_taskqueue branch addresses the larger scale
issues here, but I've solved this relatively easily in the past.

>> So what I've done in the past is to loop over a certain number of
>> frames, then schedule a taskqueue to service whatever's left over.

> Taskqueues shouldn't be used anymore.  We've got ithreads now.
> Contrary to popular belief (and due to poor documentation) an
> ithread does not run at interrupt level.  Only the fast interrupt
> handler does that.  The ithread is a normal kernel thread tied to
> a fast interrupt handler, trailing it whenever the filter returns
> FILTER_SCHEDULE_THREAD.

Sure, but taskqueues are still useful if you want to serialise access
without relying on mutexes wrapping large parts of the packet handling
code to enforce said order.

Yes, normal ithreads don't run at interrupt level.

And we can change the priority of taskqueues in each driver, right?
And/or we could change the behaviour of driver ithreads/taskqueues to
be automatically reniced?
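
(For reference, the priority of a driver's taskqueue thread is picked when
the thread is started, so a "renice" amounts to choosing a different value
there.  A generic sketch with hypothetical xx_ names:)

TASK_INIT(&sc->xx_rx_task, 0, xx_rx_task_fn, sc);
sc->xx_tq = taskqueue_create_fast("xx_rxq", M_NOWAIT,
    taskqueue_thread_enqueue, &sc->xx_tq);
/* PI_NET is the usual choice; a lower priority here is the "renice". */
taskqueue_start_threads(&sc->xx_tq, 1, PI_NET, "%s rxq",
    device_get_nameunit(sc->xx_dev));
...
taskqueue_enqueue(sc->xx_tq, &sc->xx_rx_task);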

I'm not knocking your work here, I'm just trying to understand whether
we can do this stuff as small individual pieces of work rather than
one big subsystem overhaul.

And CoDel is interesting as a concept, but it's certainly not new. But
again, if you don't drop the frames during the driver receive path
(and try to do it higher up in the stack, eg as part of some firewall
rule) you still risk reaching a stable state where the CPU is 100%
pinned because you've wasted cycles pushing those frames into the
queue only to be dropped.

What _I_ had to do there was have a quick gate to look up if a frame
was part of an active session in ipfw and if it was, let it be queued
to the driver. I also had a second gate in the driver for new TCP
connections, but that was a separate hack. Anything else was dropped.

In any case, what I'm trying to say is this - when I was last doing
this kind of stuff, I didn't just subscribe to "polling will fix all."
I spent a few months knee deep in the public intel e1000 documentation
and tuning guide, the em driver and the queue/firewall code, in order
to figure out how to attack this without using polling.

And yes, you've also just described NAPI. :-)




Adrian


Re: Low Bandwidth on intercontinental connections

2012-11-21 Thread Adrian Chadd
Hi!

Firstly - please file a PR.

Secondly - there's some great tcp counters in 'netstat -sp tcp' (and
ip, and udp, and icmp.) You can zero them; netstat -sp tcp -z. I
suggest dumping them before/after and compare the values.



Adrian


[RFC] Prune net.inet6.ip6.rr_prune?

2012-11-21 Thread Garrett Cooper
While going through the tree trying to document all of our
net.inet6 sysctls, I noticed that net.inet6.ip6.rr_prune is defined,
but not actually used anywhere in the stack:

netinet6/ip6_var.h:VNET_DECLARE(int, ip6_rr_prune);  /* router
renumbering prefix
netinet6/ip6_var.h:#define   V_ip6_rr_prune  VNET(ip6_rr_prune)
netinet6/in6_proto.c:VNET_DEFINE(int, ip6_rr_prune) = 5; /* router
renumbering prefix
netinet6/in6_proto.c:SYSCTL_VNET_INT(_net_inet6_ip6, IPV6CTL_RR_PRUNE,
rr_prune, CTLFLAG_RW,
netinet6/in6_proto.c:&VNET_NAME(ip6_rr_prune), 0,

The knob was declared in r181803 and shuffled around a few times,
but isn't in use anywhere (either then or now).
Should I send out a PR to remove it (or am I missing some context)?
Thanks,
-Garrett


Re: Low Bandwidth on intercontinental connections

2012-11-21 Thread Julian Elischer

On 11/21/12 7:41 AM, Marc Peters wrote:

Hi list,

we are experiencing low throughput on interncontinental connections with
our FreeBSD Servers. We made several tests and are wondering, why this
would be. The first tests were on an IPSEC VPN between our datacenter in
DE and Santa Clara, CA. We are connected with two gigabit uplinks in
each DC. Pushing data by scp between our FreeBSD servers takes ages.
Starting with several MB/s it drops to 60-70KB/s:

[root@freebsd ~]# ls -alh test.tgz
-rw-r-  1 root  wheel58M Oct  5  2010 test.tgz
[root@freebsd ~]# scp test.tgz 172.16.3.10:.
Password:
test.tgz   28%   17MB  75.3KB/s   09:32 ETA


For comparision, we did a similiar test with Linux, which didn't show
this behaviour:

root@linux:~# scp jdk-6u33-linux-x64.bin 172.16.4.50:
root@172.16.4.50's password:
jdk-6u33-linux-x64.bin 100%
   69MB   3.4MB/s   00:20
root@linux:~#


Otherwise, the servers are really fast, when copying data to a machine
nearby:

[root@freebsd ~]# ls -alh test
-rw-r--r--  1 root  wheel 1G Nov 21 13:43 test
[root@freebsd ~]# scp test 172.16.3.11:
Password:
test 100% 1000MB  38.5MB/s   00:26


Intercontinental ftp downloads are the same:

[root@freebsd ~]# fetch
ftp://ftp1.us.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/9.1/FreeBSD-9.1-RC3-amd64-bootonly.iso
FreeBSD-9.1-RC3-amd64-bootonly.iso   100% of  146 MB   46 MBps

[root@freebsd ~]# fetch
ftp://ftp1.us.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/9.1/FreeBSD-9.1-RC3-amd64-disc1.iso

FreeBSD-9.1-RC3-amd64-disc1.iso 100% of  685 MB   36 MBps 00m00s

[root@freebsd ~]# fetch
ftp://ftp1.de.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/9.1/FreeBSD-9.1-RC3-amd64-disc1.iso
FreeBSD-9.1-RC3-amd64-disc1.iso 0% of  685 MB   13 kBps 14h49m^C


Linux:

root@linux:~# wget
ftp://ftp1.de.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/9.1/FreeBSD-9.1-RC3-amd64-disc1.iso
--2012-11-21 15:07:57--
ftp://ftp1.de.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/9.1/FreeBSD-9.1-RC3-amd64-disc1.iso
=> `FreeBSD-9.1-RC3-amd64-disc1.iso'
Resolving ftp1.de.freebsd.org... 137.226.34.42
Connecting to ftp1.de.freebsd.org|137.226.34.42|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1)
/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/9.1 ... done.
==> SIZE FreeBSD-9.1-RC3-amd64-disc1.iso ... 718800896
==> PASV ... done.==> RETR FreeBSD-9.1-RC3-amd64-disc1.iso ... done.
Length: 718800896 (686M) (unauthoritative)

100%[=>]
718,800,896 19.1M/s   in 61s

2012-11-21 15:09:01 (11.2 MB/s) - `FreeBSD-9.1-RC3-amd64-disc1.iso'
saved [718800896]


Doing some googling brought up a lot of tuning hints, but nothing worked
for us. We tweaked some sysctls:

kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_inc=16384
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.hostcache.expire=1

but to no avail. Disabling MSI and TSO4 on the card didn't change
anything either.

The machines are all HP DL360G7 with bce cards (find dmesg, ifconfig and
pciconf -lvc at the end of this mail).

Can someone hit me with a cluestick to get the BSDs on speed?
You really do need to get a tcpdump of the transfer under slow
conditions and a SIFTR output to match.
What is the ping time between the hosts? That will allow you to work
out how large a window you should have.
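
(As a rough illustration of the window arithmetic -- the RTT and bandwidth
figures here are assumptions, not measurements from this thread: at ~150 ms
between DE and California, filling a 1 Gbit/s path needs a window of about
1e9 bit/s * 0.150 s / 8 = ~18.75 MB, while the observed 60-70 KB/s
corresponds to an effective window of only ~10 KB. Something like the
following would collect the data asked for above; siftr(4) is assumed to be
available as a loadable module on 9.x, and the host address is taken from
the transfer shown earlier:)

kldload siftr
sysctl net.inet.siftr.enabled=1          # log per-packet TCP state
tcpdump -s 0 -w slow-transfer.pcap host 172.16.3.10 &
ping -c 10 172.16.3.10                   # RTT for the window calculation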


marc

PS: The version is FreeBSD-RC2 amd64, because we need the patch for
process migration on the CPUs which didn't make it into 9.0 or an errata,
as we were the only ones hitting this bug (so kib@ said).



___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: Low Bandwidth on intercontinental connections

2012-11-21 Thread Adrian Chadd
.. and there's also some SACK stuff and RTT prediction that you may be
totally running afoul of over that high latency link?

(I thought this stuff was fixed in -HEAD and -9?)



Adrian
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: igb diver crashes in head@241037

2012-11-21 Thread Jack Vogel
Gleb,

Here is a patch based on my latest internal igb code. I had not yet
committed it, as I was not completely confident about the start/queueing
changes, and I would love to have a wider testing base, so anyone who
wishes to test this is welcome... It's against HEAD.

It does a few things: it changes mq_start to ALWAYS enqueue, hence
mq_start_locked no longer takes an mbuf pointer argument.

Second, it gets rid of OACTIVE as far as the queues go; it's still used,
but only in a device-wide up/down sense.

Last, there is a flow control display added. This follows what our Linux
driver does: it gives you the current flow control state when a link-up
event happens. I was asked to do this by my validation group, and it
seemed kinda handy...
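
(For readers without the attachment, here is a minimal sketch of the
always-enqueue pattern described above. It is not the actual patch: the
structure, field and macro names are assumptions modelled on the in-tree
igb driver, and locking and error handling are simplified.)

static int
igb_mq_start(struct ifnet *ifp, struct mbuf *m)
{
	struct adapter *adapter = ifp->if_softc;
	struct tx_ring *txr;
	int i = 0, err;

	/* Pick a queue, by flowid if the mbuf carries one. */
	if (m->m_flags & M_FLOWID)
		i = m->m_pkthdr.flowid % adapter->num_queues;
	txr = &adapter->tx_rings[i];

	/* ALWAYS enqueue first; the mbuf never bypasses the ring. */
	err = drbr_enqueue(ifp, txr->br, m);
	if (err)
		return (err);

	/* Drain the ring; note there is no mbuf argument any more. */
	if (IGB_TX_TRYLOCK(txr)) {
		igb_mq_start_locked(ifp, txr);
		IGB_TX_UNLOCK(txr);
	} else
		taskqueue_enqueue(txr->tq, &txr->txq_task);
	return (0);
}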

Let me know what you think,

Jack


On Tue, Nov 20, 2012 at 10:26 PM, Gleb Smirnoff  wrote:

>   Jack,
>
> On Tue, Nov 20, 2012 at 09:19:54AM -0800, Jack Vogel wrote:
> J> > I'd suggest the following code:
> J> >
> J> > if (m)
> J> > drbr_enqueue(ifp, txr->br, m);
> J> > err = igb_mq_start_locked(ifp, txr, NULL);
> J> >
> J> > Which eventually leads us to all invocations of igb_mq_start_locked()
> J> > called
> J> > with third argument as NULL. This allows us to simplify this function.
> J> >
> J> > Patch for review attached.
> J> >
> J> >
> J> Yes Gleb, I already have code in my internal tree which simply removes
> an
> J> mbuf
> J> pointer from the start_locked call and ALWAYS does a dequeue, start
> J> similarly
> J> will always enqueue. I just have been busy with ixgbe for a bit and have
> J> not gotten
> J> it committed yet.
>
>   Since ixgbe work is performance tuning and this patch closes a kernel
> crash,
> I'd ask to preempt the ixgbe job with this patch. :)
>
>   Or you can approve my patch and I will check it in.
>
> --
> Totus tuus, Glebius.
>


if_igb.patch
Description: Binary data
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

Re: [RFC] Prune net.inet6.ip6.rr_prune?

2012-11-21 Thread Sergey Kandaurov
On 21 November 2012 22:39, Garrett Cooper  wrote:
> While going through the tree trying to document all of our
> net.inet6 sysctls, I noticed that net.inet6.ip6.rr_prune is defined,
> but not actually used anywhere in the stack:
>
> netinet6/ip6_var.h:VNET_DECLARE(int, ip6_rr_prune);  /* router
> renumbering prefix
> netinet6/ip6_var.h:#define   V_ip6_rr_prune  
> VNET(ip6_rr_prune)
> netinet6/in6_proto.c:VNET_DEFINE(int, ip6_rr_prune) = 5; /* router
> renumbering prefix
> netinet6/in6_proto.c:SYSCTL_VNET_INT(_net_inet6_ip6, IPV6CTL_RR_PRUNE,
> rr_prune, CTLFLAG_RW,
> netinet6/in6_proto.c:&VNET_NAME(ip6_rr_prune), 0,
>
> The knob was declared in r181803 and shuffled around a few times,
> but isn't in use anywhere (either then or now).
> Should I send out a PR to remove it (or am I missing some context)?

I believe this knob became unused when the prefix manipulation mechanism
(including prefix and router renumbering, RFC 2894) was disabled in KAME
about 11 years ago. It was intended to schedule the in6_rr_timer()
callout every ip6_rr_prune seconds, to check for expired prefixes and
delete the associated addresses from the interface.
The last bits of the old IPv6 prefix management code were cleaned up in
r231229.
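
(For context, the scheduling this knob was meant to drive would have been
the usual callout(9) pattern -- a sketch reconstructed from the description
above, not the old KAME code; the callout variable name is made up:)

static struct callout in6_rr_ch;

static void
in6_rr_timer(void *arg)
{
	/*
	 * Walk the prefix list, expire stale prefixes and remove their
	 * addresses from the interface (omitted), then re-arm.
	 */
	callout_reset(&in6_rr_ch, V_ip6_rr_prune * hz, in6_rr_timer, arg);
}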

-- 
wbr,
pluknet
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: [RFC] Prune net.inet6.ip6.rr_prune?

2012-11-21 Thread Garrett Cooper
On Wed, Nov 21, 2012 at 3:07 PM, Sergey Kandaurov  wrote:
> On 21 November 2012 22:39, Garrett Cooper  wrote:
>> While going through the tree trying to document all of our
>> net.inet6 sysctls, I noticed that net.inet6.ip6.rr_prune is defined,
>> but not actually used anywhere in the stack:
>>
>> netinet6/ip6_var.h:VNET_DECLARE(int, ip6_rr_prune);  /* router
>> renumbering prefix
>> netinet6/ip6_var.h:#define   V_ip6_rr_prune  
>> VNET(ip6_rr_prune)
>> netinet6/in6_proto.c:VNET_DEFINE(int, ip6_rr_prune) = 5; /* router
>> renumbering prefix
>> netinet6/in6_proto.c:SYSCTL_VNET_INT(_net_inet6_ip6, IPV6CTL_RR_PRUNE,
>> rr_prune, CTLFLAG_RW,
>> netinet6/in6_proto.c:&VNET_NAME(ip6_rr_prune), 0,
>>
>> The knob was declared in r181803 and shuffled around a few times,
>> but isn't in use anywhere (either then or now).
>> Should I send out a PR to remove it (or am I missing some context)?
>
> I believe this knob became unused with invalidation of the prefix
> manipulation mechanism (including prefix or router renumbering, rfc2894)
> at KAME about 11 years ago. It was intended to schedule in6_rr_timer()
> callout every ip6_rr_prune seconds to check for expired prefixes and
> delete the associated addresses from interface.
> Last bits of old ipv6 prefix management stuff cleaned up in r231229.

Interesting. If I don't get any negative feedback, I'll roll it
into the patch better documenting the sysctls, which I was going to
submit via a PR.
Thanks!
-Garrett
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: Low Bandwidth on intercontinental connections

2012-11-21 Thread Andre Oppermann

On 21.11.2012 16:41, Marc Peters wrote:

Hi list,


-snip-

Doing some googling brought up a lot of tuning hints, but nothing worked
for us. We tweaked some sysctls:

kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_inc=16384
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.hostcache.expire=1


This doesn't help.  Please revert it to the default values.

--
Andre

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


LOR in rtsock/ifnet

2012-11-21 Thread Rui Paulo
I just started seeing this on r243286.

lock order reversal:
 1st 0xfe0001b40400 if_addr_lock (if_addr_lock) @ 
/usr/home/rpaulo/freebsd/head/sys/net/rtsock.c:1818
 2nd 0x80c693f8 ifnet_rw (ifnet_rw) @ 
/usr/home/rpaulo/freebsd/head/sys/net/if.c:241
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b
kdb_backtrace() at kdb_backtrace+0x39
witness_checkorder() at witness_checkorder+0xc37
__rw_rlock() at __rw_rlock+0x8c
ifnet_byindex() at ifnet_byindex+0x22
sa6_recoverscope() at sa6_recoverscope+0x7b
rt_msg2() at rt_msg2+0x1a2
sysctl_rtsock() at sysctl_rtsock+0x68c
sysctl_root() at sysctl_root+0x1d7
userland_sysctl() at userland_sysctl+0x192
sys___sysctl() at sys___sysctl+0x74
amd64_syscall() at amd64_syscall+0x265
Xfast_syscall() at Xfast_syscall+0xfb
--- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x8011813ea, rsp = 
0x7fffd408, rbp = 0x7fffd440 ---

Regards,
--
Rui Paulo

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


USB ethernet support - can't complete CD install

2012-11-21 Thread Chris
Greetings,
 I just attempted a CD install of RELENG_8 on an AMD box.
The install went nearly as expected, but upon rebooting into the new
install I ran into trouble -- cdce0 faking MAC address. CRAP! this'll
never work. My ISP leases the DHCP address by MAC, so now I won't be able
to attempt internet access for another ~24hrs.
I didn't think I'd need to parse the log during the install to copy the
MAC address for later use. As I see it, I'll need to maintain a _single_
"faked" MAC address. Can anyone assist me in effectively utilizing it?
Relevant info:
nVidia nForce2 USB 2.0 controller on ehci0
-- simply put: a USB cable runs from the USB port on the AMD box to the
USB port on the modem.

What is the proper usage in rc.conf(5)?
I'm attempting the following:
ifconfig_ue0="ether xx:xx:xx:xx:xx:xx DHCP"
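
(A manual sketch of the same idea, in case the rc.conf syntax turns out
not to accept both the MAC and DHCP in one line -- an untested assumption.
The MAC below is a locally administered placeholder, and the interface
name ue0 is assumed from the cdce attach:)

ifconfig ue0 ether 02:00:00:00:00:01
dhclient ue0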

Thank you for all your time, and consideration.

--Chris



___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: LOR in rtsock/ifnet

2012-11-21 Thread Adrian Chadd
I've started seeing this too.

We're both running nat/bridge gateways of some sort.


Adrian


On 21 November 2012 16:31, Rui Paulo  wrote:
> I just started seeing this on r243286.
>
> lock order reversal:
>  1st 0xfe0001b40400 if_addr_lock (if_addr_lock) @ 
> /usr/home/rpaulo/freebsd/head/sys/net/rtsock.c:1818
>  2nd 0x80c693f8 ifnet_rw (ifnet_rw) @ 
> /usr/home/rpaulo/freebsd/head/sys/net/if.c:241
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b
> kdb_backtrace() at kdb_backtrace+0x39
> witness_checkorder() at witness_checkorder+0xc37
> __rw_rlock() at __rw_rlock+0x8c
> ifnet_byindex() at ifnet_byindex+0x22
> sa6_recoverscope() at sa6_recoverscope+0x7b
> rt_msg2() at rt_msg2+0x1a2
> sysctl_rtsock() at sysctl_rtsock+0x68c
> sysctl_root() at sysctl_root+0x1d7
> userland_sysctl() at userland_sysctl+0x192
> sys___sysctl() at sys___sysctl+0x74
> amd64_syscall() at amd64_syscall+0x265
> Xfast_syscall() at Xfast_syscall+0xfb
> --- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x8011813ea, rsp = 
> 0x7fffd408, rbp = 0x7fffd440 ---
>
> Regards,
> --
> Rui Paulo
>
> ___
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"