Re: Poor high-PPS performance of the 10G ixgbe(9) NIC/driver in FreeBSD 10.1
I am laughing so hard that I had to open some windows to get more oxygen!

On Friday, August 14, 2015 1:30 PM, Maxim Sobolev sobo...@freebsd.org wrote: Hi guys, unfortunately no, neither reduction of the number of queues from 8 to 6 nor pinning the interrupt rate at 2 per queue has made any difference. The card still goes kaboom at about 200Kpps no matter what. In fact I've gone a bit further and, after the first spike, pushed the interrupt rate even further down to 1, but again no difference; it still blows up at the same mark. It did reduce the interrupt rate from 190K to some 130K according to systat -vm, so the moderation itself seems to be working fine. We will try disabling IXGBE_FDIR tomorrow and see if it helps. http://sobomax.sippysoft.com/ScreenShot391.png - systat -vm with max_interrupt_rate = 2 right before overload http://sobomax.sippysoft.com/ScreenShot392.png - systat -vm during issue unfolding (max_interrupt_rate = 1) http://sobomax.sippysoft.com/ScreenShot394.png - cpu/net monitoring, first two spikes are with max_interrupt_rate = 2, the third one with max_interrupt_rate = 1 -Max

On Wed, Aug 12, 2015 at 5:23 AM, Luigi Rizzo ri...@iet.unipi.it wrote: As I was telling Maxim, you should disable AIM because it only matches the max interrupt rate to the average packet size, which is the last thing you want. Setting the interrupt rate with sysctl (one per queue) gives you precise control over the max rate (and hence the extra latency). 20k interrupts/s give you 50us of latency, and the 2k slots in the queue are still enough to absorb a burst of min-sized frames hitting a single queue (the OS will start dropping long before that level, but that's another story). Cheers Luigi

On Wednesday, August 12, 2015, Babak Farrokhi farro...@freebsd.org wrote: I ran into the same problem with almost the same hardware (Intel X520) on 10-STABLE. HT/SMT is disabled and the cards are configured with 8 queues, with the same sysctl tunings as sobomax@ did. I am not using lagg, no FLOWTABLE. I experimented with pmcstat (RESOURCE_STALLS) a while ago and here [1] [2] you can see the results, including pmc output, callchain, flamegraph and gprof output. I am experiencing a huge number of interrupts with a 200kpps load:
# sysctl dev.ix | grep interrupt_rate
dev.ix.1.queue7.interrupt_rate: 125000
dev.ix.1.queue6.interrupt_rate: 6329
dev.ix.1.queue5.interrupt_rate: 50
dev.ix.1.queue4.interrupt_rate: 10
dev.ix.1.queue3.interrupt_rate: 5
dev.ix.1.queue2.interrupt_rate: 50
dev.ix.1.queue1.interrupt_rate: 50
dev.ix.1.queue0.interrupt_rate: 10
dev.ix.0.queue7.interrupt_rate: 50
dev.ix.0.queue6.interrupt_rate: 6097
dev.ix.0.queue5.interrupt_rate: 10204
dev.ix.0.queue4.interrupt_rate: 5208
dev.ix.0.queue3.interrupt_rate: 5208
dev.ix.0.queue2.interrupt_rate: 71428
dev.ix.0.queue1.interrupt_rate: 5494
dev.ix.0.queue0.interrupt_rate: 6250
[1] http://farrokhi.net/~farrokhi/pmc/6/ [2] http://farrokhi.net/~farrokhi/pmc/7/ Regards, Babak

Alexander V. Chernikov wrote: 12.08.2015, 02:28, Maxim Sobolev sobo...@freebsd.org: Olivier, keep in mind that we are not kernel forwarding packets, but app forwarding, i.e. the packet goes the full way net-kernel-recvfrom-app-sendto-kernel-net, which is why we have much lower PPS limits and which is why I think we are actually benefiting from the extra queues.
Single-thread sendto() in a loop is CPU-bound at about 220K PPS, and while running the test I am observing that outbound traffic from one thread is mapped into a specific queue (well, a pair of queues on two separate adaptors, due to the lagg load balancing action). The peak performance of that test is at 7 threads, which I believe corresponds to the number of queues. We have plenty of CPU cores in the box (24) with HTT/SMT disabled and one CPU mapped to each queue. This leaves us with at least 8 CPUs fully capable of running our app. If you look at the CPU utilization, we are at about 10% when the issue hits.

In any case, it would be great if you could provide some profiling info, since there could be plenty of problematic places, starting from TX ring contention to some locks inside UDP or even the (in)famous random entropy harvester... E.g. something like pmcstat -TS instructions -w1 might be sufficient to determine the reason.

ix0: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15 port 0x6020-0x603f mem 0xc7c0-0xc7df,0xc7e04000-0xc7e07fff irq 40 at device 0.0 on pci3
ix0: Using MSIX interrupts with 9 vectors
ix0: Bound queue 0 to cpu 0
ix0: Bound queue 1 to cpu 1
ix0: Bound queue 2 to cpu 2
ix0: Bound queue 3 to cpu 3
ix0: Bound queue 4 to cpu 4
ix0: Bound queue 5 to cpu 5
ix0: Bound queue 6 to cpu 6
ix0: Bound queue 7 to cpu 7
ix0: Ethernet address: 0c:c4:7a:5e:be:64
ix0: PCI Express Bus: Speed
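As a point of reference for the single-thread sendto() test described above, a minimal sketch of such a UDP blaster might look like the following. This is not Maxim's actual tool; the destination address, port and payload size are made-up placeholders.

/*
 * Minimal single-thread UDP sendto() loop, similar in spirit to the
 * test described above.  Destination address/port and payload size are
 * placeholders; adjust for your own lab setup.
 */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int
main(void)
{
	int s = socket(AF_INET, SOCK_DGRAM, 0);
	if (s < 0) {
		perror("socket");
		return (1);
	}

	struct sockaddr_in dst;
	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(5060);			/* placeholder port */
	inet_pton(AF_INET, "10.0.0.2", &dst.sin_addr);	/* placeholder address */

	char payload[64];				/* small, VoIP-like datagram */
	memset(payload, 'x', sizeof(payload));

	unsigned long sent = 0;
	time_t start = time(NULL);
	for (;;) {
		if (sendto(s, payload, sizeof(payload), 0,
		    (struct sockaddr *)&dst, sizeof(dst)) < 0)
			continue;	/* e.g. ENOBUFS under overload */
		if (++sent % 1000000 == 0) {
			long elapsed = (long)(time(NULL) - start);
			if (elapsed <= 0)
				elapsed = 1;
			printf("%lu packets in %ld s (~%ld pps)\n",
			    sent, elapsed, (long)(sent / (unsigned long)elapsed));
		}
	}
	/* NOTREACHED */
	close(s);
	return (0);
}

Running one copy per thread (and watching which NIC queue the flow lands on) reproduces the per-queue mapping behaviour Maxim describes.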
Re: Poor high-PPS performance of the 10G ixgbe(9) NIC/driver in FreeBSD 10.1
Also, using a slow-ass cpu like the Atom is completely absurd; first, no-one would ever use them. You have to test under 60% cpu usage, because as you get to higher cpu usage levels the lock contention increases exponentially. You're increasing lock contention by having more queues, so more queues at higher cpu % usage will perform increasingly badly as usage increases. You'd never run a system at 95% usage (i.e. totally hammering it) in real world usage, so why would you benchmark at such a high usage? Everything changes as available cpu becomes scarce. What the pps is at 50% cpu usage is a better question to ask than the one you're asking. BC

On Tuesday, August 11, 2015 9:29 PM, Barney Cordoba via freebsd-net freebsd-net@freebsd.org wrote: Wow, this is really important! If this is a college project, I give you a D. Maybe a D- because it's almost useless information. You ignore the most important aspect of performance. Efficiency is arguably the most important aspect of performance. 1M pps at 20% cpu usage is much better performance than 1.2M pps at 85%. Why don't any of you understand this simple thing? Why does spreading equally really matter, unless you are hitting a wall with your cpus? I don't care which cpu processes which packet. If you weren't doing moronic things like binding to a cpu, then you'd never have to care about distribution unless it was extremely unbalanced. BC

On Tuesday, August 11, 2015 7:15 PM, Olivier Cochard-Labbé oliv...@cochard.me wrote: On Tue, Aug 11, 2015 at 11:18 PM, Maxim Sobolev sobo...@freebsd.org wrote: Hi folks, Hi, We've been trying to migrate some of our high-PPS systems to new hardware that has four X540-AT2 10G NICs and observed that interrupt time goes through the roof after we cross around 200K PPS in and 200K out (two ports in LACP). The previous hardware was stable up to about 350K PPS in and 350K out. I believe the old one was equipped with the I350 and had an identical LACP configuration. The new box also has a better CPU with more cores (i.e. 24 cores vs. 16 cores before). The CPU itself is 2 x E5-2690 v3. 200K PPS, and even 350K PPS, are very low values indeed. On an Intel Xeon L5630 (4 cores only) with one X540-AT2 (thus 2 10-Gigabit ports) I've reached about 1.8Mpps (fastforwarding enabled) [1]. But my setup didn't use lagg(4): can you disable the lagg configuration and re-measure your performance without lagg? Do you let the Intel NIC drivers use 8 queues per port too? In my use case (forwarding the smallest UDP packet size), I obtained better behaviour by limiting the NIC queues to 4 (hw.ix.num_queues or hw.ixgbe.num_queues, I don't remember which) even though my system had 8 cores. And this with Gigabit Intel [2] or Chelsio NICs [3]. Don't forget to disable TSO and LRO too.
Regards, Olivier [1] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_an_ibm_system_x3550_m3_with_10-gigabit_intel_x540-at2#graphs [2] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_superserver_5018a-ftn4#graph1 [3] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_hp_proliant_dl360p_gen8_with_10-gigabit_with_10-gigabit_chelsio_t540-cr#reducing_nic_queues ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Exposing full 32bit RSS hash from card for ixgbe(4)
On Wednesday, August 5, 2015 4:28 PM, Kevin Oberman rkober...@gmail.com wrote: On Wed, Aug 5, 2015 at 7:10 AM, Barney Cordoba via freebsd-net freebsd-net@freebsd.org wrote: On Wednesday, August 5, 2015 2:19 AM, Olivier Cochard-Labbé oliv...@cochard.me wrote: On Wed, Aug 5, 2015 at 1:15 AM, Barney Cordoba via freebsd-net freebsd-net@freebsd.org wrote: What's the point of all of this gobbledygook anyway? Seriously, 99% of the world needs a driver that passes packets in the most efficient way, and every time I look at igb and ixgbe it has another 2 heads. It's up to 8 heads, and none of the things wrong with it have been fixed. This is now even uglier than Kip Macy's cxgb abortion. I'm not trying to be snarky here. I wrote a simple driver 3 years ago that runs and runs and uses little cpu; maybe 8% for a full gig load on an E3. Hi, I will be very happy to bench your simple driver. Where can I download the sources ? Thanks, Olivier ___ Another unproductive dick head on the FreeBSD team? Figures.

A typical Barney thread. First he calls the developers incompetent and says he has done better. Then someone who has experience in real world benchmarking (not a trivial thing) offers to evaluate Barney's code, and gets a quick, rude, obscene dismissal. Is it any wonder that, even though he made some valid arguments (at least for some workloads), almost everyone just dismisses him as too obnoxious to try to deal with? Based on my pre-retirement work with high-performance networking, in some cases it was clear that it would be better to lock things down to a single CPU with FreeBSD or Linux. I can further state that this was NOT true for all workloads, so it is quite possible that Barney's code works for some cases (perhaps his) and would be bad in others. But without good benchmarking, it's hard to tell. I will say that for large volume data transfers (very large flows), a single CPU solution does work best. But if Barney is going at this with his usual attitude, it's probably not worth it to continue the discussion. --

The "give us the source and we'll test it" nonsense is kindergarten stuff. As if my code is open source and you can just have it, and like you know how to benchmark anything since you can't even benchmark what you have. Some advice is to ignore guys like Oberman who spent their lives randomly pounding networks on slow machines with slow busses and bad NICs on OSes that couldn't do SMP properly. Because he'll just lead you down the road to dusty death. Multicore design isn't simple math; it's about efficiency, lock minimization and the understanding that shifting memory between cpus unnecessarily is costly. Today's CPUs and NICs can't be judged using test methods of the past. You'll just end up playing the Microsoft Windows game; get bigger machines and more memory and don't worry about the fact that the code is junk. It's just that the default in these drivers is so obviously wrong that it's mind-boggling. The argument to use 1, 2 or 4 queues is one worth having; using all of the cpus, including the hyperthreads, is just plain incompetent. I will contribute one possibly useful tidbit: disable_queue() only disables receive interrupts. Both tx and rx ints are effectively tied together by moderation so you'll just get an interrupt at the next slot anyway. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Exposing full 32bit RSS hash from card for ixgbe(4)
What's the point of all of this gobbledygook anyway? Seriously, 99% of the world needs a driver that passes packets in the most efficient way, and every time I look at igb and ixgbe it has another 2 heads. It's up to 8 heads, and none of the things wrong with it have been fixed. This is now even uglier than Kip Macy's cxgb abortion. I'm not trying to be snarky here. I wrote a simple driver 3 years ago that runs and runs and uses little cpu; maybe 8% for a full gig load on an E3. What is the benefit of implementing all of these stupid offload and RSS hashes? Spreading across cpus is incredibly inefficient; running 8 'queues' on a quad core cpu with hyperthreading is incredibly stupid. 1 cpu can easily handle a full gig, so why are you dirtying the code with 8000 features when it runs just fine without any of them? you're subjecting 1000s of users to constant instability (and fear in upgrading at all) for what amounts to a college science project. I know you haven't benchmarked it, so why are you doing it? hell, you added that buf_ring stuff without even making any determination that it was beneficial to use it, just because it was there. You're trying to steal a handful of cycles with these hokey features, and then you're losing buckets of cycles (maybe wheelbarrows) by unnecessarily spreading the processes across too many cpus. It just makes no sense at all. If you want to play, that's fine. But there should be simple I/O drivers for em, igb and ixgbe available as alternatives for the 99% of users who just want to run a router, a bridge/filter or a web server. Drivers that don't break features A and C when you make a change to Q and Z because you can't possibly test all 8000 features every time you do something. Im horrified that some poor schlub with a 1 gig webserver is losing half of his cpu power because of the ridiculous defaults in the igb driver. On Wednesday, July 15, 2015 2:01 PM, hiren panchasara hi...@freebsd.org wrote: On 07/14/15 at 02:18P, hiren panchasara wrote: On 07/14/15 at 12:38P, Eric Joyner wrote: Sorry for the delay; it looked fine to me, but I never got back to you. - Eric On Mon, Jul 13, 2015 at 3:16 PM Adrian Chadd adrian.ch...@gmail.com wrote: Hi, It's fine by me. Please do it! Thanks Adrian and Eric. Committed as r285528. FYI: I am planning to do a partial mfc of this to stable10. Here is the patch: https://people.freebsd.org/~hiren/patches/ix_expose_rss_hash_stable10.patch (I did the same for igb(4), r282831) Cheers, Hiren ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: netmap-ipfw on em0 em1
Are you NOT SHARP ENOUGH to understand that my proposal DOESN'T USE THE NETWORK STACK? OMFG Julian, perhaps if people weren't so hostile towards commercial companies providing ideas for alternative ways of doing things you'd get more input and more help. Why would I want to help these people? BC

On Monday, May 4, 2015 11:55 PM, Jim Thompson j...@netgate.com wrote: On May 4, 2015, at 10:07 PM, Julian Elischer jul...@freebsd.org wrote: Jim, and Barney. I hate to sound like a broken record, but we really need interested people in the network stack. The people who make the decisions about this are the people who stand up and say I have a few hours I can spend on this. If you were to do so too, then really, all these issues could be worked on. Get in there and help rather than standing on the bleachers and offering advice. There is no person working against you here. By my count the current active networking crew is about 10 people, with another 10 doing drivers. You would have a lot of sway in a group that small, but you have to be in it first, and the way to do that is to simply start doing stuff. No-one was ever sent an invitation. They just turned up.

I am (and we are) interested. I’m a bit short on time, and I have a project/product (pfSense) to maintain, so I keep other people busy on the stack. Examples include: We co-sponsored the AES-GCM work. Unfortunately, the process stopped before the IPsec work we did to leverage this made it upstream. As a partial remedy, gnn is currently evaluating all the patches from pfSense for inclusion into the FreeBSD mainline. I was involved in the work to replace the hash function used in pf. This is (only) a 3% minimum gain, more if you carry large state tables. There was a paper presented at AsiaBSDcon, so at least we have a methodology to speak about performance increases. (Is the methodology in the paper perfect? No. But at least it’s a stake in the ground.) We’re currently working with Intel to bring support for QuickAssist to FreeBSD. (Linux has it.) While that’s not ‘networking’ per se, the larger consumers of the technology are various components in the stack. The other flaws I pointed out are on the list of things for us to work on / fix. Someone might get there first, but … that’s good. I only care about getting things fixed. Jim p.s. yes, I'm working on a commit bit. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Fwd: netmap-ipfw on em0 em1
It's not faster than wedging into the if_input()s. It simply can't be. You're getting packets at interrupt time as soon as they're processed, there's no network stack involved, and you're able to receive and transmit without a process switch. At worst it's the same, without the extra plumbing. It's not rocket science to bypass the network stack. The only advantage of bringing it into user space would be that it's easier to write threaded handlers for complex uses; but not as a firewall (which is the limit of the context of my comment). You can do anything in the kernel that you can do in user space. The reason a kernel module with if_input() hooks is better is that you can use the standard kernel without all of the netmap hacks. You can just pop it into any kernel and it works. BC

On Sunday, May 3, 2015 2:13 PM, Luigi Rizzo ri...@iet.unipi.it wrote: On Sun, May 3, 2015 at 6:17 PM, Barney Cordoba via freebsd-net freebsd-net@freebsd.org wrote: Frankly I'm baffled by netmap. You can easily write a loadable kernel module that moves packets from 1 interface to another and hook in the firewall; why would you want to bring them up into user space? It's 1000s of lines of unnecessary code. Because it is much faster. The motivation for netmap-like solutions (that includes Intel's DPDK, PF_RING/DNA and several proprietary implementations) is speed: they bypass the entire network stack, and a good part of the device drivers, so you can access packets 10+ times faster. So things are actually the other way around: the 1000s of unnecessary lines of code (not really thousands, though) are those that you'd pay going through the standard network stack when you don't need any of its services. Going to userspace is just a side effect -- it turns out to be easier to develop and run your packet processing code in userspace, but there are netmap clients (e.g. the VALE software switch) which run entirely in the kernel. cheers luigi

On Sunday, May 3, 2015 3:10 AM, Raimundo Santos rait...@gmail.com wrote: Clarifying things for the sake of documentation: To use the host stack, append a ^ character after the name of the interface you want to use. (Info from netmap(4) shipped with FreeBSD 10.1 RELEASE.) Examples: kipfw em0 does nothing useful. kipfw netmap:em0 disconnects the NIC from the usual data path, i.e., there are no host communications. kipfw netmap:em0 netmap:em0^ or kipfw netmap:em0+ places the netmap-ipfw rules between the NIC and the host stack entry point associated (the IP addresses configured on it with ifconfig, ARP and RARP, etc...) with the same NIC.

On 10 November 2014 at 18:29, Evandro Nunes evandronune...@gmail.com wrote: dear professor luigi, i have some numbers, I am filtering 773Kpps with kipfw using 60% of CPU and system using the rest. This system is an 8-core at 2.4GHz, but only one core is in use in this next round of tests; my NIC is now an Avoton with the igb(4) driver, currently with 4 queues per NIC (total 8 queues for the kipfw bridge). i have read in your papers we should expect something similar to 1.48Mpps. How can I benefit from the other CPUs which are completely idle? I tried CPU affinity (cpuset) on kipfw, but system CPU usage follows userland kipfw, so I could not set one CPU to userland while another handles the system.

All the papers talk about *generating* lots of packets, not *processing* lots of packets. What this netmap example does is processing.
If someone really wants to use the host stack, the expected performance WILL BE worse - what's the point of using a host stack bypassing tool/framework if someone will end up using the host stack? And by generating, usually the papers means: minimum sized UDP packets. can you please enlighten? For everyone: read the manuals, read related and indicated materials (papers, web sites, etc), and, as a least resource, read the code. Within netmap's codes, it's more easy than it sounds. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org -- -+--- Prof. Luigi RIZZO, ri...@iet.unipi.it . Dip. di Ing. dell'Informazione http://www.iet.unipi.it/~luigi/ . Universita` di Pisa TEL +39-050-2217533 . via Diotisalvi 2 Mobile +39-338-6809875 . 56122 PISA (Italy) -+--- ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any
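To make the if_input() hook idea from the message above concrete, here is a rough sketch of a kernel module that saves and replaces an interface's if_input pointer. It only illustrates the technique being argued about; the interface name, the drop policy and the absent locking/teardown handling are placeholders, not production code.

/*
 * Rough sketch of an if_input() interposer as a FreeBSD kernel module.
 * "em0" and the length-based drop policy are placeholders, and real
 * code would need locking against interface departure; this only
 * shows the hook mechanism discussed in the thread.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/module.h>
#include <sys/socket.h>
#include <sys/mbuf.h>
#include <net/if.h>
#include <net/if_var.h>
#include <net/ethernet.h>

static struct ifnet *hook_ifp;
static void (*saved_if_input)(struct ifnet *, struct mbuf *);

static void
hook_if_input(struct ifnet *ifp, struct mbuf *m)
{
	/* Inspect/filter the frame here; pass survivors up the stack. */
	if (m->m_pkthdr.len < ETHER_HDR_LEN) {
		m_freem(m);		/* drop */
		return;
	}
	saved_if_input(ifp, m);		/* normal path */
}

static int
hook_modevent(module_t mod, int type, void *arg)
{
	switch (type) {
	case MOD_LOAD:
		hook_ifp = ifunit("em0");	/* placeholder interface */
		if (hook_ifp == NULL)
			return (ENXIO);
		saved_if_input = hook_ifp->if_input;
		hook_ifp->if_input = hook_if_input;
		return (0);
	case MOD_UNLOAD:
		if (hook_ifp != NULL)
			hook_ifp->if_input = saved_if_input;
		return (0);
	default:
		return (EOPNOTSUPP);
	}
}

static moduledata_t hook_mod = {
	"if_input_hook",
	hook_modevent,
	NULL
};

DECLARE_MODULE(if_input_hook, hook_mod, SI_SUB_PSEUDO, SI_ORDER_ANY);

Whether this ends up faster or slower than a netmap-based filter is exactly the disagreement in this thread; the sketch only shows that the hook itself is small.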
Re: Fwd: netmap-ipfw on em0 em1
Nothing freely available. Many commercial companies have done such things. Why limit the general community by force-feeding a really fast packet generator into the mainstream by squashing other ideas in their infancy? Anyone who understands how the kernel works understands what I'm saying. A packet forwarder is a 3 day project (which means 2 weeks as we all know). When you're can't debate the merits of an implementation without having some weenie ask if you have a finished implementation to offer up for free, you end up stuck with misguided junk like netgraph and flowtables. The mediocrity of freebsd network utilities is a function of the collective imagination of its users. Its unfortunate that these lists can't be used to brainstorm better potential better ideas. Luigi's efforts are not diminished by arguing that there is a better way to do something that he recommends to be done with netmap. BC On Monday, May 4, 2015 11:52 AM, Ian Smith smi...@nimnet.asn.au wrote: On Mon, 4 May 2015 15:29:13 +, Barney Cordoba via freebsd-net wrote: It's not faster than wedging into the if_input()s. It simply can't be. Your getting packets at interrupt time as soon as their processed and you there's no network stack involved, and your able to receive and transmit without a process switch. At worst it's the same, without the extra plumbing. It's not rocket science to bypass the network stack. The only advantage of bringing it into user space would be that it's easier to write threaded handlers for complex uses; but not as a firewall (which is the limit of the context of my comment). You can do anything in the kernel that you can do in user space. The reason a kernel module with if_input() hooks is better is that you can use the standard kernel without all of the netmap hacks. You can just pop it into any kernel and it works. Barney, do you have a working alternative implementation you can share with us to help put this silly inferior netmap thingy out of business? Thanks, Ian [I'm sorry, pine doesn't quote messages from some yahoo users properly:] On Sunday, May 3, 2015 2:13 PM, Luigi Rizzo ri...@iet.unipi.it wrote: On Sun, May 3, 2015 at 6:17 PM, Barney Cordoba via freebsd-net freebsd-net@freebsd.org wrote: Frankly I'm baffled by netmap. You can easily write a loadable kernel module that moves packets from 1 interface to another and hook in the firewall; why would you want to bring them up into user space? It's 1000s of lines of unnecessary code. Because it is much faster. The motivation for netmap-like solutions (that includes Intel's DPDK, PF_RING/DNA and several proprietary implementations) is speed: they bypass the entire network stack, and a good part of the device drivers, so you can access packets 10+ times faster. So things are actually the other way around: the 1000's of unnecessary lines of code (not really thousands, though) are those that you'd pay going through the standard network stack when you don't need any of its services. Going to userspace is just a side effect -- turns out to be easier to develop and run your packet processing code in userspace, but there are netmap clients (e.g. the VALE software switch) which run entirely in the kernel. cheers luigi On Sunday, May 3, 2015 3:10 AM, Raimundo Santos rait...@gmail.com wrote: Clarifying things for the sake of documentation: To use the host stack, append a ^ character after the name of the interface you want to use. (Info from netmap(4) shipped with FreeBSD 10.1 RELEASE.) Examples: kipfw em0 does nothing useful. 
kipfw netmap:em0 disconnects the NIC from the usual data path, i.e., there are no host communications. kipfw netmap:em0 netmap:em0^ or kipfw netmap:em0+ places the netmap-ipfw rules between the NIC and the host stack entry point associated (the IP addresses configured on it with ifconfig, ARP and RARP, etc...) with the same NIC. On 10 November 2014 at 18:29, Evandro Nunes evandronune...@gmail.com wrote: dear professor luigi, i have some numbers, I am filtering 773Kpps with kipfw using 60% of CPU and system using the rest, this system is a 8core at 2.4Ghz, but only one core is in use in this next round of tests, my NIC is now an avoton with igb(4) driver, currently with 4 queues per NIC (total 8 queues for kipfw bridge) i have read in your papers we should expect something similar to 1.48Mpps how can I benefit from the other CPUs which are completely idle? I tried CPU Affinity (cpuset) kipfw but system CPU usage follows userland kipfw so I could not set one CPU to userland while other for system All the papers talk about *generating* lots of packets, not *processing* lots of packets. What this netmap example does is processing. If someone really wants to use the host stack, the expected performance WILL BE worse - what's the point of using a host stack bypassing tool/framework if someone
Re: netmap-ipfw on em0 em1
I'll assume you're just not that clear on specific implementation. Hooking directly into if_input() bypasses all of the cruft. It basically uses the driver as-is, so any driver can be used and it will be as good as the driver. The bloat starts in if_ethersubr.c, which is easily completely avoided. Most drivers need to be tuned (or modified a bit) as most freebsd drivers are full of bloat and forced into a bad, cookie-cutter type way of doing things. The problem with doing things in user space is that user space is unpredictable. Things work just dandily when nothing else is going on, but you can't control when a user space program gets context under heavy loads. In the kernel you can control almost exactly what the polling interval is through interrupt moderation on most modern controllers. Many otherwise credible programmers argued for years that polling was faster, but it was only faster in artificially controlled environment. Its mainly because 1) they're not thinking about the entire context of what can happen, and 2) because they test under unrealistic conditions that don't represent real world events, and 3) they don't have properly tuned ethernet drivers. BC On Monday, May 4, 2015 12:37 PM, Jim Thompson j...@netgate.com wrote: While it is a true statement that, You can do anything in the kernel that you can do in user space.”, it is not a helpful statement. Yes, the kernel is just a program. In a similar way, “You can just pop it into any kernel and it works.” is also not helpful. It works, but it doesn’t work well, because of other infrastructure issues. Both of your statements reduce to the age-old, “proof is left as an exercise for the student”. There is a lot of kernel infrastructure that is just plain crusty(*) and which directly impedes performance in this area. But there is plenty of cruft, Barney. Here are two threads which are three years old, with the issues it points out still unresolved, and multiple places where 100ns or more is lost: https://lists.freebsd.org/pipermail/freebsd-current/2012-April/033287.html https://lists.freebsd.org/pipermail/freebsd-current/2012-April/033351.html 100ns is death at 10Gbps with min-sized packets. quoting: http://luca.ntop.org/10g.pdf --- Taking as a reference a 10 Gbit/s link, the raw throughput is well below the memory bandwidth of modern systems (between 6 and 8 GBytes/s for CPU to memory, up to 5 GBytes/s on PCI-Express x16). How- ever a 10Gbit/s link can generate up to 14.88 million Packets Per Second (pps), which means that the system must be able to process one packet every 67.2 ns. This translates to about 200 clock cycles even for the faster CPUs, and might be a challenge considering the per- packet overheads normally involved by general-purpose operating systems. The use of large frames reduces the pps rate by a factor of 20..50, which is great on end hosts only concerned in bulk data transfer. Monitoring systems and traffic generators, however, must be able to deal with worst case conditions.” Forwarding and filtering must also be able to deal with worst case, and nobody does well with kernel-based networking here. https://github.com/gvnn3/netperf/blob/master/Documentation/Papers/ABSDCon2015Paper.pdf 10Gbps NICs are $200-$300 today, and they’ll be included on the motherboard during the next hardware refresh. Broadwell-DE (Xeon-D) has 10G in the SoC, and others are coming. 10Gbps switches can be had at around $100/port. This is exactly the point at which the adoption curve for 1Gbps Ethernet ramped over a decade ago. 
(*) A few more simple examples of cruft: Why, in 2015 does the kernel have a ‘fast forwarding’ option, and worse, one that isn’t enabled by default? Shouldn’t “fast forwarding be the default? Why, in 2015, does FreeBSD not ship with IPSEC enabled in GENERIC? (Reason: each and every time this has come up in recent memory, someone has pointed out that it impacts performance. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=128030) Why, in 2015, does anyone think it’s acceptable for “fast forwarding” to break IPSEC? Why, in 2015, does anyone think it’s acceptable that the setkey(8) man page documents, of all things, DES-CBC and HMAC-MD5 for a SA? That’s some kind of sick joke, right? This completely flies in the face of RFC 4835. On May 4, 2015, at 10:29 AM, Barney Cordoba via freebsd-net freebsd-net@freebsd.org wrote: It's not faster than wedging into the if_input()s. It simply can't be. Your getting packets at interrupt time as soon as their processed and you there's no network stack involved, and your able to receive and transmit without a process switch. At worst it's the same, without the extra plumbing. It's not rocket science to bypass the network stack. The only advantage of bringing it into user space would be that it's easier to write threaded handlers for complex uses; but not as a firewall (which is the limit
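For readers who want to check the arithmetic quoted above: a minimum-size Ethernet frame occupies 64 bytes plus 8 bytes of preamble and 12 bytes of inter-frame gap, i.e. 84 bytes or 672 bits on the wire, so a 10 Gbit/s link carries at most 10^10 / 672, about 14.88 million frames per second, which is one frame every 1 / 14.88e6, about 67.2 ns, or roughly 200 cycles on a ~3 GHz core.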
Re: Fwd: netmap-ipfw on em0 em1
Frankly I'm baffled by netmap. You can easily write a loadable kernel module that moves packets from 1 interface to another and hook in the firewall; why would you want to bring them up into user space? It's 1000s of lines of unnecessary code. On Sunday, May 3, 2015 3:10 AM, Raimundo Santos rait...@gmail.com wrote: Clarifying things for the sake of documentation: To use the host stack, append a ^ character after the name of the interface you want to use. (Info from netmap(4) shipped with FreeBSD 10.1 RELEASE.) Examples: kipfw em0 does nothing useful. kipfw netmap:em0 disconnects the NIC from the usual data path, i.e., there are no host communications. kipfw netmap:em0 netmap:em0^ or kipfw netmap:em0+ places the netmap-ipfw rules between the NIC and the host stack entry point associated (the IP addresses configured on it with ifconfig, ARP and RARP, etc...) with the same NIC. On 10 November 2014 at 18:29, Evandro Nunes evandronune...@gmail.com wrote: dear professor luigi, i have some numbers, I am filtering 773Kpps with kipfw using 60% of CPU and system using the rest, this system is a 8core at 2.4Ghz, but only one core is in use in this next round of tests, my NIC is now an avoton with igb(4) driver, currently with 4 queues per NIC (total 8 queues for kipfw bridge) i have read in your papers we should expect something similar to 1.48Mpps how can I benefit from the other CPUs which are completely idle? I tried CPU Affinity (cpuset) kipfw but system CPU usage follows userland kipfw so I could not set one CPU to userland while other for system All the papers talk about *generating* lots of packets, not *processing* lots of packets. What this netmap example does is processing. If someone really wants to use the host stack, the expected performance WILL BE worse - what's the point of using a host stack bypassing tool/framework if someone will end up using the host stack? And by generating, usually the papers means: minimum sized UDP packets. can you please enlighten? For everyone: read the manuals, read related and indicated materials (papers, web sites, etc), and, as a least resource, read the code. Within netmap's codes, it's more easy than it sounds. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Intel Support for FreeBSD
Ok. It was a lot more convenient when it was a standalone module/tarball so you didn't have to surgically extract it from the tree and spend a week trying to get it to compile with whatever version you happened to be running. So if you're running 9.1 or 9.2 you could still use it seamlessly. Negative Progress is inevitable. BC On Tuesday, August 12, 2014 9:57 PM, Mike Tancsa m...@sentex.net wrote: On 8/12/2014 9:16 PM, Barney Cordoba via freebsd-net wrote: I notice that there hasn't been an update in the Intel Download Center since July. Is there no official support for 10? Hi, The latest code is committed directly into the tree by Intel eg http://lists.freebsd.org/pipermail/svn-src-head/2014-July/060947.html and http://lists.freebsd.org/pipermail/svn-src-head/2014-June/059904.html They have been MFC'd to RELENG_10 a few weeks ago ---Mike -- --- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, m...@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/ ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Intel Support for FreeBSD
It's not an either/or. Until last July there was both. Like F'ing Intel isn't making enough money to pay someone to maintain a FreeBSD version. On Wednesday, August 13, 2014 2:24 PM, Jim Thompson j...@netgate.com wrote: On Aug 13, 2014, at 8:24, Barney Cordoba via freebsd-net freebsd-net@freebsd.org wrote: Negative Progress is inevitable. Many here undoubtedly consider the referenced effort to be the opposite. Jim ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Intel Support for FreeBSD
This kind of stupidity really irritates me. The commercial use of FreeBSD is the only reason that there is a project, and anyone with 1/2 a brain knows that companies with products based on freebsd can't just upgrade their tree every time some geek gets around to writing a patch. Maybe its the reason that linux sucks but everyone uses it? 10 years later, some old brain dead mentality. On Wednesday, August 13, 2014 2:49 PM, John-Mark Gurney j...@funkthat.com wrote: Barney Cordoba via freebsd-net wrote this message on Wed, Aug 13, 2014 at 06:24 -0700: Ok. It was a lot more convenient when it was a standalone module/tarball so you didn't have to surgically extract it from the tree and spend a week trying to get it to compile with whatever version you happened to be running. So if you're running 9.1 or 9.2 you could still use it seamlessly. Negative Progress is inevitable. The problem is that you are using an old version of FreeBSD that only provides security update... The correct solution is to update your machines... I'd much rather have Intel support it in tree, meaning that supported versions of FreeBSD have an up to date driver, than to cater to your wants of using older releases of FreeBSD... Thanks. On Tuesday, August 12, 2014 9:57 PM, Mike Tancsa m...@sentex.net wrote: On 8/12/2014 9:16 PM, Barney Cordoba via freebsd-net wrote: I notice that there hasn't been an update in the Intel Download Center since July. Is there no official support for 10? Hi, The latest code is committed directly into the tree by Intel eg http://lists.freebsd.org/pipermail/svn-src-head/2014-July/060947.html and http://lists.freebsd.org/pipermail/svn-src-head/2014-June/059904.html They have been MFC'd to RELENG_10 a few weeks ago -- John-Mark Gurney Voice: +1 415 225 5579 All that I will do, has been done, All that I have, has not. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Intel Support for FreeBSD
I notice that there hasn't been an update in the Intel Download Center since July. Is there no official support for 10? We liked to use the Intel stuff as an alternative to the latest freebsd code, but it doesn't compile. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: TSO help or hindrance ? (was Re: TSO and FreeBSD vs Linux)
Didn't read down far enough. There are def issues and gains are probably mostly in a lab with 9k frames. Turn it off. CPUs and buses are fast. BC On Sep 10, 2013, at 6:52 PM, Mike Tancsa m...@sentex.net wrote: On 9/10/2013 6:42 PM, Barney Cordoba wrote: NFS has been broken since Day 1, so lets not come to conclusions about anything as it relates to NFS. iSCSI is NFS ? ---Mike BC *From:* Mike Tancsa m...@sentex.net *To:* Rick Macklem rmack...@uoguelph.ca *Cc:* FreeBSD Net n...@freebsd.org; David Wolfskill da...@catwhisker.org *Sent:* Wednesday, September 4, 2013 11:26 AM *Subject:* TSO help or hindrance ? (was Re: TSO and FreeBSD vs Linux) On 9/4/2013 8:50 AM, Rick Macklem wrote: David Wolfskill wrote: I noticed that when I tried to write files to NFS, I could write small files OK, but larger ones seemed to ... hang. * ifconfig -v em0 showed flags TSO4 VLAN_HWTSO turned on. * sysctl net.inet.tcp.tso showed 1 -- enabled. As soon as I issued sudo net.inet.tcp.tso=0 ... the copy worked without a hitch or a whine. And I was able to copy all 117709618 bytes, not just 2097152 (2^21). Is the above expected? It came rather as a surprise to me. Not surprising to me, I'm afraid. When there are serious NFS problems like this, it is often caused by a network fabric issue and broken TSO is at the top of the list w.r.t. cause. I was just experimenting a bit with iSCSI via FreeNAS and was a little disappointed at the speeds I was getting. So, I tried disabling tso on both boxes and it did seem to speed things up a bit. Data and testing methods attached in a txt file. I did 3 cases. Just boot up FreeNAS and the initiator without tweaks. That had the worst performance. disable tso on the nic as well as via sysctl on both boxes. That had the best performance. re-enable tso on both boxes. That had better performance than the first case, but still not as good as totally disabling it. I am guessing something is not quite being re-enabled properly ? But its different than the other two cases ?!? tgt is FreeNAS-9.1.1-RELEASE-x64 (a752d35) and initiator is r254328 9.2 AMD64 The FreeNAS box has 16G of RAM, so the file is being served out of cache as gstat shows no activity when sending out the file ---Mike -- --- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, m...@sentex.net mailto:m...@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/ ___ freebsd-net@freebsd.org mailto:freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org mailto:freebsd-net-unsubscr...@freebsd.org -- --- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, m...@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/ ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Flow ID, LACP, and igb
Are you using a pcie3 bus? Of course this is only an issue for 10g; what pct of FreeBSD users have a load over 9.5Gb/s? It's completely unnecessary for igb or em driver, so why is it used? because it's there. Here's my argument against it. The handful of brains capable of doing driver development become consumed with BS like LRO and the things that need to be fixed, like buffer management and basic driver design flaws, never get fixed. The offload code makes the driver code a virtual mess that can only be maintained by Jack and 1 other guy in the entire world. And it takes 10 times longer to make a simple change or to add support for a new NIC. In a week I ripped out the offload crap and the 9000 sysctls, eliminated the consumer buffer problem, reduced locking by 40% and now the igb driver uses 20% less cpu with a full gig load. And the code is cleaner and more easily maintained. BC From: Adrian Chadd adr...@freebsd.org To: Barney Cordoba barney_cord...@yahoo.com Cc: Andre Oppermann an...@freebsd.org; Alan Somers asom...@freebsd.org; n...@freebsd.org n...@freebsd.org; Jack F Vogel j...@freebsd.org; Justin T. Gibbs gi...@freebsd.org; Luigi Rizzo ri...@iet.unipi.it; T.C. Gubatayao tgubata...@barracuda.com Sent: Sunday, September 1, 2013 4:51 PM Subject: Re: Flow ID, LACP, and igb Yo, LRO is an interesting hack that seems to do a good trick of hiding the ridiculous locking and unfriendly cache behaviour that we do per-packet. It helps with LAN test traffic where things are going out in batches from the TCP layer so the RX layer sees these frames in-order and can do LRO. When you disable it, I don't easily get 10GE LAN TCP performance. That has to be fixed. Given how fast the CPU cores, bus interconnect and memory interconnects are, I don't think there should be any reason why we can't hit 10GE traffic on a LAN with LRO disabled (in both software and hardware.) Now that I have the PMC sandy bridge stuff working right (but no PEBS, I have to talk to Intel about that in a bit more detail before I think about hacking that in) we can get actual live information about this stuff. But the last time I looked, there's just too much per-packet latency going on. The root cause looks like it's a toss up between scheduling, locking and just lots of code running to completion per-frame. As I said, that all has to die somehow. 2c, -adrian On 1 September 2013 08:45, Barney Cordoba barney_cord...@yahoo.com wrote: Comcast sends packets OOO. With any decent number of internet hops you're likely to encounter a load balancer or packet shaper that sends packets OOO, so you just can't be worried about it. In fact, your designs MUST work with OOO packets. Getting balance on your load balanced lines is certainly a bigger upside than the additional CPU used. You can buy a faster processor for your stack for a lot less than you can buy bandwidth. Frankly my opinion of LRO is that it's a science project suitable for labs only. It's a trick to get more bandwidth than your bus capacity; the answer is to not run PCIe2 if you need pcie3. You can use it internally if you have control of all of the machines. When I modify a driver the first thing that I do is rip it out. BC From: Luigi Rizzo ri...@iet.unipi.it To: Barney Cordoba barney_cord...@yahoo.com Cc: Andre Oppermann an...@freebsd.org; Alan Somers asom...@freebsd.org; n...@freebsd.org n...@freebsd.org; Jack F Vogel j...@freebsd.org; Justin T. Gibbs gi...@freebsd.org; T.C. 
Gubatayao tgubata...@barracuda.com Sent: Saturday, August 31, 2013 10:27 PM Subject: Re: Flow ID, LACP, and igb On Sun, Sep 1, 2013 at 4:15 AM, Barney Cordoba barney_cord...@yahoo.com wrote: ... [your point on testing with realistic assumptions is surely a valid one] Of course there's nothing really wrong with OOO packets. We had this discussion before; lots of people have round robin dual homing without any ill effects. It's just not an issue. It depends on where you are. It may not be an issue if the reordering is not large enough to trigger retransmissions, but even then it is annoying as it causes more work in the endpoint -- it prevents LRO from working, and even on the host stack it takes more work to sort where an out of order segment goes than appending an in-order one to the socket buffer. cheers luigi ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo
Re: Flow ID, LACP, and igb
May I express my glee and astonishment that you're debating the use of complicated hash functions for something that's likely to have from 2-8 slots? Also, the *most* important thing is distribution with realistic data. The goal should be to use the most trivial function that gives the most balanced distribution with real numbers. Faster is not better if the result is an unbalanced distribution. Many of your ports will be 80 and 53, and if you're going through a router your ethernets may not be very unique, so why even bother to include them? Does getting a good distribution require that you hash every element individually, or can you get the same distribution with a faster, simpler way of creating the seed? There's also the other consideration of packet size. Packets on port 53 are likely to be smaller than packets on port 80. What you want is equal distribution PER PORT on the ports that will carry the vast majority of your traffic. When designing efficient systems, you must not assume that ports and IPs are random, because they're not. 99% of your load will be on a small number of destination ports and a limited range of source ports. For a web server application, getting a perfect distribution on the http ports is most crucial. The hash function in if_lagg.c looks like more of a classroom exercise than a practical implementation.

If you're going to consider 100M iterations, consider that much of the time is wasted parsing the packet (again). Why not add a simple sysctl that enables a hash that is created in the ip parser, when all of the pieces are available without having to re-parse the mbuf? Or better yet, use the same number of queues on igb as you have LAGG ports, and use the queue id (or RSS) as the hash, so that your traffic is sync'd between the ethernet adapter queues and the LAGG ports. The card has already done the work for you. BC

From: Luigi Rizzo ri...@iet.unipi.it To: Alan Somers asom...@freebsd.org Cc: Jack F Vogel j...@freebsd.org; n...@freebsd.org n...@freebsd.org; Justin T. Gibbs gi...@freebsd.org; Andre Oppermann an...@freebsd.org; T.C. Gubatayao tgubata...@barracuda.com Sent: Friday, August 30, 2013 8:04 PM Subject: Re: Flow ID, LACP, and igb

Alan, On Thu, Aug 29, 2013 at 6:45 PM, Alan Somers asom...@freebsd.org wrote: ... I pulled all four hash functions out into userland and microbenchmarked them. The upshot is that hash32 and fnv_hash are the fastest, jenkins_hash is slower, and siphash24 is the slowest. Also, Clang resulted in much faster code than gcc. I missed this part of your message, but if I read your code well, you are running 100M iterations and the numbers below are in seconds, so if you multiply the numbers by 10 you have the cost per hash in nanoseconds. What CPU did you use for your tests? Also some of the numbers (FNV and hash32) are suspiciously low. I believe that the compilers (both of them) have figured out that everything is constant in these functions, and fnv_32_buf() and hash32_buf() are inline, hence they can be optimized to just return a constant. This does not happen for siphash and jenkins because they are defined externally. Can you please re-run the tests in a way that defeats the optimization? (e.g. pass a non-constant argument to the hashes so you actually need to run the code).
cheers luigi http://people.freebsd.org/~asomers/lagg_hash/ [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash-gcc-4.8 FNV: 0.76 hash32: 1.18 SipHash24: 44.39 Jenkins: 6.20 [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash-gcc-4.2.1 FNV: 0.74 hash32: 1.35 SipHash24: 55.25 Jenkins: 7.37 [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash.clang-3.3 FNV: 0.30 hash32: 0.30 SipHash24: 55.97 Jenkins: 6.45 [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash.clang-3.2 FNV: 0.30 hash32: 0.30 SipHash24: 44.52 Jenkins: 6.48 T.C. [1] http://svnweb.freebsd.org/base/head/sys/libkern/jenkins_hash.c?view=markup ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
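To make Luigi's request concrete, here is a minimal userland sketch of a benchmark loop that defeats constant folding: the input buffer is seeded from argv, mutated every iteration, and the results are chained and printed so the calls cannot be optimized away. The fnv1a_32() routine below is only a plain FNV-1a stand-in for whichever kernel hash (fnv_32_buf(), hash32_buf(), ...) is being timed; it is not the harness from the thread.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Plain FNV-1a, used here only as a stand-in for the hash under test. */
static uint32_t
fnv1a_32(const unsigned char *buf, size_t len, uint32_t seed)
{
	uint32_t h = seed ? seed : 2166136261u;	/* FNV offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;			/* FNV prime */
	}
	return (h);
}

int
main(int argc, char **argv)
{
	unsigned char buf[36];		/* roughly a 5-tuple worth of bytes */
	uint32_t acc = 0;
	long i, iters = 100000000L;	/* 100M iterations, as in the thread */

	/* Seed the buffer from argv so its contents are unknown at compile time. */
	memset(buf, 0, sizeof(buf));
	if (argc > 1)
		strncpy((char *)buf, argv[1], sizeof(buf) - 1);

	for (i = 0; i < iters; i++) {
		buf[0] = (unsigned char)i;		/* vary the input each pass */
		acc ^= fnv1a_32(buf, sizeof(buf), acc);	/* chain results */
	}

	/* Observable side effect defeats dead-code elimination. */
	printf("%u\n", (unsigned)acc);
	return (0);
}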
Re: Flow ID, LACP, and igb
And another thing; the use of modulo is very expensive when the number of ports used in LAGG is *usually* a power of 2. foo & (SLOTS-1) is a lot faster than (foo % SLOTS). if (SLOTS == 2 || SLOTS == 4 || SLOTS == 8) hash = hash & (SLOTS-1); else hash = hash % SLOTS; is more than twice as fast as hash % SLOTS; BC From: Luigi Rizzo ri...@iet.unipi.it To: Alan Somers asom...@freebsd.org Cc: Jack F Vogel j...@freebsd.org; n...@freebsd.org n...@freebsd.org; Justin T. Gibbs gi...@freebsd.org; Andre Oppermann an...@freebsd.org; T.C. Gubatayao tgubata...@barracuda.com Sent: Friday, August 30, 2013 8:04 PM Subject: Re: Flow ID, LACP, and igb Alan, On Thu, Aug 29, 2013 at 6:45 PM, Alan Somers asom...@freebsd.org wrote: ... I pulled all four hash functions out into userland and microbenchmarked them. The upshot is that hash32 and fnv_hash are the fastest, jenkins_hash is slower, and siphash24 is the slowest. Also, Clang resulted in much faster code than gcc. i missed this part of your message, but if i read your code well, you are running 100M iterations and the numbers below are in seconds, so if you multiply the numbers by 10 you have the cost per hash in nanoseconds. What CPU did you use for your tests? Also some of the numbers (FNV and hash32) are suspiciously low. I believe that the compilers (both of them) have figured out that everything is constant in these functions, and fnv_32_buf() and hash32_buf() are inline, hence they can be optimized to just return a constant. This does not happen for siphash and jenkins because they are defined externally. Can you please re-run the tests in a way that defeats the optimization? (e.g. pass a non-constant argument to the hashes so you actually need to run the code). cheers luigi http://people.freebsd.org/~asomers/lagg_hash/ [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash-gcc-4.8 FNV: 0.76 hash32: 1.18 SipHash24: 44.39 Jenkins: 6.20 [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash-gcc-4.2.1 FNV: 0.74 hash32: 1.35 SipHash24: 55.25 Jenkins: 7.37 [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash.clang-3.3 FNV: 0.30 hash32: 0.30 SipHash24: 55.97 Jenkins: 6.45 [root@sm4u-4 /usr/home/alans/ctest/lagg_hash]# ./lagg_hash.clang-3.2 FNV: 0.30 hash32: 0.30 SipHash24: 44.52 Jenkins: 6.48 T.C. [1] http://svnweb.freebsd.org/base/head/sys/libkern/jenkins_hash.c?view=markup ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
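A compilable version of the same trick, with the power-of-two test done at run time rather than spelled out per slot count; the function name is illustrative only:

#include <assert.h>
#include <stdint.h>

/*
 * For a power-of-two number of slots, hash & (SLOTS - 1) selects the
 * same slot as hash % SLOTS without the divide.  A value n is a power
 * of two exactly when it has a single bit set, i.e. (n & (n - 1)) == 0.
 */
static inline uint32_t
select_slot(uint32_t hash, uint32_t slots)
{
	assert(slots > 0);

	if ((slots & (slots - 1)) == 0)
		return (hash & (slots - 1));	/* cheap mask */
	return (hash % slots);			/* fall back to modulo */
}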
Re: Intel 4-port ethernet adaptor link aggregation issue
That's way too high. Your base rx requirement is Ports * queues * rxd. With a quad card you shouldn't be using more than 2 queues, so your requirement with 5 ports is 10,240 just for the receive setup. If you're using 4 queues that number doubles, which would make 25,600 not enough. Note that setting mbufs to a huge number doesn't allocate the buffers; they'll be allocated as needed. It's a ceiling. The reason for the ceiling is so that you don't blow up your memory. If your system is using 2 million mbuf clusters then you have much bigger problems than LAGG. Anyone who recommends 2 million clearly has no idea what they're doing. BC From: Joe Moog joem...@ebureau.com To: freebsd-net freebsd-net@freebsd.org Sent: Wednesday, August 28, 2013 9:36 AM Subject: Re: Intel 4-port ethernet adaptor link aggregation issue All: Thanks again to everybody for the responses and suggestions to our 4-port lagg issue. The solution (for those that may find the information of some value) was to set the value for kern.ipc.nmbclusters to a higher value than we had initially. Our previous tuning had this value set at 25600, but following a recommendation from the good folks at iXSystems we bumped this to a value closer to 2 million, and the 4-port lagg is functioning as expected now. Thank you all. Joe ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
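The arithmetic behind Ports * queues * rxd, as a small stand-alone program; the port, queue, and descriptor counts are the examples used in the message, not values read from the driver:

#include <stdio.h>

int
main(void)
{
	int ports = 5;		/* the five ports in the arithmetic above */
	int queues = 2;		/* queues per port */
	int rxd = 1024;		/* RX descriptors per queue */

	/* Clusters pinned by pre-filled RX rings alone. */
	int rx_clusters = ports * queues * rxd;

	printf("baseline RX clusters (2 queues): %d\n", rx_clusters);	/* 10240 */
	printf("with 4 queues:                   %d\n", ports * 4 * rxd);	/* 20480 */
	return (0);
}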
Re: Flow ID, LACP, and igb
No, no. The entire point of the hash is to separate the connections. But when testing you should use realistic assumptions. You're not splitting packets, so the big packets will mess up your distribution if you don't get it right. Of course there's nothing really wrong with OOO packets. We had this discussion before; lots of people have round robin dual homing without any ill effects. It's just not an issue. BC From: T.C. Gubatayao tgubata...@barracuda.com To: Barney Cordoba barney_cord...@yahoo.com; Luigi Rizzo ri...@iet.unipi.it; Alan Somers asom...@freebsd.org Cc: Jack F Vogel j...@freebsd.org; Justin T. Gibbs gi...@freebsd.org; Andre Oppermann an...@freebsd.org; n...@freebsd.org n...@freebsd.org Sent: Saturday, August 31, 2013 9:38 PM Subject: RE: Flow ID, LACP, and igb On Sat, Aug 31, 2013 at 8:41 AM, Barney Cordoba barney_cord...@yahoo.com wrote: Also, the *most* important thing is distribution with realistic data. The goal should be to use the most trivial function that gives the most balanced distribution with real numbers. Faster is not better if the result is an unbalanced distribution. Agreed, with a caveat. It's critical that this distribution be by flow, so that out of order packet delivery is minimized. Many of your ports will be 80 and 53, and if you're going through a router your ethernets may not be very unique, so why even bother to include them? Does getting a good distribution require that you hash every element individually, or can you get the same distribution with a faster, simpler way of creating the seed? There's also the other consideration of packet size. Packets on port 53 are likely to be smaller than packets on port 80. What you want is equal distribution PER PORT on the ports that will carry that vast majority of your traffic. Unfortunately, trying to evenly distribute traffic per port based on packet size will likely result in the reordering of packets, and bandwidth wasted on TCP retransmissions. Or better yet, use the same number of queues on igb as you have LAGG ports, and use the queue id (or RSS) as the hash, so that your traffic is sync'd between the ethernet adapter queues and the LAGG ports. The card has already done the work for you. Isn't this hash for selecting an outbound link? The ingress adapter hash (RSS) won't help for packets originating from the host, or for packets that may have been translated or otherwise modified while traversing the stack. T.C. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
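For illustration, a minimal per-flow hash of the kind being argued for, assuming the IP parser has already produced the 5-tuple fields; the structure and names are hypothetical and are not taken from if_lagg.c:

#include <stdint.h>

/* Pre-parsed flow fields, as the IP parser would hand them over. */
struct flow_tuple {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
};

static inline uint32_t
trivial_flow_hash(const struct flow_tuple *ft)
{
	uint32_t h;

	/* Fold the tuple; ports share one word to keep it cheap. */
	h = ft->src_ip ^ ft->dst_ip;
	h ^= ((uint32_t)ft->src_port << 16) | ft->dst_port;

	/* One multiply to spread the low bits before masking. */
	h *= 0x9e3779b1u;	/* golden-ratio constant */
	return (h);
}

/* Pick one of a power-of-two number of lagg ports with a mask, not modulo. */
static inline int
pick_port(const struct flow_tuple *ft, int nports_pow2)
{
	return (trivial_flow_hash(ft) & (nports_pow2 - 1));
}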
Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
From: Andre Oppermann an...@freebsd.org To: Adrian Chadd adr...@freebsd.org Cc: Barney Cordoba barney_cord...@yahoo.com; Luigi Rizzo ri...@iet.unipi.it; freebsd-net@freebsd.org freebsd-net@freebsd.org Sent: Wednesday, August 21, 2013 2:19 PM Subject: Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux) On 18.08.2013 23:54, Adrian Chadd wrote: Hi, I think the UNIX architecture is a bit broken for anything other than the occasional (for various traffic levels defining occasional!) traffic connection. It's serving us well purely through the sheer force of will of modern CPU power but I think we can do a lot better. I do not agree with you here. The UNIX architecture is fine but of course as with anything you're not going to get the full raw and theoretically possible performance for every special case out of it. It is extremely versatile and performs rather good over a broad set of applications. _I_ think the correct model is a netmap model - batched packet handling, lightweight drivers pushing and pulling batches of things, with some lightweight plugins to service that inside the kernel and/or push into the netmap ring buffer in userland. Interfacing into the ethernet and socket layer should be something that bolts on the side, kind of netgraph style. It would likely look a lot more like a switching backplane with socket IO being one of many processing possibilities. If socket IO stays packet at a time than great; but that's messing up the ability to do a lot of other interesting things. Sure, lets go back to MS-DOS with interrupt wedges. First of all, the Unix model has long been abandoned. System V Streams and all that classroom stuff (which is why I dislike netgraph) proved useless once we got beyond Token Ring. All you heard about in the old days was the OSI model; thank god the OSIs and CCITTs have become little more than noise as people started to really need to do things. How's that ISDN thing working out? As much as I complain, FreeBSD is far superior to other camps in their discipline and conformance to sanity. Play around with linux internals and you see what happens when you build an OS by an undisciplined committee. There's no bigger abortion in computing than the sk_buff. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
From: Luigi Rizzo ri...@iet.unipi.it To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org freebsd-net@freebsd.org; Adrian Chadd adr...@freebsd.org Sent: Sunday, August 18, 2013 5:16 PM Subject: Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux) On Sun, Aug 18, 2013 at 11:01 PM, Barney Cordoba barney_cord...@yahoo.comwrote: That's fine, it's a test tool, not a solution. It just seems that it gets pushed as if it's some sort of real world solution, which it's not. The idea that bringing packets into user space to forward them rather than just replacing the bridge module with something more efficient is just silliness. you might want to have a look at the VALE switch http://info.iet.unipi.it/~luigi/vale/ the upcoming version can attach physical interfaces to the switch and keep all the processing within the kernel. If pushing packets was a useful task, the solution would be easy. Unfortunately you need to do something useful with the packets in between. there are different definitions of what is useful: sources, sinks, forwarding, dropping (anti DoS), logging, ids, are all useful for different people. The mistake, i think, is to expect that there is one magic solution to handle all the useful cases. cheers luigi ___ Nobody claimed that there was a magic solution. But when so much time and brainpower is spent working on kludges (instead of doing things that have mainstream usefulness), it results in either 1) fewer people using it or 2) the kludges become accepted solutions, simply because someone did it. Polling, dummynet, netgraph, flowtable and buf_ring are all good examples. It's the big negative of open source, particularly for the bigger projects. Once someone has done something, it not worth the effort in most cases to do it in a more correct way; and the something becomes all that's available. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
From: Adrian Chadd adr...@freebsd.org To: Barney Cordoba barney_cord...@yahoo.com Cc: Luigi Rizzo ri...@iet.unipi.it; Lawrence Stewart lstew...@freebsd.org; FreeBSD Net n...@freebsd.org Sent: Saturday, August 17, 2013 11:59 AM Subject: Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux) ... we get perfectly good throughput without 400k ints a second on the ixgbe driver. As in, I can easily saturate 2 x 10GE on ixgbe hardware with a handful of flows. That's not terribly difficult. However, there's a few interesting problems that need addressing: * There's lock contention between the transmit side from userland and the TCP timers, and the receive side with ACK processing. Under very high traffic load a lot of lock contention stalls things. We (the royal we, I'm mostly just doing tooling at the moment) working on that. * There's lock contention on the ARP, routing table and PCB lookups. The latter will go away when we've finally implemented RSS for transmit and receive and then moved things over to using PCB groups on CPUs which have NIC driver threads bound to them. * There's increasing cache thrashing from a larger workload, causing the expensive lookups to be even more expensive. * All the list walks suck. We need to be batching things so we use CPU caches much more efficiently. The idea of using TSO on the transmit side and generic LRO on the receive side is to make the per-packet overhead less. I think we can be much more efficient in general in packet processing, but that's a big task. :-) So, using at least TSO is a big benefit if purely to avoid decomposing things into smaller mbufs and contending on those locks in a very big way. I'm working on PMC to make it easier to use to find these bottlenecks and make the code and data more efficient. Then, likely, I'll end up hacking on generic TSO/LRO, TX/RX RSS queue management and make the PCB group thing default on for SMP machines. I may even take a knife to some of the packet processing overhead. --- The ints/sec reference was based on Luigi's implication that turning off moderation was some sort of performance choice. Again, you're talking throughput and not efficiency. I could fill a tx queue with 10gb of traffic with yesteryear's cpus. It's not an achievement. Being able to bridge real traffic at 10gb/s with 2 cores is. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
Great. Never has there been a better explanation for the word Kludge than netmap. From: Adrian Chadd adr...@freebsd.org To: Jim Thompson j...@netgate.com Cc: Barney Cordoba barney_cord...@yahoo.com; FreeBSD Net n...@freebsd.org; Luigi Rizzo ri...@iet.unipi.it; Lawrence Stewart lstew...@freebsd.org Sent: Sunday, August 18, 2013 11:57 AM Subject: Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux) Right. Well, post some profiling data, let's figure this out sometime. Luigi can do bridging with 2 cores using netmap. So it's technically possible. There's just a lot of kernel gunk in the way of doing it ye olde way. -adrian On 18 August 2013 07:25, Jim Thompson j...@netgate.com wrote: On Aug 18, 2013, at 8:48 AM, Barney Cordoba barney_cord...@yahoo.com wrote: I could fill a tx queue with 10gb of traffic with yesteryear's cpus. It's not an achievement. Being able to bridge real traffic at 10gb/s with 2 cores is Or forward at layer 3. Or filter packets. Or IPSEC. Or... ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
That's fine, it's a test tool, not a solution. It just seems that it gets pushed as if it's some sort of real world solution, which it's not. The idea that bringing packets into user space to forward them rather than just replacing the bridge module with something more efficient is just silliness. If pushing packets was a useful task, the solution would be easy. Unfortunately you need to do something useful with the packets in between. Reminds me of polling. The problem is that over time, people actually view it as a solution, when it was never more than a kludge in the first place. BC From: Adrian Chadd adr...@freebsd.org To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org freebsd-net@freebsd.org Sent: Sunday, August 18, 2013 3:18 PM Subject: Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux) On 18 August 2013 11:39, Barney Cordoba barney_cord...@yahoo.com wrote: Great. Never has the been a better explanation for the word Kludge than netmap. Nah. Netmap is a reimplementation of some reasonably well known ways of pushing bits. Luigi just pushed it up to eleven and demonstrated what current hardware is capable of. I have never bought the We need eleventy cores just to push 10ge of real traffic! before. Luigi did note down where the per-packet inefficiencies were. What we have to do now is sit down and for each of those, figure out what the root causes are and how to mitigate it. There's some architectural things that need tidying up (read: CPU pinning, queue handling, some locking hilarity) but if they're solved, we'll end up having dual core boxes push line rate packets for routing. So the gauntlet has been thrown. Let's fix this shit up. -adrian ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
Criticism is the bedrock of innovation. From: Vijay Singh vijju.si...@gmail.com To: Barney Cordoba barney_cord...@yahoo.com Cc: Adrian Chadd adr...@freebsd.org; freebsd-net@freebsd.org freebsd-net@freebsd.org Sent: Sunday, August 18, 2013 3:46 PM Subject: Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux) Barney, did you get picked on a lot as a kid? Wonder why you're so caustic and negative all the time? Sent from my iPhone On Aug 18, 2013, at 11:39 AM, Barney Cordoba barney_cord...@yahoo.com wrote: Great. Never has the been a better explanation for the word Kludge than netmap. From: Adrian Chadd adr...@freebsd.org To: Jim Thompson j...@netgate.com Cc: Barney Cordoba barney_cord...@yahoo.com; FreeBSD Net n...@freebsd.org; Luigi Rizzo ri...@iet.unipi.it; Lawrence Stewart lstew...@freebsd.org Sent: Sunday, August 18, 2013 11:57 AM Subject: Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux) Right. Well, post some profiling data, let's figure this out sometime. Luigi can do bridging with 2 cores using netmap. So it's technically possible. There's just a lot of kernel gunk in the way of doing it ye olde way. -adrian On 18 August 2013 07:25, Jim Thompson j...@netgate.com wrote: On Aug 18, 2013, at 8:48 AM, Barney Cordoba barney_cord...@yahoo.com wrote: I could fill a tx queue with 10gb of traffic with yesteryear's cpus. It's not an achievement. Being able to bridge real traffic at 10gb/s with 2 cores is Or forward at layer 3. Or filter packets. Or IPSEC. Or... ___ ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
Horsehockey. What are you guys running with, P4s? Modern cpus are magnificently fast. The triviality of lookups is a non-issue in almost all cases. The ability of modern cpus to fill a transmit queue faster than the data can be transmitted is incontrovertible. With TCP you have windows and things; trying to drill down to hardware inefficiencies as if you're running on a 200Mhz P4 is just silly. I abandoned hardware offloads back when someone tried to sell me on data compression boards; the truth is that the IO overhead of copying to and from the board was higher than the cpu cycles needed to compress the data. The failure to understand how IO and locks interfere with traffic flow on multicore systems is the biggest problem with driver development; all of this chatter about moderation is simply a waste of time; such things are completely tunable; a task that gets far too little attention IMO. Tuning can make a world of difference if you understand what you're doing. The idea that having 400K ints/second to gain a tock of throughput is an acceptable trade-off is patently absurd. EFFICIENCY is tantamount. Throughput is almost always a tuning issue. BC From: Luigi Rizzo ri...@iet.unipi.it To: Lawrence Stewart lstew...@freebsd.org Cc: FreeBSD Net n...@freebsd.org Sent: Wednesday, August 14, 2013 6:21 AM Subject: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux) On Wed, Aug 14, 2013 at 05:23:02PM +1000, Lawrence Stewart wrote: On 08/14/13 16:33, Julian Elischer wrote: On 8/14/13 11:39 AM, Lawrence Stewart wrote: On 08/14/13 03:29, Julian Elischer wrote: I have been tracking down a performance embarrassment on AMAZON EC2 and have found it I think. Let us please avoid conflating performance with throughput. The behaviour you go on to describe as a performance embarrassment is actually a throughput difference, and the FreeBSD behaviour you're describing is essentially sacrificing throughput and CPU cycles for lower latency. That may not be a trade-off you like, but it is an important factor in this discussion. ... Sure, there's nothing wrong with holding throughput up as a key performance metric for your use case. I'm just trying to pre-empt a discussion that focuses on one metric and fails to consider the bigger picture. ... I could see no latency reversion. You wouldn't because it would be practically invisible in the sorts of tests/measurements you're doing. Our good friends over at HRT on the other hand would be far more likely to care about latency on the order of microseconds. Again, the use case matters a lot. ... so, does Software LRO mean that LRO on hte NIC should be ON or OFF to see this? I think (check the driver code in question as I'm not sure) that if you ifconfig if lro and the driver has hardware support or has been made aware of our software implementation, it should DTRT. The lower throughput than linux that julian was seeing is either because of a slow (CPU-bound) sender or slow receiver. Given that the FreeBSD tx path is quite expensive (redoing route and arp lookups on every packet, etc.) I highly suspect the sender side is at fault. Ack coalescing, LRO, GRO are limited to the set of packets that you receive in the same batch, which in turn is upper bounded by the interrupt moderation delay. Apart from simple benchmarks with only a few flows, it is very hard that ack/lro/gro can coalesce more than a few segments for the same flow. But the real fix is in tcp_output. 
In fact, it has never been the case that an ack (single or coalesced) triggers an immediate transmission in the output path. We had this in the past (Silly Window Syndrome) and there is code that avoids sending less than 1-mtu under appropriate conditions (there is more data to push out anyways, no NODELAY, there are outstanding acks, the window can open further). In all these cases there is no reasonable way to experience the difference in terms of latency. If one really cares, e.g. the High Speed Trading example, this is a non issue because any reasonable person would run with TCP_NODELAY (and possibly disable interrupt moderation), and optimize for latency even on a per flow basis. In terms of coding effort, i suspect that by replacing the 1-mtu limit (t_maxseg i believe is the variable that we use in the SWS avoidance code) with 1-max-tso-segment we can probably achieve good results with little programming effort. Then the problem remains that we should keep a copy of route and arp information in the socket instead of redoing the lookups on every single transmission, as they consume some 25% of the time of a sendto(), and probably even more when it comes to large tcp segments, sendfile() and the like. cheers luigi ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to
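A very rough sketch of the tcp_output() change being suggested, not the actual FreeBSD code: in the SWS-avoidance style decision, compare the data available to send against the largest chunk TSO can emit in one shot instead of against a single MSS. All names and the surrounding conditions are illustrative stand-ins.

#include <stdbool.h>
#include <stdint.h>

static bool
worth_sending_now(uint32_t avail, uint32_t t_maxseg, uint32_t tso_max_len,
    bool tso_enabled, bool nodelay, bool idle)
{
	/* Today the threshold is one MSS; the proposal raises it to one TSO burst. */
	uint32_t thresh = tso_enabled ? tso_max_len : t_maxseg;

	if (nodelay || idle)		/* latency-sensitive, or nothing outstanding: send */
		return (true);
	return (avail >= thresh);	/* otherwise wait until a full burst is queued */
}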
Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
EFFICIENCY is tantamount. Throughput is almost always a tuning issue. Of course I meant paramount. Coffee matters :-| From: Luigi Rizzo ri...@iet.unipi.it To: Lawrence Stewart lstew...@freebsd.org Cc: FreeBSD Net n...@freebsd.org Sent: Wednesday, August 14, 2013 6:21 AM Subject: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux) On Wed, Aug 14, 2013 at 05:23:02PM +1000, Lawrence Stewart wrote: On 08/14/13 16:33, Julian Elischer wrote: On 8/14/13 11:39 AM, Lawrence Stewart wrote: On 08/14/13 03:29, Julian Elischer wrote: I have been tracking down a performance embarrassment on AMAZON EC2 and have found it I think. Let us please avoid conflating performance with throughput. The behaviour you go on to describe as a performance embarrassment is actually a throughput difference, and the FreeBSD behaviour you're describing is essentially sacrificing throughput and CPU cycles for lower latency. That may not be a trade-off you like, but it is an important factor in this discussion. ... Sure, there's nothing wrong with holding throughput up as a key performance metric for your use case. I'm just trying to pre-empt a discussion that focuses on one metric and fails to consider the bigger picture. ... I could see no latency reversion. You wouldn't because it would be practically invisible in the sorts of tests/measurements you're doing. Our good friends over at HRT on the other hand would be far more likely to care about latency on the order of microseconds. Again, the use case matters a lot. ... so, does Software LRO mean that LRO on hte NIC should be ON or OFF to see this? I think (check the driver code in question as I'm not sure) that if you ifconfig if lro and the driver has hardware support or has been made aware of our software implementation, it should DTRT. The lower throughput than linux that julian was seeing is either because of a slow (CPU-bound) sender or slow receiver. Given that the FreeBSD tx path is quite expensive (redoing route and arp lookups on every packet, etc.) I highly suspect the sender side is at fault. Ack coalescing, LRO, GRO are limited to the set of packets that you receive in the same batch, which in turn is upper bounded by the interrupt moderation delay. Apart from simple benchmarks with only a few flows, it is very hard that ack/lro/gro can coalesce more than a few segments for the same flow. But the real fix is in tcp_output. In fact, it has never been the case that an ack (single or coalesced) triggers an immediate transmission in the output path. We had this in the past (Silly Window Syndrome) and there is code that avoids sending less than 1-mtu under appropriate conditions (there is more data to push out anyways, no NODELAY, there are outstanding acks, the window can open further). In all these cases there is no reasonable way to experience the difference in terms of latency. If one really cares, e.g. the High Speed Trading example, this is a non issue because any reasonable person would run with TCP_NODELAY (and possibly disable interrupt moderation), and optimize for latency even on a per flow basis. In terms of coding effort, i suspect that by replacing the 1-mtu limit (t_maxseg i believe is the variable that we use in the SWS avoidance code) with 1-max-tso-segment we can probably achieve good results with little programming effort. 
Then the problem remains that we should keep a copy of route and arp information in the socket instead of redoing the lookups on every single transmission, as they consume some 25% of the time of a sendto(), and probably even more when it comes to large tcp segments, sendfile() and the like. cheers luigi ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Intel 4-port ethernet adaptor link aggregation issue
You can create your own pipeline with some minor modifications. Why wait months for the guys who did it wrong to make changes? BC From: Adrian Chadd adr...@freebsd.org To: Barney Cordoba barney_cord...@yahoo.com Cc: Zaphod Beeblebrox zbee...@gmail.com; Freddie Cash fjwc...@gmail.com; Steve Read steve.r...@netasq.com; freebsd-net freebsd-net@freebsd.org Sent: Saturday, August 3, 2013 12:21 AM Subject: Re: Intel 4-port ethernet adaptor link aggregation issue On 2 August 2013 16:35, Barney Cordoba barney_cord...@yahoo.com wrote: The stock igb driver binds to all cores, so with multiple igbs you have multiple nics binding to the same cores. I suppose that might create issues in a lagg setup. Try 1 queue and/or comment out the bind code. I have thrashed the hell out of 2-port ixgbe and 4-port chelsio (cxgbe) on 4-core device all with lagg. All is great. There's apparently some more igb improvements coming in the pipeline. Fear not! -adrian ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Intel 4-port ethernet adaptor link aggregation issue
The stock igb driver binds to all cores, so with multiple igbs you have multiple nics binding to the same cores. I suppose that might create issues in a lagg setup. Try 1 queue and/or comment out the bind code. BC From: Zaphod Beeblebrox zbee...@gmail.com To: Freddie Cash fjwc...@gmail.com Cc: Steve Read steve.r...@netasq.com; freebsd-net freebsd-net@freebsd.org Sent: Friday, August 2, 2013 5:41 PM Subject: Re: Intel 4-port ethernet adaptor link aggregation issue On several machines with large numbers of IGBx interfaces, I've found that hw.igb.enable_msix=0 is necessary to ensure proper operation. On Fri, Aug 2, 2013 at 11:49 AM, Freddie Cash fjwc...@gmail.com wrote: On Fri, Aug 2, 2013 at 12:36 AM, Steve Read steve.r...@netasq.com wrote: On 01.08.2013 20:07, Joe Moog wrote: We have an iXsystems 1U server (E5) with an Intel 4-port ethernet NIC installed, model I350-T4 (manufactured May of 2013). We're trying to bind the 4 ports on this NIC together into a single lagg port, connected LACP to a distribution switch (Cisco 4900-series). We are able to successfully bind the 2 on-board ethernet ports to a single lagg, however the NIC is not so cooperative. At first we thought we had a bad NIC, but a replacement has not fixed the issue. We are thinking there may be a driver limitation with these Intel ethernet NICs when attempting to bind more than 2 ports to a lagg. FreeBSD version: FreeBSD 9.1-PRERELEASE #0 r244125: Wed Dec 12 11:47:47 CST 2012 rc.conf: # LINK AGGREGATION ifconfig_igb2=UP ifconfig_igb3=UP ifconfig_igb4=UP ifconfig_igb5=UP cloned_interfaces=lagg0 ifconfig_lagg0=laggproto lacp laggport igb2 laggport igb3 laggport igb4 laggport igb5 ifconfig_lagg0=inet 192.168.1.14 netmask 255.255.255.0 Am I the only one who noticed that you replaced the value of $ifconfig_lagg0 that specifies the proto and the ports with one that specifies just the address? Good catch! Merge the two ifconfig_lagg0 lines into one, and it will work infinitely better, or at least no worse. ifconfig_lagg0=laggproto lacp laggport igb2 laggport igb3 laggport igb4 laggport igb5 inet 192.168.1.14 netmask 255.255.255.0 Or, if you want to keep them split into two parts (initialise lagg0, then add IP): create_args_lagg0=laggproto lacp laggport igb2 laggport igb3 laggport igb4 laggport igb5 ifconfig_lagg0=inet 192.168.1.14 netmask 255.255.255.0 create_args_* are run first, then ifconfig_* are run. I like this setup, as it separates create and initialise from configure for cloned/virtual interfaces like vlans, laggs, etc. -- Freddie Cash fjwc...@gmail.com ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Recommendations for 10gbps NIC
On Fri, 26 Jul 2013 15:14:17 -0700 (PDT) Barney Cordoba barney_cord...@yahoo.com wrote about Re: Recommendations for 10gbps NIC: BC I don't really understand why nearly all 10GBE cards are dual-port. BC Surely there is a market for NICs between 1 gigabit and 20 gigabit. Myricom has single port 10G cards. However, I only use them on Linux and cannot comment on FreeBSD usage here. cu Gerrit I didn't write/ask that; but Intel makes a single-port X540 card that's available through popular online outlets. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: bce(4) panics, 9.2rc1 [redux]
From: Sean Bruno sean_br...@yahoo.com To: freebsd-net@freebsd.org freebsd-net@freebsd.org Sent: Monday, July 29, 2013 8:56 PM Subject: Re: bce(4) panics, 9.2rc1 [redux] On Wed, 2013-07-24 at 14:07 -0700, Sean Bruno wrote: Running 9.2 in production load mail servers. We're hitting the watchdog message and crashing with the stable/9 version. We're reverting the change from 2 weeks ago and seeing if it still happens. We didn't see this from stable/9 from about a month ago. Sean Not seeing any changes to core dumps, or crashes after updating the bce(4) interface on these Dell R410s. IPMI was a definite false hope. No changes noted after I modified the ipmi_attach code. stable/7 works just fine and stable/9 fails with NMI errors on the console very badly. It fails so badly that it won't come into service at all. I've reverted stable/9 back to August of 2012 with no changes. It sort of looks like r236216 is causing severe issues with my configuration. The Dell R410 has a 3rd ethernet interface for the BMC only, not sure if that is meaningful in this context. The 3rd interface is *not* visible from the o/s and is dedicated to the BMC interface. Doing more testing at this time to validate. Sean -- FWIW, I have an R210 with a BCM5716 running 9.1 RELEASE without any problems. I have customized the driver a bit. Try turning off the features and running it raw without any checksum or tso gobbledygook. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Recommendations for 10gbps NIC
From: Luigi Rizzo ri...@iet.unipi.it To: Alexander V. Chernikov melif...@freebsd.org Cc: Barney Cordoba barney_cord...@yahoo.com; Daniel Feenberg feenb...@nber.org; freebsd-net@freebsd.org freebsd-net@freebsd.org Sent: Saturday, July 27, 2013 4:15 AM Subject: Re: Recommendations for 10gbps NIC On Sat, Jul 27, 2013 at 10:02 AM, Alexander V. Chernikov melif...@freebsd.org wrote: On 27.07.2013 02:14, Barney Cordoba wrote: *From:* Daniel Feenberg feenb...@nber.org *To:* Alexander V. Chernikov melif...@freebsd.org *Cc:* Barney Cordoba barney_cord...@yahoo.com; freebsd-net@freebsd.org freebsd-net@freebsd.org *Sent:* Friday, July 26, 2013 4:59 PM *Subject:* Re: Recommendations for 10gbps NIC On Fri, 26 Jul 2013, Alexander V. Chernikov wrote: On 26.07.2013 19:30, Barney Cordoba wrote: *From:* Alexander V. Chernikov melif...@freebsd.org mailto:melif...@freebsd.org *To:* Boris Kochergin sp...@acm.poly.edu mailto:sp...@acm.poly.edu *Cc:* freebsd-net@freebsd.org mailto:freebsd-net@freebsd.org *Sent:* Thursday, July 25, 2013 2:10 PM *Subject:* Re: Recommendations for 10gbps NIC On 25.07.2013 00:26, Boris Kochergin wrote: Hi. Hello. I am looking for recommendations for a 10gbps NIC from someone who has successfully used it on FreeBSD. It will be used on FreeBSD 9.1-R/amd64 to capture packets. Some desired features are: We have experience with HP NC523SFP and Chelsio N320E. The key difference among 10GBE cards for us is how they treat foreign DACs. The HP would PXE boot with several brands and generic DACs, but the Chelsio required a Chelsio brand DAC to PXE boot. There was firmware on the NIC to check the brand of cable. Both worked fine once booted. The Chelsio cables were hard to find, which became a problem. Also, when used with diskless Unix clients the Chelsio cards seemed to hang from time to time. Otherwise packet loss was one in a million for both cards, even with 7 meter cables. We liked the fact that the Chelsio cards were single-port and cheaper. I don't really understand why nearly all 10GBE cards are dual-port. Surely there is a market for NICs between 1 gigabit and 20 gigabit. The NIC heatsinks are too hot to touch during use unless specially cooled. Daniel Feenberg NBER - The same reason that they don't make single core cpus anymore. It costs about the same to make a 1 port chip as a 2 port chip. I find it interesting how so many talk about the cards, when most often the differences are with the drivers. Luigi made the most useful comment; if you ever want to use netmap, you need to buy a card compatible with netmap. Although you don't need netmap just to capture 10Gb/s. Forwarding, Maybe. I also find it interesting that nobody seems to have a handle on the performance differences. Obviously they're all different. Maybe substantially different. It depends on what kind of performance you are talking about. All NICs are capable of doing linerate RX/TX for both small/big packets. this is actually not true. I have direct experience with Intel, Mellanox and Broadcom, and small packets are a problem across the board even with 1 port. From my experience only intel can do line rate (14.88Mpps) with 64-byte frames, but suffers a bit with sizes that are not multiple of 64. Mellanox peaks at around 7Mpps. Broadcom is limited to some 2.5Mpps. This is all with netmap, using the regular stack you are going to see much much less. Large frames (1400+) are probably not a problem for anyone, but since the original post asked for packet capture, i thought the small-frame case is a relevant one. 
The only notable exception I'm aware of are Intel 82598-based NICs which advertise PCI-E X8 gen2 with _2.5GT_ link speed, giving you maximum ~14Gbit/s bw for 2 ports instead of 20. This makes me curious because i believe people have used netmap with the 82598 and achieved close to line rate even with 64-byte frames/one port, and i thought (maybe I am wrong ?) the various 2-port NICs use 4 lanes per port. So the number i remember does not match with your quote of 2.5Gt/s. Are all 82598 using 2.5GT/s (which is a gen.1 speed) instead of 5 ? cheers luigi ___ 64 byte frames rarely require that 64 bytes be transferred across the bus. Depending on your offloads the bus requirement can be quite a bit less than the line speed. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
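As a sanity check on the 14.88 Mpps figure quoted above: a minimum-size Ethernet frame occupies 64 bytes (payload plus FCS) plus 8 bytes of preamble/SFD and a 12-byte inter-frame gap on the wire, so line rate at 10 Gbit/s works out as below.

#include <stdio.h>

int
main(void)
{
	double link_bps = 10e9;			/* 10GbE */
	double wire_bytes = 64.0 + 8.0 + 12.0;	/* frame + preamble/SFD + IFG = 84 bytes */
	double pps = link_bps / (wire_bytes * 8.0);

	printf("max 64-byte frames/sec: %.2f Mpps\n", pps / 1e6);	/* ~14.88 */
	return (0);
}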
Re: Recommendations for 10gbps NIC
From: Alexander V. Chernikov melif...@freebsd.org To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org freebsd-net@freebsd.org; Daniel Feenberg feenb...@nber.org Sent: Saturday, July 27, 2013 4:02 AM Subject: Re: Recommendations for 10gbps NIC On 27.07.2013 02:14, Barney Cordoba wrote: *From:* Daniel Feenberg feenb...@nber.org *To:* Alexander V. Chernikov melif...@freebsd.org *Cc:* Barney Cordoba barney_cord...@yahoo.com; freebsd-net@freebsd.org freebsd-net@freebsd.org *Sent:* Friday, July 26, 2013 4:59 PM *Subject:* Re: Recommendations for 10gbps NIC On Fri, 26 Jul 2013, Alexander V. Chernikov wrote: On 26.07.2013 19:30, Barney Cordoba wrote: *From:* Alexander V. Chernikov melif...@freebsd.org mailto:melif...@freebsd.org *To:* Boris Kochergin sp...@acm.poly.edu mailto:sp...@acm.poly.edu *Cc:* freebsd-net@freebsd.org mailto:freebsd-net@freebsd.org *Sent:* Thursday, July 25, 2013 2:10 PM *Subject:* Re: Recommendations for 10gbps NIC On 25.07.2013 00:26, Boris Kochergin wrote: Hi. Hello. I am looking for recommendations for a 10gbps NIC from someone who has successfully used it on FreeBSD. It will be used on FreeBSD 9.1-R/amd64 to capture packets. Some desired features are: We have experience with HP NC523SFP and Chelsio N320E. The key difference among 10GBE cards for us is how they treat foreign DACs. The HP would PXE boot with several brands and generic DACs, but the Chelsio required a Chelsio brand DAC to PXE boot. There was firmware on the NIC to check the brand of cable. Both worked fine once booted. The Chelsio cables were hard to find, which became a problem. Also, when used with diskless Unix clients the Chelsio cards seemed to hang from time to time. Otherwise packet loss was one in a million for both cards, even with 7 meter cables. We liked the fact that the Chelsio cards were single-port and cheaper. I don't really understand why nearly all 10GBE cards are dual-port. Surely there is a market for NICs between 1 gigabit and 20 gigabit. The NIC heatsinks are too hot to touch during use unless specially cooled. Daniel Feenberg NBER - The same reason that they don't make single core cpus anymore. It costs about the same to make a 1 port chip as a 2 port chip. I find it interesting how so many talk about the cards, when most often the differences are with the drivers. Luigi made the most useful comment; if you ever want to use netmap, you need to buy a card compatible with netmap. Although you don't need netmap just to capture 10Gb/s. Forwarding, Maybe. I also find it interesting that nobody seems to have a handle on the performance differences. Obviously they're all different. Maybe substantially different. It depends on what kind of performance you are talking about. All NICs are capable of doing linerate RX/TX for both small/big packets. The only notable exception I;m aware of are Intel 82598-based NICs which advertise PCI-E X8 gen2 with _2.5GT_ link speed, giving you maximum ~14Gbit/s bw for 2 ports instead of 20. This statement is sort of like saying all cars can do 65MPH or whatever the speed limit is, so therefore all cars are equal. If one device can forward 2Mpps at 20% cpu and other used 45%, obvious there is a preference to use the more efficient driver/controller. BC The x540 with RJ45 has the obvious advantage of being compatible with regular gigabit cards, and single port adapters are about $325 in the US. When cheap(er) 10g RJ45 switches become available, it will start to be used more and more. Very soon. 
BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Recommendations for 10gbps NIC
From: Alexander V. Chernikov melif...@freebsd.org To: Boris Kochergin sp...@acm.poly.edu Cc: freebsd-net@freebsd.org Sent: Thursday, July 25, 2013 2:10 PM Subject: Re: Recommendations for 10gbps NIC On 25.07.2013 00:26, Boris Kochergin wrote: Hi. Hello. I am looking for recommendations for a 10gbps NIC from someone who has successfully used it on FreeBSD. It will be used on FreeBSD 9.1-R/amd64 to capture packets. Some desired features are: - PCIe - LC connectors - 10GBASE-SR - Either single- or dual-port - Multiqueue Intel 82598/99/X520 Emulex OCe10102-NM Mellanox ConnectX Chelsio T4 Do they all cost the same, have the exact same features and have equally well-written drivers? Which do you recommend and why? BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Recommendations for 10gbps NIC
From: Daniel Feenberg feenb...@nber.org To: Alexander V. Chernikov melif...@freebsd.org Cc: Barney Cordoba barney_cord...@yahoo.com; freebsd-net@freebsd.org freebsd-net@freebsd.org Sent: Friday, July 26, 2013 4:59 PM Subject: Re: Recommendations for 10gbps NIC On Fri, 26 Jul 2013, Alexander V. Chernikov wrote: On 26.07.2013 19:30, Barney Cordoba wrote: *From:* Alexander V. Chernikov melif...@freebsd.org *To:* Boris Kochergin sp...@acm.poly.edu *Cc:* freebsd-net@freebsd.org *Sent:* Thursday, July 25, 2013 2:10 PM *Subject:* Re: Recommendations for 10gbps NIC On 25.07.2013 00:26, Boris Kochergin wrote: Hi. Hello. I am looking for recommendations for a 10gbps NIC from someone who has successfully used it on FreeBSD. It will be used on FreeBSD 9.1-R/amd64 to capture packets. Some desired features are: We have experience with HP NC523SFP and Chelsio N320E. The key difference among 10GBE cards for us is how they treat foreign DACs. The HP would PXE boot with several brands and generic DACs, but the Chelsio required a Chelsio brand DAC to PXE boot. There was firmware on the NIC to check the brand of cable. Both worked fine once booted. The Chelsio cables were hard to find, which became a problem. Also, when used with diskless Unix clients the Chelsio cards seemed to hang from time to time. Otherwise packet loss was one in a million for both cards, even with 7 meter cables. We liked the fact that the Chelsio cards were single-port and cheaper. I don't really understand why nearly all 10GBE cards are dual-port. Surely there is a market for NICs between 1 gigabit and 20 gigabit. The NIC heatsinks are too hot to touch during use unless specially cooled. Daniel Feenberg NBER - The same reason that they don't make single core cpus anymore. It costs about the same to make a 1 port chip as a 2 port chip. I find it interesting how so many talk about the cards, when most often the differences are with the drivers. Luigi made the most useful comment; if you ever want to use netmap, you need to buy a card compatible with netmap. Although you don't need netmap just to capture 10Gb/s. Forwarding, Maybe. I also find it interesting that nobody seems to have a handle on the performance differences. Obviously they're all different. Maybe substantially different. The x540 with RJ45 has the obvious advantage of being compatible with regular gigabit cards, and single port adapters are about $325 in the US. When cheap(er) 10g RJ45 switches become available, it will start to be used more and more. Very soon. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: LACP LAGG device problems
On Sat, 7/20/13, isp ml...@ukr.net wrote: Subject: LACP LAGG device problems To: freebsd-net@freebsd.org Date: Saturday, July 20, 2013, 10:04 AM Hi! Can anybody tell me, are there any plans to improve the LAGG (802.3ad) device driver in FreeBSD? It would be great to have the possibility to set LACP mode (active/passive) and system priority. Also there is no way to set the hashing algorithm and master interface (port). And we can't see any information about our neighbor. The same function in Linux is named Bonding and it is much better. I really can donate some money to those who can make these improvements. Best regards. ___ Why are you using LAGG when 10g cards are like $350? It's not a peering protocol nor is it PTP; can you see your peer info on an ethernet? Bonding is a late 90s concept designed to connect 2 slow links to get higher speeds, back in the day when 100Mb/s was ambitious. The point of LAGG is that it's transparent; you can load balance traffic to multiple hosts or create a redundant link without having to have equipment running some special applications, or any special logic above the LAGG device. Describing how you are using LAGG (and why) might be better than just asking for improvements. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: LACP LAGG device problems
I wasn't referring to science projects. Nor did I say it wasn't useful. Only that 10g is cheap now and quite a bit better. LAGG isn't perfect. - Original Message - From: Adrian Chadd adr...@freebsd.org To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org; isp ml...@ukr.net Sent: Sunday, July 21, 2013 9:49 AM Subject: Re: LACP LAGG device problems Hah! I'm pushing 20GE out using lagg right now (and fixing the er, amusing behaviour of doing so.) I'm aiming to hit 40 once I get hardware that doesn't get upset pushing that many bits. The netops people at ${JOB} also point out that even today switches occasionally get confused and crash a switchport. Ew. So yes, there are people using lagg, both for failover and throughput reasons. I'm working on debugging/statistics right now as part of general why are things behaving crappy debugging. I'll see about improving some of the peer reporting at the same time. -adrian On 21 July 2013 06:03, Barney Cordoba barney_cord...@yahoo.com wrote: On Sat, 7/20/13, isp ml...@ukr.net wrote: Subject: LACP LAGG device problems To: freebsd-net@freebsd.org Date: Saturday, July 20, 2013, 10:04 AM Hi! Can anybody tell me, is there any plans to improve LAGG(802.3ad) device driver in FreeBSD? It will be greate to have a possibility to set LACP mode (active/passive) and system priority. Also there is no way to set hashing algorithm and master interface (port). And we can't see any information about our neighbor. The same function in Linux is named Bonding and it is much more better. I realy can donate some money to those who can make this improvements. Best regards. ___ Why are you using LAGG when 10g cards are like $350? It's not a peering protocol nor it is PTP; can you see your peer info on an ethernet? Bonding is a late 90s concept designed to connect 2 slow links to get higher speeds, back in the day when 100Mb/s was ambitious. The point of LAGG is that it's transparent; you can load balance traffic to multiple hosts or create a redundant link without having to have equipment running some special applications, or any special logic above the LAGG device. Describing how you are using LAGG (and why) might be better than just asking for improvements. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: FreeBSD router problems
On Tue, 7/16/13, Eugene Grosbein eu...@grosbein.net wrote: Subject: Re: FreeBSD router problems To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org Date: Tuesday, July 16, 2013, 1:10 AM On 15.07.2013 22:04, Barney Cordoba wrote: Also, IP fragmentation and TCP segments are not the same thing. TCP segments regularly will come in out of order, NFS is too stupid to do things correctly; IP fragmentation should not be done unless necessary to accommodate a smaller mtu. The PR is about NFS over UDP, not TCP. -- Ok, so is there evidence that it's UDP and not an IP fragmenting problem? Out of Order UDP is the same issue; its common for packets to traverse different paths through the internet in the same connection, so OOO packets are normal. IP fragmentation is rare, except for NFS. A lot of ISPs will block fragmentation because it's difficult to shape or filter fragments; they're often used to defeat simple firewalls and filters. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: FreeBSD router problems
On Sun, 7/14/13, Eugene Grosbein eu...@grosbein.net wrote: Subject: Re: FreeBSD router problems To: Barney Cordoba barney_cord...@yahoo.com Cc: isp ml...@ukr.net, freebsd-net@freebsd.org Date: Sunday, July 14, 2013, 1:17 PM On 14.07.2013 23:14, Barney Cordoba wrote: So why not get a real 10gb/s card? RJ45 10gig is here, and it works a lot better than LAGG. If you want to get more than 1Gb/s on a single connection, you'd need to use roundrobin, which will alternate packets without concern for ordering. Purists will argue against it, but it does work and modern TCP stacks know how to deal with out of order packets. Except of FreeBSD's packet reassembly is broken for long time. For example, http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/167603 - NFS has been broken since the beginning of time. NFS has always had problems sending segments the packet size. There are a lot of ISPs that load balance multiple feeds so OOO packets are a normal occurrence. A stack that doesn't handle out of order tcp packets doesn't work in today's world. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: FreeBSD router problems
On Sun, 7/14/13, Eugene Grosbein eu...@grosbein.net wrote: Subject: Re: FreeBSD router problems To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org, isp ml...@ukr.net Date: Sunday, July 14, 2013, 1:17 PM On 14.07.2013 23:14, Barney Cordoba wrote: So why not get a real 10gb/s card? RJ45 10gig is here, and it works a lot better than LAGG. If you want to get more than 1Gb/s on a single connection, you'd need to use roundrobin, which will alternate packets without concern for ordering. Purists will argue against it, but it does work and modern TCP stacks know how to deal with out of order packets. Except of FreeBSD's packet reassembly is broken for long time. For example, http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/167603 ___ Also, IP fragmentation and TCP segments are not the same thing. TCP segments regularly will come in out of order, NFS is too stupid to do things correctly; IP fragmentation should not be done unless necessary to accommodate a smaller mtu. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Re[2]: FreeBSD router problems
So why not get a real 10gb/s card? RJ45 10gig is here, and it works a lot better than LAGG. If you want to get more than 1Gb/s on a single connection, you'd need to use roundrobin, which will alternate packets without concern for ordering. Purists will argue against it, but it does work and modern TCP stacks know how to deal with out of order packets. ifconfig lagg0 laggproto roundrobin laggport em0 laggport em1 BC On Thu, 7/11/13, isp ml...@ukr.net wrote: Subject: Re[2]: FreeBSD router problems To: Alan Somers asom...@freebsd.org Cc: freebsd-net@freebsd.org Date: Thursday, July 11, 2013, 2:11 PM I have a real network with more than 4 000 users. In normal case, when I have two 1Gbps routers, and I split VLAN's between them total bandwidth if growing up to 1.7 Gbps. --- Incoming mail --- From: Alan Somers asom...@freebsd.org Date: 11 July 2013, 21:00:41 How are you benchmarking it? Each TCP connection only uses one member of a lagg port. So if you want to see 1 Gbps, you'll need to benchmark with multiple TCP connections. You may also need multiple systems; I don't know the full details of LACP. On Thu, Jul 11, 2013 at 11:32 AM, isp ml...@ukr.net wrote: Hi! I have a problem with my FreeBSD router, I can't get more than 1 Gbps throught it, but I have 2 Gbps LAGG on it. There are only 27 IPFW rules (NAT+Shaping). IPoE only. lagg0 (VLAN's + shaping) - two 'igb' adapters lagg1 (NAT, tso if off) - two 'em' adapters I tried to switch off dummynet, but it doesn't helps. # uname -a [code]FreeBSD router 9.1-RELEASE-p3 FreeBSD 9.1-RELEASE-p3 #0: Tue Apr 30 20:02:00 EEST 2013 root@south:/usr/obj/usr/src/sys/ROUTER amd64 # top -aSPHI last pid: 91712; load averages: 2.18, 2.06, 1.97 up 20+22:28:36 17:40:22 120 processes: 7 running, 87 sleeping, 26 waiting CPU 0: 0.0% user, 0.0% nice, 1.6% system, 38.6% interrupt, 59.8% idle CPU 1: 0.0% user, 0.0% nice, 7.1% system, 37.0% interrupt, 55.9% idle CPU 2: 0.0% user, 0.0% nice, 3.9% system, 38.6% interrupt, 57.5% idle CPU 3: 0.0% user, 0.0% nice, 15.7% system, 26.8% interrupt, 57.5% idle Mem: 59M Active, 1102M Inact, 942M Wired, 800M Buf, 5529M Free Swap: 16G Total, 16G Free PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND 12 root -72 - 0K 448K RUN 1 153:39 72.22% [intr{swi1: netisr 0}] 11 root 155 ki31 0K 64K RUN 1 494.2H 65.19% [idle{idle: cpu1}] 11 root 155 ki31 0K 64K CPU2 2 494.3H 64.65% [idle{idle: cpu2}] 11 root 155 ki31 0K 64K RUN 0 493.3H 63.38% [idle{idle: cpu0}] 11 root 155 ki31 0K 64K CPU3 3 496.4H 62.55% [idle{idle: cpu3}] 12 root -92 - 0K 448K WAIT 2 58:49 9.38% [intr{irq266: igb0:que}] 12 root -92 - 0K 448K WAIT 2 59:32 9.03% [intr{irq271: igb1:que}] 12 root -92 - 0K 448K CPU1 1 59:09 8.94% [intr{irq265: igb0:que}] 12 root -92 - 0K 448K WAIT 3 57:52 8.01% [intr{irq272: igb1:que}] 12 root -92 - 0K 448K WAIT 1 59:32 7.96% [intr{irq270: igb1:que}] 12 root -92 - 0K 448K WAIT 3 55:47 7.81% [intr{irq267: igb0:que}] 12 root -92 - 0K 448K WAIT 0 55:24 7.23% [intr{irq264: igb0:que}] 12 root -92 - 0K 448K WAIT 0 56:57 6.69% [intr{irq269: igb1:que}] 12 root -92 - 0K 448K WAIT 3 203:34 4.74% [intr{irq275: em1:rx 0}] 0 root -92 0 0K 336K - 2 427:03 2.64% [kernel{dummynet}] 0 root -92 0 0K 336K - 3 206:57 2.54% [kernel{em0 que}] 86278 root 20 0 33348K 8588K select 0 8:35 0.54% /usr/local/sbin/snmpd -p /var/run/net_snmpd.pid -r 12 root -92 - 0K 448K WAIT 2 7:56 0.20% [intr{irq276: em1:tx 0}] # cat /etc/sysctl.conf dev.igb.0.rx_processing_limit=4096 dev.igb.1.rx_processing_limit=4096 dev.em.0.rx_int_delay=200 dev.em.0.tx_int_delay=200 
dev.em.0.rx_abs_int_delay=4000 dev.em.0.tx_abs_int_delay=4000 dev.em.0.rx_processing_limit=4096 dev.em.1.rx_int_delay=200 dev.em.1.tx_int_delay=200 dev.em.1.rx_abs_int_delay=4000 dev.em.1.tx_abs_int_delay=4000 dev.em.1.rx_processing_limit=4096 net.inet.ip.forwarding=1 net.inet.ip.fastforwarding=1 net.inet.tcp.blackhole=2 net.inet.udp.blackhole=0 net.inet.ip.redirect=0 net.inet.tcp.delayed_ack=0 net.inet.tcp.recvbuf_max=4194304 net.inet.tcp.sendbuf_max=4194304 net.inet.tcp.sack.enable=0 net.inet.tcp.drop_synfin=1 net.inet.tcp.nolocaltimewait=1 net.inet.ip.ttl=255 net.inet.ip.sourceroute=0 net.inet.ip.accept_sourceroute=0 net.inet.udp.recvspace=64080 net.inet.ip.rtmaxcache=1024 net.inet.ip.intr_queue_maxlen=5120 kern.ipc.nmbclusters=824288 kern.ipc.maxsockbuf=83886080
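A quick way to exercise multiple TCP connections across the lagg, as Alan suggests above, is a parallel-stream benchmark. A minimal sketch, assuming iperf is installed on a host behind each side of the router (iperf and the 192.0.2.10 address are illustrative, not part of the original thread):
# on a host behind the far side of the router
iperf -s
# on a host behind the near side: 8 parallel TCP streams for 30 seconds
iperf -c 192.0.2.10 -P 8 -t 30
With LACP each stream hashes onto a single lagg member, so several concurrent streams are needed before the aggregate can exceed one 1Gb port.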
Re: Inconsistent NIC behavior
On Mon, 7/1/13, Zaphod Beeblebrox zbee...@gmail.com wrote: Subject: Re: Inconsistent NIC behavior To: Barney Cordoba barney_cord...@yahoo.com Date: Monday, July 1, 2013, 7:38 PM On Sun, Jun 30, 2013 at 12:04 PM, Barney Cordoba barney_cord...@yahoo.com wrote: One particular annoyance with FreeBSD is that different NICs have different dormant behavior. On this we agree. For example, em and igb both will show the link being active or not on boot whether the interface has been UPed or not, while ixgbe and bce do not. I think it's a worthy goal to have NICs work the same in this manner. It's very valuable to know that a nic is connected without having to UP it. And an annoyance when you fire up a new box with a new nic that shows No Carrier when the link light is on. I disagree here. If an interface is shut down, it should give no link to the far end. I consider it an error that many FreeBSD NIC drivers cannot shut down the link. -- I think that's a different issue. The ability to shut down a link could easily be a feature. However, when you boot a machine, say with a 4-port NIC, having to UP them all to see which one is plugged in is simply a logistical disaster, particularly for admins with marginal skills. While shutting down a link may occasionally be useful, the preponderance of uses would lean towards having some way of knowing when a nic is plugged into a switch regardless of whether it's been fully initialized. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Inconsistent NIC behavior
One particular annoyance with FreeBSD is that different NICs have different dormant behavior. For example, em and igb both will show the link being active or not on boot whether the interface has been UPed or not, while ixgbe and bce do not. I think it's a worthy goal to have NICs work the same in this manner. It's very valuable to know that a nic is connected without having to UP it. And an annoyance when you fire up a new box with a new nic that shows No Carrier when the link light is on. It's really too much of a project for one person to have enough knowledge of multiple drivers to make the changes, so it would be best if the maintainers would do it. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: hw.igb.num_queues default
--- On Thu, 6/20/13, Andre Oppermann wrote: From: Andre Oppermann Subject: Re: hw.igb.num_queues default To: "Eugene Grosbein" Cc: "freebsd-net@freebsd.org", "Eggert, Lars", "Jack Vogel" Date: Thursday, June 20, 2013, 10:29 AM On 20.06.2013 15:37, Eugene Grosbein wrote: On 20.06.2013 17:34, Eggert, Lars wrote: real memory = 8589934592 (8192 MB) avail memory = 8239513600 (7857 MB) By default, the igb driver seems to set up one queue per detected CPU. Googling around, people seemed to suggest that limiting the number of queues makes things work better. I can confirm that setting hw.igb.num_queues=2 seems to have fixed the issue. (Two was the first value I tried, maybe other values other than 0 would work, too.) In order to uphold POLA, should the igb driver maybe default to a conservative value for hw.igb.num_queues that may not deliver optimal performance, but at least works out of the box? Or, better, make nmbclusters auto-tuning smarter, if any. I mean, use more nmbclusters for machines with large amounts of memory. That has already been done in HEAD. The other problem is the pre-filling of the large rings for all queues stranding large amounts of mbuf clusters. OpenBSD starts with a small number of filled mbufs in the RX ring and then dynamically adjusts the number upwards if there is enough traffic to maintain deep buffers. I don't know if it always quickly scales in practice though. You're probably not running with 512MB these days, so pre-filling isn't much of an issue. 4 queues is only 8MB of ram with 1024 descriptors per queue, and 4MB with 512. Think about the # of queues issue. In order to have acceptable latency, you need to do 6k-10k interrupts per second per queue. So with 4 queues you have to process 40K ints/second and with 2 you only process 20k. For a gig link 2 queues is much more efficient. "Spreading" for the sake of spreading is more about Intel marketing than it is about practical computing. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
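The queue arithmetic above can be pinned down with the two loader(8) tunables already mentioned in these threads. A minimal /boot/loader.conf sketch; the values are illustrative, not a recommendation:
# /boot/loader.conf
hw.igb.num_queues=2            # 2 queues x ~8k ints/sec each = ~16k ints/sec worst case at the default rate
hw.igb.max_interrupt_rate=8000 # per-queue cap; 8000/sec adds at most ~125 us of coalescing latency
# Ring memory: 2 queues x 1024 descriptors x 2 KB clusters = ~4 MB of pre-filled RX buffers per port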
Re: netmap bridge can transmit big packets at line rate?
--- On Tue, 5/21/13, liujie liu...@263.net wrote: From: liujie liu...@263.net Subject: Re: netmap bridge can transmit big packets at line rate? To: freebsd-net@freebsd.org Date: Tuesday, May 21, 2013, 5:25 AM Hi, Prof. Luigi Rizzo. Firstly, I should thank you for netmap. I tried to send an e-mail to you yesterday, but it was rejected. I used two machines to test the netmap bridge, both with an i7-2600 CPU and an Intel 82599 dual-interface card. One worked as sender and receiver with pkt-gen, the other worked as a bridge with bridge.c. As you said, I felt confused too when I saw the big-packet performance drop; I tried to change the memory parameters of netmap (netmap_mem1.c, netmap_mem2.c), but that did not seem to resolve the problem. 60-byte packet send 14882289 pps recv 13994753 pps 124-byte send 8445770 pps recv 7628942 pps 252-byte send 4529819 pps recv 3757843 pps 508-byte send 2350815 pps recv 1645647 pps 1514-byte send 814288 pps recv 489133 pps These numbers indicate you're tx'ing 7.2Gb/s with 60-byte packets and 9.8Gb/s with 1514, so maybe you just need a new calculator? BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
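The arithmetic behind those figures is just pps x frame bytes x 8. A quick check with bc(1), counting payload bytes only and then adding the 4-byte CRC plus 20 bytes of preamble/inter-frame gap that make 14.88 Mpps the 10GbE line rate for minimum-size frames:
echo "14882289 * 60 * 8 / 1000000000" | bc -l    # ~7.14 Gb/s of payload at 60 bytes
echo "814288 * 1514 * 8 / 1000000000" | bc -l    # ~9.86 Gb/s of payload at 1514 bytes
echo "14882289 * 84 * 8 / 1000000000" | bc -l    # ~10.0 Gb/s on the wire (60 + 4 CRC + 20 overhead)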
Re: netmap bridge can transmit big packets at line rate?
--- On Tue, 5/21/13, Luigi Rizzo ri...@iet.unipi.it wrote: From: Luigi Rizzo ri...@iet.unipi.it Subject: Re: netmap bridge can transmit big packets at line rate? To: Hooman Fazaeli hoomanfaza...@gmail.com Cc: freebsd-net@freebsd.org Date: Tuesday, May 21, 2013, 10:39 AM On Tue, May 21, 2013 at 06:51:12PM +0430, Hooman Fazaeli wrote: On 5/21/2013 5:10 PM, Barney Cordoba wrote: --- On Tue, 5/21/13, liujie liu...@263.net wrote: From: liujie liu...@263.net Subject: Re: netmap bridge can transmit big packets at line rate? To: freebsd-net@freebsd.org Date: Tuesday, May 21, 2013, 5:25 AM Hi, Prof. Luigi Rizzo. Firstly, I should thank you for netmap. I tried to send an e-mail to you yesterday, but it was rejected. I used two machines to test the netmap bridge, both with an i7-2600 CPU and an Intel 82599 dual-interface card. One worked as sender and receiver with pkt-gen, the other worked as a bridge with bridge.c. As you said, I felt confused too when I saw the big-packet performance drop; I tried to change the memory parameters of netmap (netmap_mem1.c, netmap_mem2.c), but that did not seem to resolve the problem. 60-byte packet send 14882289 pps recv 13994753 pps 124-byte send 8445770 pps recv 7628942 pps 252-byte send 4529819 pps recv 3757843 pps 508-byte send 2350815 pps recv 1645647 pps 1514-byte send 814288 pps recv 489133 pps These numbers indicate you're tx'ing 7.2Gb/s with 60-byte packets and 9.8Gb/s with 1514, so maybe you just need a new calculator? BC ___ As Barney pointed out already, your numbers are reasonable. You have almost saturated the link with 1514-byte packets. In the case of 64-byte packets, you do not achieve line rate, probably because of congestion on the bus. Can you show us top -SI output on the sender machine? The OP is commenting that on the receive side he is seeing a much lower number than on the tx side (A:ix1 489Kpps vs A:ix0 814Kpps). [pkt-gen -f tx ix0][ix0 bridge ] [ HOST A ] [ HOST B ] [pkt-gen -f rx ix1][ix1 ] What is unclear is where the loss occurs. cheers luigi The ixgbe driver has mac stats that will answer that. Just look at the sysctl output. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
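One hedged way to follow up on that suggestion to check the MAC statistics: the exact counter names vary between ixgbe driver versions, so the sketch below just filters the per-device sysctl tree for drop/miss/error counters (dev.ix.0 and dev.ix.1 are assumed to be the bridge's two ports):
sysctl dev.ix.0 | egrep -i 'miss|drop|err'
sysctl dev.ix.1 | egrep -i 'miss|drop|err'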
Re: High CPU interrupt load on intel I350T4 with igb on 8.3
You have to admit there's a problem before you can fix it. If Eugene is going to blame the bottleneck and no one is going to tell him he's wrong, then there is no discussion. The solution in this case is to use 1 queue, which was my suggestion many days ago. The defaults are broken. The driver should default to 1 queue, and be tuned to the system environment. With 2 NICs in the box, the defaults are defective. 1 queue should always work. Other settings require tuning and an understanding of how things work. I've had to support the i350, so I've been playing with the driver a bit. It works fine with lots of cores. But you have to have more cores than queues. 2 cards with 4 queues on a 6-physical-core system gets into a contention problem at certain loads. I've also removed the cpu bindings, which is about all I'm free to disclose. The driver needs a tuning doc as much as anything else. BC --- On Sat, 5/11/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3 To: Hooman Fazaeli hoomanfaza...@gmail.com Cc: Barney Cordoba barney_cord...@yahoo.com, Clément Hermann (nodens) nodens2...@gmail.com, Eugene Grosbein egrosb...@rdtc.ru, freebsd-net@freebsd.org Date: Saturday, May 11, 2013, 6:16 PM Hi, The motivation behind the locking scheme in igb and friends is for a very specific, userland-traffic-origin workload. Sure, it may or may not work well for forwarding/filtering workloads. If you want to fix it, let's have a discussion about how to do it, followed by some patches to do so. Adrian On 11 May 2013 13:12, Hooman Fazaeli hoomanfaza...@gmail.com wrote: On 5/11/2013 8:26 PM, Barney Cordoba wrote: Clearly you don't understand the problem. Your logic is that because other drivers are defective also, therefore it's not a driver problem? The problem is caused by a multi-threaded driver that haphazardly launches tasks and that doesn't manage the case that the rest of the system can't handle the load. It's no different than a driver that barfs when mbuf clusters are exhausted. The answer isn't to increase memory or mbufs, even though that may alleviate the problem. The answer is to fix the driver, so that it doesn't crash the system for an event that is wholly predictable. igb has 1) too many locks and 2) exacerbates the problem by binding to cpus, which causes it to not only have to wait for the lock to free, but also for a specific cpu to become free. So it chugs along happily until it encounters a bottleneck, at which point it quickly blows up the entire system in a domino effect. It needs to manage locks more efficiently, and also to detect when the backup is unmanageable. Ever since FreeBSD 5 the answer has been it's fixed in 7, or it's fixed in 9, or it's fixed in 10. There will always be bottlenecks, and no driver should blow up the system no matter what intermediate code may present a problem. It's the driver's responsibility to behave and to drop packets if necessary. BC And how should the driver behave? You suggest dropping the packets. Even if we accept that dropping packets is a good strategy in all configurations (which I doubt), the driver is definitely not the best place to implement it, since that involves duplication of similar code between drivers. Somewhere like the Ethernet layer is a much better choice to watch the packet load and drop packets to prevent them from eating all the cores.
Furthermore, ignoring the fact that pf is not optimized for multi-processor systems and blaming drivers for not adjusting themselves to what is really pf's fault is a bit unfair, I believe. -- Best regards. Hooman Fazaeli ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
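A simple way to see how many queue vectors each igb port actually allocated, and how hard each one is interrupting, is the system interrupt summary; a minimal sketch (the interrupt names match the igbX:que labels seen in the top output earlier in this archive, but the exact format can vary by release):
vmstat -i | grep igb       # one irqNNN: igbX:que line per queue, plus a link interrupt
sysctl dev.igb.0.queue0    # per-queue counters for the first queue of igb0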
Re: High CPU interrupt load on intel I350T4 with igb on 8.3
--- On Fri, 5/10/13, Eugene Grosbein egrosb...@rdtc.ru wrote: From: Eugene Grosbein egrosb...@rdtc.ru Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3 To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org, Clément Hermann (nodens) nodens2...@gmail.com Date: Friday, May 10, 2013, 8:56 AM On 10.05.2013 05:16, Barney Cordoba wrote: Network device driver is not guilty here, that's just pf's contention running in igb's context. They're both at play. Single threadedness aggravates subsystems that have too many lock points. It can also be solved with using 1 queue, because then you don't have 4 queues going into a single thread. Again, the problem is within pf(4)'s global lock, not in the igb(4). Again, you're wrong. It's not the bottleneck's fault; it's the fault of the multi-threaded code for only working properly when there are no bottlenecks. In practice, the problem is easily solved without any change in the igb code. The same problem will occur for other NIC drivers too - if several NICs were combined within one lagg(4). So, driver is not guilty and solution would be same - eliminate bottleneck and you will be fine and capable to spread the load on several CPU cores. Therefore, I don't care of CS theory for this particular case. Clearly you don't understand the problem. Your logic is that because other drivers are defective also; therefore its not a driver problem? The problem is caused by a multi-threaded driver that haphazardly launches tasks and that doesn't manage the case that the rest of the system can't handle the load. It's no different than a driver that barfs when mbuf clusters are exhausted. The answer isn't to increase memory or mbufs, even though that may alleviate the problem. The answer is to fix the driver, so that it doesn't crash the system for an event that is wholly predictable. igb has 1) too many locks and 2) exasperates the problem by binding to cpus, which causes it to not only have to wait for the lock to free, but also for a specific cpu to become free. So it chugs along happily until it encounters a bottleneck, at which point it quickly blows up the entire system in a domino effect. It needs to manage locks more efficiently, and also to detect when the backup is unmanageable. Ever since FreeBSD 5 the answer has been it's fixed in 7, or its fixed in 9, or it's fixed in 10. There will always be bottlenecks, and no driver should blow up the system no matter what intermediate code may present a problem. Its the driver's responsibility to behave and to drop packets if necessary. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: High CPU interrupt load on intel I350T4 with igb on 8.3
--- On Sat, 5/11/13, Hooman Fazaeli hoomanfaza...@gmail.com wrote: From: Hooman Fazaeli hoomanfaza...@gmail.com Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3 To: Barney Cordoba barney_cord...@yahoo.com Cc: Eugene Grosbein egrosb...@rdtc.ru, freebsd-net@freebsd.org, Clément Hermann (nodens) nodens2...@gmail.com Date: Saturday, May 11, 2013, 4:12 PM On 5/11/2013 8:26 PM, Barney Cordoba wrote: Clearly you don't understand the problem. Your logic is that because other drivers are defective also; therefore its not a driver problem? The problem is caused by a multi-threaded driver that haphazardly launches tasks and that doesn't manage the case that the rest of the system can't handle the load. It's no different than a driver that barfs when mbuf clusters are exhausted. The answer isn't to increase memory or mbufs, even though that may alleviate the problem. The answer is to fix the driver, so that it doesn't crash the system for an event that is wholly predictable. igb has 1) too many locks and 2) exasperates the problem by binding to cpus, which causes it to not only have to wait for the lock to free, but also for a specific cpu to become free. So it chugs along happily until it encounters a bottleneck, at which point it quickly blows up the entire system in a domino effect. It needs to manage locks more efficiently, and also to detect when the backup is unmanageable. Ever since FreeBSD 5 the answer has been it's fixed in 7, or its fixed in 9, or it's fixed in 10. There will always be bottlenecks, and no driver should blow up the system no matter what intermediate code may present a problem. Its the driver's responsibility to behave and to drop packets if necessary. BC And how the driver should behave? You suggest dropping the packets. Even if we accept that dropping packets is a good strategy in all configurations (which I doubt), the driver is definitely not the best place to implement it, since that involves duplication of similar code between drivers. Somewhere like the Ethernet layer is a much better choice to watch load of packets and drop them to prevent them to eat all the cores. Furthermore, ignoring the fact that pf is not optimized for multi-processors and blaming drivers for not adjusting themselves with the this pf's fault, is a bit unfair, I believe. It's easier to make excuses than to write a really good driver. I'll grant you that. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: High CPU interrupt load on intel I350T4 with igb on 8.3
--- On Sun, 4/28/13, Barney Cordoba barney_cord...@yahoo.com wrote: From: Barney Cordoba barney_cord...@yahoo.com Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3 To: Jack Vogel jfvo...@gmail.com Cc: FreeBSD Net freebsd-net@freebsd.org, Clément Hermann (nodens) nodens2...@gmail.com Date: Sunday, April 28, 2013, 2:59 PM The point of lists is to be able to benefit from other's experiences so you don't have to waste your time trying things that others have already done. I'm not pontificating. I've done the tests. There's no reason for every person who is having to exact same problem to do the same tests over and over, hoping for somemagically different result. The result will always be the same. Because there's no chance of it working properly by chance. BC --- On Sun, 4/28/13, Jack Vogel jfvo...@gmail.com wrote: From: Jack Vogel jfvo...@gmail.com Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3 To: Barney Cordoba barney_cord...@yahoo.com Cc: FreeBSD Net freebsd-net@freebsd.org, Clément Hermann (nodens) nodens2...@gmail.com Date: Sunday, April 28, 2013, 1:07 PM Try setting your queues to 1, run some tests, then try settingyour queues to 2, then to 4... its called tuning, and rather thanjust pontificating about it, which Barney so loves to do, you can discover what works best. I ran tests last week preparing for anew driver version and found the best results came not only whiletweaking queues, but also ring size, and I could see changes based on the buf ring size There are lots of things that may improve ordegrade performance depending on the workload. Jack On Sun, Apr 28, 2013 at 7:21 AM, Barney Cordoba barney_cord...@yahoo.com wrote: --- On Fri, 4/26/13, Clément Hermann (nodens) nodens2...@gmail.com wrote: From: Clément Hermann (nodens) nodens2...@gmail.com Subject: High CPU interrupt load on intel I350T4 with igb on 8.3 To: freebsd-net@freebsd.org Date: Friday, April 26, 2013, 7:31 AM Hi list, We use pf+ALTQ for trafic shaping on some routers. % We are switching to new servers : Dell PowerEdge R620 with 2 8-cores Intel Processor (E5-2650L), 8GB RAM and Intel I350T4 (quad port) using igb driver. The old hardware is using em driver, the CPU load is high but mostly due to kernel and a large pf ruleset. On the new hardware, we see high CPU Interrupt load (up to 95%), even though there is not much trafic currently (peaks about 150Mbps and 40Kpps). All queues are used and binded to a cpu according to top, but a lot of CPU time is spent on igb queues (interrupt or wait). The load is fine when we stay below 20Kpps. We see no mbuf shortage, no dropped packet, but there is little margin left on CPU time (about 25% idle at best, most of CPU time is spent on interrupts), which is disturbing. 
We have done some tuning, but to no avail : sysctl.conf : # mbufs kern.ipc.nmbclusters=65536 # Sockets kern.ipc.somaxconn=8192 net.inet.tcp.delayed_ack=0 net.inet.tcp.sendspace=65535 net.inet.udp.recvspace=65535 net.inet.udp.maxdgram=57344 net.local.stream.recvspace=65535 net.local.stream.sendspace=65535 # IGB dev.igb.0.rx_processing_limit=4096 dev.igb.1.rx_processing_limit=4096 dev.igb.2.rx_processing_limit=4096 dev.igb.3.rx_processing_limit=4096 /boot/loader.conf : vm.kmem_size=1G hw.igb.max_interrupt_rate=32000 # maximum number of interrupts/sec generated by single igb(4) (default 8000) hw.igb.txd=2048 # number of transmit descriptors allocated by the driver (2048 limit) hw.igb.rxd=2048 # number of receive descriptors allocated by the driver (2048 limit) hw.igb.rx_process_limit=1000 # maximum number of received packets to process at a time, The default of 100 is # too low for most firewalls. (-1 means unlimited) Kernel HZ is 1000. The IGB /boot/loader.conf tuning was our last attempt, it didn't change anything. Does anyone have any pointer ? How could we lower CPU interrupt load ? should we set hw.igb.max_interrupt_rate lower instead of higher ? From what we saw here and there, we should be able to do much better with this hardware. relevant sysctl (igb1 and igb2 only, other interfaces are unused) : sysctl dev.igb | grep -v : 0$ dev.igb.1.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.1 dev.igb.1.%driver: igb dev.igb.1.%location: slot=0 function=1 dev.igb.1.%pnpinfo: vendor=0x8086 device=0x1521 subvendor=0x8086 subdevice=0x5001 class=0x02 dev.igb.1.%parent: pci5 dev.igb.1.nvm: -1 dev.igb.1
Re: High CPU interrupt load on intel I350T4 with igb on 8.3
--- On Thu, 5/9/13, Eugene Grosbein egrosb...@rdtc.ru wrote: From: Eugene Grosbein egrosb...@rdtc.ru Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3 To: Clément Hermann (nodens) nodens2...@gmail.com Cc: freebsd-net@freebsd.org Date: Thursday, May 9, 2013, 10:55 AM On 26.04.2013 18:31, Clément Hermann (nodens) wrote: Hi list, We use pf+ALTQ for trafic shaping on some routers. We are switching to new servers : Dell PowerEdge R620 with 2 8-cores Intel Processor (E5-2650L), 8GB RAM and Intel I350T4 (quad port) using igb driver. The old hardware is using em driver, the CPU load is high but mostly due to kernel and a large pf ruleset. On the new hardware, we see high CPU Interrupt load (up to 95%), even though there is not much trafic currently (peaks about 150Mbps and 40Kpps). All queues are used and binded to a cpu according to top, but a lot of CPU time is spent on igb queues (interrupt or wait). The load is fine when we stay below 20Kpps. We see no mbuf shortage, no dropped packet, but there is little margin left on CPU time (about 25% idle at best, most of CPU time is spent on interrupts), which is disturbing. It seems you suffer from pf lock contention. You should stop using pf with multi-core systems with 8.3. Move to ipfw+dummynet or ng_car for 8.3 or move to 10.0-CURRENT having new, rewritten pf that does not have this problem. Network device driver is not guilty here, that's just pf's contention running in igb's context. Eugene Grosbein They're both at play. Single threadedness aggravates subsystems that have too many lock points. It can also be solved with using 1 queue, because then you don't have 4 queues going into a single thread. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: High CPU interrupt load on intel I350T4 with igb on 8.3
--- On Thu, 5/9/13, Eugene Grosbein egrosb...@rdtc.ru wrote: From: Eugene Grosbein egrosb...@rdtc.ru Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3 To: Barney Cordoba barney_cord...@yahoo.com Cc: Clément Hermann (nodens) nodens2...@gmail.com, freebsd-net@freebsd.org Date: Thursday, May 9, 2013, 12:30 PM On 09.05.2013 23:25, Barney Cordoba wrote: Network device driver is not guilty here, that's just pf's contention running in igb's context. Eugene Grosbein They're both at play. Single threadedness aggravates subsystems that have too many lock points. It can also be solved with using 1 queue, because then you don't have 4 queues going into a single thread. Again, the problem is within pf(4)'s global lock, not in the igb(4). Again, you're wrong. It's not the bottleneck's fault; it's the fault of the multi-threaded code for only working properly when there are no bottlenecks. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Capture packets before kernel process
--- On Tue, 4/30/13, w...@sourcearmory.com w...@sourcearmory.com wrote: From: w...@sourcearmory.com w...@sourcearmory.com Subject: Capture packets before kernel process To: freebsd-net@freebsd.org Date: Tuesday, April 30, 2013, 11:24 AM Hi! I need some help. Currently I'm working on a project where I want to capture and process some network packets before the kernel does. I have searched but have found nothing. Is there some way to capture the packets before the kernel? You want to wedge your code into the if_input routine. Then pass the mbuf to the original if_input routine. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: High CPU interrupt load on intel I350T4 with igb on 8.3
--- On Fri, 4/26/13, Clément Hermann (nodens) nodens2...@gmail.com wrote: From: Clément Hermann (nodens) nodens2...@gmail.com Subject: High CPU interrupt load on intel I350T4 with igb on 8.3 To: freebsd-net@freebsd.org Date: Friday, April 26, 2013, 7:31 AM Hi list, We use pf+ALTQ for trafic shaping on some routers. We are switching to new servers : Dell PowerEdge R620 with 2 8-cores Intel Processor (E5-2650L), 8GB RAM and Intel I350T4 (quad port) using igb driver. The old hardware is using em driver, the CPU load is high but mostly due to kernel and a large pf ruleset. On the new hardware, we see high CPU Interrupt load (up to 95%), even though there is not much trafic currently (peaks about 150Mbps and 40Kpps). All queues are used and binded to a cpu according to top, but a lot of CPU time is spent on igb queues (interrupt or wait). The load is fine when we stay below 20Kpps. We see no mbuf shortage, no dropped packet, but there is little margin left on CPU time (about 25% idle at best, most of CPU time is spent on interrupts), which is disturbing. We have done some tuning, but to no avail : sysctl.conf : # mbufs kern.ipc.nmbclusters=65536 # Sockets kern.ipc.somaxconn=8192 net.inet.tcp.delayed_ack=0 net.inet.tcp.sendspace=65535 net.inet.udp.recvspace=65535 net.inet.udp.maxdgram=57344 net.local.stream.recvspace=65535 net.local.stream.sendspace=65535 # IGB dev.igb.0.rx_processing_limit=4096 dev.igb.1.rx_processing_limit=4096 dev.igb.2.rx_processing_limit=4096 dev.igb.3.rx_processing_limit=4096 /boot/loader.conf : vm.kmem_size=1G hw.igb.max_interrupt_rate=32000 # maximum number of interrupts/sec generated by single igb(4) (default 8000) hw.igb.txd=2048 # number of transmit descriptors allocated by the driver (2048 limit) hw.igb.rxd=2048 # number of receive descriptors allocated by the driver (2048 limit) hw.igb.rx_process_limit=1000 # maximum number of received packets to process at a time, The default of 100 is # too low for most firewalls. (-1 means unlimited) Kernel HZ is 1000. The IGB /boot/loader.conf tuning was our last attempt, it didn't change anything. Does anyone have any pointer ? How could we lower CPU interrupt load ? should we set hw.igb.max_interrupt_rate lower instead of higher ? From what we saw here and there, we should be able to do much better with this hardware. 
relevant sysctl (igb1 and igb2 only, other interfaces are unused) : sysctl dev.igb | grep -v : 0$ dev.igb.1.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.1 dev.igb.1.%driver: igb dev.igb.1.%location: slot=0 function=1 dev.igb.1.%pnpinfo: vendor=0x8086 device=0x1521 subvendor=0x8086 subdevice=0x5001 class=0x02 dev.igb.1.%parent: pci5 dev.igb.1.nvm: -1 dev.igb.1.enable_aim: 1 dev.igb.1.fc: 3 dev.igb.1.rx_processing_limit: 4096 dev.igb.1.eee_disabled: 1 dev.igb.1.link_irq: 2 dev.igb.1.device_control: 1209795137 dev.igb.1.rx_control: 67141658 dev.igb.1.interrupt_mask: 4 dev.igb.1.extended_int_mask: 2147483981 dev.igb.1.fc_high_water: 33168 dev.igb.1.fc_low_water: 33152 dev.igb.1.queue0.interrupt_rate: 71428 dev.igb.1.queue0.txd_head: 1318 dev.igb.1.queue0.txd_tail: 1318 dev.igb.1.queue0.tx_packets: 84663594 dev.igb.1.queue0.rxd_head: 717 dev.igb.1.queue0.rxd_tail: 715 dev.igb.1.queue0.rx_packets: 43899597 dev.igb.1.queue0.rx_bytes: 8905556030 dev.igb.1.queue1.interrupt_rate: 90909 dev.igb.1.queue1.txd_head: 693 dev.igb.1.queue1.txd_tail: 693 dev.igb.1.queue1.tx_packets: 57543349 dev.igb.1.queue1.rxd_head: 1033 dev.igb.1.queue1.rxd_tail: 1032 dev.igb.1.queue1.rx_packets: 54821897 dev.igb.1.queue1.rx_bytes: 9944955108 dev.igb.1.queue2.interrupt_rate: 10 dev.igb.1.queue2.txd_head: 350 dev.igb.1.queue2.txd_tail: 350 dev.igb.1.queue2.tx_packets: 62320990 dev.igb.1.queue2.rxd_head: 1962 dev.igb.1.queue2.rxd_tail: 1939 dev.igb.1.queue2.rx_packets: 43909016 dev.igb.1.queue2.rx_bytes: 8673941461 dev.igb.1.queue3.interrupt_rate: 14925 dev.igb.1.queue3.txd_head: 647 dev.igb.1.queue3.txd_tail: 647 dev.igb.1.queue3.tx_packets: 58776199 dev.igb.1.queue3.rxd_head: 692 dev.igb.1.queue3.rxd_tail: 691 dev.igb.1.queue3.rx_packets: 55138996 dev.igb.1.queue3.rx_bytes: 9310217354 dev.igb.1.queue4.interrupt_rate: 10 dev.igb.1.queue4.txd_head: 1721 dev.igb.1.queue4.txd_tail: 1721 dev.igb.1.queue4.tx_packets: 54337209 dev.igb.1.queue4.rxd_head: 1609 dev.igb.1.queue4.rxd_tail: 1598 dev.igb.1.queue4.rx_packets: 46546503 dev.igb.1.queue4.rx_bytes: 8818182840 dev.igb.1.queue5.interrupt_rate: 11627 dev.igb.1.queue5.txd_head: 254 dev.igb.1.queue5.txd_tail: 254 dev.igb.1.queue5.tx_packets: 53117182 dev.igb.1.queue5.rxd_head: 701 dev.igb.1.queue5.rxd_tail: 685 dev.igb.1.queue5.rx_packets: 43014837 dev.igb.1.queue5.rx_bytes: 8699057447
Re: High CPU interrupt load on intel I350T4 with igb on 8.3
The point of lists is to be able to benefit from other's experiences so you don't have to waste your time trying things that others have already done. I'm not pontificating. I've done the tests. There's no reason for every person who is having to exact same problem to do the same tests over and over, hoping for somemagically different result. The result will always be the same. Because there's no chance of it working properly by chance. BC --- On Sun, 4/28/13, Jack Vogel jfvo...@gmail.com wrote: From: Jack Vogel jfvo...@gmail.com Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3 To: Barney Cordoba barney_cord...@yahoo.com Cc: FreeBSD Net freebsd-net@freebsd.org, Clément Hermann (nodens) nodens2...@gmail.com Date: Sunday, April 28, 2013, 1:07 PM Try setting your queues to 1, run some tests, then try settingyour queues to 2, then to 4... its called tuning, and rather thanjust pontificating about it, which Barney so loves to do, you can discover what works best. I ran tests last week preparing for anew driver version and found the best results came not only whiletweaking queues, but also ring size, and I could see changes based on the buf ring size There are lots of things that may improve ordegrade performance depending on the workload. Jack On Sun, Apr 28, 2013 at 7:21 AM, Barney Cordoba barney_cord...@yahoo.com wrote: --- On Fri, 4/26/13, Clément Hermann (nodens) nodens2...@gmail.com wrote: From: Clément Hermann (nodens) nodens2...@gmail.com Subject: High CPU interrupt load on intel I350T4 with igb on 8.3 To: freebsd-net@freebsd.org Date: Friday, April 26, 2013, 7:31 AM Hi list, We use pf+ALTQ for trafic shaping on some routers. We are switching to new servers : Dell PowerEdge R620 with 2 8-cores Intel Processor (E5-2650L), 8GB RAM and Intel I350T4 (quad port) using igb driver. The old hardware is using em driver, the CPU load is high but mostly due to kernel and a large pf ruleset. On the new hardware, we see high CPU Interrupt load (up to 95%), even though there is not much trafic currently (peaks about 150Mbps and 40Kpps). All queues are used and binded to a cpu according to top, but a lot of CPU time is spent on igb queues (interrupt or wait). The load is fine when we stay below 20Kpps. We see no mbuf shortage, no dropped packet, but there is little margin left on CPU time (about 25% idle at best, most of CPU time is spent on interrupts), which is disturbing. We have done some tuning, but to no avail : sysctl.conf : # mbufs kern.ipc.nmbclusters=65536 # Sockets kern.ipc.somaxconn=8192 net.inet.tcp.delayed_ack=0 net.inet.tcp.sendspace=65535 net.inet.udp.recvspace=65535 net.inet.udp.maxdgram=57344 net.local.stream.recvspace=65535 net.local.stream.sendspace=65535 # IGB dev.igb.0.rx_processing_limit=4096 dev.igb.1.rx_processing_limit=4096 dev.igb.2.rx_processing_limit=4096 dev.igb.3.rx_processing_limit=4096 /boot/loader.conf : vm.kmem_size=1G hw.igb.max_interrupt_rate=32000 # maximum number of interrupts/sec generated by single igb(4) (default 8000) hw.igb.txd=2048 # number of transmit descriptors allocated by the driver (2048 limit) hw.igb.rxd=2048 # number of receive descriptors allocated by the driver (2048 limit) hw.igb.rx_process_limit=1000 # maximum number of received packets to process at a time, The default of 100 is # too low for most firewalls. (-1 means unlimited) Kernel HZ is 1000. The IGB /boot/loader.conf tuning was our last attempt, it didn't change anything. Does anyone have any pointer ? How could we lower CPU interrupt load ? 
should we set hw.igb.max_interrupt_rate lower instead of higher ? From what we saw here and there, we should be able to do much better with this hardware. relevant sysctl (igb1 and igb2 only, other interfaces are unused) : sysctl dev.igb | grep -v : 0$ dev.igb.1.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.1 dev.igb.1.%driver: igb dev.igb.1.%location: slot=0 function=1 dev.igb.1.%pnpinfo: vendor=0x8086 device=0x1521 subvendor=0x8086 subdevice=0x5001 class=0x02 dev.igb.1.%parent: pci5 dev.igb.1.nvm: -1 dev.igb.1.enable_aim: 1 dev.igb.1.fc: 3 dev.igb.1.rx_processing_limit: 4096 dev.igb.1.eee_disabled: 1 dev.igb.1.link_irq: 2 dev.igb.1.device_control: 1209795137 dev.igb.1.rx_control: 67141658 dev.igb.1.interrupt_mask: 4 dev.igb.1.extended_int_mask: 2147483981 dev.igb.1.fc_high_water: 33168 dev.igb.1.fc_low_water: 33152 dev.igb.1.queue0.interrupt_rate: 71428 dev.igb.1.queue0.txd_head: 1318 dev.igb.1.queue0.txd_tail: 1318 dev.igb.1.queue0.tx_packets: 84663594 dev.igb.1.queue0.rxd_head: 717 dev.igb.1.queue0.rxd_tail: 715 dev.igb.1.queue0.rx_packets: 43899597 dev.igb
Re: pf performance?
--- On Fri, 4/26/13, Erich Weiler wei...@soe.ucsc.edu wrote: From: Erich Weiler wei...@soe.ucsc.edu Subject: Re: pf performance? To: Andre Oppermann an...@freebsd.org Cc: Paul Tatarsky p...@soe.ucsc.edu, freebsd-net@freebsd.org Date: Friday, April 26, 2013, 12:04 PM But the work pf does would show up in 'system' on top right? So if I see all my CPUs tied up 100% in 'interrupts' and very little 'system', would it be a reasonable assumption to think that if I got more CPU cores to handle the interrupts that eventually I would see 'system' load increase as the interrupt load became faster to be handled? And thus increase my bandwidth? Having the work of pf show up in 'interrupts' or 'system' depends on the network driver and how it handles sending packets up the stack. In most cases drivers deliver packets from interrupt context. Ah, I see. Definitely appears for me in interrupts then. I've got the mxge driver doing the work here. So, given that I can spread out the interrupts to every core (like, pin an interrupt queue to each core), I can have all my cores work on the process. But seeing as though the pf bit is still serialized I'm not sure that I understand how it is serialized when many CPUs are handling interrupts, and hence doing the work of pf as well? Wouldn't that indicate that the work of pf is being handled by many cores, as many cores are handling the interrupts? you're thinking exactly backwards. You're creating lock contention by having a bunch of receive processes going into a single threaded pf process. Think of it like a six lane highway that has 5 lanes closed a mile up the road. The result isn't that you go the same speed as a 1 lane highway; what you have is a parking lot. The only thing you're doing by spreading the interrupts is using up more cycles on more cores. What you *should* be doing, if you can engineer it, is use 1 path through the pf filter. You could have 4 queues feed a single process that dequeues and runs through the filter. The problem with that is that the pf process IS the bottleneck in that its slower than the receive processes, so you'd best just use the other cores to do userland stuff. You could use cpuset to make sure that no userland process uses the interrupt core, and dedicate 1 cpu to packet filtering. 1 modern CPU can easily handle a gig of traffic. There's no need to spread in most case. BC Or are you saying that pf *is* being handled by many cores, but just in a serialized nature? Like, packet 1 is handled by core 0, then packet 2 is handled by core 1, packet 3 is handled by core 4, etc? Such that even though multiple cores are handling it, they are just doing so serially and not in parallel? And if so, maybe it still helps to have many cores? Thanks for all the excellent info! In other words, until I see like 100% system usage in one core, I would have room to grow? You have room to grow if 'idle' is more than 0% and the interrupts of the networks cards are running on different cores. If one core gets all the interrupts a second idle core doesn't get the chance to help out. IIRC the interrupt allocation to cores is done at interrupt registration time or driver attach time. It can be re-distributed at run time on most architecture but I'm not sure we have an easily accessible API for that. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
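A rough sketch of the cpuset(1) approach described above, assuming the NIC's queue interrupt is to be kept on CPU 0 and userland is pushed elsewhere; the IRQ number 256 and pid 1234 are purely illustrative, read the real ones from vmstat -i and ps:
vmstat -i                  # find the irq numbers used by the NIC queues
cpuset -l 0 -x 256         # pin that interrupt to CPU 0
cpuset -l 1-5 -p 1234      # keep a userland process off the interrupt core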
Re: igb and ALTQ in 9.1-rc3
Firstly, my OP was not intended to have anything to do with Jack. Frankly, he's just a mechanical programmer and not a designer, so its others that should be responsible for guiding him. There *should* be someone at FreeBSD who is responsible for taking mechanically sound drivers and optimizing them. But in the open source world, that function doesn't usually exist. But the idea that the FreeBSD community refuses to point out that some things just don't work well hurts people in the community. The portrayal of every feature in FreeBSD as being equally useful and well done doesn't provide the information that users need to make decisions about what to use to run their businesses. It also hurts people that you've made IGB worse in FreeBSD 9. There's *some* expectation that it should be better in 9 than 7 or 8, and that it should have fewer bugs. But in an effort to force in a rickety implementation of multi-queue, you've converted the driver into something that is guaranteed to rob any system of cpu cycles. I wrote a multiqueue driver for 7 for igb that works very well, and I'd hoped to be able to use igb in 9 without having to port it, even if it wasn't as good. But it's not just not as good; it's unusable in a heavy production environment. While it's noble (and convenient) for you folks who have jobs where you get paid to write open source code to rip others for not contributing, Im sure that some of you with real jobs know that when someone pays you a lot of money to write code, you're not free to share the code or even the specific techniques publicly. After all, technique is the difference between a good and a bad driver. I try to drop hints, but Jack's lack of curiosity as to how to make the driver better is particularly troubling. So I just have to recommend that igb cards not be used for production flows, because there is little hope that it will improve any time soon. BC --- On Sun, 3/31/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: igb and ALTQ in 9.1-rc3 To: Barney Cordoba barney_cord...@yahoo.com Cc: Jeffrey EPieper jeffrey.e.pie...@intel.com, Nick Rogers ncrog...@gmail.com, Clement Hermann (nodens) nodens2...@gmail.com, Jack Vogel jfvo...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org Date: Sunday, March 31, 2013, 2:48 PM Barney, As much as we'd like it, Jack's full time job involves other things besides supporting FreeBSD. If you want to see it done better, please work with the FreeBSD developer community and improve the intel driver. No-one is stopping you from stepping in. In fact, this would be _beneficial_ to Jack's work inside Intel with FreeBSD. If there is more of an active community participating with the intel drivers and more companies choosing intel hardware for FreeBSD network services, Intel will likely dump more effort into FreeBSD. So please, stop your non-constructive trolling and complaining and put your skills to use for the greater good. Sheesh. Intel have supplied a very thorough, detailed driver as well as programming and errata datasheets for their chips. We aren't in the dark here. There's more than enough rope to hang ourselves with. Please focus on making it better. Adrian On 31 March 2013 05:35, Barney Cordoba barney_cord...@yahoo.com wrote: The reason that Jack is a no better programmer now than he was in 2009 might have something to do with the fact that he hides when his work is criticized. Why not release the benchmarks you did while designing the igb driver, Jack? 
Say what, you didn't do any benchmarking? How does the default driver perform, say in a firewall, with a 1000-user load? What's the optimum number of queues to use in such a system? What's the effect of CPU binding? What's the effect with multiple cards when you have more queues than you have physical cpus? What made you decide to use buf_ring? Something new to play with? I'm guessing that you have no idea. BC --- On Fri, 3/29/13, Jack Vogel jfvo...@gmail.com wrote: From: Jack Vogel jfvo...@gmail.com Subject: Re: igb and ALTQ in 9.1-rc3 To: Pieper, Jeffrey E jeffrey.e.pie...@intel.com Cc: Barney Cordoba barney_cord...@yahoo.com, Nick Rogers ncrog...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com Date: Friday, March 29, 2013, 12:36 PM Fortunately, Barney doesn't speak for me, or for Intel, and I've long ago realized it's pointless to attempt anything like a fair conversation with him. The only thing he's ever contributed is slander and pseudo-critique... another poison thread I'm done with. Jack On Fri, Mar 29, 2013 at 8:45 AM, Pieper, Jeffrey E jeffrey.e.pie...@intel.com wrote: -Original Message- From: owner-freebsd-...@freebsd.org [mailto:owner-freebsd-...@freebsd.org] On Behalf Of Barney Cordoba Sent: Friday, March 29, 2013
Re: igb and ALTQ in 9.1-rc3
--- On Tue, 4/2/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: igb and ALTQ in 9.1-rc3 To: Nick Rogers ncrog...@gmail.com Cc: Karim Fodil-Lemelin fodillemlinka...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org Date: Tuesday, April 2, 2013, 6:39 PM Yes: * you need to add it to conf/options - see if there's an opt_igb.h to add it to, otherwise you'll need to add one; * Make sure the driver code includes opt_igb.h; * Then make sure you make kernel modules using either make buildkernel KERNCONF=X, or you set the environment appropriately so the build scripts can find your kernel build directory (where it populates all the opt_xxx.h includes) and it'll have this module set. Hopefully Jack will do this. Yes, we need a better queue management discipline API in the kernel. Jack's just falling afoul of the fact we don't have one. It's not his fault. That's not true at all. For a bridged system running a firewall or doing filtering, virtually all of the proper design can be done in the ethernet driver. Of course if you have 2 different drivers then you need a different scheme, but if the input and the output are the same driver you can manage virtually all of the contention. You can't just randomly do things; you have to design to minimize lock contention. Drivers that seem to work fine at low volume blow up quickly as contention increases. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
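To make those steps concrete, here is a hedged sketch of what such a build-time knob could look like, using IGB_LEGACY_TX as the option name floated later in this thread; the exact name, file locations, and kernel config name are assumptions, not a committed interface:
# sys/conf/options: declare the option so config(8) emits it into opt_igb.h
IGB_LEGACY_TX        opt_igb.h
# kernel config file (e.g. sys/amd64/conf/MYKERNEL):
options IGB_LEGACY_TX
# then, from /usr/src:
make buildkernel KERNCONF=MYKERNEL && make installkernel KERNCONF=MYKERNEL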
Re: igb and ALTQ in 9.1-rc3
Do you know anything about the subject, Scott? I'd be interested in seeing your benchmarks with various queue counts, binding to cpus vs not binding, and the numbers comparing the pre multiqueue driver to the current one. It's the minimum that any marginally competent network driver developer would do. Or are you just hurling insults because you're devoid of actual ideas? BC --- On Fri, 3/29/13, Scott Long scott4l...@yahoo.com wrote: From: Scott Long scott4l...@yahoo.com Subject: Re: igb and ALTQ in 9.1-rc3 To: Barney Cordoba barney_cord...@yahoo.com Cc: Nick Rogers ncrog...@gmail.com, Adrian Chadd adr...@freebsd.org, Jeffrey EPieper jeffrey.e.pie...@intel.com, freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com, Jack Vogel jfvo...@gmail.com Date: Friday, March 29, 2013, 12:42 PM Comedy gold. It's been a while since I've seen this much idiocy from you, Barney. Hopefully the rest of the mailing list will blackhole you, as I'm about to, and we can all get back to real work. Scott On Mar 29, 2013, at 10:38 AM, Barney Cordoba barney_cord...@yahoo.com wrote: it needs a lot more than a patch. It needs to be completely re-thunk --- On Fri, 3/29/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: igb and ALTQ in 9.1-rc3 To: Barney Cordoba barney_cord...@yahoo.com Cc: Jack Vogel jfvo...@gmail.com, Nick Rogers ncrog...@gmail.com, Jeffrey EPieper jeffrey.e.pie...@intel.com, freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com Date: Friday, March 29, 2013, 12:07 PM Barney, Patches gratefully accepted. Adrian On 29 March 2013 08:54, Barney Cordoba barney_cord...@yahoo.com wrote: --- On Fri, 3/29/13, Pieper, Jeffrey E jeffrey.e.pie...@intel.com wrote: From: Pieper, Jeffrey E jeffrey.e.pie...@intel.com Subject: RE: igb and ALTQ in 9.1-rc3 To: Barney Cordoba barney_cord...@yahoo.com, Jack Vogel jfvo...@gmail.com, Nick Rogers ncrog...@gmail.com Cc: freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com Date: Friday, March 29, 2013, 11:45 AM -Original Message- From: owner-freebsd-...@freebsd.org [mailto:owner-freebsd-...@freebsd.org] On Behalf Of Barney Cordoba Sent: Friday, March 29, 2013 5:51 AM To: Jack Vogel; Nick Rogers Cc: freebsd-net@freebsd.org; Clement Hermann (nodens) Subject: Re: igb and ALTQ in 9.1-rc3 --- On Thu, 3/28/13, Nick Rogers ncrog...@gmail.com wrote: From: Nick Rogers ncrog...@gmail.com Subject: Re: igb and ALTQ in 9.1-rc3 To: Jack Vogel jfvo...@gmail.com Cc: Barney Cordoba barney_cord...@yahoo.com, Clement Hermann (nodens) nodens2...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org Date: Thursday, March 28, 2013, 9:29 PM On Thu, Mar 28, 2013 at 4:16 PM, Jack Vogel jfvo...@gmail.com wrote: Have been kept fairly busy with other matters, one thing I could do short term is change the defines in igb the way I did in the em driver so you could still define the older if_start entry. Right now those are based on OS version and so you will automatically get if_transmit, but I could change it to be IGB_LEGACY_TX or so, and that could be defined in the Makefile. Would this help? I'm currently using ALTQ successfully with the em driver, so if igb behaved the same with respect to using if_start instead of if_transmit when ALTQ is in play, that would be great. I do not completely understand the change you propose as I am not very familiar with the driver internals. 
Any kind of patch or extra Makefile/make.conf definition that would allow me to build a 9-STABLE kernel with an igb driver that works again with ALTQ, ASAP, would be much appreciated. Jack On Thu, Mar 28, 2013 at 2:31 PM, Nick Rogers ncrog...@gmail.com wrote: On Tue, Dec 11, 2012 at 1:09 AM, Jack Vogel jfvo...@gmail.com wrote: On Mon, Dec 10, 2012 at 11:58 PM, Gleb Smirnoff gleb...@freebsd.org wrote: On Mon, Dec 10, 2012 at 03:31:19PM -0800, Jack Vogel wrote: J UH, maybe asking the owner of the driver would help :) J J ... and no, I've never been aware of doing anything to stop supporting altq J so you wouldn't see any commits. If there's something in the altq code or J support (which I have nothing to do with) that caused this no-one informed J me. Switching from
Re: igb and ALTQ in 9.1-rc3
The reason that Jack is a no better programmer now than he was in 2009 might have something to do with the fact that he hides when his work is criticized. Why not release the benchmarks you did while designing the igb driver, Jack? Say what,you didn't do any benchmarking? How does the default driver perform, say in a firewall,with 1000 user load? What's the optimum number of queues to use in such a system?What's the effect of CPU binding? What's the effect with multiple cards when you havemore queues than you have physical cpus? What made you decide to use buf_ring? Something new to play with? I'm guessing that you have no idea. BC--- On Fri, 3/29/13, Jack Vogel jfvo...@gmail.com wrote: From: Jack Vogel jfvo...@gmail.com Subject: Re: igb and ALTQ in 9.1-rc3 To: Pieper, Jeffrey E jeffrey.e.pie...@intel.com Cc: Barney Cordoba barney_cord...@yahoo.com, Nick Rogers ncrog...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com Date: Friday, March 29, 2013, 12:36 PM Fortunately, Barney doesn't speak for me, or for Intel, and I've long ago realized its pointless to attempt anything like a fair conversation with him. The only thing he's ever contributed is slander and pseudo-critique... another poison thread I'm done with. Jack On Fri, Mar 29, 2013 at 8:45 AM, Pieper, Jeffrey E jeffrey.e.pie...@intel.com wrote: -Original Message- From: owner-freebsd-...@freebsd.org [mailto:owner-freebsd-...@freebsd.org] On Behalf Of Barney Cordoba Sent: Friday, March 29, 2013 5:51 AM To: Jack Vogel; Nick Rogers Cc: freebsd-net@freebsd.org; Clement Hermann (nodens) Subject: Re: igb and ALTQ in 9.1-rc3 --- On Thu, 3/28/13, Nick Rogers ncrog...@gmail.com wrote: From: Nick Rogers ncrog...@gmail.com Subject: Re: igb and ALTQ in 9.1-rc3 To: Jack Vogel jfvo...@gmail.com Cc: Barney Cordoba barney_cord...@yahoo.com, Clement Hermann (nodens) nodens2...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org Date: Thursday, March 28, 2013, 9:29 PM On Thu, Mar 28, 2013 at 4:16 PM, Jack Vogel jfvo...@gmail.com wrote: Have been kept fairly busy with other matters, one thing I could do short term is change the defines in igb the way I did in the em driver so you could still define the older if_start entry. Right now those are based on OS version and so you will automatically get if_transmit, but I could change it to be IGB_LEGACY_TX or so, and that could be defined in the Makefile. Would this help? I'm currently using ALTQ successfully with the em driver, so if igb behaved the same with respect to using if_start instead of if_transmit when ALTQ is in play, that would be great. I do not completely understand the change you propose as I am not very familiar with the driver internals. Any kind of patch or extra Makefile/make.conf definition that would allow me to build a 9-STABLE kernel with an igb driver that works again with ALTQ, ASAP, would be much appreciated. Jack On Thu, Mar 28, 2013 at 2:31 PM, Nick Rogers ncrog...@gmail.com wrote: On Tue, Dec 11, 2012 at 1:09 AM, Jack Vogel jfvo...@gmail.com wrote: On Mon, Dec 10, 2012 at 11:58 PM, Gleb Smirnoff gleb...@freebsd.org wrote: On Mon, Dec 10, 2012 at 03:31:19PM -0800, Jack Vogel wrote: J UH, maybe asking the owner of the driver would help :) J J ... and no, I've never been aware of doing anything to stop supporting altq J so you wouldn't see any commits. If there's something in the altq code or J support (which I have nothing to do with) that caused this no-one informed J me. 
Switching from if_start to if_transmit effectively disables ALTQ support. AFAIR, there is some magic implemented in other drivers that makes them modern (that means using if_transmit), but still capable to switch to queueing mode if SIOCADDALTQ was casted upon them. Oh, hmmm, I'll look into the matter after my vacation. Jack Has there been any progress on resolving this issue? I recently ran into this problem upgrading my servers from 8.3 to 9.1-RELEASE and am wondering what the latest recommendation is. I've used ALTQ and igb successfully for years and it is unfortunate it no longer works. Appreciate any advice. Do yourself a favor and either get a cheap dual port 82571 card or 2 cards and disable the IGB ports. The igb driver is defective, and until they back out the new, untested multi-queue stuff you're just neutering your system trying to use it. Frankly this project made a huge mistake by moving forward with multi queue just for the sake of saying that you support it; without having any credible plan for implementing it. That nonsense that Bill Macy did should have been tarballed up and deposited in the trash folder
Re: igb and ALTQ in 9.1-rc3
--- On Fri, 3/29/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: igb and ALTQ in 9.1-rc3 To: Nick Rogers ncrog...@gmail.com Cc: Pieper, Jeffrey E jeffrey.e.pie...@intel.com, freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com, Jack Vogel jfvo...@gmail.com Date: Friday, March 29, 2013, 1:10 PM On 29 March 2013 10:04, Nick Rogers ncrog...@gmail.com wrote: Multiqueue or not, I would appreciate any help with this thread's original issue. Whether or not its the ideal thing to do, I cannot simply just replace the NICs with an em(4) variant, as I have hundreds of customers/systems already in production running 8.3 and relying on the igb driver + ALTQ. I need to be able to upgrade these systems to 9.1 without making hardware changes. If it's that critical, have you thought about contracting out that task to a developer? You have 100s of systems/customers using 1990s-class traffic shaping and you have no programmer on staff with the skills to patch and test an ethernet driver? the igb driver has always sucked rocks, why did you use them in the first place. Or did they just happen to be on the MB you use? BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: igb and ALTQ in 9.1-rc3
--- On Thu, 3/28/13, Nick Rogers ncrog...@gmail.com wrote: From: Nick Rogers ncrog...@gmail.com Subject: Re: igb and ALTQ in 9.1-rc3 To: Jack Vogel jfvo...@gmail.com Cc: Barney Cordoba barney_cord...@yahoo.com, Clement Hermann (nodens) nodens2...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org Date: Thursday, March 28, 2013, 9:29 PM On Thu, Mar 28, 2013 at 4:16 PM, Jack Vogel jfvo...@gmail.com wrote: Have been kept fairly busy with other matters, one thing I could do short term is change the defines in igb the way I did in the em driver so you could still define the older if_start entry. Right now those are based on OS version and so you will automatically get if_transmit, but I could change it to be IGB_LEGACY_TX or so, and that could be defined in the Makefile. Would this help? I'm currently using ALTQ successfully with the em driver, so if igb behaved the same with respect to using if_start instead of if_transmit when ALTQ is in play, that would be great. I do not completely understand the change you propose as I am not very familiar with the driver internals. Any kind of patch or extra Makefile/make.conf definition that would allow me to build a 9-STABLE kernel with an igb driver that works again with ALTQ, ASAP, would be much appreciated. Jack On Thu, Mar 28, 2013 at 2:31 PM, Nick Rogers ncrog...@gmail.com wrote: On Tue, Dec 11, 2012 at 1:09 AM, Jack Vogel jfvo...@gmail.com wrote: On Mon, Dec 10, 2012 at 11:58 PM, Gleb Smirnoff gleb...@freebsd.org wrote: On Mon, Dec 10, 2012 at 03:31:19PM -0800, Jack Vogel wrote: J UH, maybe asking the owner of the driver would help :) J J ... and no, I've never been aware of doing anything to stop supporting altq J so you wouldn't see any commits. If there's something in the altq code or J support (which I have nothing to do with) that caused this no-one informed J me. Switching from if_start to if_transmit effectively disables ALTQ support. AFAIR, there is some magic implemented in other drivers that makes them modern (that means using if_transmit), but still capable to switch to queueing mode if SIOCADDALTQ was casted upon them. Oh, hmmm, I'll look into the matter after my vacation. Jack Has there been any progress on resolving this issue? I recently ran into this problem upgrading my servers from 8.3 to 9.1-RELEASE and am wondering what the latest recommendation is. I've used ALTQ and igb successfully for years and it is unfortunate it no longer works. Appreciate any advice. Do yourself a favor and either get a cheap dual port 82571 card or 2 cards and disable the IGB ports. The igb driver is defective, and until they back out the new, untested multi-queue stuff you're just neutering your system trying to use it. Frankly this project made a huge mistake by moving forward with multi queue just for the sake of saying that you support it; without having any credible plan for implementing it. That nonsense that Bill Macy did should have been tarballed up and deposited in the trash folder. The biggest mess in programming history. That being said, the solution is not to hack the igb driver; its to make ALTQ if_transmit compatible, which shouldn't be all that difficult. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: vlan with modified MAC fails to communicate
--- On Fri, 3/29/13, Pablo Ribalta Lorenzo r...@semihalf.com wrote: From: Pablo Ribalta Lorenzo r...@semihalf.com Subject: vlan with modified MAC fails to communicate To: freebsd-net@freebsd.org Date: Friday, March 29, 2013, 7:53 AM Hi there! Lately I've been investigating an issue that I would like to share, as I feel I may have to attack it from a different end. I have an ethernet interface from which I create a vlan. Once I set up the IP address on the vlan I can ping correctly on both sides. The issue arises when I try to change the MAC address of the vlan, as from then on it fails to communicate unless: - I restore the vlan's MAC address to its previous value - I enable promisc mode. It's also worth mentioning that my current setup is FreeBSD 8.3 and the NIC driver I'm using is not fully mature. I was wondering if this behavior is due to some limitation in the NIC driver I'm using or if in fact it's the correct way to proceed, as it was possible to reproduce this same issue on FreeBSD 8.3 and FreeBSD CURRENT, even using more mature NIC drivers such as 'em' and 're'. Could somebody please shed some light on this? Thank you. Without looking at the code, it's likely that you should be changing the MAC address BEFORE you set up the VLAN. The MAC is probably being mapped into some table that is being used to track the vlans. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
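A quick way to check the ordering theory is to bring the vlan up both ways and see which variant passes traffic. One reading of BC's suggestion is to change the parent's MAC before creating the vlan so the vlan inherits it; the interface names and addresses below are examples, not from Pablo's setup:
    # order being suggested: change the parent's MAC first, then create the vlan
    ifconfig em0 ether 02:00:00:00:00:01
    ifconfig vlan100 create vlan 100 vlandev em0 inet 192.0.2.10/24
    # order that reportedly fails: create the vlan, then change its MAC afterwards
    ifconfig vlan100 create vlan 100 vlandev em0 inet 192.0.2.10/24
    ifconfig vlan100 ether 02:00:00:00:00:02
If the second variant only works with promiscuous mode enabled, that points at the parent driver programming its receive filter from the MAC it knew about at vlan-attach time.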
RE: igb and ALTQ in 9.1-rc3
--- On Fri, 3/29/13, Pieper, Jeffrey E jeffrey.e.pie...@intel.com wrote: From: Pieper, Jeffrey E jeffrey.e.pie...@intel.com Subject: RE: igb and ALTQ in 9.1-rc3 To: Barney Cordoba barney_cord...@yahoo.com, Jack Vogel jfvo...@gmail.com, Nick Rogers ncrog...@gmail.com Cc: freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com Date: Friday, March 29, 2013, 11:45 AM -Original Message- From: owner-freebsd-...@freebsd.org [mailto:owner-freebsd-...@freebsd.org] On Behalf Of Barney Cordoba Sent: Friday, March 29, 2013 5:51 AM To: Jack Vogel; Nick Rogers Cc: freebsd-net@freebsd.org; Clement Hermann (nodens) Subject: Re: igb and ALTQ in 9.1-rc3 --- On Thu, 3/28/13, Nick Rogers ncrog...@gmail.com wrote: From: Nick Rogers ncrog...@gmail.com Subject: Re: igb and ALTQ in 9.1-rc3 To: Jack Vogel jfvo...@gmail.com Cc: Barney Cordoba barney_cord...@yahoo.com, Clement Hermann (nodens) nodens2...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org Date: Thursday, March 28, 2013, 9:29 PM On Thu, Mar 28, 2013 at 4:16 PM, Jack Vogel jfvo...@gmail.com wrote: Have been kept fairly busy with other matters, one thing I could do short term is change the defines in igb the way I did in the em driver so you could still define the older if_start entry. Right now those are based on OS version and so you will automatically get if_transmit, but I could change it to be IGB_LEGACY_TX or so, and that could be defined in the Makefile. Would this help? I'm currently using ALTQ successfully with the em driver, so if igb behaved the same with respect to using if_start instead of if_transmit when ALTQ is in play, that would be great. I do not completely understand the change you propose as I am not very familiar with the driver internals. Any kind of patch or extra Makefile/make.conf definition that would allow me to build a 9-STABLE kernel with an igb driver that works again with ALTQ, ASAP, would be much appreciated. Jack On Thu, Mar 28, 2013 at 2:31 PM, Nick Rogers ncrog...@gmail.com wrote: On Tue, Dec 11, 2012 at 1:09 AM, Jack Vogel jfvo...@gmail.com wrote: On Mon, Dec 10, 2012 at 11:58 PM, Gleb Smirnoff gleb...@freebsd.org wrote: On Mon, Dec 10, 2012 at 03:31:19PM -0800, Jack Vogel wrote: J UH, maybe asking the owner of the driver would help :) J J ... and no, I've never been aware of doing anything to stop supporting altq J so you wouldn't see any commits. If there's something in the altq code or J support (which I have nothing to do with) that caused this no-one informed J me. Switching from if_start to if_transmit effectively disables ALTQ support. AFAIR, there is some magic implemented in other drivers that makes them modern (that means using if_transmit), but still capable to switch to queueing mode if SIOCADDALTQ was casted upon them. Oh, hmmm, I'll look into the matter after my vacation. Jack Has there been any progress on resolving this issue? I recently ran into this problem upgrading my servers from 8.3 to 9.1-RELEASE and am wondering what the latest recommendation is. I've used ALTQ and igb successfully for years and it is unfortunate it no longer works. Appreciate any advice. Do yourself a favor and either get a cheap dual port 82571 card or 2 cards and disable the IGB ports. The igb driver is defective, and until they back out the new, untested multi-queue stuff you're just neutering your system trying to use it. 
Frankly this project made a huge mistake by moving forward with multi queue just for the sake of saying that you support it; without having any credible plan for implementing it. That nonsense that Bill Macy did should have been tarballed up and deposited in the trash folder. The biggest mess in programming history. That being said, the solution is not to hack the igb driver; its to make ALTQ if_transmit compatible, which shouldn't be all that difficult. BC I may be misunderstanding what you are saying, but if the solution is, as you say, not to hack the igb driver, then how is it defective in this case? Or are you just directing vitriol toward Intel? Multi-queue is working fine in igb. Jeff It's defective because it's been poorly implemented and has more bugs than a Manhattan hotel bed. Adding queues without a proper plan just adds more lock contention. It's not a production-ready driver. As Jack once said, Intel doesn't care about performance, they're just example drivers. igb is an example of how not to do things. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
RE: igb and ALTQ in 9.1-rc3
--- On Fri, 3/29/13, Pieper, Jeffrey E jeffrey.e.pie...@intel.com wrote: From: Pieper, Jeffrey E jeffrey.e.pie...@intel.com Subject: RE: igb and ALTQ in 9.1-rc3 To: Barney Cordoba barney_cord...@yahoo.com, Jack Vogel jfvo...@gmail.com, Nick Rogers ncrog...@gmail.com Cc: freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com Date: Friday, March 29, 2013, 11:45 AM -Original Message- From: owner-freebsd-...@freebsd.org [mailto:owner-freebsd-...@freebsd.org] On Behalf Of Barney Cordoba Sent: Friday, March 29, 2013 5:51 AM To: Jack Vogel; Nick Rogers Cc: freebsd-net@freebsd.org; Clement Hermann (nodens) Subject: Re: igb and ALTQ in 9.1-rc3 --- On Thu, 3/28/13, Nick Rogers ncrog...@gmail.com wrote: From: Nick Rogers ncrog...@gmail.com Subject: Re: igb and ALTQ in 9.1-rc3 To: Jack Vogel jfvo...@gmail.com Cc: Barney Cordoba barney_cord...@yahoo.com, Clement Hermann (nodens) nodens2...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org Date: Thursday, March 28, 2013, 9:29 PM On Thu, Mar 28, 2013 at 4:16 PM, Jack Vogel jfvo...@gmail.com wrote: Have been kept fairly busy with other matters, one thing I could do short term is change the defines in igb the way I did in the em driver so you could still define the older if_start entry. Right now those are based on OS version and so you will automatically get if_transmit, but I could change it to be IGB_LEGACY_TX or so, and that could be defined in the Makefile. Would this help? I'm currently using ALTQ successfully with the em driver, so if igb behaved the same with respect to using if_start instead of if_transmit when ALTQ is in play, that would be great. I do not completely understand the change you propose as I am not very familiar with the driver internals. Any kind of patch or extra Makefile/make.conf definition that would allow me to build a 9-STABLE kernel with an igb driver that works again with ALTQ, ASAP, would be much appreciated. Jack On Thu, Mar 28, 2013 at 2:31 PM, Nick Rogers ncrog...@gmail.com wrote: On Tue, Dec 11, 2012 at 1:09 AM, Jack Vogel jfvo...@gmail.com wrote: On Mon, Dec 10, 2012 at 11:58 PM, Gleb Smirnoff gleb...@freebsd.org wrote: On Mon, Dec 10, 2012 at 03:31:19PM -0800, Jack Vogel wrote: J UH, maybe asking the owner of the driver would help :) J J ... and no, I've never been aware of doing anything to stop supporting altq J so you wouldn't see any commits. If there's something in the altq code or J support (which I have nothing to do with) that caused this no-one informed J me. Switching from if_start to if_transmit effectively disables ALTQ support. AFAIR, there is some magic implemented in other drivers that makes them modern (that means using if_transmit), but still capable to switch to queueing mode if SIOCADDALTQ was casted upon them. Oh, hmmm, I'll look into the matter after my vacation. Jack Has there been any progress on resolving this issue? I recently ran into this problem upgrading my servers from 8.3 to 9.1-RELEASE and am wondering what the latest recommendation is. I've used ALTQ and igb successfully for years and it is unfortunate it no longer works. Appreciate any advice. Do yourself a favor and either get a cheap dual port 82571 card or 2 cards and disable the IGB ports. The igb driver is defective, and until they back out the new, untested multi-queue stuff you're just neutering your system trying to use it. 
Frankly this project made a huge mistake by moving forward with multi queue just for the sake of saying that you support it; without having any credible plan for implementing it. That nonsense that Bill Macy did should have been tarballed up and deposited in the trash folder. The biggest mess in programming history. That being said, the solution is not to hack the igb driver; its to make ALTQ if_transmit compatible, which shouldn't be all that difficult. BC I may be misunderstanding what you are saying, but if the solution is, as you say, not to hack the igb driver, then how is it defective in this case? Or are you just directing vitriol toward Intel? Multi-queue is working fine in igb. Jeff It works like crap. Your definition of "works" is that it doesn't crash. Mine is that it works better with multiple queues than with 1, which it doesn't. And if you load a system up, it will blow up with multiqueue before it will with 1 queue. The point of using multiqueue isn't to exhaust all of the cpus instead of just 2. It's to get past the wall of using only 2 cpus when they're exhausted. The goal is to increase the capacity of the system; not to make it look like you're using more cpus without any actual gain in capacity.
Re: igb and ALTQ in 9.1-rc3
it needs a lot more than a patch. It needs to be completely re-thunk --- On Fri, 3/29/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: igb and ALTQ in 9.1-rc3 To: Barney Cordoba barney_cord...@yahoo.com Cc: Jack Vogel jfvo...@gmail.com, Nick Rogers ncrog...@gmail.com, Jeffrey EPieper jeffrey.e.pie...@intel.com, freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com Date: Friday, March 29, 2013, 12:07 PM Barney, Patches gratefully accepted. Adrian On 29 March 2013 08:54, Barney Cordoba barney_cord...@yahoo.com wrote: --- On Fri, 3/29/13, Pieper, Jeffrey E jeffrey.e.pie...@intel.com wrote: From: Pieper, Jeffrey E jeffrey.e.pie...@intel.com Subject: RE: igb and ALTQ in 9.1-rc3 To: Barney Cordoba barney_cord...@yahoo.com, Jack Vogel jfvo...@gmail.com, Nick Rogers ncrog...@gmail.com Cc: freebsd-net@freebsd.org freebsd-net@freebsd.org, Clement Hermann (nodens) nodens2...@gmail.com Date: Friday, March 29, 2013, 11:45 AM -Original Message- From: owner-freebsd-...@freebsd.org [mailto:owner-freebsd-...@freebsd.org] On Behalf Of Barney Cordoba Sent: Friday, March 29, 2013 5:51 AM To: Jack Vogel; Nick Rogers Cc: freebsd-net@freebsd.org; Clement Hermann (nodens) Subject: Re: igb and ALTQ in 9.1-rc3 --- On Thu, 3/28/13, Nick Rogers ncrog...@gmail.com wrote: From: Nick Rogers ncrog...@gmail.com Subject: Re: igb and ALTQ in 9.1-rc3 To: Jack Vogel jfvo...@gmail.com Cc: Barney Cordoba barney_cord...@yahoo.com, Clement Hermann (nodens) nodens2...@gmail.com, freebsd-net@freebsd.org freebsd-net@freebsd.org Date: Thursday, March 28, 2013, 9:29 PM On Thu, Mar 28, 2013 at 4:16 PM, Jack Vogel jfvo...@gmail.com wrote: Have been kept fairly busy with other matters, one thing I could do short term is change the defines in igb the way I did in the em driver so you could still define the older if_start entry. Right now those are based on OS version and so you will automatically get if_transmit, but I could change it to be IGB_LEGACY_TX or so, and that could be defined in the Makefile. Would this help? I'm currently using ALTQ successfully with the em driver, so if igb behaved the same with respect to using if_start instead of if_transmit when ALTQ is in play, that would be great. I do not completely understand the change you propose as I am not very familiar with the driver internals. Any kind of patch or extra Makefile/make.conf definition that would allow me to build a 9-STABLE kernel with an igb driver that works again with ALTQ, ASAP, would be much appreciated. Jack On Thu, Mar 28, 2013 at 2:31 PM, Nick Rogers ncrog...@gmail.com wrote: On Tue, Dec 11, 2012 at 1:09 AM, Jack Vogel jfvo...@gmail.com wrote: On Mon, Dec 10, 2012 at 11:58 PM, Gleb Smirnoff gleb...@freebsd.org wrote: On Mon, Dec 10, 2012 at 03:31:19PM -0800, Jack Vogel wrote: J UH, maybe asking the owner of the driver would help :) J J ... and no, I've never been aware of doing anything to stop supporting altq J so you wouldn't see any commits. If there's something in the altq code or J support (which I have nothing to do with) that caused this no-one informed J me. Switching from if_start to if_transmit effectively disables ALTQ support. AFAIR, there is some magic implemented in other drivers that makes them modern (that means using if_transmit), but still capable to switch to queueing mode if SIOCADDALTQ was casted upon them. Oh, hmmm, I'll look into the matter after my vacation. Jack Has there been any progress on resolving this issue? 
I recently ran into this problem upgrading my servers from 8.3 to 9.1-RELEASE and am wondering what the latest recommendation is. I've used ALTQ and igb successfully for years and it is unfortunate it no longer works. Appreciate any advice. Do yourself a favor and either get a cheap dual port 82571 card or 2 cards and disable the IGB ports. The igb driver is defective, and until they back out the new, untested multi-queue stuff you're just neutering your system trying to use it. Frankly this project made a huge mistake by moving forward with multi queue just for the sake of saying that you support it; without having any credible plan for implementing it. That nonsense that Bill Macy did should have been tarballed up and deposited in the trash folder. The biggest mess in programming history. That being said, the solution is not to hack the igb driver; its to make ALTQ if_transmit compatible, which shouldn't be all
Re: igb network lockups
--- On Mon, 3/4/13, Zaphod Beeblebrox zbee...@gmail.com wrote: From: Zaphod Beeblebrox zbee...@gmail.com Subject: Re: igb network lockups To: Jack Vogel jfvo...@gmail.com Cc: Nick Rogers ncrog...@gmail.com, Sepherosa Ziehau sepher...@gmail.com, Christopher D. Harrison harri...@biostat.wisc.edu, freebsd-net@freebsd.org freebsd-net@freebsd.org Date: Monday, March 4, 2013, 1:58 PM For everyone having lockup problems with IGB, I'd like to ask if they could try disabling hyperthreads --- this worked for me on one system but has been unnecessary on others. Gee, maybe binding an interrupt to a virtual cpu isn't a good idea? BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
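For anyone who wants to try the disable-hyperthreads experiment without a trip into the BIOS, FreeBSD has long exposed this as a loader tunable; to the best of my recollection the knob is the one below (double-check the name against your release before relying on it):
    # /boot/loader.conf -- keep the scheduler (and interrupt binding) off the HTT siblings
    machdep.hyperthreading_allowed="0"
After a reboot, read the value back with sysctl machdep.hyperthreading_allowed to confirm the tunable took effect, and then see whether the igb lockups still occur.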
Re: igb network lockups
--- On Mon, 2/25/13, Christopher D. Harrison harri...@biostat.wisc.edu wrote: From: Christopher D. Harrison harri...@biostat.wisc.edu Subject: Re: igb network lockups To: Jack Vogel jfvo...@gmail.com Cc: freebsd-net@freebsd.org Date: Monday, February 25, 2013, 1:38 PM Sure, The problem appears on both systems running with ALTQ and vanilla. -C On 02/25/13 12:29, Jack Vogel wrote: I've not heard of this problem, but I think most users do not use ALTQ, and we (Intel) do not test using it. Can it be eliminated from the equation? Jack On Mon, Feb 25, 2013 at 10:16 AM, Christopher D. Harrison harri...@biostat.wisc.edu mailto:harri...@biostat.wisc.edu wrote: I recently have been experiencing network freezes and network lockups on our Freebsd 9.1 systems which are running zfs and nfs file servers. I upgraded from 9.0 to 9.1 about 2 months ago and we have been having issues with almost bi-monthly. The issue manifests in the system becomes unresponsive to any/all nfs clients. The system is not resource bound as our I/O is low to disk and our network is usually in the 20mbit/40mbit range. We do notice a correlation between temporary i/o spikes and network freezes but not enough to send our system in to lockup mode for the next 5min. Currently we have 4 igb nics in 2 aggr's with 8 queue's per nic and our dev.igb reports: dev.igb.3.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.4 I am almost certain the problem is with the ibg driver as a friend is also experiencing the same problem with the same intel igb nic. He has addressed the issue by restarting the network using netif on his systems. According to my friend, once the network interfaces get cleared, everything comes back and starts working as expected. I have noticed an issue with the igb driver and I was looking for thoughts on how to help address this problem. http://freebsd.1045724.n5.nabble.com/em-igb-if-transmit-drbr-and-ALTQ-td5760338.html Thoughts/Ideas are greatly appreciated!!! -C Do you have 32 cpus in the system? You've created a lock contention nightmare; frankly Im surprised that the system runs at all. Try running with 1 queue per nic. The point of using queues is to spread the load; the fact that you're even using queues with such a minuscule load is a commentary on the blind use of features without any explanation or understanding of what they do. Does igb still bind to CPUs without regard to whether its a real cpu or a hyper thread? This needs to be removed. I wish that someone who understood this stuff would have a beer with Jack and explain to him why this design is defective. The default for this driver is almost always the wrong configuration. You don't need to spread the load with 40Mb/s throughput, and using multiple queues will use a lot more CPU than using just 1. do you really want 4 cpus using 10% instead of 1 using 14%? You also should consider increasing your tx buffers; a property of applications like ALTQ is that they tend to send out big bursts of packets and they can overflow the rings. I'm not specifically familiar with ALTQ so Im not sure how it handles such things; nor am I sure of how it handles multiple tx queues, if at all. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
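For what it's worth, the tuning BC describes does not require touching the driver source; the queue count and ring sizes are loader tunables on igb of that vintage. Tunable names are as documented for the driver of that era, so verify against the version actually in use, and treat the values as starting points only:
    # /boot/loader.conf
    hw.igb.num_queues="1"   # one RX/TX queue per port instead of one per CPU
    hw.igb.rxd="4096"       # larger receive ring
    hw.igb.txd="4096"       # larger transmit ring, to absorb bursty ALTQ output
A single queue sidesteps the cross-CPU lock contention being described, and the bigger TX ring addresses the burst-overflow concern at the cost of some memory.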
Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
--- On Fri, 1/18/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx() To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org, Luigi Rizzo ri...@iet.unipi.it Date: Friday, January 18, 2013, 3:09 PM On 18 January 2013 06:30, Barney Cordoba barney_cord...@yahoo.com wrote: I don't see the distinction between the rx thread getting re-scheduled immediately vs introducing another thread. In fact you increase missed interrupts by this method. The entire point of interrupt moderation is to tune the intervals where a driver is processed. The problem with interrupt moderation combined with enabling/disabling interrupts is that if you get it even slightly wrong, you won't run the packet processing thread(s) until the next interrupt occurs - even if something is in the queue. Which is the point of interrupt moderation. Your argument is that I only want 6000 interrupts per second, but I'm willing to launch N tasks that have the exact same processing load where N = 20. So you're willing to have 12 interrupts/task_queues per second (its only possible to get about 2000pps in 1/6000th of a second on a gigabit link unless you're fielding runts). This all comes down, again, to tuning. Luigi's example would result in 39 tasks being queued to process his 3900 backup with a process limit of 100. This would bypass the next interrupt by a wide margin. Is the point of moderation to not have the network task take over your system? If you don't care, then why not just set moderation to 20Kpps? The work should be the amount of time you're willing to process packets within the interrupt moderation window. The settings go hand in hand. I'm not saying that the task_queue idea is wrong; however in Luigi's example it will cause substantially more overhead by launching too many tasks. Unless you're still running a 700Mhz P3 100 is way too low for a workload limit. It's just arbitrarily silly. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
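To put rough numbers on the moderation-window argument (back-of-the-envelope only, using standard gigabit line rates and a 6,000 interrupts/second moderation setting):
    minimum-size (64-byte) frames:  10^9 / ((64 + 20) * 8)   =~ 1,488,000 pps
                                    1,488,000 / 6,000        =~ 248 frames per window
    full-size (1518-byte) frames:   10^9 / ((1518 + 20) * 8) =~ 81,000 pps
                                    81,000 / 6,000           =~ 14 frames per window
So a work limit of 100 per pass covers a full moderation window of ordinary traffic on a single queue, but a sustained burst of minimum-size frames can outrun it, which is exactly why the work limit and the moderation interval have to be chosen together.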
Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
--- On Fri, 1/18/13, John Baldwin j...@freebsd.org wrote: From: John Baldwin j...@freebsd.org Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx() To: freebsd-net@freebsd.org Cc: Barney Cordoba barney_cord...@yahoo.com, Adrian Chadd adr...@freebsd.org, Luigi Rizzo ri...@iet.unipi.it Date: Friday, January 18, 2013, 11:49 AM On Friday, January 18, 2013 9:30:40 am Barney Cordoba wrote: --- On Thu, 1/17/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx() To: Barney Cordoba barney_cord...@yahoo.com Cc: Luigi Rizzo ri...@iet.unipi.it, freebsd-net@freebsd.org Date: Thursday, January 17, 2013, 11:48 AM There's also the subtle race condition in TX and RX handling that re-queuing the taskqueue gets around. Which is: * The hardware is constantly receiving frames , right until you blow the FIFO away by filling it up; * The RX thread receives a bunch of frames; * .. and processes them; * .. once it's done processing, the hardware may have read some more frames in the meantime; * .. and the hardware may have generated a mitigated interrupt which you're ignoring, since you're processing frames; * So if your architecture isn't 100% paranoid, you may end up having to wait for the next interrupt to handle what's currently in the queue. Now if things are done correct: * The hardware generates a mitigated interrupt * The mask register has that bit disabled, so you don't end up receiving it; * You finish your RX queue processing, and there's more stuff that's appeared in the FIFO (hence why the hardware has generated another mitigated interrupt); * You unmask the interrupt; * .. and the hardware immediately sends you the MSI or signals an interrupt; * .. thus you re-enter the RX processing thread almost(!) immediately. However as the poster(s) have said, the interrupt mask/unmask in the intel driver(s) may not be 100% correct, so you're going to end up with situations where interrupts are missed. The reason why this wasn't a big deal in the deep/distant past is because we didn't used to have kernel preemption, or multiple kernel threads running, or an overly aggressive scheduler trying to parallelise things as much as possible. A lot of net80211/ath bugs have popped out of the woodwork specifically because of the above changes to the kernel. They were bugs before, but people didn't hit them. I don't see the distinction between the rx thread getting re-scheduled immediately vs introducing another thread. In fact you increase missed interrupts by this method. The entire point of interrupt moderation is to tune the intervals where a driver is processed. You might as well just not have a work limit and process until your done. The idea that gee, I've been taking up too much cpu, I'd better yield to just queue a task and continue soon after doesn't make much sense to me. If there are multiple threads with the same priority then batching the work up into chunks allows the scheduler to round-robin among them. However, when a task requeues itself that doesn't actually work since the taskqueue thread will see the requeued task before it yields the CPU. Alternatively, if you force all the relevant interrupt handlers to use the same thread pool and instead of requeueing a separate task you requeue your handler in the ithread pool then you can get the desired round-robin behavior. 
(I have changes to the ithread stuff that get us part of the way there in that handlers can reschedule themselves and much of the plumbing is in place for shared thread pools among different interrupts.) I don't see any round-robin effect here. You have:
    Repeat:
        - Process 100 frames
        - if (more) Queue a Task
There's only 1 task at a time. All it's really doing is yielding and rescheduling itself to resume the loop. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
--- On Thu, 1/17/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx() To: Barney Cordoba barney_cord...@yahoo.com Cc: Luigi Rizzo ri...@iet.unipi.it, freebsd-net@freebsd.org Date: Thursday, January 17, 2013, 11:48 AM There's also the subtle race condition in TX and RX handling that re-queuing the taskqueue gets around. Which is: * The hardware is constantly receiving frames , right until you blow the FIFO away by filling it up; * The RX thread receives a bunch of frames; * .. and processes them; * .. once it's done processing, the hardware may have read some more frames in the meantime; * .. and the hardware may have generated a mitigated interrupt which you're ignoring, since you're processing frames; * So if your architecture isn't 100% paranoid, you may end up having to wait for the next interrupt to handle what's currently in the queue. Now if things are done correct: * The hardware generates a mitigated interrupt * The mask register has that bit disabled, so you don't end up receiving it; * You finish your RX queue processing, and there's more stuff that's appeared in the FIFO (hence why the hardware has generated another mitigated interrupt); * You unmask the interrupt; * .. and the hardware immediately sends you the MSI or signals an interrupt; * .. thus you re-enter the RX processing thread almost(!) immediately. However as the poster(s) have said, the interrupt mask/unmask in the intel driver(s) may not be 100% correct, so you're going to end up with situations where interrupts are missed. The reason why this wasn't a big deal in the deep/distant past is because we didn't used to have kernel preemption, or multiple kernel threads running, or an overly aggressive scheduler trying to parallelise things as much as possible. A lot of net80211/ath bugs have popped out of the woodwork specifically because of the above changes to the kernel. They were bugs before, but people didn't hit them. I don't see the distinction between the rx thread getting re-scheduled immediately vs introducing another thread. In fact you increase missed interrupts by this method. The entire point of interrupt moderation is to tune the intervals where a driver is processed. You might as well just not have a work limit and process until your done. The idea that gee, I've been taking up too much cpu, I'd better yield to just queue a task and continue soon after doesn't make much sense to me. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
--- On Thu, 1/17/13, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx() To: Luigi Rizzo ri...@iet.unipi.it Cc: Barney Cordoba barney_cord...@yahoo.com, freebsd-net@freebsd.org Date: Thursday, January 17, 2013, 2:04 PM On 17 January 2013 09:44, Luigi Rizzo ri...@iet.unipi.it wrote: (in the lem driver this cannot happen until you release some rx slots, which only happens once at the end of the lem_rxeof() routine, not long before re-enabling interrupts) Right. * .. and the hardware immediately sends you the MSI or signals an interrupt; * .. thus you re-enter the RX processing thread almost(!) immediately. the problem i was actually seeing are slightly different, namely: - once the driver lags behind, it does not have a chance to recover even if there are CPU cycles available, because both interrupt rate and packets per interrupt are capped. Right, but the interrupt isn't being continuously asserted whilst there are packets there. You just get a single interrupt when the queue has frames in it, and you won't get a further interrupt for whatever the mitigation period is (or ever, if you fill up the RX FIFO, right?) - much worse, once the input stream stops, you have a huge backlog that is not drained. And if, say, you try to ping the machine, the incoming packet is behind another 3900 packets, so the first interrupt drains 100 (but not the ping request, so no response), you keep going for a while, eventually the external world sees the machine as not responding and stops even trying to talk to it. Right, so you do need to do what you're doing - but I still think there's a possibility of a race there. Namely that your queue servicing does reach the end of the list (and so you don't immediately reschedule the taskqueue) but some more frames have arrived. You have to wait for the next mitigated interrupt for that. i don't think that's the case. The mitigation is a minimum delay. If the delay is longer than the minimum, you'd get an interrupt as soon as you enable it, which is clearly better than scheduling a task. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
the problem i was actually seeing are slightly different, namely: - once the driver lags behind, it does not have a chance to recover even if there are CPU cycles available, because both interrupt rate and packets per interrupt are capped. - much worse, once the input stream stops, you have a huge backlog that is not drained. And if, say, you try to ping the machine, the incoming packet is behind another 3900 packets, so the first interrupt drains 100 (but not the ping request, so no response), you keep going for a while, eventually the external world sees the machine as not responding and stops even trying to talk to it. This is a silly example. As I said before, the 100 work limit is arbitrary and too low for a busy network. If you have a backlog of 3900 packets with a workload of 100, then your system is so incompetently tuned that it's not even worthy of discussion. If you're using workload and task queues because you don't know how to tune moderation and the process_limit, that's one discussion. But if you can't process all of the packets in your RX queue in the interrupt window, then you either need to tune your machine better or get a faster machine. When you tune the work limit you're making a decision about the trade-off between livelock and dropping packets. It's not an arbitrary decision. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
--- On Wed, 1/16/13, Luigi Rizzo ri...@iet.unipi.it wrote: From: Luigi Rizzo ri...@iet.unipi.it Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx() To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org Date: Wednesday, January 16, 2013, 9:55 PM On Wed, Jan 16, 2013 at 06:19:01AM -0800, Barney Cordoba wrote: --- On Tue, 1/15/13, Luigi Rizzo ri...@iet.unipi.it wrote: From: Luigi Rizzo ri...@iet.unipi.it Subject: two problems in dev/e1000/if_lem.c::lem_handle_rxtx() To: h...@freebsd.org, freebsd-net@freebsd.org freebsd-net@freebsd.org Cc: Jack Vogel jfvo...@gmail.com Date: Tuesday, January 15, 2013, 8:23 PM Hi, i found a couple of problems in ? ? ? ? dev/e1000/if_lem.c::lem_handle_rxtx() , (compare with dev/e1000/if_em.c::em_handle_que() for better understanding): 1. in if_em.c::em_handle_que(), when em_rxeof() exceeds the ? rx_process_limit, the task is rescheduled so it can complete the work. ? Conversely, in if_lem.c::lem_handle_rxtx() the lem_rxeof() is ? only run once, and if there are more pending packets the only ? chance to drain them is to receive (many) more interrupts. ? This is a relatively serious problem, because the receiver has ? a hard time recovering. ? I'd like to commit a fix to this same as it is done in e1000. 2. in if_em.c::em_handle_que(), interrupts are reenabled unconditionally, ???whereas lem_handle_rxtx() only enables them if IFF_DRV_RUNNING is set. ???I cannot really tell what is the correct way here, so I'd like ???to put a comment there unless there is a good suggestion on ???what to do. ???Accesses to the intr register are race-prone anyways ???(disabled in fastintr, enabled in the rxtx task without ???holding any lock, and generally accessed under EM_CORE_LOCK ???in other places), and presumably enabling/disabling the ???interrupts around activations of the taks is just an ???optimization (and on a VM, it is actually a pessimization ???due to the huge cost of VM exits). cheers luigi This is not really a big deal; this is how things works for a million years before we had task queues. i agree that the second issue is not a big deal. The first one, on the contrary, is a real problem no matter how you set the 'work' parameter (unless you make it large enough to drain the entire queue in one call). Which should be the goal, except in extreme circumstances. Having more packets than work should be the extreme case and not the norm. All work should do is normalize bursts of packets. If you're consistently over work then either your work parameter is too low, or your interrupt moderation is too wide. Adding a cleanup task simply compensates for bad tuning. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
--- On Tue, 1/15/13, Luigi Rizzo ri...@iet.unipi.it wrote: From: Luigi Rizzo ri...@iet.unipi.it Subject: two problems in dev/e1000/if_lem.c::lem_handle_rxtx() To: h...@freebsd.org, freebsd-net@freebsd.org freebsd-net@freebsd.org Cc: Jack Vogel jfvo...@gmail.com Date: Tuesday, January 15, 2013, 8:23 PM Hi, i found a couple of problems in dev/e1000/if_lem.c::lem_handle_rxtx() , (compare with dev/e1000/if_em.c::em_handle_que() for better understanding): 1. in if_em.c::em_handle_que(), when em_rxeof() exceeds the rx_process_limit, the task is rescheduled so it can complete the work. Conversely, in if_lem.c::lem_handle_rxtx() the lem_rxeof() is only run once, and if there are more pending packets the only chance to drain them is to receive (many) more interrupts. This is a relatively serious problem, because the receiver has a hard time recovering. I'd like to commit a fix to this same as it is done in e1000. 2. in if_em.c::em_handle_que(), interrupts are reenabled unconditionally, whereas lem_handle_rxtx() only enables them if IFF_DRV_RUNNING is set. I cannot really tell what is the correct way here, so I'd like to put a comment there unless there is a good suggestion on what to do. Accesses to the intr register are race-prone anyways (disabled in fastintr, enabled in the rxtx task without holding any lock, and generally accessed under EM_CORE_LOCK in other places), and presumably enabling/disabling the interrupts around activations of the taks is just an optimization (and on a VM, it is actually a pessimization due to the huge cost of VM exits). cheers luigi This is not really a big deal; this is how things works for a million years before we had task queues. Intel controllers have built in interrupt moderation (unless you're on an ISA bus or something), so interrupt storms aren't possible. Typical default is 6K ints per second, so you can't get another interrupt for 1/6000th of a second whether there's more work to do or not. The work parameter should be an indicator that something is happening too slow, which can happen with a shaper that's taking a lot more time than normal to process packets. Systems should have a maximum pps engineered into its tuning depending on the cpu to avoid live-lock on legacy systems. the default work limit of 100 is too low on a gigabit system. queuing tasks actually creates more overhead in the system, not less. The issue is whether the process_limit * interrupt_moderation is set to a pps that's suitable for your system. Setting low work limits isn't really a good idea unless you have some other time sensitive kernel task. Usually networking is a priority, so setting arbitrary work limits makes less sense than queuing an additional task, which defeats the purpose of interrupt moderation. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
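For anyone who would rather experiment with the trade-off being argued here than take either side's word for it, the work limit on em/lem of that era is a loader tunable, so it can be changed without rebuilding anything. The exact sysctl name has moved around between driver versions, so check what your system exposes; the value is illustrative:
    # /boot/loader.conf -- raise the per-pass work limit (the default is 100; -1 removes the cap)
    hw.em.rx_process_limit="500"
    # after reboot, confirm what the running driver picked up, e.g.
    sysctl dev.em.0.rx_processing_limit
Raising the limit trades longer time spent in the RX path per interrupt against fewer deferred-task passes; which direction helps depends on the interrupt moderation setting, exactly as the thread says.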
Re: To SMP or not to SMP
--- On Wed, 1/9/13, Erich Dollansky erichsfreebsdl...@alogt.com wrote: From: Erich Dollansky erichsfreebsdl...@alogt.com Subject: Re: To SMP or not to SMP To: Mark Atkinson atkin...@gmail.com Cc: freebsd-net@freebsd.org Date: Wednesday, January 9, 2013, 1:01 AM Hi, On Tue, 08 Jan 2013 08:29:51 -0800 Mark Atkinson atkin...@gmail.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/07/2013 18:25, Barney Cordoba wrote: I have a situation where I have to run 9.1 on an old single core box. Does anyone have a handle on whether it's better to build a non SMP kernel or to just use a standard SMP build with just the one core? Thanks. You can build a SMP kernel, but you'll get better performance (in my experience) with SCHED_4BSD on single cpu than with ULE. I would not say so. The machine behaves different with the two schedulers. It depends mostly what you want to do with the machine. I forgot which scheduler I finally left in the single CPU kernel. Erich 4BSD runs pretty well with an SMP kernel. I can test ULE and compare easily. A non-SMP kernel is problematic as the igb driver doesn't seem to work and my onboard NICs are, sadly, igb. Rather than say "it depends what you want to do", perhaps an explanation of which cases you might choose one or the other would be helpful. So can anyone in the know confirm that the kernel really isn't smart enough to know that there's only 1 core so that most of the SMP overhead is avoided? It seems to me that SMP scheduling should only be enabled if there is more than 1 core as part of the scheduler initialization. It's arrogant indeed to assume that just because SMP support is compiled in that there are multiple cores. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
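For reference, the kernels being compared in this thread differ by only a couple of config(8) lines; a minimal config for the single-core test, sketched from the GENERIC of that branch (adjust to taste), would be:
    # UP4BSD -- GENERIC minus SMP, with the traditional scheduler
    include         GENERIC
    ident           UP4BSD
    nooptions       SMP
    nooptions       SCHED_ULE
    options         SCHED_4BSD
Leave the scheduler lines out to test a uniprocessor kernel with ULE, or keep SMP and swap only the scheduler, to reproduce the other combinations discussed later in the thread.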
Re: To SMP or not to SMP
--- On Wed, 1/9/13, Erich Dollansky erichsfreebsdl...@alogt.com wrote: From: Erich Dollansky erichsfreebsdl...@alogt.com Subject: Re: To SMP or not to SMP To: Barney Cordoba barney_cord...@yahoo.com Cc: Mark Atkinson atkin...@gmail.com, freebsd-net@freebsd.org, jack.vo...@gmail.com Date: Wednesday, January 9, 2013, 9:14 AM Hi, On Wed, 9 Jan 2013 05:40:13 -0800 (PST) Barney Cordoba barney_cord...@yahoo.com wrote: --- On Wed, 1/9/13, Erich Dollansky erichsfreebsdl...@alogt.com wrote: From: Erich Dollansky erichsfreebsdl...@alogt.com Subject: Re: To SMP or not to SMP To: Mark Atkinson atkin...@gmail.com Cc: freebsd-net@freebsd.org Date: Wednesday, January 9, 2013, 1:01 AM Hi, On Tue, 08 Jan 2013 08:29:51 -0800 Mark Atkinson atkin...@gmail.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/07/2013 18:25, Barney Cordoba wrote: I have a situation where I have to run 9.1 on an old single core box. Does anyone have a handle on whether it's better to build a non SMP kernel or to just use a standard SMP build with just the one core? Thanks. You can build a SMP kernel, but you'll get better performance (in my experience) with SCHED_4BSD on single cpu than with ULE. I would not say so. The machine behaves different with the two schedulers. It depends mostly what you want to do with the machine. I forgot which scheduler I finally left in the single CPU kernel. Erich 4BSD runs pretty well with an SMP kernel. I can test ULE and compare easily. A no SMP kernel is problematic as the igb driver doesn't seem to work and my onboard NICs are, sadly, igb. this is bad luck. I know of the kernels as I have had SMP and single CPU machines since 4.x times. Rather than say depends what you want to do, perhaps an explanation of which cases you might choose one or the other would be helpful. So can anyone in the know confirm that the kernel really isn't smart enough to know there there's only 1 core so that most of the SMP The kernel does not think like this. It is a fixed program flow. overhead is avoided? It seems to me that SMP scheduling should only be enabled if there is more than 1 core as part of the scheduler initialization. Its arrogant indeed to assume that just because SMP support is compiled in that there are multiple cores. I compile my own kernels and set the parameters as needed. Erich This explanation defies the possibility of a GENERIC kernel, which of course is an important element of a GPOS. Its too bad that smp support can't be done with logic rather than a kernel option. The big thing I see is the use of legacy interrupts vs msix. Its not like flipping off SMP support only changes the scheduler behavior. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: To SMP or not to SMP
--- On Wed, 1/9/13, sth...@nethelp.no sth...@nethelp.no wrote: From: sth...@nethelp.no sth...@nethelp.no Subject: Re: To SMP or not to SMP To: erichsfreebsdl...@alogt.com Cc: barney_cord...@yahoo.com, freebsd-net@freebsd.org, jack.vo...@gmail.com, atkin...@gmail.com Date: Wednesday, January 9, 2013, 9:32 AM 4BSD runs pretty well with an SMP kernel. I can test ULE and compare easily. A no SMP kernel is problematic as the igb driver doesn't seem to work and my onboard NICs are, sadly, igb. this is bad luck. I know of the kernels as I have had SMP and single CPU machines since 4.x times. I have had igb working with both SMP and non-SMP kernel for at least a year or two, 8.x-STABLE. No specific problems. Steinar Haug, Nethelp consulting, sth...@nethelp.no Maybe a problem with legacy interrupts on more modern processors? I'm using an E5520 and while the NIC inits ok, it just doesnt seem to gen interrupts. I can't spend much time debugging it I notice that HAMMER kernels use MSI/X interrupts whether SMP is enabled or not, while i386 kernels seem to require APIC. Is there some physical reason for this? BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: To SMP or not to SMP
--- On Tue, 1/8/13, Mark Atkinson atkin...@gmail.com wrote: From: Mark Atkinson atkin...@gmail.com Subject: Re: To SMP or not to SMP To: freebsd-net@freebsd.org Date: Tuesday, January 8, 2013, 11:29 AM -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/07/2013 18:25, Barney Cordoba wrote: I have a situation where I have to run 9.1 on an old single core box. Does anyone have a handle on whether it's better to build a non SMP kernel or to just use a standard SMP build with just the one core? Thanks. You can build a SMP kernel, but you'll get better performance (in my experience) with SCHED_4BSD on single cpu than with ULE. I've tested the 2 schedulers on an SMP kernel with 1 core. I don't have a 1 core system to test with so I'm using an E5520 with 1 core enabled. Bridging a controlled test (curl-loader doing a web-load test with 100 users that consistently generates 870Mb/s and 77Kpps, I see the following: top -SH ULE: idle: 74.85% kernel {em1 que} 17.68% kernel {em0 que} 5.86% httpd: .49% 4BSD: idle: 70.95% kernel {em1 que} 18.07% kernel {em0 que} 4.44% httpd: .93% Note that the https is a monitor I'm running. so it appears that theres 7% of usage missing (all other apps show 0% usage). If i had to guess just looking at the numbers, it seems that 4BSD might do better with the interrupt level stuff, and not as good with user level context switching. I think they're close enough to stick with ULE so I can just use a stock kernel. One thing that bothers me is the idle sits at 100% when other tasks are registering values under light loads, so it's certainly not all that accurate. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: To SMP or not to SMP
--- On Wed, 1/9/13, Barney Cordoba barney_cord...@yahoo.com wrote: From: Barney Cordoba barney_cord...@yahoo.com Subject: Re: To SMP or not to SMP To: Mark Atkinson atkin...@gmail.com Cc: freebsd-net@freebsd.org Date: Wednesday, January 9, 2013, 1:08 PM --- On Tue, 1/8/13, Mark Atkinson atkin...@gmail.com wrote: From: Mark Atkinson atkin...@gmail.com Subject: Re: To SMP or not to SMP To: freebsd-net@freebsd.org Date: Tuesday, January 8, 2013, 11:29 AM -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/07/2013 18:25, Barney Cordoba wrote: I have a situation where I have to run 9.1 on an old single core box. Does anyone have a handle on whether it's better to build a non SMP kernel or to just use a standard SMP build with just the one core? Thanks. You can build a SMP kernel, but you'll get better performance (in my experience) with SCHED_4BSD on single cpu than with ULE. I've tested the 2 schedulers on an SMP kernel with 1 core. I don't have a 1 core system to test with so I'm using an E5520 with 1 core enabled. Bridging a controlled test (curl-loader doing a web-load test with 100 users that consistently generates 870Mb/s and 77Kpps, I see the following: top -SH ULE: idle: 74.85% kernel {em1 que} 17.68% kernel {em0 que} 5.86% httpd: .49% 4BSD: idle: 70.95% kernel {em1 que} 18.07% kernel {em0 que} 4.44% httpd: .93% Note that the https is a monitor I'm running. so it appears that theres 7% of usage missing (all other apps show 0% usage). If i had to guess just looking at the numbers, it seems that 4BSD might do better with the interrupt level stuff, and not as good with user level context switching. I think they're close enough to stick with ULE so I can just use a stock kernel. One thing that bothers me is the idle sits at 100% when other tasks are registering values under light loads, so it's certainly not all that accurate. BC Ok, thanks to J Baldwin's tip I got a NON-SMP kernel running with some interesting results. Here's all 4 tests: I've tested the 2 schedulers on an SMP kernel with 1 core. I don't have a 1 core system to test with so I'm using an E5520 with 1 core enabled. Bridging a controlled test (curl-loader doing a web-load test with 100 users that consistently generates 870Mb/s and 77Kpps, I see the following: top -SH ULE (SMP): idle: 74.85% kernel {em1 que} 17.68% kernel {em0 que} 5.86% httpd: .49% 4BSD (SMP): idle: 70.95% kernel {em1 que} 18.07% kernel {em0 que} 4.44% httpd: .93% 4BSD (NON-SMP): idle: 72.95% kernel {em1 que} 15.04% kernel {em0 que} 6.10% httpd: 1.17% ULE (NON-SMP): idle: 76.17% kernel {em1 que} 16.99% kernel {em0 que} 5.18% httpd: 1.66% A kernel with SMP off seems to be a bit more efficient. A better test would be to have more stuff running, but Im about out of time on this project. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: To SMP or not to SMP
--- On Mon, 1/7/13, Erich Dollansky erichsfreebsdl...@alogt.com wrote: From: Erich Dollansky erichsfreebsdl...@alogt.com Subject: Re: To SMP or not to SMP To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org Date: Monday, January 7, 2013, 10:56 PM Hi, On Mon, 7 Jan 2013 18:25:58 -0800 (PST) Barney Cordoba barney_cord...@yahoo.com wrote: I have a situation where I have to run 9.1 on an old single core box. Does anyone have a handle on whether it's better to build a non SMP kernel or to just use a standard SMP build with just the one core? Thanks. I ran a single CPU version of FreeBSD until my last single CPU got hit by a lightning last April or May without any problems. I never saw a reason to include the overhead of SMP for this kind of machine and I also never ran into problems with this. Another assumption based on logic rather than empirical evidence. I think I'll test it. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: To SMP or not to SMP
--- On Tue, 1/8/13, Ian Smith smi...@nimnet.asn.au wrote: From: Ian Smith smi...@nimnet.asn.au Subject: Re: To SMP or not to SMP To: Garrett Cooper yaneg...@gmail.com Cc: Barney Cordoba barney_cord...@yahoo.com, Erich Dollansky erichsfreebsdl...@alogt.com, freebsd-net@freebsd.org Date: Tuesday, January 8, 2013, 11:34 AM On Tue, 8 Jan 2013 07:57:04 -0800, Garrett Cooper wrote: On Jan 8, 2013, at 7:50 AM, Barney Cordoba wrote: --- On Mon, 1/7/13, Erich Dollansky erichsfreebsdl...@alogt.com wrote: From: Erich Dollansky erichsfreebsdl...@alogt.com Subject: Re: To SMP or not to SMP To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org Date: Monday, January 7, 2013, 10:56 PM Hi, On Mon, 7 Jan 2013 18:25:58 -0800 (PST) Barney Cordoba barney_cord...@yahoo.com wrote: I have a situation where I have to run 9.1 on an old single core box. Does anyone have a handle on whether it's better to build a non SMP kernel or to just use a standard SMP build with just the one core? Thanks. I ran a single CPU version of FreeBSD until my last single CPU got hit by a lightning last April or May without any problems. I never saw a reason to include the overhead of SMP for this kind of machine and I also never ran into problems with this. Another assumption based on logic rather than empirical evidence. It isn't really an offhanded assumption because there _is_ additional overhead added into the kernel structures to make things work SMP with locking :). Whether or not it's measurable for you and your applications, I have no idea. HTH, -Garrett Where's Kris Kennaway when you need something compared, benchmarked under N different types of loads, and nicely graphed? Do we have a contender? :) cheers, Ian I don't need no stinking graphs. I'll do some testing. bc ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
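[Editorial note: Garrett's point about SMP locking overhead is easy to make concrete in userland, even without a kernel benchmark: an uncontended atomic read-modify-write is noticeably slower than a plain increment on one core with no contention, which is a rough proxy for the extra cost SMP-safe structures carry. The sketch below is my own illustration, not from the thread; the iteration count is arbitrary and the ratio varies by CPU, so treat it as a shape, not a number.]

/* lockcost.c - compare a plain increment with an atomic (lock-prefixed)
 * increment, as a crude stand-in for uncontended SMP locking overhead.
 * Illustrative sketch only: cc -O2 -o lockcost lockcost.c
 */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

#define ITERS 100000000UL	/* arbitrary iteration count */

static double
elapsed(const struct timespec *s, const struct timespec *e)
{
	return ((e->tv_sec - s->tv_sec) + (e->tv_nsec - s->tv_nsec) / 1e9);
}

int
main(void)
{
	struct timespec s, e;
	volatile uint64_t plain = 0;	/* volatile so the loop isn't elided */
	uint64_t atomic = 0;
	unsigned long i;

	clock_gettime(CLOCK_MONOTONIC, &s);
	for (i = 0; i < ITERS; i++)
		plain++;
	clock_gettime(CLOCK_MONOTONIC, &e);
	printf("plain increment:  %.1f ns/op\n",
	    elapsed(&s, &e) / ITERS * 1e9);

	clock_gettime(CLOCK_MONOTONIC, &s);
	for (i = 0; i < ITERS; i++)
		__sync_fetch_and_add(&atomic, 1);	/* gcc/clang builtin */
	clock_gettime(CLOCK_MONOTONIC, &e);
	printf("atomic increment: %.1f ns/op\n",
	    elapsed(&s, &e) / ITERS * 1e9);

	return (0);
}

[A real mutex acquire/release does more than one atomic op, so this only bounds the effect from below; the scheduler comparisons above remain the more interesting measurement.]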
Re: kern/174851: [bxe] [patch] UDP checksum offload is wrong in bxe driver
--- On Mon, 1/7/13, Willem Jan Withagen w...@digiware.nl wrote: From: Willem Jan Withagen w...@digiware.nl Subject: Re: kern/174851: [bxe] [patch] UDP checksum offload is wrong in bxe driver To: Barney Cordoba barney_cord...@yahoo.com Cc: Garrett Cooper yaneg...@gmail.com, freebsd-net@freebsd.org, Adrian Chadd adr...@freebsd.org, David Christensen davi...@freebsd.org, lini...@freebsd.org Date: Monday, January 7, 2013, 3:20 AM On 2013-01-05 16:17, Barney Cordoba wrote: --- On Fri, 1/4/13, Willem Jan Withagen w...@digiware.nl wrote: From: Willem Jan Withagen w...@digiware.nl Subject: Re: kern/174851: [bxe] [patch] UDP checksum offload is wrong in bxe driver To: Barney Cordoba barney_cord...@yahoo.com Cc: Garrett Cooper yaneg...@gmail.com, freebsd-net@freebsd.org, Adrian Chadd adr...@freebsd.org, David Christensen davi...@freebsd.org, lini...@freebsd.org Date: Friday, January 4, 2013, 9:41 AM On 2013-01-01 0:04, Barney Cordoba wrote: The statement above assumes that there is a benefit. VoIP packets are short, so the benefit of offloading is reduced. There is some delay added by the hardware, and there are CPU cycles used in managing the offload code. So those operations not only muddy the code, but they may not be faster than simply doing the checksum on a much, much faster CPU. Forgoing all the discussions on performance and possible penalties in drivers: I think there is a large (and growing) set of UDP streams that do use larger packets. The video streaming we did used a size of header(14)+7*188, which is the max number of MPEG packets that fit into anything with an MTU of 1500. Receiving those on small embedded devices which can do HW check-summing is very beneficial there. On the large servers we would generate up to 5Gbit of outgoing streams. I'm sure that offloading UDP checksums would be an advantage as well. (They ran mainly Linux, but FreeBSD would also work.) Unfortunately most of the infrastructure has been taken down, so it is no longer possible to verify any of the assumptions. --WjW If you haven't benchmarked it, then you're just guessing. That's my point. It's like SMP in FreeBSD 4. People bought big, honking machines, and the big expensive machines were slower than a single core system at less than half the price. Just because something sounds better doesn't mean that it is better. I completely agree. A Dutch proverb goes: To measure is to know. It was the subtitle of my graduation report, and my professional motto when working as a systems architect. That's why it is sad that the system is no longer up and running, because a 0-order check would have been no more than one ifconfig to see whether it made a difference. But that is all water under the bridge. --WjW You can't really benchmark on a live network; you need a control. It's easy enough to generate controlled UDP streams. And of course every NIC would be a new deal. I'm sure that UDP offload is a checklist feature and not something that the Intels and Broadcoms of the world do a lot of performance testing for. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
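[Editorial note: Barney's point that controlled UDP streams are easy to generate can be sketched as below; this is my own illustration, not a tool from the thread. It sends fixed-size datagrams at a fixed packet rate so the same load can be replayed with offload enabled and disabled. The 1316-byte payload matches the 7*188 MPEG-TS sizing Willem mentions; the target address, port, and rate are placeholders.]

/* udpgen.c - send fixed-size UDP datagrams at a fixed packet rate.
 * Illustrative sketch only: cc -o udpgen udpgen.c
 * Target address, port, rate and payload size below are placeholders.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#include <err.h>
#include <errno.h>
#include <string.h>
#include <time.h>

#define TARGET   "192.0.2.1"	/* placeholder (TEST-NET) address */
#define PORT     5004
#define PPS      1000		/* packets per second to generate */
#define PAYLOAD  1316		/* 7 * 188-byte MPEG TS packets */

int
main(void)
{
	char buf[PAYLOAD];
	struct sockaddr_in sin;
	struct timespec gap;
	int s;

	memset(buf, 0xa5, sizeof(buf));
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(PORT);
	if (inet_pton(AF_INET, TARGET, &sin.sin_addr) != 1)
		errx(1, "bad address");
	if ((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
		err(1, "socket");

	gap.tv_sec = 0;
	gap.tv_nsec = 1000000000L / PPS;	/* inter-packet gap */

	for (;;) {
		if (sendto(s, buf, sizeof(buf), 0,
		    (struct sockaddr *)&sin, sizeof(sin)) == -1 &&
		    errno != ENOBUFS)
			err(1, "sendto");
		nanosleep(&gap, NULL);	/* coarse pacing; timer granularity
					 * limits the achievable rate */
	}
}

[With a fixed source port the stream also hashes to a single NIC queue, which keeps the comparison between offload on/off runs fair.]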
To SMP or not to SMP
I have a situation where I have to run 9.1 on an old single core box. Does anyone have a handle on whether it's better to build a non SMP kernel or to just use a standard SMP build with just the one core? Thanks. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: To SMP or not to SMP
--- On Mon, 1/7/13, Garrett Cooper yaneg...@gmail.com wrote: From: Garrett Cooper yaneg...@gmail.com Subject: Re: To SMP or not to SMP To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org Date: Monday, January 7, 2013, 9:38 PM On Jan 7, 2013, at 6:25 PM, Barney Cordoba wrote: I have a situation where I have to run 9.1 on an old single core box. Does anyone have a handle on whether it's better to build a non SMP kernel or to just use a standard SMP build with just the one core? Thanks. Non-SMP. I don't see why it would be wise to involve the standard locking structure overhead for a single-core box. It might not be wise, but I'd guess that 99% of the development work is being done on SMP systems, so who knows what weirdness non-smp systems might have. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: kern/174851: [bxe] [patch] UDP checksum offload is wrong in bxe driver
--- On Fri, 1/4/13, Willem Jan Withagen w...@digiware.nl wrote: From: Willem Jan Withagen w...@digiware.nl Subject: Re: kern/174851: [bxe] [patch] UDP checksum offload is wrong in bxe driver To: Barney Cordoba barney_cord...@yahoo.com Cc: Garrett Cooper yaneg...@gmail.com, freebsd-net@freebsd.org, Adrian Chadd adr...@freebsd.org, David Christensen davi...@freebsd.org, lini...@freebsd.org Date: Friday, January 4, 2013, 9:41 AM On 2013-01-01 0:04, Barney Cordoba wrote: The statement above assumes that there is a benefit. VoIP packets are short, so the benefit of offloading is reduced. There is some delay added by the hardware, and there are CPU cycles used in managing the offload code. So those operations not only muddy the code, but they may not be faster than simply doing the checksum on a much, much faster CPU. Forgoing all the discussions on performance and possible penalties in drivers: I think there is a large (and growing) set of UDP streams that do use larger packets. The video streaming we did used a size of header(14)+7*188, which is the max number of MPEG packets that fit into anything with an MTU of 1500. Receiving those on small embedded devices which can do HW check-summing is very beneficial there. On the large servers we would generate up to 5Gbit of outgoing streams. I'm sure that offloading UDP checksums would be an advantage as well. (They ran mainly Linux, but FreeBSD would also work.) Unfortunately most of the infrastructure has been taken down, so it is no longer possible to verify any of the assumptions. --WjW If you haven't benchmarked it, then you're just guessing. That's my point. It's like SMP in FreeBSD 4. People bought big, honking machines, and the big expensive machines were slower than a single core system at less than half the price. Just because something sounds better doesn't mean that it is better. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: kern/174851: [bxe] [patch] UDP checksum offload is wrong in bxe driver
--- On Mon, 12/31/12, lini...@freebsd.org lini...@freebsd.org wrote: From: lini...@freebsd.org lini...@freebsd.org Subject: Re: kern/174851: [bxe] [patch] UDP checksum offload is wrong in bxe driver To: lini...@freebsd.org, freebsd-b...@freebsd.org, freebsd-net@FreeBSD.org Date: Monday, December 31, 2012, 2:28 AM Old Synopsis: UDP checksum offload is wrong in bxe driver New Synopsis: [bxe] [patch] UDP checksum offload is wrong in bxe driver Responsible-Changed-From-To: freebsd-bugs-freebsd-net Responsible-Changed-By: linimon Responsible-Changed-When: Mon Dec 31 07:28:11 UTC 2012 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=174851 ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org Has anyone done an analysis on modern hardware as to whether udp csum offloading is actually beneficial? Even on 2007 hardware I came to the conclusion that using offloading was a negative. Reminds me of the days when people were using intelligent ethernet cards that were slower than the host cpu. The handshaking cost you more than just using shared memory. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
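[Editorial note: the question of whether UDP checksum offload still pays off on a modern CPU can be bounded from the software side by timing the plain RFC 1071 Internet checksum over a representative payload. The sketch below is my own, not from the PR or the thread; the 1472-byte payload and iteration count are arbitrary choices.]

/* cksumbench.c - time the RFC 1071 Internet checksum in software.
 * Illustrative sketch only: cc -O2 -o cksumbench cksumbench.c
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define PKTLEN 1472		/* max UDP payload in a 1500-byte MTU */
#define ITERS  1000000UL	/* arbitrary */

/* Straightforward 16-bit one's-complement sum (RFC 1071), no unrolling. */
static uint16_t
in_cksum(const void *data, size_t len)
{
	const uint16_t *p = data;
	uint32_t sum = 0;

	while (len > 1) {
		sum += *p++;
		len -= 2;
	}
	if (len == 1)
		sum += *(const uint8_t *)p;
	sum = (sum >> 16) + (sum & 0xffff);
	sum += (sum >> 16);
	return (~sum & 0xffff);
}

int
main(void)
{
	static uint8_t pkt[PKTLEN];
	struct timespec s, e;
	volatile uint16_t sink;
	unsigned long i;
	double ns;

	memset(pkt, 0x5a, sizeof(pkt));
	clock_gettime(CLOCK_MONOTONIC, &s);
	for (i = 0; i < ITERS; i++) {
		pkt[0] = (uint8_t)i;	/* defeat loop-invariant hoisting */
		sink = in_cksum(pkt, sizeof(pkt));
	}
	clock_gettime(CLOCK_MONOTONIC, &e);
	ns = ((e.tv_sec - s.tv_sec) * 1e9 + (e.tv_nsec - s.tv_nsec)) / ITERS;
	printf("%.1f ns per %d-byte checksum (%.2f Mpkt/s)\n",
	    ns, PKTLEN, 1000.0 / ns);
	(void)sink;
	return (0);
}

[Compare the per-packet figure this prints against the per-packet cost of the driver's offload bookkeeping; for the short VoIP packets discussed below, the payload is a fraction of this size and the software cost shrinks accordingly.]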
Re: kern/174851: [bxe] [patch] UDP checksum offload is wrong in bxe driver
--- On Mon, 12/31/12, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: kern/174851: [bxe] [patch] UDP checksum offload is wrong in bxe driver To: Garrett Cooper yaneg...@gmail.com Cc: Barney Cordoba barney_cord...@yahoo.com, David Christensen davi...@freebsd.org, lini...@freebsd.org, freebsd-net@freebsd.org Date: Monday, December 31, 2012, 2:00 PM On 31 December 2012 07:58, Garrett Cooper yaneg...@gmail.com wrote: I would ask David about whether or not there was a performance difference, because they might have some numbers for if_bxe. Not sure about the concept in general, but it seems like a reasonable application-protocol-specific request. But by and large, I agree that UDP checksumming doesn't make logical sense, because it adds unnecessary overhead on an L3 protocol that's assumed to be unreliable. People are terminating millions of VoIP calls on FreeBSD devices. All using UDP. I can imagine large scale VoIP gateways wanting to try and benefit from this. The statement above assumes that there is a benefit. VoIP packets are short, so the benefit of offloading is reduced. There is some delay added by the hardware, and there are CPU cycles used in managing the offload code. So those operations not only muddy the code, but they may not be faster than simply doing the checksum on a much, much faster CPU. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
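[Editorial note: for context on what "managing the offload code" means on the FreeBSD side, a driver advertises checksum offload through if_hwassist/IFCAP_TXCSUM, looks at the mbuf's csum_flags on transmit, and marks validated receive checksums back onto the mbuf. The fragment below is a generic, hypothetical sketch of those touch points; it is not the bxe code or the patch from this PR, and the hw_* names are invented stand-ins for device-specific descriptor handling.]

/* Hypothetical driver fragments showing the usual checksum-offload hooks.
 * Not the bxe driver; hw_tx_desc_set_l4csum()/hw_rx_l4csum_ok() and the
 * hw_*_desc structs are invented placeholders.
 */
#include <sys/param.h>
#include <sys/mbuf.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

struct hw_tx_desc;			/* invented, device-specific */
struct hw_rx_desc;			/* invented, device-specific */
void hw_tx_desc_set_l4csum(struct hw_tx_desc *, int is_udp);
int  hw_rx_l4csum_ok(const struct hw_rx_desc *);

void
xx_attach_caps(struct ifnet *ifp)
{
	/* Tell the stack which checksums the hardware can compute. */
	ifp->if_capabilities |= IFCAP_TXCSUM | IFCAP_RXCSUM;
	ifp->if_capenable = ifp->if_capabilities;
	ifp->if_hwassist = CSUM_IP | CSUM_TCP | CSUM_UDP;
}

void
xx_tx_csum(struct mbuf *m, struct hw_tx_desc *txd)
{
	/* Only ask the NIC to checksum what the stack requested. */
	if (m->m_pkthdr.csum_flags & CSUM_UDP)
		hw_tx_desc_set_l4csum(txd, 1);
	else if (m->m_pkthdr.csum_flags & CSUM_TCP)
		hw_tx_desc_set_l4csum(txd, 0);
}

void
xx_rx_csum(struct mbuf *m, const struct hw_rx_desc *rxd)
{
	/* Mark a hardware-verified L4 checksum so the stack skips it. */
	if (hw_rx_l4csum_ok(rxd)) {
		m->m_pkthdr.csum_flags |= CSUM_DATA_VALID | CSUM_PSEUDO_HDR;
		m->m_pkthdr.csum_data = 0xffff;
	}
}

[Getting the UDP case of the middle function wrong, i.e. telling the hardware to insert a TCP-style checksum or none at all, is the general class of bug this PR is about; the exact bxe fix is in the referenced patch.]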
Re: igb and ALTQ in 9.1-rc3
--- On Tue, 12/11/12, Gleb Smirnoff gleb...@freebsd.org wrote: From: Gleb Smirnoff gleb...@freebsd.org Subject: Re: igb and ALTQ in 9.1-rc3 To: Jack Vogel jfvo...@gmail.com Cc: Clement Hermann (nodens) nodens2...@gmail.com, Barney Cordoba barney_cord...@yahoo.com, freebsd-net@FreeBSD.org Date: Tuesday, December 11, 2012, 2:58 AM On Mon, Dec 10, 2012 at 03:31:19PM -0800, Jack Vogel wrote: J UH, maybe asking the owner of the driver would help :) J J ... and no, I've never been aware of doing anything to stop supporting altq J so you wouldn't see any commits. If there's something in the altq code or J support (which I have nothing to do with) that caused this no-one informed J me. Switching from if_start to if_transmit effectively disables ALTQ support. AFAIR, there is some magic implemented in other drivers that makes them modern (that means using if_transmit), but still capable of switching to queueing mode if SIOCADDALTQ is cast upon them. It seems pretty difficult to say that something is compatible with something else if it hasn't been tested in a few years. It seems to me that ALTQ is the one that should handle if_transmit, although it's a good argument for having a raw send function in drivers. Ethernet drivers don't need more than a send() routine that loads a packet into the ring. The decision on what to do if you can't queue a packet should be in the network layer, if we must still call things layers. if_start is a leftover from a day when you stuffed a buffer and waited for an interrupt to stuff in another. The whole idea is antiquated. Imagine drivers that pull packets off of a card and simply queue them, and that you simply submit a packet to be queued for transmit. Instead of trying to find 35 programmers that understand all of the lock BS, you only need to have a couple. I always disable all of the gobbledegook like checksum offloading. They just muddy the water and have very little effect on performance. A modern CPU can do a checksum as fast as you can manage the offload capabilities without disrupting the processing path. With FreeBSD, every driver is an experience. Some suck so bad that they should come with a warning. The MSK driver is completely useless, as an example. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
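[Editorial note: for readers following the if_start vs. if_transmit point, with if_transmit the driver, not the stack's IFQ, owns the software queue, which is why ALTQ (which hooks the IFQ) silently falls out of the picture. A stripped-down, hypothetical if_transmit method in the general style of the multiqueue Intel drivers might look like the sketch below; xx_softc, xx_queue and xx_encap are invented names, there is a single queue, and locking and error handling are reduced to the minimum.]

/* Hypothetical if_transmit path: enqueue onto a buf_ring via drbr_*,
 * then try to drain into the hardware TX ring.  Names beginning with
 * xx_ are invented; this is a sketch, not a real driver.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/mbuf.h>
#include <sys/buf_ring.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

struct xx_queue {
	struct mtx	 mtx;
	struct buf_ring	*br;		/* software ring (drbr) */
	struct xx_softc	*sc;
};

struct xx_softc {
	struct ifnet	*ifp;
	struct xx_queue	 q;		/* one queue, for simplicity */
};

int xx_encap(struct xx_queue *, struct mbuf *);	/* device-specific: stuff one
						 * frame into the HW ring */

static void
xx_drain(struct ifnet *ifp, struct xx_queue *q)
{
	struct mbuf *m;

	mtx_assert(&q->mtx, MA_OWNED);
	while ((m = drbr_dequeue(ifp, q->br)) != NULL) {
		if (xx_encap(q, m) != 0) {
			/* HW ring full: put it back and stop for now
			 * (assumes xx_encap leaves m intact on failure). */
			drbr_enqueue(ifp, q->br, m);
			break;
		}
	}
}

static int
xx_transmit(struct ifnet *ifp, struct mbuf *m)
{
	struct xx_softc *sc = ifp->if_softc;
	struct xx_queue *q = &sc->q;
	int error;

	error = drbr_enqueue(ifp, q->br, m);	/* may return ENOBUFS */
	if (mtx_trylock(&q->mtx)) {
		xx_drain(ifp, q);
		mtx_unlock(&q->mtx);
	}
	return (error);
}

[Nothing in this path ever touches ifp->if_snd, which is the queue ALTQ classifies into; that is the disconnect Gleb describes.]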
Re: igb and ALTQ in 9.1-rc3
--- On Tue, 12/11/12, Karim Fodil-Lemelin fodillemlinka...@gmail.com wrote: From: Karim Fodil-Lemelin fodillemlinka...@gmail.com Subject: Re: igb and ALTQ in 9.1-rc3 To: freebsd-net@freebsd.org Cc: nodens2...@gmail.com Date: Tuesday, December 11, 2012, 9:56 AM On 11/12/2012 9:15 AM, Ermal Luçi wrote: On Tue, Dec 11, 2012 at 2:05 PM, Barney Cordoba barney_cord...@yahoo.com wrote: --- On Tue, 12/11/12, Gleb Smirnoff gleb...@freebsd.org wrote: From: Gleb Smirnoff gleb...@freebsd.org Subject: Re: igb and ALTQ in 9.1-rc3 To: Jack Vogel jfvo...@gmail.com Cc: Clement Hermann (nodens) nodens2...@gmail.com, Barney Cordoba barney_cord...@yahoo.com, freebsd-net@FreeBSD.org Date: Tuesday, December 11, 2012, 2:58 AM On Mon, Dec 10, 2012 at 03:31:19PM -0800, Jack Vogel wrote: J UH, maybe asking the owner of the driver would help :) J J ... and no, I've never been aware of doing anything to stop supporting altq J so you wouldn't see any commits. If there's something in the altq code or J support (which I have nothing to do with) that caused this no-one informed J me. Switching from if_start to if_transmit effectively disables ALTQ support. AFAIR, there is some magic implemented in other drivers that makes them modern (that means using if_transmit), but still capable of switching to queueing mode if SIOCADDALTQ is cast upon them. It seems pretty difficult to say that something is compatible with something else if it hasn't been tested in a few years. It seems to me that ALTQ is the one that should handle if_transmit, although it's a good argument for having a raw send function in drivers. Ethernet drivers don't need more than a send() routine that loads a packet into the ring. The decision on what to do if you can't queue a packet should be in the network layer, if we must still call things layers. if_start is a leftover from a day when you stuffed a buffer and waited for an interrupt to stuff in another. The whole idea is antiquated. Imagine drivers that pull packets off of a card and simply queue them, and that you simply submit a packet to be queued for transmit. Instead of trying to find 35 programmers that understand all of the lock BS, you only need to have a couple. I always disable all of the gobbledegook like checksum offloading. They just muddy the water and have very little effect on performance. A modern CPU can do a checksum as fast as you can manage the offload capabilities without disrupting the processing path. With FreeBSD, every driver is an experience. Some suck so bad that they should come with a warning. The MSK driver is completely useless, as an example. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org During implementation of if_transmit, ALTQ was not considered at all. The default if_transmit provides some compatibility, but that is void since ALTQ has not been converted to call if_transmit after processing the mbuf. ALTQ can be adapted quite easily to the if_transmit model; it just wasn't done at the time. With the if_transmit model it can even be modularized and not be a kernel compile option, since the queue of the iface is abstracted now. I have always wanted to do a diff but have not yet got to it. The change is quite simple: just provide an altq_transmit default method and hook into the if_transmit model on the fly. You surely need to handle some iface events and enable ALTQ based on request, but it is not hard to implement.
I will always have this in my TODO, but I'm not sure when I can get to it. The issue is not only that igb doesn't support the if_transmit or if_start method, but that ALTQ isn't multiqueue-ready and still uses the IFQ_LOCK for all of its enqueue/dequeue operations. A simple drop-in of if_transmit is bound to cause race conditions on any multiqueue driver with ALTQ. I do have a patch set for this on igb, but it's ugly and needs more work, although it should get you going. Let me know if you're interested; I will clean it up and send it over. For more information on the ALTQ discussion and igb, please read this thread: http://freebsd.1045724.n5.nabble.com/em-igb-if-transmit-drbr-and-ALTQ-td5760338.html Best regards, Karim. At minimum, the drivers should make multiqueue an option, at least until it works better than a single queue driver. Many motherboards have igb NICs on board, and such a mainstream NIC shouldn't be strapped with experimental code that clearly isn't ready for prime time. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
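[Editorial note: the "magic implemented in other drivers" Gleb mentions, and the altq_transmit default Ermal proposes, both amount to checking at the top of if_transmit whether ALTQ has been enabled on the interface and, if so, falling back to the classic IFQ/if_start path so the shaper still sees every packet. Below is a hypothetical sketch of roughly that kind of shim; it is not Karim's igb patch. xx_mq_transmit and xx_start are invented names, the example assumes a single queue, and, as Karim notes, a real multiqueue driver still ends up serialized on the one IFQ lock while ALTQ is active.]

/* Hypothetical ALTQ-compatibility shim for an if_transmit driver.
 * When ALTQ is active, feed the legacy if_snd queue and drain it with an
 * if_start-style routine; otherwise use the multiqueue drbr path.
 */
#include <sys/param.h>
#include <sys/mbuf.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

int  xx_mq_transmit(struct ifnet *ifp, struct mbuf *m);  /* drbr path */
void xx_start(struct ifnet *ifp);	/* legacy dequeue-and-encap routine */

int
xx_transmit_compat(struct ifnet *ifp, struct mbuf *m)
{
	int error;

#ifdef ALTQ
	if (ALTQ_IS_ENABLED(&ifp->if_snd)) {
		/* Let ALTQ classify and queue the packet, then kick the
		 * legacy start routine to drain if_snd into the HW ring. */
		IFQ_ENQUEUE(&ifp->if_snd, m, error);
		if (error == 0)
			xx_start(ifp);
		return (error);
	}
#endif
	return (xx_mq_transmit(ifp, m));
}

[The shim only restores correctness; making ALTQ itself multiqueue-aware, which is the part Karim's patch set and the linked thread are about, is a separate and larger change.]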
Re: igb and ALTQ in 9.1-rc3
--- On Mon, 12/10/12, Clément Hermann (nodens) nodens2...@gmail.com wrote: From: Clément Hermann (nodens) nodens2...@gmail.com Subject: igb and ALTQ in 9.1-rc3 To: freebsd-net@freebsd.org Date: Monday, December 10, 2012, 6:03 AM Hi there, I'm trying to install a new pf/altq router. I needed to use 9.1-rc3 due to RAID driver issues. Everything works fine on my quad-port Intel card (igb), but when I try to load my ruleset I get the following error: pfctl: igb0: driver does not support ALTQ altq(4) states that igb is supported. There are some references to altq in if_igb.c (it includes opt_altq in an ifdef), but they are not in the em driver (though my ruleset loads fine with an em card). Could anyone tell me if igb is supposed to support altq or not? Thanks, Clément (nodens) I'll take a guess that the ALTQ description was written before igb stopped supporting it. ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Latency issues with buf_ring
--- On Thu, 12/6/12, Adrian Chadd adr...@freebsd.org wrote: From: Adrian Chadd adr...@freebsd.org Subject: Re: Latency issues with buf_ring To: Barney Cordoba barney_cord...@yahoo.com Cc: freebsd-net@freebsd.org, Robert Watson rwat...@freebsd.org Date: Thursday, December 6, 2012, 1:31 PM There've been plenty of discussions about better ways of doing this networking stuff. Barney, are you able to make it to any of the developer summits? Perhaps the summits are part of the problem? The goal should be to get the best ideas, not just the best ideas of those with the time, resources, and desire to attend a summit. Lists are the best summit. You can get ideas from people who may not be allowed by their contractual obligations to attend such a summit. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Latency issues with buf_ring
--- On Thu, 12/6/12, Robert Watson rwat...@freebsd.org wrote: From: Robert Watson rwat...@freebsd.org Subject: Re: Latency issues with buf_ring To: Andre Oppermann opperm...@networx.ch Cc: Barney Cordoba barney_cord...@yahoo.com, Adrian Chadd adr...@freebsd.org, John Baldwin j...@freebsd.org, freebsd-net@freebsd.org Date: Thursday, December 6, 2012, 4:39 AM On Tue, 4 Dec 2012, Andre Oppermann wrote: For most if not all ethernet drivers from 100Mbit/s the TX DMA rings are so large that buffering at the IFQ level doesn't make sense anymore and only adds latency. So it could simply directly put everything into the TX DMA and not even try to soft-queue. If the TX DMA ring is full ENOBUFS is returned instead of filling yet another queue. However there are ALTQ interactions and other mechanisms which have to be considered too making it a bit more involved. I asserted for many years that software-side queueing would be subsumed by increasingly large DMA descriptor rings for the majority of devices and configurations. However, this turns out not to have happened in a number of scenarios, and so I've revised my conclusions there. I think we will continue to need to support transmit-side buffering, ideally in the form of a set of libraries that device drivers can use to avoid code replication and integrate queue management features fairly transparently. I'm a bit worried by the level of copy-and-paste between 10gbps device drivers right now -- for 10/100/1000 drivers, the network stack contains the majority of the code, and the responsibility of the device driver is to advertise hardware features and manage interactions with rings, interrupts, etc. On the 10gbps side, we see lots of code replication, especially in queue management, and it suggests to me (as discussed for several years in a row at BSDCan and elsewhere) that it's time to do a bit of revisiting of ifnet, pull more code back into the central stack and out of device drivers, etc. That doesn't necessarily mean changing notions of ownership of event models, rather, centralising code in libraries rather than all over the place. This is something to do with some care, of course. Robert More troubling than that is the notion that the same code that's suitable for 10/100/1000 should be used in a 10Gb/s environment. 10Gb/s requires a completely different way of thinking. BC ___ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
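[Editorial note: Robert's suggestion of pulling the copy-and-pasted queue management into a shared library can be pictured as a small helper around buf_ring(9) that any driver could embed: it owns the software ring, hands the driver one packet at a time, and reports ENOBUFS when full, which is the behaviour Andre suggests propagating upward. The sketch below is purely illustrative; the xxq_* names are invented and this is not an existing FreeBSD API, with ALTQ and watermarks left out.]

/* Sketch of a tiny shared soft-TX-queue helper built on buf_ring(9).
 * The xxq_* names are invented; this is only the shape such a library
 * might take, not kernel code that exists today.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>
#include <sys/malloc.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/buf_ring.h>
#include <sys/mbuf.h>

struct xxq {
	struct mtx	 lock;		/* serializes dequeue/encap */
	struct buf_ring	*br;
};

int
xxq_init(struct xxq *q, int entries)	/* entries must be a power of 2 */
{
	mtx_init(&q->lock, "xxq", NULL, MTX_DEF);
	q->br = buf_ring_alloc(entries, M_DEVBUF, M_NOWAIT, &q->lock);
	return (q->br == NULL ? ENOMEM : 0);
}

int
xxq_enqueue(struct xxq *q, struct mbuf *m)
{
	/* Lock-free on the producer side; returns ENOBUFS when the
	 * ring is full so the caller can push back on the stack. */
	return (buf_ring_enqueue(q->br, m));
}

struct mbuf *
xxq_dequeue(struct xxq *q)
{
	/* Single consumer: the driver's TX drain path, under q->lock. */
	mtx_assert(&q->lock, MA_OWNED);
	return (buf_ring_dequeue_sc(q->br));
}

[Whether the ring should exist at all, versus relying on the hardware descriptor ring as Andre proposes, is exactly the point Bruce takes up below.]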
Re: Latency issues with buf_ring
--- On Tue, 12/4/12, Bruce Evans b...@optusnet.com.au wrote: From: Bruce Evans b...@optusnet.com.au Subject: Re: Latency issues with buf_ring To: Andre Oppermann opperm...@networx.ch Cc: Adrian Chadd adr...@freebsd.org, Barney Cordoba barney_cord...@yahoo.com, John Baldwin j...@freebsd.org, freebsd-net@FreeBSD.org Date: Tuesday, December 4, 2012, 10:31 PM On Tue, 4 Dec 2012, Andre Oppermann wrote: For most if not all ethernet drivers from 100Mbit/s the TX DMA rings are so large that buffering at the IFQ level doesn't make sense anymore and only adds latency. I found sort of the opposite for bge at 1Gbps. Most or all bge NICs have a tx ring size of 512. The ifq length is the tx ring size minus 1 (511). I needed to expand this to imax(2 * tick / 4, 1) to maximize pps. This does bad things to latency and worse things to caching (512 buffers might fit in the L2 cache, but 1 buffers bust any reasonable cache as they are cycled through), but I only tried to optimize tx pps. So it could simply directly put everything into the TX DMA and not even try to soft-queue. If the TX DMA ring is full ENOBUFS is returned instead of filling yet another queue. That could work, but upper layers currently don't understand ENOBUFS at all, so it would work poorly now. Also, 512 entries is not many, so even if upper layers understood ENOBUFS it is not easy for them to _always_ respond fast enough to keep the tx active, unless there are upstream buffers with many more than 512 entries. There needs to be enough buffering somewhere so that the tx ring can be replenished almost instantly from the buffer, to handle the worst-case latency for the threads generating new (unbuffered) packets. At the line rate of ~1.5 Mpps for 1 Gbps, the maximum latency that can be covered by 512 entries is only 340 usec. However there are ALTQ interactions and other mechanisms which have to be considered too making it a bit more involved. I didn't try to handle ALTQ or even optimize for TCP.
More details: to maximize pps, the main detail is to ensure that the tx ring never becomes empty. The tx then transmits as fast as possible. This requires some watermark processing, but FreeBSD has almost none for tx rings. The following normally happens for packet generators like ttcp and netsend:
- loop calling send() or sendto() until the tx ring (and also any upstream buffers) fill up. Then ENOBUFS is returned.
- watermark processing is broken in the user API at this point. There is no way for the application to wait for the ENOBUFS condition to go away (select() and poll() don't work). Applications use poor workarounds:
- old (~1989) ttcp sleeps for 18 msec when send() returns ENOBUFS. This was barely good enough for 1 Mbps ethernet (line rate ~1500 pps is 27 per 18 msec, so IFQ_MAXLEN = 50 combined with just a 1-entry tx ring provides a safety factor of about 2). Expansion of the tx ring size to 512 makes this work with 10 Mbps ethernet too. Expansion of the ifq to 511 gives another factor of 2. After losing the safety factor of 2, we can now handle 40 Mbps ethernet, and are only a factor of 25 short for 1 Gbps. My hardware can't do line rate for small packets -- it can only do 640 kpps. Thus ttcp is only a factor of 11 short of supporting the hardware at 1 Gbps. This assumes that sleeps of 18 msec are actually possible, which they aren't with HZ = 100 giving a granularity of 10 msec so that sleep(18 msec) actually sleeps for an average of 23 msec. -current uses the bad default of HZ = 1000. With that sleep(18 msec) would average 18.5 msec.
Of course, ttcp should sleep for more like 1 msec if that is possible. Then the average sleep is 1.5 msec. ttcp can keep up with the hardware with that, and is only slightly behind the hardware with the worst-case sleep of 2 msec (512+511 packets generated every 2 msec is 511.5 kpps). I normally use old ttcp, except I modify it to sleep for 1 msec instead of 18 in one version, and in another version I remove the sleep so that it busy-waits in a loop that calls send() which almost always returns ENOBUFS. The latter wastes a lot of CPU, but is almost good enough for throughput testing.
- newer ttcp tries to program the sleep time in microseconds. This doesn't really work, since the sleep granularity is normally at least a millisecond, and even if it could be the 340 microseconds needed by bge with no ifq (see above, and better divide the 340 by 2), then this is quite short and would take almost as much CPU as busy-waiting. I consider HZ = 1000 to be another form of polling/busy-waiting and don't use it except for testing.
- netrate/netsend also uses a programmed sleep time. This doesn't really work, as above. netsend also tries to limit its rate
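[Editorial note: to make the ENOBUFS/back-off behaviour Bruce describes above concrete, here is a hypothetical userland flooder, not ttcp or netsend themselves: it calls sendto() in a loop, naps for a configurable interval when the stack returns ENOBUFS, and reports the average nap it actually achieved, so the HZ-driven granularity is visible. The address, port, packet count, and the 1 ms nap are placeholders.]

/* floodsend.c - sendto() flood with a nap on ENOBUFS, reporting both the
 * requested and the achieved nap so timer granularity is visible.
 * Illustrative sketch only: cc -o floodsend floodsend.c
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#include <err.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define TARGET  "192.0.2.1"	/* placeholder address */
#define PORT    9		/* discard */
#define NAP_NS  1000000L	/* request a 1 ms nap on ENOBUFS */
#define PAYLOAD 18		/* fills a minimum-size Ethernet frame */
#define COUNT   10000000UL	/* packets to send before reporting */

int
main(void)
{
	char buf[PAYLOAD];
	struct sockaddr_in sin;
	struct timespec nap = { 0, NAP_NS }, t0, t1;
	double slept = 0;
	unsigned long sent = 0, naps = 0;
	int s;

	memset(buf, 0, sizeof(buf));
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(PORT);
	if (inet_pton(AF_INET, TARGET, &sin.sin_addr) != 1)
		errx(1, "bad address");
	if ((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
		err(1, "socket");

	while (sent < COUNT) {
		if (sendto(s, buf, sizeof(buf), 0,
		    (struct sockaddr *)&sin, sizeof(sin)) != -1) {
			sent++;
			continue;
		}
		if (errno != ENOBUFS)
			err(1, "sendto");
		clock_gettime(CLOCK_MONOTONIC, &t0);
		nanosleep(&nap, NULL);		/* the ttcp-style back-off */
		clock_gettime(CLOCK_MONOTONIC, &t1);
		slept += (t1.tv_sec - t0.tv_sec) * 1e3 +
		    (t1.tv_nsec - t0.tv_nsec) / 1e6;
		naps++;
	}
	printf("%lu packets, %lu naps, %.2f ms average achieved nap "
	    "(%.2f ms requested)\n", sent, naps,
	    naps ? slept / naps : 0.0, NAP_NS / 1e6);
	return (0);
}

[On a HZ=100 kernel the achieved nap should come out well above the 1 ms requested, which is the granularity problem Bruce points to; setting NAP_NS to 0 turns this into the busy-wait variant he also describes.]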