Re: Network card IRQ balancing with Intel 5000 series chipsets

2007-01-02 Thread Rick Jones

The best way to achieve such balancing is to have the network card help
and essentially be able to select the CPU to notify, while at the same
time considering:
a) avoiding any packet reordering - which restricts a flow to being
processed by a single CPU, at least within a timeframe
b) being per-CPU-load-aware - which means busying out only CPUs which are
less utilized

Various such schemes have been discussed here, but no vendor is making
such NICs today (search Dave's blog - he did discuss this at one point or
another).


I thought that Neterion were doing something along those lines with 
their Xframe II NICs - perhaps not CPU loading aware, but doing stuff to 
spread the work of different connections across the CPUs.


I would add a:

c) some knowledge of the CPU on which the thread accessing the socket 
for that connection will run.  This could be as simple as the CPU on 
which the socket was last accessed.  Having a _NIC_ know this sort of 
thing is somewhat difficult and expensive (perhaps too much so).  If a 
NIC simply hashes the connection identifiers, you then have the issue of 
different connections, each owned/accessed by one thread, taking 
different paths through the system.  No issues about reordering, but 
perhaps some on cache lines going hither and yon.
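
To make the hashing idea concrete, here is a minimal user-space sketch of
the kind of connection-identifier hashing an RSS-style NIC might do. The
hash function and the modulo mapping to a CPU are assumptions for
illustration, not any particular vendor's algorithm (real NICs typically
use a Toeplitz hash over the tuple).

/* Illustrative only: hash a flow's 4-tuple to pick an RX CPU/queue. */
#include <stdint.h>
#include <stdio.h>

struct flow_tuple {
	uint32_t saddr, daddr;  /* IPv4 source/destination addresses */
	uint16_t sport, dport;  /* L4 source/destination ports */
};

/* Simple mixing hash; a stand-in for the NIC's real (e.g. Toeplitz) hash. */
static uint32_t flow_hash(const struct flow_tuple *f)
{
	uint32_t h = f->saddr ^ f->daddr;

	h ^= ((uint32_t)f->sport << 16) | f->dport;
	h ^= h >> 16;
	h *= 0x85ebca6b;
	h ^= h >> 13;
	return h;
}

/* Every packet of a given flow maps to the same CPU, so no reordering,
 * but the chosen CPU has nothing to do with where the owning thread runs. */
static unsigned int flow_to_cpu(const struct flow_tuple *f, unsigned int ncpus)
{
	return flow_hash(f) % ncpus;
}

int main(void)
{
	struct flow_tuple f = { 0x0a000001, 0x0a000002, 5004, 5004 };

	printf("flow steered to cpu %u of 4\n", flow_to_cpu(&f, 4));
	return 0;
}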


The question boils down to this: should the application (via the scheduler) 
dictate where its connections are processed, or should the connections 
dictate where the application runs?


rick jones



Re: Network card IRQ balancing with Intel 5000 series chipsets

2007-01-02 Thread Rick Jones

With NAPI, if I have only a few interrupts it likely implies I have a huge
network load (and therefore CPU use), and I would be much happier if
you didn't start moving more interrupt load to that already loaded CPU



current irqbalance accounts for napi by using the number of packets as
indicator for load, not the number of interrupts. (for network
interrupts obviously)


And hopefully with some knowledge of NUMA, so it doesn't balance the 
interrupts of a NIC to some far-off (topology-wise) CPU...


rick jones


Re: Network card IRQ balancing with Intel 5000 series chipsets

2006-12-28 Thread Krzysztof Oledzki



On Wed, 27 Dec 2006, jamal wrote:


On Wed, 2006-27-12 at 09:09 +0200, Robert Iakobashvili wrote:



My scenario is treatment of RTP packets in kernel space with a single
network card (both Rx and Tx). The default of the Intel 5000 series
chipset is affinity of each network card to a certain CPU. Currently,
neither with irqbalance nor with kernel irq-balancing (MSI and io-apic
attempted) can I find a way to balance that irq.


In the near future, when the NIC vendors wake up[1] because CPU vendors
- including big bad Intel - are going to be putting out a large number
of hardware threads, you should be able to do more clever things with
such a setup. At the moment, just tie it to a single CPU and have your
other related processes running/bound on the other cores so you
can utilize them. OTOH, you say you are only using 30% of the one CPU,
so it may not be a big deal to tie your single nic to one cpu.


Anyway, it seems that with more advanced firewalls/routers the kernel spends 
most of its time in IPSec/crypto code, netfilter conntrack and iptables 
rules/extensions, routing lookups, etc., and not in the hardware IRQ handler. 
So, it would be nice if this part could be done by all CPUs.


Best regards,


Krzysztof Olędzki

Re: Network card IRQ balancing with Intel 5000 series chipsets

2006-12-27 Thread Arjan van de Ven

 Although still insufficient in certain cases. All flows are not equal; as an
 example, an IPSEC flow with 1000 packets bound to one CPU  will likely
 utilize more cycles than 5000 packets that are being plain forwarded on
 another CPU.

sure; however the kernel doesn't provide more accurate information
currently (and I doubt it could even, it's not so easy to figure out
which interface triggered the softirq if 2 interfaces share the cpu, and
then, how much work came from which etc).

also the amount of work estimate doesn't need to be accurate to 5
digits to be honest... just number of packets seems to be a quite
reasonable approximation already. (if the kernel starts exporting more
accurate data, irqbalance can easily use it of course)
-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: Network card IRQ balancing with Intel 5000 series chipsets

2006-12-27 Thread jamal
On Wed, 2006-27-12 at 09:09 +0200, Robert Iakobashvili wrote:

 
 My scenario is treatment of RTP packets in kernel space with a single
 network card (both Rx and Tx). The default of the Intel 5000 series
 chipset is affinity of each network card to a certain CPU. Currently,
 neither with irqbalance nor with kernel irq-balancing (MSI and io-apic
 attempted) can I find a way to balance that irq.

In the near future, when the NIC vendors wake up[1] because CPU vendors
- including big bad Intel - are going to be putting out a large number
of hardware threads, you should be able to do more clever things with
such a setup. At the moment, just tie it to a single CPU and have your
other related processes running/bound on the other cores so you
can utilize them. OTOH, you say you are only using 30% of the one CPU,
so it may not be a big deal to tie your single nic to one cpu.

cheers,
jamal

[1] If you are able to change the NIC in your setup, try looking at
netiron; email [EMAIL PROTECTED] - they have a much cleverer nic
than the e1000. It has multiple DMA receive rings which are selectable
via a little classifier (for example, you could have RTP going to CPU0
and the rest going to CPU1). The DMA rings could be tied to different
interrupts/MSI and with a little work could be made to appear as
several interfaces.
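
To illustrate the idea in that footnote, here is a toy sketch of a
classifier steering flows to one of several DMA rings, each ring tied to
its own MSI vector and hence its own CPU. The structures, vector numbers,
and the RTP port test are all assumptions made up for the sketch, not the
netiron hardware interface or any real driver.

/* Toy model: classify a packet to an RX ring; each ring has its own MSI. */
#include <stdint.h>
#include <stdio.h>

#define NUM_RX_RINGS 2

struct rx_ring {
	unsigned int msi_vector;  /* interrupt vector assigned to this ring */
	unsigned int cpu;         /* CPU that vector's affinity points at */
};

static const struct rx_ring rings[NUM_RX_RINGS] = {
	{ .msi_vector = 48, .cpu = 0 },  /* ring 0: RTP traffic -> CPU0 */
	{ .msi_vector = 49, .cpu = 1 },  /* ring 1: everything else -> CPU1 */
};

/* Hypothetical rule: treat even UDP ports in 10000-20000 as RTP. */
static unsigned int classify(uint8_t ip_proto, uint16_t udp_dport)
{
	if (ip_proto == 17 /* UDP */ &&
	    udp_dport >= 10000 && udp_dport <= 20000 && (udp_dport & 1) == 0)
		return 0;  /* RTP ring */
	return 1;          /* default ring */
}

int main(void)
{
	unsigned int ring = classify(17, 10002);

	printf("packet -> ring %u, MSI %u, cpu %u\n",
	       ring, rings[ring].msi_vector, rings[ring].cpu);
	return 0;
}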



Re: Network card IRQ balancing with Intel 5000 series chipsets

2006-12-27 Thread jamal
On Wed, 2006-27-12 at 14:08 +0100, Arjan van de Ven wrote:

 sure; however the kernel doesn't provide more accurate information
 currently (and I doubt it could even, it's not so easy to figure out
 which interface triggered the softirq if 2 interfaces share the cpu, and
 then, how much work came from which etc).
 

If you sample CPU use and in between two samples you are able to know
which nic is tied to which CPU, how many cycles such a cpu consumed in
user vs kernel, and how many packets were seen on such a nic, then you
should have the info necessary to make a decision, no? Yes, I know it is
a handwave on my part and it is complex, but by the same token, I would
suspect each kind of IO-derived work (which results in interrupts) will
have more inputs that could help you make a proper decision than a mere
glance at the interrupt counts. I understand, for example, that the SCSI
subsystem these days behaves very much like NAPI.
I think one of the failures of the APIC load balancing is a direct
result of not being able to factor in such environmental factors.
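
As a rough sketch of that sampling, assuming a fixed nic-to-CPU pairing
supplied by the caller (here eth0 is assumed bound to cpu0), the deltas
jamal mentions can be pulled from /proc/stat and /proc/net/dev:

/* Take two snapshots of per-CPU time and per-NIC RX packets, print deltas. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

struct cpu_sample {
	unsigned long long user, nice, system, idle, iowait, irq, softirq;
};

static int read_cpu(const char *cpu, struct cpu_sample *s)
{
	char line[256];
	FILE *f = fopen("/proc/stat", "r");
	int found = 0;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		size_t n = strlen(cpu);

		if (strncmp(line, cpu, n) == 0 && line[n] == ' ') {
			sscanf(line + n, " %llu %llu %llu %llu %llu %llu %llu",
			       &s->user, &s->nice, &s->system, &s->idle,
			       &s->iowait, &s->irq, &s->softirq);
			found = 1;
			break;
		}
	}
	fclose(f);
	return found ? 0 : -1;
}

static int read_rx_packets(const char *ifname, unsigned long long *pkts)
{
	char line[512];
	FILE *f = fopen("/proc/net/dev", "r");
	int found = 0;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		char *p = strchr(line, ':');
		unsigned long long rx_bytes;

		if (!p)
			continue;
		*p = '\0';
		if (strcmp(line + strspn(line, " "), ifname) == 0) {
			/* after the ':' the RX fields are: bytes packets ... */
			sscanf(p + 1, " %llu %llu", &rx_bytes, pkts);
			found = 1;
			break;
		}
	}
	fclose(f);
	return found ? 0 : -1;
}

int main(void)
{
	struct cpu_sample a, b;
	unsigned long long pa, pb;

	if (read_cpu("cpu0", &a) || read_rx_packets("eth0", &pa))
		return 1;
	sleep(5);
	if (read_cpu("cpu0", &b) || read_rx_packets("eth0", &pb))
		return 1;

	printf("cpu0: user+sys %llu, softirq %llu jiffies; eth0: %llu rx packets\n",
	       (b.user + b.system) - (a.user + a.system),
	       b.softirq - a.softirq, pb - pa);
	return 0;
}

A balancer could feed deltas like these into its placement decision instead
of (or in addition to) raw interrupt counts.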

 also the amount of work estimate doesn't need to be accurate to 5
 digits to be honest... just number of packets seems to be a quite
 reasonable approximation already. (if the kernel starts exporting more
 accurate data, irqbalance can easily use it of course)

It is certainly much more promising now than before. Most people will
probably have symmetrical types of apps, so it should work for them.
For someone like myself, I will still not use it because I typically don't
have symmetrical loads.

cheers,
jamal
 



Re: Network card IRQ balancing with Intel 5000 series chipsets

2006-12-27 Thread Arjan van de Ven
On Wed, 2006-12-27 at 09:44 -0500, jamal wrote:
 On Wed, 2006-27-12 at 14:08 +0100, Arjan van de Ven wrote:
 
  sure; however the kernel doesn't provide more accurate information
  currently (and I doubt it could even, it's not so easy to figure out
  which interface triggered the softirq if 2 interfaces share the cpu, and
  then, how much work came from which etc).
  
 
 If you sample CPU use and in between two samples you are able to know
 which nic is tied to which CPU, how many cycles such a cpu consumed in
 user vs kernel, and how many packets were seen on such nic; then you
 should have the info necessary to make a decision, no?

Note that getting softirq time itself isn't a problem; that is actually
available. (It's not very accurate, but that's another kettle of fish
entirely.)

But... no, that isn't better than packet counts.
Cases where it simply breaks:
1) you have more nics than cpus, so you HAVE to have sharing
2) other loads going on than just pure networking (storage, but also
timers and .. and ..)

And neither is even remotely artificial. 

 Yes, I know it is
 a handwave on my part and it is complex but by the same token, I would
 suspect each kind of IO-derived work (which results in interrupts) will
 have more inputs that could help you make a proper decision than a mere
 glance at the interrupt counts. I understand for example the SCSI subsystem
 these days behaves very much like NAPI.

the difference between scsi and networking is that the work scsi does
per sector is orders and orders of magnitude less than what networking
does. SCSI does its work mostly per transfer, not per sector, and if
you're busy you tend to get larger transfers as well (megabytes is not
special). SCSI also doesn't look at the payload at all, unlike
networking (where there are those pesky headers every 1500 bytes or less
that the kernel needs to look at :)


 It is certainly much more promising now than before. Most people will
 probably have symmetrical types of apps, so it should work for them.
 For someone like myself, I will still not use it because I typically don't
 have symmetrical loads.

unless you have more nics than you have cpus, irqbalance will do the
right thing anyway (it'll tend to not share or move networking
interrupts). And once you have more nics than you have cpus, see
above.

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: Network card IRQ balancing with Intel 5000 series chipsets

2006-12-26 Thread jamal

If you compile in PCI-E support you should have more control of the
MSI-X, no? I would tie the MSI to a specific processor statically; my
past experiences with any form of interrupt balancing with network loads
have been horrible.
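
For reference, a minimal sketch of doing that pinning from user space by
writing a CPU bitmask to /proc/irq/<N>/smp_affinity. The IRQ number and
the mask below are placeholders; check /proc/interrupts first for the
NIC's (or MSI vector's) actual IRQ.

/* Pin one IRQ to one CPU; equivalent to: echo 2 > /proc/irq/90/smp_affinity */
#include <stdio.h>

int main(void)
{
	const int irq = 90;        /* placeholder IRQ number for the NIC/MSI */
	const char *mask = "2";    /* hex CPU bitmask: bit 1 set -> CPU1 only */
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return 1;
	}
	fprintf(f, "%s\n", mask);
	fclose(f);
	return 0;
}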

cheers,
jamal

On Mon, 2006-25-12 at 14:54 +0200, Robert Iakobashvili wrote:
 Arjan,
 
 On 12/25/06, Arjan van de Ven [EMAIL PROTECTED] wrote:
  On Mon, 2006-12-25 at 13:26 +0200, Robert Iakobashvili wrote:
  
 
  it can still be done using the TPR (Task Priority Register) of the
  APIC. It's just... not there in Linux (other OSes do use this).
 
 Interesting.
 Have you any specific pointers for doing it (beyond Internet search)?
 Your input would be very much appreciated.
 Thank you.
 
 



Re: Network card IRQ balancing with Intel 5000 series chipsets

2006-12-26 Thread Robert Iakobashvili

On 12/26/06, jamal [EMAIL PROTECTED] wrote:


If you compile in PCI-E support you should have more control of the
MSI-X, no? I would tie the MSI to a specific processor statically; my
past experiences with any form of interrupt balancing with network loads
have been horrible.

cheers,
jamal


Thanks for the direction.

In the meanwhile I have removed all userland processes from CPU0,
which handles the network card interrupts and all packet processing
(kernel-space).

Still, there should be some way of CPU scaling, even for the case of a
single network card.



 
  it can still be done using the TPR (Task Priority Register) of the
  APIC. It's just... not there in Linux (other OSes do use this).

 Have you any specific pointers for doing it (beyond Internet search)?
 Your input would be very much appreciated.




--
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...
Navigare necesse est, vivere non est necesse
...
http://sourceforge.net/projects/curl-loader
A powerful open-source HTTP/S, FTP/S traffic
generating, loading and testing tool.


Re: Network card IRQ balancing with Intel 5000 series chipsets

2006-12-26 Thread Arjan van de Ven
On Tue, 2006-12-26 at 13:44 -0500, jamal wrote:
 If you compile in PCI-E support you should have more control of the
 MSI-X, no? I would tie the MSI to a specific processor statically; my
 past experiences with any form of interrupt balancing with network loads
 have been horrible.


it is; that's why irqbalance tries really hard (with a few very rare
exceptions) to keep networking irqs to the same cpu all the time...

but if your use case is kernel-level packet processing of sub-MTU packets
then I can see why you would at some point run out of cpu
power ... especially on multicore, where you share the cache between cores, you
probably can do a little better for that very specific use case.

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: Network card IRQ balancing with Intel 5000 series chipsets

2006-12-26 Thread jamal
On Tue, 2006-26-12 at 21:51 +0200, Robert Iakobashvili wrote:

BTW, turn PCI-E on in the kernel build and do cat /proc/interrupts to
see what I mean.

 In the meanwhile I have removed all userland processes from CPU0,
 which handles the network card interrupts and all packet processing
 (kernel-space).
 
 Still, there should be some way of CPU scaling, even for the case of a
 single network card.

The best way to achieve such balancing is to have the network card help
and essentially be able to select the CPU to notify, while at the same
time considering:
a) avoiding any packet reordering - which restricts a flow to being
processed by a single CPU, at least within a timeframe
b) being per-CPU-load-aware - which means busying out only CPUs which are
less utilized

Various such schemes have been discussed here, but no vendor is making
such NICs today (search Dave's blog - he did discuss this at one point or
another).
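
A toy model of a) plus b): each flow sticks to one CPU while it is active
(so no reordering), and only a flow that has been idle longer than some
quiescent timeframe may be re-steered to the currently least-loaded CPU.
The table size, idle window, and load metric are assumptions for the
sketch, not the behaviour of any real NIC.

/* Flow-to-CPU steering: sticky while active, load-aware when idle. */
#include <stdint.h>
#include <time.h>

#define NCPUS      4
#define TABLE_SIZE 1024
#define IDLE_SECS  2   /* "timeframe" after which re-steering is safe */

struct flow_slot {
	unsigned int cpu;
	time_t last_seen;
};

static struct flow_slot table[TABLE_SIZE];
static unsigned long cpu_load[NCPUS];  /* crude metric: packets steered lately */

static unsigned int least_loaded_cpu(void)
{
	unsigned int i, best = 0;

	for (i = 1; i < NCPUS; i++)
		if (cpu_load[i] < cpu_load[best])
			best = i;
	return best;
}

/* flow_hash would come from a 4-tuple hash of the packet's headers. */
unsigned int select_cpu(uint32_t flow_hash)
{
	struct flow_slot *s = &table[flow_hash % TABLE_SIZE];
	time_t now = time(NULL);

	if (now - s->last_seen > IDLE_SECS)  /* idle or new flow: safe to move */
		s->cpu = least_loaded_cpu();
	s->last_seen = now;
	cpu_load[s->cpu]++;
	return s->cpu;
}

int main(void)
{
	/* back-to-back packets of the same flow land on the same CPU */
	return select_cpu(0xdeadbeef) == select_cpu(0xdeadbeef) ? 0 : 1;
}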


cheers,
jamal



Re: Network card IRQ balancing with Intel 5000 series chipsets

2006-12-26 Thread jamal
On Tue, 2006-26-12 at 23:06 +0100, Arjan van de Ven wrote:

 it is; that's why irqbalance tries really hard (with a few very rare
 exceptions) to keep networking irqs to the same cpu all the time...
 

The problem with irqbalance, when I last used it, is that it doesn't take
CPU utilization into consideration.
With NAPI, if I have only a few interrupts it likely implies I have a huge
network load (and therefore CPU use), and I would be much happier if
you didn't start moving more interrupt load to that already loaded CPU.
So if you start considering CPU load sampled over a period of time, you
could make some progress. 

 but if your use case is kernel-level packet processing of sub-MTU packets
 then I can see why you would at some point run out of cpu
 power ... 

Of course, otherwise there would not be much value in balancing ..

Note: sub-MTU sized packets are not unusual for firewall/router middle boxen,
and there's plenty of those out there. The same goes these days for VOIP
endpoints (RTP and SIP), which may process such packets in user space (and
handle thousands of such flows).
Additional note: the average packet size on the internet today (and for
many years) is way below your standard ethernet MTU of 1500 bytes.
 
 especially on multicore, where you share the cache between cores, you
 probably can do a little better for that very specific use case.

Indeed - that's why I proposed to tie the IRQs statically. Modern
machines have much larger caches, so a static config is less of a
nuisance.

cheers,
jamal



Re: Network card IRQ balancing with Intel 5000 series chipsets

2006-12-26 Thread Arjan van de Ven
On Tue, 2006-12-26 at 17:46 -0500, jamal wrote:
 On Tue, 2006-26-12 at 23:06 +0100, Arjan van de Ven wrote:
 
  it is; that's why irqbalance tries really hard (with a few very rare
  exceptions) to keep networking irqs to the same cpu all the time...
  
 
 The problem with irqbalance, when I last used it, is that it doesn't take
 CPU utilization into consideration. 

then you used an old ancient version

 With NAPI, if I have only a few interrupts it likely implies I have a huge
 network load (and therefore CPU use), and I would be much happier if
 you didn't start moving more interrupt load to that already loaded CPU

current irqbalance accounts for napi by using the number of packets as
indicator for load, not the number of interrupts. (for network
interrupts obviously)


-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: Network card IRQ balancing with Intel 5000 series chipsets

2006-12-26 Thread jamal
On Wed, 2006-27-12 at 01:28 +0100, Arjan van de Ven wrote:

 current irqbalance accounts for napi by using the number of packets as
 indicator for load, not the number of interrupts. (for network
 interrupts obviously)
 

Sounds a lot more promising.
Although still insufficient in certain cases. All flows are not equal; as an
example, an IPSEC flow with 1000 packets bound to one CPU  will likely
utilize more cycles than 5000 packets that are being plain forwarded on
another CPU.

cheers,
jamal



Re: Network card IRQ balancing with Intel 5000 series chipsets

2006-12-26 Thread Robert Iakobashvili

On 12/27/06, jamal [EMAIL PROTECTED] wrote:

On Wed, 2006-27-12 at 01:28 +0100, Arjan van de Ven wrote:

 current irqbalance accounts for napi by using the number of packets as
 indicator for load, not the number of interrupts. (for network
 interrupts obviously)


Sounds a lot more promising.
Although still insufficient in certain cases. All flows are not equal; as an
example, an IPSEC flow with 1000 packets bound to one CPU  will likely
utilize more cycles than 5000 packets that are being plain forwarded on
another CPU.


I do agree with Jamal, that there is a problem here.

My scenario is treatment of RTP packets in kernel space with a single
network card (both Rx and Tx). The default of the Intel 5000 series
chipset is affinity of each network card to a certain CPU. Currently,
neither with irqbalance nor with kernel irq-balancing (MSI and io-apic
attempted) can I find a way to balance that irq.

Keeping a static CPU affinity for a network card interrupt is a good
design in general.
However, what I have is that CPU0 is idle less than 10% of the time,
whereas the 3 other cores (2 dual-core Intel CPUs) are doing about
nothing.
There is a real problem of CPU scaling with such a design. Some day we
may wish to add a 10Gbps network card and 16 cores/CPUs, but that will
not help us scale.

Probably some cards have separate Rx and Tx interrupts. Still, scaling
is an issue.

I will look into the PCI-E option; thanks, Jamal.


--
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...
Navigare necesse est, vivere non est necesse
...
http://sourceforge.net/projects/curl-loader
A powerful open-source HTTP/S, FTP/S traffic
generating, loading and testing tool.


Re: Network card IRQ balancing with Intel 5000 series chipsets

2006-12-25 Thread Arjan van de Ven
On Sun, 2006-12-24 at 11:34 +0200, Robert Iakobashvili wrote:
 Sorry for repeating, now in text mode.
 
 Is there a way to balance IRQs from a network card among Intel CPU cores
 with Intel 5000 series chipset?
 
 We tried the Broadcom network card (lspci is below) both in MSI and
 io-apic mode, but found that the card interrupt may be moved to
 another logical CPU, but not balanced among CPUs/cores.
 
 Is that a policy of Intel chipset, that linux cannot overwrite? Can it
 be configured
 somewhere and by which tools?

first of all please don't use the in-kernel irqbalancer, use the
userspace one from www.irqbalance.org instead... 

Am I understanding you correctly that you want to spread the load of the
networking IRQ roughly equally over 2 cpus (or cores or ..)?
If so, that is very very suboptimal, especially for networking (since
suddenly a lot of packet processing gets to deal with out of order
receives and cross-cpu reassembly).

As for the chipset capability: the behavior of the chipset you have is
to prefer the first cpu of the programmed affinity mask. There are some
ways to play with that, but doing it at the granularity you seem to want
is both impractical and too expensive anyway.

Greetings,
Arjan van de Ven

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: Network card IRQ balancing with Intel 5000 series chipsets

2006-12-25 Thread Robert Iakobashvili

Hi Arjan,

On 12/25/06, Arjan van de Ven [EMAIL PROTECTED] wrote:

On Sun, 2006-12-24 at 11:34 +0200, Robert Iakobashvili wrote:
 Sorry for repeating, now in text mode.

 Is there a way to balance IRQs from a network card among Intel CPU cores
 with Intel 5000 series chipset?

 We tried the Broadcom network card (lspci is below) both in MSI and
 io-apic mode, but found that the card interrupt may be moved to
 another logical CPU, but not balanced among CPUs/cores.

 Is that a policy of Intel chipset, that linux cannot overwrite? Can it
 be configured
 somewhere and by which tools?

first of all please don't use the in-kernel irqbalancer, use the
userspace one from www.irqbalance.org instead...


Thanks, that was also attempted, but the result is not much different,
because the problem seems to be in the chipset.

The kernel explicitly disables interrupt affinity for such Intel chipsets
in drivers/pci/quirks.c, unless the BIOS enables that feature.
The question is not so much about Linux, but rather about the hardware,
namely Intel 5000 series chipset tuning for networking.



Am I understanding you correctly that you want to spread the load of the
networking IRQ roughly equally over 2 cpus (or cores or ..)?


Yes, 4 cores.


If so, that is very very suboptimal, especially for networking (since
suddenly a lot of packet processing gets to deal with out of order
receives and cross-cpu reassembly).


Agree. Unfortunately, we have a flow of small RTP packets with heavy
processing and both Rx and Tx components on a single network card.
The application is not very sensitive to out-of-order delivery, etc.
Thus, the 3 other cores are actually doing nothing, whereas CPU0
is overloaded, preventing system CPU scaling.



As for the chipset capability; the behavior of the chipset you have is
to prefer the first cpu of the programmed affinity mask. There are some
ways to play with that but doing it on the granularity you seem to want
is both not practical and too expensive anyway


Agree. Particularly, for AMD NUMA I have used CPU affinity of a single
card to a single CPU. Unfortunately, our case now is a single network
card with a huge load of small RTP packets, with both Rx and Tx.

Agreed, providing CPU affinity for a network interrupt is a rather
reasonable default.
However, should a chipset manufacturer take from us the very freedom of
tuning, the freedom of choice?

Referring to the paper below, there should be some option to balance
the IRQ among several CPUs, which I fail to find.
http://download.intel.com/design/chipsets/applnots/31433702.pdf


if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org


Thanks. I will look into this site.


--
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...
Navigare necesse est, vivere non est necesse
...
http://sourceforge.net/projects/curl-loader
A powerful open-source HTTP/S, FTP/S traffic
generating, loading and testing tool.


Re: Network card IRQ balancing with Intel 5000 series chipsets

2006-12-25 Thread Arjan van de Ven
On Mon, 2006-12-25 at 13:26 +0200, Robert Iakobashvili wrote:
 
  Am I understanding you correctly that you want to spread the load of the
  networking IRQ roughly equally over 2 cpus (or cores or ..)?
 
 Yes, 4 cores.
 
  If so, that is very very suboptimal, especially for networking (since
  suddenly a lot of packet processing gets to deal with out of order
  receives and cross-cpu reassembly).
 
 Agree. Unfortunately, we have a flow of small RTP packets with heavy
 processing and both Rx and Tx component on a single network card.
 The application is not too much sensitive to the out of order, etc.
 Thus, there 3 cores are actually doing nothing, whereas the CPU0
 is overloaded, preventing system CPU scaling.

in principle the actual work should still be spread over the cores;
unless you do everything in kernel space that is..

 Agree, that providing CPU affinity for a network interrupt is a rather
 reasonable default.
 However, should a chipset manufacturer take from us the very freedom of
 tuning, freedom of choice?

it can still be done using the TPR (Task Priority Register) of the
APIC. It's just... not there in Linux (other OSes do use this).

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: Network card IRQ balancing with Intel 5000 series chipsets

2006-12-25 Thread Robert Iakobashvili

Arjan,

On 12/25/06, Arjan van de Ven [EMAIL PROTECTED] wrote:

On Mon, 2006-12-25 at 13:26 +0200, Robert Iakobashvili wrote:

  Am I understanding you correctly that you want to spread the load of the
  networking IRQ roughly equally over 2 cpus (or cores or ..)?

 Yes, 4 cores.

  If so, that is very very suboptimal, especially for networking (since
  suddenly a lot of packet processing gets to deal with out of order
  receives and cross-cpu reassembly).

 Agree. Unfortunately, we have a flow of small RTP packets with heavy
 processing and both Rx and Tx component on a single network card.
 The application is not too much sensitive to the out of order, etc.
 Thus, there 3 cores are actually doing nothing, whereas the CPU0
 is overloaded, preventing system CPU scaling.

in principle the actual work should still be spread over the cores;
unless you do everything in kernel space that is..


This is the case. The processing is in kernel.


 Agree, that providing CPU affinity for a network interrupt is a rather
 reasonable default.
 However, should a chipset manufacturer take from us the very freedom of
 tuning, freedom of choice?

it can still be done using the TPR (Task Priority Register) of the
APIC. It's just... not there in Linux (other OSes do use this).


Interesting.
Have you any specific pointers for doing it (beyond Internet search)?
Your input would be very much appreciated.
Thank you.


--
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...
Navigare necesse est, vivere non est necesse
...
http://sourceforge.net/projects/curl-loader
A powerful open-source HTTP/S, FTP/S traffic
generating, loading and testing tool.


Network card IRQ balancing with Intel 5000 series chipsets

2006-12-24 Thread Robert Iakobashvili

Sorry for repeating, now in text mode.

Is there a way to balance IRQs from a network card among Intel CPU cores
with an Intel 5000 series chipset?

We tried a Broadcom network card (lspci output is below) both in MSI and
io-apic mode, but found that the card's interrupt may be moved to
another logical CPU, but not balanced among CPUs/cores.

Is that a policy of the Intel chipset that Linux cannot override? Can it
be configured somewhere, and with which tools?

Any clues and directions would be very much appreciated.


CONFIG_IRQ_BALANCE=y
and with the same (2.6.9, patched) kernel, irq balancing works properly
with older Intel and with AMD HW.

#lspci -v
00:00.0 Host bridge: Intel Corporation 5000P Chipset Memory Controller
Hub (rev 92)
   Subsystem: Intel Corporation: Unknown device 8086
   Flags: bus master, fast devsel, latency 0, IRQ 169
   Capabilities: [50] Power Management version 2
   Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/1 Enable-
   Capabilities: [6c] Express Root Port (Slot-) IRQ 0
   Capabilities: [100] Advanced Error Reporting

00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express
x4 Port 2 (rev 92) (prog-if 00 [Normal decode])
   Flags: bus master, fast devsel, latency 0
   Bus: primary=00, secondary=1a, subordinate=25, sec-latency=0
   Capabilities: [50] Power Management version 2
   Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/1 Enable-
   Capabilities: [6c] Express Root Port (Slot-) IRQ 0
   Capabilities: [100] Advanced Error Reporting

00:03.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express
x4 Port 3 (rev 92) (prog-if 00 [Normal decode])
   Flags: bus master, fast devsel, latency 0
   Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
   I/O behind bridge: 5000-5fff
   Memory behind bridge: c800-c9ff
   Prefetchable memory behind bridge: c7f0-c7f0
   Capabilities: [50] Power Management version 2
   Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/1 Enable-
   Capabilities: [6c] Express Root Port (Slot-) IRQ 0
   Capabilities: [100] Advanced Error Reporting

00:04.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express
x8 Port 4-5 (rev 92) (prog-if 00 [Normal decode])
   Flags: bus master, fast devsel, latency 0
   Bus: primary=00, secondary=10, subordinate=10, sec-latency=0
   I/O behind bridge: 6000-
   Capabilities: [50] Power Management version 2
   Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/1 Enable-
   Capabilities: [6c] Express Root Port (Slot-) IRQ 0
   Capabilities: [100] Advanced Error Reporting

00:05.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express
x4 Port 5 (rev 92) (prog-if 00 [Normal decode])
   Flags: fast devsel
   Bus: primary=00, secondary=45, subordinate=45, sec-latency=0
   Capabilities: [50] Power Management version 2
   Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/1 Enable-
   Capabilities: [6c] Express Root Port (Slot-) IRQ 0
   Capabilities: [100] Advanced Error Reporting

00:06.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express
x8 Port 6-7 (rev 92) (prog-if 00 [Normal decode])
   Flags: bus master, fast devsel, latency 0
   Bus: primary=00, secondary=07, subordinate=07, sec-latency=0
   Capabilities: [50] Power Management version 2
   Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/1 Enable-
   Capabilities: [6c] Express Root Port (Slot-) IRQ 0
   Capabilities: [100] Advanced Error Reporting

00:07.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express
x4 Port 7 (rev 92) (prog-if 00 [Normal decode])
   Flags: fast devsel
   Bus: primary=00, secondary=44, subordinate=44, sec-latency=0
   Capabilities: [50] Power Management version 2
   Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/1 Enable-
   Capabilities: [6c] Express Root Port (Slot-) IRQ 0
   Capabilities: [100] Advanced Error Reporting

00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA
Engine (rev 92)