date:20070808

Re: [PATCH 0/1] lro: Generic Large Receive Offload for TCP traffic

2007-08-08 Thread Jeff Garzik

David Miller wrote:

From: Jan-Bernd Themann <[EMAIL PROTECTED]>
Date: Fri, 3 Aug 2007 14:41:14 +0200

I think this patch could be the final version for now. It has been tested
on two platforms (power and x86_64) and works very well.

I checked in the LRO patch and the two sample driver ports
to net-2.6.24, thanks!

Good to hear, thanks all.

I'll ponder whether I want to rebase netdev-2.6.git#upstream (2.6.24 
queue) on top of net-2.6.24.  I might put it off until the pain 
threshold rises, or I might go ahead and do it.  Undecided.

Either way, I'll want you to push to Linus before I do, when the next 
merge window opens.

Jeff

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-08 Thread Krishna Kumar2

David Miller <[EMAIL PROTECTED]> wrote on 08/09/2007 09:57:27 AM:
>
> > Patrick had suggested calling dev_hard_start_xmit() instead of
> > conditionally calling the new API and to remove the new API
> > entirely. The driver determines whether batching is required or
> > not depending on (skb==NULL) or not. Would that approach be fine
> > with this "single interface" goal ?
>
> It is a valid posibility.
>
> Note that this is similar to how we handle TSO, the driver
> sets the feature bit and in its ->hard_start_xmit() it checks
> the SKB for the given offload property.

Great, I will try to get rid of two paths entirely, and see how to
re-arrange the code cleanly.

thanks,

- KK

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [NEIGH]: Combine neighbour cleanup and release

2007-08-08 Thread David Miller

From: Thomas Graf <[EMAIL PROTECTED]>
Date: Tue, 31 Jul 2007 23:12:06 +0200

> Introduces neigh_cleanup_and_release() to be used after a
> neighbour has been removed from its neighbour table. Serves
> as preparation to add event notifications.
> 
> Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Applied to net-2.6.24, thanks Thomas.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [NEIGH]: Netlink notifications

2007-08-08 Thread David Miller

From: Thomas Graf <[EMAIL PROTECTED]>
Date: Tue, 31 Jul 2007 23:12:35 +0200

> Currently neighbour event notifications are limited to update
> notifications and only sent if the ARP daemon is enabled. This
> patch extends the existing notification code by also reporting
> neighbours being removed due to gc or administratively and
> removes the dependency on the ARP daemon. This allows to keep
> track of neighbour states without periodically fetching the
> complete neighbour table.
> 
> Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Also applied to net-2.6.24, thanks!
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/1] lro: Generic Large Receive Offload for TCP traffic

2007-08-08 Thread David Miller

From: Jan-Bernd Themann <[EMAIL PROTECTED]>
Date: Fri, 3 Aug 2007 14:41:14 +0200

> I think this patch could be the final version for now. It has been tested
> on two platforms (power and x86_64) and works very well.

I checked in the LRO patch and the two sample driver ports
to net-2.6.24, thanks!
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC]: napi_struct V5

2007-08-08 Thread Jeff Garzik

David Miller wrote:

From: jamal <[EMAIL PROTECTED]>
Date: Wed, 08 Aug 2007 11:32:34 -0400

Think of a box where you have other network interfaces, the way you
are implementing currently implies you are going to be very unfair to 
the other interfaces on the box. 

This was the point I was trying to make the other day.

Agreed.  That's one of the big selling points of NAPI when I talk to 
people -- the entire system works towards a single equilibrium, when 
multiple interfaces are under load.

And conversely, without NAPI, the driver has no knowledge of conditions 
outside its limited view, and resource imbalances inevitably ensue. 
Particularly so for infiniband, 10gb, etc. where the natural motivation 
of the driver writer appears to trend towards "performance even at the 
expense of other system entities."

Jeff

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] make atomic_t volatile on all architectures

2007-08-08 Thread Jerry Jiang

On Wed, 8 Aug 2007 21:18:25 -0700 (PDT)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Wed, 8 Aug 2007, Chris Snook wrote:
> > 
> > Some architectures currently do not declare the contents of an atomic_t to 
> > be
> > volatile.  This causes confusion since atomic_read() might not actually read
> > anything if an optimizing compiler re-uses a value stored in a register, 
> > which
> > can break code that loops until something external changes the value of an
> > atomic_t.
> 
> I'd be *much* happier with "atomic_read()" doing the "volatile" instead.
> 
> The fact is, volatile on data structures is a bug. It's a wart in the C 
> language. It shouldn't be used. 

Why? It's a wart! Is it due to unclear C standard on volatile related point?

Why the *volatile-accesses-in-code* is acceptable, does C standard make it 
clear?

-- Jerry

> 
> Volatile accesses in *code* can be ok, and if we have "atomic_read()" 
> expand to a "*(volatile int *)&(x)->value", then I'd be ok with that.
> 
> But marking data structures volatile just makes the compiler screw up 
> totally, and makes code for initialization sequences etc much worse.
> 
>   Linus
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Virtual ethernet device (v.4)

2007-08-08 Thread David Miller

From: Pavel Emelyanov <[EMAIL PROTECTED]>
Date: Wed, 08 Aug 2007 14:42:50 +0400

> What are your plans about this driver?

I've added it to my net-2.6.24 which I'm building to night.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-08 Thread david


On Sat, 4 Aug 2007, Ray Lee wrote:


On 8/4/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

On Sat, 4 Aug 2007, Ingo Molnar wrote:


At least on a surface level, your report has some similarities to
http://lkml.org/lkml/2007/5/21/84 . In that message, John Miller
mentions several things he tried without effect:

< - I increased the max allowed receive buffer through
< proc/sys/net/core/rmem_max and the application calls the right
< syscall. "netstat -su" does not show any "packet receive errors".


mercury1:/proc/sys/net/core# cat rmem_*
124928
131071
mercury1:/proc/sys/net/core# netstat -su
Udp:
697853177 packets received
10025642 packets to unknown port received.
191726680 packet receive errors
63194 packets sent
RcvbufErrors: 191726680
UdpLite:
mercury1:/proc/sys/net/core# echo "512000" >rmem_max


< - After getting "kernel: swapper: page allocation failure.
< order:0, mode:0x20", I increased /proc/sys/vm/min_free_kbytes


I have not seen any similar errors


< - ixgb.txt in kernel network documentation suggests to increase
< net.core.netdev_max_backlog to 30. This did not help.


mercury1:/proc/sys/net/core# cat netdev_*
300
1000
mercury1:/proc/sys/net/core# echo "30" >netdev_max_backlog


< - I also had to increase net.core.optmem_max, because the default
< value was too small for 700 multicast groups.


I'm not running multicast.


As they're all pretty simple to test, it may be worthwhile to give
them a shot just to rule things out.


unfortunantly the load is not high enough right now to see a real 
difference (it's only doing ~1400 logs/sec) I'll catch it at a higher load 
point to see if these make any difference.


David Lang


Ray


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-08 Thread Krishna Kumar2

Hi Dave,

David Miller <[EMAIL PROTECTED]> wrote on 08/09/2007 03:31:37 AM:

> > What do you generally think of the patch/implementation ? :)
>
> We have two driver implementation paths on recieve and now
> we'll have two on send, and that's not a good trend.

Correct.

> In an ideal world all the drivers would be NAPI and netif_rx()
> would only be used by tunneling drivers and similar in the
> protocol layers.  And likewise all sends would go through
> ->hard_start_xmit().
>
> If you can come up with a long term strategy that gets rid of
> the special transmit method, that'd be great.
>
> We should make Linux network drivers easy to write, not more difficult
> by constantly adding most interfaces than we consolidate.

I think that is a good top level view, and I agree with that.

Patrick had suggested calling dev_hard_start_xmit() instead of
conditionally calling the new API and to remove the new API
entirely. The driver determines whether batching is required or
not depending on (skb==NULL) or not. Would that approach be fine
with this "single interface" goal ?

Thanks,

- KK

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: TCP's initial cwnd setting correct?...

2007-08-08 Thread David Miller

From: John Heffner <[EMAIL PROTECTED]>
Date: Wed, 08 Aug 2007 11:20:05 -0400

> I believe the current calculation is correct.  The RFC specifies a 
> window of no more than 4380 bytes unless 2*MSS > 4380.  If you change 
> the code in this way, then MSS=1461 will give you an initial window of 
> 3*MSS == 4383, violating the spec.  Reading the pseudocode in the RFC 
> 3390 is a bit misleading because they use a clamp at 4380 bytes rather 
> than use a multiplier in the relevant range.

Thanks for this excellent clarification John.

Because this has tripped up enough people, not once but on
multiple occaisions, I'm going to add some comments to
tcp_init_cwnd() to save the next person who is confused
by what seems to be an incorrect implementation of the RFC :-)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC]: napi_struct V6

2007-08-08 Thread David Miller

From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 8 Aug 2007 07:06:28 -0400

> Documentation of NAPI still needs more work. I'll take a start at getting
> net_device docbook format cleaned up, then start on a redo of the API
> documentation.  

Thanks for stepping up to do this Stephen.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC]: napi_struct V5

2007-08-08 Thread David Miller

From: jamal <[EMAIL PROTECTED]>
Date: Wed, 08 Aug 2007 08:10:59 -0400

> Its overdue - just hasnt been anybody who has raised their hands
> to do it. Stephen did at one point. 
> Theres a lot of nice insights in that doc that will be nice to inherit.
> Theres also a nice state machine diagram in 
> http://www.kernel.org/pub/linux/kernel/people/hadi/docs/UKUUG2005.pdf
> that will be very useful to capture in whatever new doc.

Stephen said he would find time to work on this, so what I'll
do is yank the documentation file in my napi_struct changeset
and he can crib content from the old file and your UKUUG2005
slides.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-08 Thread David Miller

From: Krishna Kumar2 <[EMAIL PROTECTED]>
Date: Thu, 9 Aug 2007 09:49:57 +0530

> Patrick had suggested calling dev_hard_start_xmit() instead of
> conditionally calling the new API and to remove the new API
> entirely. The driver determines whether batching is required or
> not depending on (skb==NULL) or not. Would that approach be fine
> with this "single interface" goal ?

It is a valid posibility.

Note that this is similar to how we handle TSO, the driver
sets the feature bit and in its ->hard_start_xmit() it checks
the SKB for the given offload property.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC]: napi_struct V5

2007-08-08 Thread David Miller

From: jamal <[EMAIL PROTECTED]>
Date: Wed, 08 Aug 2007 11:32:34 -0400

> Think of a box where you have other network interfaces, the way you
> are implementing currently implies you are going to be very unfair to 
> the other interfaces on the box. 

This was the point I was trying to make the other day.

What's good for the goose (ipoib) is not necessarily good for the
gander and NAPI exists for the gander as much as for the goose.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] make atomic_t volatile on all architectures

2007-08-08 Thread Linus Torvalds

On Wed, 8 Aug 2007, Chris Snook wrote:
> 
> Some architectures currently do not declare the contents of an atomic_t to be
> volatile.  This causes confusion since atomic_read() might not actually read
> anything if an optimizing compiler re-uses a value stored in a register, which
> can break code that loops until something external changes the value of an
> atomic_t.

I'd be *much* happier with "atomic_read()" doing the "volatile" instead.

The fact is, volatile on data structures is a bug. It's a wart in the C 
language. It shouldn't be used. 

Volatile accesses in *code* can be ok, and if we have "atomic_read()" 
expand to a "*(volatile int *)&(x)->value", then I'd be ok with that.

But marking data structures volatile just makes the compiler screw up 
totally, and makes code for initialization sequences etc much worse.

Linus
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC NET_SCHED 00/02]: Flexible SFQ flow classification

2007-08-08 Thread Paul E. McKenney

On Wed, May 30, 2007 at 10:56:27AM -0400, jamal wrote:
> 
> On Wed, 2007-30-05 at 11:40 +0200, Patrick McHardy wrote:
> > One good thing about ESFQ is the more flexible flow classification, but
> > I don't like the concept of having a set of selectable hash functions
> > very much.
> > 
> 
> In the spirit of SFQ it is probably ok to do that; 
> i.e iirc,  the idea for justification behind sfq was to provide a
> "computationaly feasible/reasonable" way to provide fair queueing with a
> gazillion queues. 
> Classification was considered damn expensive in those days (when MC68K
> was sort of state of the art); it still is but maybe a much much lesser
> issue now for what "gazillion" was in those days. So a simple hash on a
> few fields with pertubation (the part that makes allows the "stochastic"
> part in the name) was deemed the way to go and in order to provide
> fairness (while avoiding packet reordering).
> So if you want to keep that spirit it is ok to do what ESFQ does;
> I think the assumptions will still be valid if you have a gazillion
> queues in todays terms. A number like say 128K may make sense.
> (Hey Paul M is still around, just too focused on the wrong things like
> RCU;-> so we can ask his opinion)

Not only that, he is -really- slow about getting to some of his email
sometimes.  :-/  In my defense, if you CC me, I will spot it more quickly.

Yeah, the 1980s were indeed over a -long- time ago, and architectures
certainly have changed.  I wrote my first parallel kernel code about
a year after the SFQ paper was published, and not too many years after
that began my longstanding RCU habit.  And not only did I inhale at
the time, I am still inhaling deeply.  ;-)

In any case, if you can feasibly do an exact classification, then doing so
would most likely be better than using SFQ.  And with today's hardware,
exact classifications are much reasonable than they were back on my old
handful-of-MHz 68K-based Sun workstation.  If an exact classification is
unworkable, but you absolutely have to maintain perfect ordering, then
one approach is to let the SFQ drain before perturbing the hash function.

One way to do this is to have a pair of hash tables, and switch back and
forth between them on each perturbation.  Perturbation then goes as follows:

1.  Select a new hash perturbation value, and associate it with the
currently-unused hash table (call it B).

2.  Swing pointers so that subsequent incoming packets go to B.

3.  Continue outputting from A until it is drained.  If the
perturbation is lockless, you might need to ensure that a
grace period passes during this drain operation.  (Yep, still
inhaling!!!)

4.  Start outputting packets from B.

But there might be better approaches.

> > These patches change SFQ to allow attaching external classifiers and add
> > a new "flow" classifier that allows to classify flows based on an arbitary
> > combination of pre-defined keys. Its probably not the fastest classifier
> > when used with multiple keys, but frankly, I don't think speed is very
> > important in most situations where the current SFQ implementation is used.
> 
> The only one thing i noticed that changes the behavior is the use of
> skb->prio as a selector. I think if you removed that it should be fine.
> Another alternative is to create a brand new FQ qdisc and leave the
> classification to the classifiers.
> 
> > It currently does not support perturbation, I didn't want to move this into
> > the classifier, so I need to think about a way to handle it within SFQ.
> 
> It is kind of hard to put it back into the current approach because the
> basic assumptions of ensuring no re-ordering and a "fast" classifier are
> gone.
> 
> I am almost tempted to say go back and write a qdisc called FQ.

And this might well be a better approach.  ;-)

Thanx, Paul

> cheers,
> jamal
> 
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/9 Rev3] [sched] Modify qdisc_run to support batching

2007-08-08 Thread Krishna Kumar2

Hi Patrick,

Patrick McHardy <[EMAIL PROTECTED]> wrote on 08/08/2007 07:35:17 PM:

> Krishna Kumar wrote:

> > +static inline int get_skb(struct net_device *dev, struct Qdisc *q,
> > +   struct sk_buff_head *blist, struct sk_buff **skbp)
> > +{
> > +   if (likely(!blist || (!skb_queue_len(blist) && qdisc_qlen(q) <=
1))) {
> > +  return likely((*skbp = dev_dequeue_skb(dev, q)) != NULL);
> > +   } else {
> > +  int max = dev->tx_queue_len - skb_queue_len(blist);
> > +  struct sk_buff *skb;
> > +
> > +  while (max > 0 && (skb = dev_dequeue_skb(dev, q)) != NULL)
> > + max -= dev_add_skb_to_blist(skb, dev);
> > +
> > +  *skbp = NULL;
> > +  return 1;   /* we have atleast one skb in blist */
> > +   }
> > +}
>
> I think you missed my previous reply to this in the flood of
> responses (or I missed your reply), so I'm copying it below:

Sorry, but I didn't get your post on this point earlier (thanks for
posting again).

> The entire idea of a single queue after qdiscs that is refilled
> independantly of transmissions times etc. make be worry a bit.
> By changing the timing you're effectively changing the qdiscs
> behaviour, at least in some cases. SFQ is a good example, but I
> believe it affects most work-conserving qdiscs. Think of this
> situation:
>
> 100 packets of flow 1 arrive
> 50 packets of flow 1 are sent
> 100 packets for flow 2 arrive
> remaining packets are sent
>
> On the wire you'll first see 50 packets of flow 1, than 100 packets
> alternate of flow 1 and 2, then 50 packets flow 2.
>
> With your additional queue all packets of flow 1 are pulled out of
> the qdisc immediately and put in the fifo. When the 100 packets of
> the second flow arrive they will also get pulled out immediately
> and are put in the fifo behind the remaining 50 packets of flow 1.
> So what you get on the wire is:
>
> 100 packets of flow 1
> 100 packets of flow 1

In normal case (qdisc run from xmit and not from tx_action), the code
executing is the same as regular code without any difference in wire
behavior due to the check:
  if (likely(!blist || (!skb_queue_len(blist) && qdisc_qlen(q) <= 1)))
{
return likely((*skbp = dev_dequeue_skb(dev, q)) != NULL);
(I always pass blist as NULL).

With SFQ for the above scenario, my code will first send out 50 F1 skbs
iteratively (blist==NULL), get blocked so that 50 skbs accumulate on F1
and 100 on F2, then when it is woken up, it will batch but it will pick
up 50 F1 and 50 F2 skbs in alternate order and put in the queue, and
finally pick up the remaining 50 F2 skbs and put those too in the queue.
Since I am calling dev_dequeue_skb, I am assured of RR when using SFQ.
The only time I would have 100 skbs from F1 in my queue (and hence sent
out first) is if there are NO F2 skbs.

> So SFQ is without any effect. This is not completely avoidable of
> course, but you can and should limit the damage by only pulling
> out as much packets as the driver can take and have the driver
> stop the queue afterwards.

I feel this cannot happen. Please correct me if I am wrong.

Thanks,

- KK

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] make atomic_t volatile on all architectures

2007-08-08 Thread Paul E. McKenney

On Wed, Aug 08, 2007 at 06:48:24PM -0700, David Miller wrote:
> From: Herbert Xu <[EMAIL PROTECTED]>
> Date: Thu, 09 Aug 2007 09:03:27 +0800
> 
> > Such loops should always use something like cpu_relax() which comes
> > with a barrier.
> 
> This is an excellent point.
> 
> And it needs to be weighed with the error prone'ness Andrew mentioned.
> There probably is a middle ground somewhere.

OK...  I'll bite.  ACCESS_ONCE(), see http://lkml.org/lkml/2007/7/11/664.

This would allow ACCESS_ONCE(atomic_read(&x)) to be used where refetching
would be problematic, but allow the compiler free rein in cases where
refetching is OK.

The ACCESS_ONCE() primitive of course has its limitations as well, but
you did ask for a middle ground.  ;-)

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[NET]: Share correct feature code between bridging and bonding

2007-08-08 Thread Herbert Xu

Hi Dave:

[NET]: Share correct feature code between bridging and bonding

http://bugzilla.kernel.org/show_bug.cgi?id=8797 shows that the
bonding driver may produce bogus combinations of the checksum
flags and SG/TSO.

For example, if you bond devices with NETIF_F_HW_CSUM and
NETIF_F_IP_CSUM you'll end up with a bonding device that
has neither flag set.  If both have TSO then this produces
an illegal combination.

The bridge device on the other hand has the correct code to
deal with this.

In fact, the same code can be used for both.  So this patch
moves that logic into net/core/dev.c and uses it for both
bonding and bridging.

In the process I've made small adjustments such as only
setting GSO_ROBUST if at least one constituent device
supports it.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 070b78d..1afda32 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1202,43 +1202,35 @@ static int bond_sethwaddr(struct net_device *bond_dev,
return 0;
 }
 
-#define BOND_INTERSECT_FEATURES \
-   (NETIF_F_SG | NETIF_F_ALL_CSUM | NETIF_F_TSO | NETIF_F_UFO)
+#define BOND_VLAN_FEATURES \
+   (NETIF_F_VLAN_CHALLENGED | NETIF_F_HW_VLAN_RX | NETIF_F_HW_VLAN_TX | \
+NETIF_F_HW_VLAN_FILTER)
 
 /* 
  * Compute the common dev->feature set available to all slaves.  Some
- * feature bits are managed elsewhere, so preserve feature bits set on
- * master device that are not part of the examined set.
+ * feature bits are managed elsewhere, so preserve those feature bits
+ * on the master device.
  */
 static int bond_compute_features(struct bonding *bond)
 {
-   unsigned long features = BOND_INTERSECT_FEATURES;
struct slave *slave;
struct net_device *bond_dev = bond->dev;
+   unsigned long features = bond_dev->features;
unsigned short max_hard_header_len = ETH_HLEN;
int i;
 
+   features &= ~(NETIF_F_ALL_CSUM | BOND_VLAN_FEATURES);
+   features |= NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HIGHDMA |
+   NETIF_F_GSO_MASK | NETIF_F_NO_CSUM;
+
bond_for_each_slave(bond, slave, i) {
-   features &= (slave->dev->features & BOND_INTERSECT_FEATURES);
+   features = netdev_compute_features(features,
+  slave->dev->features);
if (slave->dev->hard_header_len > max_hard_header_len)
max_hard_header_len = slave->dev->hard_header_len;
}
 
-   if ((features & NETIF_F_SG) && 
-   !(features & NETIF_F_ALL_CSUM))
-   features &= ~NETIF_F_SG;
-
-   /* 
-* features will include NETIF_F_TSO (NETIF_F_UFO) iff all 
-* slave devices support NETIF_F_TSO (NETIF_F_UFO), which 
-* implies that all slaves also support scatter-gather 
-* (NETIF_F_SG), which implies that features also includes 
-* NETIF_F_SG. So no need to check whether we have an  
-* illegal combination of NETIF_F_{TSO,UFO} and 
-* !NETIF_F_SG 
-*/
-
-   features |= (bond_dev->features & ~BOND_INTERSECT_FEATURES);
+   features |= (bond_dev->features & BOND_VLAN_FEATURES);
bond_dev->features = features;
bond_dev->hard_header_len = max_hard_header_len;
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4a616d7..e679b27 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1131,6 +1131,8 @@ extern void dev_seq_stop(struct seq_file *seq, void *v);
 
 extern void linkwatch_run_queue(void);
 
+extern int netdev_compute_features(unsigned long all, unsigned long one);
+
 static inline int net_gso_ok(int features, int gso_type)
 {
int feature = gso_type << NETIF_F_GSO_SHIFT;
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 5e1892d..0eded17 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -179,5 +179,5 @@ void br_dev_setup(struct net_device *dev)
dev->priv_flags = IFF_EBRIDGE;
 
dev->features = NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HIGHDMA |
-   NETIF_F_TSO | NETIF_F_NO_CSUM | NETIF_F_GSO_ROBUST;
+   NETIF_F_GSO_MASK | NETIF_F_NO_CSUM | NETIF_F_LLTX;
 }
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index b40dada..749f0e8 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -349,43 +349,15 @@ int br_min_mtu(const struct net_bridge *br)
 void br_features_recompute(struct net_bridge *br)
 {
struct net_bridge_port *p;
-   unsigned long features, checksum;
+   unsigned long features;
 
-   checksum = br->feature_mask & NETIF_F_ALL_CSUM ? NETIF_F_NO_CSUM : 0;
-   features = br-

Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-08 Thread Krishna Kumar2

Herbert Xu <[EMAIL PROTECTED]> wrote on 08/08/2007 07:12:47 PM:

> On Wed, Aug 08, 2007 at 03:49:00AM -0700, David Miller wrote:
> >
> > Not because I think it obviates your work, but rather because I'm
> > curious, could you test a TSO-in-hardware driver converted to
> > batching and see how TSO alone compares to batching for a pure
> > TCP workload?
>
> You could even lower the bar by disabling TSO and enabling
> software GSO.

I will try with E1000 (though I didn't see improvement when I tested a long
time back). The difference I expect is that TSO would help with large
packets and not necessarily small/medium packets and not definitely in
the case of multiple different skbs (as opposed to single large skb)
getting
queue'd. I think these are two different workloads.

> > I personally don't think it will help for that case at all as
> > TSO likely does better job of coalescing the work _and_ reducing
> > bus traffic as well as work in the TCP stack.
>
> I agree.  I suspect the bulk of the effort is in getting
> these skb's created and processed by the stack so that by
> the time that they're exiting the qdisc there's not much
> to be saved anymore.

However, I am getting a large improvement for IPoIB specifically for this
same case. The reason - batching will help only when queue gets full and
stopped (and to a lesser extent if tx lock was not got, which results
in fewer amount of batching that can be done).

thanks,

- KK

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/9 Rev3] [sched] Modify qdisc_run to support batching

2007-08-08 Thread Krishna Kumar2

Evgeniy Polyakov <[EMAIL PROTECTED]> wrote on 08/08/2007 05:44:02 PM:

> On Wed, Aug 08, 2007 at 03:01:45PM +0530, Krishna Kumar
([EMAIL PROTECTED]) wrote:
> > +static inline int get_skb(struct net_device *dev, struct Qdisc *q,
> > +   struct sk_buff_head *blist, struct sk_buff **skbp)
> > +{
> > +   if (likely(!blist || (!skb_queue_len(blist) && qdisc_qlen(q) <=
1))) {
> > +  return likely((*skbp = dev_dequeue_skb(dev, q)) != NULL);
> > +   } else {
> > +  int max = dev->tx_queue_len - skb_queue_len(blist);
> > +  struct sk_buff *skb;
> > +
> > +  while (max > 0 && (skb = dev_dequeue_skb(dev, q)) != NULL)
> > + max -= dev_add_skb_to_blist(skb, dev);
> > +
> > +  *skbp = NULL;
> > +  return 1;   /* we have atleast one skb in blist */
> > +   }
> > +}
>
> Same here - is it possible to get a list in one go instead of pulling
> one-by-one, since it forces quite a few additional unneded lock
> get/releases. What about dev_dequeue_number_skb(dev, q, num), which will
> grab the lock and move a list of skbs from one queue to provided head.

OK, I will try this out.

> > @@ -158,7 +198,10 @@ static inline int qdisc_restart(struct n
> > /* And release queue */
> > spin_unlock(&dev->queue_lock);
> >
> > -   ret = dev_hard_start_xmit(skb, dev);
> > +   if (likely(skb))
> > +  ret = dev_hard_start_xmit(skb, dev);
> > +   else
> > +  ret = dev->hard_start_xmit_batch(dev);
>
> Perfectionism says that having array of two functions and calling one of
> them via array_func_pointer[!!skb] will be much faster. Just a though.
> It is actually much faster than if/else on x86 at least.

Thinking about this - I will have to store the 2 pointer array in dev
itself wasting some space, and also fn pointer will have wrong signature
as one takes an extra argument. Will ponder some more :)

thanks,

- KK

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/9 Rev3] [core] Add skb_blist & hard_start_xmit_batch

2007-08-08 Thread Krishna Kumar2

Hi Evgeniy,

Evgeniy Polyakov <[EMAIL PROTECTED]> wrote on 08/08/2007 05:31:43 PM:

> > +int dev_change_tx_batch_skb(struct net_device *dev, unsigned long
new_batch_skb)
> > +{
> > +   int ret = 0;
> > +   struct sk_buff_head *blist;
> > +
> > +   if (!dev->hard_start_xmit_batch) {
> > +  /* Driver doesn't support batching skb API */
> > +  ret = -ENOTSUPP;
> > +  goto out;
> > +   }
> > +
> > +   /* Handle invalid argument */
> > +   if (new_batch_skb < 0) {
> > +  ret = -EINVAL;
> > +  goto out;
> > +   }
> It is unsigned, how can it be less than zero?

Yuck, originally I had it as int and changed to ulong and forgot to
remove this check.

> And actually you use it just like a binary flag (casted to/from u32 in
> the code, btw), so why not using ethtool_value directly here?

I still need to check if the value is changing, so the one check is needed.
Later I am using it as a value directly.

> > +   /* Check if new value is same as the current */
> > +   if (!!dev->skb_blist == !!new_batch_skb)
> > +  goto out;
> > +
> > +   if (new_batch_skb &&
> > +   (blist = kmalloc(sizeof *blist, GFP_KERNEL)) == NULL) {
> > +  ret = -ENOMEM;
> > +  goto out;
> > +   }
> > +
> > +   spin_lock(&dev->queue_lock);
> > +   if (new_batch_skb) {
> > +  skb_queue_head_init(blist);
> > +  dev->skb_blist = blist;
> > +   } else
> > +  free_batching(dev);
> > +   spin_unlock(&dev->queue_lock);
>
> This needs bh lock too, since blist is accessed by qdisc_restart.

Yes, had it in the code, put it in the list of changes, but missed it for
some reason :(

> > +int dev_add_skb_to_blist(struct sk_buff *skb, struct net_device *dev)
> > +{
> > +   if (!list_empty(&ptype_all))
> > +  dev_queue_xmit_nit(skb, dev);
> > +
> > +   if (netif_needs_gso(dev, skb)) {
> > +  if (unlikely(dev_gso_segment(skb))) {
> > + kfree(skb);
> > + return 0;
> > +  }
> > +
> > +  if (skb->next) {
> > + int count = 0;
> > +
> > + do {
> > +struct sk_buff *nskb = skb->next;
> > +
> > +skb->next = nskb->next;
> > +__skb_queue_tail(dev->skb_blist, nskb);
> > +count++;
> > + } while (skb->next);
>
> Is it possible to move list without iterating over each entry?

Though I cannot see something obvious to do that, let me see if
something is possible as it will make a good difference.

thanks for your suggestions,

- KK

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] make atomic_t volatile on all architectures

2007-08-08 Thread David Miller

From: Herbert Xu <[EMAIL PROTECTED]>
Date: Thu, 09 Aug 2007 09:03:27 +0800

> Such loops should always use something like cpu_relax() which comes
> with a barrier.

This is an excellent point.

And it needs to be weighed with the error prone'ness Andrew mentioned.
There probably is a middle ground somewhere.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] SSB PCI core driver fixes

2007-08-08 Thread Aurelien Jarno

Michael Buesch a écrit :
> On Thursday 09 August 2007, Aurelien Jarno wrote:
>>  - Add some delay between the configuration of the PCI controller 
>>and its registration.
> 
> Why? It is _huge_ and people won't like it ;)
> At least add a comment why this is needed.

It is need, otherwise the PCI controller gets confused, which causes a
reset of the machine. Then CFE goes into a loop booting again and again
without being able to get up to the prompt.

I agree this is a huge value, so I will try to find the minimum value
and then add some margin.

I will send an updated patch.

>> @@ -340,6 +345,7 @@
>>   * The following needs change, if we want to port hostmode
>>   * to non-MIPS platform. */
>>  set_io_port_base((unsigned long)ioremap_nocache(SSB_PCI_MEM, 
>> 0x0400));
>> +mdelay(300);
>>  register_pci_controller(&ssb_pcicore_controller);
>>  }
>>
> 
> 
> 


-- 
  .''`.  Aurelien Jarno | GPG: 1024D/F1BCDB73
 : :' :  Debian developer   | Electrical Engineer
 `. `'   [EMAIL PROTECTED] | [EMAIL PROTECTED]
   `-people.debian.org/~aurel32 | www.aurel32.net
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] SSB PCI core driver fixes

2007-08-08 Thread Aurelien Jarno

On Thu, Aug 09, 2007 at 02:38:24AM +0200, Aurelien Jarno wrote:
> Michael Buesch a écrit :
> > On Thursday 09 August 2007, Aurelien Jarno wrote:
> >>  - Add some delay between the configuration of the PCI controller 
> >>and its registration.
> > 
> > Why? It is _huge_ and people won't like it ;)
> > At least add a comment why this is needed.
> 
> It is need, otherwise the PCI controller gets confused, which causes a
> reset of the machine. Then CFE goes into a loop booting again and again
> without being able to get up to the prompt.
> 
> I agree this is a huge value, so I will try to find the minimum value
> and then add some margin.
> 
> I will send an updated patch.
> 

A few experiments have shown that the minimum value is 3ms on my WGT634U
machine. I haved changed the value to 10ms in the new patch below, which
gives some margin, and added a comment.

Aurelien



The patch below against 2.6.23-rc1-mm2 fixes various things on the SSB
PCI core driver:
 - Correctly write the configuration register value in 
   ssb_extpci_write_config() for len = 1 or len = 2.
 - Set the PCI_LATENCY_TIMER to handle devices on the PCI bus.
 - Set the PCI arbiter control to internal.
 - Add some delay between the configuration of the PCI controller 
   and its registration.

Signed-off-by: Aurelien Jarno <[EMAIL PROTECTED]>

--- a/drivers/ssb/driver_pcicore.c
+++ b/drivers/ssb/driver_pcicore.c
@@ -99,6 +99,9 @@
 
/* Enable PCI bridge BAR1 prefetch and burst */
pci_write_config_dword(dev, SSB_BAR1_CONTROL, 3);
+
+   /* Make sure our latency is high enough to handle the devices behind us 
*/
+   pci_write_config_byte(dev, PCI_LATENCY_TIMER, 0xa8);
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, ssb_fixup_pcibridge);
 
@@ -230,7 +233,7 @@
val = *((const u32 *)buf);
break;
}
-   writel(*((const u32 *)buf), mmio);
+   writel(val, mmio);
 
err = 0;
 unmap:
@@ -311,6 +314,8 @@
udelay(150); /* Assertion time demanded by the PCI standard */
val |= SSB_PCICORE_CTL_RST; /* Deassert RST# */
pcicore_write32(pc, SSB_PCICORE_CTL, val);
+   val = SSB_PCICORE_ARBCTL_INTERN;
+   pcicore_write32(pc, SSB_PCICORE_ARBCTL, val);
udelay(1); /* Assertion time demanded by the PCI standard */
 
/*TODO cardbus mode */
@@ -340,6 +345,9 @@
 * The following needs change, if we want to port hostmode
 * to non-MIPS platform. */
set_io_port_base((unsigned long)ioremap_nocache(SSB_PCI_MEM, 
0x0400));
+   /* Give some time to the PCI controller to configure itself with the new
+* values. Not waiting at this point causes crashes of the machine. */
+   mdelay(10);
register_pci_controller(&ssb_pcicore_controller);
 }
 

-- 
  .''`.  Aurelien Jarno | GPG: 1024D/F1BCDB73
 : :' :  Debian developer   | Electrical Engineer
 `. `'   [EMAIL PROTECTED] | [EMAIL PROTECTED]
   `-people.debian.org/~aurel32 | www.aurel32.net
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] make atomic_t volatile on all architectures

2007-08-08 Thread Herbert Xu

Chris Snook <[EMAIL PROTECTED]> wrote:
> 
> Some architectures currently do not declare the contents of an atomic_t to be
> volatile.  This causes confusion since atomic_read() might not actually read
> anything if an optimizing compiler re-uses a value stored in a register, which
> can break code that loops until something external changes the value of an
> atomic_t.  Avoiding such bugs requires using barrier(), which causes re-loads

Such loops should always use something like cpu_relax() which comes
with a barrier.

> of all registers used in the loop, thus hurting performance instead of helping
> it, particularly on architectures where it's unnecessary.  Since we generally

Do you have an example of such a loop where performance is hurt by this?

The IPVS code that led to this patch was simply broken and has been
fixed to use cpu_relax().

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] SSB PCI core driver fixes

2007-08-08 Thread Michael Buesch

On Thursday 09 August 2007, Aurelien Jarno wrote:
>  - Add some delay between the configuration of the PCI controller 
>and its registration.

Why? It is _huge_ and people won't like it ;)
At least add a comment why this is needed.

> @@ -340,6 +345,7 @@
>* The following needs change, if we want to port hostmode
>* to non-MIPS platform. */
>   set_io_port_base((unsigned long)ioremap_nocache(SSB_PCI_MEM, 
> 0x0400));
> + mdelay(300);
>   register_pci_controller(&ssb_pcicore_controller);
>  }
> 


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] SSB PCI core driver fixes

2007-08-08 Thread Aurelien Jarno

The patch below against 2.6.23-rc1-mm2 fixes various things on the SSB
PCI core driver:
 - Correctly write the configuration register value in 
   ssb_extpci_write_config() for len = 1 or len = 2.
 - Set the PCI latency timer to handle devices on the PCI bus.
 - Set the PCI arbiter control to internal.
 - Add some delay between the configuration of the PCI controller 
   and its registration.

Note: this is the latest SSB patch from my local tree. Next ones are 
only BCM947xx or CFE related.

Signed-off-by: Aurelien Jarno <[EMAIL PROTECTED]>

--- a/drivers/ssb/driver_pcicore.c
+++ b/drivers/ssb/driver_pcicore.c
@@ -99,6 +99,9 @@
 
/* Enable PCI bridge BAR1 prefetch and burst */
pci_write_config_dword(dev, SSB_BAR1_CONTROL, 3);
+
+   /* Make sure our latency is high enough to handle the devices behind us 
*/
+   pci_write_config_byte(dev, PCI_LATENCY_TIMER, 0xa8);
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, ssb_fixup_pcibridge);
 
@@ -230,7 +233,7 @@
val = *((const u32 *)buf);
break;
}
-   writel(*((const u32 *)buf), mmio);
+   writel(val, mmio);
 
err = 0;
 unmap:
@@ -311,6 +314,8 @@
udelay(150); /* Assertion time demanded by the PCI standard */
val |= SSB_PCICORE_CTL_RST; /* Deassert RST# */
pcicore_write32(pc, SSB_PCICORE_CTL, val);
+   val = SSB_PCICORE_ARBCTL_INTERN;
+   pcicore_write32(pc, SSB_PCICORE_ARBCTL, val);
udelay(1); /* Assertion time demanded by the PCI standard */
 
/*TODO cardbus mode */
@@ -340,6 +345,7 @@
 * The following needs change, if we want to port hostmode
 * to non-MIPS platform. */
set_io_port_base((unsigned long)ioremap_nocache(SSB_PCI_MEM, 
0x0400));
+   mdelay(300);
register_pci_controller(&ssb_pcicore_controller);
 }

-- 
  .''`.  Aurelien Jarno | GPG: 1024D/F1BCDB73
 : :' :  Debian developer   | Electrical Engineer
 `. `'   [EMAIL PROTECTED] | [EMAIL PROTECTED]
   `-people.debian.org/~aurel32 | www.aurel32.net
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] ipvs: force read of atomic_t in while loop

2007-08-08 Thread Andi Kleen

On Wed, Aug 08, 2007 at 05:08:44PM -0400, Chris Snook wrote:
> Heiko Carstens wrote:
> >On Wed, Aug 08, 2007 at 03:21:31AM -0700, David Miller wrote:
> >>From: Heiko Carstens <[EMAIL PROTECTED]>
> >>Date: Wed, 8 Aug 2007 11:33:00 +0200
> >>
> >>>Just saw this while grepping for atomic_reads in a while loops.
> >>>Maybe we should re-add the volatile to atomic_t. Not sure.
> >>I think whatever the choice, it should be done consistently
> >>on every architecture.
> >>
> >>It's just asking for trouble if your arch does it differently from
> >>every other.
> >
> >Well..currently it's i386/x86_64 and s390 which have no volatile
> >in atomic_t. And yes, of course I agree it should be consistent
> >across all architectures. But it isn't.
> 
> Based on recent discussion, it's pretty clear that there's a lot of 
> confusion about this.  A lot of people (myself included, until I thought 
> about it long and hard) will reasonably assume that calling 
> atomic_read() will actually read the value from memory.  Leaving out the 
> volatile declaration seems like a pessimization to me.  If you force 
> people to use barrier() everywhere they're working with atomic_t, it 
> will force re-reads of all the non-atomic data in use as well, which 
> will cause more memory fetches of things that generally don't need 
> barrier().  That and it's a bug waiting to happen.
> 
> Andi -- your thoughts on the matter?

I also think readding volatile makes sense. An alternative would be
to stick an rmb() into atomic_read() -- that would also stop speculative reads.
Disadvantage is that it clobbers all memory, not just the specific value.

But you really have to complain to Linus (cc'ed). He came up
with the volatile removale change iirc.

-Andi

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-08 Thread Shirley Ma

Hello Herbert,

> > Not because I think it obviates your work, but rather because I'm
> > curious, could you test a TSO-in-hardware driver converted to
> > batching and see how TSO alone compares to batching for a pure
> > TCP workload?
> 
> You could even lower the bar by disabling TSO and enabling
> software GSO.

We had a discuss before. GSO doesn't benefit device which has no HW 
checksum (like IPoIB) by inducing an extra copy. And GSO benefits one 
stream, batching benefits multiple streams.

Thanks
Shirley
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] make atomic_t volatile on all architectures

2007-08-08 Thread Jesper Juhl

On 09/08/2007, Chris Snook <[EMAIL PROTECTED]> wrote:
> Jesper Juhl wrote:
> > On 09/08/2007, Chris Snook <[EMAIL PROTECTED]> wrote:
> >> From: Chris Snook <[EMAIL PROTECTED]>
> >>
> >> Some architectures currently do not declare the contents of an atomic_t to 
> >> be
> >> volatile.  This causes confusion since atomic_read() might not actually 
> >> read
> >> anything if an optimizing compiler re-uses a value stored in a register, 
> >> which
> >> can break code that loops until something external changes the value of an
> >> atomic_t.  Avoiding such bugs requires using barrier(), which causes 
> >> re-loads
> >> of all registers used in the loop, thus hurting performance instead of 
> >> helping
> >> it, particularly on architectures where it's unnecessary.  Since we 
> >> generally
> >> want to re-read the contents of an atomic variable on every access anyway,
> >> let's standardize the behavior across all architectures and avoid the
> >> performance and correctness problems of requiring the use of barrier() in
> >> loops that expect atomic_t variables to change externally.  This is 
> >> relevant
> >> even on non-smp architectures, since drivers may use atomic operations in
> >> interrupt handlers.
> >>
> >> Signed-off-by: Chris Snook <[EMAIL PROTECTED]>
> >>
> >
> > Hmm, I thought we were trying to move away from volatile since it is
> > very weakly defined and towards explicit barriers and locks...
> > Points to --> Documentation/volatile-considered-harmful.txt
>
> This is a special case.

I had a feeling it might be.

> Usually, the use of volatile is just lazy.  In
> this case, it's probably necessary on at least some architectures, so we
> can't remove it everywhere unless we want to rewrite atomic.h completely
> in inline assembler for several architectures, and painstakingly verify
> all that work.

Sounds quite unpleasant and actively harmful - keeping stuff in plain
readable C makes it easier to handle by most people.

> I would hope it's obvious that having consistent
> behavior on all architectures, or even at all compiler optimization
> levels within an architecture, can be agreed to be good.

Agreed, consistency is good.

>  Additionally,
> atomic_t variables are a rare exception, in that we pretty much always
> want to work with the latest value in RAM on every access.  Making this
> atomic will allow us to remove a bunch of barriers which do nothing but
> slow things down on most architectures.
>
I believe you meant to say "volatile" and not "atomic" above.  But
yes, if volatile is sufficiently defined to ensure it does what we
want in this case and using it means we can actually speed up the
kernel, then it does indeed sound reasonable. Especially since it's
confined to the implementation of atomic_t which most people shouldn't
be looking at anyway (but simply use) and when using an atomic_t it's
pretty reasonable to expect that it doesn't need any additional
locking or barriers since it's supposed to be *atomic*.

> I agree that the use of atomic_t in .c files is generally bad, but in
> certain special cases, hiding it inside defined data types may be worth
> the slight impurity, just as we sometimes tolerate lines longer than 80
> columns when "fixing" it makes things unreadable.
>
Can't argue with that.

It seems you've thought it through.  I just wanted to be sure that
this approach was actually the one that makes the most sense, and
you've convinced me of that :-)

-- 
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] make atomic_t volatile on all architectures

2007-08-08 Thread Chris Snook


Lennert Buytenhek wrote:

On Wed, Aug 08, 2007 at 07:07:33PM -0400, Chris Snook wrote:


From: Chris Snook <[EMAIL PROTECTED]>

Some architectures currently do not declare the contents of an atomic_t to be
volatile.  This causes confusion since atomic_read() might not actually read
anything if an optimizing compiler re-uses a value stored in a register, which
can break code that loops until something external changes the value of an
atomic_t.  Avoiding such bugs requires using barrier(), which causes re-loads
of all registers used in the loop, thus hurting performance instead of helping
it, particularly on architectures where it's unnecessary.  Since we generally
want to re-read the contents of an atomic variable on every access anyway,
let's standardize the behavior across all architectures and avoid the
performance and correctness problems of requiring the use of barrier() in
loops that expect atomic_t variables to change externally.  This is relevant
even on non-smp architectures, since drivers may use atomic operations in
interrupt handlers.

Signed-off-by: Chris Snook <[EMAIL PROTECTED]>


Documentation/atomic_ops.txt would need updating:

[...]

One very important aspect of these two routines is that they DO NOT
require any explicit memory barriers.  They need only perform the
atomic_t counter update in an SMP safe manner.


Thanks, I was looking for that.  I'll re-send shortly with my comment 
moved there.  People are already using atomic_t in a manner that implies 
the use of memory barriers and interrupt-safety, which is what the patch 
enforces.


-- Chris
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] make atomic_t volatile on all architectures

2007-08-08 Thread Chris Snook


Jesper Juhl wrote:

On 09/08/2007, Chris Snook <[EMAIL PROTECTED]> wrote:

From: Chris Snook <[EMAIL PROTECTED]>

Some architectures currently do not declare the contents of an atomic_t to be
volatile.  This causes confusion since atomic_read() might not actually read
anything if an optimizing compiler re-uses a value stored in a register, which
can break code that loops until something external changes the value of an
atomic_t.  Avoiding such bugs requires using barrier(), which causes re-loads
of all registers used in the loop, thus hurting performance instead of helping
it, particularly on architectures where it's unnecessary.  Since we generally
want to re-read the contents of an atomic variable on every access anyway,
let's standardize the behavior across all architectures and avoid the
performance and correctness problems of requiring the use of barrier() in
loops that expect atomic_t variables to change externally.  This is relevant
even on non-smp architectures, since drivers may use atomic operations in
interrupt handlers.

Signed-off-by: Chris Snook <[EMAIL PROTECTED]>



Hmm, I thought we were trying to move away from volatile since it is
very weakly defined and towards explicit barriers and locks...
Points to --> Documentation/volatile-considered-harmful.txt


This is a special case.  Usually, the use of volatile is just lazy.  In 
this case, it's probably necessary on at least some architectures, so we 
can't remove it everywhere unless we want to rewrite atomic.h completely 
in inline assembler for several architectures, and painstakingly verify 
all that work.  I would hope it's obvious that having consistent 
behavior on all architectures, or even at all compiler optimization 
levels within an architecture, can be agreed to be good.  Additionally, 
atomic_t variables are a rare exception, in that we pretty much always 
want to work with the latest value in RAM on every access.  Making this 
atomic will allow us to remove a bunch of barriers which do nothing but 
slow things down on most architectures.


I agree that the use of atomic_t in .c files is generally bad, but in 
certain special cases, hiding it inside defined data types may be worth 
the slight impurity, just as we sometimes tolerate lines longer than 80 
columns when "fixing" it makes things unreadable.


-- Chris
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] make atomic_t volatile on all architectures

2007-08-08 Thread Lennert Buytenhek

On Wed, Aug 08, 2007 at 07:07:33PM -0400, Chris Snook wrote:

> From: Chris Snook <[EMAIL PROTECTED]>
> 
> Some architectures currently do not declare the contents of an atomic_t to be
> volatile.  This causes confusion since atomic_read() might not actually read
> anything if an optimizing compiler re-uses a value stored in a register, which
> can break code that loops until something external changes the value of an
> atomic_t.  Avoiding such bugs requires using barrier(), which causes re-loads
> of all registers used in the loop, thus hurting performance instead of helping
> it, particularly on architectures where it's unnecessary.  Since we generally
> want to re-read the contents of an atomic variable on every access anyway,
> let's standardize the behavior across all architectures and avoid the
> performance and correctness problems of requiring the use of barrier() in
> loops that expect atomic_t variables to change externally.  This is relevant
> even on non-smp architectures, since drivers may use atomic operations in
> interrupt handlers.
> 
> Signed-off-by: Chris Snook <[EMAIL PROTECTED]>

Documentation/atomic_ops.txt would need updating:

[...]

One very important aspect of these two routines is that they DO NOT
require any explicit memory barriers.  They need only perform the
atomic_t counter update in an SMP safe manner.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC]: napi_struct V5

2007-08-08 Thread Shirley Ma

Dave,

> I reverted everything Roland had an issue with, I got tired of arguing
> my position and doing all of the coding too.  He won.
Thanks. I think the reschedule in the IPoIB poll should maintain fairness.

Shirley
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] make atomic_t volatile on all architectures

2007-08-08 Thread Jesper Juhl

On 09/08/2007, Chris Snook <[EMAIL PROTECTED]> wrote:
> From: Chris Snook <[EMAIL PROTECTED]>
>
> Some architectures currently do not declare the contents of an atomic_t to be
> volatile.  This causes confusion since atomic_read() might not actually read
> anything if an optimizing compiler re-uses a value stored in a register, which
> can break code that loops until something external changes the value of an
> atomic_t.  Avoiding such bugs requires using barrier(), which causes re-loads
> of all registers used in the loop, thus hurting performance instead of helping
> it, particularly on architectures where it's unnecessary.  Since we generally
> want to re-read the contents of an atomic variable on every access anyway,
> let's standardize the behavior across all architectures and avoid the
> performance and correctness problems of requiring the use of barrier() in
> loops that expect atomic_t variables to change externally.  This is relevant
> even on non-smp architectures, since drivers may use atomic operations in
> interrupt handlers.
>
> Signed-off-by: Chris Snook <[EMAIL PROTECTED]>
>

Hmm, I thought we were trying to move away from volatile since it is
very weakly defined and towards explicit barriers and locks...
Points to --> Documentation/volatile-considered-harmful.txt


-- 
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC]: napi_struct V6

2007-08-08 Thread David Miller

From: Jeff Garzik <[EMAIL PROTECTED]>
Date: Wed, 08 Aug 2007 12:27:35 -0400

> Someone please make sure Documentation/networking/netdevices.txt remains 
> correct with regards to APIs and locking.
> 
> It -is- up to date, unlike NAPI howto.  It should not change very much 
> due to this napi_struct work though, if at all.

I'll take care of it when I cut my net-2.6.24 tree later today.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] e1000e: Fix header includes

2007-08-08 Thread Auke Kok

Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---

 drivers/net/e1000e/82571.c   |4 
 drivers/net/e1000e/e1000.h   |7 ---
 drivers/net/e1000e/es2lan.c  |5 +
 drivers/net/e1000e/ethtool.c |3 ++-
 drivers/net/e1000e/hw.h  |2 ++
 drivers/net/e1000e/ich8lan.c |5 +
 drivers/net/e1000e/lib.c |2 ++
 drivers/net/e1000e/netdev.c  |1 +
 8 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/net/e1000e/82571.c b/drivers/net/e1000e/82571.c
index ddf2303..0f8f0ac 100644
--- a/drivers/net/e1000e/82571.c
+++ b/drivers/net/e1000e/82571.c
@@ -37,6 +37,10 @@
  * 82573L Gigabit Ethernet Controller
  */
 
+#include 
+#include 
+#include 
+
 #include "e1000.h"
 
 #define ID_LED_RESERVED_F746 0xF746
diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index a1394d6..de17537 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -31,10 +31,11 @@
 #ifndef _E1000_H_
 #define _E1000_H_
 
+#include 
+#include 
+#include 
+#include 
 #include 
-#include 
-#include 
-#include 
 
 #include "hw.h"
 
diff --git a/drivers/net/e1000e/es2lan.c b/drivers/net/e1000e/es2lan.c
index 5604c50..8100d03 100644
--- a/drivers/net/e1000e/es2lan.c
+++ b/drivers/net/e1000e/es2lan.c
@@ -31,6 +31,11 @@
  * 80003ES2LAN Gigabit Ethernet Controller (Serdes)
  */
 
+#include 
+#include 
+#include 
+#include 
+
 #include "e1000.h"
 
 #define E1000_KMRNCTRLSTA_OFFSET_FIFO_CTRL  0x00
diff --git a/drivers/net/e1000e/ethtool.c b/drivers/net/e1000e/ethtool.c
index 6c417ea..a8fa1db 100644
--- a/drivers/net/e1000e/ethtool.c
+++ b/drivers/net/e1000e/ethtool.c
@@ -29,8 +29,9 @@
 /* ethtool support for e1000 */
 
 #include 
-
 #include 
+#include 
+#include 
 
 #include "e1000.h"
 
diff --git a/drivers/net/e1000e/hw.h b/drivers/net/e1000e/hw.h
index 4d562c4..848217a 100644
--- a/drivers/net/e1000e/hw.h
+++ b/drivers/net/e1000e/hw.h
@@ -29,6 +29,8 @@
 #ifndef _E1000_HW_H_
 #define _E1000_HW_H_
 
+#include 
+
 struct e1000_hw;
 struct e1000_adapter;
 
diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
index 042abd4..85095af 100644
--- a/drivers/net/e1000e/ich8lan.c
+++ b/drivers/net/e1000e/ich8lan.c
@@ -40,6 +40,11 @@
  * 82566MM Gigabit Network Connection
  */
 
+#include 
+#include 
+#include 
+#include 
+
 #include "e1000.h"
 
 #define ICH_FLASH_GFPREG   0x
diff --git a/drivers/net/e1000e/lib.c b/drivers/net/e1000e/lib.c
index 3bbe63e..c92ea77 100644
--- a/drivers/net/e1000e/lib.c
+++ b/drivers/net/e1000e/lib.c
@@ -27,6 +27,8 @@
 
***/
 
 #include 
+#include 
+#include 
 #include 
 
 #include "e1000.h"
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 741965d..0ea6eda 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] make atomic_t volatile on all architectures

2007-08-08 Thread Chris Snook

From: Chris Snook <[EMAIL PROTECTED]>

Some architectures currently do not declare the contents of an atomic_t to be
volatile.  This causes confusion since atomic_read() might not actually read
anything if an optimizing compiler re-uses a value stored in a register, which
can break code that loops until something external changes the value of an
atomic_t.  Avoiding such bugs requires using barrier(), which causes re-loads
of all registers used in the loop, thus hurting performance instead of helping
it, particularly on architectures where it's unnecessary.  Since we generally
want to re-read the contents of an atomic variable on every access anyway,
let's standardize the behavior across all architectures and avoid the
performance and correctness problems of requiring the use of barrier() in
loops that expect atomic_t variables to change externally.  This is relevant
even on non-smp architectures, since drivers may use atomic operations in
interrupt handlers.

Signed-off-by: Chris Snook <[EMAIL PROTECTED]>


diff -urp linux-2.6.23-rc2-orig/include/asm-blackfin/atomic.h 
linux-2.6.23-rc2/include/asm-blackfin/atomic.h
--- linux-2.6.23-rc2-orig/include/asm-blackfin/atomic.h 2007-07-08 
19:32:17.0 -0400
+++ linux-2.6.23-rc2/include/asm-blackfin/atomic.h  2007-08-08 
18:18:43.0 -0400
@@ -13,8 +13,13 @@
  * Tony Kou ([EMAIL PROTECTED])   Lineo Inc.   2001
  */
 
+/*
+ * Make atomic_t volatile to remove the need for barriers in loops that
+ * wait for an outside event.  We generally want to re-load the atomic_t
+ * variable each time anyway, but don't need to re-load everything else.
+ */
 typedef struct {
-   int counter;
+   volatile int counter;
 } atomic_t;
 #define ATOMIC_INIT(i) { (i) }
 
diff -urp linux-2.6.23-rc2-orig/include/asm-frv/atomic.h 
linux-2.6.23-rc2/include/asm-frv/atomic.h
--- linux-2.6.23-rc2-orig/include/asm-frv/atomic.h  2007-07-08 
19:32:17.0 -0400
+++ linux-2.6.23-rc2/include/asm-frv/atomic.h   2007-08-08 18:21:55.0 
-0400
@@ -35,8 +35,13 @@
 #define smp_mb__before_atomic_inc()barrier()
 #define smp_mb__after_atomic_inc() barrier()
 
+/*
+ * Make atomic_t volatile to remove the need for barriers in loops that
+ * wait for an outside event.  We generally want to re-load the atomic_t
+ * variable each time anyway, but don't need to re-load everything else.
+ */
 typedef struct {
-   int counter;
+   volatile int counter;
 } atomic_t;
 
 #define ATOMIC_INIT(i) { (i) }
diff -urp linux-2.6.23-rc2-orig/include/asm-h8300/atomic.h 
linux-2.6.23-rc2/include/asm-h8300/atomic.h
--- linux-2.6.23-rc2-orig/include/asm-h8300/atomic.h2007-07-08 
19:32:17.0 -0400
+++ linux-2.6.23-rc2/include/asm-h8300/atomic.h 2007-08-08 18:24:02.0 
-0400
@@ -6,7 +6,12 @@
  * resource counting etc..
  */
 
-typedef struct { int counter; } atomic_t;
+/*
+ * Make atomic_t volatile to remove the need for barriers in loops that
+ * wait for an outside event.  We generally want to re-load the atomic_t
+ * variable each time anyway, but don't need to re-load everything else.
+ */
+typedef struct { volatile int counter; } atomic_t;
 #define ATOMIC_INIT(i) { (i) }
 
 #define atomic_read(v) ((v)->counter)
diff -urp linux-2.6.23-rc2-orig/include/asm-i386/atomic.h 
linux-2.6.23-rc2/include/asm-i386/atomic.h
--- linux-2.6.23-rc2-orig/include/asm-i386/atomic.h 2007-07-08 
19:32:17.0 -0400
+++ linux-2.6.23-rc2/include/asm-i386/atomic.h  2007-08-08 18:26:09.0 
-0400
@@ -14,8 +14,12 @@
  * Make sure gcc doesn't try to be clever and move things around
  * on us. We need to use _exactly_ the address the user gave us,
  * not some alias that contains the same information.
+ *
+ * Make atomic_t volatile to remove the need for barriers in loops that
+ * wait for an outside event.  We generally want to re-load the atomic_t
+ * variable each time anyway, but don't need to re-load everything else.
  */
-typedef struct { int counter; } atomic_t;
+typedef struct { volatile int counter; } atomic_t;
 
 #define ATOMIC_INIT(i) { (i) }
 
diff -urp linux-2.6.23-rc2-orig/include/asm-m68k/atomic.h 
linux-2.6.23-rc2/include/asm-m68k/atomic.h
--- linux-2.6.23-rc2-orig/include/asm-m68k/atomic.h 2007-07-08 
19:32:17.0 -0400
+++ linux-2.6.23-rc2/include/asm-m68k/atomic.h  2007-08-08 18:28:58.0 
-0400
@@ -13,7 +13,12 @@
  * We do not have SMP m68k systems, so we don't have to deal with that.
  */
 
-typedef struct { int counter; } atomic_t;
+/*
+ * Make atomic_t volatile to remove the need for barriers in loops that
+ * wait for an outside event.  We generally want to re-load the atomic_t
+ * variable each time anyway, but don't need to re-load everything else.
+ */
+typedef struct { volatile int counter; } atomic_t;
 #define ATOMIC_INIT(i) { (i) }
 
 #define atomic_read(v) ((v)->counter)
diff -urp linux-2.6.23-rc2-orig/include/asm-m68knommu/atomic.h 
linux-2.6.23-rc2/include/asm-m68knommu/atomic.h
--- linux-2.6.23-r

Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-08 Thread jamal

On Wed, 2007-08-08 at 15:22 -0700, David Miller wrote:

> The driver path, however, does not exist on an island and what
> we care about is the final result with the changes running
> inside the full system.
> 
> So, to be honest, besides for initial internal development
> feedback, the isolated tests only have minimal merit and
> it's the full protocol tests that are really interesting.

But you cant go there if cant show the path which is supposedly improved
has indeed improved;-> I would certainly agree with you that if it
doesnt prove consistently useful with protocols it has no value
(remember thats why i never submitted these patches all this time).
We just need better analysis of the results - i cant ignore that the
selection of the clock sources for example gives me different results
and that when i boot i cant be guaranteed the same clock source. I cant
ignore the fact that i get different results when i use a different
congestion control algorithm. And none of this has to do with the
batching patches.

I am using UDP at the moment because it is simpler to analyze. And yes,
it would be "an interesting idea that gets shelved" if we cant achieve
any of the expected goals. We've shelved ideas before. BTW, read the
little doc i wrote on the dev->prep_xmit() you may find it interesting.

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-08 Thread jamal

On Wed, 2007-08-08 at 21:55 +0100, Stephen Hemminger wrote:

> > pktgen shows a clear win if you test the driver path - which is what you
> > should test because thats where the batching changes are. 
> > Using TCP or UDP adds other variables[1] that need to be isolated first
> > in order to quantify the effect of batching. 
> > For throughput and CPU utilization, the benefit will be clear when there
> > are a lot more flows. 
> 
> Optimizing for pktgen is a mistake for most users. 

There is no "optimization for pktgen". 

If you are improving the tx path, the first step is to test that you can
show the tx path improved. pktgen happens to be the best test suite for
that because it talks to the driver and exercises the changes.
i.e if one cant show that exercising the direct path demonstrates
improvements you probably wont be able to show batching improves TCP -
but i dont even wanna swear by that. 
Does that make sense?

> Please show something
> useful like router forwarding, TCP (single and multi flow) and/or better
> yet application benchmark improvement.

Absolutely, but first things first. Analysis of why something improves
is extremely important, just saying "TCP throughput improved" is not
interesting and lazy. 
To be scientific, it is important to isolate variables first in order to
come up with meaningful results that can be analysed. 
To make a point, I have noticed extremely different results between TCP
BIC vs reno with batching. So congestion control as a variable is
important. 

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] ipvs: force read of atomic_t in while loop

2007-08-08 Thread Chris Snook


Heiko Carstens wrote:

On Wed, Aug 08, 2007 at 02:31:15PM -0700, Andrew Morton wrote:

On Wed, 08 Aug 2007 17:08:44 -0400
Chris Snook <[EMAIL PROTECTED]> wrote:


Heiko Carstens wrote:

On Wed, Aug 08, 2007 at 03:21:31AM -0700, David Miller wrote:

From: Heiko Carstens <[EMAIL PROTECTED]>
Date: Wed, 8 Aug 2007 11:33:00 +0200


Just saw this while grepping for atomic_reads in a while loops.
Maybe we should re-add the volatile to atomic_t. Not sure.

I think whatever the choice, it should be done consistently
on every architecture.

It's just asking for trouble if your arch does it differently from
every other.

Well..currently it's i386/x86_64 and s390 which have no volatile
in atomic_t. And yes, of course I agree it should be consistent
across all architectures. But it isn't.
Based on recent discussion, it's pretty clear that there's a lot of 
confusion about this.  A lot of people (myself included, until I thought 
about it long and hard) will reasonably assume that calling 
atomic_read() will actually read the value from memory.  Leaving out the 
volatile declaration seems like a pessimization to me.  If you force 
people to use barrier() everywhere they're working with atomic_t, it 
will force re-reads of all the non-atomic data in use as well, which 
will cause more memory fetches of things that generally don't need 
barrier().  That and it's a bug waiting to happen.


Andi -- your thoughts on the matter?

I'm not Andi, but this not-Andi thinks that permitting the compiler to cache
the results of atomic_read() is dumb.


Ok, how about this:

Subject: [PATCH] Add 'volatile' to atomic_t again.

From: Heiko Carstens <[EMAIL PROTECTED]>

This basically reverts f9e9dcb38f5106fa8cdac04a9e967d5487f1cd20 which
removed 'volatile' from atomic_t for i386/x86_64. Reason for this
is to make sure that code like
while (atomic_read(&whatever));
continues to work.
Otherwise the compiler might generate code that will loop forever.
Also this makes sure atomic_t is the same across all architectures.

Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Martin Schwidefsky <[EMAIL PROTECTED]>
Signed-off-by: Heiko Carstens <[EMAIL PROTECTED]>
---

s390 patch will go in via Martin if this is accepted.

 include/asm-i386/atomic.h   |2 +-
 include/asm-x86_64/atomic.h |2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6/include/asm-i386/atomic.h
===
--- linux-2.6.orig/include/asm-i386/atomic.h
+++ linux-2.6/include/asm-i386/atomic.h
@@ -15,7 +15,7 @@
  * on us. We need to use _exactly_ the address the user gave us,
  * not some alias that contains the same information.
  */
-typedef struct { int counter; } atomic_t;
+typedef struct { volatile int counter; } atomic_t;
 
 #define ATOMIC_INIT(i)	{ (i) }
 
Index: linux-2.6/include/asm-x86_64/atomic.h

===
--- linux-2.6.orig/include/asm-x86_64/atomic.h
+++ linux-2.6/include/asm-x86_64/atomic.h
@@ -22,7 +22,7 @@
  * on us. We need to use _exactly_ the address the user gave us,
  * not some alias that contains the same information.
  */
-typedef struct { int counter; } atomic_t;
+typedef struct { volatile int counter; } atomic_t;
 
 #define ATOMIC_INIT(i)	{ (i) }
 


Good so far, but we need to fix it on non-SMP architectures too, since 
drivers may use atomic_t in interrupt code.  Ideally I'd like to be able 
to remove a whole bunch of barriers, since they cause a lot of needless 
re-fetches for everything else in the loop.  We should also document the 
semantics of atomic_t to ensure consistency in the future.


-- Chris
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] ipvs: force read of atomic_t in while loop

2007-08-08 Thread Heiko Carstens

On Wed, Aug 08, 2007 at 02:31:15PM -0700, Andrew Morton wrote:
> On Wed, 08 Aug 2007 17:08:44 -0400
> Chris Snook <[EMAIL PROTECTED]> wrote:
> 
> > Heiko Carstens wrote:
> > > On Wed, Aug 08, 2007 at 03:21:31AM -0700, David Miller wrote:
> > >> From: Heiko Carstens <[EMAIL PROTECTED]>
> > >> Date: Wed, 8 Aug 2007 11:33:00 +0200
> > >>
> > >>> Just saw this while grepping for atomic_reads in a while loops.
> > >>> Maybe we should re-add the volatile to atomic_t. Not sure.
> > >> I think whatever the choice, it should be done consistently
> > >> on every architecture.
> > >>
> > >> It's just asking for trouble if your arch does it differently from
> > >> every other.
> > > 
> > > Well..currently it's i386/x86_64 and s390 which have no volatile
> > > in atomic_t. And yes, of course I agree it should be consistent
> > > across all architectures. But it isn't.
> > 
> > Based on recent discussion, it's pretty clear that there's a lot of 
> > confusion about this.  A lot of people (myself included, until I thought 
> > about it long and hard) will reasonably assume that calling 
> > atomic_read() will actually read the value from memory.  Leaving out the 
> > volatile declaration seems like a pessimization to me.  If you force 
> > people to use barrier() everywhere they're working with atomic_t, it 
> > will force re-reads of all the non-atomic data in use as well, which 
> > will cause more memory fetches of things that generally don't need 
> > barrier().  That and it's a bug waiting to happen.
> > 
> > Andi -- your thoughts on the matter?
> 
> I'm not Andi, but this not-Andi thinks that permitting the compiler to cache
> the results of atomic_read() is dumb.

Ok, how about this:

Subject: [PATCH] Add 'volatile' to atomic_t again.

From: Heiko Carstens <[EMAIL PROTECTED]>

This basically reverts f9e9dcb38f5106fa8cdac04a9e967d5487f1cd20 which
removed 'volatile' from atomic_t for i386/x86_64. Reason for this
is to make sure that code like
while (atomic_read(&whatever));
continues to work.
Otherwise the compiler might generate code that will loop forever.
Also this makes sure atomic_t is the same across all architectures.

Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Martin Schwidefsky <[EMAIL PROTECTED]>
Signed-off-by: Heiko Carstens <[EMAIL PROTECTED]>
---

s390 patch will go in via Martin if this is accepted.

 include/asm-i386/atomic.h   |2 +-
 include/asm-x86_64/atomic.h |2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6/include/asm-i386/atomic.h
===
--- linux-2.6.orig/include/asm-i386/atomic.h
+++ linux-2.6/include/asm-i386/atomic.h
@@ -15,7 +15,7 @@
  * on us. We need to use _exactly_ the address the user gave us,
  * not some alias that contains the same information.
  */
-typedef struct { int counter; } atomic_t;
+typedef struct { volatile int counter; } atomic_t;
 
 #define ATOMIC_INIT(i) { (i) }
 
Index: linux-2.6/include/asm-x86_64/atomic.h
===
--- linux-2.6.orig/include/asm-x86_64/atomic.h
+++ linux-2.6/include/asm-x86_64/atomic.h
@@ -22,7 +22,7 @@
  * on us. We need to use _exactly_ the address the user gave us,
  * not some alias that contains the same information.
  */
-typedef struct { int counter; } atomic_t;
+typedef struct { volatile int counter; } atomic_t;
 
 #define ATOMIC_INIT(i) { (i) }
 
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-08 Thread David Miller

From: jamal <[EMAIL PROTECTED]>
Date: Wed, 08 Aug 2007 11:14:35 -0400

> pktgen shows a clear win if you test the driver path - which is what
> you should test because thats where the batching changes are.

The driver path, however, does not exist on an island and what
we care about is the final result with the changes running
inside the full system.

So, to be honest, besides for initial internal development
feedback, the isolated tests only have minimal merit and
it's the full protocol tests that are really interesting.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC]: napi_struct V5

2007-08-08 Thread David Miller

From: Shirley Ma <[EMAIL PROTECTED]>
Date: Wed, 8 Aug 2007 08:14:13 -0700

> Dave, could you please hold this portion of the patch for a moment. I will
> test this patch ASAP. According to our previous experience, this changes
> significant changes some IPoIB driver performance.

I reverted everything Roland had an issue with, I got tired of arguing
my position and doing all of the coding too.  He won.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] e1000e: Remove unused or empty labels

2007-08-08 Thread Jeff Garzik


Auke Kok wrote:

Remove labels with only return, remove E1000_SUCCESS code and
replace with 0. Remove most goto's.

Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---

 drivers/net/e1000e/82571.c   |  138 +-
 drivers/net/e1000e/defines.h |1 
 drivers/net/e1000e/es2lan.c  |  162 +++--

 drivers/net/e1000e/ich8lan.c |  401 +-
 drivers/net/e1000e/lib.c |  314 +
 drivers/net/e1000e/netdev.c  |4 
 drivers/net/e1000e/phy.c |  278 -

 7 files changed, 521 insertions(+), 777 deletions(-)


applied all three patches to #e1000e


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread David Miller

From: Andi Kleen <[EMAIL PROTECTED]>
Date: 08 Aug 2007 14:38:10 +0200

> Jeff Garzik <[EMAIL PROTECTED]> writes:
> > > + val, reg_index, addr, addr+4); */
> > > + writel(cpu_to_le32(reg_index), addr);
> > > + writel(cpu_to_le32(val),(u8 *)addr + 4);
> > 
> > wrong -- endian conversion macros not needed with writel()
> 
> Are you sure? I don't think that's true. e.g. powerpc writel 
> doesn't convert endian

raw_writel() doesn't, but writel() does.

Why not look at the code, as I did, instead of "think"ing? :-)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-08 Thread David Miller

From: Krishna Kumar2 <[EMAIL PROTECTED]>
Date: Wed, 8 Aug 2007 16:39:47 +0530

> What do you generally think of the patch/implementation ? :)

We have two driver implementation paths on recieve and now
we'll have two on send, and that's not a good trend.

In an ideal world all the drivers would be NAPI and netif_rx()
would only be used by tunneling drivers and similar in the
protocol layers.  And likewise all sends would go through
->hard_start_xmit().

If you can come up with a long term strategy that gets rid of
the special transmit method, that'd be great.

We should make Linux network drivers easy to write, not more difficult
by constantly adding most interfaces than we consolidate.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: IPV6_PKTINFO socket option

2007-08-08 Thread David Miller

From: Gerrit Renker <[EMAIL PROTECTED]>
Date: Wed, 8 Aug 2007 12:16:51 +0100

>  2. On sparc64 with the same kernel IPV6_PKTINFO works without problems, even 
> pulls
> out the cmsg fields correctly. Conversely, when trying to set the 
> IPV6_RECVPKTINFO
> sticky option on the socket, no cmsg fields are generated. 
> The kernel is of the same date and revision as the i386 kernel - library 
> issue ???
> It is very annoying, since the application needs to run on both 
> architectures.

Almost certainly a compat layer bug in the kernel, it would
affect 32-bit applications on powerpc64 and similar if so.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] ipvs: force read of atomic_t in while loop

2007-08-08 Thread Andrew Morton

On Wed, 08 Aug 2007 17:08:44 -0400
Chris Snook <[EMAIL PROTECTED]> wrote:

> Heiko Carstens wrote:
> > On Wed, Aug 08, 2007 at 03:21:31AM -0700, David Miller wrote:
> >> From: Heiko Carstens <[EMAIL PROTECTED]>
> >> Date: Wed, 8 Aug 2007 11:33:00 +0200
> >>
> >>> Just saw this while grepping for atomic_reads in a while loops.
> >>> Maybe we should re-add the volatile to atomic_t. Not sure.
> >> I think whatever the choice, it should be done consistently
> >> on every architecture.
> >>
> >> It's just asking for trouble if your arch does it differently from
> >> every other.
> > 
> > Well..currently it's i386/x86_64 and s390 which have no volatile
> > in atomic_t. And yes, of course I agree it should be consistent
> > across all architectures. But it isn't.
> 
> Based on recent discussion, it's pretty clear that there's a lot of 
> confusion about this.  A lot of people (myself included, until I thought 
> about it long and hard) will reasonably assume that calling 
> atomic_read() will actually read the value from memory.  Leaving out the 
> volatile declaration seems like a pessimization to me.  If you force 
> people to use barrier() everywhere they're working with atomic_t, it 
> will force re-reads of all the non-atomic data in use as well, which 
> will cause more memory fetches of things that generally don't need 
> barrier().  That and it's a bug waiting to happen.
> 
> Andi -- your thoughts on the matter?

I'm not Andi, but this not-Andi thinks that permitting the compiler to cache
the results of atomic_read() is dumb.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [patch 1/5][RFC] NET: Change pci_enable_device topci_reenable_device to keep device enable balance

2007-08-08 Thread Ramkrishna Vepa

Brandon,

Before slot_reset event is called io_error_detected could be called
(where pci_disable_device() is called), right? 

The pci_reenable_device() will call enable only if the device was
enabled before and would not be enabled if the device were disabled. Is
this the intended behavior?

Ram

> -Original Message-
> From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
> On Behalf Of Brandon Philips
> Sent: Thursday, August 02, 2007 3:44 PM
> To: netdev@vger.kernel.org
> Cc: [EMAIL PROTECTED]; Brandon Philips
> Subject: [patch 1/5][RFC] NET: Change pci_enable_device
> topci_reenable_device to keep device enable balance
> 
> On a slot_reset event pci_disable_device() is never called so calling
> pci_enable_device() will unbalance the enable count.
> 
> Signed-off-by: Brandon Philips <[EMAIL PROTECTED]>
> 
> ---
>  drivers/net/e100.c |2 +-
>  drivers/net/e1000/e1000_main.c |2 +-
>  drivers/net/ixgb/ixgb_main.c   |2 +-
>  drivers/net/s2io.c |2 +-
>  4 files changed, 4 insertions(+), 4 deletions(-)
> 
> Index: linux-2.6/drivers/net/e100.c
> ===
> --- linux-2.6.orig/drivers/net/e100.c
> +++ linux-2.6/drivers/net/e100.c
> @@ -2828,7 +2828,7 @@ static pci_ers_result_t e100_io_slot_res
>   struct net_device *netdev = pci_get_drvdata(pdev);
>   struct nic *nic = netdev_priv(netdev);
> 
> - if (pci_enable_device(pdev)) {
> + if (pci_reenable_device(pdev)) {
>   printk(KERN_ERR "e100: Cannot re-enable PCI device after
> reset.\n");
>   return PCI_ERS_RESULT_DISCONNECT;
>   }
> Index: linux-2.6/drivers/net/e1000/e1000_main.c
> ===
> --- linux-2.6.orig/drivers/net/e1000/e1000_main.c
> +++ linux-2.6/drivers/net/e1000/e1000_main.c
> @@ -5270,7 +5270,7 @@ static pci_ers_result_t e1000_io_slot_re
>   struct net_device *netdev = pci_get_drvdata(pdev);
>   struct e1000_adapter *adapter = netdev->priv;
> 
> - if (pci_enable_device(pdev)) {
> + if (pci_reenable_device(pdev)) {
>   printk(KERN_ERR "e1000: Cannot re-enable PCI device
after
> reset.\n");
>   return PCI_ERS_RESULT_DISCONNECT;
>   }
> Index: linux-2.6/drivers/net/ixgb/ixgb_main.c
> ===
> --- linux-2.6.orig/drivers/net/ixgb/ixgb_main.c
> +++ linux-2.6/drivers/net/ixgb/ixgb_main.c
> @@ -2294,7 +2294,7 @@ static pci_ers_result_t ixgb_io_slot_res
>   struct net_device *netdev = pci_get_drvdata(pdev);
>   struct ixgb_adapter *adapter = netdev_priv(netdev);
> 
> - if(pci_enable_device(pdev)) {
> + if(pci_reenable_device(pdev)) {
>   DPRINTK(PROBE, ERR, "Cannot re-enable PCI device after
> reset.\n");
>   return PCI_ERS_RESULT_DISCONNECT;
>   }
> Index: linux-2.6/drivers/net/s2io.c
> ===
> --- linux-2.6.orig/drivers/net/s2io.c
> +++ linux-2.6/drivers/net/s2io.c
> @@ -7833,7 +7833,7 @@ static pci_ers_result_t s2io_io_slot_res
>   struct net_device *netdev = pci_get_drvdata(pdev);
>   struct s2io_nic *sp = netdev->priv;
> 
> - if (pci_enable_device(pdev)) {
> + if (pci_reenable_device(pdev)) {
>   printk(KERN_ERR "s2io: "
>  "Cannot re-enable PCI device after reset.\n");
>   return PCI_ERS_RESULT_DISCONNECT;
> 
> --
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] ipvs: force read of atomic_t in while loop

2007-08-08 Thread Chris Snook

Heiko Carstens wrote:

On Wed, Aug 08, 2007 at 03:21:31AM -0700, David Miller wrote:

From: Heiko Carstens <[EMAIL PROTECTED]>
Date: Wed, 8 Aug 2007 11:33:00 +0200

Just saw this while grepping for atomic_reads in a while loops.
Maybe we should re-add the volatile to atomic_t. Not sure.

I think whatever the choice, it should be done consistently
on every architecture.

It's just asking for trouble if your arch does it differently from
every other.

Well..currently it's i386/x86_64 and s390 which have no volatile
in atomic_t. And yes, of course I agree it should be consistent
across all architectures. But it isn't.

Based on recent discussion, it's pretty clear that there's a lot of 
confusion about this.  A lot of people (myself included, until I thought 
about it long and hard) will reasonably assume that calling 
atomic_read() will actually read the value from memory.  Leaving out the 
volatile declaration seems like a pessimization to me.  If you force 
people to use barrier() everywhere they're working with atomic_t, it 
will force re-reads of all the non-atomic data in use as well, which 
will cause more memory fetches of things that generally don't need 
barrier().  That and it's a bug waiting to happen.

Andi -- your thoughts on the matter?
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-08 Thread Stephen Hemminger

On Wed, 08 Aug 2007 11:14:35 -0400
jamal <[EMAIL PROTECTED]> wrote:

> On Wed, 2007-08-08 at 21:42 +0800, Herbert Xu wrote:
> > On Wed, Aug 08, 2007 at 03:49:00AM -0700, David Miller wrote:
> > > 
> > > Not because I think it obviates your work, but rather because I'm
> > > curious, could you test a TSO-in-hardware driver converted to
> > > batching and see how TSO alone compares to batching for a pure
> > > TCP workload?
> > 
> > You could even lower the bar by disabling TSO and enabling
> > software GSO.
> 
> >From my observation for TCP packets slightly above MDU (upto 2K), GSO
> gives worse performance than non-GSO throughput-wise. Actually this has
> nothing to do with batching, rather the behavior is consistent with or
> without batching changes. 
> 
> > > I personally don't think it will help for that case at all as
> > > TSO likely does better job of coalescing the work _and_ reducing
> > > bus traffic as well as work in the TCP stack.
> > 
> > I agree.
> > I suspect the bulk of the effort is in getting
> > these skb's created and processed by the stack so that by
> > the time that they're exiting the qdisc there's not much
> > to be saved anymore.
> 
> pktgen shows a clear win if you test the driver path - which is what you
> should test because thats where the batching changes are. 
> Using TCP or UDP adds other variables[1] that need to be isolated first
> in order to quantify the effect of batching. 
> For throughput and CPU utilization, the benefit will be clear when there
> are a lot more flows. 

Optimizing for pktgen is a mistake for most users. Please show something
useful like router forwarding, TCP (single and multi flow) and/or better
yet application benchmark improvement.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] net/tulip/xircom_cb.c: remove superfulous priv assignment

2007-08-08 Thread Arjan van de Ven

On Wed, 2007-08-08 at 13:20 +0200, Mariusz Kozlowski wrote:
> Hello,
> 
>   Unpatched version does sth like this:
> 
> dev = alloc_etherdev(...
> private = netdev_priv(dev);
> ...
> dev->priv = private;
> 
> which doesn't make much sense (does it?) because this is done in
> alloc_netdev() already.


Looks good

Acked-by: Arjan van de Ven <[EMAIL PROTECTED]>


-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [ewg] [PATCH 11/14] nes: OpenFabrics kernel verbs

2007-08-08 Thread Roland Dreier

 > +static struct ib_mw *nes_alloc_mw(struct ib_pd *ibpd) {

 > +get_random_bytes(&next_stag_index, sizeof(next_stag_index));

Could this use up a lot of entropy?  Is random32() sufficient?

 > +stag_key = (u8)next_stag_index;

I don't think this cast is needed.

 > +if (ret) {
 > +return (ERR_PTR(ret));
 > +}

Don't need braces for one-line blocks.

 > +if (NULL == cqp_request) {

It's more idiomatic to write "if (!cqp_request) {"
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread Jeff Garzik


Michael Buesch wrote:

On Wednesday 08 August 2007 18:59:08 Jeff Garzik wrote:

Michael Buesch wrote:

writel doesn't guarantee flushing either.
readl does.


Not quite -- there are multiple kinds of flushing.  You're thinking 
about flushing across PCI bridges, which is correct, but you also have 
CPU write posting and CPU write ordering and such.


Without taking all that into account, you might be tempted to think that 
__raw_readl() will perform all flushes necessary following a 
__raw_writel() -- but that would be incorrect.


So, kind of...
Better use writel(swab32(...
unless you like being shot into the foot.


Correct.

Jeff



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

2.6.23-rc2: WARNING at kernel/irq/resend.c:70 check_irq_resend()

2007-08-08 Thread Indan Zupancic

Hi,

Just added an old network card, RTL-8029(AS), ne2k-pci driver, and tried to
expand the network (failed because I didn't use a cross-over cable).

The code snippet that spat the thing:

/*
 * IRQ resend
 *
 * Is called with interrupts disabled and desc->lock held.
 */
void check_irq_resend(struct irq_desc *desc, unsigned int irq)
{
unsigned int status = desc->status;

/*
 * Make sure the interrupt is enabled, before resending it:
 */
desc->chip->enable(irq);

/*
 * Temporary hack to figure out more about the problem, which
 * is causing the ancient network cards to die.
 */
if (desc->handle_irq != handle_edge_irq) {
WARN_ON_ONCE(1);
return;
}

and relevant dmesg snippet:

[  446.836399] eth1: RealTek RTL-8029 found at 0x9000, IRQ 16, 
00:00:B4:BB:BF:E5.
[  855.458468] r8169: eth0: link up
[  857.500667] r8169: eth0: link up
[  902.294283] r8169: eth0: link up
[ 4126.348427] r8169: eth0: link up
[ 5598.446233] r8169: eth0: link down
[ 5618.904846] r8169: eth0: link up
[ 5623.133090] r8169: eth0: link down
[ 5624.708071] r8169: eth0: link up
[ 6328.267872] WARNING: at /home/indan/src/git/linux-2.6/kernel/irq/resend.c:70 
check_irq_resend()
[ 6328.267896]  [] check_irq_resend+0x4f/0x8c
[ 6328.267912]  [] enable_irq+0x84/0xaa
[ 6328.267918]  [] ne2k_pci_block_output+0x0/0x138 [ne2k_pci]
[ 6328.267931]  [] ne2k_pci_block_output+0x0/0x138 [ne2k_pci]
[ 6328.267939]  [] ne2k_pci_block_output+0x0/0x138 [ne2k_pci]
[ 6328.267947]  [] ei_start_xmit+0x29f/0x2b9 [8390]
[ 6328.267966]  [] dev_hard_start_xmit+0x19e/0x1fd
[ 6328.267976]  [] __qdisc_run+0x6c/0x151
[ 6328.267986]  [] dev_queue_xmit+0x12a/0x271
[ 6328.267991]  [] arp_send+0x4c/0x64
[ 6328.268002]  [] arp_xmit+0x4d/0x51
[ 6328.268009]  [] arp_process+0x29b/0x50a
[ 6328.268018]  [] ip_forward+0x22c/0x23a
[ 6328.268026]  [] ip_rcv_finish+0x0/0x282
[ 6328.268032]  [] ip_rcv+0x479/0x4a8
[ 6328.268037]  [] ip_rcv_finish+0x0/0x282
[ 6328.268044]  [] arp_rcv+0xf0/0x104
[ 6328.268049]  [] hrtimer_run_queues+0x12/0x1a1
[ 6328.268055]  [] arp_rcv+0x0/0x104
[ 6328.268061]  [] netif_receive_skb+0x2d9/0x317
[ 6328.268068]  [] process_backlog+0x6d/0xd2
[ 6328.268075]  [] net_rx_action+0x81/0x14d
[ 6328.268082]  [] __do_softirq+0x35/0x75
[ 6328.268088]  [] do_softirq+0x3e/0x8d
[ 6328.268098]  [] handle_fasteoi_irq+0x0/0xbe
[ 6328.268103]  [] irq_exit+0x25/0x30
[ 6328.268107]  [] do_IRQ+0x94/0xad
[ 6328.268114]  [] common_interrupt+0x23/0x28
[ 6328.268123]  [] default_idle+0x27/0x39
[ 6328.268128]  [] cpu_idle+0x41/0x55
[ 6328.268133]  [] start_kernel+0x20e/0x213
[ 6328.268140]  [] unknown_bootoption+0x0/0x196
[ 6328.268146]  ===

# cat /proc/interrupts
   CPU0
  0:9804926   IO-APIC-edge  timer
  1:   8270   IO-APIC-edge  i8042
  9:  0   IO-APIC-fasteoi   acpi
 12: 181281   IO-APIC-edge  i8042
 16:  40410   IO-APIC-fasteoi   sata_sil, eth1
 17:  0   IO-APIC-fasteoi   ohci_hcd:usb1, NVidia nForce2
 18: 499428   IO-APIC-fasteoi   ohci_hcd:usb2
 19:  2   IO-APIC-fasteoi   ehci_hcd:usb3
 20: 689588   IO-APIC-fasteoi   eth0
NMI:  0
LOC:9805098
ERR:  0
MIS:  0

So the card is sharing an IRQ with the disk controller.

No idea if the network card died after this warning, for all
practical purposes it didn't work before, so couldn't check.

I can provide more information and run some tests if anyone
wants that, just keep in mind that the card isn't connected to
anything.

Greetings,

Indan


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [E1000-devel] [PATCH] NET: Change pci_enable_device to pci_reenable_device to keep device enable balance

2007-08-08 Thread Brandon Philips

On 10:36 Wed 08 Aug 2007, Kok, Auke wrote:
> Brandon Philips wrote:
>> I sent this last week as part of my devres patches but it is purely a
>> bug fix and can be merged now.
>> On a slot_reset event pci_disable_device() is never called so calling
>> pci_enable_device() will unbalance the enable count.
>> Signed-off-by: Brandon Philips <[EMAIL PROTECTED]>
>> Acked-by: Tejun Heo <[EMAIL PROTECTED]>
>
>
> ACK all the parts except s2io. Of course, I haven't seen the 
> pci_reenable_device rename patch in Jeff's tree yet...

Actually, it is in Linus's tree on git.kernel.org.

>> ---
>>  drivers/net/e100.c |2 +-
>>  drivers/net/e1000/e1000_main.c |2 +-
>>  drivers/net/ixgb/ixgb_main.c   |2 +-
>>  drivers/net/s2io.c |2 +-
>>  4 files changed, 4 insertions(+), 4 deletions(-)
>> Index: linux-2.6/drivers/net/e100.c
>> ===
>> --- linux-2.6.orig/drivers/net/e100.c
>> +++ linux-2.6/drivers/net/e100.c
>> @@ -2828,7 +2828,7 @@ static pci_ers_result_t e100_io_slot_res
>>  struct net_device *netdev = pci_get_drvdata(pdev);
>>  struct nic *nic = netdev_priv(netdev);
>>  -   if (pci_enable_device(pdev)) {
>> +if (pci_reenable_device(pdev)) {
>>  printk(KERN_ERR "e100: Cannot re-enable PCI device after 
>> reset.\n");
>>  return PCI_ERS_RESULT_DISCONNECT;
>>  }
>> Index: linux-2.6/drivers/net/e1000/e1000_main.c
>> ===
>> --- linux-2.6.orig/drivers/net/e1000/e1000_main.c
>> +++ linux-2.6/drivers/net/e1000/e1000_main.c
>> @@ -5270,7 +5270,7 @@ static pci_ers_result_t e1000_io_slot_re
>>  struct net_device *netdev = pci_get_drvdata(pdev);
>>  struct e1000_adapter *adapter = netdev->priv;
>>  -   if (pci_enable_device(pdev)) {
>> +if (pci_reenable_device(pdev)) {
>>  printk(KERN_ERR "e1000: Cannot re-enable PCI device after 
>> reset.\n");
>>  return PCI_ERS_RESULT_DISCONNECT;
>>  }
>> Index: linux-2.6/drivers/net/ixgb/ixgb_main.c
>> ===
>> --- linux-2.6.orig/drivers/net/ixgb/ixgb_main.c
>> +++ linux-2.6/drivers/net/ixgb/ixgb_main.c
>> @@ -2294,7 +2294,7 @@ static pci_ers_result_t ixgb_io_slot_res
>>  struct net_device *netdev = pci_get_drvdata(pdev);
>>  struct ixgb_adapter *adapter = netdev_priv(netdev);
>>  -   if(pci_enable_device(pdev)) {
>> +if(pci_reenable_device(pdev)) {
>>  DPRINTK(PROBE, ERR, "Cannot re-enable PCI device after 
>> reset.\n");
>>  return PCI_ERS_RESULT_DISCONNECT;
>>  }
>> Index: linux-2.6/drivers/net/s2io.c
>> ===
>> --- linux-2.6.orig/drivers/net/s2io.c
>> +++ linux-2.6/drivers/net/s2io.c
>> @@ -7833,7 +7833,7 @@ static pci_ers_result_t s2io_io_slot_res
>>  struct net_device *netdev = pci_get_drvdata(pdev);
>>  struct s2io_nic *sp = netdev->priv;
>>  -   if (pci_enable_device(pdev)) {
>> +if (pci_reenable_device(pdev)) {
>>  printk(KERN_ERR "s2io: "
>> "Cannot re-enable PCI device after reset.\n");
>>  return PCI_ERS_RESULT_DISCONNECT;
>> -
>> This SF.net email is sponsored by: Splunk Inc.
>> Still grepping through log files to find problems?  Stop.
>> Now Search log events and configuration files using AJAX and a browser.
>> Download your FREE copy of Splunk now >>  http://get.splunk.com/
>> ___
>> E1000-devel mailing list
>> [EMAIL PROTECTED]
>> https://lists.sourceforge.net/lists/listinfo/e1000-devel
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [E1000-devel] [PATCH] NET: Change pci_enable_device to pci_reenable_device to keep device enable balance

2007-08-08 Thread Kok, Auke


Brandon Philips wrote:

I sent this last week as part of my devres patches but it is purely a
bug fix and can be merged now.

On a slot_reset event pci_disable_device() is never called so calling
pci_enable_device() will unbalance the enable count.

Signed-off-by: Brandon Philips <[EMAIL PROTECTED]>
Acked-by: Tejun Heo <[EMAIL PROTECTED]>



ACK all the parts except s2io. Of course, I haven't seen the pci_reenable_device 
rename patch in Jeff's tree yet...



Auke




---
 drivers/net/e100.c |2 +-
 drivers/net/e1000/e1000_main.c |2 +-
 drivers/net/ixgb/ixgb_main.c   |2 +-
 drivers/net/s2io.c |2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6/drivers/net/e100.c
===
--- linux-2.6.orig/drivers/net/e100.c
+++ linux-2.6/drivers/net/e100.c
@@ -2828,7 +2828,7 @@ static pci_ers_result_t e100_io_slot_res
struct net_device *netdev = pci_get_drvdata(pdev);
struct nic *nic = netdev_priv(netdev);
 
-	if (pci_enable_device(pdev)) {

+   if (pci_reenable_device(pdev)) {
printk(KERN_ERR "e100: Cannot re-enable PCI device after 
reset.\n");
return PCI_ERS_RESULT_DISCONNECT;
}
Index: linux-2.6/drivers/net/e1000/e1000_main.c
===
--- linux-2.6.orig/drivers/net/e1000/e1000_main.c
+++ linux-2.6/drivers/net/e1000/e1000_main.c
@@ -5270,7 +5270,7 @@ static pci_ers_result_t e1000_io_slot_re
struct net_device *netdev = pci_get_drvdata(pdev);
struct e1000_adapter *adapter = netdev->priv;
 
-	if (pci_enable_device(pdev)) {

+   if (pci_reenable_device(pdev)) {
printk(KERN_ERR "e1000: Cannot re-enable PCI device after 
reset.\n");
return PCI_ERS_RESULT_DISCONNECT;
}
Index: linux-2.6/drivers/net/ixgb/ixgb_main.c
===
--- linux-2.6.orig/drivers/net/ixgb/ixgb_main.c
+++ linux-2.6/drivers/net/ixgb/ixgb_main.c
@@ -2294,7 +2294,7 @@ static pci_ers_result_t ixgb_io_slot_res
struct net_device *netdev = pci_get_drvdata(pdev);
struct ixgb_adapter *adapter = netdev_priv(netdev);
 
-	if(pci_enable_device(pdev)) {

+   if(pci_reenable_device(pdev)) {
DPRINTK(PROBE, ERR, "Cannot re-enable PCI device after 
reset.\n");
return PCI_ERS_RESULT_DISCONNECT;
}
Index: linux-2.6/drivers/net/s2io.c
===
--- linux-2.6.orig/drivers/net/s2io.c
+++ linux-2.6/drivers/net/s2io.c
@@ -7833,7 +7833,7 @@ static pci_ers_result_t s2io_io_slot_res
struct net_device *netdev = pci_get_drvdata(pdev);
struct s2io_nic *sp = netdev->priv;
 
-	if (pci_enable_device(pdev)) {

+   if (pci_reenable_device(pdev)) {
printk(KERN_ERR "s2io: "
   "Cannot re-enable PCI device after reset.\n");
return PCI_ERS_RESULT_DISCONNECT;

-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
___
E1000-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/e1000-devel

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread Michael Buesch

On Wednesday 08 August 2007 18:59:08 Jeff Garzik wrote:
> Michael Buesch wrote:
> > writel doesn't guarantee flushing either.
> > readl does.
> 
> 
> Not quite -- there are multiple kinds of flushing.  You're thinking 
> about flushing across PCI bridges, which is correct, but you also have 
> CPU write posting and CPU write ordering and such.
> 
> Without taking all that into account, you might be tempted to think that 
> __raw_readl() will perform all flushes necessary following a 
> __raw_writel() -- but that would be incorrect.

So, kind of...
Better use writel(swab32(...
unless you like being shot into the foot.

-- 
Greetings Michael.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [ewg] [PATCH 0/14] nes: NetEffect 10Gb RNIC Driver

2007-08-08 Thread Roland Dreier

 > git.openfabrics.org/~glenn/libnes.git
 > git.openfabrics.org/~glenn/ofed_1_2.git
 > git.openfabrics.org/~glenn/ofascripts.git
 > git.openfabrics.org/~glenn/ofed_1_2_scripts.git

these aren't actually git URLs.  prepending git:// seems to work though.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] e1000e: remove duplicate shadowing reference to adapter->hw

2007-08-08 Thread Auke Kok

Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---

 drivers/net/e1000e/netdev.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index dd4eca6..741965d 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -2981,7 +2981,6 @@ static void e1000_watchdog_task(struct work_struct *work)
} else {
/* make sure the receive unit is started */
if (adapter->flags & FLAG_RX_NEEDS_RESTART) {
-   struct e1000_hw *hw = &adapter->hw;
u32 rctl = er32(RCTL);
ew32(RCTL, rctl |
E1000_RCTL_EN);
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] e1000e: Make a few functions static

2007-08-08 Thread Auke Kok

After moving code around we can reduce namespace usage
by making a few functions static.

Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---

 drivers/net/e1000e/e1000.h  |2 --
 drivers/net/e1000e/lib.c|2 +-
 drivers/net/e1000e/netdev.c |2 +-
 drivers/net/e1000e/phy.c|   10 ++
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index 65c31d3..a1394d6 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -365,7 +365,6 @@ extern struct e1000_info e1000_ich9_info;
 extern struct e1000_info e1000_es2_info;
 
 extern s32  e1000_commit_phy(struct e1000_hw *hw);
-extern s32  e1000_set_d0_lplu_state(struct e1000_hw *hw, bool active);
 
 extern bool e1000_enable_mng_pass_thru(struct e1000_hw *hw);
 
@@ -438,7 +437,6 @@ extern s32 e1000_phy_has_link_generic(struct e1000_hw *hw, 
u32 iterations,
   u32 usec_interval, bool *success);
 extern s32 e1000_phy_reset_dsp(struct e1000_hw *hw);
 extern s32 e1000_check_downshift(struct e1000_hw *hw);
-extern s32 e1000_wait_autoneg(struct e1000_hw *hw);
 
 static inline s32 e1000_phy_hw_reset(struct e1000_hw *hw)
 {
diff --git a/drivers/net/e1000e/lib.c b/drivers/net/e1000e/lib.c
index d11b518..3bbe63e 100644
--- a/drivers/net/e1000e/lib.c
+++ b/drivers/net/e1000e/lib.c
@@ -2289,7 +2289,7 @@ bool e1000_enable_tx_pkt_filtering(struct e1000_hw *hw)
  *
  *  Writes the command header after does the checksum calculation.
  **/
-s32 e1000_mng_write_cmd_header(struct e1000_hw *hw,
+static s32 e1000_mng_write_cmd_header(struct e1000_hw *hw,
  struct e1000_host_mng_command_header *hdr)
 {
u16 i, length = sizeof(struct e1000_host_mng_command_header);
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index c8d50cc..dd4eca6 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -48,7 +48,7 @@
 char e1000_driver_name[] = "e1000e";
 const char e1000_driver_version[] = DRV_VERSION;
 
-const struct e1000_info * e1000_info_tbl[] = {
+static const struct e1000_info *e1000_info_tbl[] = {
[board_82571]   = &e1000_82571_info,
[board_82572]   = &e1000_82572_info,
[board_82573]   = &e1000_82573_info,
diff --git a/drivers/net/e1000e/phy.c b/drivers/net/e1000e/phy.c
index d7947b0..1ccbad7 100644
--- a/drivers/net/e1000e/phy.c
+++ b/drivers/net/e1000e/phy.c
@@ -28,8 +28,10 @@
 
 #include "e1000.h"
 
-static s32  e1000_get_phy_cfg_done(struct e1000_hw *hw);
-static s32  e1000_phy_force_speed_duplex(struct e1000_hw *hw);
+static s32 e1000_get_phy_cfg_done(struct e1000_hw *hw);
+static s32 e1000_phy_force_speed_duplex(struct e1000_hw *hw);
+static s32 e1000_set_d0_lplu_state(struct e1000_hw *hw, bool active);
+static s32 e1000_wait_autoneg(struct e1000_hw *hw);
 
 /* Cable length tables */
 static const u16 e1000_m88_cable_length_table[] =
@@ -1281,7 +1283,7 @@ static s32 e1000_check_polarity_igp(struct e1000_hw *hw)
  *  Waits for auto-negotiation to complete or for the auto-negotiation time
  *  limit to expire, which ever happens first.
  **/
-s32 e1000_wait_autoneg(struct e1000_hw *hw)
+static s32 e1000_wait_autoneg(struct e1000_hw *hw)
 {
s32 ret_val = 0;
u16 i, phy_status;
@@ -1760,7 +1762,7 @@ s32 e1000_commit_phy(struct e1000_hw *hw)
  *  During driver activity, SmartSpeed should be enabled so performance is
  *  maintained.  This is a function pointer entry point called by drivers.
  **/
-s32 e1000_set_d0_lplu_state(struct e1000_hw *hw, bool active)
+static s32 e1000_set_d0_lplu_state(struct e1000_hw *hw, bool active)
 {
if (hw->phy.ops.set_d0_lplu_state)
return hw->phy.ops.set_d0_lplu_state(hw, active);
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] NET: Change pci_enable_device to pci_reenable_device to keep device enable balance

2007-08-08 Thread Brandon Philips

I sent this last week as part of my devres patches but it is purely a
bug fix and can be merged now.

On a slot_reset event pci_disable_device() is never called so calling
pci_enable_device() will unbalance the enable count.

Signed-off-by: Brandon Philips <[EMAIL PROTECTED]>
Acked-by: Tejun Heo <[EMAIL PROTECTED]>

---
 drivers/net/e100.c |2 +-
 drivers/net/e1000/e1000_main.c |2 +-
 drivers/net/ixgb/ixgb_main.c   |2 +-
 drivers/net/s2io.c |2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6/drivers/net/e100.c
===
--- linux-2.6.orig/drivers/net/e100.c
+++ linux-2.6/drivers/net/e100.c
@@ -2828,7 +2828,7 @@ static pci_ers_result_t e100_io_slot_res
struct net_device *netdev = pci_get_drvdata(pdev);
struct nic *nic = netdev_priv(netdev);
 
-   if (pci_enable_device(pdev)) {
+   if (pci_reenable_device(pdev)) {
printk(KERN_ERR "e100: Cannot re-enable PCI device after 
reset.\n");
return PCI_ERS_RESULT_DISCONNECT;
}
Index: linux-2.6/drivers/net/e1000/e1000_main.c
===
--- linux-2.6.orig/drivers/net/e1000/e1000_main.c
+++ linux-2.6/drivers/net/e1000/e1000_main.c
@@ -5270,7 +5270,7 @@ static pci_ers_result_t e1000_io_slot_re
struct net_device *netdev = pci_get_drvdata(pdev);
struct e1000_adapter *adapter = netdev->priv;
 
-   if (pci_enable_device(pdev)) {
+   if (pci_reenable_device(pdev)) {
printk(KERN_ERR "e1000: Cannot re-enable PCI device after 
reset.\n");
return PCI_ERS_RESULT_DISCONNECT;
}
Index: linux-2.6/drivers/net/ixgb/ixgb_main.c
===
--- linux-2.6.orig/drivers/net/ixgb/ixgb_main.c
+++ linux-2.6/drivers/net/ixgb/ixgb_main.c
@@ -2294,7 +2294,7 @@ static pci_ers_result_t ixgb_io_slot_res
struct net_device *netdev = pci_get_drvdata(pdev);
struct ixgb_adapter *adapter = netdev_priv(netdev);
 
-   if(pci_enable_device(pdev)) {
+   if(pci_reenable_device(pdev)) {
DPRINTK(PROBE, ERR, "Cannot re-enable PCI device after 
reset.\n");
return PCI_ERS_RESULT_DISCONNECT;
}
Index: linux-2.6/drivers/net/s2io.c
===
--- linux-2.6.orig/drivers/net/s2io.c
+++ linux-2.6/drivers/net/s2io.c
@@ -7833,7 +7833,7 @@ static pci_ers_result_t s2io_io_slot_res
struct net_device *netdev = pci_get_drvdata(pdev);
struct s2io_nic *sp = netdev->priv;
 
-   if (pci_enable_device(pdev)) {
+   if (pci_reenable_device(pdev)) {
printk(KERN_ERR "s2io: "
   "Cannot re-enable PCI device after reset.\n");
return PCI_ERS_RESULT_DISCONNECT;
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[1/2] 2.6.23-rc2: known regressions with patches v2

2007-08-08 Thread Michal Piotrowski

Hi all,

Here is a list of some known regressions in 2.6.23-rc2
with patches available.

Feel free to add new regressions/remove fixed etc.
http://kernelnewbies.org/known_regressions

List of Aces

NameRegressions fixed since 21-Jun-2007
Adrian Bunk6
Andi Kleen 4
Andrew Morton  4
Linus Torvalds 4
Al Viro3
Cornelia Huck  3
Jens Axboe 3
Tejun Heo  3
David Woodhouse2
Hugh Dickins   2
Peter Zijlstra 2
Trent Piepho   2



Unclassified

Subject : Kconfig prompts without help text
References  : http://lkml.org/lkml/2007/7/16/326
Last known good : ?
Submitter   : Stefan Richter <[EMAIL PROTECTED]>
Caused-By   : ?
Handled-By  : Jan Engelhardt <[EMAIL PROTECTED]>
Patch   : http://lkml.org/lkml/2007/7/18/236
Status  : patch available

Subject : Oops while modprobing phy fixed module
References  : http://lkml.org/lkml/2007/7/14/63
Last known good : ?
Submitter   : Gabriel C <[EMAIL PROTECTED]>
Caused-By   : ?
Handled-By  : Satyam Sharma <[EMAIL PROTECTED]>
  Vitaly Bordug <[EMAIL PROTECTED]>
Patch1  : http://lkml.org/lkml/2007/7/18/506
Status  : patch available



FS

Subject : NFSv4 poops itself
References  : http://lkml.org/lkml/2007/7/27/144
Last known good : ?
Submitter   : Jeff Garzik <[EMAIL PROTECTED]> 
Caused-By   : ?
Handled-By  : Trond Myklebust <[EMAIL PROTECTED]>
Patch   : http://lkml.org/lkml/2007/7/27/183
Status  : patch available


Modpost

Subject : modpost bug breaks ia64 cross compilation
References  : http://lkml.org/lkml/2007/7/27/30
  http://lkml.org/lkml/2007/7/27/418
Last known good : ?
Submitter   : Jan Dittmer <[EMAIL PROTECTED]>
Caused-By   : Thomas Renninger <[EMAIL PROTECTED]>
  commit 29b71a1ca74491fab9fed09e9d835d840d042690
Handled-By  : ?
Patch   : http://lkml.org/lkml/2007/8/2/211
Status  : patch was suggested



Networking

Subject : sky2 boot crash in sky2_mac_intr
References  : http://lkml.org/lkml/2007/7/24/91
Last known good : ?
Submitter   : Florian Lohoff <[EMAIL PROTECTED]>
Caused-By   : 
Handled-By  : Stephen Hemminger <[EMAIL PROTECTED]>
Patch   : http://marc.info/?l=linux-netdev&m=118651402523966&w=2
Status  : patch available



Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2/3] 2.6.23-rc2: known regressions v2

2007-08-08 Thread Michal Piotrowski

Hi all,

Here is a list of some known regressions in 2.6.23-rc2.

Feel free to add new regressions/remove fixed etc.
http://kernelnewbies.org/known_regressions

List of Aces

NameRegressions fixed since 21-Jun-2007
Adrian Bunk6
Andi Kleen 4
Andrew Morton  4
Linus Torvalds 4
Al Viro3
Cornelia Huck  3
Jens Axboe 3
Tejun Heo  3
David Woodhouse2
Hugh Dickins   2
Peter Zijlstra 2
Trent Piepho   2



CPUFREQ

Subject : ide problems: 2.6.22-git17 working, 2.6.23-rc1* is not
References  : http://lkml.org/lkml/2007/7/27/298
  http://lkml.org/lkml/2007/7/29/371
Last known good : ?
Submitter   : dth <[EMAIL PROTECTED]>
Caused-By   : Len Brown <[EMAIL PROTECTED]>
  commit f79e3185dd0f8650022518d7624c876d8929061b
Handled-By  : Len Brown <[EMAIL PROTECTED]>
Status  : problem is being debugged



FS

Subject : [NFSD OOPS] 2.6.23-rc1-git10
References  : http://lkml.org/lkml/2007/8/2/462
Last known good : ?
Submitter   : Andrew Clayton <[EMAIL PROTECTED]>
Caused-By   : ?
Handled-By  : ?
Status  : unknown



MTD

Subject : error: implicit declaration of function 'cfi_interleave'
References  : http://lkml.org/lkml/2007/8/6/272
Last known good : ?
Submitter   : Ingo Molnar <[EMAIL PROTECTED]>
Caused-By   : ?
Handled-By  : ?
Status  : unknown



Networking

Subject : IP v4 routing is broken
References  : 
http://www.stardust.webpages.pl/files/tbf/bugs/bug_report01.txt
Last known good : 2.6.22-git2
Submitter   : Uwe Bugla <[EMAIL PROTECTED]>
Caused-By   : ?
Handled-By  : ?
Status  : unknown

Subject : New wake ups from sky2
References  : http://lkml.org/lkml/2007/7/20/386
Last known good : ?
Submitter   : Thomas Meyer <[EMAIL PROTECTED]>
Caused-By   : Stephen Hemminger <[EMAIL PROTECTED]>
  commit eb35cf60e462491249166182e3e755d3d5d91a28
Handled-By  : Stephen Hemminger <[EMAIL PROTECTED]>
Status  : unknown



Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread Jeff Garzik


Michael Buesch wrote:

writel doesn't guarantee flushing either.
readl does.



Not quite -- there are multiple kinds of flushing.  You're thinking 
about flushing across PCI bridges, which is correct, but you also have 
CPU write posting and CPU write ordering and such.


Without taking all that into account, you might be tempted to think that 
__raw_readl() will perform all flushes necessary following a 
__raw_writel() -- but that would be incorrect.


Jeff


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread akepner

On Wed, Aug 08, 2007 at 09:46:16AM -0700, Roland Dreier wrote:

> 
> Not mmiowb() -- that is for ordering between CPUs, eg on systems like
> Altix where PCI transactions might get reordered in the system fabric
> before reaching the PCI bus.
> 

Yes, that's right. This is a continual source of confusion. 
FWIW, Jesse Barnes documented mmiowb() in 
Documentation/DocBook/deviceiobook.tmpl 

-- 
Arthur

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread Roland Dreier

 > The barrier/ordering issue however might be a critical thing,
 > when using __raw_XXX. So one must always mmiowb() after such a write.

Not mmiowb() -- that is for ordering between CPUs, eg on systems like
Altix where PCI transactions might get reordered in the system fabric
before reaching the PCI bus.

You need a full wmb() to order between __raw_writel()s.

 - R.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread Michael Buesch

On Wednesday 08 August 2007 18:33:24 Jeff Garzik wrote:
> Michael Buesch wrote:
> > On Wednesday 08 August 2007 18:18:31 Roland Dreier wrote:
> >>  > But there are indeed a few cases that look wrong.
> >>
> >> yes...
> >>
> >>  > arch/x86_64/kernel/pci-calgary.c:   writel(cpu_to_be32(val), 
> >> target);
> >>
> >> eg this almost certainly wants to be
> >>
> >>writel(swab32(val), target);
> >>
> >> or something equivalent like
> >>
> >>__raw_writel(cpu_to_be32(val), target);
> >>/* plus some suffficent memory ordering */
> >>
> >>  - R.
> >>
> >>
> > 
> > certainly, yes.
> > Most likely the __raw_writel variant is portable, but I am not
> > sure. Anybody sure?
> 
> Yes, it's portable.  You must however be aware of the guarantees that 
> writel() provides and __raw_writel() does not:  no barriers or flushes, 
> no endian conversions, no ordering constraints, ...  Probably a few more 
> details I'm forgetting too :)

writel doesn't guarantee flushing either.
readl does.
The barrier/ordering issue however might be a critical thing,
when using __raw_XXX. So one must always mmiowb() after such a write.

-- 
Greetings Michael.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread Roland Dreier

 > Most likely the __raw_writel variant is portable, but I am not
 > sure. Anybody sure?

Yes, it should be fine.  I use that construct in a few IB drivers.

 - R.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 68] drivers/net/s2io.c: kmalloc + memset conversion to k[cz]alloc

2007-08-08 Thread Jeff Garzik


Ramkrishna Vepa wrote:

We have a few patches queued and can send this patch in along with ours.


That would be great, thanks.

Jeff



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread Jeff Garzik

Michael Buesch wrote:

On Wednesday 08 August 2007 18:18:31 Roland Dreier wrote:

 > But there are indeed a few cases that look wrong.

yes...

 > arch/x86_64/kernel/pci-calgary.c:   writel(cpu_to_be32(val), target);

eg this almost certainly wants to be

writel(swab32(val), target);

or something equivalent like

__raw_writel(cpu_to_be32(val), target);
/* plus some suffficent memory ordering */

 - R.

certainly, yes.
Most likely the __raw_writel variant is portable, but I am not
sure. Anybody sure?

Yes, it's portable.  You must however be aware of the guarantees that 
writel() provides and __raw_writel() does not:  no barriers or flushes, 
no endian conversions, no ordering constraints, ...  Probably a few more 
details I'm forgetting too :)

Jeff

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread Michael Buesch

On Wednesday 08 August 2007 18:18:31 Roland Dreier wrote:
>  > But there are indeed a few cases that look wrong.
> 
> yes...
> 
>  > arch/x86_64/kernel/pci-calgary.c:   writel(cpu_to_be32(val), target);
> 
> eg this almost certainly wants to be
> 
>   writel(swab32(val), target);
> 
> or something equivalent like
> 
>   __raw_writel(cpu_to_be32(val), target);
>   /* plus some suffficent memory ordering */
> 
>  - R.
> 
> 

certainly, yes.
Most likely the __raw_writel variant is portable, but I am not
sure. Anybody sure?

-- 
Greetings Michael.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC]: napi_struct V6

2007-08-08 Thread Jeff Garzik

Someone please make sure Documentation/networking/netdevices.txt remains 
correct with regards to APIs and locking.


It -is- up to date, unlike NAPI howto.  It should not change very much 
due to this napi_struct work though, if at all.


Jeff



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread Jeff Garzik


Roland Dreier wrote:

 > But there are indeed a few cases that look wrong.

yes...

 > arch/x86_64/kernel/pci-calgary.c:   writel(cpu_to_be32(val), target);

eg this almost certainly wants to be

writel(swab32(val), target);

or something equivalent like

__raw_writel(cpu_to_be32(val), target);
/* plus some suffficent memory ordering */


Precisely.  Some of those cases are "we know the underlying writel() 
swaps... we want to swap in this case anyway"


Jeff



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 68] drivers/net/s2io.c: kmalloc + memset conversion to k[cz]alloc

2007-08-08 Thread Ramkrishna Vepa

We have a few patches queued and can send this patch in along with ours.

Ram

> -Original Message-
> From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
> On Behalf Of Jeff Garzik
> Sent: Tuesday, August 07, 2007 2:58 PM
> To: Mariusz Kozlowski
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED];
Andrew
> Morton; netdev@vger.kernel.org
> Subject: Re: [PATCH 68] drivers/net/s2io.c: kmalloc + memset
conversion to
> k[cz]alloc
> 
> Mariusz Kozlowski wrote:
> > Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>
> >
> >  drivers/net/s2io.c | 235587 -> 235340 (-247 bytes)
> >  drivers/net/s2io.o | 460768 -> 460120 (-648 bytes)
> >
> >  drivers/net/s2io.c |   14 +-
> >  1 file changed, 5 insertions(+), 9 deletions(-)
> 
> ACK but didn't apply, please wait 24-48 hours (so that s2io fixes go
> upstream), then rediff and resend
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread Jeff Garzik


Andi Kleen wrote:

Jeff Garzik <[EMAIL PROTECTED]> writes:

+   val, reg_index, addr, addr+4); */
+   writel(cpu_to_le32(reg_index), addr);
+   writel(cpu_to_le32(val),(u8 *)addr + 4);

wrong -- endian conversion macros not needed with writel()


Are you sure?


Yes.

read[bwl] and write[bwl] are always defined in terms of the 
little-endian PCI bus.  This has been true since my first days in the 
kernel ages (decade+) ago, when we had a long discussion about it with 
regards to framebuffer drivers.


If you want to skip barriers and endian conversions, __raw_write[bwl]() 
exists.


The rare exceptions are a few embedded arches that implemented writel() 
for a non-PCI bus.  Those cases need to be renamed to mybus_writel(), 
but at least they do not interfere with mainstream drivers and APIs.



I don't think that's true. e.g. powerpc writel 
doesn't convert endian


Incorrect -- read the code.  PPC most certainly does convert endian.

Ten years ago Linus said something along the lines of "writel() means 
PCI means little endian.  period."  ;-)


Jeff


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread Roland Dreier

 > But there are indeed a few cases that look wrong.

yes...

 > arch/x86_64/kernel/pci-calgary.c:   writel(cpu_to_be32(val), target);

eg this almost certainly wants to be

writel(swab32(val), target);

or something equivalent like

__raw_writel(cpu_to_be32(val), target);
/* plus some suffficent memory ordering */

 - R.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFT 0/4] fib_trie cleanup patches

2007-08-08 Thread Robert Olsson

Stephen Hemminger writes:

 > I don't have a machine with anywhere near enough routes to test this,
 > so would someone with many routes give it a go and make sure nothing
 > got busted in the process.

 Hello! 

 It's not only the numbers of routes thats important... 

 Anyway I've done what can to verify that route selection (comparing
 lookups via netlink and userland longest-prefix-match using the same
 full routing table) and locking (concurrent rDoS on multiple CPU:s 
 also while loading the full routing table) is intact.

 Keep TKEY_GET_MASK in patch #2 as it gives some hint whats going on.

 And how about Paul E. McKenney comment about rcu_dereference() 
 covering the initial fetch?

 BTW. Right now the lab is setup for tests described above...

 Cheers.

--ro

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ixgbe: New driver for Pci-Express 10GbE 82598 support

2007-08-08 Thread Kok, Auke


Auke Kok wrote:

This patch adds support for the Intel 82598 PCI-Express 10GbE
chipset. Devices will be available on the market soon.

This version of the driver is largely the same as the last release:

* Driver uses a single RX and single TX queue, each using 1 MSI-X
  irq vector.
* Driver runs in NAPI mode only
* Driver is largely multiqueue-ready (TM)



For those interested in the specs of the product and all the features that it 
supports, I just noticed that the product brief was posted on our website with 
all the features and facts on the 82598-based products. Please take a look:



Overview:
  http://www.intel.com/design/network/products/lan/controllers/82598.htm

Product brief:
  http://download.intel.com/design/network/prodbrf/317796.pdf


Cheers,

Auke
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: TCP's initial cwnd setting correct?...

2007-08-08 Thread John Heffner

I believe the current calculation is correct.  The RFC specifies a 
window of no more than 4380 bytes unless 2*MSS > 4380.  If you change 
the code in this way, then MSS=1461 will give you an initial window of 
3*MSS == 4383, violating the spec.  Reading the pseudocode in the RFC 
3390 is a bit misleading because they use a clamp at 4380 bytes rather 
than use a multiplier in the relevant range.


  -John


David Miller wrote:

From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Mon, 6 Aug 2007 15:37:15 +0300 (EEST)


@@ -805,13 +805,13 @@ void tcp_update_metrics(struct sock *sk)
}
 }
 
-/* Numbers are taken from RFC2414.  */

+/* Numbers are taken from RFC3390.  */
 __u32 tcp_init_cwnd(struct tcp_sock *tp, struct dst_entry *dst)
 {
__u32 cwnd = (dst ? dst_metric(dst, RTAX_INITCWND) : 0);
 
 	if (!cwnd) {

-   if (tp->mss_cache > 1460)
+   if (tp->mss_cache >= 2190)
cwnd = 2;
else
cwnd = (tp->mss_cache > 1095) ? 3 : 4;


I remember suggesting something similar about 5 or 6 years
ago and Alexey Kuznetsov at the time explained the numbers
which are there and why they should not be changed.

I forget the reasons though, and I'll try to do the research.

These numbers have been like this forever, FWIW.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC]: napi_struct V5

2007-08-08 Thread jamal

On Wed, 2007-08-08 at 08:14 -0700, Shirley Ma wrote:

> Dave, could you please hold this portion of the patch for a moment. I
> will test this patch ASAP. According to our previous experience, this
> changes significant changes some IPoIB driver performance.
> 
If you adjust your quantum while doing that testing you may find an
optimal value. 
Think of a box where you have other network interfaces, the way you
are implementing currently implies you are going to be very unfair to 
the other interfaces on the box. 

cheers,
jamal  


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: TCP's initial cwnd setting correct?...

2007-08-08 Thread John Heffner


That sounds right to me.

  -John


Ilpo Järvinen wrote:

On Mon, 6 Aug 2007, Ilpo Järvinen wrote:

...Goto logic could be cleaner (somebody has any suggestion for better 
way to structure it?)


...I could probably move the setting of snd_cwnd earlier to avoid 
this problem if this seems a valid fix at all.




-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 3/14] nes: connection manager routines

2007-08-08 Thread Glenn Grundstrom

This code is far from a TCP stack.  It's main purpose is to exchange
RDMA MPA request/response messages that are required by the iWarp specs
and therefore needed by our hardware.  All RNIC hardware vendors need
this MPA message exchange, including those already accepted into
kernel.org.  Do you have an alternative suggestion?

Thanks,
Glenn.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andi Kleen
Sent: Wednesday, August 08, 2007 7:41 AM
To: Glenn Grundstrom
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; netdev@vger.kernel.org
Subject: Re: [PATCH 3/14] nes: connection manager routines

[EMAIL PROTECTED] writes:

> NetEffect connection manager routines.

This seems more like a new TCP stack. I don't think such code is
appropiate, since Linux already has one.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/9 Rev3] [sched] Modify qdisc_run to support batching

2007-08-08 Thread jamal

On Wed, 2007-08-08 at 16:05 +0200, Patrick McHardy wrote:

> This is not completely avoidable of
> course, but you can and should limit the damage by only pulling
> out as much packets as the driver can take and have the driver
> stop the queue afterwards.

This why the dev->xmit_win exists in my patches. The driver says how
much space it has using this variable - and only those packets get
pulled off; so there is no conflict with any scheduling algorithm be it
work/non-work conserving. 
The ideal case is never to carry over anything in dev->blist. However,
there is a challenge with GSO/TSO: if say the descriptors in the driver
are less than the list of skbs, you are faced with the dilema of sending
a fraction of the gso list first.
My current approach is to transmit as much as the driver would allow.
On the next opportunity to transmit, you do immediately send anything
remaining first before you ask the qdisc for more packets. An
alternative approach i had entertained was not to send anything from the
skb list when hitting such a case, but it hasnt seem needed so far
(havent seen any leftovers from my experiments).

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-08 Thread jamal

On Wed, 2007-08-08 at 21:42 +0800, Herbert Xu wrote:
> On Wed, Aug 08, 2007 at 03:49:00AM -0700, David Miller wrote:
> > 
> > Not because I think it obviates your work, but rather because I'm
> > curious, could you test a TSO-in-hardware driver converted to
> > batching and see how TSO alone compares to batching for a pure
> > TCP workload?
> 
> You could even lower the bar by disabling TSO and enabling
> software GSO.

>From my observation for TCP packets slightly above MDU (upto 2K), GSO
gives worse performance than non-GSO throughput-wise. Actually this has
nothing to do with batching, rather the behavior is consistent with or
without batching changes. 

> > I personally don't think it will help for that case at all as
> > TSO likely does better job of coalescing the work _and_ reducing
> > bus traffic as well as work in the TCP stack.
> 
> I agree.
> I suspect the bulk of the effort is in getting
> these skb's created and processed by the stack so that by
> the time that they're exiting the qdisc there's not much
> to be saved anymore.

pktgen shows a clear win if you test the driver path - which is what you
should test because thats where the batching changes are. 
Using TCP or UDP adds other variables[1] that need to be isolated first
in order to quantify the effect of batching. 
For throughput and CPU utilization, the benefit will be clear when there
are a lot more flows. 

cheers,
jamal

[1] I think there are too many other variables in play unfortunately
when you are dealing with a path that starts above the driver and one
that covers end to end effect:  traffic/app source, system clock sources
as per my recent discovery, congestion control algorithms used, tuning
of recevier etc.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 6/14] nes: hardware init

2007-08-08 Thread Glenn Grundstrom

Jeff,

Thanks for the input.  I'll take your suggestions into account for the
patch v2 posting.

Glenn.

-Original Message-
From: Jeff Garzik [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 07, 2007 8:58 PM
To: Glenn Grundstrom
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; netdev@vger.kernel.org
Subject: Re: [PATCH 6/14] nes: hardware init

[EMAIL PROTECTED] wrote:
> +struct nes_adapter *nes_init_adapter(struct nes_device *nesdev, u8
hw_rev) {
> + struct nes_adapter *nesadapter = NULL;
> + unsigned long num_pds;
> + u32 u32temp;
> + u32 port_count;
> + u16 max_rq_wrs;
> + u16 max_sq_wrs;
> + u32 max_mr;
> + u32 max_256pbl;
> + u32 max_4kpbl;
> + u32 max_qp;
> + u32 max_irrq;
> + u32 max_cq;
> + u32 hte_index_mask;
> + u32 adapter_size;
> + u32 arp_table_size;
> + u8  OneG_Mode;
> +
> + /* search the list of existing adapters */
> + list_for_each_entry(nesadapter, &nes_adapter_list, list) {
> + dprintk("Searching Adapter list for PCI devfn = 0x%X,"
> + " adapter PCI slot/bus = %u/%u, pci
devices PCI slot/bus = %u/%u, .\n",
> + nesdev->pcidev->devfn,
> + PCI_SLOT(nesadapter->devfn),
> + nesadapter->bus_number,
> + PCI_SLOT(nesdev->pcidev->devfn),
> + nesdev->pcidev->bus->number );
> + if ((PCI_SLOT(nesadapter->devfn) ==
PCI_SLOT(nesdev->pcidev->devfn)) &&
> + (nesadapter->bus_number ==
nesdev->pcidev->bus->number)) {
> + nesadapter->ref_count++;
> + return(nesadapter);

you don't need any of this PCI bus scanning at all.  Please convert to
normal PCI usage



> + /* no adapter found */
> + num_pds = pci_resource_len(nesdev->pcidev, BAR_1) / 4096;

see, this is why the BAR_1 define should go away -- it's actually define

to the value '2'


> + if (hw_rev != NE020_REV) {
> + dprintk("%s: NE020 driver detected unknown hardware
revision 0x%x\n",
> + __FUNCTION__, hw_rev);
> + return(NULL);
> + }

move this test to the top of the function


> + dprintk("%s:%u Determine Soft Reset, QP_control=0x%x, CPU0=0x%x,
CPU1=0x%x, CPU2=0x%x\n",
> + __FUNCTION__, __LINE__,
> + nes_read_indexed(nesdev, NES_IDX_QP_CONTROL +
PCI_FUNC(nesdev->pcidev->devfn) * 8),
> + nes_read_indexed(nesdev,
NES_IDX_INT_CPU_STATUS),
> + nes_read_indexed(nesdev, 0x00A4),
> + nes_read_indexed(nesdev, 0x00A8));
> +
> + dprintk("%s: Reset and init NE020\n", __FUNCTION__);
> + if ((port_count = nes_reset_adapter_ne020(nesdev,
&OneG_Mode)) == 0) {
> + return(NULL);
> + }
> + if (nes_init_serdes(nesdev, port_count)) {
> + return(NULL);
> + }

kill braces


> + nes_init_csr_ne020(nesdev, hw_rev, port_count);
> +
> + /* Setup and enable the periodic timer */
> + nesdev->et_rx_coalesce_usecs_irq = interrupt_mod_interval;
> + if (nesdev->et_rx_coalesce_usecs_irq) {
> + nes_write32(nesdev->regs+NES_PERIODIC_CONTROL,
0x8000 |
> + ((u32)(nesdev->et_rx_coalesce_usecs_irq
* 8)));
> + } else {
> + nes_write32(nesdev->regs+NES_PERIODIC_CONTROL,
0x);
> + }
> +
> + max_qp = nes_read_indexed(nesdev, NES_IDX_QP_CTX_SIZE);
> + dprintk("%s: QP_CTX_SIZE=%u\n", __FUNCTION__, max_qp);
> +
> + u32temp = nes_read_indexed(nesdev,
NES_IDX_QUAD_HASH_TABLE_SIZE);
> + if (max_qp > ((u32)1 << (u32temp & 0x001f))) {
> + dprintk("Reducing Max QPs to %u due to hash table size =
0x%08X\n",
> + max_qp, u32temp);
> + max_qp = (u32)1 << (u32temp & 0x001f);
> + }
> +
> + hte_index_mask = ((u32)1 << ((u32temp & 0x001f)+1))-1;
> + dprintk("Max QP = %u, hte_index_mask = 0x%08X.\n", max_qp,
hte_index_mask);
> +
> + u32temp = nes_read_indexed(nesdev, NES_IDX_IRRQ_COUNT);
> +
> + max_irrq = 1 << (u32temp & 0x001f);
> +
> + if (max_qp > max_irrq) {
> + max_qp = max_irrq;
> + dprintk("Reducing Max QPs to %u due to Available
Q1s.\n", max_qp);
> + }
> +
> + /* there should be no reason to allocate more pds than qps */
> + if (num_pds > max_qp)
> + num_pds = max_qp;
> +
> + u32temp = nes_read_indexed(nesdev, NES_IDX_MRT_SIZE);
> + max_mr = (u32)8192 << (u32temp & 0x7);
> +
> + u32temp = nes_read_indexed(nesdev, NES_IDX_PBL_REGION_SIZE);
> + max_256pbl = (u32)1 << (u32temp & 0x001f);
> + max_4kpbl = (u32)1 << ((u32temp >> 16) & 0x001f);
> + max_cq = nes_read_indexed(nesdev, NES_IDX_CQ_CTX_SIZE);
> +
> + u32temp = nes_read_indexed(nesdev, NES_ID

Re: [PATCH 3/9 Rev3] [sched] Modify qdisc_run to support batching

2007-08-08 Thread Patrick McHardy

Krishna Kumar wrote:
> + * Algorithm to get skb(s) is:
> + *   - Non batching drivers, or if the batch list is empty and there is
> + * 1 skb in the queue - dequeue skb and put it in *skbp to tell the
> + * caller to use the single xmit API.
> + *   - Batching drivers where the batch list already contains atleast one
> + * skb, or if there are multiple skbs in the queue: keep dequeue'ing
> + * skb's upto a limit and set *skbp to NULL to tell the caller to use
> + * the multiple xmit API.
> + *
> + * Returns:
> + *   1 - atleast one skb is to be sent out, *skbp contains skb or NULL
> + *   (in case >1 skbs present in blist for batching)
> + *   0 - no skbs to be sent.
> + */
> +static inline int get_skb(struct net_device *dev, struct Qdisc *q,
> +   struct sk_buff_head *blist, struct sk_buff **skbp)
> +{
> + if (likely(!blist || (!skb_queue_len(blist) && qdisc_qlen(q) <= 1))) {
> + return likely((*skbp = dev_dequeue_skb(dev, q)) != NULL);
> + } else {
> + int max = dev->tx_queue_len - skb_queue_len(blist);
> + struct sk_buff *skb;
> +
> + while (max > 0 && (skb = dev_dequeue_skb(dev, q)) != NULL)
> + max -= dev_add_skb_to_blist(skb, dev);
> +
> + *skbp = NULL;
> + return 1;   /* we have atleast one skb in blist */
> + }
> +}

I think you missed my previous reply to this in the flood of
responses (or I missed your reply), so I'm copying it below:

The entire idea of a single queue after qdiscs that is refilled
independantly of transmissions times etc. make be worry a bit.
By changing the timing you're effectively changing the qdiscs
behaviour, at least in some cases. SFQ is a good example, but I
believe it affects most work-conserving qdiscs. Think of this
situation:

100 packets of flow 1 arrive
50 packets of flow 1 are sent
100 packets for flow 2 arrive
remaining packets are sent

On the wire you'll first see 50 packets of flow 1, than 100 packets
alternate of flow 1 and 2, then 50 packets flow 2.

With your additional queue all packets of flow 1 are pulled out of
the qdisc immediately and put in the fifo. When the 100 packets of
the second flow arrive they will also get pulled out immediately
and are put in the fifo behind the remaining 50 packets of flow 1.
So what you get on the wire is:

100 packets of flow 1
100 packets of flow 1

So SFQ is without any effect. This is not completely avoidable of
course, but you can and should limit the damage by only pulling
out as much packets as the driver can take and have the driver
stop the queue afterwards.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread Michael Buesch

On Wednesday 08 August 2007 15:48:25 Michael Buesch wrote:
> > However if that's true then there are quite some drivers wrong:
> > 
> > % grep -r write[bwl]\(cpu_to *   | wc -l
> > 57
> 
> Yeah, seems so.

Most of them seem to be
 __raw_writew(cpu_toXX(...

which I think might be valid.
But there are indeed a few cases that look wrong.

arch/mips/pci/ops-bonito64.c:   writel(cpu_to_le32(*data), addrp);
arch/x86_64/kernel/pci-calgary.c:   writel(cpu_to_be32(val), target);
arch/x86_64/kernel/pci-calgary.c:   writel(cpu_to_be32(val), target);
arch/x86_64/kernel/pci-calgary.c:   writel(cpu_to_be32(val32), 
target);
arch/x86_64/kernel/pci-calgary.c:   writel(cpu_to_be32(val), target);
arch/x86_64/kernel/pci-calgary.c:   writel(cpu_to_be32(val32), target);
arch/x86_64/kernel/pci-calgary.c:   writel(cpu_to_be32(val32), target);
drivers/atm/fore200e.c:writel(cpu_to_le32(val), addr);
drivers/net/starfire.c: writew(cpu_to_be16(eaddrs[2]), setup_frm); 
setup_frm += 4;
drivers/net/starfire.c: writew(cpu_to_be16(eaddrs[1]), setup_frm); 
setup_frm += 4;
drivers/net/starfire.c: writew(cpu_to_be16(eaddrs[0]), setup_frm); 
setup_frm += 8;
drivers/net/starfire.c: writew(cpu_to_be16(i), 
filter_addr);
drivers/net/starfire.c: writew(cpu_to_be16(eaddrs[2]), 
filter_addr); filter_addr += 4;
drivers/net/starfire.c: writew(cpu_to_be16(eaddrs[1]), 
filter_addr); filter_addr += 4;
drivers/net/starfire.c: writew(cpu_to_be16(eaddrs[0]), 
filter_addr); filter_addr += 8;
drivers/net/starfire.c: writew(cpu_to_be16(eaddrs[0]), 
filter_addr); filter_addr += 4;
drivers/net/starfire.c: writew(cpu_to_be16(eaddrs[1]), 
filter_addr); filter_addr += 4;
drivers/net/starfire.c: writew(cpu_to_be16(eaddrs[2]), 
filter_addr); filter_addr += 8;
drivers/net/starfire.c: writew(cpu_to_be16(eaddrs[0]), 
filter_addr); filter_addr += 4;
drivers/net/starfire.c: writew(cpu_to_be16(eaddrs[1]), 
filter_addr); filter_addr += 4;
drivers/net/starfire.c: writew(cpu_to_be16(eaddrs[2]), 
filter_addr); filter_addr += 8;
drivers/net/hamachi.c:  writel(cpu_to_le64(hmp->rx_ring_dma), ioaddr + RxPtr);
drivers/net/hamachi.c:  writel(cpu_to_le64(hmp->rx_ring_dma) >> 32, ioaddr + 
RxPtr + 4);
drivers/net/hamachi.c:  writel(cpu_to_le64(hmp->tx_ring_dma), ioaddr + TxPtr);
drivers/net/hamachi.c:  writel(cpu_to_le64(hmp->tx_ring_dma) >> 32, ioaddr + 
TxPtr + 4);
drivers/net/hamachi.c:  writel(cpu_to_le32(hmp->rx_ring_dma), ioaddr + RxPtr);
drivers/net/hamachi.c:  writel(cpu_to_le32(hmp->tx_ring_dma), ioaddr + TxPtr);
drivers/net/tokenring/lanstreamer.c:
writel(cpu_to_le32(pci_map_single(streamer_priv->pci_dev, rx_ring, 512, 
PCI_DMA_FROMDEVICE)),
drivers/net/tokenring/lanstreamer.c:
writel(cpu_to_le32(pci_map_single(streamer_priv->pci_dev, 
&streamer_priv->streamer_rx_ring[0],
drivers/net/tokenring/lanstreamer.c:
writel(cpu_to_le32(pci_map_single(streamer_priv->pci_dev, 
&streamer_priv->streamer_rx_ring[STREAMER_RX_RING_SIZE - 1],
drivers/net/tokenring/lanstreamer.c:
writel(cpu_to_le32(pci_map_single(streamer_priv->pci_dev, 
drivers/net/tokenring/lanstreamer.c:
writel(cpu_to_le32(pci_map_single(streamer_priv->pci_dev, 
drivers/net/tokenring/lanstreamer.c:
writel(cpu_to_le32(pci_map_single(streamer_priv->pci_dev, 
drivers/net/via-velocity.c: writel(cpu_to_le32(vptr->rd_pool_dma), 
®s->RDBaseLo);
drivers/net/via-velocity.c: 
writel(cpu_to_le32(vptr->td_pool_dma[i]), &(regs->TDBaseLo[i]));
drivers/scsi/nsp32_io.h:writew(cpu_to_le16(val), ptr);
drivers/scsi/nsp32_io.h:writel(cpu_to_le32(val), ptr);
drivers/scsi/nsp32_io.h:writew(cpu_to_le16(val), data_ptr );
drivers/block/umem.c:   writel(cpu_to_le32((page->page_dma+offset)&0x),
drivers/block/umem.c:   writel(cpu_to_le32(((u64)page->page_dma)>>32),
drivers/block/umem.c:   writel(cpu_to_le32(DMASCR_GO | DMASCR_CHAIN_EN | 
pci_cmds),
drivers/block/umem.c:   
writel(cpu_to_le32(DMASCR_DMA_COMPLETE|DMASCR_CHAIN_COMPLETE),

-- 
Greetings Michael.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread Michael Buesch

On Wednesday 08 August 2007 15:38:50 Andi Kleen wrote:
> On Wed, Aug 08, 2007 at 03:28:33PM +0200, Michael Buesch wrote:
> > On Wednesday 08 August 2007 15:08:28 Andi Kleen wrote:
> > > On Wed, Aug 08, 2007 at 03:02:35PM +0200, Michael Buesch wrote:
> > > > On Wednesday 08 August 2007 14:55:11 Andi Kleen wrote:
> > > > > On Wed, Aug 08, 2007 at 01:50:35PM +0200, Michael Buesch wrote:
> > > > > > On Wednesday 08 August 2007 14:38:10 Andi Kleen wrote:
> > > > > > > Jeff Garzik <[EMAIL PROTECTED]> writes:
> > > > > > > > > + val, reg_index, addr, addr+4); */
> > > > > > > > > + writel(cpu_to_le32(reg_index), addr);
> > > > > > > > > + writel(cpu_to_le32(val),(u8 *)addr + 4);
> > > > > > > > 
> > > > > > > > wrong -- endian conversion macros not needed with writel()
> > 
> > > > Fact is that we do _not_ need cpu_to_le32 with writel.
> > > 
> > > We do on a big endian platform if the register is LE. I assume that's the 
> > > case 
> > > on this hardware.
> > 
> > That is not true.
> > writeX does automatically convert to bus-endian.
> > Which, in case of the PCI bus, is little endian.
> > So if your register is LE (which it is most likely), you don't
> > need any conversion. If your register is BE (which I very much doubt),
> > then you need swab32().
> > In _no_ case you need cpu_to_xx().
> 
> Hmm, I checked a couple of BE architectures and none seem to convert 
> explicitely.

That depends on the arch.
Look at this from ppc, for example:

184 static inline void writel(__u32 b, volatile void __iomem *addr)
185 {
186 out_le32(addr, b);
187 }

out_le32 writes a little endian value. So it converts it.
Also note that b is __u32. Sparse would complain, if someone used cpu_to_xx
on it.

> But ok it's possible that their PCI bridges convert. I'll take your
> word for it although it sounds somewhat dubious.
>
> However if that's true then there are quite some drivers wrong:
> 
> % grep -r write[bwl]\(cpu_to *   | wc -l
> 57

Yeah, seems so.
Here comes an example (with 16bit values)

Little endian register
LittleEndian arch   BigEngian arch
value   0xBBAA  0xAABB
writew  0xBBAA  0xAABB
on bus  0xBBAA  0xBBAA
on dev  0xBBAA  0xBBAA

Big endian register
LittleEndian arch   BigEngian arch
value   0xBBAA  0xAABB
swab16  0xAABB  0xBBAA
writew  0xAABB  0xBBAA
on bus  0xAABB  0xAABB
on dev  0xAABB  0xAABB

I hope I got it right. :)
Same result on every arch.

Most hardware has little endian registers. Some can switch
endianess based on some bit, too. So in case of a BE register that
bit has to be flipped, or if not possible swabX() has to be used.

-- 
Greetings Michael.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-08 Thread Herbert Xu

On Wed, Aug 08, 2007 at 03:49:00AM -0700, David Miller wrote:
> 
> Not because I think it obviates your work, but rather because I'm
> curious, could you test a TSO-in-hardware driver converted to
> batching and see how TSO alone compares to batching for a pure
> TCP workload?

You could even lower the bar by disabling TSO and enabling
software GSO.

> I personally don't think it will help for that case at all as
> TSO likely does better job of coalescing the work _and_ reducing
> bus traffic as well as work in the TCP stack.

I agree.  I suspect the bulk of the effort is in getting
these skb's created and processed by the stack so that by
the time that they're exiting the qdisc there's not much
to be saved anymore.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] drivers/net/wireless/wl3501_cs.c: remove redundant memset

2007-08-08 Thread John W. Linville

On Wed, Aug 08, 2007 at 07:58:21AM +0200, Mariusz Kozlowski wrote:
> > Please send wireless patches to [EMAIL PROTECTED],
> > and CC me.
> 
> Ok. Did you pick the patch up?

Yes, I have it.

John
-- 
John W. Linville
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread Andi Kleen

On Wed, Aug 08, 2007 at 03:28:33PM +0200, Michael Buesch wrote:
> On Wednesday 08 August 2007 15:08:28 Andi Kleen wrote:
> > On Wed, Aug 08, 2007 at 03:02:35PM +0200, Michael Buesch wrote:
> > > On Wednesday 08 August 2007 14:55:11 Andi Kleen wrote:
> > > > On Wed, Aug 08, 2007 at 01:50:35PM +0200, Michael Buesch wrote:
> > > > > On Wednesday 08 August 2007 14:38:10 Andi Kleen wrote:
> > > > > > Jeff Garzik <[EMAIL PROTECTED]> writes:
> > > > > > > > +   val, reg_index, addr, addr+4); */
> > > > > > > > +   writel(cpu_to_le32(reg_index), addr);
> > > > > > > > +   writel(cpu_to_le32(val),(u8 *)addr + 4);
> > > > > > > 
> > > > > > > wrong -- endian conversion macros not needed with writel()
> 
> > > Fact is that we do _not_ need cpu_to_le32 with writel.
> > 
> > We do on a big endian platform if the register is LE. I assume that's the 
> > case 
> > on this hardware.
> 
> That is not true.
> writeX does automatically convert to bus-endian.
> Which, in case of the PCI bus, is little endian.
> So if your register is LE (which it is most likely), you don't
> need any conversion. If your register is BE (which I very much doubt),
> then you need swab32().
> In _no_ case you need cpu_to_xx().

Hmm, I checked a couple of BE architectures and none seem to convert 
explicitely.
But ok it's possible that their PCI bridges convert. I'll take your
word for it although it sounds somewhat dubious.

However if that's true then there are quite some drivers wrong:

% grep -r write[bwl]\(cpu_to *   | wc -l
57

-Andi

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-08 Thread Evgeniy Polyakov

On Wed, Aug 08, 2007 at 02:17:09PM +0400, Evgeniy Polyakov ([EMAIL PROTECTED]) 
wrote:
> This throttling mechanism allows to limit maximum amount of queued bios 
> per physical device. By default it is turned off and old block layer 
> behaviour with unlimited number of bios is used. When turned on (queue
> limit is set to something different than -1U via blk_set_queue_limit()),
> generic_make_request() will sleep until there is room in the queue.
> number of bios is increased in generic_make_request() and reduced either
> in bio_endio(), when bio is completely processed (bi_size is zero), and
> recharged from original queue when new device is assigned to bio via
> blk_set_bdev(). All oerations are not atomic, since we do not care about
> precise number of bios, but a fact, that we are close or close enough to
> the limit.
> 
> Tested on distributed storage device - with limit of 2 bios it works
> slow :)

As addon I can cook up a patch to configure this via sysfs if needed.
Thoughts?

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread Michael Buesch

On Wednesday 08 August 2007 15:08:28 Andi Kleen wrote:
> On Wed, Aug 08, 2007 at 03:02:35PM +0200, Michael Buesch wrote:
> > On Wednesday 08 August 2007 14:55:11 Andi Kleen wrote:
> > > On Wed, Aug 08, 2007 at 01:50:35PM +0200, Michael Buesch wrote:
> > > > On Wednesday 08 August 2007 14:38:10 Andi Kleen wrote:
> > > > > Jeff Garzik <[EMAIL PROTECTED]> writes:
> > > > > > > + val, reg_index, addr, addr+4); */
> > > > > > > + writel(cpu_to_le32(reg_index), addr);
> > > > > > > + writel(cpu_to_le32(val),(u8 *)addr + 4);
> > > > > > 
> > > > > > wrong -- endian conversion macros not needed with writel()

> > Fact is that we do _not_ need cpu_to_le32 with writel.
> 
> We do on a big endian platform if the register is LE. I assume that's the 
> case 
> on this hardware.

That is not true.
writeX does automatically convert to bus-endian.
Which, in case of the PCI bus, is little endian.
So if your register is LE (which it is most likely), you don't
need any conversion. If your register is BE (which I very much doubt),
then you need swab32().
In _no_ case you need cpu_to_xx().

-- 
Greetings Michael.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 6/9 Rev3] [IPoIB] CM & Multicast changes

2007-08-08 Thread Krishna Kumar

IPoIB CM & Multicast changes based on header file changes.

Signed-off-by: Krishna Kumar <[EMAIL PROTECTED]>
---
 ipoib_cm.c|   13 +
 ipoib_multicast.c |4 ++--
 2 files changed, 11 insertions(+), 6 deletions(-)

diff -ruNp ORG/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
NEW/drivers/infiniband/ulp/ipoib/ipoib_cm.c
--- ORG/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-07-17 08:48:35.0 
+0530
+++ NEW/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-08-07 13:11:19.0 
+0530
@@ -493,14 +493,19 @@ static inline int post_send(struct ipoib
unsigned int wr_id,
u64 addr, int len)
 {
+   int ret;
struct ib_send_wr *bad_wr;
 
-   priv->tx_sge.addr = addr;
-   priv->tx_sge.length   = len;
+   priv->tx_sge[0].addr  = addr;
+   priv->tx_sge[0].length= len;
+
+   priv->tx_wr[0].wr_id  = wr_id;
 
-   priv->tx_wr.wr_id = wr_id;
+   priv->tx_wr[0].next = NULL;
+   ret = ib_post_send(tx->qp, priv->tx_wr, &bad_wr);
+   priv->tx_wr[0].next = &priv->tx_wr[1];
 
-   return ib_post_send(tx->qp, &priv->tx_wr, &bad_wr);
+   return ret;
 }
 
 void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct 
ipoib_cm_tx *tx)
diff -ruNp ORG/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 
NEW/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
--- ORG/drivers/infiniband/ulp/ipoib/ipoib_multicast.c  2007-07-12 
08:55:06.0 +0530
+++ NEW/drivers/infiniband/ulp/ipoib/ipoib_multicast.c  2007-08-07 
13:11:19.0 +0530
@@ -217,7 +217,7 @@ static int ipoib_mcast_join_finish(struc
if (!memcmp(mcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
sizeof (union ib_gid))) {
priv->qkey = be32_to_cpu(priv->broadcast->mcmember.qkey);
-   priv->tx_wr.wr.ud.remote_qkey = priv->qkey;
+   priv->tx_wr[0].wr.ud.remote_qkey = priv->qkey;
}
 
if (!test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) {
@@ -736,7 +736,7 @@ out:
}
}
 
-   ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN);
+   ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN, 1);
}
 
 unlock:
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] stuff from tcp-2.6 partially merged to upcoming net-2.6.24?

2007-08-08 Thread Ilpo Järvinen

On Wed, 8 Aug 2007, David Miller wrote:

> From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
> Date: Tue, 7 Aug 2007 16:19:58 +0300 (EEST)
> 
> > Do you have any suggestion how I should proceed? Or do you perhaps object 
> > such partial merge completely? ...I could try to come up with a cleaned up 
> > patch series which has original and their bug fix parts combined to a 
> > single patch per change (would provide cleaner history and shouldn't be 
> > very hard to do either)...
> 
> Thank you for doing the rebase.  I agree with you that we should
> seperate out as much of the non-rb-tree stuff as possible and put it
> into net-2.6.24

...It would also help a lot in case we at some point of time in the future 
decide to merge rb-tree stuff but then have to back up to have it merged 
in a cleaner state which is easier to revert than with cleanups 
combined... Besides once I get some additional bits done (haven't yet 
dared to do L-bit cascade surgery which will fix timedout loop entry 
bugs and allow dropping of retrans hint counter), long enough mm sit would 
be nice route to rbtree stuff since there it's getting at least bit more
versitile testing that we alone can put to it...

> I'll will try hard to make time to look at your rebase and try to move
> things forward. 

...I case you want to validate them, git-patch-id helps to exclude 
identical changes between before and after allowing you to spend more
time on things that required non-trivial resolution...

> I've been all-consumed by the NAPI work and a driver I've been writing 
> in the background for the past few weeks.

...I can try to help... I suggest that as first step we take all changes 
that do not cause any conflicts (I've already tried cherry-picks), only 
thing I'm not sure whether I should combine change+fix parts or keep 
them as they were in tcp-2.6 (I suggest we combine them but you may 
disagree..?). I can either post them as series or give you them in 
--pretty=oneline format if you want to cherry-pick them yourself (in 
that case having a common tree would help a bit as commit ids would
match as well then)).

However, at least highest_sack inclusion (no space loss as one hint 
counter can be then dropped and allows accurate fackets out which I've 
partially coded already) will need also SACK-block validator. But I
basically have to redo as the validator patch was sacktag reorganization 
based previously (I've already found couple of improvements too to it's
accuracy :-))...

-- 
 i.

Re: [PATCH 2/14] nes: device structures and defines

2007-08-08 Thread Andi Kleen

On Wed, Aug 08, 2007 at 03:02:35PM +0200, Michael Buesch wrote:
> On Wednesday 08 August 2007 14:55:11 Andi Kleen wrote:
> > On Wed, Aug 08, 2007 at 01:50:35PM +0200, Michael Buesch wrote:
> > > On Wednesday 08 August 2007 14:38:10 Andi Kleen wrote:
> > > > Jeff Garzik <[EMAIL PROTECTED]> writes:
> > > > > > +   val, reg_index, addr, addr+4); */
> > > > > > +   writel(cpu_to_le32(reg_index), addr);
> > > > > > +   writel(cpu_to_le32(val),(u8 *)addr + 4);
> > > > > 
> > > > > wrong -- endian conversion macros not needed with writel()
> > > > 
> > > > Are you sure? I don't think that's true. e.g. powerpc writel 
> > > > doesn't convert endian
> > > 
> > > Andi, you are wrong.
> > > writeX take CPU endian args.
> > 
> > That is what I wrote.
> 
> Uhm, so
> Why did you complain to Jeff?

Because Jeff wrote the opposite.

> I'm a little bit confused now :)
> 
> Fact is that we do _not_ need cpu_to_le32 with writel.

We do on a big endian platform if the register is LE. I assume that's the case 
on this hardware.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

1 2 >

1 - 100 of 151 matches

Mail list logo