Re: 2.6.19 and up to 2.6.20-rc2 Ethernet problems x86_64

2007-01-05 Thread Sid Boyce

Len Brown wrote:
..same problem with 2.6.20-rc3. Last worked with 
2.6.19-rc6-git12, so it was 2.6.19 where it failed.




  
Attaching both case1 normal, case2 acpi=noirq. With acpi=noirq ethernet 
doesn't get configured, route -n says it's an Unsupported operation, 
ifconfig only shows for localhost, ifconfig eth0 192.168.10.5 also 
complains of a config error.



It seems that the "acpi=noirq" (and probably also the acpi=off) case
is simply an additional broken case, not a success case to compare to.

The thing we really want to compare is dmesg and /proc/interrupts
from 2.6.19-rc6-git12, and the broken current release.
Perhaps you can put that info in the bug report when you file it.

thanks,
-Len



  
2.6.19-rc6-git12 fails in exactly the same way, from /var/log/messages 
it seems 2.6.19-rc6 19/11/06 first saw the problem, details later when I 
boot 2.6.19-rc5.
If I boot an affected kernel with the SuSEfirewall2 enabled and then 
stop the firewall, the problems goes away.

Regards
Sid.

--
Sid Boyce ... Hamradio License G3VBV, Licensed Private Pilot
Emeritus IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support Specialist, 
Cricket Coach
Microsoft Windows Free Zone - Linux used for all Computing Tasks


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-05 Thread Herbert Xu
On Fri, Jan 05, 2007 at 07:38:44AM +0100, Jarek Poplawski wrote:
> 
> I'd only suggest to change "goto out;" to
> "return NULL;" at the end of inetdev_init because
> now RCU is engaged unnecessarily.

I agree.  The RCU assignment should come before the out label.
Can you send a patch?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] devinet: inetdev_init out label moved after RCU assignment

2007-01-05 Thread Jarek Poplawski
On Fri, Jan 05, 2007 at 08:38:47PM +1100, Herbert Xu wrote:
> On Fri, Jan 05, 2007 at 07:38:44AM +0100, Jarek Poplawski wrote:
> > 
> > I'd only suggest to change "goto out;" to
> > "return NULL;" at the end of inetdev_init because
> > now RCU is engaged unnecessarily.
> 
> I agree.  The RCU assignment should come before the out label.
> Can you send a patch?

Why me? (I didn't spoil this!)

Cheers,
Jarek P.

PS: should be applied after David's Stevens patch from 2007.01.04
---

Subject: [PATCH] devinet: inetdev_init out label moved after RCU assignment

inetdev_init out label moved after RCU assignment
(final suggestion by Herbert Xu)

Signed-off-by: Jarek Poplawski <[EMAIL PROTECTED]>

---

diff -Nurp linux-2.6.20-rc3-/net/ipv4/devinet.c 
linux-2.6.20-rc3/net/ipv4/devinet.c
--- linux-2.6.20-rc3-/net/ipv4/devinet.c2007-01-05 11:53:16.0 
+0100
+++ linux-2.6.20-rc3/net/ipv4/devinet.c 2007-01-05 11:55:32.0 +0100
@@ -174,9 +174,10 @@ struct in_device *inetdev_init(struct ne
ip_mc_init_dev(in_dev);
if (dev->flags & IFF_UP)
ip_mc_up(in_dev);
-out:
+
/* we can receive as soon as ip_ptr is set -- do this last */
rcu_assign_pointer(dev->ip_ptr, in_dev);
+out:
return in_dev;
 out_kfree:
kfree(in_dev);
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] devinet: inetdev_init out label moved after RCU assignment

2007-01-05 Thread Herbert Xu
On Fri, Jan 05, 2007 at 12:19:10PM +0100, Jarek Poplawski wrote:
> 
> Why me? (I didn't spoil this!)

You spotted the problem, so it's only fair that you get the credit :)

> Subject: [PATCH] devinet: inetdev_init out label moved after RCU assignment
> 
> inetdev_init out label moved after RCU assignment
> (final suggestion by Herbert Xu)
> 
> Signed-off-by: Jarek Poplawski <[EMAIL PROTECTED]>

Thanks, looks good to me.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] devinet: inetdev_init out label moved after RCU assignment

2007-01-05 Thread Jarek Poplawski
On Fri, Jan 05, 2007 at 10:23:53PM +1100, Herbert Xu wrote:
> On Fri, Jan 05, 2007 at 12:19:10PM +0100, Jarek Poplawski wrote:
> > 
> > Why me? (I didn't spoil this!)
> 
> You spotted the problem, so it's only fair that you get the credit :)

Strange... It recalls me the army now! (many years ago)

Jarek P. 
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] tcp: fix ambiguity in the `before' relation

2007-01-05 Thread Gerrit Renker
|  > The key point where the new definition differs from the old is that _the 
relation_
|  > before(x,y) is unambiguous: the case "before(x,y) && before(y,x)" will no 
longer occur.
|  
|  This is highly dependent on how the before macro is used in actual code.
|  There is nothing to suggest that this change won't create new security
|  holes in DCCP or any other protocol that uses this macro.  The only
|  way to be sure is to audit every single use.
I fully agree, merely changing the definition means going only half way.
  
|  So I think we need to do one of two things:
|  
|  1) Audit every single before/after check to ensure that it works
|  correctly with the new definition.
For DCCP I will perform such an audit and post the results to [EMAIL PROTECTED] 

With regard to TCP: I am heavily snowed under with other work at the moment. If 
there
are experienced TCP people on the list who would be happy to look at this, it 
would be
great. I counted the number of times before() is used - it amounted to 68. 
There are
of course obvious cases which are quick to dismiss, but in particular the 
example you
presented yesterday points out that careful analysis is needed.

I asked Dave to revert to the old TCP definition (patch has been committed); 
for the moment
this seems the safest thing to do.
 
|  2) Change before/after such that before(x, x+2^31) == !before(x+2^31, x).
This is what the new definition does: in the old definition we always have that
before(x, x+2^31) == before(x+2^31, x).
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] tcp: fix ambiguity in the `before' relation

2007-01-05 Thread Herbert Xu
On Fri, Jan 05, 2007 at 11:51:16AM +, Gerrit Renker wrote:
>  
> |  2) Change before/after such that before(x, x+2^31) == !before(x+2^31, x).
> This is what the new definition does: in the old definition we always have 
> that
> before(x, x+2^31) == before(x+2^31, x).

Sorry but the new definition has exactly the same problem since

before(x, x+2^31) == before(x+2^31, x) == 0

While the old definition had

before(x, x+2^31) == before(x+2^31, x) == 1

Both are equally bad.  It's only unambiguous if

before(x, x+2^31) == !before(x+2^31, x) == 0

or

before(x, x+2^31) == !before(x+2^31, x) == 1

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] tcp: fix ambiguity in the `before' relation

2007-01-05 Thread Gerrit Renker
Quoting Herbert Xu:
|  On Fri, Jan 05, 2007 at 11:51:16AM +, Gerrit Renker wrote:
|  >  
|  > |  2) Change before/after such that before(x, x+2^31) == !before(x+2^31, 
x).
|  > This is what the new definition does: in the old definition we always have 
that
|  > before(x, x+2^31) == before(x+2^31, x).
|  
|  Sorry but the new definition has exactly the same problem since
|  
|   before(x, x+2^31) == before(x+2^31, x) == 0

You are right. Sorry, I misread the text. Please see below.


|  While the old definition had
|  
|   before(x, x+2^31) == before(x+2^31, x) == 1
|  
|  Both are equally bad.  It's only unambiguous if
|  
|   before(x, x+2^31) == !before(x+2^31, x) == 0
|  
|  or
|  
|   before(x, x+2^31) == !before(x+2^31, x) == 1
Implementing such a solution is a challenge - RFC 1982 suggests here (sec. 3.2):
 "Thus the problem case is left undefined, implementations are free to
  return either result, or to flag an error, and users must take care
  not to depend on any particular outcome."

I think that a definition which satisfies before(x, x+2^31) != !before(x+2^31, 
x)
will be more complex to implement and will need more instructions.

To illustrate with an example, if we assume that `before' operates on minutes 
and uses
modulo-60 arithmetic, the above requirement becomes

before60(x, x+30)   !=   before(x+30, x)

On a clock, one cannot tell this when we merely look at the minute hands: "half 
before xx o'clock" 
is the same as "xx o'clock before half". Only if we also take the hour hand 
into consideration, the
statement "half before 2:00 o'clock" becomes unambiguous (although one would 
rather say "half past 
1:00 o'clock" :-). 

With regard to 31-bit sequence numbers, this would mean that we need additional 
information to enforce

before(x, x+2^31) != before(x+2^31, x)

Taking the clock example further, it could be disambiguated by using 33 bits, 
but then the same problem
crops up with regard to modulo-2^33 arithmetic: how to disambiguate the case 
for x and x+2^32.

Please let me restate the differences between the old and new definition:

1) Old definition has the following list of exclusive-or cases
* x == y- identity
* before(x, y) && !before(y, x) - x `before' y and y != (x + 
2^31) % 2^32
* before(y, x) && !before(x, y) - y `before' x and y != (x + 
2^31) % 2^32
* before(x, y) && before(y, x)  - y == (x + 2^31) % 2^32

2) New definition has the following list of exclusive-or cases
* x == y- identity
* before(x, y)  - x `before' y and y != (x + 
2^31) % 2^32
* before(y, x)  - y `before' x and y != (x + 
2^31) % 2^32
* !before(x, y) && !before(y, x)- y == (x + 2^31) % 2^32

Since the old definition is not used in the way "before(x, y) && !before(y, 
x)", but rather in the
fashion "before(x, y)" or "after(y, x)", the main advantage of the new 
definition is that it makes
this type of use a safe case. 

My view is that this is as good as one can get; if you have a suggestion of how 
one could also
disambiguate before(x, x+2^31) != before(x+2^31, x), can you please let me know.


Gerrit
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-05 Thread Jarek Poplawski
On Thu, Jan 04, 2007 at 09:04:29AM -0800, Ben Greear wrote:
> Jarek Poplawski wrote:
> >On Thu, Jan 04, 2007 at 09:27:07PM +1100, Herbert Xu wrote:
> >  
> >>On Thu, Jan 04, 2007 at 09:50:14AM +0100, Jarek Poplawski wrote:
> >>
> >>>Could you explain? I can see some inet_rtm_newaddr
> >>>interrupted. For me it could be e.g.:
> >>>
> >>>after
> >>>vconfig add eth0 9
> >>>
> >>>ip addr add dev eth0.9 ...
> >>>  
> >>Whether eth0.9 is up or not does not affect this at all.  The spin
> >>locks are initialised (and used) when the first IPv4 address is added,
> >>not when the device comes up.
> >>
> >
> >I understand this. I consider IFF_UP as a sign all 
> >initialisations (open functions including) are
> >completed and there is permission for working (so
> >logically, if I would do eth0.9 down all traffic
> >should be stopped, what probably isn't true now).
> >  
> It is certainly valid for an interface to be IF_UP, but have no IP 
> address.  My application
> does bring the network device up before it assigns the IP, for instance.

Yes, but I think in any case it isn't races safe
now with vlans. I thought more about the reverse
situation where skb->dev !IFF_UP could be
unnecessarily processed. But the same should be
valid according to the rest of initializations
which are done during address assigning. 

> There may be other issues with IF_UP, but that could be handled with a 
> different
> investigation.  If you have a particular test case that fails with 
> 802.1Q VLANs, then
> I will be happy to work on it...

Sorry, I even didn't use this yet... 

Wish you sunny weekend,

Jarek P.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix phy_read/write redefinition errors in ucc_geth_phy.c

2007-01-05 Thread Kumar Gala

Jeff,

Friendly reminder that this should go in for 2.6.20

- k

On Dec 18, 2006, at 10:24 AM, Kumar Gala wrote:


Jeff,

Can you pickup this patch for 2.6.20.  It addresses a name conflict  
issue with the phylib and the phy handling in the ucc driver.


thanks

- k

On Dec 13, 2006, at 5:08 PM, [EMAIL PROTECTED] wrote:


From: Timur Tabi <[EMAIL PROTECTED]>

The local versions of phy_read() and phy_write() in ucc_geth_phy.c  
conflict
with the prototypes in include/linux/phy.h, so this patch renames  
them,
moves them to the top of the file (while eliminating the redundant  
prototype),

and makes them static.

Signed-off-by: Timur Tabi <[EMAIL PROTECTED]>
Signed-off-by: Kumar Gala <[EMAIL PROTECTED]>

---
 drivers/net/ucc_geth_phy.c |  134 + 
+--

 1 files changed, 66 insertions(+), 68 deletions(-)

diff --git a/drivers/net/ucc_geth_phy.c b/drivers/net/ucc_geth_phy.c
index 5360ec0..3c86592 100644
--- a/drivers/net/ucc_geth_phy.c
+++ b/drivers/net/ucc_geth_phy.c
@@ -68,8 +68,31 @@ static int gbit_config_aneg(struct ugeth
 static int genmii_config_aneg(struct ugeth_mii_info *mii_info);
 static int genmii_update_link(struct ugeth_mii_info *mii_info);
 static int genmii_read_status(struct ugeth_mii_info *mii_info);
-u16 phy_read(struct ugeth_mii_info *mii_info, u16 regnum);
-void phy_write(struct ugeth_mii_info *mii_info, u16 regnum, u16  
val);

+
+static u16 ucc_geth_phy_read(struct ugeth_mii_info *mii_info, u16  
regnum)

+{
+   u16 retval;
+   unsigned long flags;
+
+   ugphy_vdbg("%s: IN", __FUNCTION__);
+
+   spin_lock_irqsave(&mii_info->mdio_lock, flags);
+	retval = mii_info->mdio_read(mii_info->dev, mii_info->mii_id,  
regnum);

+   spin_unlock_irqrestore(&mii_info->mdio_lock, flags);
+
+   return retval;
+}
+
+static void ucc_geth_phy_write(struct ugeth_mii_info *mii_info,  
u16 regnum, u16 val)

+{
+   unsigned long flags;
+
+   ugphy_vdbg("%s: IN", __FUNCTION__);
+
+   spin_lock_irqsave(&mii_info->mdio_lock, flags);
+   mii_info->mdio_write(mii_info->dev, mii_info->mii_id, regnum, val);
+   spin_unlock_irqrestore(&mii_info->mdio_lock, flags);
+}

 /* Write value to the PHY for this device to the register at  
regnum, */

 /* waiting until the write is done before it returns.  All PHY */
@@ -184,7 +207,7 @@ static void config_genmii_advert(struct
advertise = mii_info->advertising;

/* Setup standard advertisement */
-   adv = phy_read(mii_info, MII_ADVERTISE);
+   adv = ucc_geth_phy_read(mii_info, MII_ADVERTISE);
adv &= ~(ADVERTISE_ALL | ADVERTISE_100BASE4);
if (advertise & ADVERTISED_10baseT_Half)
adv |= ADVERTISE_10HALF;
@@ -194,7 +217,7 @@ static void config_genmii_advert(struct
adv |= ADVERTISE_100HALF;
if (advertise & ADVERTISED_100baseT_Full)
adv |= ADVERTISE_100FULL;
-   phy_write(mii_info, MII_ADVERTISE, adv);
+   ucc_geth_phy_write(mii_info, MII_ADVERTISE, adv);
 }

 static void genmii_setup_forced(struct ugeth_mii_info *mii_info)
@@ -204,7 +227,7 @@ static void genmii_setup_forced(struct u

ugphy_vdbg("%s: IN", __FUNCTION__);

-   ctrl = phy_read(mii_info, MII_BMCR);
+   ctrl = ucc_geth_phy_read(mii_info, MII_BMCR);

ctrl &=
 	~(BMCR_FULLDPLX | BMCR_SPEED100 | BMCR_SPEED1000 |  
BMCR_ANENABLE);

@@ -234,7 +257,7 @@ static void genmii_setup_forced(struct u
break;
}

-   phy_write(mii_info, MII_BMCR, ctrl);
+   ucc_geth_phy_write(mii_info, MII_BMCR, ctrl);
 }

 /* Enable and Restart Autonegotiation */
@@ -244,9 +267,9 @@ static void genmii_restart_aneg(struct u

ugphy_vdbg("%s: IN", __FUNCTION__);

-   ctl = phy_read(mii_info, MII_BMCR);
+   ctl = ucc_geth_phy_read(mii_info, MII_BMCR);
ctl |= (BMCR_ANENABLE | BMCR_ANRESTART);
-   phy_write(mii_info, MII_BMCR, ctl);
+   ucc_geth_phy_write(mii_info, MII_BMCR, ctl);
 }

 static int gbit_config_aneg(struct ugeth_mii_info *mii_info)
@@ -261,14 +284,14 @@ static int gbit_config_aneg(struct ugeth
config_genmii_advert(mii_info);
advertise = mii_info->advertising;

-   adv = phy_read(mii_info, MII_1000BASETCONTROL);
+   adv = ucc_geth_phy_read(mii_info, MII_1000BASETCONTROL);
adv &= ~(MII_1000BASETCONTROL_FULLDUPLEXCAP |
 MII_1000BASETCONTROL_HALFDUPLEXCAP);
if (advertise & SUPPORTED_1000baseT_Half)
adv |= MII_1000BASETCONTROL_HALFDUPLEXCAP;
if (advertise & SUPPORTED_1000baseT_Full)
adv |= MII_1000BASETCONTROL_FULLDUPLEXCAP;
-   phy_write(mii_info, MII_1000BASETCONTROL, adv);
+   ucc_geth_phy_write(mii_info, MII_1000BASETCONTROL, adv);

/* Start/Restart aneg */
genmii_restart_aneg(mii_info);
@@ -298,10 +321,10 @@ static int genmii_update_link(st

Re: [PATCH] Update ucc_geth.c for new workqueue structure

2007-01-05 Thread Kumar Gala

Jeff,

Friendly reminder that this should go in for 2.6.20

- k

On Dec 18, 2006, at 10:23 AM, Kumar Gala wrote:


Jeff,

Can you pickup this patch for 2.6.20 as it fixes a compile issue  
due to the workqueue changes.


- kumar

On Dec 13, 2006, at 5:08 PM, [EMAIL PROTECTED] wrote:


From: Timur Tabi <[EMAIL PROTECTED]>

The workqueue interface changed with David Howell's patch on  
11/22/2006

(SHA 65f27f38446e1976cc98fd3004b110fedcddd189).  Several drivers were
updated with that patch to handle the new interface, but ucc_geth.c
was not one of them.  This patch updates ucc_geth.c to support the  
new

model.

A compiler warning in set_mac_addr() was also fixed.

Signed-off-by: Timur Tabi <[EMAIL PROTECTED]>
Signed-off-by: Kumar Gala <[EMAIL PROTECTED]>

---
 drivers/net/ucc_geth.c |   12 +++-
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 1f05511..d33bb0c 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -472,7 +473,7 @@ static void put_enet_addr_container(stru
kfree(enet_addr_cont);
 }

-static int set_mac_addr(__be16 __iomem *reg, u8 *mac)
+static void set_mac_addr(__be16 __iomem *reg, u8 *mac)
 {
out_be16(®[0], ((u16)mac[5] << 8) | mac[4]);
out_be16(®[1], ((u16)mac[3] << 8) | mac[2]);
@@ -3918,10 +3919,11 @@ static irqreturn_t phy_interrupt(int irq
 }

 /* Scheduled by the phy_interrupt/timer to handle PHY changes */
-static void ugeth_phy_change(void *data)
+static void ugeth_phy_change(struct work_struct *work)
 {
-   struct net_device *dev = (struct net_device *)data;
-   struct ucc_geth_private *ugeth = netdev_priv(dev);
+   struct ucc_geth_private *ugeth =
+   container_of(work, struct ucc_geth_private, tq);
+   struct net_device *dev = ugeth->dev;
struct ucc_geth *ug_regs;
int result = 0;

@@ -4078,7 +4080,7 @@ static int ucc_geth_open(struct net_devi
 #endif /* CONFIG_UGETH_NAPI */

/* Set up the PHY change work queue */
-   INIT_WORK(&ugeth->tq, ugeth_phy_change, dev);
+   INIT_WORK(&ugeth->tq, ugeth_phy_change);

init_timer(&ugeth->phy_info_timer);
ugeth->phy_info_timer.function = &ugeth_phy_startup_timer;
--
1.4.4

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [openib-general] [PATCH v4 01/13] Linux RDMA Core Changes

2007-01-05 Thread Steve Wise
On Thu, 2007-01-04 at 13:34 -0800, Roland Dreier wrote:
> OK, I'm back from vacation today.
> 
> Anyway I don't have a definitive statement on this right now.  I guess
> I agree that I don't like having an extra parameter to a function that
> should be pretty fast (although req notify isn't quite as hot as
> something like posting a send request or polling a cq), given that it
> adds measurable overhead.  (And I am surprised that the overhead is
> measurable, since 3 arguments still fit in registers, but OK).
> 
> I also agree that adding an extra entry point just to pass in the user
> data is ugly, and also racy.
> 
> Giving the kernel driver a pointer it can read seems OK I guess,
> although it's a little ugly to have a backdoor channel like that.
> 

Another alternative is for the cq-index u32 memory to be allocated by
the kernel and mapped into the user process.  So the lib can read/write
it, and the kernel can read it directly.  This is the fastest way
perfwise, but I didn't want to do it because of the page granularity of
mapping.  IE it would require a page of address space (and backing
memory I guess) just for 1 u32.  The CQ element array memory is already
allocated this way (and its DMA coherent too), but I didn't want to
overload that memory with this extra variable either.  Mapping just
seemed ugly and wasteful to me. 

So given 3 approaches:

1) allow user data to be passed into ib_req_notify_cq() via the standard
uverbs mechanisms.

2) hide this in the chelsio driver and have the driver copyin the info
directly.

3) allocate the memory for this in the kernel and map it to the user
process.

I chose 1 because it seemed the cleanest from an architecture point of
view and I didn't think it would impact performance much.


Steve.




-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] tipc: checking returns and Re: Possible Circular Locking in TIPC

2007-01-05 Thread Jon Maloy

Jarek Poplawski wrote:



If you are sure there is no circular locking possible
between these two functions and this entry->lock here
isn't endangered by other functions, you could try to
make lockdep "silent" like this: 



   write_lock_bh(&ref_table_lock);
   if (tipc_ref_table.first_free) {
   index = tipc_ref_table.first_free;
   entry = &(tipc_ref_table.entries[index]);
   index_mask = tipc_ref_table.index_mask;
   /* take lock in case a previous user of entry still holds it */

-spin_lock_bh(&entry->lock, );
+   local_bh_disable();
+   spin_lock_nested(&entry->lock, SINGLE_DEPTH_NESTING);

   next_plus_upper = entry->data.next_plus_upper;
   tipc_ref_table.first_free = next_plus_upper & index_mask;
   reference = (next_plus_upper & ~index_mask) + index;
   entry->data.reference = reference;
   entry->object = object;
   if (lock != 0)
   *lock = &entry->lock;

/* may stay as is or: */
-spin_unlock_bh(&entry->lock);
+   spin_unlock(&entry->lock);
+   local_bh_enable();

   }
   write_unlock_bh(&ref_table_lock);


 


Looks like an acceptable solution. I will try this.
Thanks
///Jon

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [openib-general] [PATCH v4 01/13] Linux RDMA Core Changes

2007-01-05 Thread Felix Marti


> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Steve Wise
> Sent: Friday, January 05, 2007 6:22 AM
> To: Roland Dreier
> Cc: linux-kernel@vger.kernel.org; openib-general@openib.org;
> netdev@vger.kernel.org
> Subject: Re: [openib-general] [PATCH v4 01/13] Linux RDMA Core Changes
> 
> On Thu, 2007-01-04 at 13:34 -0800, Roland Dreier wrote:
> > OK, I'm back from vacation today.
> >
> > Anyway I don't have a definitive statement on this right now.  I
guess
> > I agree that I don't like having an extra parameter to a function
that
> > should be pretty fast (although req notify isn't quite as hot as
> > something like posting a send request or polling a cq), given that
it
> > adds measurable overhead.  (And I am surprised that the overhead is
> > measurable, since 3 arguments still fit in registers, but OK).
> >
> > I also agree that adding an extra entry point just to pass in the
user
> > data is ugly, and also racy.
> >
> > Giving the kernel driver a pointer it can read seems OK I guess,
> > although it's a little ugly to have a backdoor channel like that.
> >
> 
> Another alternative is for the cq-index u32 memory to be allocated by
> the kernel and mapped into the user process.  So the lib can
read/write
> it, and the kernel can read it directly.  This is the fastest way
> perfwise, but I didn't want to do it because of the page granularity
of
> mapping.  IE it would require a page of address space (and backing
> memory I guess) just for 1 u32.  The CQ element array memory is
already
> allocated this way (and its DMA coherent too), but I didn't want to
> overload that memory with this extra variable either.  Mapping just
> seemed ugly and wasteful to me.
> 
> So given 3 approaches:
> 
> 1) allow user data to be passed into ib_req_notify_cq() via the
standard
> uverbs mechanisms.
> 
> 2) hide this in the chelsio driver and have the driver copyin the info
> directly.
> 
> 3) allocate the memory for this in the kernel and map it to the user
> process.
> 
> I chose 1 because it seemed the cleanest from an architecture point of
> view and I didn't think it would impact performance much.

[Felix Marti] In addition, is arming the CQ really in the performance
path? - Don't apps poll the CQ as long as there are pending CQEs and
only arm the CQ for notification once there is nothing left to do? If
this is the case, it would mean that we waste a few cycles 'idle'
cycles. Steve, next to the micro benchmark that you did, did you run any
performance benchmark that actually moves traffic? If so, did you see a
difference in performance?

> 
> 
> Steve.
> 
> 
> 
> 
> 
> ___
> openib-general mailing list
> openib-general@openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!

2007-01-05 Thread Thibaut VARENE

Hi,

I've been experiencing this bug on my Pegasos II (PPC G4 1GHz, 512M
RAM) box for a while: I can reliably kill my machine in about half an
hour while watching some video read from a remote nfs volume (hence
the "mplayer" task in the following dump). It was relatively uneasy to
get proper debug info as the crash happens while video was playing on
the screen, but it's there anyway :)

This particular dump comes from kernel 2.6.19-ck2 but I reproduced the
bug with vanilla 2.6.19 too, so the bug lives in mainline. I'm not
really familiar with that particular code, but I'd gladly provide as
much debug info as I can.

The box is hooked to a gigabit switch and the NIC is configured as
gigabit too. Interestingly, when I reboot immediately after the crash,
the NIC gets a bogus MAC address, and I have to reboot again to get
back to normal.

HTH

T-Bone

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
Oops: Exception in kernel mode, sig: 5 [#1] 
PREEMPT 
Modules linked in: nfs lockd sunrpc eeprom sbp2 scsi_mod eth1394 uhci_hcd ohci14
NIP: C020F0E0 LR: C0210C54 CTR: C0210B98
REGS: c7f6f670 TRAP: 0700   Not tainted  (2.6.19-ck2)   
MSR: 00021032   CR: 24022488  XER:    
TASK = c49a8d10[2227] 'mplayer' THREAD: c7f6e000
GPR00:  C7F6F720 C49A8D10 DFF41260 DFF41000 000B CE0CF932   
GPR08: 0CEA 0001 1000 0CEB 44022422 1085F9B8 C50B0368 B241  
GPR16: C7F6FD28 B240  DFF412DC C038 9032 0400 C7F6E000  
GPR24:   DFF41000 C7F6E000 C0210B98 CE0EAC80 DFF41260 CE0CF900  
NIP [C020F0E0] eth_alloc_tx_desc_index+0x44/0x50
LR [C0210C54] mv643xx_eth_start_xmit+0xbc/0x3b8 
Call Trace: 
[C7F6F720] [CE0CF930] 0xce0cf930 (unreliable)   
[C7F6F760] [C0299714] dev_hard_start_xmit+0x1d4/0x2c8   
[C7F6F780] [C029C0E0] dev_queue_xmit+0x2bc/0x334
[C7F6F7A0] [C02B6E1C] ip_output+0x124/0x248 
[C7F6F7C0] [C02B7E54] ip_queue_xmit+0x17c/0x404 
[C7F6F830] [C02C91BC] tcp_transmit_skb+0x38c/0x7dc  
[C7F6F860] [C02C65E4] __tcp_ack_snd_check+0x64/0xbc 
[C7F6F870] [C02C8100] tcp_rcv_established+0x5d4/0x980   
[C7F6F8A0] [C02CEDCC] tcp_v4_do_rcv+0xd8/0x3e4  
[C7F6F8D0] [C02D1610] tcp_v4_rcv+0x788/0x98c
[C7F6F900] [C02B2594] ip_local_deliver+0xe4/0x1a4   
[C7F6F920] [C02B2A50] ip_rcv+0x288/0x46c
[C7F6F950] [C0299308] netif_receive_skb+0x214/0x304 
[C7F6F980] [C0211CBC] mv643xx_poll+0x41c/0x48c  
[C7F6F9D0] [C029B550] net_rx_action+0x98/0x200  
[C7F6FA00] [C0026958] __do_softirq+0x80/0xf4
[C7F6FA30] [C0006930] do_softirq+0x58/0x5c  
[C7F6FA40] [C0026408] irq_exit+0x60/0x80
[C7F6FA50] [C00069DC] do_IRQ+0xa8/0xc8  
[C7F6FA60] [C0012498] ret_from_except+0x0/0x14  
--- Exception: 501 at __kmalloc+0x30/0xc0   
LR = rpc_malloc+0x48/0xac [sunrpc]  
[C7F6FB20] [C3D72508] 0xc3d72508 (unreliable)   
[C7F6FB30] [E2A88E18] rpc_malloc+0x48/0xac [sunrpc] 
[C7F6FB40] [E2A835F8] call_allocate+0x88/0x108 [sunrpc] 
[C7F6FB60] [E2A89554] __rpc_execute+0x94/0x248 [sunrpc] 
[C7F6FB80] [E2B0EEB0] nfs_execute_read+0x40/0x64 [nfs]  
[C7F6FBB0] [E2B0F6A4] nfs_pagein_one+0x2a0/0x300 [nfs]  
[C7F6FBF0] [E2B0FA9C] nfs_readpages+0x118/0x1f8 [nfs]   
[C7F6FC40] [C00521DC] __do_page_cache_readahead+0x1e8/0x318 
[C7F6FCD0] [C0052390] blockable_page_cache_readahead+0x84/0x114 
[C7F6FCF0] [C00524A4] make_ahead_window+0x84/0xd4   
[C7F6FD00] [C00525AC] page_cache_readahead+0xb8/0x220   
[C7F6FD20] [C004B00C] do_generic_mapping_read+0x574/0x5e8   
[C7F6FDC0] [C004D624] generic_file_aio_read+0x120/0x274 
[C7F6FE00] [E2B06F00] nfs_file_read+0xc4/0

Re: [PATCH] devinet: inetdev_init out label moved after RCU assignment

2007-01-05 Thread David Stevens
Yeah, sure.

+-DLS

Acked-by: David L Stevens <[EMAIL PROTECTED]>

> Signed-off-by: Jarek Poplawski <[EMAIL PROTECTED]>
> 
> ---
> 
> diff -Nurp linux-2.6.20-rc3-/net/ipv4/devinet.c 
linux-2.6.20-rc3/net/ipv4/devinet.c
> --- linux-2.6.20-rc3-/net/ipv4/devinet.c   2007-01-05 11:53:16.0 
+0100
> +++ linux-2.6.20-rc3/net/ipv4/devinet.c   2007-01-05 11:55:32.0 
+0100
> @@ -174,9 +174,10 @@ struct in_device *inetdev_init(struct ne
> ip_mc_init_dev(in_dev);
> if (dev->flags & IFF_UP)
>ip_mc_up(in_dev);
> -out:
> +
> /* we can receive as soon as ip_ptr is set -- do this last */
> rcu_assign_pointer(dev->ip_ptr, in_dev);
> +out:
> return in_dev;
>  out_kfree:
> kfree(in_dev);
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] d80211: Fix inconsistent sta_lock usage

2007-01-05 Thread Ivo van Doorn
On Tuesday 02 January 2007 17:22, Christoph Hellwig wrote:
> On Tue, Jan 02, 2007 at 04:30:41PM +0100, Ivo Van Doorn wrote:
> > +static inline void __bss_tim_set(struct ieee80211_local *local,
> > +struct ieee80211_if_ap *bss, int aid)
> > +{
> > +   bss->tim[(aid)/8] |= 1<<((aid) % 8);
> > +}
> 
> This really screams to be converted to __set_bit.  Also the local
> argument is entirely unused.  I'd probaby not even add a helper for
> this but just opencode it as:
> 
>   __set_bit(&bss->time, aid);
> 
> > +static inline void __bss_tim_clear(struct ieee80211_local *local,
> > +  struct ieee80211_if_ap *bss, int aid)
> > +{
> > +   bss->tim[(aid)/8] &= !(1<<((aid) % 8));
> 
> Similarly this should just be __clear_bit

This patch uses the __set_bit and __clear_but as suggested by Christoph.
It also removes the local argument since it was unused.

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
Signed-off-by Ivo van Doorn <[EMAIL PROTECTED]>

---

diff --git a/net/d80211/ieee80211_i.h b/net/d80211/ieee80211_i.h
index 6ba6a61..c5c04d2 100644
--- a/net/d80211/ieee80211_i.h
+++ b/net/d80211/ieee80211_i.h
@@ -560,19 +560,23 @@ struct sta_attribute {
ssize_t (*store)(struct sta_info *, const char *buf, size_t count);
 };
 
+#define __bss_tim_set(__bss, __aid)__set_bit((__aid), &(__bss)->tim)
+
 static inline void bss_tim_set(struct ieee80211_local *local,
   struct ieee80211_if_ap *bss, int aid)
 {
-   spin_lock(&local->sta_lock);
-   bss->tim[(aid)/8] |= 1<<((aid) % 8);
-   spin_unlock(&local->sta_lock);
+   spin_lock_bh(&local->sta_lock);
+   __bss_tim_set(bss, aid);
+   spin_unlock_bh(&local->sta_lock);
 }
 
+#define __bss_tim_clear(__bss, __aid)  __clear_bit((__aid), &(__bss)->tim)
+
 static inline void bss_tim_clear(struct ieee80211_local *local,
 struct ieee80211_if_ap *bss, int aid)
 {
spin_lock(&local->sta_lock);
-   bss->tim[(aid)/8] &= !(1<<((aid) % 8));
+   __bss_tim_clear(bss, aid);
spin_unlock(&local->sta_lock);
 }
 
diff --git a/net/d80211/ieee80211_ioctl.c b/net/d80211/ieee80211_ioctl.c
index 81a6e82..5a21b2d 100644
--- a/net/d80211/ieee80211_ioctl.c
+++ b/net/d80211/ieee80211_ioctl.c
@@ -286,7 +286,9 @@ static int ieee80211_ioctl_add_sta(struct net_device *dev,
if (sta->dev != dev) {
/* Binding STA to a new interface, so remove all references to
 * the old BSS. */
+   spin_lock_bh(&local->sta_lock);
sta_info_remove_aid_ptr(sta);
+   spin_unlock_bh(&local->sta_lock);
}
 
 /* TODO
@@ -360,7 +362,7 @@ static int ieee80211_ioctl_remove_sta(struct net_device 
*dev,
sta = sta_info_get(local, param->sta_addr);
if (sta) {
sta_info_put(sta);
-   sta_info_free(sta, 1);
+   sta_info_free(sta, 0);
}
 
return sta ? 0 : -ENOENT;
diff --git a/net/d80211/sta_info.c b/net/d80211/sta_info.c
index 0c42ae8..f27bd0e 100644
--- a/net/d80211/sta_info.c
+++ b/net/d80211/sta_info.c
@@ -439,7 +439,7 @@ void sta_info_remove_aid_ptr(struct sta_info *sta)
sdata->local->ops->set_tim(local_to_hw(sdata->local),
  sta->aid, 0);
if (sdata->bss)
-   bss_tim_clear(sdata->local, sdata->bss, sta->aid);
+   __bss_tim_clear(sdata->bss, sta->aid);
 }
 
 
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] Two pretty trivial NetLabel bugfixes

2007-01-05 Thread Paul Moore
This is a small patchset both in the number of patches (two) and the size of
the patches themselves.  The first patch fixes a potential locking issue with
a NetLabel state variable and the second patch fixes a couple of problems seen
when adding new CIPSO DOI definitions.

In light of the recent NetLabel locking issues I've spent this week going over
all of the NetLabel related locks and I firmly believe this patchset should
address the last of the problems.  However, if anyone with a good knowledge of
socket/sk locking can afford to take a look I would greatly appreciate it.

I tested both of these patches with what I believe to be pretty much all of
the kernel debug options enabled and I have not encountered any problems.
Please consider these for the 2.6.20 release.

--
paul moore
linux security @ hp
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] NetLabel: correct locking in selinux_netlbl_socket_setsid()

2007-01-05 Thread Paul Moore
The spinlock protecting the update of the "sksec->nlbl_state" variable is not
currently softirq safe which can lead to problems.  This patch fixes this by
changing the spin_{un}lock() functions into spin_{un}lock_bh() functions.

Signed-off-by: Paul Moore <[EMAIL PROTECTED]>
---
 security/selinux/ss/services.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: net-2.6.20_bugfix_3/security/selinux/ss/services.c
===
--- net-2.6.20_bugfix_3.orig/security/selinux/ss/services.c
+++ net-2.6.20_bugfix_3/security/selinux/ss/services.c
@@ -2492,9 +2492,9 @@ static int selinux_netlbl_socket_setsid(
 
rc = netlbl_socket_setattr(sock, &secattr);
if (rc == 0) {
-   spin_lock(&sksec->nlbl_lock);
+   spin_lock_bh(&sksec->nlbl_lock);
sksec->nlbl_state = NLBL_LABELED;
-   spin_unlock(&sksec->nlbl_lock);
+   spin_unlock_bh(&sksec->nlbl_lock);
}
 
 netlbl_socket_setsid_return:

--
paul moore
linux security @ hp
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] NetLabel: correct CIPSO tag handling when adding new DOI definitions

2007-01-05 Thread Paul Moore
The current netlbl_cipsov4_add_common() function has two problems which are
fixed with this patch.  The first is an off-by-one bug where it is possibile to
overflow the doi_def->tags[] array.  The second is a bug where the same 
doi_def->tags[] array was not always fully initialized, which caused sporadic
failures.

Signed-off-by: Paul Moore <[EMAIL PROTECTED]>
---
 net/netlabel/netlabel_cipso_v4.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: net-2.6.20_bugfix_3/net/netlabel/netlabel_cipso_v4.c
===
--- net-2.6.20_bugfix_3.orig/net/netlabel/netlabel_cipso_v4.c
+++ net-2.6.20_bugfix_3/net/netlabel/netlabel_cipso_v4.c
@@ -130,12 +130,12 @@ static int netlbl_cipsov4_add_common(str
 
nla_for_each_nested(nla, info->attrs[NLBL_CIPSOV4_A_TAGLST], nla_rem)
if (nla->nla_type == NLBL_CIPSOV4_A_TAG) {
-   if (iter > CIPSO_V4_TAG_MAXCNT)
+   if (iter >= CIPSO_V4_TAG_MAXCNT)
return -EINVAL;
doi_def->tags[iter++] = nla_get_u8(nla);
}
-   if (iter < CIPSO_V4_TAG_MAXCNT)
-   doi_def->tags[iter] = CIPSO_V4_TAG_INVALID;
+   while (iter < CIPSO_V4_TAG_MAXCNT)
+   doi_def->tags[iter++] = CIPSO_V4_TAG_INVALID;
 
return 0;
 }

--
paul moore
linux security @ hp
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-05 Thread David Miller
From: Ben Greear <[EMAIL PROTECTED]>
Date: Fri, 05 Jan 2007 12:33:43 -0800

> We were able to reproduce the problem twice on the un-patched 2.6.18.2 kernel 
> in about
> 2 hours of our stress test yesterday.  I applied this patch (well, the
> ipv4 part..the ipv6 won't apply to 2.6.18.2), and it has run the stress
> test clean for a total of about 8 hours.
> 
> So, I do believe this was the problem we were hitting, and it seems fixed.

Thanks for all the testing Ben, much appreciated!
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-05 Thread Ben Greear

David Miller wrote:

From: Herbert Xu <[EMAIL PROTECTED]>
Date: Thu, 04 Jan 2007 17:26:27 +1100


David Stevens <[EMAIL PROTECTED]> wrote:

   You're right, I don't know whether it'll fix the problem Ben saw
or not, but it looks like the original code can do a receive before the
in_device is fully initialized, and that, of course, is bad.
   If the device for ip_rcv() is not the same one we were
initializing when the receive interrupted, then the patch should have
no effect either way -- I don't think it'll hide other problems.
   If it's hard to reproduce (which I guess is true), then you're
right, no soft lockup doesn't really tell us if it's fixed or not.

Actually I missed your point that the multicast locks aren't even
initialised at that point.  So this does explain the soft lock-up
and therefore your patch is clearly the correct solution.


I agree too, therefore I've added David's patch as below.

I'll push this to the -stable branches as well.  This fix is
correct even if it does not entirely clear up the soft lockup
bug being discussed in this thread, but I think it will :-)


We were able to reproduce the problem twice on the un-patched 2.6.18.2 kernel 
in about
2 hours of our stress test yesterday.  I applied this patch (well, the
ipv4 part..the ipv6 won't apply to 2.6.18.2), and it has run the stress
test clean for a total of about 8 hours.

So, I do believe this was the problem we were hitting, and it seems fixed.

Thanks!
Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] tcp: fix ambiguity in the `before' relation

2007-01-05 Thread Herbert Xu
On Fri, Jan 05, 2007 at 12:49:13PM +, Gerrit Renker wrote:
> 
> Since the old definition is not used in the way "before(x, y) && !before(y, 
> x)", but rather in the
> fashion "before(x, y)" or "after(y, x)", the main advantage of the new 
> definition is that it makes
> this type of use a safe case. 

This is not true because

if (before(x, y))
goto drop;

means that you're effectively using it as !before(x, y).  In other words,
the change is good if our code read

if (before(x, y))
process_packet();

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [openib-general] [PATCH v4 01/13] Linux RDMA Core Changes

2007-01-05 Thread Steve Wise
On Fri, 2007-01-05 at 09:32 -0800, Felix Marti wrote:
> 
> > -Original Message-
> > From: [EMAIL PROTECTED] [mailto:openib-general-
> > [EMAIL PROTECTED] On Behalf Of Steve Wise
> > Sent: Friday, January 05, 2007 6:22 AM
> > To: Roland Dreier
> > Cc: linux-kernel@vger.kernel.org; openib-general@openib.org;
> > netdev@vger.kernel.org
> > Subject: Re: [openib-general] [PATCH v4 01/13] Linux RDMA Core Changes
> > 
> > On Thu, 2007-01-04 at 13:34 -0800, Roland Dreier wrote:
> > > OK, I'm back from vacation today.
> > >
> > > Anyway I don't have a definitive statement on this right now.  I
> guess
> > > I agree that I don't like having an extra parameter to a function
> that
> > > should be pretty fast (although req notify isn't quite as hot as
> > > something like posting a send request or polling a cq), given that
> it
> > > adds measurable overhead.  (And I am surprised that the overhead is
> > > measurable, since 3 arguments still fit in registers, but OK).
> > >
> > > I also agree that adding an extra entry point just to pass in the
> user
> > > data is ugly, and also racy.
> > >
> > > Giving the kernel driver a pointer it can read seems OK I guess,
> > > although it's a little ugly to have a backdoor channel like that.
> > >
> > 
> > Another alternative is for the cq-index u32 memory to be allocated by
> > the kernel and mapped into the user process.  So the lib can
> read/write
> > it, and the kernel can read it directly.  This is the fastest way
> > perfwise, but I didn't want to do it because of the page granularity
> of
> > mapping.  IE it would require a page of address space (and backing
> > memory I guess) just for 1 u32.  The CQ element array memory is
> already
> > allocated this way (and its DMA coherent too), but I didn't want to
> > overload that memory with this extra variable either.  Mapping just
> > seemed ugly and wasteful to me.
> > 
> > So given 3 approaches:
> > 
> > 1) allow user data to be passed into ib_req_notify_cq() via the
> standard
> > uverbs mechanisms.
> > 
> > 2) hide this in the chelsio driver and have the driver copyin the info
> > directly.
> > 
> > 3) allocate the memory for this in the kernel and map it to the user
> > process.
> > 
> > I chose 1 because it seemed the cleanest from an architecture point of
> > view and I didn't think it would impact performance much.
> 
> [Felix Marti] In addition, is arming the CQ really in the performance
> path? - Don't apps poll the CQ as long as there are pending CQEs and
> only arm the CQ for notification once there is nothing left to do? If
> this is the case, it would mean that we waste a few cycles 'idle'
> cycles. 

I tend to agree.  This shouldn't be the hot perf path like
post_send/recv and poll.

> Steve, next to the micro benchmark that you did, did you run any
> performance benchmark that actually moves traffic? If so, did you see a
> difference in performance?
> 

No.  But I didn't explicitly measure with and without this one single
change.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html