date:20060802

Re: [patch] RFC: matching interface groups

2006-08-02 Thread Balazs Scheidler

On Tue, 2006-08-01 at 21:18 +0200, Sven Schuster wrote:
> Hi Phil,
> 
> On Tue, Aug 01, 2006 at 11:46:55AM -0700, Phil Oester told us:
> > Since in this scenario userspace is able to determine ppp vs pptp, 
> > could you not also do something like have an inbound_ppp and inbound_pptp
> > chain, then jump to the appropriate chain depending on type?  If you
> > need per-interface rules, then create an inbound_pppX chain, populate
> > it with rules, then jump to that chain if -i pppX.  In ip-down, just
> > delete the chain as well as the jump.
> 
> if I understood Balazs correctly, one of the things he wanted to
> avoid is addition/deletion of iptables rules on every pppX interface
> up/down 

Exactly.

> as this would require the complete chain (say, INPUT or
> OUTPUT) to be "downloaded" to userspace, modified and then again
> "uploaded" to the kernel. At least until iptables redesign to
> allow replacement/insertion/deletion of single rules is completed
> which if started at all will take quite some more time :-)

Iptables operates on a per-table basis, so it is not only the INPUT or
OUTPUT chain that needs to be downloaded and uploaded, but the whole
filter table.

In addition, in my humble opinion the iptables ruleset should be up
to the user to maintain. Once some kind of automation starts to
add/remove rules on the fly, it becomes more difficult to make other,
independent changes to the table. For example, the user
needs to save the current ruleset using iptables-save, then modify the
resulting file, and then load it again. If the ruleset is generated, as
happens with a lot of tools, this might not be so easy.

-- 
Bazsi

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] RFC: matching interface groups

2006-08-02 Thread Balazs Scheidler

On Tue, 2006-08-01 at 11:29 -0700, Stephen Hemminger wrote:
> On Tue, 01 Aug 2006 19:10:09 +0200
> Balazs Scheidler <[EMAIL PROTECTED]> wrote:
> 
> > Hi,
> > 
> > I would like to easily match a set of dynamically created interfaces
> > from my packet filter rules. The attached patch forms the basis of my
> > implementation and I would like to know whether something like this is
> > mergeable to mainline.
> > 
> > The use-case is as follows:
> > 
> > * I have two different subsystems creating interfaces dynamically (for
> > example pptpd and serial pppd lines, each creating dynamic pppX
> > interfaces),
> > * I would like to assign a different set of iptables rules for these
> > clients,
> > * I would like to react to a new interface being added to a specific set
> > in a userspace application,
> > 
> > The reasons I see this needs new kernel functionality:
> > 
> > * iptables supports wildcard interface matching (for example "iptables
> > -i ppp+"), but as the names of the interfaces used by PPTPD and PPPD
> > cannot be distinguished this way, this is not enough,
> > * Reloading the iptables ruleset everytime a new interface comes up is
> > not really feasible, as it abrupts packet processing, and validating the
> > ruleset in the kernel can take significant amount of time,
> > * the kernel change is very simple, adapting userspace to this change is
> > also very simple, and in userspace various software packages can easily
> > interoperate with each-other once this is merged.
> > 
> > The implementation:
> > 
> > Each interface can belong to a single "group" at a time, an interface
> > comes up without being a member in any of the groups.
> > 
> > Userspace can assign interfaces to groups after being created, this
> > would typically be performed in /etc/ppp/ip-up.d (and similar) scripts.
> > 
> > In spirit "interface group" is somewhat similar to the "routing
> > protocol" field for routing entries, which contains information on which
> > routing daemon was responsible for adding the given route entry.
> > 
> > [snip]

> I like the concept, but it probably needs more review.
> 
> There is a bigger issue, which is how should the network device namespace
> exist? There are virtualization efforts, that want to virtualize it,
> and network device names have always lived in a parallel universe.
> I don't expect your patch to solve this...

I have read the OLS paper on virtualization. It states that, as things
currently stand, each struct net_device will be assigned to one
specific namespace. As my patch modifies struct net_device itself, I
expect it to work without problems when virtualization arrives: the
interface group can be interpreted on a per-namespace basis.

There will probably be several iptables rulesets when the time comes,
one for each namespace, but again, struct net_device will be assigned to
a namespace, and the proper iptables tables will be iterated based on
the net_device assignment.

Am I missing something?
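
For readers following the thread, the interface-group idea from the quoted proposal can be sketched in a few lines of userspace C. This is only an illustration under stated assumptions: `struct ex_netdev`, `struct ex_rule`, `ex_set_group` and `ex_rule_matches` are made-up names, and the convention that group 0 means "no group" is an assumption of the sketch, not something taken from the patch.

```c
/* Assumed convention for this sketch: a new interface starts ungrouped. */
#define EX_IFGROUP_NONE 0u

/* Hypothetical, heavily reduced stand-in for struct net_device. */
struct ex_netdev {
    char name[16];
    unsigned int ifgroup;   /* an interface belongs to one group at a time */
};

/* Hypothetical rule that matches every interface in one group. */
struct ex_rule {
    unsigned int group;
};

/* What an ip-up script would request once the pppX interface exists. */
static void ex_set_group(struct ex_netdev *dev, unsigned int group)
{
    dev->ifgroup = group;
}

/* The rule never changes; only the interface's group membership does. */
static int ex_rule_matches(const struct ex_rule *rule,
                           const struct ex_netdev *in)
{
    return in->ifgroup != EX_IFGROUP_NONE && in->ifgroup == rule->group;
}
```

The point of the design is visible here: bringing an interface up or down touches one integer on the device and leaves the ruleset alone.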

-- 
Bazsi


Re: e1000 speed/duplex error

2006-08-02 Thread a1

Hi, Auke.

Auke Kok wrote:
AK> Here's that part of the driver documentation:

AK> $ modprobe e1000 AutoNeg=0x08
AK> e1000: :00:00.0: e1000_validate_option: AutoNeg advertising 100/FD


AK> /* Auto-negotiation Advertisement Override
AK>  *
AK>  * Valid Range: 0x01-0x0F, 0x20-0x2F (copper); 0x20 (fiber)
AK>  *
AK>  * The AutoNeg value is a bit mask describing which speed and duplex
AK>  * combinations should be advertised during auto-negotiation.
AK>  * The supported speed and duplex modes are listed below
AK>  *
AK>  * Bit           7     6     5      4      3     2     1      0
AK>  * Speed (Mbps)  N/A   N/A   1000   N/A    100   100   10     10
AK>  * Duplex                    Full          Full  Half  Full   Half
AK>  *
AK>  * Default Value: 0x2F (copper); 0x20 (fiber)
AK>  */
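
As an illustration of the bit mask described in that comment, here is a minimal userspace sketch in C that turns an AutoNeg value into a readable list. The bit-to-mode mapping follows the table in the comment; the function name `autoneg_describe` and its buffer interface are invented for this example and are not part of the e1000 driver.

```c
#include <stdio.h>
#include <string.h>

/* Bit-to-mode mapping taken from the driver comment quoted above.
 * Bits 4, 6 and 7 are marked N/A there, so they carry no name here. */
static const char *const autoneg_mode[8] = {
    "10/HD",   /* bit 0 */
    "10/FD",   /* bit 1 */
    "100/HD",  /* bit 2 */
    "100/FD",  /* bit 3 */
    NULL,      /* bit 4: N/A */
    "1000/FD", /* bit 5 */
    NULL,      /* bit 6: N/A */
    NULL,      /* bit 7: N/A */
};

/* Hypothetical helper: write the modes selected by `mask` into `buf` as a
 * space-separated list and return how many named modes were set. */
static int autoneg_describe(unsigned int mask, char *buf, size_t len)
{
    int count = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';
    for (int bit = 0; bit < 8; bit++) {
        if (!(mask & (1u << bit)) || !autoneg_mode[bit])
            continue;
        size_t used = strlen(buf);
        /* snprintf truncates safely if the buffer is too small */
        snprintf(buf + used, len - used, "%s%s",
                 count ? " " : "", autoneg_mode[bit]);
        count++;
    }
    return count;
}
```

With this mapping, AutoNeg=0x08 selects only bit 3, which agrees with the "advertising 100/FD" message in the modprobe output quoted above.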

This is not what I have in mind. Say, for example, I have a bunch of
e1000 adapters in my box and want to change the speed/duplex of one of
them dynamically. To do that the way you described, I would need to stop
all of them and reload the module with the AutoNeg parameter (can I pass
this parameter to a single card only?), losing all the connections I had
on the other adapters.
It would be better to handle this the ethtool way, since I gather that
is the common approach.
Thanks.


AK> hth,

AK> Auke


-- 
Best Regards,
 Alexandr Kotov   mailto:[EMAIL PROTECTED]


Re: [RFC] gre: transparent ethernet bridging

2006-08-02 Thread Lennert Buytenhek

On Mon, Jul 31, 2006 at 10:08:22PM -0700, Stephen Hemminger wrote:

> > > Why not use existing bridge code?
> > 
> > It does use the existing bridge code.  Perhaps the name is misleading.
> > All it does is encapsulate the full ethernet header in a gre packet,
> > rather than only layer 3.  That is, currently gre uses ARPHRD_IPGRE,
> > but bridging requires ARPHRD_ETHER.
> 
> I am not against making the bridge code smarter to handle other
> encapsulation.

What if you want to run ethernet directly over a GRE tunnel, without
using bridging?

Re: e1000 speed/duplex error

2006-08-02 Thread Jeff Kirsher


On 8/2/06, a1 <[EMAIL PROTECTED]> wrote:

> Hi, Auke.
>
> Auke Kok wrote:
> AK> Here's that part of the driver documentation:
>
> AK> $ modprobe e1000 AutoNeg=0x08
> AK> e1000: :00:00.0: e1000_validate_option: AutoNeg advertising 100/FD
>
> AK> /* Auto-negotiation Advertisement Override
> AK>  *
> AK>  * Valid Range: 0x01-0x0F, 0x20-0x2F (copper); 0x20 (fiber)
> AK>  *
> AK>  * The AutoNeg value is a bit mask describing which speed and duplex
> AK>  * combinations should be advertised during auto-negotiation.
> AK>  * The supported speed and duplex modes are listed below
> AK>  *
> AK>  * Bit           7     6     5      4      3     2     1      0
> AK>  * Speed (Mbps)  N/A   N/A   1000   N/A    100   100   10     10
> AK>  * Duplex                    Full          Full  Half  Full   Half
> AK>  *
> AK>  * Default Value: 0x2F (copper); 0x20 (fiber)
> AK>  */
>
> This is not what I'm thinking of. Say, for example, I have a bunch of
> e1000 adapters in my box and want to dynamically change one's spd/dplx.
> For that works in the way you described I need to stop all of them and
> load with autoneg parameter (can I pass this parameter only to single
> card?) and loose all connection I had on other adapters.
> It would be better to handle it in ethtool way, since I discovered
> it's a common behavior.
> Thanks.
>
> AK> hth,
>
> AK> Auke
>
> --
> Best Regards,
>  Alexandr Kotov   mailto:[EMAIL PROTECTED]

I agree, although ethtool does not have that functionality yet.
Feel free to send a patch to the ethtool maintainer (Jeff Garzik)
if you would like.  I will put it on my list of things to do, but I
will admit that it is near the bottom of that list.
Feel free to ping me once in a while to remind me.

--
Cheers,
Jeff

Re: [take2 1/4] kevent: core files.

2006-08-02 Thread David Miller

From: Evgeniy Polyakov <[EMAIL PROTECTED]>
Date: Wed, 2 Aug 2006 10:39:18 +0400

> u64 is not aligned, so I prefer to use u32 as much as possible.

We have aligned_u64 exactly for this purpose; netfilter makes
use of it to avoid the x86_64 vs. x86 u64 alignment discrepancy.
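
To make the discrepancy concrete: on 32-bit x86 the ABI aligns a 64-bit integer inside a struct to 4 bytes, while x86_64 aligns it to 8, so the same struct can have a different layout for 32-bit userspace than for a 64-bit kernel. The following is a minimal userspace sketch, assuming GCC or Clang. The typedef is a stand-in for what aligned_u64 is meant to do, and the struct and helper names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for aligned_u64: a 64-bit integer forced to 8-byte
 * alignment on every architecture (GCC/Clang attribute syntax). */
typedef uint64_t example_aligned_u64 __attribute__((aligned(8)));

/* Hypothetical event record. With a plain uint64_t the offset of `data`
 * is 4 on i386 and 8 on x86_64, so the two layouts disagree. */
struct example_event_plain {
    uint32_t type;
    uint64_t data;
};

/* Same record with the aligned type: `data` sits at offset 8 everywhere. */
struct example_event_fixed {
    uint32_t type;
    example_aligned_u64 data;
};

/* Small helpers so the layout can be inspected at run time. */
static size_t fixed_data_offset(void)
{
    return offsetof(struct example_event_fixed, data);
}

static size_t fixed_size(void)
{
    return sizeof(struct example_event_fixed);
}
```

Compiling the plain variant with and without -m32 and printing its offsetof() is a quick way to see the mismatch the aligned type avoids.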

Re: [take2 1/4] kevent: core files.

2006-08-02 Thread Evgeniy Polyakov

On Wed, Aug 02, 2006 at 12:25:05AM -0700, David Miller ([EMAIL PROTECTED]) wrote:
> From: Evgeniy Polyakov <[EMAIL PROTECTED]>
> Date: Wed, 2 Aug 2006 10:39:18 +0400
> 
> > u64 is not aligned, so I prefer to use u32 as much as possible.
> 
> We have aligned_u64 exactly for this purpose; netfilter makes
> use of it to avoid the x86_64 vs. x86 u64 alignment discrepancy.

Ok, I will use that type.

-- 
Evgeniy Polyakov

Re: [RFC] Mobile IPv6 introduction

2006-08-02 Thread Ville Nuorvala

Hugo Santos wrote:
> David,
> 
> On Tue, Aug 01, 2006 at 05:35:35PM -0700, David Miller wrote:
>> This is partly why the multiple routing table code is being
>> added as the initial infrastructure, so that source based
>> things are possible.
> 
>There have been other approaches for partial source-based stuff. For
>  instance, in my tree i brought Subtrees back to a point of being
>  usable. But what i was referring to was route-caching (some places only
>  check cookies based on dst because we "don't support source routing"),
>  APIs, etc. A few mails back you pointed the extension of a public
>  structure to include a "source" attribute -- this is the kind of stuff
>  we must add and that i think it's an independent work (read: even if
>  the rest isn't merged, this should).
> 
>Still regarding Subtrees, is there any interest in revitalizing that
>  code? I have a couple patches that i could submit.

Hi Hugo,

I don't want to be dismissive towards your patches, but I've been
working with the subtree routing stuff for several years now. And let me
tell you: it has provided us with some nasty little surprises every now
and then. I'm only saying it's surprisingly difficult to get right on
the first try.

To name just one issue: the chicken and egg problem of source address
selection and source address based routing. I solved this problem by
letting policy rules (with a source prefix) add additional constraints
to the address selection. This did however mean the source address
selection had to be moved inside the routing code.

This is just one example; trust me, there are several more.

My latest incarnation of source address routing is against my previous
version of policy routing, which luckily isn't that different from
the current version by Thomas. Unless Yoshifuji-san has already ported
my code to Thomas's policy routing code, I'll start working on it.

>> Such a scheme would need provisions for handling the case where
>> the user eats the message, but never tells us what to do.
>> In such a case we'd need to emit some kind of ICMPv6 message,
>> even if it would be just a timeout generated parameter problem.
> 
>As i see it, the moment there is a raw socket open for dealing with a
>  particular protocol, whoever opened that socket (handling the protocol)
>  is responsible of generating any error messages associated with the
>  protocol running.  Which is the case, the kernel shouldn't need to know
>  whether any of the Mobile IPv6 specific messages have problems. The
>  particular patch i was referring to does partial MIPv6 message
>  processing inside the kernel before handing it to the socket as you
>  only have access to the full received headers there.
> 
>> Such a layer would be needed if we ever put some kernel level
>> components of Mobile IPv4 into the tree, which I see no reason
>> not to, since it has this route optimization as well.
> 
>Yes, the functionality is needed. My only problem is with exposing
>  MODE_ROUTEOPTIMIZATION, it isn't modular. But it's something i can live
>  with.

But route optimization is just one form of packet transform; it just
adds a Routing Header type 2 and/or Home Address Option Destination
Header to the outgoing packet. Isn't xfrm just the right place for this?

You are right that we (HUT and USAGI) have mostly just looked at the
xfrm framework from a MIPv6+IPsec perspective, but even this has helped
us pinpoint several shortcomings in the current, IPsec-only
framework.

IMO, this doesn't hinder, but rather helps change xfrm into the generic
packet transform framework it was originally envisioned to be.

Regards,
Ville

Re: [PATCH 1/23] [PATCH] [XFRM]: Add XFRM_MODE_xxx for future use.

2006-08-02 Thread Masahide NAKAMURA

Herbert Xu wrote:
> Please rebase your tree on something that's more recent.  We've had
> xfrm modes for more than two months now.

OK, I will rebase to catch up with the latest tree.
(This tree is just for review, so it is based on 2.6.17 rather than the latest.)


-- 
Masahide NAKAMURA

Re: [PATCH 0/20][IPV6/XFRM] MIPv6 CN (part B)

2006-08-02 Thread Masahide NAKAMURA

David Miller wrote:
> From: Masahide NAKAMURA <[EMAIL PROTECTED]>
> Date: Sat, 29 Jul 2006 18:37:04 +0900
> 
>> Here is Part B patches, following this mail.
>>
>> Part B is also available as mip6cn-20060716-review branch at:
>>
>> git://git.skbuff.net:9419/gitroot/nakam/linux-2.6-mip6cn
>>
>> This tree includes part A, then it has all patches about
>> "Advanced XFRM for CN".
> 
> These patches mainly deal with the specifics of ipv6
> mobility processing, they look mostly fine to me and
> I could not spot any obvious errors.

Thank you for reviewing.

Next time I will prepare the patches against the latest tree,
with fixes for the review comments.

Thanks,

-- 
Masahide NAKAMURA

Re: [PATCH 7/23] [PATCH] [XFRM] STATE: Add a hook to find where to be inserted header in outbound.

2006-08-02 Thread Masahide NAKAMURA

David Miller wrote:
> From: Masahide NAKAMURA <[EMAIL PROTECTED]>
> Date: Wed, 02 Aug 2006 11:20:30 +0900
> 
>> David Miller wrote:
>>> I see a dangerous pattern of adding many, many, many methods
>>> to the xfrm_type structure which are only used by ipv6.
>>> But I cannot suggest another method.
>> Sometimes this is a difficult point for me to design.
> 
> Do not worry so much about it right now, it is not a barrier
> for code integration.  We can try to refine this later on.

OK, I will improve my code for the current framework first.
Thanks :-)

-- 
Masahide NAKAMURA

Re: e1000 speed/duplex error

2006-08-02 Thread a1

Hi, Jeff.

JK> On 8/2/06, a1 <[EMAIL PROTECTED]> wrote:
>> Hi, Auke.
>>
>> Auke Kok wrote:
>> AK> Here's that part of the driver documentation:
>>
>> AK> $ modprobe e1000 AutoNeg=0x08
>> AK> e1000: :00:00.0: e1000_validate_option: AutoNeg advertising 100/FD
>>
>>
>> AK> /* Auto-negotiation Advertisement Override
>> AK>  *
>> AK>  * Valid Range: 0x01-0x0F, 0x20-0x2F (copper); 0x20 (fiber)
>> AK>  *
>> AK>  * The AutoNeg value is a bit mask describing which speed and duplex
>> AK>  * combinations should be advertised during auto-negotiation.
>> AK>  * The supported speed and duplex modes are listed below
>> AK>  *
>> AK>  * Bit           7     6     5      4      3     2     1      0
>> AK>  * Speed (Mbps)  N/A   N/A   1000   N/A    100   100   10     10
>> AK>  * Duplex                    Full          Full  Half  Full   Half
>> AK>  *
>> AK>  * Default Value: 0x2F (copper); 0x20 (fiber)
>> AK>  */
>>
>> This is not what I'm thinking of. Say, for example, I have a bunch of
>> e1000 adapters in my box and want to dynamically change one's spd/dplx.
>> For that works in the way you described I need to stop all of them and
>> load with autoneg parameter (can I pass this parameter only to single
>> card?) and loose all connection I had on other adapters.
>> It would be better to handle it in ethtool way, since I discovered
>> it's a common behavior.
>> Thanks.
>>
>>
>> AK> hth,
>>
>> AK> Auke
>>
>>
>> --
>> Best Regards,
>>  Alexandr Kotov   mailto:[EMAIL PROTECTED]
>>

JK> I agree.  Although ethtool does not have that functionality as of yet.
JK>  Feel free to provide a patch to the ethtool maintainer (Jeff Garzik)
JK> if you would like.  I will put it on my plate of things to do, but I
JK> will admit that it is near the bottom of the list of items to get done
JK> for me.  Feel free to ping me once in awhile to remind me.

Ethtool already has support for that, but the e1000 driver doesn't handle
all of the values passed from ethtool correctly.

For example, if I run ethtool with the following parameters:
ethtool -s eth0 speed 100 duplex full autoneg on

the parameters filled in by ethtool look like:

ecmd->autoneg = AUTONEG_ENABLE;
ecmd->advertising = ADVERTISED_100baseT_Full;

but when they are passed to the driver, the driver fills the structure
passed to the hw layer with all possible advertise values.

static int
e1000_set_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
{
        struct e1000_adapter *adapter = netdev_priv(netdev);
        struct e1000_hw *hw = &adapter->hw;

        /* When SoL/IDER sessions are active, autoneg/speed/duplex
         * cannot be changed */
        if (e1000_check_phy_reset_block(hw)) {
                DPRINTK(DRV, ERR, "Cannot change link characteristics "
                        "when SoL/IDER is active.\n");
                return -EINVAL;
        }

        if (ecmd->autoneg == AUTONEG_ENABLE) {
                hw->autoneg = 1;
                if (hw->media_type == e1000_media_type_fiber)
                        hw->autoneg_advertised = ADVERTISED_1000baseT_Full |
                                                 ADVERTISED_FIBRE |
                                                 ADVERTISED_Autoneg;
                else
--->                    hw->autoneg_advertised = ADVERTISED_10baseT_Half |
                                                 ADVERTISED_10baseT_Full |
                                                 ADVERTISED_100baseT_Half |
                                                 ADVERTISED_100baseT_Full |
                                                 ADVERTISED_1000baseT_Full|
                                                 ADVERTISED_Autoneg |
                                                 ADVERTISED_TP;
                ecmd->advertising = hw->autoneg_advertised;
        } else
                if (e1000_set_spd_dplx(adapter, ecmd->speed + ecmd->duplex))
                        return -EINVAL;

        /* reset the link */

        if (netif_running(adapter->netdev))
                e1000_reinit_locked(adapter);
        else
                e1000_reset(adapter);

        return 0;
}

If you change it that way everything works like I thought

--- e1000_ethtool.c.orig Mon Jun 26 14:13:26 2006
+++ e1000_ethtool.c Wed Aug 02 12:35:36 2006
@@ -225,13 +225,7 @@
 ADVERTISED_FIBRE |
 ADVERTISED_Autoneg;
else
-   hw->autoneg_advertised = ADVERTISED_10baseT_Half |
- ADVERTISED_10baseT_Full |
- ADVERTISED_100baseT_Half |
- ADVERTISED_100baseT_Full |
-
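
The archived copy of this diff is cut off at this point. Independently of what the missing lines contain, the idea described in the message (honour the advertising mask that ethtool passes in instead of overwriting it) could look roughly like the standalone sketch below. The EX_ADVERTISED_* constants are local stand-ins for the ADVERTISED_* flags from the kernel's ethtool header, and `pick_copper_advertising` is a hypothetical helper, not code from the e1000 driver.

```c
#include <stdint.h>

/* Local stand-ins for the ADVERTISED_* flags. The names and bit positions
 * are assumptions of this sketch; the real ones live in the kernel's
 * ethtool header. */
#define EX_ADVERTISED_10baseT_Half    (1u << 0)
#define EX_ADVERTISED_10baseT_Full    (1u << 1)
#define EX_ADVERTISED_100baseT_Half   (1u << 2)
#define EX_ADVERTISED_100baseT_Full   (1u << 3)
#define EX_ADVERTISED_1000baseT_Full  (1u << 5)
#define EX_ADVERTISED_Autoneg         (1u << 6)
#define EX_ADVERTISED_TP              (1u << 7)

/* Every speed/duplex mode the copper branch of the quoted function sets. */
#define EX_COPPER_MODES (EX_ADVERTISED_10baseT_Half  | \
                         EX_ADVERTISED_10baseT_Full  | \
                         EX_ADVERTISED_100baseT_Half | \
                         EX_ADVERTISED_100baseT_Full | \
                         EX_ADVERTISED_1000baseT_Full)

/* Hypothetical helper: keep only the modes the caller asked for. */
static uint32_t pick_copper_advertising(uint32_t requested)
{
    uint32_t modes = requested & EX_COPPER_MODES;

    /* Nothing usable requested: fall back to advertising everything,
     * which is what the quoted function does unconditionally. */
    if (modes == 0)
        modes = EX_COPPER_MODES;

    return modes | EX_ADVERTISED_Autoneg | EX_ADVERTISED_TP;
}
```

With a request of 100baseT_Full only, the result advertises just that mode plus the autoneg and twisted-pair flags, which is the behaviour the message asks for.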

Re: Linville's L2 rant... -- Re: PATCH Fix bonding active-backup behavior for VLAN interfaces

2006-08-02 Thread Christophe Devriese

On Tuesday 01 August 2006 19:21, you wrote:
> John W. Linville wrote:
> >>>I'm just not sure that cleverness is worth the headache, especially
> >>>since the most clever things usually only work by accident...
> >>
> >>Or, work by solid, modular design and small tweaks!
> >
> > Point taken.  But stashing little hacks in the networking core for
> > specific virtual drivers isn't totally modular either.  And even if
> > it were, "modular design" probably belongs on the list of "things
> > that can be taken too far", like "everything in userland", "never
> > use ioctl", and "microkernels are superior". :-)
>
> To be honest, I'm not over-joyed to see bridging hooks included
> in the VLAN code..but if that is what it takes to get bridging
> and VLANs to play well and be flexible, I think it is a fair price.
>
> It certainly wouldn't hurt to have someone take a holistic view of the
> various L2 device interactions.  Just documenting current functionality
> on, say, the netdev wiki would be a good first step.

Ultimate flexibility could be provided by making the netif_rx routine (and the
others, including vlan etc.) a "virtual" routine.

That way a list of "filters" could be defined that allows any processing to be
done on the packet before it is handed off to the Linux kernel's higher
layers, including not delivering it on that interface, or delivering it on
another interface.

This would allow very complex implementations, including things like a
high-level L2 bridge with VLAN support and a number of protocols such as RSTP,
PVST+, ..., with relatively simple code that could be isolated from the main
kernel.

Would anyone be interested in signing off on such a patch? (It basically
creates netif_rx and vlan_acc_netif_rx lists in the net_device structure, and
then modifies the bridging and bonding drivers to just use these.)
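
A rough userspace sketch of that filter-list idea, to show the shape of it. Everything here is hypothetical: the `ex_` types, the verdict enum and the two example filters are invented for illustration and do not correspond to existing kernel structures.

```c
#include <stddef.h>

struct ex_dev;

/* Hypothetical, reduced packet: only the fields the filters look at. */
struct ex_pkt {
    struct ex_dev *dev;     /* interface the packet is currently on */
    unsigned int vlan_id;   /* 0 means untagged */
};

enum ex_verdict { EX_PASS, EX_DROP };

typedef enum ex_verdict (*ex_rx_filter)(struct ex_pkt *pkt, void *priv);

struct ex_filter_entry {
    ex_rx_filter fn;
    void *priv;
    struct ex_filter_entry *next;
};

struct ex_dev {
    const char *name;
    struct ex_filter_entry *rx_filters;  /* per-device filter list */
    unsigned long delivered;             /* packets handed to upper layers */
};

/* Run the receiving device's filters in order. A filter may drop the
 * packet or redirect it by changing pkt->dev; in this sketch the filters
 * of the new device are not run again. */
static enum ex_verdict ex_netif_rx(struct ex_pkt *pkt)
{
    for (struct ex_filter_entry *f = pkt->dev->rx_filters; f; f = f->next)
        if (f->fn(pkt, f->priv) == EX_DROP)
            return EX_DROP;
    pkt->dev->delivered++;
    return EX_PASS;
}

struct ex_vlan_map {
    unsigned int vlan_id;
    struct ex_dev *vlan_dev;
};

/* Example filter: move packets of one VLAN to its virtual device. */
static enum ex_verdict ex_vlan_demux(struct ex_pkt *pkt, void *priv)
{
    struct ex_vlan_map *map = priv;

    if (pkt->vlan_id == map->vlan_id) {
        pkt->dev = map->vlan_dev;
        pkt->vlan_id = 0;
    }
    return EX_PASS;
}

/* Example filter: drop anything that is still tagged. */
static enum ex_verdict ex_drop_tagged(struct ex_pkt *pkt, void *priv)
{
    (void)priv;
    return pkt->vlan_id ? EX_DROP : EX_PASS;
}
```

Chaining the demux filter before the drop filter delivers VLAN 5 traffic on the virtual device and discards other tagged frames, without either filter knowing about the other.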

Regards,

Christophe

Re: [patch] RFC: matching interface groups

2006-08-02 Thread Amin Azez

* Balazs Scheidler wrote, On 02/08/06 08:04:
> On Tue, 2006-08-01 at 21:18 +0200, Sven Schuster wrote:
>> as this would require the complete chain (say, INPUT or
>> OUTPUT) to be "downloaded" to userspace, modified and then again
>> "uploaded" to the kernel. At least until iptables redesign to
>> allow replacement/insertion/deletion of single rules is completed
>> which if started at all will take quite some more time :-)
> 
> Iptables operates on a per-table basis, so it is not only the INPUT or
> OUTPUT chain that needs to be down and uploaded, but the whole filter
> table.
> 
> And in addition, in my humble opinion the iptables ruleset should be up
> to the user to maintain, once some kind of automatism starts to
> add/remove rules on the fly, it becomes more difficult to do other
> changes to add independent rules to the table. For example the user
> needs to save the current ruleset using iptables-save, then modify the
> resulting file, and then load it again. If the ruleset is generated as
> it happens with a lot of tools, this might not be so easy.
> 

Even without this scenario it is not easy to make safe: if two interfaces
changed at the same time, two copies of the ruleset would be downloaded to
user space, both modified differently, and the last one to be uploaded
would win, with the other one losing its changes.
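
The lost update is easy to replay in a small simulation. The sketch below is not iptables code: `struct ex_table` and the `ex_` helpers are invented stand-ins for "download the table, edit the copy, upload the copy", and the rule strings are only examples.

```c
#include <stdio.h>
#include <string.h>

#define EX_MAX_RULES 8

/* Hypothetical stand-in for a whole table held in the kernel. */
struct ex_table {
    int nrules;
    char rules[EX_MAX_RULES][32];
};

/* "Download": userspace gets a private snapshot of the whole table. */
static struct ex_table ex_download(const struct ex_table *kernel)
{
    return *kernel;
}

/* Edit the private snapshot only. */
static void ex_add_rule(struct ex_table *copy, const char *rule)
{
    if (copy->nrules < EX_MAX_RULES) {
        snprintf(copy->rules[copy->nrules], sizeof(copy->rules[0]),
                 "%s", rule);
        copy->nrules++;
    }
}

/* "Upload": the snapshot replaces the table wholesale. */
static void ex_upload(struct ex_table *kernel, const struct ex_table *copy)
{
    *kernel = *copy;
}

static int ex_has_rule(const struct ex_table *t, const char *rule)
{
    for (int i = 0; i < t->nrules; i++)
        if (strcmp(t->rules[i], rule) == 0)
            return 1;
    return 0;
}

/* Two ip-up scripts interleave: both download before either uploads.
 * Returns the number of rules left in the kernel table. */
static int ex_race(struct ex_table *kernel)
{
    struct ex_table a = ex_download(kernel);
    struct ex_table b = ex_download(kernel);

    ex_add_rule(&a, "-i ppp0 -j inbound_ppp0");
    ex_add_rule(&b, "-i ppp1 -j inbound_ppp1");
    ex_upload(kernel, &a);
    ex_upload(kernel, &b);   /* overwrites a's change */
    return kernel->nrules;
}
```

After the race only the ppp1 rule survives, even though both scripts believed their update succeeded.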

This has bitten me and is one of my reasons for liking ipt_condition

Sam

Re: [RFC] Mobile IPv6 introduction

2006-08-02 Thread Hugo Santos

Hi,

   Thanks for the reply, however:

On Wed, Aug 02, 2006 at 12:24:30PM +0900, Masahide NAKAMURA wrote:
> Our patch is similar as you said.  Our design is that kernel does nothing
> as possible about validation which can be done by user-space.
> As you mentioned ICMPv6 error is hard to be sent by user-space because it
> carries original packet causing error. MIPv6 RFC says when mobility header
> length is too short ICMPv6 error (parameter problem) is sent. We also
> discussed about design like your choice.
> but we have not taken it because ICMPv6 sending mechanism is already in kernel
> then it is reasonable to use it. We MIPL developers concluded that kernel
> should know mobility header types and their minimum length at least. I guess
> when we would support NEMO and FMIPv6, we just add their defines at that time.
> (Actually, their implementations based on MIPL2 exists.)
> If somebody would feel that such defines should be removed from kernel we
> have another idea to make new socket interface like ICMP filter to store
> mobility header type and its minimum length to kernel by user-space.

   Although the ICMP-filter approach would be better, it is not flexible
 enough to handle this situation. We must also send ICMPv6 Parameter
 Problems when ip6mh_proto isn't IPPROTO_NONE. I don't think it is too
 much of a burden to handle ICMPv6 in the control daemon, because you
 should already do so to react to ICMPv6 error messages from peers
 concerning MIPv6 signalling. I'm strongly against doing these checks in
 the kernel for the simple reason that it is not easily extensible: you
 wouldn't be able to deploy a new daemon version that supports a new
 control protocol with new messages over an existing kernel with these
 changes. I think we should follow a different path here, and i propose
 either having a hdrinc=1 mode (for reception only) for protocol raw
 sockets, possibly with a control message on reception that specifies the
 offset of the UPL header, or having a control message to obtain the
 network headers. For instance:

  put_cmsg(msg, SOL_IPV6, ..., (skb->h.raw - skb->nh.raw),
   skb->nh.raw);

   Hugo


Re: [RFC] Mobile IPv6 introduction

2006-08-02 Thread Hugo Santos

Hi Ville,

On Wed, Aug 02, 2006 at 10:58:49AM +0300, Ville Nuorvala wrote:
> To name just one issue: the chicken and egg problem of source address
> selection and source address based routing. I solved this problem by
> letting policy rules (with a source prefix) add additional constraints
> to the address selection. This did however mean the source address
> selection had to be moved inside the routing code.

   To tell you the truth i don't know what MIPL does in terms of policy
 management. In my implementation, all routing policies go into
 Subtrees without any kind of extra routing tables. I also had the
 problem you describe, but i opted for what i think is a simpler
 solution:

   - Access router default routes are installed with a source-address,
 the address that was generated from the announced prefix (which to
 be fair degenerates to several entries if a single router announces
 multiple prefixes). This is based on the assumption that access
 routers do perform source-based ingress filtering so you may only
 use a particular access router for global connectivity using a
 particular address.
   - The default home route is installed without a source-address for
 the "default" Home address (i may have several).

   This means Linux's source address selection works without
 modifications: if no address is specified, it will pick the default
 home route and then the Home address (which has a preference as well).
 In this sense, subtrees have worked fine for me.

> But route optimization is just one form of packet transform; it just
> adds a Routing Header type 2 and/or Home Address Option Destination
> Header to the outgoing packet. Isn't xfrm just the right place for this?
> 
> You are right that we (HUT and USAGI) have mostly just looked at the
> xfrm framework from a MIPv6+IPsec perspective, but even this has helped
> us pinpoint several shortcomings in the current only IPsec specific
> framework.

   XFRM is indeed the right place for this; i just would rather not have
 the mode exposed and prefer wrapping any mode-specific stuff into
 optional callbacks. It might not be as performant but would allow
 adding new modes more easily.

   Hugo


Re: [Patch] kernel memory leak fix for af_unix datagram getpeersec patch

2006-08-02 Thread Stephen Smalley

On Wed, 2006-08-02 at 02:47 -0400, Catherine Zhang wrote:
> Hi, all,
> 
> Enclosed please find the updated patch incorporating comments from
> Stephen and Dave.

Note that this patch is intended for 2.6.18 as a bug fix for the memory
leak introduced by the original dgram peersec patches.

> Again thanks for your help!
> Catherine
> 
> --
> 
> 
> From: [EMAIL PROTECTED]
> 
> This patch implements a cleaner fix for the memory leak problem of the
> original unix datagram getpeersec patch.  Instead of creating a security
> context each time a unix datagram is sent, we only create the security
> context when the receiver requests it.
> 
> This new design requires modification of the current unix_getsecpeer_dgram
> LSM hook and addition of two new hooks, namely, secid_to_secctx and
> release_secctx.  The former retrieves the security context and the latter
> releases it.  A hook is required for releasing the security context because
> it is up to the security module to decide how that's done.  In the case of
> Selinux, it's a simple kfree operation.

Acked-by:  Stephen Smalley <[EMAIL PROTECTED]>

> 
> 
> ---
> 
>  include/linux/security.h |   41 +++--
>  include/net/af_unix.h|6 ++
>  include/net/scm.h|   29 +
>  net/ipv4/ip_sockglue.c   |9 +++--
>  net/unix/af_unix.c   |   17 +
>  security/dummy.c |   14 --
>  security/selinux/hooks.c |   38 --
>  7 files changed, 110 insertions(+), 44 deletions(-)
> 
> diff -puN include/net/scm.h~af_unix-datagram-getpeersec-ml-fix include/net/scm.h
> --- linux-2.6.18-rc2/include/net/scm.h~af_unix-datagram-getpeersec-ml-fix 2006-07-22 21:28:21.0 -0400
> +++ linux-2.6.18-rc2-cxzhang/include/net/scm.h 2006-08-01 22:43:50.0 -0400
> @@ -3,6 +3,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  /* Well, we should have at least one descriptor open
>   * to accept passed FDs 8)
> @@ -20,8 +21,7 @@ struct scm_cookie
>   struct ucredcreds;  /* Skb credentials  */
>   struct scm_fp_list  *fp;/* Passed files */
>  #ifdef CONFIG_SECURITY_NETWORK
> - char*secdata;   /* Security context */
> - u32 seclen; /* Security length  */
> + u32 secid;  /* Passed security ID   */
>  #endif
>   unsigned long   seq;/* Connection seqno */
>  };
> @@ -32,6 +32,16 @@ extern int __scm_send(struct socket *soc
>  extern void __scm_destroy(struct scm_cookie *scm);
>  extern struct scm_fp_list * scm_fp_dup(struct scm_fp_list *fpl);
>  
> +#ifdef CONFIG_SECURITY_NETWORK
> +static __inline__ void unix_get_peersec_dgram(struct socket *sock, struct scm_cookie *scm)
> +{
> + security_socket_getpeersec_dgram(sock, NULL, &scm->secid);
> +}
> +#else
> +static __inline__ void unix_get_peersec_dgram(struct socket *sock, struct scm_cookie *scm)
> +{ }
> +#endif /* CONFIG_SECURITY_NETWORK */
> +
>  static __inline__ void scm_destroy(struct scm_cookie *scm)
>  {
>   if (scm && scm->fp)
> @@ -47,6 +57,7 @@ static __inline__ int scm_send(struct so
>   scm->creds.pid = p->tgid;
>   scm->fp = NULL;
>   scm->seq = 0;
> + unix_get_peersec_dgram(sock, scm);
>   if (msg->msg_controllen <= 0)
>   return 0;
>   return __scm_send(sock, msg, scm);
> @@ -55,8 +66,18 @@ static __inline__ int scm_send(struct so
>  #ifdef CONFIG_SECURITY_NETWORK
>  static inline void scm_passec(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm)
>  {
> - if (test_bit(SOCK_PASSSEC, &sock->flags) && scm->secdata != NULL)
> - put_cmsg(msg, SOL_SOCKET, SCM_SECURITY, scm->seclen, scm->secdata);
> + char *secdata;
> + u32 seclen;
> + int err;
> +
> + if (test_bit(SOCK_PASSSEC, &sock->flags)) {
> + err = security_secid_to_secctx(scm->secid, &secdata, &seclen);
> +
> + if (!err) {
> + put_cmsg(msg, SOL_SOCKET, SCM_SECURITY, seclen, secdata);
> + security_release_secctx(secdata, seclen);
> + }
> + }
>  }
>  #else
>  static inline void scm_passec(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm)
> diff -puN net/unix/af_unix.c~af_unix-datagram-getpeersec-ml-fix net/unix/af_unix.c
> --- linux-2.6.18-rc2/net/unix/af_unix.c~af_unix-datagram-getpeersec-ml-fix 2006-07-22 23:01:26.0 -0400
> +++ linux-2.6.18-rc2-cxzhang/net/unix/af_unix.c 2006-08-02 02:25:00.454243480 -0400
> @@ -128,23 +128,17 @@ static atomic_t unix_nr_socks = ATOMIC_I
>  #define UNIX_ABSTRACT(sk)(unix_sk(sk)->addr->hash != UNIX_HASH_SIZE)
>  
>  #ifdef CONFIG_SECURITY_NETWORK
> -static void unix_get_peersec_dgram(struct sk_buff *skb)
> +static void unix_get_secdata(struct scm_cookie *scm,

RE: [RFC 2/3] secid reconciliation on inbound: add LSM hooks

2006-08-02 Thread Venkat Yekkirala

> > -   if (err)
> > -   goto out;
> > +   /* if (err) */
> > +   /*  goto out; */
> > 
> > -   err = selinux_xfrm_sock_rcv_skb(sksec->sid, skb, &ad);
> > -out:
> > +   /* err = selinux_xfrm_sock_rcv_skb(sksec->sid, skb, &ad); */
> > +out:
> >     return err;
> > }
> 
> 
> Did you mean to leave the call to selinux_xfrm_sock_rcv_skb() commented
> out?

I actually meant to take the call out entirely. Will fix this in
the next round.

RE: [RFC 1/3] secid reconciliation on inbound

2006-08-02 Thread Venkat Yekkirala

> On Tue, 1 Aug 2006, James Morris wrote:
> 
> > On Tue, 1 Aug 2006, Venkat Yekkirala wrote:
> > 
> > > +#define PACKET__COME_THRU 0x0008UL
> > > +#define PACKET__GO_THRU   0x0010UL
> > 
> > > These names seem awkward, and do we really need a separate perm for
> > > each direction?
> 
> Ok, I see we need separate permissions.  The naming, still...

You are probably seeing something I haven't :), because I did
consider using just one perm such as flow_thru for both directions
but then thought separate perms would make things easier to understand.

As for naming, how about "enter" and "leave"? Or "flow_in" and "flow_out".
Any other suggestions out there?

Re: [RFC] Mobile IPv6 introduction

2006-08-02 Thread Masahide NAKAMURA

Hugo Santos wrote:
>Although the ICMP-filter approach would be better, it is not flexible
>  enough to handle this situation. We must also send ICMPv6 Parameter
>  Problems when ip6mh_proto isn't IPPROTO_NONE. I don't think it is too

I don't think the IPPROTO_NONE case is a suitable example here
(it is also supported by our kernel patch).
It doesn't matter who checks the next header field: its offset within
the mobility header never changes, so its value can be checked in the
same way for every type number.

But anyway,

>  much of a burden to handle ICMPv6 in the control daemon because you
>  should already do so to react to ICMPv6 error messages from peers
>  concerning MIPv6 signalling. I'm strongly against doing these checks in
>  the kernel for the simple reason that it is not easily extendable.  You
>  wouldn't be able to deploy a new daemon version over an existing kernel
>  with these changes if it supported a new control protocol with new
>  messages. I think we should follow a different path here and I propose
>  either have a hdrinc=1 mode (for reception only) for protocol raw
>  sockets, possibly adding with control on reception which specifies the
>  offset of the UPL header; or have a control message to obtain the
>  network headers. For instance:
>
>   put_cmsg(msg, SOL_IPV6, ..., (skb->h.raw - skb->nh.raw),
>skb->nh.raw);

I can agree with that suggestion as a new kernel feature, but I'm not sure
the MIPv6 code should depend on it just so that new message types can be
added later. In our design, MIPv6 signaling itself is done almost entirely
by the user-space daemon. When a developer wants to add a new or private
type number, it is enough to add the number and its length to the kernel.
Everything else can be changed in the user-space application. If there is
strong demand for adding new type numbers without any modification of
kernel code at all, I would support the ICMPv6 filter approach, too.

-- 
Masahide NAKAMURA

[PATCH] DECnet Fix for DECnet routing bug

2006-08-02 Thread Patrick Caulfield

This patch fixes a bug in the DECnet routing code where we were selecting
a loopback device in preference to an outward facing device even when
the destination was known non-local. This patch should fix the problem.

Signed-off-by: Patrick Caulfield <[EMAIL PROTECTED]>
Signed-off-by: Steven Whitehouse <[EMAIL PROTECTED]>

diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 1355614..743e9fc 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -925,8 +925,13 @@ static int dn_route_output_slow(struct d
for(dev_out = dev_base; dev_out; dev_out = dev_out->next) {
if (!dev_out->dn_ptr)
continue;
-   if (dn_dev_islocal(dev_out, oldflp->fld_src))
-   break;
+   if (!dn_dev_islocal(dev_out, oldflp->fld_src))
+   continue;
+   if ((dev_out->flags & IFF_LOOPBACK) &&
+   oldflp->fld_dst &&
+   !dn_dev_islocal(dev_out, oldflp->fld_dst))
+   continue;
+   break;
}
read_unlock(&dev_base_lock);
if (dev_out == NULL)
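
The selection rule in the loop above can be tried out in user space with a small model. The sketch below is an assumption-laden illustration, not the DECnet code: the toy_dev structure, the single local_addr per device and the TOY_IFF_LOOPBACK value are all invented for the example.

```c
#include <stddef.h>

#define TOY_IFF_LOOPBACK 0x8	/* illustrative flag value for this sketch */

struct toy_dev {
	const char *name;
	unsigned int flags;
	unsigned short local_addr;	/* one local address, 0 if unconfigured */
	struct toy_dev *next;
};

static int toy_islocal(const struct toy_dev *dev, unsigned short addr)
{
	return dev->local_addr != 0 && dev->local_addr == addr;
}

/* Walk the device list in order and return the first device that owns the
 * source address, skipping a loopback device when a destination is given
 * and that destination is not local to it. Returns NULL if nothing fits. */
static const struct toy_dev *toy_pick_output(const struct toy_dev *base,
					      unsigned short src,
					      unsigned short dst)
{
	const struct toy_dev *dev;

	for (dev = base; dev; dev = dev->next) {
		if (!toy_islocal(dev, src))
			continue;
		if ((dev->flags & TOY_IFF_LOOPBACK) && dst &&
		    !toy_islocal(dev, dst))
			continue;
		break;
	}
	return dev;
}
```

With a loopback device listed first and both devices holding the same node address, a non-local destination selects the second device in this model, while a local or unspecified destination still selects loopback.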



RE: [NET]: Fix ___pskb_trim when entire frag_list needs dropping

2006-08-02 Thread Marco Berizzi

Herbert Xu wrote:

On Thu, Jul 13, 2006 at 07:03:41PM +1000, herbert wrote:
>
> This needs to go into stable as well.  In fact, there is another unrelated
> bug with exactly the same symptoms which was inadvertently fixed by the
> GSO patches.  So I'll send a simpler fix for that to stable.
>
> [NET]: Update frag_list in pskb_trim

Marco told me that he was still seeing the same problem.  Turns out that
my patch missed one important case.  I hope this is really the last time
I have to look at this bug :)

[NET]: Fix ___pskb_trim when entire frag_list needs dropping

When the trim point is within the head and there is no paged data,
___pskb_trim fails to drop the first element in the frag_list.
This patch fixes this by moving the len <= offset case out of the
page data loop.

This patch also adds a missing kfree_skb on the frag that we just
cloned.

The problem is fixed now. I have applied this
patch to 2.6.18-rc2.
Many thanks, Herbert, for all the time spent
debugging this problem.


Re: e1000 speed/duplex error

2006-08-02 Thread Auke Kok


a1 wrote:

JK> I agree.  Although ethtool does not have that functionality as of yet.
JK>  Feel free to provide a patch to the ethtool maintainer (Jeff Garzik)
JK> if you would like.  I will put it on my plate of things to do, but I
JK> will admit that it is near the bottom of the list of items to get done
JK> for me.  Feel free to ping me once in awhile to remind me.

Ethtool already has support for that, but the e1000 driver doesn't treat
all values passed from ethtool correctly.

For example, if I run ethtool with the following parameters:
ethtool -s eth0 speed 100 duplex full autoneg on

the parameters filled in by ethtool look like:

ecmd->autoneg = AUTONEG_ENABLE;
ecmd->advertising = ADVERTISED_100baseT_Full;

but when they are passed to the driver, the driver fills the structure
passed to the hw layer with all possible advertise values.



if (ecmd->autoneg == AUTONEG_ENABLE) {
hw->autoneg = 1;
if (hw->media_type == e1000_media_type_fiber)
hw->autoneg_advertised = ADVERTISED_1000baseT_Full |
 ADVERTISED_FIBRE |
 ADVERTISED_Autoneg;
else
--->hw->autoneg_advertised = ADVERTISED_10baseT_Half |
  ADVERTISED_10baseT_Full |
  ADVERTISED_100baseT_Half |
  ADVERTISED_100baseT_Full |
  ADVERTISED_1000baseT_Full|
  ADVERTISED_Autoneg |
  ADVERTISED_TP;
ecmd->advertising = hw->autoneg_advertised;
} else



If you change it this way, everything works as I expected:

--- e1000_ethtool.c.orig Mon Jun 26 14:13:26 2006
+++ e1000_ethtool.c Wed Aug 02 12:35:36 2006
@@ -225,13 +225,7 @@
 ADVERTISED_FIBRE |
 ADVERTISED_Autoneg;
else
-   hw->autoneg_advertised = ADVERTISED_10baseT_Half |
- ADVERTISED_10baseT_Full |
- ADVERTISED_100baseT_Half |
- ADVERTISED_100baseT_Full |
- ADVERTISED_1000baseT_Full|
- ADVERTISED_Autoneg |
- ADVERTISED_TP;
+   hw->autoneg_advertised = ecmd->advertising;


Don't you mean this? :

+   hw->autoneg_advertised = ecmd->advertising |
+ADVERTISED_Autoneg |
+ADVERTISED_TP;

and we'd also have to do this for fibre...


ecmd->advertising = hw->autoneg_advertised;
} else
if (e1000_set_spd_dplx(adapter, ecmd->speed + ecmd->duplex))



but that's not really what you want: the way ethtool works currently only 
allows you to pass *one* speed/duplex tuple and autonegotiate with that, or 
all (by omitting any speed/duplex tuple).


ethtool needs some code that allows you to specify "autonegotiate 10_half or 
100_full or 1000_full" (3 tuples, but not implying 100_half or 10_full). This 
is something mii-tool was able to do but this functionality never made it into 
ethtool AFAIK :)


This is the most useful case for everyone, you can omit advertising gig link 
if you only have 100mbit switches and speed up link times that way etc.
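
As a sketch of what "autonegotiate with exactly these tuples" could look like, the helper below turns a list of speed/duplex pairs into an advertising mask. It is only an illustration: the TOY_ADV_* constants are local to the example (modeled on the ADVERTISED_* flags quoted above, with bit positions chosen here), and neither ethtool nor e1000 is claimed to contain such a function.

```c
#include <stddef.h>

/* Local, illustrative bit assignments for this sketch only. */
#define TOY_ADV_10_HALF		(1u << 0)
#define TOY_ADV_10_FULL		(1u << 1)
#define TOY_ADV_100_HALF	(1u << 2)
#define TOY_ADV_100_FULL	(1u << 3)
#define TOY_ADV_1000_FULL	(1u << 5)
#define TOY_ADV_AUTONEG		(1u << 6)
#define TOY_ADV_TP		(1u << 7)

struct toy_mode {
	unsigned int speed;	/* Mbit/s: 10, 100 or 1000 */
	int full_duplex;	/* non-zero for full duplex */
};

/* Map one speed/duplex tuple to its bit, or 0 if it is not supported. */
static unsigned int toy_mode_to_bit(struct toy_mode m)
{
	switch (m.speed) {
	case 10:
		return m.full_duplex ? TOY_ADV_10_FULL : TOY_ADV_10_HALF;
	case 100:
		return m.full_duplex ? TOY_ADV_100_FULL : TOY_ADV_100_HALF;
	case 1000:
		return m.full_duplex ? TOY_ADV_1000_FULL : 0;
	default:
		return 0;
	}
}

/* Build a copper advertising mask from exactly the listed tuples.
 * Returns 0 if the list is empty or contains an unsupported tuple. */
static unsigned int toy_build_advertising(const struct toy_mode *modes,
					   size_t n)
{
	unsigned int mask = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned int bit = toy_mode_to_bit(modes[i]);

		if (!bit)
			return 0;
		mask |= bit;
	}
	return mask ? (mask | TOY_ADV_AUTONEG | TOY_ADV_TP) : 0;
}
```

Passing the three tuples from the example above (10_half, 100_full, 1000_full) yields a mask that does not imply 100_half or 10_full, which is the behaviour the paragraph asks for.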


Auke

Re: [PATCH] Fix more per-cpu typos

2006-08-02 Thread Alexey Dobriyan

On Wed, Aug 02, 2006 at 05:02:11AM +0200, Andi Kleen wrote:
> > --- a/arch/x86_64/kernel/smp.c
> > +++ b/arch/x86_64/kernel/smp.c
> > @@ -203,7 +203,7 @@ int __cpuinit init_smp_flush(void)
> >  {
> > int i;
> > for_each_cpu_mask(i, cpu_possible_map) {
> > -   spin_lock_init(&per_cpu(flush_state.tlbstate_lock, i));
> > +   spin_lock_init(&per_cpu(flush_state, i).tlbstate_lock);
>
> What advantage does this have over the earlier form?

I grepped the tree after seeing "[PATCH] fix vmstat per cpu usage"¹.
The rationales mentioned in that thread are:
1) invalid asm on s390
2) it only works because per-cpu macros are very simple
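
Point 2 can be illustrated with a toy per-CPU macro whose expansion is not a plain symbol. The toy_per_cpu() macro below is a hypothetical stand-in, not the kernel's definition: the variable is modeled as an array indexed by CPU, so toy_per_cpu(flush_state.tlbstate_lock, i) would expand to toy_per_cpu__flush_state.tlbstate_lock[(i)] and fail to compile, while the form used in the patch works.

```c
#define TOY_NR_CPUS 4

struct toy_flush_state {
	int tlbstate_lock;	/* stands in for a spinlock in this sketch */
	int flush_va;
};

/* Hypothetical per-CPU storage: one slot per CPU. */
static struct toy_flush_state toy_per_cpu__flush_state[TOY_NR_CPUS];

/* Hypothetical accessor: pastes the variable name, then indexes by CPU.
 * The first argument must be a bare variable name for the paste to work. */
#define toy_per_cpu(var, cpu) (toy_per_cpu__##var[(cpu)])

static void toy_init_locks(void)
{
	int i;

	for (i = 0; i < TOY_NR_CPUS; i++)
		/* Member access goes outside the macro, as in the patch. */
		toy_per_cpu(flush_state, i).tlbstate_lock = 1;
}
```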

> In general this should be split up into three patches.

Yep, I see Andrew split them.

¹ http://marc.theaimsgroup.com/?l=linux-kernel&m=115445399826223&w=2


Re: [PATCH 20/23] [PATCH] [XFRM] POLICY: sub policy support.

2006-08-02 Thread James Morris

On Sat, 29 Jul 2006, Masahide NAKAMURA wrote:

> Sub policy is introduced. Main and sub policies are applied to the same flow.
> (The policy that the current kernel uses is named main.)
> Separate transformation policy management is required to keep IPsec
> and Mobile IPv6 lives separate.
> A policy which lives a shorter time in the kernel should be a sub policy,
> i.e. normally main is for IPsec and sub is for Mobile IPv6.
> (Two IPsec policies on different databases can be used this way, too.)

Why can't IPSec & MIP transforms be bundled on the same policy?

Or, perhaps a different approach is needed, where the disposition of a 
policy can be to re-submit a packet for another policy match after the 
current bundle has been traversed (something like NF_REPEAT).

- James
-- 
James Morris
<[EMAIL PROTECTED]>

Re: [PATCH] ipv4: don't call upper-layer disconnect function if not connected

2006-08-02 Thread Brian Haley




The socket could have been bind()'d to, in which case it will
not move to connected state and we still need to invoke
the disconnect methods such as udp_disconnect() to clear out
that binding.


Ok.


You seem to be groveling in random areas of the ipv4 and ipv6 stack,
what are you working on?


Was looking into a customer-reported memory leak that seemed to be in 
this code path.  It wasn't, but this tweak seemed sane at the time.


-Brian

Re: [PATCH] IPv6: only set err in rawv6_bind() when necessary

2006-08-02 Thread Brian Haley




Every other path going from this location in rawv6_bind()
will clear err to zero, so your patch also doesn't fix any
bug.


I knew it didn't fix a bug, I just hadn't noticed the C idiom you
pointed out until I knew to look for it.  rawv6_bind() even does this, duh.


-Brian
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Patch] kernel memory leak fix for af_unix datagram getpeersec patch

2006-08-02 Thread Michal Piotrowski


Hi Catherine,

On 02/08/06, Catherine Zhang <[EMAIL PROTECTED]> wrote:

Hi, all,

Enclosed please find the updated patch incorporating comments from
Stephen and Dave.


Thanks!



Again thanks for your help!
Catherine

--


Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/wiki/)

Re: [2.6.18-rc2][e1000][swsusp] - Regression - Suspend to disk and resume breaks e1000 - RESOLVED Bug #6867

2006-08-02 Thread Auke Kok


Shawn Starr wrote:

On Sunday 16 July 2006 12:33 pm, Auke Kok wrote:

[adding netdev to the cc]

unfortunately I didn't.

e1000 has a special e1000_pci_save_state/e1000_pci_restore_state set of
routines that save and restore the configuration space. the fact that it
works for suspend to memory to me suggests that there is nothing wrong with
that.



Hi Auke,

It appears that with 2.6.18-rc3 this does not occur anymore. I suspended to
disk/ram and the interface PCI registers were restored. Bugzilla #6867


I would not be surprised if all the suspend issues in 2.6.18rcX were not 
involved in this somehow... thanks for reporting back in.


Auke

Re: e1000 speed/duplex error

2006-08-02 Thread Auke Kok


a1 wrote:

Hi, Auke.

Auke Kok wrote:
AK> Here's that part of the driver documentation:

AK> $ modprobe e1000 AutoNeg=0x08
AK> e1000: :00:00.0: e1000_validate_option: AutoNeg advertising 100/FD


AK> /* Auto-negotiation Advertisement Override
AK>  *
AK>  * Valid Range: 0x01-0x0F, 0x20-0x2F (copper); 0x20 (fiber)
AK>  *
AK>  * The AutoNeg value is a bit mask describing which speed and duplex
AK>  * combinations should be advertised during auto-negotiation.
AK>  * The supported speed and duplex modes are listed below
AK>  *
AK>  * Bit            7     6     5     4     3     2     1     0
AK>  * Speed (Mbps)   N/A   N/A   1000  N/A   100   100   10    10
AK>  * Duplex                     Full        Full  Half  Full  Half
AK>  *
AK>  * Default Value: 0x2F (copper); 0x20 (fiber)
AK>  */

This is not what I'm thinking of. Say, for example, I have a bunch of
e1000 adapters in my box and want to dynamically change the speed/duplex
of one of them. To do that in the way you described, I need to stop all
of them and reload with the AutoNeg parameter (can I pass this parameter
to a single card only?) and lose all the connections I had on the other
adapters.


you can pass these parameters per card as such:

$ modprobe e1000 AutoNeg=0x2f,0x28,0x2f,0x2f

this way card #2 will see the non-default value, the other 3 will run with the 
default value (0x2f).
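
To read such a value, the bit table quoted above can be turned into a small decoder. This is a sketch only: toy_describe_autoneg() and its table are invented for the example (bits 0-3 and 5 as listed in the quoted documentation, copper only) and are not part of the driver.

```c
#include <stdio.h>
#include <string.h>

/* Bit assignments as listed in the quoted driver documentation (copper). */
static const struct {
	unsigned int bit;
	const char *name;
} toy_autoneg_bits[] = {
	{ 0x01, "10/HD" },
	{ 0x02, "10/FD" },
	{ 0x04, "100/HD" },
	{ 0x08, "100/FD" },
	{ 0x20, "1000/FD" },
};

/* Write the advertised modes as a space-separated list into out.
 * Returns the number of modes, or -1 if a bit marked N/A is set or the
 * buffer is too small. */
static int toy_describe_autoneg(unsigned int mask, char *out, size_t outlen)
{
	size_t i, used = 0;
	int count = 0;

	if (outlen == 0 || (mask & ~0x2Fu))
		return -1;
	out[0] = '\0';
	for (i = 0; i < sizeof(toy_autoneg_bits) / sizeof(toy_autoneg_bits[0]); i++) {
		int n;

		if (!(mask & toy_autoneg_bits[i].bit))
			continue;
		n = snprintf(out + used, outlen - used, "%s%s",
			     count ? " " : "", toy_autoneg_bits[i].name);
		if (n < 0 || (size_t)n >= outlen - used)
			return -1;
		used += (size_t)n;
		count++;
	}
	return count;
}
```

Under these assumptions 0x28 decodes to 100/FD and 1000/FD, so the second card in the modprobe line above would advertise only those two modes.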


Auke

[PATCH 2.6.16.19 2/2] LARTC: trace control for netem: kernelspace

2006-08-02 Thread Rainer Baumann

Trace Control for Netem: Emulate network properties such as
long-range dependence and self-similarity of cross-traffic.

The delay, drop, duplication and corruption values are read out in user
space and sent to kernel space via procfs.
The kernel determines the time when new values should be sent by the use
of SIGSTOP and SIGCONT signals.
In order to always have packet action values ready to apply, there are
two buffers that hold these values.
Packet action values can be read from one buffer while the other buffer
is refilled with new values.
If the active buffer is empty, processing switches to the other buffer and
a SIGCONT signal is sent in order to receive new packet action values.

Having applied the delay value to a packet, the packet gets processed by
the original netem functions.
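
The two-buffer scheme described above can be modeled in a few lines of user-space C. The sketch below is an assumption-based illustration rather than the code in the linked patch: the toy_* names and the buffer size are invented, and a counter stands in for the SIGCONT that asks the user-space feeder for more values.

```c
#include <stddef.h>

#define TOY_BUF_LEN 4

struct toy_buf {
	unsigned int vals[TOY_BUF_LEN];
	size_t len;	/* number of valid entries */
	size_t pos;	/* next entry to hand out */
};

struct toy_trace {
	struct toy_buf buf[2];
	int active;		/* index of the buffer being consumed */
	int refill_requests;	/* stands in for the SIGCONT to the feeder */
};

/* Feeder side: fill the buffer that is not being consumed.
 * Returns the number of values stored. */
static size_t toy_refill(struct toy_trace *t, const unsigned int *vals,
			 size_t n)
{
	struct toy_buf *b = &t->buf[!t->active];
	size_t i;

	if (n > TOY_BUF_LEN)
		n = TOY_BUF_LEN;
	for (i = 0; i < n; i++)
		b->vals[i] = vals[i];
	b->len = n;
	b->pos = 0;
	return n;
}

/* Consumer side: fetch the next packet action value.
 * Returns 0 on success, -1 if both buffers are empty. */
static int toy_next(struct toy_trace *t, unsigned int *val)
{
	struct toy_buf *b = &t->buf[t->active];

	if (b->pos == b->len) {
		/* Active buffer drained: switch and request a refill. */
		b->len = b->pos = 0;
		t->active = !t->active;
		t->refill_requests++;
		b = &t->buf[t->active];
		if (b->pos == b->len)
			return -1;
	}
	*val = b->vals[b->pos++];
	return 0;
}
```

A real implementation would also need locking between the feeder and the packet path; this sketch leaves that out and only shows the switch-and-request step.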

Signed-off-by: Rainer Baumann <[EMAIL PROTECTED]>

---

Patch for linux kernel 2.6.16.19: http://tcn.hypert.net/tcnKernel.patch




[PATCH 2.6.16.19 1/2] LARTC: trace control for netem: userspace

2006-08-02 Thread Rainer Baumann

Trace Control for Netem: Emulate network properties such as
long-range dependence and self-similarity of cross-traffic.

The directory tc/netem was split in two parts, one containing the
original distributions and the other the tools to generate trace files
as well as the program responsible for reading the delay values from the
trace file and sending them to the kernel (called flowseed).
If the trace option is set, netem starts the flowseed process and
initializes the kernel. To be able to kill the flowseed process in case
the command was faulty, the PID of the flowseed process is passed to the
netem kernel module. If the kernel receives packet delay data from an
unregistered PID, that process will be killed. The flowseed process does
not send data to the kernel until the registration is completed.

Signed-off-by: Rainer Baumann <[EMAIL PROTECTED]>

---

Patch for iproute2-2.6.16-060323: http://tcn.hypert.net/tcnIproute.patch




[PATCH 2.6.16.19 0/2] LARTC: trace control for netem

2006-08-02 Thread Rainer Baumann

Hi,

We developed an extension to the network emulator netem, that provides
emulation of long term network properties such as long-range dependence
and self-similarity of cross-traffic. It is not possible to emulate
these properties with the  statistical tables for the packet delay
values used by the original netem.

We read the values for the packet delay, drop, loss and corruption from
a pre-generated trace file. This trace file is obtained by monitoring
network traffic and writing all actions to a trace file. During the
emulation the packets get processed according to the values in such a
trace file. Detailed information is available on our website:
http://tcn.hypert.net

A new option (trace) has been added to the netem command. If the trace
option is used, the values for packet delay etc. are read from a trace
file, afterwards the packets are processed by the normal netem functions.
The packet action values are read out from the trace file in user space
and sent to kernel space via procfs.

The evaluation results show similar behavior for our enhancement and the
original netem with respect to packet delay precision and packet loss at
high load (e.g. 80'000 packets per second).
It is possible to add, change or delete multiple netem qdiscs on-the-fly
(original netem qdiscs and trace qdiscs mixed).

We are looking forward to any comments, feedback and suggestions!

Thanks,
Rainer




Re: [RFC] gre: transparent ethernet bridging

2006-08-02 Thread Stephen Hemminger

On Wed, 02 Aug 2006 16:17:42 +1000
Philip Craig <[EMAIL PROTECTED]> wrote:

> > Stephen Hemminger wrote:
> >> I am not against making the bridge code smarter to handle other
> >> encapsulation.
> 
> Here's an updated patch that fixes all issues I am aware of.
> 
> It generates a random mac address for gre ports, and also stores
> a copy of the mac address for ethernet ports, rather than checking
> dev->type everywhere.

That looks cleaner. I wonder if using a fixed OUI would be better
than random addresses but then choosing an OUI would be a problem.

You probably should add a comment about what this function is doing, 
and why.

> static int __br_nf_dev_queue_xmit(struct sk_buff *skb)
> +{
> + if (skb->dst == (struct dst_entry *)&__fake_rtable) {
> + dst_release(skb->dst);
> + skb->dst = NULL;
> + }
> +
> + return br_dev_queue_push_xmit(skb);
> +}
> +

means to artificially alter the bandwidth of a system

2006-08-02 Thread Irfan Habib


Hi,

For research purposes we are considering developing a program to alter
the bandwidth of a system in software; for instance, a machine has
100 MB/s and we change it to 1 MB/s.

Does something like this already exist? Or is there a way to do this
without creating a program or kernel module?

Any help will be highly appreciated!

Irfan Habib

Re: Stackable devices.

2006-08-02 Thread Ben Greear


Stephen Hemminger wrote:
> On Wed, 2 Aug 2006 11:02:20 +0200
> Christophe Devriese <[EMAIL PROTECTED]> wrote:
>
> > On Tuesday 01 August 2006 19:21, you wrote:
> > > John W. Linville wrote:
> > > > >>>I'm just not sure that cleverness is worth the headache, especially
> > > > >>>since the most clever things usually only work by accident...
> > > > >>
> > > > >>Or, work by solid, modular design and small tweaks!
> > > > >
> > > > > Point taken.  But stashing little hacks in the networking core for
> > > > > specific virtual drivers isn't totally modular either.  And even if
> > > > > it were, "modular design" probably belongs on the list of "things
> > > > > that can be taken too far", like "everything in userland", "never
> > > > > use ioctl", and "microkernels are superior". :-)
> > > >
> > > To be honest, I'm not over-joyed to see bridging hooks included
> > > in the VLAN code..but if that is what it takes to get bridging
> > > and VLANs to play well and be flexible, I think it is a fair price.
> > > >
> > > It certainly wouldn't hurt to have someone take a holistic view of the
> > > various L2 device interactions.  Just documenting current functionality
> > > on, say, the netdev wiki would be a good first step.
> >
> > Ultimate flexibility could be provided by making the netif_rx routine (and
> > the others, including vlan etc), a "virtual" routine.
> >
> > That way a list of "filters" could be defined that allow any processing to be
> > done on the packet before it is handed of to the linux kernel's higher
> > layers, including not delivering it on that interface, or delivering it on
> > another interface.
> >
> > This would allow very complex implementations including stuff like a
> > high-level l2 bridge, with vlan support, and a number of protocols like rstp,
> > pvst+, ... with relatively simple code, that could be isolated from the main
> > kernel.
> >
> > Would anyone be interested in signing off on such a patch ? (which basically
> > creates netif_rx and vlan_acc_netif_rx lists in the net_device structure, and
> > then modify bridging and bonding drivers to just use this)
>
> I have thought about this, but you end up reinventing System V streams.
> The problem is for simple up/down call, the stacking model works fine but
> once you add flow-control and multiplexing issues the problem becomes complex.
> It is hard to think of a good general solution where the performance wouldn't
> end up sucking.


Currently, the bridge hook logic is something like:

if (bridge-consumed-pkt) {
return
}

// drop through to other layers


There are several other hooks I'd like to see added (pktgen receive processing,
mac-vlans, etc).  Each of these hooks is logically similar to the bridge hook,
i.e. if it consumes the pkt, return, else drop through to the next hook until
we get to the regular protocol processing logic.

I would like to be able to chain layer-2 handlers, such as bridge, mac-vlan,
pktgen such that if one consumed, you break out of the handling, else, you
try the next handler.  The handlers can be dynamically registered and inserted
in any order, controllable by user-space and/or module load/unload.

For many of the handlers, the logic will re-insert the packet by re-calling the
netif-rx logic, so there would need to be some protection to keep loops from
occurring that would recurse too much and overflow the stack.
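
A rough user-space model of such a handler chain, including a depth limit for re-injected packets, might look like the following. Everything here is hypothetical: the toy_* names, the fixed-size handler table and the depth limit of 4 are assumptions for the sketch, not an existing kernel interface.

```c
#include <stddef.h>

#define TOY_MAX_HANDLERS 8
#define TOY_MAX_DEPTH    4	/* arbitrary re-injection limit for the sketch */

struct toy_pkt {
	int vlan;	/* number of VLAN tags still on the packet */
	int hops;	/* how many times a handler touched it */
};

enum toy_verdict { TOY_PASS = 0, TOY_CONSUMED = 1 };

typedef enum toy_verdict (*toy_handler_fn)(struct toy_pkt *pkt, int depth);

static toy_handler_fn toy_handlers[TOY_MAX_HANDLERS];
static size_t toy_nr_handlers;

/* Append a handler to the chain; returns -1 if the table is full. */
static int toy_register(toy_handler_fn fn)
{
	if (toy_nr_handlers == TOY_MAX_HANDLERS)
		return -1;
	toy_handlers[toy_nr_handlers++] = fn;
	return 0;
}

/* Try each handler in order. Returns 1 if one consumed the packet, 0 if it
 * should fall through to regular protocol processing, and -1 if it was
 * dropped because re-injection went deeper than TOY_MAX_DEPTH. */
static int toy_rx(struct toy_pkt *pkt, int depth)
{
	size_t i;

	if (depth > TOY_MAX_DEPTH)
		return -1;
	for (i = 0; i < toy_nr_handlers; i++)
		if (toy_handlers[i](pkt, depth) == TOY_CONSUMED)
			return 1;
	return 0;
}

/* Example handler: strips one VLAN tag and re-injects the packet. */
static enum toy_verdict toy_vlan_handler(struct toy_pkt *pkt, int depth)
{
	if (pkt->vlan <= 0)
		return TOY_PASS;
	pkt->vlan--;
	pkt->hops++;
	toy_rx(pkt, depth + 1);
	return TOY_CONSUMED;
}
```

Passing the depth explicitly keeps the sketch simple; a per-packet or per-CPU counter would be another way to bound the recursion.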

Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com


Stackable devices.

2006-08-02 Thread Stephen Hemminger

On Wed, 2 Aug 2006 11:02:20 +0200
Christophe Devriese <[EMAIL PROTECTED]> wrote:

> On Tuesday 01 August 2006 19:21, you wrote:
> > John W. Linville wrote:
> > >>>I'm just not sure that cleverness is worth the headache, especially
> > >>>since the most clever things usually only work by accident...
> > >>
> > >>Or, work by solid, modular design and small tweaks!
> > >
> > > Point taken.  But stashing little hacks in the networking core for
> > > specific virtual drivers isn't totally modular either.  And even if
> > > it were, "modular design" probably belongs on the list of "things
> > > that can be taken too far", like "everything in userland", "never
> > > use ioctl", and "microkernels are superior". :-)
> >
> > To be honest, I'm not over-joyed to see bridging hooks included
> > in the VLAN code..but if that is what it takes to get bridging
> > and VLANs to play well and be flexible, I think it is a fair price.
> >
> > It certainly wouldn't hurt to have someone take a holistic view of the
> > various L2 device interactions.  Just documenting current functionality
> > on, say, the netdev wiki would be a good first step.
> 
> Ultimate flexibility could be provided by making the netif_rx routine (and
> the others, including vlan etc), a "virtual" routine.
> 
> That way a list of "filters" could be defined that allow any processing to be 
> done on the packet before it is handed of to the linux kernel's higher 
> layers, including not delivering it on that interface, or delivering it on 
> another interface.
> 
> This would allow very complex implementations including stuff like a 
> high-level l2 bridge, with vlan support, and a number of protocols like rstp, 
> pvst+, ... with relatively simple code, that could be isolated from the main 
> kernel.
> 
> Would anyone be interested in signing off on such a patch ? (which basically 
> creates netif_rx and vlan_acc_netif_rx lists in the net_device structure, and 
> then modify bridging and bonding drivers to just use this) 

I have thought about this, but you end up reinventing System V streams.
The problem is for simple up/down call, the stacking model works fine but
once you add flow-control and multiplexing issues the problem becomes complex.
It is hard to think of a good general solution where the performance wouldn't
end up sucking.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
"And in the Packet there writ down that doome"

Re: [PATCH 2.6.16.19 0/2] LARTC: trace control for netem

2006-08-02 Thread Stephen Hemminger

On Wed, 02 Aug 2006 19:21:27 +0200
Rainer Baumann <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> We developed an extension to the network emulator netem, that provides
> emulation of long term network properties such as long-range dependence
> and self-similarity of cross-traffic. It is not possible to emulate
> these properties with the  statistical tables for the packet delay
> values used by the original netem.
> 
> We read the values for the packet delay, drop, loss and corruption from
> a pre-generated trace file. This trace file is obtained by monitoring
> network traffic and writing all actions to a trace file. During the
> emulation the packets get processed according to the values in such a
> trace file. Detailed information is available on our website:
> http://tcn.hypert.net
> 
> A new option (trace) has been added to the netem command. If the trace
> option is used, the values for packet delay etc. are read from a trace
> file, afterwards the packets are processed by the normal netem functions.
> The packet action values are readout from the trace file in user space
> and sent to kernel space via procfs.
> 
> The evaluation results show similar behavior for our enhancement and the
> original netem with respect to packet delay precision and packet loss at
> high load (e.g. 80'000 packets per second).
> It is possible to add, change or delete multiple netem qdiscs on-the-fly
> (original netem qdiscs and trace qdiscs mixed).
> 
> We are looking forward for any comments, feedback and suggestions!
> 
> Thanks,
> Rainer

I like the idea and want to get it incorporated.

Major things that need fixing:
* Don't extend the size of tc_netem_qopt; instead use a new netlink
  payload.
    + add a type to the TCA_NETEM_ enum
    + add a new structure containing the payload
  This allows for binary compatibility.

* Don't use proc as an interface to netem features. Use netlink.
  Either add a new command (or option) to the iproute2 commands
  to handle the flow table, or add a new payload.


Minor stuff:
* the bzero macro in netem is a BSDism, just use memset
* bad indentation and style issues.
* minor whitespace damage in several places in patch
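
For the bzero point, the portable replacement is a one-liner. The structure below is a hypothetical stand-in used only to make the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical options structure, for illustration only. */
struct ex_opts {
    unsigned int latency;
    unsigned int limit;
};

/* memset(p, 0, n) does what bzero(p, n) did, and is standard C. */
void ex_opts_clear(struct ex_opts *o)
{
    assert(o != NULL);
    memset(o, 0, sizeof(*o));
}
```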


Re: [PATCH][SECURITY] secmark: nul-terminate secdata

2006-08-02 Thread Chris Wright

* James Morris ([EMAIL PROTECTED]) wrote:
> cc'd Chris Wright, as this patch seems like a candidate for the stable 
> tree.

Would be, but I thought secmark went in post 2.6.17.  And I expect Dave
will push this well before 2.6.18.

thanks,
-chris

Re: [bug] e100: checksum mismatch on 82551ER rev10

2006-08-02 Thread Auke Kok


[cc-ing netdev]
[adding original thread authors back, please do not strip CC]

Charlie Brady wrote:
>> Molle Bestefich wrote:
>>> The NICs are working perfectly.
>>
>> How can you tell? Do you know if jumbo frames work correctly? Is the
>> device properly checksumming? is flow control working properly? These
>> and many, many more settings are determined by the EEPROM. Seemingly it
>> may work correctly, but there is no guarantee whatsoever that it will work
>> correctly at all if the checksum is bad. Again, you can lose data, or
>> worse, you could corrupt memory in the system causing massive failure (DMA
>> timings, etc). Unlikely? sure, but not impossible.
>
> Let's assume that these things are all true, and the NIC currently does
> not work perfectly, just imperfectly, but acceptably. With the recent
> driver change, it now does not work at all. That's surely a bug in the
> driver.


There is no logic in that sentence at all. You're saying that the driver is 
broken because it doesn't fix an error in the EEPROM?


We're trying extremely hard to fix real errors here (especially when we find 
that hardware resellers send out hardware with EEPROM problems) and you are 
asking for a workaround that will (likely) introduce random errors and failure 
into your kernel. I do not want to accept responsibility for that and I also
do not think any other kernel developer would like me to release such a risk 
into the kernel. I'd probably get whistled back instantly :)


If you want to edit your own kernel then I am fine with it. If you want to 
recalculate the checksum yourself and put it in the EEPROM then I am also fine 
with that. As long as you never ask for support for that NIC. But we can't 
support an option that allows all users to willingly enable a piece of 
non-properly-working hardware. Because that is what it is: Not properly 
configured hardware.
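
For readers wondering what recalculating the checksum involves, here is a minimal sketch. It assumes the convention used by the in-tree e100 driver, where the 16-bit words of the EEPROM image, including the checksum stored in the last word, must add up to 0xBABA; please verify that against the driver source before relying on it. The function names are made up for this example, and writing the result back to the part is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_EEPROM_SUM_TARGET 0xBABAu

/* Value the last word must hold so that all words sum to the target. */
uint16_t ex_eeprom_expected_checksum(const uint16_t *words, size_t count)
{
    uint16_t sum = 0;
    size_t i;

    assert(words != NULL || count == 0);
    for (i = 0; i + 1 < count; i++)          /* every word except the last */
        sum = (uint16_t)(sum + words[i]);
    return (uint16_t)(EX_EEPROM_SUM_TARGET - sum);
}

/* Non-zero when the stored checksum matches the computed one. */
int ex_eeprom_checksum_ok(const uint16_t *words, size_t count)
{
    return count > 0 &&
           words[count - 1] == ex_eeprom_expected_checksum(words, count);
}
```

Note that a matching checksum only says the image is self-consistent; it does not say that the settings stored in it are right for the board.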


The bottom line is that your problem is that a specific hardware vendor is/was 
selling badly configured hardware, and you buy it from them, even after it's 
End Of Lifed for that vendor. Even though that vendor did buy the units 
properly configured and had all the tools needed to configure them properly. I 
can maybe fix your problem by seeing if we can get you an EEPROM update, but I
cannot break everyone else's kernel for that.


Auke




Re: [bug] e100: checksum mismatch on 82551ER rev10

2006-08-02 Thread Charlie Brady



On Wed, 2 Aug 2006, Auke Kok wrote:


> [cc-ing netdev]
> [adding original thread authors back, please do not strip CC]

[There were no Cc's visible in the lkml archive I used as source of my
quotes.]

> Charlie Brady wrote:
>> Let's assume that these things are all true, and the NIC currently does
>> not work perfectly, just imperfectly, but acceptably. With the recent
>> driver change, it now does not work at all. That's surely a bug in the
>> driver.
>
> There is no logic in that sentence at all. You're saying that the driver is
> broken because it doesn't fix an error in the EEPROM?

I am not asking the driver to fix errors in the EEPROM. I'm asking it to
send and receive packets, as it has done in the past.

> We're trying extremely hard to fix real errors here (especially when we find
> that hardware resellers send out hardware with EEPROM problems) ...

I do not expect the kernel to perform QA tests on my hardware, just work.

> and you are
> asking for a workaround that will (likely) introduce random errors and
> failure into your kernel. I do not want to accept responsibility for
> that ...

You publish your code under the GPL. You explicitly disclaim any warranty.

> If you want to edit your own kernel then I am fine with it.

I suspect that if all/many T23 laptops perform as mine does then some
major vendors will also edit their kernels. I'm sure they would rather not
do that.

> If you want to recalculate the checksum yourself and put it in the
> EEPROM then I am also fine with that.

Can you provide a reference as to how I might do that?

> As long as you never ask for support for that NIC. But we can't support
> an option that allows all users to willingly enable a piece of
> non-properly-working hardware. Because that is what it is: Not properly
> configured hardware.

Which it may be. But it doesn't work at all with the new kernel, where it
has in the past.

> The bottom line is that your problem is that a specific hardware vendor
> is/was selling badly configured hardware, and you buy it from them, even
> after it's End Of Lifed for that vendor. Even though that vendor did buy the
> units properly configured and had all the tools needed to configure them
> properly.

I don't think either of us knows that.

> I can maybe fix your problem by seeing if we can get you an eeprom
> update...

That'd be great. Thanks!

Regards

---
Charlie

Re: [bug] e100: checksum mismatch on 82551ER rev10

2006-08-02 Thread Auke Kok


Charlie Brady wrote:
>>> Let's assume that these things are all true, and the NIC currently
>>> does not work perfectly, just imperfectly, but acceptably. With the
>>> recent driver change, it now does not work at all. That's surely a
>>> bug in the driver.
>>
>> There is no logic in that sentence at all. You're saying that the
>> driver is broken because it doesn't fix an error in the EEPROM?
>
> I am not asking the driver to fix errors in the EEPROM. I'm asking it to
> send and receive packets, as it has done in the past.

Maybe you are confusing e100 with eepro100. e100 has done this since it made
it into 2.6.4 or so.


Auke

Re: [PATCH 2.6.16.19 0/2] LARTC: trace control for netem

2006-08-02 Thread Rainer Baumann

Thanks for your feedback! We will try to fix this.
Rainer

Stephen Hemminger wrote:
> On Wed, 02 Aug 2006 19:21:27 +0200
> Rainer Baumann <[EMAIL PROTECTED]> wrote:
>
>   
>> Hi,
>>
>> We developed an extension to the network emulator netem, that provides
>> emulation of long term network properties such as long-range dependence
>> and self-similarity of cross-traffic. It is not possible to emulate
>> these properties with the statistical tables for the packet delay
>> values used by the original netem.
>>
>> We read the values for the packet delay, drop, loss and corruption from
>> a pre-generated trace file. This trace file is obtained by monitoring
>> network traffic and writing all actions to a trace file. During the
>> emulation the packets get processed according to the values in such a trace
>> file. Detailed information is available on our
>> website: http://tcn.hypert.net
>>
>> A new option (trace) has been added to the netem command. If the trace
>> option is used, the values for packet delay etc. are read from a trace
>> file; afterwards the packets are processed by the normal netem functions.
>> The packet action values are read out from the trace file in user space
>> and sent to kernel space via procfs.
>>
>> The evaluation results show similar behavior for our enhancement and the
>> original netem with respect to packet delay precision and packet loss at
>> high load (e.g. 80'000 packets per second).
>> It is possible to add, change or delete multiple netem qdiscs on-the-fly
>> (original netem qdiscs and trace qdiscs mixed).
>>
>> We are looking forward to any comments, feedback and suggestions!
>>
>> Thanks,
>> Rainer
>> 
>
> I like the idea and want to get it incorporated.
>
> Major things that need fixing:
> * Don't extend the size of tc_netem_qopt; instead use a new netlink
>   payload:
>     + add a type to the TCA_NETEM_ enum
>     + add a new structure containing the payload
>   This allows for binary compatibility.
>
> * Don't use proc as an interface to netem features. Use netlink.
>   Either add a new command (or option) to the iproute2 commands
>   to handle flow table, or add a new payload.
>
>
> Minor stuff:
> * the bzero macro in netem is a BSDism, just use memset
> * bad indentation and style issues.
> * minor whitespace damage in several places in patch
>

Re: [PATCH] SMSC LAN911x and LAN921x vendor driver

2006-08-02 Thread Steve . Glendinning

Hi John,

Thanks for all your feedback.

> > +/* waits for MAC not busy, with timeout.  Assumes MacPhyAccessLock has
> > + * already been acquired */
> > +static int smsc911x_mac_notbusy(struct smsc911x_data *pdata)
> > +{
> > +        int i;
> > +
> > +        for (i = 0; i < 40; i++) {
> > +                if ((smsc911x_reg_read(pdata, MAC_CSR_CMD)
> > +                     & MAC_CSR_CMD_CSR_BUSY_) == 0) {
> > +                        return 1;
> > +                }
> > +        }
> > +        SMSC_WARNING("Timed out waiting for MAC not BUSY. "
> > +                     "MAC_CSR_CMD: 0x%08X", smsc911x_reg_read(pdata,
> > +                                                              MAC_CSR_CMD));
> > +        return 0;
> > +}
> 
> How is the length of this timeout controlled?  IOW, what prevents
> it from being too short when the Omegatron 128 running at 10GHz hits
> the market?  Are you relying on the MII clock rate?

The LAN911x and LAN921x devices use an SRAM-like bus interface with a
minimum cycle time of 45ns, so smsc_reg_read() and smsc_reg_write() are
guaranteed to take at least 45ns.  The MAC operates a little slower, but
the operation shouldn't take longer than 225ns (5 read cycles).

The PHY is accessed via slave registers in the MAC (which then relays the 
command over mii), so its timeout works in the same way.

The timeouts are only there to prevent total lockup if the hardware fails, 
if the part is working it should take nowhere near 40 iterations.
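
For comparison, the sketch below shows a bounded poll whose limit can be tied to wall-clock time by passing an explicit delay hook, rather than relying on the bus cycle time alone. It is a user-space model under stated assumptions, not the driver's code: the register reader, the delay hook and the fake device are all hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_CSR_BUSY 0x80000000u   /* hypothetical busy bit */

typedef uint32_t (*ex_read_fn)(void *ctx);
typedef void (*ex_delay_fn)(unsigned int usec);

/*
 * Poll until (reg & busy_mask) == 0, at most max_polls times.
 * If delay_us is non-NULL it is called between polls, which bounds
 * the wait in time instead of in loop iterations.
 * Returns 1 when the device went idle, 0 on timeout.
 */
int ex_wait_not_busy(ex_read_fn read_reg, void *ctx, uint32_t busy_mask,
                     unsigned int max_polls, ex_delay_fn delay_us)
{
    unsigned int i;

    assert(read_reg != NULL);
    for (i = 0; i < max_polls; i++) {
        if ((read_reg(ctx) & busy_mask) == 0)
            return 1;
        if (delay_us != NULL)
            delay_us(1);
    }
    return 0;
}

/* Fake device for demonstration: busy for the first `busy_reads` reads. */
struct ex_fake_mac {
    unsigned int busy_reads;
};

uint32_t ex_fake_read(void *ctx)
{
    struct ex_fake_mac *m = ctx;

    if (m->busy_reads > 0) {
        m->busy_reads--;
        return EX_CSR_BUSY;
    }
    return 0;
}
```

With a device that is guaranteed to answer within a few bus cycles, as described above, passing NULL for the delay hook gives the same behaviour as the loop in the patch.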


> > +        /* Auto-detect PHY */
> > +        for (address = 0; address <= 31; address++) {
> > +                pdata->phy_address = address;
> > +                phyid1 = smsc911x_phy_read(pdata, MII_PHYSID1);
> > +                phyid2 = smsc911x_phy_read(pdata, MII_PHYSID2);
> > +                if ((phyid1 != 0xU) || (phyid2 != 0xU)) {
> > +                        SMSC_TRACE("Detected PHY at address = "
> > +                                   "0x%02X = %d", address, address);
> > +                        break;
> > +                }
> > +        }
> 
> Does this need the magic "for (addr=1; addr <=32; addr++)" trick that
> has become idiomatic for PHY discovery in our drivers?

I don't understand the question - surely 32 is not a valid PHY address?


> > +/* SMSC911x registers and bitfields */
> > +#define RX_DATA_FIFO0x00
> > +
> > +#define TX_DATA_FIFO0x20
> > +#define TX_CMD_A_ON_COMP_   0x8000
> > +#define TX_CMD_A_BUF_END_ALGN_  0x0300
> > +#define TX_CMD_A_4_BYTE_ALGN_   0x
> > +#define TX_CMD_A_16_BYTE_ALGN_  0x0100
> > +#define TX_CMD_A_32_BYTE_ALGN_  0x0200
> > +#define TX_CMD_A_DATA_OFFSET_   0x001F
> > +#define TX_CMD_A_FIRST_SEG_ 0x2000
> > +#define TX_CMD_A_LAST_SEG_  0x1000
> > +#define TX_CMD_A_BUF_SIZE_  0x07FF
> > +#define TX_CMD_B_PKT_TAG_   0x
> > +#define TX_CMD_B_ADD_CRC_DISABLE_  0x2000
> > +#define TX_CMD_B_DISABLE_PADDING_  0x1000
> > +#define TX_CMD_B_PKT_BYTE_LENGTH_  0x07FF
> 
> Looks like something went haywire w/ your tabbing in this file...?

It's just the "+ " in the patch, once applied it looks quite pretty!

Best Regards,
--
Steve Glendinning
SMSC GmbH
m: +44 777 933 9124
e: [EMAIL PROTECTED]



Re: [PATCH] SMSC LAN911x and LAN921x vendor driver

2006-08-02 Thread Steve . Glendinning

Hi Francois,

Thanks again for all your feedback.  I have implemented most of your 
suggestions, 

> >  /* Enable phy clocks to the MAC */
> >  hwcfg &= (~HW_CFG_PHY_CLK_SEL_);
> >  hwcfg |= HW_CFG_PHY_CLK_SEL_EXT_PHY_;
> >  smsc911x_reg_write(hwcfg, pdata, HW_CFG);
> >  udelay(10); /* Enough time for clocks to restart */
> 
> (back to my original question that I should have reworded in a different
> thread)
> 
> Does the platform guarantee that the register write has actually reached
> the real register when the udelay is issued?

I think so, but maybe you can help me check.  The LAN911x device is always 
directly connected to a simple SRAM-like host bus, and smsc911x_reg_write 
is implemented using readl.  Does this implicitly guarantee it to be 
volatile?


> > 
> >  if (!pdata->software_irq_signal) {
> >          printk(KERN_WARNING "%s: ISR failed signaling test (IRQ %d)\n",
> >                 dev->name, dev->irq);
> >          return -ENODEV;
> >  }
> >  SMSC_TRACE("IRQ handler passed test using IRQ %d", dev->irq);
> > 
> >  printk(KERN_INFO "%s: SMSC911x/921x identified at %#08lx, IRQ: %d\n",
> >         dev->name, (unsigned long)pdata->ioaddr, dev->irq);
> > 
> >  spin_lock_irqsave(&pdata->phy_lock, flags);
> 
> flags useless: ->open() is issued in irq-enabled context.

How do you mean? I thought an irq-enabled context meant I DO have to
disable irqs?


> >  unsigned long flags;
> > 
> >  SMSC_TRACE("ioctl cmd 0x%x", cmd);
> >  switch (cmd) {
> >  case SIOCGMIIPHY:
> >  case SIOCDEVPRIVATE:
> 
> The SIOCDEVPRIVATE can/should be removed.

I have removed these, they were only in as a quick fix because mii-tool 
here sends SIOCDEVPRIVATE instead of SIOCGMIIPHY.  I fixed my copy of 
mii-tool instead :o)

Best Regards,
--
Steve Glendinning
SMSC GmbH
m: +44 777 933 9124
e: [EMAIL PROTECTED]


Re: [PATCH] SMSC LAN911x and LAN921x vendor driver

2006-08-02 Thread John W. Linville

On Wed, Aug 02, 2006 at 08:23:40PM +0100, [EMAIL PROTECTED] wrote:

> > Does this need the magic "for (addr=1; addr <=32; addr++)" trick that
> > has become idiomatic for PHY discovery in our drivers?
> 
> I don't understand the question - surely 32 is not a valid PHY address?

That's why it is magic! :-)

The idea is to probe PHY addr 0 last in the series.  Apparently some
PHYs don't like seeing addr 0 or somesuch, so you try it last to avoid
screwing them up.  It may well be folklore and legend at this point.
Still, you will find several examples in the various drivers.
The sundance driver is one example.
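
A minimal sketch of the idiom, assuming a probe callback that reports whether a PHY answers at a given address: the loop counter runs from 1 to 32 and is masked to five bits, so address 0 is tried last. The callback type and the fake bus are hypothetical and exist only to make the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*ex_phy_probe_fn)(void *ctx, unsigned int addr);

/*
 * Probe PHY addresses in the order 1, 2, ..., 31, 0.
 * Returns the first address that answers, or -1 if none does.
 */
int ex_find_phy(ex_phy_probe_fn probe, void *ctx)
{
    unsigned int n;

    assert(probe != NULL);
    for (n = 1; n <= 32; n++) {
        unsigned int addr = n & 0x1f;    /* 32 wraps around to 0 */

        if (probe(ctx, addr))
            return (int)addr;
    }
    return -1;
}

/* Fake MII bus for demonstration: bit N set means a PHY at address N. */
struct ex_fake_bus {
    uint32_t present;
    unsigned int probes;    /* number of probe calls made */
    unsigned int last;      /* last address probed */
};

int ex_fake_probe(void *ctx, unsigned int addr)
{
    struct ex_fake_bus *b = ctx;

    b->probes++;
    b->last = addr;
    return (int)((b->present >> addr) & 1u);
}
```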

John
-- 
John W. Linville
[EMAIL PROTECTED]

Re: [patch 0/1]SNMPv2 "ipv6IfStatsInHdrErrors" counter error

2006-08-02 Thread David Miller

From: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date: Mon, 31 Jul 2006 18:36:25 +0900 (JST)

> Hello.
> 
> Next time, please put your "Signed-off-by" line before the patch.
> Thank you.
> 
> In article <[EMAIL PROTECTED]> (at Tue, 01 Aug 2006 05:45:33 -0400), weidong 
> <[EMAIL PROTECTED]> says:
> 
> > signed-off-by:Wei Dong <[EMAIL PROTECTED]>
> Acked-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

Patch applied, thanks.

Re: means to artificially alter the bandwidth of a system

2006-08-02 Thread Ian McDonald


>> Hi,
>>
>> For research purposes we are considering developing a program to alter
>> the bandwidth of a system via software, for instance: a machine has
>> 100 MB/s and we change it to 1 MB/s.
>>
>> Does something like this already exist? Or is there a way to do this
>> without creating a program/kernel module
>
> Of course: see http://linux-net.osdl.org/index.php/Iproute2 (especially tc)
>
>> Any help will be highly appreciated!
>>
>> Irfan Habib
>
> HGN


You may also want to look at Netem
http://linux-net.osdl.org/index.php/Netem if you want to play with
delay and loss as well. The examples there are good, but I can send
you scripts as well if you wish.

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand

Re: means to artificially alter the bandwidth of a system

2006-08-02 Thread Hagen Paul Pfeifer

* Irfan Habib | 2006-08-02 23:04:41 [+0500]:

>Hi,
>
>For research purposes we are considering developing a program to alter
>the bandwidth of a system via software, for instance: a machine has
>100 MB/s and we change it to 1 MB/s.
>
>Does something like this already exist? Or is there a way to do this
>without creating a program/kernel module

Of course: see http://linux-net.osdl.org/index.php/Iproute2 (especially tc)

>Any help will be highly appreciated!
>
>Irfan Habib

HGN



Re: [patch 1/1]SNMPv2 "ipv6IfStatsOutFragCreates" counter error

2006-08-02 Thread David Miller

From: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date: Mon, 31 Jul 2006 18:43:14 +0900 (JST)

> The patch seems sane to me.
> 
> In article <[EMAIL PROTECTED]> (at Tue, 01 Aug 2006 05:45:39 -0400), weidong 
> <[EMAIL PROTECTED]> says:
> 
> > signed-off-by: Wei Dong <[EMAIL PROTECTED]>
> Acked-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

Also applied, thank you.

Re: PATCH Fix bonding active-backup behavior for VLAN interfaces

2006-08-02 Thread David Miller

From: Christophe Devriese <[EMAIL PROTECTED]>
Date: Mon, 31 Jul 2006 10:15:40 +0200

Thanks for the detailed explanation.

> If you bond 2 vlan subinterfaces, the patch is not necessary at all. In that 
> case also the source device will be changed from eth0. to bond. So 
> that's correct behavior no ?
> 
> In the second case, you create vlan subifs on a bonding device, vlan 
> subinterfaces will be created on the slave interfaces. In that case the vlan 
> code will reassign the skb->dev node, and because skb_bond needs to know the 
> actual input device in order to make an informed drop decision before passing 
> this code (skb active-backup mode needs to drop packets from the backup slave 
> interface, if you don't do that you get big problems with broadcasts). 
> 
> The same struct vlan_group is assigned to all slave devices and so the only 
> vlan subinterfaces that exist in this case are the bond. 
> subinterfaces, and the vlan path for both slaves will assign the 
> bond. interface to skb->dev, thereby erasing the information about 
> where the packet came from.

Assuming it is correct to do the skb_bond() here in the VLAN hwaccel
RX path, then there is still one piece missing from what I can see.

Notice that in the netif_receive_skb() path, the return value from
skb_bond() is used as the third argument to the deliver_skb() routine
and friends which in turn gets passed to the packet_type functions.

Bonding, in particular, makes use of this third argument, see:

bond_3ad_lacpdu_recv()
rlb_arp_recv()

So if the new "orig_dev" you are computing in the VLAN hwaccel RX path
is the correct one, somehow this has to propagate down to the third
argument of the packet type ->func() invocations, right?

Finally, to be honest, I'm still a little stumped about why this change
is necessary.  When you configure the bond, the slaves should
be the VLAN devices as far as I can tell.  Therefore it should be the
"vlan_device->master" that we are interested in, not the top-level
"dev->master".

If the ethernet is on a VLAN, and the administrator configures the
underlying ethernet device as the slaves of the bond, this to me seems
like a misconfiguration rather than something we should put hacks in
to support.

The fact that you do not propagate the "orig_dev" returned from
skb_bond() down to the packet type functions seems to support this.
From my perspective, this looks like a hack for a bonding
misconfiguration.


Re: neigh_lookup lockdep bug.

2006-08-02 Thread David Miller

From: Arjan van de Ven <[EMAIL PROTECTED]>
Date: Wed, 02 Aug 2006 04:26:49 +0200

> fwiw the patch is at
> http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc2/2.6.18-rc2-mm1/broken-out/lockdep-split-the-skb_queue_head_init-lock-class.patch
> and a followup cleanup at
> http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc2/2.6.18-rc2-mm1/broken-out/lockdep-split-the-skb_queue_head_init-lock-class-tidy.patch

Both applied, thanks.

Re: [PATCH] NET: fix kernel panic from no dev->hard_header_len space

2006-08-02 Thread Krzysztof Halasa

David Miller <[EMAIL PROTECTED]> writes:

> I think Alexey is saying that setting ->hard_header() creates an
> agreement between the device and IP that IP will make sure
> that dev->hard_header_len bytes are available in the header
> area.

I think I now understand it: hard_header_len is guaranteed while
calling hard_header() (because the check is done just before the call)
but not elsewhere, particularly not in hard_start_xmit().
dev->hard_header being NULL or not doesn't change anything.

I think hard_start_xmit() can be called without calling hard_header()
first, for example with things like PF_PACKET. This way the
hard_header_len check is skipped.

It looks like it needs to be audited. I think either:
a) dev->hard_header_len must be eliminated completely and skb allocations
   have to assume some sane amount of header space (32 bytes or so), or
b) all dev->hard_header() and dev->hard_start_xmit() calling paths must
   be checked to contain at least dev->hard_header_len header space, or
c) dev->hard_header_len must be clearly marked as advisory, no core
   code changes (all drivers must be audited and fixed).
d) another idea?

What do you prefer?

a) would IMHO give the best code quality: reallocations where they are needed
   and no strange semantics which can be easily broken by accident
   (nobody would count on nonexistent hard_header_len either).
   Fast path would not need to reallocate skb data, though the check
   would still be in place.
   We could test it by reducing "default" header space to zero,
   possibly a "hacking" kernel config option may be useful?

b) my patch is the starting point but I'm not sure it's practical.
c) IMHO the worst by all means.
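
To make option a) concrete, here is a toy user-space model of "check the headroom where the header is pushed, and reallocate only when it is short". It is not kernel code and does not use the sk_buff API; struct ex_buf and the ex_buf_* helpers are hypothetical names invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy packet buffer: data lives at head + headroom, for len bytes. */
struct ex_buf {
    unsigned char *head;    /* start of the allocation */
    size_t headroom;        /* free bytes in front of the data */
    size_t len;             /* bytes of data */
};

int ex_buf_init(struct ex_buf *b, size_t headroom, const void *data, size_t len)
{
    size_t total = headroom + len;

    b->head = malloc(total ? total : 1);
    if (b->head == NULL)
        return -1;
    b->headroom = headroom;
    b->len = len;
    memcpy(b->head + headroom, data, len);
    return 0;
}

/* Make sure at least `need` bytes are free in front of the data. */
static int ex_buf_ensure_headroom(struct ex_buf *b, size_t need)
{
    unsigned char *n;

    if (b->headroom >= need)
        return 0;                       /* fast path: nothing to do */
    n = malloc(need + b->len);
    if (n == NULL)
        return -1;
    memcpy(n + need, b->head + b->headroom, b->len);
    free(b->head);
    b->head = n;
    b->headroom = need;
    return 0;
}

/* Prepend a header, growing the buffer first if the headroom is short. */
int ex_buf_push(struct ex_buf *b, const void *hdr, size_t hlen)
{
    assert(b != NULL && b->head != NULL);
    if (ex_buf_ensure_headroom(b, hlen) < 0)
        return -1;
    b->headroom -= hlen;
    b->len += hlen;
    memcpy(b->head + b->headroom, hdr, hlen);
    return 0;
}
```

The check stays on every path, as noted for a), but the copy only happens for buffers that arrived with too little space in front of the data.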

I think I could do a) in a couple of weeks so that it could go into
2.6.19.

Back to my patch. I understand the part about ip_output() is ok
for 2.6.18, isn't it?

What about the psched_mtu() thing? While it's not kernel panic,
I think we should fix it. I'm not sure it should return  dev->mtu +
dev->hard_header_len or just dev->mtu, though.
-- 
Krzysztof Halasa

Re: [PATCH] SMSC LAN911x and LAN921x vendor driver

2006-08-02 Thread Francois Romieu

[EMAIL PROTECTED] <[EMAIL PROTECTED]> :
> Mezigues :
[...]
> > Does the platform guarantee that the register write has actually reached
> > the real register when the udelay is issued?
> 
> I think so, but maybe you can help me check.  The LAN911x device is always 
> directly connected to a simple SRAM-like host bus, and smsc911x_reg_write 
> is implemented using readl.  Does this implicitly guarantee it to be 
> volatile?

(s/readl/writel/)

It's probably safe if it's non-cached SRAM-like, but I strongly suggest
reading Documentation/DocBook/deviceiobook.tmpl. It explains this better than I can.

[...]
> > >  spin_lock_irqsave(&pdata->phy_lock, flags);
> > 
> > flags useless: ->open() is issued in irq-enabled context.
> 
> How do you mean? I thought an irq-enabled context meant i DO have to 
> disable irqs?

Yes, but you can disable unconditionally and later enable unconditionally
because you know that the irqs are _always_ enabled before the lock (in
->open()).

'flags' saves the state. If the state is constant, you can either:
- s/spin_{lock_irqsave/unlock_irqrestore}/spin_{lock/unlock}_irq/
  (irq always on before the lock)
or:
- s/spin_{lock_irqsave/unlock_irqrestore}/spin_{lock/unlock}/
  (irq always off before the lock)
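
The difference between the variants can be shown with a toy user-space model of the interrupt-enable flag. This only models the flag handling, not the locking, and none of the ex_ functions are kernel APIs; they are stand-ins for the behaviour of the corresponding spin_lock calls.

```c
#include <assert.h>

/* Toy stand-in for the CPU interrupt-enable flag (1 = enabled). */
static int ex_irqs_on = 1;

int ex_irqs_enabled(void) { return ex_irqs_on; }
void ex_set_irqs(int on) { ex_irqs_on = on ? 1 : 0; }

/* Models spin_lock_irqsave(): remember the old state, then disable. */
unsigned long ex_lock_irqsave(void)
{
    unsigned long flags = (unsigned long)ex_irqs_on;

    ex_irqs_on = 0;
    return flags;
}

/* Models spin_unlock_irqrestore(): put back whatever was saved. */
void ex_unlock_irqrestore(unsigned long flags)
{
    assert(flags <= 1);
    ex_irqs_on = (int)flags;
}

/* Models spin_lock_irq(): disable unconditionally. */
void ex_lock_irq(void) { ex_irqs_on = 0; }

/* Models spin_unlock_irq(): enable unconditionally. */
void ex_unlock_irq(void) { ex_irqs_on = 1; }
```

In the model, the irqsave pair leaves the flag as it found it in both starting states, while the plain irq pair always ends with interrupts enabled. That is why the shorter form is only appropriate where interrupts are known to be on beforehand, as in ->open().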

-- 
Ueimor

Re: [RFC] Mobile IPv6 introduction

2006-08-02 Thread David Miller

From: Masahide NAKAMURA <[EMAIL PROTECTED]>
Date: Wed, 02 Aug 2006 22:03:16 +0900

> If there is much requirement to add new type number without any
> modification of kernel code at all I would support ICMPv6 filter
> approach, too.

There is no such requirement, please just continue to prepare
your current patches for inclusion.

There is a limit to how much nit-picking we can do for such
a large body of work, and we should thus take evolutionary
approach to this work.  We can make all kinds of refinements
later to improve the implementation.

Re: [NET]: Fix ___pskb_trim when entire frag_list needs dropping

2006-08-02 Thread David Miller

From: "Marco Berizzi" <[EMAIL PROTECTED]>
Date: Wed, 02 Aug 2006 17:01:17 +0200

> The problem is fixed now. I have applied this
> patch to 2.6.18-rc2
> Many thanks Herbert for all the time spent
> to debug this problem.

Thank you for testing.

Re: [PATCH] DECnet Fix for DECnet routing bug

2006-08-02 Thread David Miller

From: Patrick Caulfield <[EMAIL PROTECTED]>
Date: Wed, 02 Aug 2006 15:19:24 +0100

> This patch fixes a bug in the DECnet routing code where we were selecting
> a loopback device in preference to an outward facing device even when
> the destination was known non-local. This patch should fix the problem.
> 
> Signed-off-by: Patrick Caulfield <[EMAIL PROTECTED]>
> Signed-off-by: Steven Whitehouse <[EMAIL PROTECTED]>

Applied, thanks Patrick.

Please pull 'upstream-fixes' branch of wireless-2.6

2006-08-02 Thread John W. Linville

The following changes since commit 49b1e3ea19b1c95c2f012b8331ffb3b169e4c042:
  Linus Torvalds:
Merge branch 'merge' of git://git.kernel.org/.../paulus/powerpc

are found in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream-fixes

Daniel Drake:
  zd1211rw: Pass more management frame types up to host
  zd1211rw: Fix software encryption/decryption
  zd1211rw: Remove bogus assert

Ulrich Kunitz:
  zd1211rw: Fixes radiotap header
  zd1211rw: Fixed endianess issue with length info tag detection
  zd1211rw: Packet filter fix for managed (STA) mode

 drivers/net/wireless/zd1211rw/zd_chip.c |4 ++--
 drivers/net/wireless/zd1211rw/zd_chip.h |   10 ++
 drivers/net/wireless/zd1211rw/zd_mac.c  |   16 
 drivers/net/wireless/zd1211rw/zd_usb.c  |7 +++
 4 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/net/wireless/zd1211rw/zd_chip.c b/drivers/net/wireless/zd1211rw/zd_chip.c
index efc9c4b..da9d06b 100644
--- a/drivers/net/wireless/zd1211rw/zd_chip.c
+++ b/drivers/net/wireless/zd1211rw/zd_chip.c
@@ -797,7 +797,7 @@ static int zd1211_hw_init_hmac(struct zd
{ CR_ADDA_MBIAS_WARMTIME,   0x3808 },
{ CR_ZD1211_RETRY_MAX,  0x2 },
{ CR_SNIFFER_ON,0 },
-   { CR_RX_FILTER, AP_RX_FILTER },
+   { CR_RX_FILTER, STA_RX_FILTER },
{ CR_GROUP_HASH_P1, 0x00 },
{ CR_GROUP_HASH_P2, 0x8000 },
{ CR_REG1,  0xa4 },
@@ -844,7 +844,7 @@ static int zd1211b_hw_init_hmac(struct z
{ CR_ZD1211B_AIFS_CTL2, 0x008C003C },
{ CR_ZD1211B_TXOP,  0x01800824 },
{ CR_SNIFFER_ON,0 },
-   { CR_RX_FILTER, AP_RX_FILTER },
+   { CR_RX_FILTER, STA_RX_FILTER },
{ CR_GROUP_HASH_P1, 0x00 },
{ CR_GROUP_HASH_P2, 0x8000 },
{ CR_REG1,  0xa4 },
diff --git a/drivers/net/wireless/zd1211rw/zd_chip.h b/drivers/net/wireless/zd1211rw/zd_chip.h
index 8051210..069d2b4 100644
--- a/drivers/net/wireless/zd1211rw/zd_chip.h
+++ b/drivers/net/wireless/zd1211rw/zd_chip.h
@@ -461,10 +461,15 @@ #define CR_UNDERRUN_CNT   CTL_REG(0x0688
 
 #define CR_RX_FILTER   CTL_REG(0x068c)
 #define RX_FILTER_ASSOC_RESPONSE   0x0002
+#define RX_FILTER_REASSOC_RESPONSE 0x0008
 #define RX_FILTER_PROBE_RESPONSE   0x0020
 #define RX_FILTER_BEACON   0x0100
+#define RX_FILTER_DISASSOC 0x0400
 #define RX_FILTER_AUTH 0x0800
-/* Sniff modus sets filter to 0xf */
+#define AP_RX_FILTER   0x0400feff
+#define STA_RX_FILTER  0x
+
+/* Monitor mode sets filter to 0xf */
 
 #define CR_ACK_TIMEOUT_EXT CTL_REG(0x0690)
 #define CR_BCN_FIFO_SEMAPHORE  CTL_REG(0x0694)
@@ -546,9 +551,6 @@ #define CR_ZD1211B_AIFS_CTL2CTL_REG(0x
 #define CR_ZD1211B_TXOPCTL_REG(0x0b20)
 #define CR_ZD1211B_RETRY_MAX   CTL_REG(0x0b28)
 
-#define AP_RX_FILTER   0x0400feff
-#define STA_RX_FILTER  0x
-
 #define CWIN_SIZE  0x007f043f
 
 
diff --git a/drivers/net/wireless/zd1211rw/zd_mac.c b/drivers/net/wireless/zd1211rw/zd_mac.c
index 3bdc54d..d6f3e02 100644
--- a/drivers/net/wireless/zd1211rw/zd_mac.c
+++ b/drivers/net/wireless/zd1211rw/zd_mac.c
@@ -108,7 +108,9 @@ int zd_mac_init_hw(struct zd_mac *mac, u
if (r)
goto disable_int;
 
-   r = zd_set_encryption_type(chip, NO_WEP);
+   /* We must inform the device that we are doing encryption/decryption in
+* software at the moment. */
+   r = zd_set_encryption_type(chip, ENC_SNIFFER);
if (r)
goto disable_int;
 
@@ -136,10 +138,8 @@ static int reset_mode(struct zd_mac *mac
 {
struct ieee80211_device *ieee = zd_mac_to_ieee80211(mac);
struct zd_ioreq32 ioreqs[3] = {
-   { CR_RX_FILTER, RX_FILTER_BEACON|RX_FILTER_PROBE_RESPONSE|
-   RX_FILTER_AUTH|RX_FILTER_ASSOC_RESPONSE },
+   { CR_RX_FILTER, STA_RX_FILTER },
{ CR_SNIFFER_ON, 0U },
-   { CR_ENCRYPTION_TYPE, NO_WEP },
};
 
if (ieee->iw_mode == IW_MODE_MONITOR) {
@@ -713,10 +713,10 @@ static int zd_mac_tx(struct zd_mac *mac,
 struct zd_rt_hdr {
struct ieee80211_radiotap_header rt_hdr;
u8  rt_flags;
+   u8  rt_rate;
u16 rt_channel;
u16 rt_chbitmask;
-   u16 rt_rate;
-};
+} __attribute__((packed));
 
 static void fill_rt_header(void *buffer, struct zd_mac *mac,
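
A note on the zd_rt_hdr hunk above: radiotap carries the rate as a single
byte directly after the flags byte, so a u16 rt_rate at the end of an
unpacked struct does not match that layout. The userspace sketch below checks
the resulting offsets. rt_hdr_sketch is only a stand-in for
struct ieee80211_radiotap_header (assumed here to be 8 bytes: version, pad,
length, present bitmap), and the _sketch names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct ieee80211_radiotap_header (assumed to be 8 bytes). */
struct rt_hdr_sketch {
	uint8_t  it_version;
	uint8_t  it_pad;
	uint16_t it_len;
	uint32_t it_present;
};

/* Same field order as the patched struct zd_rt_hdr. */
struct zd_rt_hdr_sketch {
	struct rt_hdr_sketch rt_hdr;
	uint8_t  rt_flags;
	uint8_t  rt_rate;
	uint16_t rt_channel;
	uint16_t rt_chbitmask;
} __attribute__((packed));

/* Aborts if any field is not at the offset the layout above implies. */
static void zd_rt_layout_check(void)
{
	assert(offsetof(struct zd_rt_hdr_sketch, rt_flags) == 8);
	assert(offsetof(struct zd_rt_hdr_sketch, rt_rate) == 9);
	assert(offsetof(struct zd_rt_hdr_sketch, rt_channel) == 10);
	assert(offsetof(struct zd_rt_hdr_sketch, rt_chbitmask) == 12);
	assert(sizeof(struct zd_rt_hdr_sketch) == 14);
}
```

With the packed attribute there is no tail padding, so the header length
reported to userspace can be taken straight from sizeof().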

Re: [PATCH 2/6] zd1211rw: Pass more management frame types up to host

2006-08-02 Thread John W. Linville

On Tue, Aug 01, 2006 at 11:43:31PM +0200, Ulrich Kunitz wrote:
> From: Daniel Drake <[EMAIL PROTECTED]>
> 
> We'll be needing these at some point...

This one doesn't really seem like a fix.  But since the later fixes
seem to depend on it, I guess it makes sense to take it.

I just didn't want you to think I wasn't looking... :-)

John
-- 
John W. Linville
[EMAIL PROTECTED]

Re: [PATCH 20/23] [PATCH] [XFRM] POLICY: sub policy support.

2006-08-02 Thread David Miller

From: James Morris <[EMAIL PROTECTED]>
Date: Wed, 2 Aug 2006 12:04:31 -0400 (EDT)

> Why can't IPSec & MIP transforms be bundled on the same policy?

At the first year of netconf, Yoshifuji went into detail
as to why the IPSEC and MIP transformations had to live
separately.

It's partly a side effect of different userland daemons controlling
IPSEC vs. MIP configuration.

> Or, perhaps a different approach is needed, where the disposition of a 
> policy can be to re-submit a packet for another policy match after the 
> current bundle has been traversed (something like NF_REPEAT).

We can consider an approach like this as a future refinement.
It would allow arbitrary nesting of sub-transforms, for sure,
just like netfilter's NF_REPEAT.


Re: r8169 driver problem with RTL8110SB chip (on iop3xx ARM board)

2006-08-02 Thread Francois Romieu

Martin Michlmayr <[EMAIL PROTECTED]> :
[...]
> Sorry, to pester you, but I was wondering if you had a chance to look
> at the register dump.

No problem. It would have been easier with decoded output of the register
dump, though (see Lennert's dump below).

Lines prefixed by '>' come from Realtek's driver. I have outlined the
differences which seem relevant to me.

0x00: MAC Address  00:14:fd:10:27:74
> 000x00
> 010x14
> 020xfd
> 030x10
> 040x27
> 050x74

> 060x00
> 070x00

0x08: Multicast Address Filter 0x 0x
> 080x00
> 090x00
> 100x00
> 110x80
  ^^
/me scratches head

> 120x00
> 130x00
> 140x00
> 150x00

0x10: Dump Tally Counter Command   0xd7bbfec0 0xfb74b6fb
> 160xc0
> 170xfe
> 180xbb
> 190xdf

> 200x7b
> 210xb6
> 220x74
> 230xff

> 240x00
> 250x00
> 260x00
> 270x00

> 280x00
> 290x00
> 300x00
> 310x00

0x20: Tx Normal Priority Ring Addr 0x 0x
> 320x00
> 330x80
> 340x20
> 350x07

> 360x00
> 370x00
> 380x00
> 390x00

0x28: Tx High Priority Ring Addr   0xfffc3f00 0xfef7f6ad
> 400x00
> 410x3f
> 420xfc
> 430xff

> 440xac
> 450xf7
> 460xf7
> 470xfe

0x30: Flash memory read/write 0x
> 480x00
> 490x00
> 500x00
> 510x00

0x34: Early Rx Byte Count  0
> 520x00
> 530x00

0x36: Early Rx Status   0x00
> 540x08

0x37: Command   0x00
  Rx off, Tx off
> 550x0c
  ^^ -> CmdRxEnb | CmdTxEnb

If ChipCmd is not set, the driver will hardly work.

Lennert, your dump was taken while the kernel driver was down, right ?

> 560x00
> 570x00
> 580x00
> 590x00

0x3C: Interrupt Mask  0x
> 600x7f
  ^^
> 610x00

RxFIFOOver = 0x40,
LinkChg= 0x20,
RxOverflow = 0x10,
TxErr  = 0x08,
TxOK   = 0x04,
RxErr  = 0x02,
RxOK   = 0x01,

0x is typical of rtl8169_irq_mask_and_ack().
The device seems down.
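
The 0x7f that Realtek's driver writes is simply the OR of the seven bits
listed above. A small sketch that decodes a mask using only the values quoted
in this mail (the table and helper names are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct irq_bit {
	uint16_t bit;
	const char *name;
};

/* Bit values exactly as listed in the mail. */
static const struct irq_bit irq_bits[] = {
	{ 0x40, "RxFIFOOver" },
	{ 0x20, "LinkChg" },
	{ 0x10, "RxOverflow" },
	{ 0x08, "TxErr" },
	{ 0x04, "TxOK" },
	{ 0x02, "RxErr" },
	{ 0x01, "RxOK" },
};

#define N_IRQ_BITS (sizeof(irq_bits) / sizeof(irq_bits[0]))

/* OR together every known interrupt bit. */
static uint16_t irq_all_bits(void)
{
	uint16_t mask = 0;
	size_t i;

	for (i = 0; i < N_IRQ_BITS; i++)
		mask |= irq_bits[i].bit;
	return mask;
}

/* Print the name of each bit set in mask; return how many were set. */
static int irq_decode(uint16_t mask)
{
	int n = 0;
	size_t i;

	for (i = 0; i < N_IRQ_BITS; i++) {
		if (mask & irq_bits[i].bit) {
			printf("%s\n", irq_bits[i].name);
			n++;
		}
	}
	return n;
}

/* Sanity check: the listed bits add up to the dumped value. */
static void irq_self_check(void)
{
	assert(irq_all_bits() == 0x7f);
}
```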

0x3E: Interrupt Status0x
> 620x00
> 630x00

0x40: Tx Configuration0x1000
> 640x00
> 650x07
  ^^
Realtek's driver uses an unlimited DMA burst (7) instead
of 1024 (6, see TX_DMA_BURST). Probably harmless.

> 660x00
> 670x13

0x44: Rx Configuration0x0002
> 680x0e
  ^^
These bits should be set by the kernel driver through
rtl8169_set_rx_mode() when the device is brought up.

Getting the exact same value would require replacing (in rtl8169_set_rx_mode):
rx_mode = AcceptBroadcast | AcceptMyPhys;
by:
rx_mode = AcceptBroadcast | AcceptMulticast | AcceptMyPhys;
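
To see how that gives 0x0e, here is a quick sketch. The bit values are the
rx_mode_bits as I remember them from r8169.c, so treat them as an assumption
and check them against the driver source; the helper names are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values of the r8169 rx_mode_bits; verify against the driver. */
enum rx_mode_bits {
	AcceptErr	= 0x20,
	AcceptRunt	= 0x10,
	AcceptBroadcast	= 0x08,
	AcceptMulticast	= 0x04,
	AcceptMyPhys	= 0x02,
	AcceptAllPhys	= 0x01,
};

/* Default mode currently set in rtl8169_set_rx_mode(). */
static uint32_t rx_mode_kernel(void)
{
	return AcceptBroadcast | AcceptMyPhys;
}

/* Mode that would reproduce the low byte seen with Realtek's driver. */
static uint32_t rx_mode_realtek(void)
{
	return AcceptBroadcast | AcceptMulticast | AcceptMyPhys;
}

/* The two modes differ only in the multicast bit. */
static void rx_mode_self_check(void)
{
	assert(rx_mode_realtek() == 0x0e);
	assert((rx_mode_realtek() ^ rx_mode_kernel()) == AcceptMulticast);
}
```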


> 690xe7
> 700x00
> 710x00

0x48: Timer count 0x95887845
> 720x24
> 730x57
> 740xaf
> 750x31

0x4C: Missed packet counter 0x00
> 760x00
> 770x00
> 780x00
> 790x00

0x50: EEPROM Command0x00
> 800x00

0x51: Config 0  0x04
> 810x04

0x52: Config 1  0x1f
> 820x1f

0x53: Config 2  0x10
> 830x10

0x54: Config 3  0x20
> 840x20

0x55: Config 4  0x80
> 850x80

0x56: Config 5  0x01
> 860x03
> 870x00

0x58: Timer interrupt 0x
> 880x00
> 890x00
> 900x00
> 910x00

0x5C: Multiple Interrupt Select   0x
> 920x00
> 930x00

> 940x10
> 950x00

0x60: PHY access  0x80001000
> 960x6d
> 970x79
> 980x01
> 990x80

0x64: TBI control and status  0x
> 100   0x00
> 101   0x00
> 102   0x00
> 103   0x00

0x68: TBI Autonegotiation advertisement (ANAR)0x
> 104   0x00
> 105   0x00

0x6A: TBI Link partner ability (LPAR) 0x
> 106   0x00
> 107   0x00

0x6C: PHY status0x6b
> 108   0x6b

> 109   0x00
> 110   0x00
> 111   0x00
> 112   0xff
> 113   0xfd
> 114   0xfb
> 115   0xff
> 116   0xfc
> 117   0x03
> 118   0x00
> 119   0x00
> 120   0x00
> 121   0xff
> 122   0xff
> 123   0xff
> 124   0x00
> 125   0x00
> 126   0x00
> 127   0x00
> 128   0x00
> 129   0x00
> 130   0x00
> 131   0x00

0x84: PM wakeup frame 00xbcfeef7f 0xde9f
> 132   0x7f
> 133   0xef
> 134   0xfe
> 135   0xfc
> 136   0x9f
> 137   0xfe
> 138   0xff
> 139   0xdf

0x8C: PM wakeup frame 10xabf7cf3f 0xfffbdbbf
> 140   0x3f
> 141   0xcf
> 142   0xf7
> 143   0xab
> 144   0xb

Re: [Patch] kernel memory leak fix for af_unix datagram getpeersec patch

2006-08-02 Thread David Miller


Catherine you really must begin to remember to add
proper "Signed-off-by: " lines to your patch submissions.

I'll sign off on this bug fix, but in the future I will not
do so for you any more as you've been told at least 3 or 4
times about this.

Thank you.

Re: [Patch] kernel memory leak fix for af_unix datagram getpeersec patch

2006-08-02 Thread Xiaolan Zhang

David,

I will remember this in the future, I promise.

thank you,
Catherine

David Miller <[EMAIL PROTECTED]> wrote on 08/02/2006 05:11:03 PM:

> 
> Catherine you really must begin to remember to add
> proper "Signed-off-by: " lines to your patch submissions.
> 
> I'll sign off on this bug fix, but in the future I will not
> do so for you any more as you've been told at least 3 or 4
> times about this.
> 
> Thank you.


Re: [Patch] kernel memory leak fix for af_unix datagram getpeersec patch

2006-08-02 Thread David Miller

From: Xiaolan Zhang <[EMAIL PROTECTED]>
Date: Wed, 2 Aug 2006 17:14:31 -0400

> I will remember this in the future, I promise.

Can you also remember to test your patches with CONFIG_SECURITY
disabled, as you also promised in the past several times?!??!?!

In file included from init/main.c:34:
include/linux/security.h: In function 'security_release_secctx':
include/linux/security.h:2757: warning: 'return' with a value, in function returning void

I'll fix this one up, but this is getting ridiculous.
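
For anyone following along, this warning is the usual failure mode of a stub
that only gets compiled when the config option is off. A minimal sketch of the
pattern, not the actual include/linux/security.h code; the parameter list and
the release_twice() caller are assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* CONFIG_SECURITY is deliberately left undefined to exercise the stub. */
#ifdef CONFIG_SECURITY
void security_release_secctx(char *secdata, unsigned int seclen);
#else
static inline void security_release_secctx(char *secdata, unsigned int seclen)
{
	/*
	 * Nothing to release when security is compiled out.  Writing
	 * "return 0;" here is what triggers "'return' with a value, in
	 * function returning void".
	 */
	(void)secdata;
	(void)seclen;
}
#endif

/* A caller builds the same way with or without CONFIG_SECURITY. */
static int release_twice(char *buf, unsigned int len)
{
	security_release_secctx(buf, len);
	security_release_secctx(NULL, 0);
	return 0;
}
```

Building once with the option on and once with it off, and comparing the
warnings from both builds, catches this kind of slip.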

orinoco driver causes lots of lockdep spew

2006-08-02 Thread Dave Jones

Wow. Nearly 400 lines of debug spew, from a simple 'ifup eth1'.

Dave


ADDRCONF(NETDEV_UP): eth1: link is not ready
eth1: New link status: Disconnected (0002)

==
[ INFO: hard-safe -> hard-unsafe lock order detected ]
--
events/0/5 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
 (af_callback_keys + sk->sk_family){-.--}, at: [] 
sock_def_readable+0x19/0x6f

and this task is already holding:
 (&priv->lock){++..}, at: [] orinoco_send_wevents+0x28/0x8b 
[orinoco]
which would create a new lock dependency:
 (&priv->lock){++..} -> (af_callback_keys + sk->sk_family){-.--}

but this new dependency connects a hard-irq-safe lock:
 (&priv->lock){++..}
... which became hard-irq-safe at:
  [] lock_acquire+0x4a/0x69
  [] _spin_lock_irqsave+0x2b/0x3c
  [] orinoco_interrupt+0x4d/0xf49 [orinoco]
  [] handle_IRQ_event+0x2b/0x64
  [] __do_IRQ+0xae/0x114
  [] do_IRQ+0xf7/0x107
  [] common_interrupt+0x64/0x65

to a hard-irq-unsafe lock:
 (af_callback_keys + sk->sk_family){-.--}
... which became hard-irq-unsafe at:
...  [] lock_acquire+0x4a/0x69
  [] _write_lock_bh+0x29/0x36
  [] netlink_release+0x139/0x2ca
  [] sock_release+0x19/0x9b
  [] sock_close+0x33/0x3a
  [] __fput+0xc6/0x1a8
  [] fput+0x13/0x16
  [] filp_close+0x64/0x70
  [] sys_close+0x93/0xb0
  [] system_call+0x7d/0x83

other info that might help us debug this:

1 lock held by events/0/5:
 #0:  (&priv->lock){++..}, at: [] 
orinoco_send_wevents+0x28/0x8b [orinoco]

the hard-irq-safe lock's dependencies:
-> (&priv->lock){++..} ops: 0 {
   initial-use  at:
[] lock_acquire+0x4a/0x69
[] _spin_lock_irq+0x2a/0x38
[] orinoco_init+0x934/0x966 [orinoco]
[] register_netdevice+0xe6/0x375
[] register_netdev+0x5a/0x69
[] orinoco_cs_probe+0x3d7/0x475 
[orinoco_cs]
[] pcmcia_device_probe+0x7f/0x124
[] driver_probe_device+0x5b/0xb1
[] __driver_attach+0x88/0xdb
[] bus_for_each_dev+0x48/0x7a
[] driver_attach+0x1b/0x1e
[] bus_add_driver+0x88/0x138
[] driver_register+0x8e/0x93
[] pcmcia_register_driver+0xd0/0xda
[] 0x880a9024
[] sys_init_module+0x16f2/0x18b7
[] system_call+0x7d/0x83
   in-hardirq-W at:
[] lock_acquire+0x4a/0x69
[] _spin_lock_irqsave+0x2b/0x3c
[] orinoco_interrupt+0x4d/0xf49 
[orinoco]
[] handle_IRQ_event+0x2b/0x64
[] __do_IRQ+0xae/0x114
[] do_IRQ+0xf7/0x107
[] common_interrupt+0x64/0x65
   in-softirq-W at:
[] lock_acquire+0x4a/0x69
[] _spin_lock_irqsave+0x2b/0x3c
[] orinoco_interrupt+0x4d/0xf49 
[orinoco]
[] handle_IRQ_event+0x2b/0x64
[] __do_IRQ+0xae/0x114
[] do_IRQ+0xf7/0x107
[] common_interrupt+0x64/0x65
[] scheduler_tick+0xc1/0x362
[] call_softirq+0x1d/0x28
[] irq_exit+0x56/0x59
[] smp_apic_timer_interrupt+0x5c/0x62
[] apic_timer_interrupt+0x69/0x70
 }
 ... key  at: [] __key.22351+0x0/0x27fa 
[orinoco]
 -> (&cwq->lock){++..} ops: 0 {
initial-use  at:
  [] lock_acquire+0x4a/0x69
  [] _spin_lock_irqsave+0x2b/0x3c
  [] __queue_work+0x17/0x5e
  [] queue_work+0x4d/0x57
  [] 
call_usermodehelper_keys+0x119/0x137
  [] kobject_uevent+0x3e5/0x42e
  [] class_device_add+0x314/0x471
  [] class_device_register+0x18/0x1d
  [] class_device_create+0xf7/0x129
  [] vtconsole_class_init+0x74/0xbb
  [] init+0x1fc/0x3cd
  [] child_rip+0x7/0x12
in-hardirq-W at:
  [] lock_acquire+0x4a/0x69
  [] _spin_lock_irqsave+0x2b/0x3c
  [] __queue_work+0x17/0x5e
  [] queue_work+0x4d/0x57
  [] kblockd_schedule_work+0x15/0x18
  [] __cfq_slice_expired+0x63/0xe6
  [] cfq_completed_request+0x116/0x154
  [] elv_completed_request+0x38/0x85
  [] __blk_put_request+0x35/0x9f
  [] end_that_request_la

Re: [Patch] kernel memory leak fix for af_unix datagram getpeersec patch

2006-08-02 Thread Xiaolan Zhang

David,

I did test it with CONFIG_SECURITY disabled, but did not catch the warning 
-- I verified that the build completes with a valid vmlinux image.  There 
are many warnings (device drivers, and others) during the build and I 
didn't do a grep to find which one is specific to my patch.  Next time 
I'll do a diff on warnings too.

thanks,
Catherine

David Miller <[EMAIL PROTECTED]> wrote on 08/02/2006 05:32:04 PM:

> 
> Can you also remember to test your patches with CONFIG_SECURITY
> disabled, as you also promised in the past several times?!??!?!
> 
> In file included from init/main.c:34:
> include/linux/security.h: In function 'security_release_secctx':
> include/linux/security.h:2757: warning: 'return' with a value, in 
> function returning void
> 
> I'll fix this one up, but this is getting ridiculous.


Re: [Patch] kernel memory leak fix for af_unix datagram getpeersec patch

2006-08-02 Thread David Miller

From: Xiaolan Zhang <[EMAIL PROTECTED]>
Date: Wed, 2 Aug 2006 18:18:07 -0400

> I did test it with CONFIG_SECURITY disabled, but did not catch the warning 
> -- I verified that the build completes with a valid vmlinux image.  There 
> are many warnings (device drivers, and others) during the build and I 
> didn't do a grep to find which one is specific to my patch.  Next time 
> I'll do a diff on warnings too.

Some platforms build their platform code under arch/${ARCH}/foo with
-Werror added to CFLAGS, sparc64 is one such platform.  So the build
did break for me.

Re: [Patch] kernel memory leak fix for af_unix datagram getpeersec patch

2006-08-02 Thread Xiaolan Zhang

I see.  The build was fine under x86 and there are so many warnings that a 
-Werror probably won't work for me.

thanks,
Catherine

David Miller <[EMAIL PROTECTED]> wrote on 08/02/2006 06:19:06 PM:

> From: Xiaolan Zhang <[EMAIL PROTECTED]>
> Date: Wed, 2 Aug 2006 18:18:07 -0400
> 
> > I did test it with CONFIG_SECURITY disabled, but did not catch the warning 
> > -- I verified that the build completes with a valid vmlinux image.  There 
> > are many warnings (device drivers, and others) during the build and I 
> > didn't do a grep to find which one is specific to my patch.  Next time 
> > I'll do a diff on warnings too.
> 
> Some platforms build their platform code under arch/${ARCH}/foo with
> -Werror added to CFLAGS, sparc64 is one such platform.  So the build
> did break for me.


[PATCH v4 0/2][RFC] iWARP Core Support

2006-08-02 Thread Steve Wise


Roland, 

Here is the iWARP Core Support patchset merged to your latest for-2.6.19
branch.  It has gone through 3 reviews on lkml and netdev a while ago, and
I think it's ready to be pulled in.

Steve.



This patchset defines the modifications to the Linux infiniband subsystem
to support iWARP devices.  

The patchset consists of 2 patches:

1 - New iWARP CM implementation.  
2 - Core changes to support iWARP.

Signed-off-by: Tom Tucker <[EMAIL PROTECTED]>
Signed-off-by: Steve Wise <[EMAIL PROTECTED]>

[PATCH v4 2/2] iWARP Core Changes.

2006-08-02 Thread Steve Wise


This patch contains modifications to the existing rdma header files,
core files, drivers, and ulp files to support iWARP.

V2 Review updates:

V1 Review updates:

- copy_addr() -> rdma_copy_addr()

- dst_dev_addr param in rdma_copy_addr to const.

- various spacing nits with recasting

- include linux/inetdevice.h to get ip_dev_find() prototype.

- dev_put() after successful ip_dev_find()
---

 drivers/infiniband/core/Makefile |4 
 drivers/infiniband/core/addr.c   |   19 +
 drivers/infiniband/core/cache.c  |8 -
 drivers/infiniband/core/cm.c |3 
 drivers/infiniband/core/cma.c|  356 +++---
 drivers/infiniband/core/device.c |6 
 drivers/infiniband/core/mad.c|   11 +
 drivers/infiniband/core/sa_query.c   |5 
 drivers/infiniband/core/smi.c|   18 +
 drivers/infiniband/core/sysfs.c  |   18 +
 drivers/infiniband/core/ucm.c|5 
 drivers/infiniband/core/user_mad.c   |9 -
 drivers/infiniband/hw/ipath/ipath_verbs.c|2 
 drivers/infiniband/hw/mthca/mthca_provider.c |2 
 drivers/infiniband/ulp/ipoib/ipoib_main.c|8 +
 drivers/infiniband/ulp/srp/ib_srp.c  |2 
 include/rdma/ib_addr.h   |   16 +
 include/rdma/ib_verbs.h  |   39 ++-
 18 files changed, 438 insertions(+), 93 deletions(-)

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 68e73ec..163d991 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -1,7 +1,7 @@
 infiniband-$(CONFIG_INFINIBAND_ADDR_TRANS) := ib_addr.o rdma_cm.o
 
 obj-$(CONFIG_INFINIBAND) +=ib_core.o ib_mad.o ib_sa.o \
-   ib_cm.o $(infiniband-y)
+   ib_cm.o iw_cm.o $(infiniband-y)
 obj-$(CONFIG_INFINIBAND_USER_MAD) +=   ib_umad.o
 obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=ib_uverbs.o ib_ucm.o
 
@@ -14,6 +14,8 @@ ib_sa-y :=sa_query.o
 
 ib_cm-y := cm.o
 
+iw_cm-y := iwcm.o
+
 rdma_cm-y :=   cma.o
 
 ib_addr-y :=   addr.o
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index d294bbc..83f84ef 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -32,6 +32,7 @@ #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -60,12 +61,15 @@ static LIST_HEAD(req_list);
 static DECLARE_WORK(work, process_req, NULL);
 static struct workqueue_struct *addr_wq;
 
-static int copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
-unsigned char *dst_dev_addr)
+int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
+const unsigned char *dst_dev_addr)
 {
switch (dev->type) {
case ARPHRD_INFINIBAND:
-   dev_addr->dev_type = IB_NODE_CA;
+   dev_addr->dev_type = RDMA_NODE_IB_CA;
+   break;
+   case ARPHRD_ETHER:
+   dev_addr->dev_type = RDMA_NODE_RNIC;
break;
default:
return -EADDRNOTAVAIL;
@@ -77,6 +81,7 @@ static int copy_addr(struct rdma_dev_add
memcpy(dev_addr->dst_dev_addr, dst_dev_addr, MAX_ADDR_LEN);
return 0;
 }
+EXPORT_SYMBOL(rdma_copy_addr);
 
 int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr)
 {
@@ -88,7 +93,7 @@ int rdma_translate_ip(struct sockaddr *a
if (!dev)
return -EADDRNOTAVAIL;
 
-   ret = copy_addr(dev_addr, dev, NULL);
+   ret = rdma_copy_addr(dev_addr, dev, NULL);
dev_put(dev);
return ret;
 }
@@ -160,7 +165,7 @@ static int addr_resolve_remote(struct so
 
/* If the device does ARP internally, return 'done' */
if (rt->idev->dev->flags & IFF_NOARP) {
-   copy_addr(addr, rt->idev->dev, NULL);
+   rdma_copy_addr(addr, rt->idev->dev, NULL);
goto put;
}
 
@@ -180,7 +185,7 @@ static int addr_resolve_remote(struct so
src_in->sin_addr.s_addr = rt->rt_src;
}
 
-   ret = copy_addr(addr, neigh->dev, neigh->ha);
+   ret = rdma_copy_addr(addr, neigh->dev, neigh->ha);
 release:
neigh_release(neigh);
 put:
@@ -244,7 +249,7 @@ static int addr_resolve_local(struct soc
if (ZERONET(src_ip)) {
src_in->sin_family = dst_in->sin_family;
src_in->sin_addr.s_addr = dst_ip;
-   ret = copy_addr(addr, dev, dev->dev_addr);
+   ret = rdma_copy_addr(addr, dev, dev->dev_addr);
} else if (LOOPBACK(src_ip)) {
ret = rdma_translate_ip((struct sockaddr *)dst_in, addr);
if (!ret)
diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache

[PATCH v4 1/2] iWARP Connection Manager.

2006-08-02 Thread Steve Wise


This patch provides the new files implementing the iWARP Connection
Manager.

This module is a logical instance of the xx_cm where xx is the transport
type (ib or iw). The symbols exported are used by the transport
independent rdma_cm module, and are available also for transport
dependent ULPs.

V2 Review Changes:

- BUG_ON(1) -> BUG()

- Don't typecast when assigning between something* and void*

- pre-allocate iwcm_work objects to avoid allocating them in the interrupt
  context.

- copy private data on connect request and connect reply events.

- #if !defined() -> #ifndef

V1 Review Changes:

- sizeof -> sizeof()

- removed printks

- removed TT debug code

- cleaned up lock/unlock around switch statements.

- waitqueue -> completion for destroy path.
---

 drivers/infiniband/core/iwcm.c | 1008 
 include/rdma/iw_cm.h   |  255 ++
 include/rdma/iw_cm_private.h   |   63 +++
 3 files changed, 1326 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
new file mode 100644
index 000..fe43c00
--- /dev/null
+++ b/drivers/infiniband/core/iwcm.c
@@ -0,0 +1,1008 @@
+/*
+ * Copyright (c) 2004, 2005 Intel Corporation.  All rights reserved.
+ * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
+ * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
+ * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
+ * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved.
+ * Copyright (c) 2005 Network Appliance, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+MODULE_AUTHOR("Tom Tucker");
+MODULE_DESCRIPTION("iWARP CM");
+MODULE_LICENSE("Dual BSD/GPL");
+
+static struct workqueue_struct *iwcm_wq;
+struct iwcm_work {
+   struct work_struct work;
+   struct iwcm_id_private *cm_id;
+   struct list_head list;
+   struct iw_cm_event event;
+   struct list_head free_list;
+};
+
+/* 
+ * The following services provide a mechanism for pre-allocating iwcm_work 
+ * elements.  The design pre-allocates them  based on the cm_id type:
+ * LISTENING IDS:  Get enough elements preallocated to handle the
+ * listen backlog.
+ * ACTIVE IDS: 4: CONNECT_REPLY, ESTABLISHED, DISCONNECT, CLOSE
+ * PASSIVE IDS:3: ESTABLISHED, DISCONNECT, CLOSE 
+ *
+ * Allocating them in connect and listen avoids having to deal
+ * with allocation failures on the event upcall from the provider (which 
+ * is called in the interrupt context).  
+ *
+ * One exception is when creating the cm_id for incoming connection requests.  
+ * There are two cases:
+ * 1) in the event upcall, cm_event_handler(), for a listening cm_id.  If
+ *the backlog is exceeded, then no more connection request events will
+ *be processed.  cm_event_handler() returns -ENOMEM in this case.  It's up
+ *to the provider to reject the connection request.
+ * 2) in the connection request workqueue handler, cm_conn_req_handler().
+ *If work elements cannot be allocated for the new connect request cm_id,
+ *then IWCM will call the provider reject method.  This is ok since
+ *cm_conn_req_handler() runs in the workqueue thread context.
+ */
+
+static struct iwcm_work *get_work(struct iwcm_id_private *cm_id_priv)
+{
+   struct iwcm_work *work;
+
+   if (list_empty(&cm_id_priv->work_free_list))
+   return NULL;
+   work = list_entry(cm_id_priv->work_free_list
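
The pre-allocation scheme described in the comment above can be sketched in
plain userspace C. This is not the iwcm.c code: it uses a simple singly linked
free list and malloc() instead of list_head and kmalloc(), and struct
work_item and struct id_private are hypothetical stand-ins. The point is only
that every allocation happens up front, so taking an element later cannot fail
for lack of memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct work_item {
	struct work_item *next;
	int event;
};

struct id_private {
	struct work_item *free_list;
};

/* Free every element still sitting on the free list. */
static void dealloc_work_entries(struct id_private *id)
{
	while (id->free_list) {
		struct work_item *w = id->free_list;

		id->free_list = w->next;
		free(w);
	}
}

/*
 * Pre-allocate count elements while allocation is still allowed to fail
 * cleanly (connect/listen time).  Returns 0 on success, -1 on failure.
 */
static int alloc_work_entries(struct id_private *id, int count)
{
	assert(id->free_list == NULL);
	while (count--) {
		struct work_item *w = malloc(sizeof(*w));

		if (!w) {
			dealloc_work_entries(id);
			return -1;
		}
		w->event = 0;
		w->next = id->free_list;
		id->free_list = w;
	}
	return 0;
}

/* Take one element; NULL means the pre-allocated budget is used up. */
static struct work_item *get_work(struct id_private *id)
{
	struct work_item *w = id->free_list;

	if (w)
		id->free_list = w->next;
	return w;
}

/* Hand an element back once its event has been processed. */
static void put_work(struct id_private *id, struct work_item *w)
{
	w->next = id->free_list;
	id->free_list = w;
}
```

A NULL from get_work() corresponds to the "backlog exceeded" case, where the
event handler returns -ENOMEM instead of trying to allocate.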

[PATCH 3/6] htb: if HTB_HYSTERESIS cleanup

2006-08-02 Thread Stephen Hemminger

Change the conditional compilation around HTB_HYSTERESIS,
since the code was splitting mid-expression.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
---
 net/sched/sch_htb.c |   27 +--
 1 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index c0b80b7..d8c1a6b 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -483,6 +483,20 @@ static void htb_deactivate_prios(struct 
htb_remove_class_from_row(q,cl,mask);
 }
 
+#if HTB_HYSTERESIS
+static inline long htb_lowater(const struct htb_class *cl)
+{
+   return cl->cmode != HTB_CANT_SEND ? -cl->cbuffer : 0;
+}
+static inline long htb_hiwater(const struct htb_class *cl)
+{
+   return cl->cmode == HTB_CAN_SEND ? -cl->buffer : 0;
+}
+#else
+#define htb_lowater(cl)(0)
+#define htb_hiwater(cl)(0)
+#endif
+
 /**
  * htb_class_mode - computes and returns current class mode
  *
@@ -499,19 +513,12 @@ htb_class_mode(struct htb_class *cl,long
 {
 long toks;
 
-if ((toks = (cl->ctokens + *diff)) < (
-#if HTB_HYSTERESIS
-   cl->cmode != HTB_CANT_SEND ? -cl->cbuffer :
-#endif
-   0)) {
+if ((toks = (cl->ctokens + *diff)) < htb_lowater(cl)) {
*diff = -toks;
return HTB_CANT_SEND;
 }
-if ((toks = (cl->tokens + *diff)) >= (
-#if HTB_HYSTERESIS
-   cl->cmode == HTB_CAN_SEND ? -cl->buffer :
-#endif
-   0))
+
+if ((toks = (cl->tokens + *diff)) >= htb_hiwater(cl))
return HTB_CAN_SEND;
 
 *diff = -toks;
-- 
1.4.0
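
To illustrate what the two helpers do, here is a standalone sketch of the mode
computation after this patch. struct htb_class is cut down to the fields the
function reads, and the values in the usage note are invented; with hysteresis
enabled, a class that is already in HTB_CAN_SEND may dip to -buffer tokens
before it drops back to borrowing.

```c
#include <assert.h>
#include <stddef.h>

#define HTB_HYSTERESIS 1

enum htb_cmode { HTB_CANT_SEND, HTB_MAY_BORROW, HTB_CAN_SEND };

/* Reduced to the fields that the mode computation reads. */
struct htb_class {
	enum htb_cmode cmode;
	long tokens, ctokens;
	long buffer, cbuffer;
};

#if HTB_HYSTERESIS
static inline long htb_lowater(const struct htb_class *cl)
{
	return cl->cmode != HTB_CANT_SEND ? -cl->cbuffer : 0;
}
static inline long htb_hiwater(const struct htb_class *cl)
{
	return cl->cmode == HTB_CAN_SEND ? -cl->buffer : 0;
}
#else
#define htb_lowater(cl)	(0)
#define htb_hiwater(cl)	(0)
#endif

/* Same decision logic as the patched htb_class_mode(). */
static enum htb_cmode htb_class_mode(const struct htb_class *cl, long *diff)
{
	long toks;

	assert(cl != NULL && diff != NULL);

	if ((toks = (cl->ctokens + *diff)) < htb_lowater(cl)) {
		*diff = -toks;
		return HTB_CANT_SEND;
	}

	if ((toks = (cl->tokens + *diff)) >= htb_hiwater(cl))
		return HTB_CAN_SEND;

	*diff = -toks;
	return HTB_MAY_BORROW;
}
```

For example, with tokens = -5 and buffer = 10, a class currently in
HTB_CAN_SEND stays there, while the same class in HTB_MAY_BORROW stays in
HTB_MAY_BORROW and gets *diff set to 5.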


[PATCH 4/6] htb: Lindent

2006-08-02 Thread Stephen Hemminger

The code was a mess in terms of indentation.  Run it through the Lindent
script and clean up the damage.  Also, don't use the vim magic
comment, and substitute inline for __inline__.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
---
 net/sched/sch_htb.c | 1001 +++
 1 files changed, 526 insertions(+), 475 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index d8c1a6b..528d5c5 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1,4 +1,4 @@
-/* vim: ts=8 sw=8
+/*
  * net/sched/sch_htb.c Hierarchical token bucket, feed tree version
  *
  * This program is free software; you can redistribute it and/or
@@ -68,11 +68,11 @@ #include 
 one less than their parent.
 */
 
-#define HTB_HSIZE 16   /* classid hash size */
-#define HTB_EWMAC 2/* rate average over HTB_EWMAC*HTB_HSIZE sec */
-#define HTB_RATECM 1/* whether to use rate computer */
-#define HTB_HYSTERESIS 1/* whether to use mode hysteresis for speedup */
-#define HTB_VER 0x30011/* major must be matched with number suplied by TC as version */
+#define HTB_HSIZE 16   /* classid hash size */
+#define HTB_EWMAC 2/* rate average over HTB_EWMAC*HTB_HSIZE sec */
+#define HTB_RATECM 1   /* whether to use rate computer */
+#define HTB_HYSTERESIS 1   /* whether to use mode hysteresis for speedup */
+#define HTB_VER 0x30011/* major must be matched with number suplied by TC as version */
 
 #if HTB_VER >> 16 != TC_HTB_PROTOVER
 #error "Mismatched sch_htb.c and pkt_sch.h"
@@ -80,154 +80,152 @@ #endif
 
 /* used internaly to keep status of single class */
 enum htb_cmode {
-HTB_CANT_SEND, /* class can't send and can't borrow */
-HTB_MAY_BORROW,/* class can't send but may borrow */
-HTB_CAN_SEND   /* class can send */
+   HTB_CANT_SEND,  /* class can't send and can't borrow */
+   HTB_MAY_BORROW, /* class can't send but may borrow */
+   HTB_CAN_SEND/* class can send */
 };
 
 /* interior & leaf nodes; props specific to leaves are marked L: */
-struct htb_class
-{
-/* general class parameters */
-u32 classid;
-struct gnet_stats_basic bstats;
-struct gnet_stats_queue qstats;
-struct gnet_stats_rate_est rate_est;
-struct tc_htb_xstats xstats;/* our special stats */
-int refcnt;/* usage count of this class */
+struct htb_class {
+   /* general class parameters */
+   u32 classid;
+   struct gnet_stats_basic bstats;
+   struct gnet_stats_queue qstats;
+   struct gnet_stats_rate_est rate_est;
+   struct tc_htb_xstats xstats;/* our special stats */
+   int refcnt; /* usage count of this class */
 
 #ifdef HTB_RATECM
-/* rate measurement counters */
-unsigned long rate_bytes,sum_bytes;
-unsigned long rate_packets,sum_packets;
+   /* rate measurement counters */
+   unsigned long rate_bytes, sum_bytes;
+   unsigned long rate_packets, sum_packets;
 #endif
 
-/* topology */
-int level; /* our level (see above) */
-struct htb_class *parent;  /* parent class */
-struct list_head hlist;/* classid hash list item */
-struct list_head sibling;  /* sibling list item */
-struct list_head children; /* children list */
-
-union {
-   struct htb_class_leaf {
-   struct Qdisc *q;
-   int prio;
-   int aprio;  
-   int quantum;
-   int deficit[TC_HTB_MAXDEPTH];
-   struct list_head drop_list;
-   } leaf;
-   struct htb_class_inner {
-   struct rb_root feed[TC_HTB_NUMPRIO]; /* feed trees */
-   struct rb_node *ptr[TC_HTB_NUMPRIO]; /* current class ptr */
-/* When class changes from state 1->2 and disconnects from 
-   parent's feed then we lost ptr value and start from the
-  first child again. Here we store classid of the
-  last valid ptr (used when ptr is NULL). */
-  u32 last_ptr_id[TC_HTB_NUMPRIO];
-   } inner;
-} un;
-struct rb_node node[TC_HTB_NUMPRIO]; /* node for self or feed tree */
-struct rb_node pq_node; /* node for event queue */
-unsigned long pq_key;  /* the same type as jiffies global */
-
-int prio_activity; /* for which prios are we active */
-enum htb_cmode cmode;  /* current mode of the class */
-
-/* class attached filters */
-struct tcf_proto *filter_list;
-int filter_cnt;
-
-int warned;/* only one warning about non work conserving 
.. */
-
-/* token bucket parameters */
-struct qdisc_rate_table *rate; /* rate table of the class itself */
-struct qdisc_rate_table *ceil; /* ceiling rate (limits borrows too) */
-long buffer,cbuffer;   /*

[PATCH 1/6] htb: remove broken debug code

2006-08-02 Thread Stephen Hemminger


The HTB network scheduler had debug code that wouldn't compile
and that confused and obfuscated the rest of the code; remove it.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
---
 net/sched/sch_htb.c |  302 ++-
 1 files changed, 34 insertions(+), 268 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 880a339..73094e7 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -70,7 +70,6 @@ #include 
 
 #define HTB_HSIZE 16   /* classid hash size */
 #define HTB_EWMAC 2/* rate average over HTB_EWMAC*HTB_HSIZE sec */
-#undef HTB_DEBUG   /* compile debugging support (activated by tc tool) */
 #define HTB_RATECM 1/* whether to use rate computer */
 #define HTB_HYSTERESIS 1/* whether to use mode hysteresis for speedup */
 #define HTB_QLOCK(S) spin_lock_bh(&(S)->dev->queue_lock)
@@ -81,51 +80,6 @@ #if HTB_VER >> 16 != TC_HTB_PROTOVER
 #error "Mismatched sch_htb.c and pkt_sch.h"
 #endif
 
-/* debugging support; S is subsystem, these are defined:
-  0 - netlink messages
-  1 - enqueue
-  2 - drop & requeue
-  3 - dequeue main
-  4 - dequeue one prio DRR part
-  5 - dequeue class accounting
-  6 - class overlimit status computation
-  7 - hint tree
-  8 - event queue
- 10 - rate estimator
- 11 - classifier 
- 12 - fast dequeue cache
-
- L is level; 0 = none, 1 = basic info, 2 = detailed, 3 = full
- q->debug uint32 contains 16 2-bit fields one for subsystem starting
- from LSB
- */
-#ifdef HTB_DEBUG
-#define HTB_DBG_COND(S,L) (((q->debug>>(2*S))&3) >= L)
-#define HTB_DBG(S,L,FMT,ARG...) if (HTB_DBG_COND(S,L)) \
-   printk(KERN_DEBUG FMT,##ARG)
-#define HTB_CHCL(cl) BUG_TRAP((cl)->magic == HTB_CMAGIC)
-#define HTB_PASSQ q,
-#define HTB_ARGQ struct htb_sched *q,
-#define static
-#undef __inline__
-#define __inline__
-#undef inline
-#define inline
-#define HTB_CMAGIC 0xFEFAFEF1
-#define htb_safe_rb_erase(N,R) do { BUG_TRAP((N)->rb_color != -1); \
-   if ((N)->rb_color == -1) break; \
-   rb_erase(N,R); \
-   (N)->rb_color = -1; } while (0)
-#else
-#define HTB_DBG_COND(S,L) (0)
-#define HTB_DBG(S,L,FMT,ARG...)
-#define HTB_PASSQ
-#define HTB_ARGQ
-#define HTB_CHCL(cl)
-#define htb_safe_rb_erase(N,R) rb_erase(N,R)
-#endif
-
-
 /* used internaly to keep status of single class */
 enum htb_cmode {
 HTB_CANT_SEND, /* class can't send and can't borrow */
@@ -136,9 +90,6 @@ enum htb_cmode {
 /* interior & leaf nodes; props specific to leaves are marked L: */
 struct htb_class
 {
-#ifdef HTB_DEBUG
-   unsigned magic;
-#endif
 /* general class parameters */
 u32 classid;
 struct gnet_stats_basic bstats;
@@ -238,7 +189,6 @@ struct htb_sched
 int nwc_hit;   /* this to disable mindelay complaint in dequeue */
 
 int defcls;/* class where unclassified flows go to */
-u32 debug; /* subsystem debug levels */
 
 /* filters for qdisc itself */
 struct tcf_proto *filter_list;
@@ -354,75 +304,21 @@ #endif
return cl;
 }
 
-#ifdef HTB_DEBUG
-static void htb_next_rb_node(struct rb_node **n);
-#define HTB_DUMTREE(root,memb) if(root) { \
-   struct rb_node *n = (root)->rb_node; \
-   while (n->rb_left) n = n->rb_left; \
-   while (n) { \
-   struct htb_class *cl = rb_entry(n, struct htb_class, memb); \
-   printk(" %x",cl->classid); htb_next_rb_node (&n); \
-   } }
-
-static void htb_debug_dump (struct htb_sched *q)
-{
-   int i,p;
-   printk(KERN_DEBUG "htb*g j=%lu lj=%lu\n",jiffies,q->jiffies);
-   /* rows */
-   for (i=TC_HTB_MAXDEPTH-1;i>=0;i--) {
-   printk(KERN_DEBUG "htb*r%d m=%x",i,q->row_mask[i]);
-   for (p=0;p<TC_HTB_NUMPRIO;p++) {
-   if (!q->row[i][p].rb_node) continue;
-   printk(" p%d:",p);
-   HTB_DUMTREE(q->row[i]+p,node[p]);
-   }
-   printk("\n");
-   }
-   /* classes */
-   for (i = 0; i < HTB_HSIZE; i++) {
-   struct list_head *l;
-   list_for_each (l,q->hash+i) {
-   struct htb_class *cl = list_entry(l,struct htb_class,hlist);
-   long diff = PSCHED_TDIFF_SAFE(q->now, cl->t_c, (u32)cl->mbuffer);
-   printk(KERN_DEBUG "htb*c%x m=%d t=%ld c=%ld pq=%lu df=%ld ql=%d "
-   "pa=%x f:",
-   cl->classid,cl->cmode,cl->tokens,cl->ctokens,
-   cl->pq_node.rb_color==-1?0:cl->pq_key,diff,
-   cl->level?0:cl->un.leaf.q->q.qlen,cl->prio_activity);
-   if (cl->level)
-   for (p=0;p<TC_HTB_NUMPRIO;p++) {
-   if (!cl->un.inner.feed[p].rb_node) continue;
-   printk(" p%d a=%x:",p,cl->un.inner.ptr[p]?rb_entry(cl->un.inner.ptr[p], struct htb_class,node[p])->classid:0);
-   HTB_DUMTREE(cl->un.inner.feed+p,node[p]);
-   }
-
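
The comment removed by this patch describes q->debug as a 32-bit word
holding sixteen 2-bit levels, one per subsystem, starting from the LSB.
A minimal userspace sketch of that packing follows. The helper names
are hypothetical and do not come from sch_htb.c; only the shift-and-mask
expression mirrors the removed HTB_DBG_COND(S,L) macro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of sch_htb.c. */

/* Return the 2-bit debug level (0..3) stored for a subsystem (0..15). */
static unsigned debug_level(uint32_t debug, unsigned subsystem)
{
    assert(subsystem < 16);
    return (debug >> (2 * subsystem)) & 3u;
}

/* Return a copy of debug with the level of one subsystem replaced. */
static uint32_t debug_set_level(uint32_t debug, unsigned subsystem,
                                unsigned level)
{
    assert(subsystem < 16 && level <= 3);
    debug &= ~((uint32_t)3 << (2 * subsystem));
    return debug | ((uint32_t)level << (2 * subsystem));
}

/* Same test as the removed HTB_DBG_COND(S,L): is level L enabled for S? */
static int debug_enabled(uint32_t debug, unsigned subsystem, unsigned level)
{
    return debug_level(debug, subsystem) >= level;
}
```

With this layout, "detailed" (2) output for subsystem 3 (dequeue main)
corresponds to bits 6 and 7 of the word.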

[PATCH 6/6] htb: rbtree cleanup

2006-08-02 Thread Stephen Hemminger

Add code to initialize rb tree nodes, and check for double deletion.
This is not a real fix, but I can make the check trap sometimes, and it
may be a band-aid for: http://bugzilla.kernel.org/show_bug.cgi?id=6681

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
---
 net/sched/sch_htb.c |   34 +++---
 1 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 7853c6f..3f3e9df 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -366,7 +366,7 @@ static void htb_add_to_wait_tree(struct 
  * When we are past last key we return NULL.
  * Average complexity is 2 steps per call.
  */
-static void htb_next_rb_node(struct rb_node **n)
+static inline void htb_next_rb_node(struct rb_node **n)
 {
*n = rb_next(*n);
 }
@@ -388,6 +388,18 @@ static inline void htb_add_class_to_row(
}
 }
 
+/* If this triggers, it is a bug in this code, but it need not be fatal */
+static void htb_safe_rb_erase(struct rb_node *rb, struct rb_root *root)
+{
+   if (RB_EMPTY_NODE(rb)) {
+   WARN_ON(1);
+   } else {
+   rb_erase(rb, root);
+   RB_CLEAR_NODE(rb);
+   }
+}
+
+
 /**
  * htb_remove_class_from_row - removes class from its row
  *
@@ -401,10 +413,12 @@ static inline void htb_remove_class_from
 
while (mask) {
int prio = ffz(~mask);
+
mask &= ~(1 << prio);
if (q->ptr[cl->level][prio] == cl->node + prio)
htb_next_rb_node(q->ptr[cl->level] + prio);
-   rb_erase(cl->node + prio, q->row[cl->level] + prio);
+
+   htb_safe_rb_erase(cl->node + prio, q->row[cl->level] + prio);
if (!q->row[cl->level][prio].rb_node)
m |= 1 << prio;
}
@@ -472,7 +486,7 @@ static void htb_deactivate_prios(struct 
p->un.inner.ptr[prio] = NULL;
}
 
-   rb_erase(cl->node + prio, p->un.inner.feed + prio);
+   htb_safe_rb_erase(cl->node + prio, p->un.inner.feed + prio);
 
if (!p->un.inner.feed[prio].rb_node)
mask |= 1 << prio;
@@ -739,7 +753,7 @@ #define HTB_ACCNT(T,B,R) toks = diff + c
htb_change_class_mode(q, cl, &diff);
if (old_mode != cl->cmode) {
if (old_mode != HTB_CAN_SEND)
-   rb_erase(&cl->pq_node, q->wait_pq + cl->level);
+   htb_safe_rb_erase(&cl->pq_node, q->wait_pq + cl->level);
if (cl->cmode != HTB_CAN_SEND)
htb_add_to_wait_tree(q, cl, diff);
}
@@ -782,7 +796,7 @@ static long htb_do_events(struct htb_sch
if (time_after(cl->pq_key, q->jiffies)) {
return cl->pq_key - q->jiffies;
}
-   rb_erase(p, q->wait_pq + level);
+   htb_safe_rb_erase(p, q->wait_pq + level);
diff = PSCHED_TDIFF_SAFE(q->now, cl->t_c, (u32) cl->mbuffer);
htb_change_class_mode(q, cl, &diff);
if (cl->cmode != HTB_CAN_SEND)
@@ -1279,7 +1293,7 @@ static void htb_destroy_class(struct Qdi
htb_deactivate(q, cl);
 
if (cl->cmode != HTB_CAN_SEND)
-   rb_erase(&cl->pq_node, q->wait_pq + cl->level);
+   htb_safe_rb_erase(&cl->pq_node, q->wait_pq + cl->level);
 
kfree(cl);
 }
@@ -1370,6 +1384,8 @@ static int htb_change_class(struct Qdisc
 
if (!cl) {  /* new class */
struct Qdisc *new_q;
+   int prio;
+
/* check for valid classid */
if (!classid || TC_H_MAJ(classid ^ sch->handle)
|| htb_find(classid, sch))
@@ -1389,6 +1405,10 @@ static int htb_change_class(struct Qdisc
INIT_HLIST_NODE(&cl->hlist);
INIT_LIST_HEAD(&cl->children);
INIT_LIST_HEAD(&cl->un.leaf.drop_list);
+   RB_CLEAR_NODE(&cl->pq_node);
+
+   for (prio = 0; prio < TC_HTB_NUMPRIO; prio++)
+   RB_CLEAR_NODE(&cl->node[prio]);
 
/* create leaf qdisc early because it uses kmalloc(GFP_KERNEL)
   so that can't be used inside of sch_tree_lock
@@ -1404,7 +1424,7 @@ static int htb_change_class(struct Qdisc
 
/* remove from evt list because of level change */
if (parent->cmode != HTB_CAN_SEND) {
-   rb_erase(&parent->pq_node, q->wait_pq);
+   htb_safe_rb_erase(&parent->pq_node, q->wait_pq);
parent->cmode = HTB_CAN_SEND;
}
parent->level = (parent->parent ? parent->parent->level
-- 
1.4.0
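
The idea behind htb_safe_rb_erase() in the patch above (mark a node as
"not in any tree" when it is created and whenever it is erased, and
warn instead of erasing twice) can be sketched in userspace as follows.
This is only an illustration under stated assumptions: struct toy_node
and the toy_* functions are hypothetical stand-ins, the "node points to
itself" marker is a convention chosen for the sketch, and no real tree
linking or rebalancing is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a tree node: only the field the check needs. */
struct toy_node {
    struct toy_node *parent;
};

static int warnings;    /* counts detected double erases */

/* Mark a node as not being in any tree (sketch convention: self-parent). */
static void toy_clear_node(struct toy_node *n)
{
    n->parent = n;
}

static int toy_empty_node(const struct toy_node *n)
{
    return n->parent == n;
}

/* Pretend to link n under parent; a NULL parent stands for the root. */
static void toy_insert(struct toy_node *n, struct toy_node *parent)
{
    assert(n != parent);
    n->parent = parent;
}

/* Erase once; a second erase only warns. Returns 1 if the node was erased. */
static int toy_safe_erase(struct toy_node *n)
{
    if (toy_empty_node(n)) {
        warnings++;
        fprintf(stderr, "warning: node erased twice\n");
        return 0;
    }
    /* a real tree would unlink and rebalance here */
    toy_clear_node(n);
    return 1;
}
```

The check only works if every node is cleared before first use, which
is why the patch also adds RB_CLEAR_NODE() calls where a new class is
set up.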

[PATCH 5/6] htb: use hlist for hash lists.

2006-08-02 Thread Stephen Hemminger

Use hlist instead of list for the hash list. This saves
space, and we can check for double delete better.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
---
 net/sched/sch_htb.c |   49 +++--
 1 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 528d5c5..7853c6f 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -104,7 +104,7 @@ #endif
/* topology */
int level;  /* our level (see above) */
struct htb_class *parent;   /* parent class */
-   struct list_head hlist; /* classid hash list item */
+   struct hlist_node hlist;/* classid hash list item */
struct list_head sibling;   /* sibling list item */
struct list_head children;  /* children list */
 
@@ -163,8 +163,8 @@ static inline long L2T(struct htb_class 
 
 struct htb_sched {
struct list_head root;  /* root classes list */
-   struct list_head hash[HTB_HSIZE];   /* hashed by classid */
-   struct list_head drops[TC_HTB_NUMPRIO]; /* active leaves (for drops) */
+   struct hlist_head hash[HTB_HSIZE];  /* hashed by classid */
+   struct list_head drops[TC_HTB_NUMPRIO];/* active leaves (for drops) */
 
/* self list - roots of self generating tree */
struct rb_root row[TC_HTB_MAXDEPTH][TC_HTB_NUMPRIO];
@@ -220,12 +220,13 @@ #endif
 static inline struct htb_class *htb_find(u32 handle, struct Qdisc *sch)
 {
struct htb_sched *q = qdisc_priv(sch);
-   struct list_head *p;
+   struct hlist_node *p;
+   struct htb_class *cl;
+
if (TC_H_MAJ(handle) != sch->handle)
return NULL;
 
-   list_for_each(p, q->hash + htb_hash(handle)) {
-   struct htb_class *cl = list_entry(p, struct htb_class, hlist);
+   hlist_for_each_entry(cl, p, q->hash + htb_hash(handle), hlist) {
if (cl->classid == handle)
return cl;
}
@@ -675,7 +676,9 @@ static void htb_rate_timer(unsigned long
 {
struct Qdisc *sch = (struct Qdisc *)arg;
struct htb_sched *q = qdisc_priv(sch);
-   struct list_head *p;
+   struct hlist_node *p;
+   struct htb_class *cl;
+
 
/* lock queue so that we can muck with it */
spin_lock_bh(&sch->dev->queue_lock);
@@ -686,9 +689,8 @@ static void htb_rate_timer(unsigned long
/* scan and recompute one bucket at time */
if (++q->recmp_bucket >= HTB_HSIZE)
q->recmp_bucket = 0;
-   list_for_each(p, q->hash + q->recmp_bucket) {
-   struct htb_class *cl = list_entry(p, struct htb_class, hlist);
 
+   hlist_for_each_entry(cl,p, q->hash + q->recmp_bucket, hlist) {
RT_GEN(cl->sum_bytes, cl->rate_bytes);
RT_GEN(cl->sum_packets, cl->rate_packets);
}
@@ -1041,10 +1043,10 @@ static void htb_reset(struct Qdisc *sch)
int i;
 
for (i = 0; i < HTB_HSIZE; i++) {
-   struct list_head *p;
-   list_for_each(p, q->hash + i) {
-   struct htb_class *cl =
-   list_entry(p, struct htb_class, hlist);
+   struct hlist_node *p;
+   struct htb_class *cl;
+
+   hlist_for_each_entry(cl, p, q->hash + i, hlist) {
if (cl->level)
memset(&cl->un.inner, 0, sizeof(cl->un.inner));
else {
@@ -1091,7 +1093,7 @@ static int htb_init(struct Qdisc *sch, s
 
INIT_LIST_HEAD(&q->root);
for (i = 0; i < HTB_HSIZE; i++)
-   INIT_LIST_HEAD(q->hash + i);
+   INIT_HLIST_HEAD(q->hash + i);
for (i = 0; i < TC_HTB_NUMPRIO; i++)
INIT_LIST_HEAD(q->drops + i);
 
@@ -1269,7 +1271,8 @@ static void htb_destroy_class(struct Qdi
  struct htb_class, sibling));
 
/* note: this delete may happen twice (see htb_delete) */
-   list_del(&cl->hlist);
+   if (!hlist_unhashed(&cl->hlist))
+   hlist_del(&cl->hlist);
list_del(&cl->sibling);
 
if (cl->prio_activity)
@@ -1317,7 +1320,9 @@ static int htb_delete(struct Qdisc *sch,
sch_tree_lock(sch);
 
/* delete from hash and active; remainder in destroy_class */
-   list_del_init(&cl->hlist);
+   if (!hlist_unhashed(&cl->hlist))
+   hlist_del(&cl->hlist);
+
if (cl->prio_activity)
htb_deactivate(q, cl);
 
@@ -1381,7 +1386,7 @@ static int htb_change_class(struct Qdisc
 
cl->refcnt = 1;
INIT_LIST_HEAD(&cl->sibling);
-   INIT_LIST_HEAD(&cl->hlist);
+   INIT_HLIST_NODE(&cl->hlist);
INIT_LIST_HEAD(&cl->children);
INIT_LIST_HEAD(&cl->un.leaf.drop_list);
 
@@ -1420,7 +1425,7 @@ static int htb_change_class(struct Qdisc
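
The two claims in the description (less space per bucket, and an easier
double-delete check) can be illustrated with a small userspace sketch.
The types below are simplified stand-ins written for this example, not
the kernel's list.h; the names dlist_head, hhead, hnode and the hnode_*
functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for illustration; not the kernel definitions. */
struct dlist_head { struct dlist_head *next, *prev; }; /* two pointers per bucket */
struct hnode { struct hnode *next, **pprev; };
struct hhead { struct hnode *first; };                 /* one pointer per bucket */

static void hnode_init(struct hnode *n)
{
    n->next = NULL;
    n->pprev = NULL;
}

/* A node with a NULL pprev is on no list. */
static int hnode_unhashed(const struct hnode *n)
{
    return n->pprev == NULL;
}

static void hnode_add_head(struct hnode *n, struct hhead *h)
{
    assert(hnode_unhashed(n));
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Unlink n if it is hashed. Returns 1 if removed, 0 if already off-list. */
static int hnode_del(struct hnode *n)
{
    if (hnode_unhashed(n))
        return 0;
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    hnode_init(n);  /* so that a second delete is detected */
    return 1;
}
```

Note that the check depends on the delete leaving pprev NULL, which the
sketch does explicitly by re-initialising the node.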

[PATCH 0/6] htb: cleanup

2006-08-02 Thread Stephen Hemminger

The HTB scheduler code is a mess; this patch set does some basic
house cleaning.  The first four should cause no code change, but the
last two need more testing.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
"And in the Packet there writ down that doome"

[PATCH 2/6] htb: remove lock macro

2006-08-02 Thread Stephen Hemminger

Get rid of the macros that were being used to obscure the locking.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
---
 net/sched/sch_htb.c |   18 --
 1 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 73094e7..c0b80b7 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -72,8 +72,6 @@ #define HTB_HSIZE 16  /* classid hash siz
 #define HTB_EWMAC 2/* rate average over HTB_EWMAC*HTB_HSIZE sec */
 #define HTB_RATECM 1/* whether to use rate computer */
 #define HTB_HYSTERESIS 1/* whether to use mode hysteresis for speedup */
-#define HTB_QLOCK(S) spin_lock_bh(&(S)->dev->queue_lock)
-#define HTB_QUNLOCK(S) spin_unlock_bh(&(S)->dev->queue_lock)
 #define HTB_VER 0x30011/* major must be matched with number suplied by TC as version */
 
 #if HTB_VER >> 16 != TC_HTB_PROTOVER
@@ -667,7 +665,7 @@ static void htb_rate_timer(unsigned long
struct list_head *p;
 
/* lock queue so that we can muck with it */
-   HTB_QLOCK(sch);
+   spin_lock_bh(&sch->dev->queue_lock);
 
q->rttim.expires = jiffies + HZ;
add_timer(&q->rttim);
@@ -681,7 +679,7 @@ static void htb_rate_timer(unsigned long
RT_GEN (cl->sum_bytes,cl->rate_bytes);
RT_GEN (cl->sum_packets,cl->rate_packets);
}
-   HTB_QUNLOCK(sch);
+   spin_unlock_bh(&sch->dev->queue_lock);
 }
 #endif
 
@@ -1089,7 +1087,7 @@ static int htb_dump(struct Qdisc *sch, s
unsigned char*b = skb->tail;
struct rtattr *rta;
struct tc_htb_glob gopt;
-   HTB_QLOCK(sch);
+   spin_lock_bh(&sch->dev->queue_lock);
gopt.direct_pkts = q->direct_pkts;
 
gopt.version = HTB_VER;
@@ -1100,10 +1098,10 @@ static int htb_dump(struct Qdisc *sch, s
RTA_PUT(skb, TCA_OPTIONS, 0, NULL);
RTA_PUT(skb, TCA_HTB_INIT, sizeof(gopt), &gopt);
rta->rta_len = skb->tail - b;
-   HTB_QUNLOCK(sch);
+   spin_unlock_bh(&sch->dev->queue_lock);
return skb->len;
 rtattr_failure:
-   HTB_QUNLOCK(sch);
+   spin_unlock_bh(&sch->dev->queue_lock);
skb_trim(skb, skb->tail - skb->data);
return -1;
 }
@@ -1116,7 +1114,7 @@ static int htb_dump_class(struct Qdisc *
struct rtattr *rta;
struct tc_htb_opt opt;
 
-   HTB_QLOCK(sch);
+   spin_lock_bh(&sch->dev->queue_lock);
tcm->tcm_parent = cl->parent ? cl->parent->classid : TC_H_ROOT;
tcm->tcm_handle = cl->classid;
if (!cl->level && cl->un.leaf.q)
@@ -1133,10 +1131,10 @@ static int htb_dump_class(struct Qdisc *
opt.level = cl->level; 
RTA_PUT(skb, TCA_HTB_PARMS, sizeof(opt), &opt);
rta->rta_len = skb->tail - b;
-   HTB_QUNLOCK(sch);
+   spin_unlock_bh(&sch->dev->queue_lock);
return skb->len;
 rtattr_failure:
-   HTB_QUNLOCK(sch);
+   spin_unlock_bh(&sch->dev->queue_lock);
skb_trim(skb, b - skb->data);
return -1;
 }
-- 
1.4.0

Re: [PATCH 0/6] htb: cleanup

2006-08-02 Thread David Miller

From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 2 Aug 2006 12:56:36 -0700

> The HTB scheduler code is a mess; this patch set does some basic
> house cleaning.  The first four should cause no code change, but the
> last two need more testing.

These patches look fine to me.  Once everyone thinks they
are ready, just let me know and I'll push them into net-2.6.19.

[PATCH] bridge: netlink status fix

2006-08-02 Thread Stephen Hemminger

Fix code that passes back netlink status messages about
bridge changes. Submitted by [EMAIL PROTECTED]

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
---
 net/bridge/br_netlink.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 06abb66..53086fb 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -85,7 +85,7 @@ void br_ifinfo_notify(int event, struct 
goto err_out;
 
err = br_fill_ifinfo(skb, port, current->pid, 0, event, 0);
-   if (err)
+   if (err < 0)
goto err_kfree;
 
NETLINK_CB(skb).dst_group = RTNLGRP_LINK;
-- 
1.4.0
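
Why "if (err)" and "if (err < 0)" differ can be shown with a small
userspace sketch. It assumes, as is common for message-fill routines,
that the fill function returns a positive length on success and a
negative value on failure; the hunk above does not show what
br_fill_ifinfo() returns, so treat that as an assumption. All names in
the sketch are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fill routine: returns bytes written (> 0), or -1 if
 * the buffer is too small. */
static int fill_info(char *buf, size_t cap, const char *ifname)
{
    size_t len;

    assert(buf != NULL && ifname != NULL);
    len = strlen(ifname) + 1;
    if (len > cap)
        return -1;
    memcpy(buf, ifname, len);
    return (int)len;
}

/* Old check: any non-zero result, including a positive length, is
 * treated as an error. Returns 1 if the message would be sent. */
static int notify_old(const char *ifname)
{
    char buf[16];
    int err = fill_info(buf, sizeof(buf), ifname);

    if (err)
        return 0;   /* message dropped, even on success */
    return 1;
}

/* New check: only negative results are errors. */
static int notify_new(const char *ifname)
{
    char buf[16];
    int err = fill_info(buf, sizeof(buf), ifname);

    if (err < 0)
        return 0;   /* genuine failure */
    return 1;
}
```

Under that assumption the old test drops every successfully built
message, which would match the symptom the patch describes.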



Re: [RFC] gre: transparent ethernet bridging

2006-08-02 Thread Philip Craig

Stephen Hemminger wrote:
> On Wed, 02 Aug 2006 16:17:42 +1000
> Philip Craig <[EMAIL PROTECTED]> wrote:
>> It generates a random mac address for gre ports, and also stores
>> a copy of the mac address for ethernet ports, rather than checking
>> dev->type everywhere.
>
> That looks cleaner. I wonder if using a fixed OUI would be better
> than random addresses but then choosing an OUI would be a problem.

random_ether_addr() sets the local assignment bit. This is what
various other virtual devices do (including tap devices, which can
also be bridged).

> You probably should add a comment about what this function is doing,
> and why.

Okay.
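
For reference, "sets the local assignment bit" amounts to the following
bit manipulation on the first octet. This is a userspace imitation
written for illustration, not the kernel's random_ether_addr(); the
function names are hypothetical and rand() is only a placeholder for a
proper random source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_ETH_ALEN 6

/* Fill addr with a random, locally administered, unicast MAC address. */
static void sketch_random_ether_addr(uint8_t addr[SKETCH_ETH_ALEN])
{
    int i;

    assert(addr != NULL);
    for (i = 0; i < SKETCH_ETH_ALEN; i++)
        addr[i] = (uint8_t)(rand() & 0xff);
    addr[0] &= 0xfe;    /* clear the multicast (group) bit */
    addr[0] |= 0x02;    /* set the locally administered bit */
}

/* True if the address is unicast and locally administered. */
static int is_local_unicast(const uint8_t addr[SKETCH_ETH_ALEN])
{
    return (addr[0] & 0x03) == 0x02;
}
```

Because the locally administered bit is set, such an address cannot
collide with a vendor-assigned (OUI-based) address, which is why no
fixed OUI needs to be chosen.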


Re: Bug in IPSEC?

2006-08-02 Thread Herbert Xu

On Wed, Aug 02, 2006 at 05:08:39PM +0200, Louis Croisez wrote:
> 
> I think that 96 bits for the truncated version of the hmac is not
> enough with respect to RFC 2104, p5 ?1 :
> "... We recommend that the output length to be not less than half the
> length of the hash output ... and not less than 80 bits ..."
> 
> I think that the truncated length should be 128 bits in this case...
> Do you agree?

(To recap: our sha256 IPsec implementation truncates the output to 96
bits, while the last IETF draft on sha256 and the general HMAC RFC
require 128 bits.)

Yes I agree with your assessment.

Changing it is nasty though since we don't know how many Linux users
have deployed this.

Also, we should keep in mind that the IETF has given up on sha256
altogether.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
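
The arithmetic behind the 96-bit versus 128-bit discussion can be
written down as a small check. It encodes only the recommendation
quoted above (not less than half the hash output, and not less than 80
bits); the function names are hypothetical and this is not code from
the kernel's IPsec implementation.

```c
#include <assert.h>

/* Smallest truncated HMAC length, in bits, that meets the quoted
 * recommendation: at least half the hash output and at least 80 bits. */
static unsigned min_truncated_bits(unsigned hash_bits)
{
    unsigned half;

    assert(hash_bits > 0);
    half = hash_bits / 2;
    return half > 80 ? half : 80;
}

/* True if truncating a hash_bits-wide HMAC to truncated_bits is within
 * the recommendation (and not longer than the full output). */
static int truncation_ok(unsigned hash_bits, unsigned truncated_bits)
{
    return truncated_bits >= min_truncated_bits(hash_bits) &&
           truncated_bits <= hash_bits;
}
```

For a 256-bit hash the minimum comes out at 128 bits, so 96 bits falls
short, while 96 bits is still within the recommendation for a 160-bit
hash.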

Re: [RFC] gre: transparent ethernet bridging

2006-08-02 Thread Philip Craig

Lennert Buytenhek wrote:
> On Mon, Jul 31, 2006 at 10:08:22PM -0700, Stephen Hemminger wrote:
> 
>>>> Why not use existing bridge code?
>>> It does use the existing bridge code.  Perhaps the name is misleading.
>>> All it does is encapsulate the full ethernet header in a gre packet,
>>> rather than only layer 3.  That is, currently gre uses ARPHRD_IPGRE,
>>> but bridging requires ARPHRD_ETHER.
>> I am not against making the bridge code smarter to handle other
>> encapsulation.
> 
> What if you want to run ethernet directly over a GRE tunnel, without
> using bridging?

But on the other hand, this method allows you to send both ethernet
and non-ethernet traffic over the same GRE tunnel.  Is that useful?
Actually, this feature is what makes the handling of the LLC_SAP_BSPAN
packets simple.

The patch to bridging is a lot cleaner than the patch to GRE, and it
also sidesteps the userspace configuration issues, so I don't want to
go back to modifying the GRE device.

Both could be achieved by creating a new virtual device that sits
between GRE and bridging.

Re: [PATCH] Create IP100A Driver

2006-08-02 Thread Jesse Huang

Dear Jeff:

I have discussed this with our people. We have decided to use sundance.c
to support IP100A. We will also submit some bug fixes for this driver.

Thanks for your suggestion.

Best Regards,
Jesse Huang

- Original Message - 
From: "Jeff Garzik" <[EMAIL PROTECTED]>
To: "Jesse Huang" <[EMAIL PROTECTED]>
Cc: "John W. Linville" <[EMAIL PROTECTED]>;
; ; <[EMAIL PROTECTED]>
Sent: Friday, July 28, 2006 6:14 PM
Subject: Re: [PATCH] Create IP100A Driver

Although it is occasionally OK to duplicate a driver, I do not see a
compelling case with ip100a.

The stronger case is for a single codebase: it lowers long-term
maintenance costs, strengthens review, doesn't break existing users of
the sundance driver, and re-uses the benefits of existing testing.

If you feel strongly about not showing "sundance" to your users, you can
always submit a one-line MODULE_ALIAS() change which permits users to
load "ip100a" (really sundance.c).  Using MODULE_ALIAS() seems quite
reasonable, given that IC Plus appears to be taking the lead in future
Sundance-like chip development.

So, please resubmit as changes to the existing sundance.c.  This is
better for the standard Linux kernel engineering process.

Thanks,

Jeff



[patch] RFC: matching interface groups

2006-08-02 Thread Stephen J. Bevan

Balazs Scheidler writes:
 > I would like to easily match a set of dynamically created interfaces
 > from my packet filter rules. The attached patch forms the basis of my
 > implementation and I would like to know whether something like this is
 > mergeable to mainline.
[snip]
 > The implementation:
 > 
 > Each interface can belong to a single "group" at a time, an interface
 > comes up without being a member in any of the groups.

You can get a similar effect by (ab)using the iflink field i.e. set
the iflink to the parent interface and modify
ip_tables.c:ip_packet_match to check the ifindex (or iflink if
defined) for a match.  An advantage of this is that it doesn't require
adding any new fields and the only kernel change is to
ip_tables.c:ip_packet_match (and its caller).  That said, an explicit
group (or zone as various firewall vendors call it) is cleaner.
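
A rough userspace sketch of the iflink-based match described above
follows. It is an illustration only: struct sketch_dev, iface_matches()
and the convention that iflink == 0 means "not set" are assumptions of
the sketch, not the actual ip_packet_match code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced device record: only the two fields discussed. */
struct sketch_dev {
    int ifindex;    /* the interface's own index */
    int iflink;     /* index of the "parent"; 0 = not set (assumption) */
};

/* Match a rule's interface index against iflink if it is defined,
 * otherwise against the device's own ifindex. */
static int iface_matches(const struct sketch_dev *dev, int rule_ifindex)
{
    assert(dev != NULL);
    if (dev->iflink != 0)
        return dev->iflink == rule_ifindex;
    return dev->ifindex == rule_ifindex;
}
```

With this scheme, one rule that names the parent index matches every
dynamically created interface whose iflink points at that parent.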

bonding questions: carrier based link monitoring / slave device state flags

2006-08-02 Thread Or Gerlitz

First, I'd like to verify which parameter settings make the
bonding driver use netif_carrier_ok(slave_device) as the means of
link detection. Is setting use_carrier = 1 enough, or does one need to
set miimon to non-zero as well? (The value of miimon translates to the
link monitoring frequency.)

Second, I understand that a device must **not** be UP when it is
enslaved, so when enslaving a device the bonding driver calls
dev_open(slave_device) to make sure the device is UP, correct?

What I want to better understand here is whether, for the bonding driver
to declare a slave as "being able to carry traffic", it assumes the slave
will move from UP to RUNNING state (and later netif_carrier_ok would
return TRUE) without an IP address being set for the slave device.

Is (per the bonding driver) the **time** it should take the slave to get
from UP to (RUNNING && carrier_ok) state limited and/or controlled?

Or.


Re: [PATCH 2/6] zd1211rw: Pass more management frame types up to host

2006-08-02 Thread Ulrich Kunitz

On 06-08-02 17:58 John W. Linville wrote:

> On Tue, Aug 01, 2006 at 11:43:31PM +0200, Ulrich Kunitz wrote:
> > From: Daniel Drake <[EMAIL PROTECTED]>
> > 
> > We'll be needing these at some point...
> 
> This one doesn't really seem like a fix.  But since the later fixes
> seem to depend on it, I guess it makes sense to take it.
> 
> I just didn't want you to think I wasn't looking... :-)

John,

you are absolutely right. The patch is needed, because the patch
sequence would break without it.

Would it be acceptable if I point out such "bridge" patches in the
future, or should we create a clean patch sequence? The latter would
require us to rewrite patches manually.

BTW, we will have a larger number of patches for 2.6.19 (around 30),
which support more devices and also clean things up. Some work is
still required, however.

Regards,

Uli

-- 
Uli Kunitz
