Re: [patch net-next v5 5/8] switchdev: introduce possibility to defer obj_add/del
On Wed, Oct 14, 2015 at 10:40 AM, Jiri Pirkowrote: > From: Jiri Pirko > > Similar to the attr usecase, the caller knows if he is holding RTNL and is > in atomic section. So let the called to decide the correct call variant. > > This allows drivers to sleep inside their ops and wait for hw to get the > operation status. Then the status is propagated into switchdev core. > This avoids silent errors in drivers. > > Signed-off-by: Jiri Pirko > --- > include/net/switchdev.h | 1 + > net/switchdev/switchdev.c | 100 > -- > 2 files changed, 81 insertions(+), 20 deletions(-) > > diff --git a/include/net/switchdev.h b/include/net/switchdev.h > index f8672d7..bc865e2 100644 > --- a/include/net/switchdev.h > +++ b/include/net/switchdev.h > @@ -69,6 +69,7 @@ enum switchdev_obj_id { > > struct switchdev_obj { > enum switchdev_obj_id id; > + u32 flags; > }; > > /* SWITCHDEV_OBJ_ID_PORT_VLAN */ > diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c > index 5963d7a..eac68c4 100644 > --- a/net/switchdev/switchdev.c > +++ b/net/switchdev/switchdev.c > @@ -362,21 +362,8 @@ static int __switchdev_port_obj_add(struct net_device > *dev, > return err; > } > > -/** > - * switchdev_port_obj_add - Add port object > - * > - * @dev: port device > - * @id: object ID > - * @obj: object to add > - * > - * Use a 2-phase prepare-commit transaction model to ensure > - * system is not left in a partially updated state due to > - * failure from driver/device. > - * > - * rtnl_lock must be held. > - */ > -int switchdev_port_obj_add(struct net_device *dev, > - const struct switchdev_obj *obj) > +static int switchdev_port_obj_add_now(struct net_device *dev, > + const struct switchdev_obj *obj) > { > struct switchdev_trans trans; > int err; > @@ -418,18 +405,53 @@ int switchdev_port_obj_add(struct net_device *dev, > > return err; > } > -EXPORT_SYMBOL_GPL(switchdev_port_obj_add); > + > +static void switchdev_port_obj_add_deferred(struct net_device *dev, > + const void *data) > +{ > + const struct switchdev_obj *obj = data; > + int err; > + > + err = switchdev_port_obj_add_now(dev, obj); > + if (err && err != -EOPNOTSUPP) > + netdev_err(dev, "failed (err=%d) to add object (id=%d)\n", > + err, obj->id); > +} > + > +static int switchdev_port_obj_add_defer(struct net_device *dev, > + const struct switchdev_obj *obj) > +{ > + return switchdev_deferred_enqueue(dev, obj, sizeof(*obj), > + switchdev_port_obj_add_deferred); > +} Hi Jiri, I think there is a problem here with this patch. When we defer adding an obj, we're only copying the inner struct switchdev_obj. The rest of the containing obj is lost. So when switchdev_port_obj_add_deferred() runs, we're up casting to garbage. So if this FDB obj was deferred, the addr, vid, and ndm_state members would be garbage: struct switchdev_obj_port_fdb { struct switchdev_obj obj; unsigned char addr[ETH_ALEN]; u16 vid; u16 ndm_state; }; -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] Adding switchdev ageing notification on port bridged
On Mon, Oct 19, 2015 at 5:37 AM, Elad Raz <el...@mellanox.com> wrote: > Configure ageing time to the HW for newly bridged device > > CC: Scott Feldman <sfel...@gmail.com> > CC: Jiri Pirko <j...@resnulli.us> > Signed-off-by: Elad Raz <el...@mellanox.com> Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred
On Fri, Oct 16, 2015 at 1:23 AM, Jiri Pirkowrote: > Thu, Oct 15, 2015 at 05:21:22PM CEST, john.fastab...@gmail.com wrote: >>On 15-10-14 10:40 AM, Jiri Pirko wrote: >>> From: Jiri Pirko >>> >>> Caller should know if he can call attr_set directly (when holding RTNL) >>> or if he has to defer the att_set processing for later. >>> >>> This also allows drivers to sleep inside attr_set and report operation >>> status back to switchdev core. Switchdev core then warns if status is >>> not ok, instead of silent errors happening in drivers. >>> >>> Benefit from newly introduced switchdev deferred ops infrastructure. >>> >> >>A nit but the patch description should note your setting the defer bit >>on the bridge set state. >> >>> Signed-off-by: Jiri Pirko >>> --- >>> include/net/switchdev.h | 1 + >>> net/bridge/br_stp.c | 3 +- >>> net/switchdev/switchdev.c | 108 >>> ++ >>> 3 files changed, 46 insertions(+), 66 deletions(-) >>> >>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h >>> index d1c7f90..f7de6f8 100644 >>> --- a/include/net/switchdev.h >>> +++ b/include/net/switchdev.h >>> @@ -17,6 +17,7 @@ >>> >>> #define SWITCHDEV_F_NO_RECURSE BIT(0) >>> #define SWITCHDEV_F_SKIP_EOPNOTSUPP BIT(1) >>> +#define SWITCHDEV_F_DEFER BIT(2) >>> >>> struct switchdev_trans_item { >>> struct list_head list; >>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c >>> index db6d243de..80c34d7 100644 >>> --- a/net/bridge/br_stp.c >>> +++ b/net/bridge/br_stp.c >>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned >>> int state) >>> { >>> struct switchdev_attr attr = { >>> .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE, >>> +.flags = SWITCHDEV_F_DEFER, >>> .u.stp_state = state, >>> }; >> >> >>This creates a possible race (with 6/8) I think, please check! > > Wait. This patch does not change the previous behaviour. Patch 6 does, > so I don't understand why you are asking here. Confusing. > > >> >>In del_nbp() we call br_stp_disable_port() to set the port state >>to BR_STATE_DISABLE and disabling learning events. But with this >>patch it can be deferred. Also note the STP agent may be in userspace >>which actually seems more likely the case because you likely want to >>run some more modern variant of STP than the kernel supports. >> >>So at some point in the future the driver will turn off learning. At >>the same time we call br_fdb_delete_by_port which calls a deferred >>set of fdb deletes. >> >>I don't see how you guarantee learning is off before you start doing >>the deletes here and possibly learning new addresses after the software >>side believes the port is down. >> >>So >> >> br_stp_disable_port >> br_fdb_delete_by_port >> {fdb_del_external_learn} >> [hw learns a fdb] >> [hw disables learning] >> >>What stops this from happening? > > Okay. This behaviour is the same as without the patchset. What would > resolve the issue it to put switchdev_deferred_process() after > br_stp_disable_port() and before br_fdb_delete_by_port() call. > That would enforce stp change to happen in hw before fdbs are explicitly > deleted. Sound good to you? Doesn't HW already see things in the right order since items are dequeued from the deferred list in the order queued? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
On Wed, Oct 14, 2015 at 8:25 AM, Vivien Didelotwrote: > On Oct. Wednesday 14 (42) 09:14 AM, Ido Schimmel wrote: >> Tue, Oct 13, 2015 at 05:32:26PM IDT, vivien.dide...@savoirfairelinux.com >> wrote: >> >On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote: >> >> Mon, Oct 12, 2015 at 08:36:25PM IDT, vivien.dide...@savoirfairelinux.com >> >> wrote: >> >> >Hi guys, >> >> > >> >> >On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote: >> >> >> From: Nikolay Aleksandrov >> >> >> >> >> >> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges. >> >> >> >> >> >> Signed-off-by: Nikolay Aleksandrov >> >> >> --- >> >> >> net/switchdev/switchdev.c | 3 +++ >> >> >> 1 file changed, 3 insertions(+) >> >> >> >> >> >> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c >> >> >> index 6e4a4f9ad927..256c596de896 100644 >> >> >> --- a/net/switchdev/switchdev.c >> >> >> +++ b/net/switchdev/switchdev.c >> >> >> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct >> >> >> net_device *dev, >> >> >> if (vlan.vid_begin) >> >> >> return -EINVAL; >> >> >> vlan.vid_begin = vinfo->vid; >> >> >> + /* don't allow range of pvids */ >> >> >> + if (vlan.flags & BRIDGE_VLAN_INFO_PVID) >> >> >> + return -EINVAL; >> >> >> } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) { >> >> >> if (!vlan.vid_begin) >> >> >> return -EINVAL; >> >> >> -- >> >> >> 2.4.3 >> >> >> >> >> > >> >> >Yes the patch looks good, but it is a minor check though. I hope the >> >> >subject of this thread is making sense. >> >> > >> >> >VLAN ranges seem to have been included for an UX purpose (so commands >> >> >look like Cisco IOS). We don't want to change any existing interface, so >> >> >we pushed that down to drivers, with the only valid reason that, maybe >> >> >one day, an hardware can be capable of programming a range on a per-port >> >> >basis. >> >> Hi, >> >> >> >> That's actually what we are doing in mlxsw. We can do up to 256 entries in >> >> one go. We've yet to submit this part. >> > >> >Perfect Ido, thanks for pointing this out! I'm OK with the range then. >> > >> >So there is now a very last question in my head for this, which is more >> >a matter of kernel design. Should the user be aware of such underlying >> >support? In other words, would it make sense to do this in a driver: >> > >> >foo_port_vlan_add(struct net_device *dev, >> > struct switchdev_obj_port_vlan *vlan) >> >{ >> >if (vlan->vid_begin != vlan->vid_end) >> >return -ENOTSUPP; /* or something more relevant for user */ >> > >> >return foo_port_single_vlan_add(dev, vlan->vid_begin); >> >} >> > >> >So drivers keep being simple, and we can easily propagate the fact that >> >one-or-all VLAN is not supportable, vs. the VLAN feature itself is not >> >implemented and must be done in software. >> I think that if you want to keep it simple, then Scott's advice from the >> previous thread is the most appropriate one. I believe the hardware you >> are using is simply not meant to support multiple 802.1Q bridges. > > You mean allowing only one Linux bridge over an hardware switch? > > It would for sure simplify how, as developers and users, we represent a > physical switch. But I am not sure how to achieve that and I don't have > strong opinions on this TBH. Hi Vivien, I think it's possible to keep switch ports on just one bridge if we do a little bit of work on the NETDEV_CHANGEUPPER notifier. This will give you the driver-level control you want. Do you have time to investigate? The idea is: 1) In your driver's handler for NETDEV_CHANGEUPPER, if switch port is being added to a second bridge,then return NOTIFY_BAD. Your driver needs to track the bridge count. 2) In __netdev_upper_dev_link(), check the return code from the call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, ...) call, and if NOTIFY_BAD, abort the linking operation (goto rollback_xxx). -scott -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred
On Wed, Oct 14, 2015 at 10:40 AM, Jiri Pirkowrote: > From: Jiri Pirko > > Caller should know if he can call attr_set directly (when holding RTNL) > or if he has to defer the att_set processing for later. > > This also allows drivers to sleep inside attr_set and report operation > status back to switchdev core. Switchdev core then warns if status is > not ok, instead of silent errors happening in drivers. > > Benefit from newly introduced switchdev deferred ops infrastructure. > > Signed-off-by: Jiri Pirko > --- > include/net/switchdev.h | 1 + > net/bridge/br_stp.c | 3 +- > net/switchdev/switchdev.c | 108 > ++ > 3 files changed, 46 insertions(+), 66 deletions(-) > > diff --git a/include/net/switchdev.h b/include/net/switchdev.h > index d1c7f90..f7de6f8 100644 > --- a/include/net/switchdev.h > +++ b/include/net/switchdev.h > @@ -17,6 +17,7 @@ > > #define SWITCHDEV_F_NO_RECURSE BIT(0) > #define SWITCHDEV_F_SKIP_EOPNOTSUPPBIT(1) > +#define SWITCHDEV_F_DEFER BIT(2) > > struct switchdev_trans_item { > struct list_head list; > diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c > index db6d243de..80c34d7 100644 > --- a/net/bridge/br_stp.c > +++ b/net/bridge/br_stp.c > @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int > state) > { > struct switchdev_attr attr = { > .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE, > + .flags = SWITCHDEV_F_DEFER, > .u.stp_state = state, > }; > int err; > > p->state = state; > err = switchdev_port_attr_set(p->dev, ); > - if (err && err != -EOPNOTSUPP) > + if (err) > br_warn(p->br, "error setting offload STP state on port > %u(%s)\n", > (unsigned int) p->port_no, p->dev->name); > } Should this part of the patch be moved to patch 6/8 where switchdev_deferred_process() is called from del_nbp()? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next v2 4/6] net: dsa: mv88e6xxx: implement port_fdb_dump
On Tue, Oct 13, 2015 at 9:46 AM, Vivien Didelot <vivien.dide...@savoirfairelinux.com> wrote: > Implement the port_fdb_dump DSA operation. > > Signed-off-by: Vivien Didelot <vivien.dide...@savoirfairelinux.com> Reviewed-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v4 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn
On Mon, Oct 12, 2015 at 11:05 PM, Jiri Pirko <j...@resnulli.us> wrote: > Tue, Oct 13, 2015 at 05:28:20AM CEST, sfel...@gmail.com wrote: >>On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <j...@resnulli.us> wrote: >>> From: Jiri Pirko <j...@mellanox.com> >>> >>> Since spinlock is held here, defer the switchdev operation. >>> >>> Signed-off-by: Jiri Pirko <j...@mellanox.com> >>> --- >>> net/bridge/br_fdb.c | 5 - >>> net/bridge/br_if.c | 3 +++ >>> 2 files changed, 7 insertions(+), 1 deletion(-) >>> >>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c >>> index f5e7da0..c88bd8e 100644 >>> --- a/net/bridge/br_fdb.c >>> +++ b/net/bridge/br_fdb.c >>> @@ -134,7 +134,10 @@ static void fdb_del_hw_addr(struct net_bridge *br, >>> const unsigned char *addr) >>> static void fdb_del_external_learn(struct net_bridge_fdb_entry *f) >>> { >>> struct switchdev_obj_port_fdb fdb = { >>> - .obj.id = SWITCHDEV_OBJ_ID_PORT_FDB, >>> + .obj = { >>> + .id = SWITCHDEV_OBJ_ID_PORT_FDB, >>> + .flags = SWITCHDEV_F_DEFER, >>> + }, >>> .vid = f->vlan_id, >>> }; >>> >>> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c >>> index 934cae9..09147cb 100644 >>> --- a/net/bridge/br_if.c >>> +++ b/net/bridge/br_if.c >>> @@ -24,6 +24,7 @@ >>> #include >>> #include >>> #include >>> +#include >>> >>> #include "br_private.h" >>> >>> @@ -249,6 +250,8 @@ static void del_nbp(struct net_bridge_port *p) >>> list_del_rcu(>list); >>> >>> br_fdb_delete_by_port(br, p, 0, 1); >>> + switchdev_flush_deferred(); >>> + >> >>This potentially flushes other (valid) work on the deferred queue not >>related to FDB del. > > flush_deferred is implemented by flush_workqueue. This enforces the > workqueue to be processed heve and sleeps until it is done. Nothing is > lost. My bad, I saw "flush" and assumed everything went down the toilet. Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v4 6/7] rocker: remove nowait from switchdev callbacks.
On Mon, Oct 12, 2015 at 11:25 PM, Jiri Pirko <j...@resnulli.us> wrote: > Tue, Oct 13, 2015 at 06:02:28AM CEST, sfel...@gmail.com wrote: >>On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <j...@resnulli.us> wrote: >>> From: Jiri Pirko <j...@mellanox.com> >>> >>> No need to avoid sleeping in switchdev callbacks now, as the switchdev >>> core allows it. >>> >>> Signed-off-by: Jiri Pirko <j...@mellanox.com> >>> --- >>> drivers/net/ethernet/rocker/rocker.c | 7 +++ >>> 1 file changed, 3 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/net/ethernet/rocker/rocker.c >>> b/drivers/net/ethernet/rocker/rocker.c >>> index bb956a5..9629c5b5 100644 >>> --- a/drivers/net/ethernet/rocker/rocker.c >>> +++ b/drivers/net/ethernet/rocker/rocker.c >>> @@ -3672,7 +3672,7 @@ static int rocker_port_fdb_flush(struct rocker_port >>> *rocker_port, >>> rocker_port->stp_state == BR_STATE_FORWARDING) >>> return 0; >>> >>> - flags |= ROCKER_OP_FLAG_REMOVE; >>> + flags |= ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_REMOVE; >> >>I understand the two changes below where you're removing NOWAIT, but >>here you're adding NOWAIT which I'm not sure how that is related to >>the switchdev core changes. Is this two patches? > > I removed ROCKER_OP_FLAG_NOWAIT from attr_set. But here in > rocker_port_fdb_flush, which is called from attr_set, we call > rocker_port_fdb_learn with spin lock. Therefore I had to put > ROCKER_OP_FLAG_NOWAIT here. Before ROCKER_OP_FLAG_NOWAIT removal from > attr_set this was there already. Gotcha. Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred
On Mon, Oct 12, 2015 at 11:21 PM, Jiri Pirkowrote: > Tue, Oct 13, 2015 at 08:03:46AM CEST, john.fastab...@gmail.com wrote: >>On 15-10-12 10:44 PM, Jiri Pirko wrote: >>> Tue, Oct 13, 2015 at 04:52:42AM CEST, sfel...@gmail.com wrote: On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko wrote: > From: Jiri Pirko > > Caller should know if he can call attr_set directly (when holding RTNL) > or if he has to defer the att_set processing for later. > > This also allows drivers to sleep inside attr_set and report operation > status back to switchdev core. Switchdev core then warns if status is > not ok, instead of silent errors happening in drivers. > > Signed-off-by: Jiri Pirko > --- > include/net/switchdev.h | 1 + > net/bridge/br_stp.c | 3 +- > net/switchdev/switchdev.c | 107 > -- > 3 files changed, 59 insertions(+), 52 deletions(-) > > diff --git a/include/net/switchdev.h b/include/net/switchdev.h > index d2879f2..6b109e4 100644 > --- a/include/net/switchdev.h > +++ b/include/net/switchdev.h > @@ -17,6 +17,7 @@ > > #define SWITCHDEV_F_NO_RECURSE BIT(0) > #define SWITCHDEV_F_SKIP_EOPNOTSUPPBIT(1) > +#define SWITCHDEV_F_DEFER BIT(2) > > struct switchdev_trans_item { > struct list_head list; > diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c > index db6d243de..80c34d7 100644 > --- a/net/bridge/br_stp.c > +++ b/net/bridge/br_stp.c > @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned > int state) > { > struct switchdev_attr attr = { > .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE, > + .flags = SWITCHDEV_F_DEFER, > .u.stp_state = state, > }; > int err; > > p->state = state; > err = switchdev_port_attr_set(p->dev, ); > - if (err && err != -EOPNOTSUPP) > + if (err) This looks like a problem as now all other non-switchdev ports will get an WARN in the log when STP state changes. We should only WARN if there was an err and the err is not -EOPNOTSUPP. >>> >>> If SWITCHDEV_F_DEFER flag is set, there's only 0 of -ENOMEM. >>> >>> > br_warn(p->br, "error setting offload STP state on port > %u(%s)\n", > (unsigned int) p->port_no, p->dev->name); > } > struct switchdev_attr_set_work { > struct work_struct work; > struct net_device *dev; > @@ -183,14 +226,17 @@ static void switchdev_port_attr_set_work(struct > work_struct *work) > { > struct switchdev_attr_set_work *asw = > container_of(work, struct switchdev_attr_set_work, work); > + bool rtnl_locked = rtnl_is_locked(); > int err; > > - rtnl_lock(); > - err = switchdev_port_attr_set(asw->dev, >attr); > + if (!rtnl_locked) > + rtnl_lock(); I'm not following this change. If someone else has rtnl_lock, we'll not wait to grab it here ourselves, and proceed as if we have the lock. But what if that someone else releases the lock in the middle of us doing switchdev_port_attr_set_now? Seems we want to unconditionally wait and grab the lock. We need to block anything from moving while we do the attr set. >>> >>> Why would someone we call (driver) return the lock? In that case, he is >>> buggy and should be fixed. >>> >>> This hunk only ensures we have rtnl_lock. If not, we take it here. We do >>> not take it unconditionally because we may already have it, for example >>> if caller of switchdev_flush_deferred holds rtnl_lock. >>> >> >>This is where you lost me. How do you know another core doesn't happen >>to have the lock when you hit this code path? Maybe someone is running >>an ethtool command on another core or something. > > You are right. The same problem exists currently in switchdev_port_attr_set. You are right as in you'll change this back to unconditional grabbing of rtnl_lock? I don't follow how this problem currently exists as current code does an unconditional grab of rtnl_lock. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 4/4] bridge: vlan: combine (br|nbp)_vlan_flush into one
On Mon, Oct 12, 2015 at 10:51 AM, Ido Schimmelwrote: > Mon, Oct 12, 2015 at 02:41:09PM IDT, ra...@blackwall.org wrote: >>From: Nikolay Aleksandrov >> >>As Ido Schimmel pointed out the vlan_vid_del() loop in nbp_vlan_flush is >>unnecessary (and is actually a remnant of the old vlan code) so we can >>remove it and combine both br/nbp vlan_flush functions into one. > Just a small note to Scott and Vivien: > > One of the side effects of Nik's recent patchsets is that when VLANs are > flushed on a port the deletion is propagated to the driver via > switchdev ops, as __vlan_vid_del is called. > > Therefore there is no need to do internal bookkeeping and remove VLANs > yourself when port is removed from bridge. Thanks for the heads-up. Seems like a nice result. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred
On Tue, Oct 13, 2015 at 12:30 AM, Jiri Pirkowrote: > Tue, Oct 13, 2015 at 08:53:45AM CEST, sfel...@gmail.com wrote: >>On Mon, Oct 12, 2015 at 11:21 PM, Jiri Pirko wrote: >>> Tue, Oct 13, 2015 at 08:03:46AM CEST, john.fastab...@gmail.com wrote: On 15-10-12 10:44 PM, Jiri Pirko wrote: > Tue, Oct 13, 2015 at 04:52:42AM CEST, sfel...@gmail.com wrote: >> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko wrote: >>> From: Jiri Pirko >>> >>> Caller should know if he can call attr_set directly (when holding RTNL) >>> or if he has to defer the att_set processing for later. >>> >>> This also allows drivers to sleep inside attr_set and report operation >>> status back to switchdev core. Switchdev core then warns if status is >>> not ok, instead of silent errors happening in drivers. >>> >>> Signed-off-by: Jiri Pirko >>> --- >>> include/net/switchdev.h | 1 + >>> net/bridge/br_stp.c | 3 +- >>> net/switchdev/switchdev.c | 107 >>> -- >>> 3 files changed, 59 insertions(+), 52 deletions(-) >>> >>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h >>> index d2879f2..6b109e4 100644 >>> --- a/include/net/switchdev.h >>> +++ b/include/net/switchdev.h >>> @@ -17,6 +17,7 @@ >>> >>> #define SWITCHDEV_F_NO_RECURSE BIT(0) >>> #define SWITCHDEV_F_SKIP_EOPNOTSUPPBIT(1) >>> +#define SWITCHDEV_F_DEFER BIT(2) >>> >>> struct switchdev_trans_item { >>> struct list_head list; >>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c >>> index db6d243de..80c34d7 100644 >>> --- a/net/bridge/br_stp.c >>> +++ b/net/bridge/br_stp.c >>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, >>> unsigned int state) >>> { >>> struct switchdev_attr attr = { >>> .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE, >>> + .flags = SWITCHDEV_F_DEFER, >>> .u.stp_state = state, >>> }; >>> int err; >>> >>> p->state = state; >>> err = switchdev_port_attr_set(p->dev, ); >>> - if (err && err != -EOPNOTSUPP) >>> + if (err) >> >> This looks like a problem as now all other non-switchdev ports will >> get an WARN in the log when STP state changes. We should only WARN if >> there was an err and the err is not -EOPNOTSUPP. > > If SWITCHDEV_F_DEFER flag is set, there's only 0 of -ENOMEM. > > >> >>> br_warn(p->br, "error setting offload STP state on port >>> %u(%s)\n", >>> (unsigned int) p->port_no, >>> p->dev->name); >>> } >> >> >> >>> struct switchdev_attr_set_work { >>> struct work_struct work; >>> struct net_device *dev; >>> @@ -183,14 +226,17 @@ static void switchdev_port_attr_set_work(struct >>> work_struct *work) >>> { >>> struct switchdev_attr_set_work *asw = >>> container_of(work, struct switchdev_attr_set_work, >>> work); >>> + bool rtnl_locked = rtnl_is_locked(); >>> int err; >>> >>> - rtnl_lock(); >>> - err = switchdev_port_attr_set(asw->dev, >attr); >>> + if (!rtnl_locked) >>> + rtnl_lock(); >> >> I'm not following this change. If someone else has rtnl_lock, we'll >> not wait to grab it here ourselves, and proceed as if we have the >> lock. But what if that someone else releases the lock in the middle >> of us doing switchdev_port_attr_set_now? Seems we want to >> unconditionally wait and grab the lock. We need to block anything >>from moving while we do the attr set. > > Why would someone we call (driver) return the lock? In that case, he is > buggy and should be fixed. > > This hunk only ensures we have rtnl_lock. If not, we take it here. We do > not take it unconditionally because we may already have it, for example > if caller of switchdev_flush_deferred holds rtnl_lock. > This is where you lost me. How do you know another core doesn't happen to have the lock when you hit this code path? Maybe someone is running an ethtool command on another core or something. >>> >>> You are right. The same problem exists currently in switchdev_port_attr_set. >> >>You are right as in you'll change this back to unconditional grabbing >>of rtnl_lock? I don't follow how this problem currently exists as >>current code does an unconditional grab of rtnl_lock. > > cpu1 cpu2 > rtnl_lock() > switchdev_port_attr_set > !rtnl_is_locked() == false > switchdev_trans_init >
Re: [PATCH net-next] bridge: fix gc_timer mod/del race condition
d8a1 ("bridge: push bridge setting ageing_time down to > switchdev") > Signed-off-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com> > --- > net/bridge/br_sysfs_br.c | 10 +- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c > index 04ef1926ee7e..8365bd53c421 100644 > --- a/net/bridge/br_sysfs_br.c > +++ b/net/bridge/br_sysfs_br.c > @@ -102,7 +102,15 @@ static ssize_t ageing_time_show(struct device *d, > > static int set_ageing_time(struct net_bridge *br, unsigned long val) > { > - return br_set_ageing_time(br, val); > + int ret; > + > + if (!rtnl_trylock()) > + return restart_syscall(); > + > + ret = br_set_ageing_time(br, val); > + rtnl_unlock(); > + > + return ret; > } Looks good, thanks Nikolay. The other option would have been to simply not restart gc_timer in br_set_ageing_time(), but this would make changes to ageing_time "sluggish" in that it might take a while for the fdb cleanup algo to use the new ageing_time value. By kicking the timer when the ageing_time changes, fdb cleanup immediately switches to the new schedule. In any case, Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
On Mon, Oct 12, 2015 at 10:36 AM, Vivien Didelotwrote: > Hi guys, > > On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote: >> From: Nikolay Aleksandrov >> >> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges. >> >> Signed-off-by: Nikolay Aleksandrov >> --- >> net/switchdev/switchdev.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c >> index 6e4a4f9ad927..256c596de896 100644 >> --- a/net/switchdev/switchdev.c >> +++ b/net/switchdev/switchdev.c >> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device >> *dev, >> if (vlan.vid_begin) >> return -EINVAL; >> vlan.vid_begin = vinfo->vid; >> + /* don't allow range of pvids */ >> + if (vlan.flags & BRIDGE_VLAN_INFO_PVID) >> + return -EINVAL; >> } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) { >> if (!vlan.vid_begin) >> return -EINVAL; >> -- >> 2.4.3 >> > > Yes the patch looks good, but it is a minor check though. I hope the > subject of this thread is making sense. > > VLAN ranges seem to have been included for an UX purpose (so commands > look like Cisco IOS). We don't want to change any existing interface, so > we pushed that down to drivers, with the only valid reason that, maybe > one day, an hardware can be capable of programming a range on a per-port > basis. > > So what happens is that we'll add some code to fix and check non-sense > (e.g. range + PVID) in switchdev, bridge, and I'm sure we are missing > other spots. > > Sorry for being insistent, but this still doesn't look right to me. > > It seems like we are bloating bridge, switchdev and drivers for the only > reason to maintain a kernel support for something like: > > # for i in $(seq 100 4000); do bridge vlan add vid $i dev swp0; done [Adding Roopa] (Roopa, correct me if I'm wrong). Ref: commit # bdced7ef IIRC, vlan ranges were introduced to reduce the volume of netlink traffic when doing a command like your's above. Your command would create 3901 netlink msgs with a single IFLA_BRIDGE_VLAN_INFO, sent to the kernel and then echoed back to user-space. With vlan ranges, the cmd "bridge vlan add vid 100-4000 dev swp0" would result in one netlink msg with one IFLA_BRIDGE_VLAN_INFO. Seems there were problems with a getlink dump over-running the netlink msg (not using NLM_MULTI)? Maybe Roopa can expand on the problem that was solved. So the bridge command, and everything below it, has this feature. We can't undo it...it's in the wild, so to speak And, as mentioned before, it's and all-or-nothing command, and the underlying switchdev or driver code needs to deal with overlapping vlan maps. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred
On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirkowrote: > From: Jiri Pirko > > Caller should know if he can call attr_set directly (when holding RTNL) > or if he has to defer the att_set processing for later. > > This also allows drivers to sleep inside attr_set and report operation > status back to switchdev core. Switchdev core then warns if status is > not ok, instead of silent errors happening in drivers. > > Signed-off-by: Jiri Pirko > --- > include/net/switchdev.h | 1 + > net/bridge/br_stp.c | 3 +- > net/switchdev/switchdev.c | 107 > -- > 3 files changed, 59 insertions(+), 52 deletions(-) > > diff --git a/include/net/switchdev.h b/include/net/switchdev.h > index d2879f2..6b109e4 100644 > --- a/include/net/switchdev.h > +++ b/include/net/switchdev.h > @@ -17,6 +17,7 @@ > > #define SWITCHDEV_F_NO_RECURSE BIT(0) > #define SWITCHDEV_F_SKIP_EOPNOTSUPPBIT(1) > +#define SWITCHDEV_F_DEFER BIT(2) > > struct switchdev_trans_item { > struct list_head list; > diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c > index db6d243de..80c34d7 100644 > --- a/net/bridge/br_stp.c > +++ b/net/bridge/br_stp.c > @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int > state) > { > struct switchdev_attr attr = { > .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE, > + .flags = SWITCHDEV_F_DEFER, > .u.stp_state = state, > }; > int err; > > p->state = state; > err = switchdev_port_attr_set(p->dev, ); > - if (err && err != -EOPNOTSUPP) > + if (err) This looks like a problem as now all other non-switchdev ports will get an WARN in the log when STP state changes. We should only WARN if there was an err and the err is not -EOPNOTSUPP. > br_warn(p->br, "error setting offload STP state on port > %u(%s)\n", > (unsigned int) p->port_no, p->dev->name); > } > struct switchdev_attr_set_work { > struct work_struct work; > struct net_device *dev; > @@ -183,14 +226,17 @@ static void switchdev_port_attr_set_work(struct > work_struct *work) > { > struct switchdev_attr_set_work *asw = > container_of(work, struct switchdev_attr_set_work, work); > + bool rtnl_locked = rtnl_is_locked(); > int err; > > - rtnl_lock(); > - err = switchdev_port_attr_set(asw->dev, >attr); > + if (!rtnl_locked) > + rtnl_lock(); I'm not following this change. If someone else has rtnl_lock, we'll not wait to grab it here ourselves, and proceed as if we have the lock. But what if that someone else releases the lock in the middle of us doing switchdev_port_attr_set_now? Seems we want to unconditionally wait and grab the lock. We need to block anything from moving while we do the attr set. > + err = switchdev_port_attr_set_now(asw->dev, >attr); > if (err && err != -EOPNOTSUPP) > netdev_err(asw->dev, "failed (err=%d) to set attribute > (id=%d)\n", >err, asw->attr.id); > - rtnl_unlock(); > + if (!rtnl_locked) > + rtnl_unlock(); > > dev_put(asw->dev); > kfree(work); > @@ -211,7 +257,7 @@ static int switchdev_port_attr_set_defer(struct > net_device *dev, > asw->dev = dev; > memcpy(>attr, attr, sizeof(asw->attr)); > > - schedule_work(>work); > + queue_work(switchdev_wq, >work); > > return 0; > } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v4 1/7] switchdev: introduce switchdev workqueue
On Mon, Oct 12, 2015 at 10:54 AM, Jiri Pirko <j...@resnulli.us> wrote: > From: Jiri Pirko <j...@mellanox.com> > > This is going to be used for deferred operations. > > Signed-off-by: Jiri Pirko <j...@mellanox.com> Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v4 3/7] switchdev: remove pointers from switchdev objects
On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <j...@resnulli.us> wrote: > From: Jiri Pirko <j...@mellanox.com> > > When object is used in deferred work, we cannot use pointers in > switchdev object structures because the memory they point at may be already > used by someone else. So rather do local copy of the value. > > Signed-off-by: Jiri Pirko <j...@mellanox.com> Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v4 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn
On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirkowrote: > From: Jiri Pirko > > Since spinlock is held here, defer the switchdev operation. > > Signed-off-by: Jiri Pirko > --- > net/bridge/br_fdb.c | 5 - > net/bridge/br_if.c | 3 +++ > 2 files changed, 7 insertions(+), 1 deletion(-) > > diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c > index f5e7da0..c88bd8e 100644 > --- a/net/bridge/br_fdb.c > +++ b/net/bridge/br_fdb.c > @@ -134,7 +134,10 @@ static void fdb_del_hw_addr(struct net_bridge *br, const > unsigned char *addr) > static void fdb_del_external_learn(struct net_bridge_fdb_entry *f) > { > struct switchdev_obj_port_fdb fdb = { > - .obj.id = SWITCHDEV_OBJ_ID_PORT_FDB, > + .obj = { > + .id = SWITCHDEV_OBJ_ID_PORT_FDB, > + .flags = SWITCHDEV_F_DEFER, > + }, > .vid = f->vlan_id, > }; > > diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c > index 934cae9..09147cb 100644 > --- a/net/bridge/br_if.c > +++ b/net/bridge/br_if.c > @@ -24,6 +24,7 @@ > #include > #include > #include > +#include > > #include "br_private.h" > > @@ -249,6 +250,8 @@ static void del_nbp(struct net_bridge_port *p) > list_del_rcu(>list); > > br_fdb_delete_by_port(br, p, 0, 1); > + switchdev_flush_deferred(); > + This potentially flushes other (valid) work on the deferred queue not related to FDB del. I wonder if this flush step is necessary at all? The work we deferred to delete the FDB entry can still happen after the port has been removed (del_nbp). If the port driver/device find the FDB entry, then delete it, otherwise ignore it. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net] switchdev: check if the vlan id is in the proper vlan range
On Mon, Oct 12, 2015 at 5:31 AM, Nikolay Aleksandrov <ra...@blackwall.org> wrote: > From: Nikolay Aleksandrov <niko...@cumulusnetworks.com> > > VLANs 0 and 4095 are reserved and shouldn't be used, add checks to > switchdev similar to the bridge. Also make sure ids above 4095 cannot > be passed either. > > Fixes: 47f8328bb1a4 ("switchdev: add new switchdev bridge setlink") > Signed-off-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com> Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v4 4/7] switchdev: introduce possibility to defer obj_add/del
On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirkowrote: > From: Jiri Pirko > > Similar to the attr usecase, the caller knows if he is holding RTNL and is > in atomic section. So let the called to decide the correct call variant. > > This allows drivers to sleep inside their ops and wait for hw to get the > operation status. Then the status is propagated into switchdev core. > This avoids silent errors in drivers. > > Signed-off-by: Jiri Pirko > +static void switchdev_port_obj_work(struct work_struct *work) > +{ > + struct switchdev_obj_work *ow = > + container_of(work, struct switchdev_obj_work, work); > + bool rtnl_locked = rtnl_is_locked(); > + int err; > + > + if (!rtnl_locked) > + rtnl_lock(); Same comment as on patch 2/7 about not unconditionally grabbing rtnl_lock. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v4 6/7] rocker: remove nowait from switchdev callbacks.
On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirkowrote: > From: Jiri Pirko > > No need to avoid sleeping in switchdev callbacks now, as the switchdev > core allows it. > > Signed-off-by: Jiri Pirko > --- > drivers/net/ethernet/rocker/rocker.c | 7 +++ > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ethernet/rocker/rocker.c > b/drivers/net/ethernet/rocker/rocker.c > index bb956a5..9629c5b5 100644 > --- a/drivers/net/ethernet/rocker/rocker.c > +++ b/drivers/net/ethernet/rocker/rocker.c > @@ -3672,7 +3672,7 @@ static int rocker_port_fdb_flush(struct rocker_port > *rocker_port, > rocker_port->stp_state == BR_STATE_FORWARDING) > return 0; > > - flags |= ROCKER_OP_FLAG_REMOVE; > + flags |= ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_REMOVE; I understand the two changes below where you're removing NOWAIT, but here you're adding NOWAIT which I'm not sure how that is related to the switchdev core changes. Is this two patches? > spin_lock_irqsave(>fdb_tbl_lock, lock_flags); > > @@ -4382,8 +4382,7 @@ static int rocker_port_attr_set(struct net_device *dev, > > switch (attr->id) { > case SWITCHDEV_ATTR_ID_PORT_STP_STATE: > - err = rocker_port_stp_update(rocker_port, trans, > -ROCKER_OP_FLAG_NOWAIT, > + err = rocker_port_stp_update(rocker_port, trans, 0, > attr->u.stp_state); > break; > case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS: > @@ -4517,7 +4516,7 @@ static int rocker_port_fdb_del(struct rocker_port > *rocker_port, >const struct switchdev_obj_port_fdb *fdb) > { > __be16 vlan_id = rocker_port_vid_to_vlan(rocker_port, fdb->vid, NULL); > - int flags = ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_REMOVE; > + int flags = ROCKER_OP_FLAG_REMOVE; > > if (!rocker_port_is_bridged(rocker_port)) > return -EINVAL; > -- > 1.9.3 > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v4 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn
On Mon, Oct 12, 2015 at 8:31 PM, John Fastabend <john.fastab...@gmail.com> wrote: > On 15-10-12 08:28 PM, Scott Feldman wrote: >> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <j...@resnulli.us> wrote: >>> From: Jiri Pirko <j...@mellanox.com> >>> >>> Since spinlock is held here, defer the switchdev operation. >>> >>> Signed-off-by: Jiri Pirko <j...@mellanox.com> >>> --- >>> net/bridge/br_fdb.c | 5 - >>> net/bridge/br_if.c | 3 +++ >>> 2 files changed, 7 insertions(+), 1 deletion(-) >>> >>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c >>> index f5e7da0..c88bd8e 100644 >>> --- a/net/bridge/br_fdb.c >>> +++ b/net/bridge/br_fdb.c >>> @@ -134,7 +134,10 @@ static void fdb_del_hw_addr(struct net_bridge *br, >>> const unsigned char *addr) >>> static void fdb_del_external_learn(struct net_bridge_fdb_entry *f) >>> { >>> struct switchdev_obj_port_fdb fdb = { >>> - .obj.id = SWITCHDEV_OBJ_ID_PORT_FDB, >>> + .obj = { >>> + .id = SWITCHDEV_OBJ_ID_PORT_FDB, >>> + .flags = SWITCHDEV_F_DEFER, >>> + }, >>> .vid = f->vlan_id, >>> }; >>> >>> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c >>> index 934cae9..09147cb 100644 >>> --- a/net/bridge/br_if.c >>> +++ b/net/bridge/br_if.c >>> @@ -24,6 +24,7 @@ >>> #include >>> #include >>> #include >>> +#include >>> >>> #include "br_private.h" >>> >>> @@ -249,6 +250,8 @@ static void del_nbp(struct net_bridge_port *p) >>> list_del_rcu(>list); >>> >>> br_fdb_delete_by_port(br, p, 0, 1); >>> + switchdev_flush_deferred(); >>> + >> >> This potentially flushes other (valid) work on the deferred queue not >> related to FDB del. >> >> I wonder if this flush step is necessary at all? The work we deferred >> to delete the FDB entry can still happen after the port has been >> removed (del_nbp). If the port driver/device find the FDB entry, then >> delete it, otherwise ignore it. >> > > Just the first thing that springs to mind reading this comment is, > > - del gets deffered > - add fdb > - del runs > > Is there an issue here? Sorry I'll do a more thorough review now just > thought I would toss it out there before I forget. It's a valid thought to consider, for sure. The context is these are only FDB entries added by an external learn event. So I believe in your sequence, the second step to add fdb entry wouldn't happen as the fdb entry already exists at that point (in other words, the entry has already been learned on external device and pushed up via notifier to bridge). So I think we're OK in regards to your question. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next v3 3/4] bridge: push bridge setting ageing_time down to switchdev
On Sat, Oct 10, 2015 at 8:56 AM, Vivien Didelotwrote: > Scott, didn't you have a plan to add a struct device for the parent of > switchdev ports? I had sent out a rough RFC for a switch device in the last window. I have continued working on it, and I plan to send it very soon, probably again as an RFC. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next v3 3/4] bridge: push bridge setting ageing_time down to switchdev
On Sat, Oct 10, 2015 at 12:04 AM, Jiri Pirko <j...@resnulli.us> wrote: > Sat, Oct 10, 2015 at 04:53:52AM CEST, sfel...@gmail.com wrote: >>On Thu, Oct 8, 2015 at 9:38 PM, Premkumar Jonnala <pjonn...@broadcom.com> >>wrote: >>> >>> >>>> -Original Message- >>>> From: sfel...@gmail.com [mailto:sfel...@gmail.com] >>>> Sent: Friday, October 09, 2015 7:53 AM >>>> To: netdev@vger.kernel.org >>>> Cc: da...@davemloft.net; j...@resnulli.us; siva.mannem@gmail.com; >>>> Premkumar Jonnala; step...@networkplumber.org; >>>> ro...@cumulusnetworks.com; and...@lunn.ch; f.faine...@gmail.com; >>>> vivien.dide...@savoirfairelinux.com >>>> Subject: [PATCH net-next v3 3/4] bridge: push bridge setting ageing_time >>>> down >>>> to switchdev >>>> >>>> From: Scott Feldman <sfel...@gmail.com> >>>> >>>> Use SWITCHDEV_F_SKIP_EOPNOTSUPP to skip over ports in bridge that don't >>>> support setting ageing_time (or setting bridge attrs in general). >>>> >>>> If push fails, don't update ageing_time in bridge and return err to user. >>>> >>>> If push succeeds, update ageing_time in bridge and run gc_timer now to >>>> recalabrate when to run gc_timer next, based on new ageing_time. >>>> >>>> Signed-off-by: Scott Feldman <sfel...@gmail.com> >>>> Signed-off-by: Jiri Pirko <j...@resnulli.us> >> >> >> >>>> +int br_set_ageing_time(struct net_bridge *br, u32 ageing_time) >>>> +{ >>>> + struct switchdev_attr attr = { >>>> + .id = SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME, >>>> + .flags = SWITCHDEV_F_SKIP_EOPNOTSUPP, >>>> + .u.ageing_time = ageing_time, >>>> + }; >>>> + unsigned long t = clock_t_to_jiffies(ageing_time); >>>> + int err; >>>> + >>>> + if (t < BR_MIN_AGEING_TIME || t > BR_MAX_AGEING_TIME) >>>> + return -ERANGE; >>>> + >>>> + err = switchdev_port_attr_set(br->dev, ); >>> >>> A thought - given that the ageing time is not a per-bridge-port attr, why >>> are we using a "port based api" >>> to pass the attribute down? May be I'm missing something here? >> >>I think Florian raised the same point earlier. Sigh, I think this >>should be addressedv4 coming soon...thanks guys for keeping the >>standard high. > > Scott, can you tell us how do you want to address this? I like the > current implementation. I like it also, but there is some awkwardness in calling switchdev_port_attr_set() with the first argument being the bridge netdev, not a port netdev. But, the basic algorithm to recurse from _this_ netdev down to its lower netdevs works great in this case; it's just the name "switchdev_port_attr_set" implies a port netdev for top-level netdev. So I was thinking about adding another call, something like "switchdev_master_attr_set", which basically just does the same thing as switchdev_port_attr_set, except maybe skips the check to call the ops->switchdev_port_attr_add on the top-level netdev. But now I don't like that idea so much as "master" would be confusing when your passing a bond netdev (which is also a master), but the bond _is_ the port netdev this time, a port on the bridge. So let's scrap v4 and go with v3. I think I can live with this naming awkwardness, given that we got something for essentially free by using switchdev_port_attr_set() in a new way. Davem, I think we're OK going with v3. (There is a follow-on discussion about a switch device, which we'll continue but it shouldn't block this v3 version). -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: switchdev and VLAN ranges
On Sun, Oct 11, 2015 at 5:13 PM, Nikolay Aleksandrovwrote: > On 10/12/2015 12:41 AM, Vivien Didelot wrote: >> On Oct. Sunday 11 (41) 09:12 AM, Jiri Pirko wrote: >>> Sat, Oct 10, 2015 at 12:36:26PM CEST, niko...@cumulusnetworks.com wrote: On 10/10/2015 09:49 AM, Elad Raz wrote: > >> On Oct 10, 2015, at 2:30 AM, Vivien Didelot >> wrote: >> >> I have two concerns in mind: >> >> a) if we imagine that drivers like Rocker allocate memory in the prepare >> phase for each VID, preparing a range like 100-4000 would definitely not >> be recommended. >> >> b) imagine that you have two Linux bridges on a switch, one using the >> hardware VLAN 100. If you request the VLAN range 99-101 for the other >> bridge members, it is not possible for the driver to say "I can >> accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for >> the whole range. > > Another concern I have with vid_being..vid_end range is the “flags”. > Where flags can be BRIDGE_VLAN_INFO_PVID. > There is no sense having more than one VLAN as a PVID. > This leave the HW vendor the choice which VLAN id they will use as the > PVID. > iproute2 doesn't allow to do it but I can see that someone can actually make it so the flags for the range have it and it doesn't look correct. Perhaps we need something like the patch below to enforce this from kernel-side. diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index d78b4429505a..02b17b53e9a6 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -524,6 +524,9 @@ static int br_afspec(struct net_bridge *br, if (vinfo_start) return -EINVAL; vinfo_start = vinfo; + /* don't allow range of pvids */ + if (vinfo_start->flags & BRIDGE_VLAN_INFO_PVID) + return -EINVAL; continue; } >>> >>> Looks correct to me. Could you please submit this properly? Thanks! >> >> The above patch is correct, but we only solve part of the problem, since >> the range and bridge flags are exposed by switchdev_obj_port_vlan as is. >> >> Thanks, >> -v >> > > Yes, the above fixes the bridge side. About the switchdev side it seems like > it's > up to the switchdev driver to do the right thing in its switchdev_ops. I took > a > quick look at DSA and it seems correct, the flag isn't saved and on dump > request > the flags are generated so it shouldn't be possible to export multiple pvids. > But switchdev_port_br_afspec() seems problematic, in fact I don't even see a > vlan > id check, i.e. ==0 || >= VLAN_N_MASK. > Of course, I might be totally off point as I'm not that familiar with > switchdev and > it's very late. :-) > But maybe it needs something like: > > diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c > index 6e4a4f9ad927..3dd52a53867f 100644 > --- a/net/switchdev/switchdev.c > +++ b/net/switchdev/switchdev.c > @@ -16,6 +16,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -716,10 +717,14 @@ static int switchdev_port_br_afspec(struct net_device > *dev, > return -EINVAL; > vinfo = nla_data(attr); > vlan.flags = vinfo->flags; > + if (!vinfo->vid || vinfo->vid >= VLAN_VID_MASK) > +return -EINVAL; > if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) { > if (vlan.vid_begin) > return -EINVAL; > vlan.vid_begin = vinfo->vid; > + if (vlan.flags & BRIDGE_VLAN_INFO_PVID) > + return -EINVAL; > } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) { > if (!vlan.vid_begin) > return -EINVAL; This (and you other patch) seem right to me, if we're going to block setting PVID when specifying a vlan range. Would you mind combining and resending both patches as one as a proper patch? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next] bridge: try switchdev op first in __vlan_vid_add/del
On Sat, Oct 10, 2015 at 9:03 AM, Vivien Didelot <vivien.dide...@savoirfairelinux.com> wrote: > On Oct. Friday 09 (41) 08:20 PM, Scott Feldman wrote: >> On Fri, Oct 9, 2015 at 3:44 PM, Vivien Didelot >> <vivien.dide...@savoirfairelinux.com> wrote: >> > Hi Jiri, >> > >> > On Oct. Friday 09 (41) 01:54 PM, Jiri Pirko wrote: >> >> From: Jiri Pirko <j...@mellanox.com> >> >> >> >> Some drivers need to implement both switchdev vlan ops and >> >> vid_add/kill ndos. For that to work in bridge code, we need to try >> >> switchdev op first when adding/deleting vlan id. >> > >> > Just curious, when would a driver need to have both operations? >> >> Ya, I was kind of curious of that myself. Is this because the driver >> wants to support standalone VLANs using 802.1q module and vconfig, as >> well as bridge vlans? With the vlan support built into the bridge, >> I've been working under the assumption that 802.1q module (and >> vconfig) aren't needed, and vlans for a bridged and non-bridge port >> can be managed using the "bridge" iproute2 cmd. >> >> > I kinda have the same question regarding ndo_fdb_{add,del} and the >> > bridge_{get,set}link equivalent, which is a bit confusing to me. >> >> I had to look back at my commit 7f109539 to remind myself about the >> vid_add/kill ndos and bridge_{get,set}link usage. Maybe that >> write-up helps? I'm not following you on the ndo_fdb_add/del part of >> your question. > > Sorry I wasn't clear. What is confusing to me for FDB ops is that we > define net_device_ops like: > > .ndo_fdb_add= switchdev_port_fdb_add, > .ndo_fdb_del= switchdev_port_fdb_del, > .ndo_fdb_dump = switchdev_port_fdb_dump, > > But if I'm not mistaken, "bridge fdb" commands use the > .ndo_bridge_{get,set,del}link ops, isn't it? No, the "bridge fdb" cmds use the .ndo_fdb_xxx ops. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next RFC 3/3] switchdev: introduce deferred variants of obj_add/del helpers
On Thu, Oct 8, 2015 at 6:25 AM, Jiri Pirkowrote: > Thu, Oct 08, 2015 at 03:21:44PM CEST, gerlitz...@gmail.com wrote: >>On Thu, Oct 8, 2015 at 4:09 PM, Jiri Pirko wrote: >>> Thu, Oct 08, 2015 at 10:28:58AM CEST, j...@resnulli.us wrote: Thu, Oct 08, 2015 at 08:45:58AM CEST, gerlitz...@gmail.com wrote: >On Wed, Oct 7, 2015 at 9:30 PM, Jiri Pirko wrote: >> >This introduced a regression to the 2-phase commit scheme, since the >prepare commit can fail >and that would go un-noticed toward the upper layer, agree? >> Well, no. This still does the transaction for all lower devices in one go. No change in that. >> >>> Now I get it, yes you are right. But currently there is no code in >>> kernel which would control retval of deferred attr_set or obj_add/del >> >>I am not sure to understand your reply. You are saying that when the deferred >>procedures complete (e.g fail in the prepare phase) they can't actually let >>the upper layer to realize that this change isn't possible? this is >>exactly the bug. > > Correct. But check the code. Callers of current deferred variants do > not care about the retval. Therefore this is not a regression. > > It makes sense in my opinion. If you are a called and you explicitly say to > defer the operation, you cannot expect retval. Makes sense to me also, FWIW. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v2 00/13] rocker: add support for multiple worlds
On Fri, Oct 9, 2015 at 7:36 AM, Jiri Pirkowrote: > Wed, Oct 07, 2015 at 07:39:56PM CEST, j...@resnulli.us wrote: >>Wed, Oct 07, 2015 at 06:53:22PM CEST, sfel...@gmail.com wrote: >>>On Tue, Oct 6, 2015 at 11:03 PM, Jiri Pirko wrote: Tue, Oct 06, 2015 at 07:14:39PM CEST, sfel...@gmail.com wrote: >On Tue, Oct 6, 2015 at 12:30 AM, Jiri Pirko wrote: >> Tue, Oct 06, 2015 at 05:56:12AM CEST, sfel...@gmail.com wrote: >>>On Mon, Oct 5, 2015 at 10:43 AM, Jiri Pirko wrote: From: Jiri Pirko This patchset allows new rocker worlds to be easily added in future (like eBPF based one I have been working on). The main part of the patchset is the OF-DPA carve-out. It resuts in OF-DPA specific file. Clean cut. v1->v2: - rtnl rocker mode change userspace expose patch was removed Jiri Pirko (13): rocker: remove unused rocker_port param from alloc funcs and shorten their names rocker: rename rocker.h to rocker_hw.h rocker: rename rocker.c to rocker_main.c rocker: push tlv processing into separate files rocker: implement set settings mode command rocker: introduce worlds infrastructure rocker: introduce OF-DPA world skeleton rocker: set default world on port probe and clean world on remove rocker: pass "learning" value as a parameter to rocker_port_set_learning rocker: pre-allocate wait structures during cmd ring init rocker: remove trans parameter to rocker_cmd_exec function rocker: call rocker_cmd_exec function with "nowait" boolean instead of flags rocker: move OF-DPA stuff into separate file >>> >>>A couple of my tests are failing with this patchset. A simple port >>>test is failing and IPv4 routing test is failing. >>> >>>The port test is simple: just connect a port on DUT to a port on >>>another system and assign an IP address to each port and verify IP >>>connectivity. I have this: >>> >>> DUT:sw1p1 (11.0.0.1/24) <---> host1:eth0 (11.0.0.2/24) >>> >>>The IPv4 routing tests is a bit more complicated to setup. I'm using >>>OSPF, but I'm not seeing full routes formed in the topology, so I >>>suspect OSPF hellos aren't getting thru. >>> >>>Please fix find/fix these issues and send v3. I don't want any git >>>bisect issues when running tests. Thanks. >> >> I fixed that. Sending v3 in a sec. Thanks. > >Sorry, both tests are still broken. Would you send me your tests >scripts so I can see why your tests are passing? I'm trying some smoke tests including bridge setup and just ip-ip setup by hand. Meybe if you send me your scripts, I can run it locally. >>> >>>My test scripts are already included in the qemu tree. >> >>Okay, will rework and use your scripts. Hope I will find some time >>during this weekend. > > Scott, could you try to test with current net-next? > I'm trying basic: > DUT:sw1p1 (11.0.0.1/24) <---> host1:eth0 (11.0.0.2/24) > and it does not work for me now. It worked previously when I tested with > my patchset. This is getting odd. I had just re-run the tests against net-next before submitting the ageing_time patchset and everything passes. Are you using a namespace or a VM for host1? Either one should work. This would be a bad test, as the kernel will loop the traffic and the offload device will not see it: DUT:sw1p1 (11.0.0.1/24) <>DUT:sw1p2(11.0.0.2/24) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next RFC 2/3] switchdev: allow caller to explicitly use deferred attr_set version
On Thu, Oct 8, 2015 at 11:46 PM, Jiri Pirkowrote: > Fri, Oct 09, 2015 at 06:39:41AM CEST, sfel...@gmail.com wrote: >>On Thu, Oct 8, 2015 at 1:26 AM, Jiri Pirko wrote: >>> Thu, Oct 08, 2015 at 08:03:35AM CEST, sfel...@gmail.com wrote: On Wed, Oct 7, 2015 at 10:39 PM, Jiri Pirko wrote: > Thu, Oct 08, 2015 at 06:27:07AM CEST, sfel...@gmail.com wrote: >>On Wed, Oct 7, 2015 at 11:30 AM, Jiri Pirko wrote: >>> From: Jiri Pirko >>> >>> Caller should know if he can call attr_set directly (when holding RTNL) >>> or if he has to use deferred version of this function. >>> >>> This also allows drivers to sleep inside attr_set and report operation >>> status back to switchdev core. Switchdev core then warns if status is >>> not ok, instead of silent errors happening in drivers. >>> >>> Signed-off-by: Jiri Pirko >>> --- >>> include/net/switchdev.h | 2 + >>> net/bridge/br_stp.c | 4 +- >>> net/switchdev/switchdev.c | 113 >>> +- >>> 3 files changed, 65 insertions(+), 54 deletions(-) >>> >>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h >>> index 89266a3..320be44 100644 >>> --- a/include/net/switchdev.h >>> +++ b/include/net/switchdev.h >>> @@ -168,6 +168,8 @@ int switchdev_port_attr_get(struct net_device *dev, >>> struct switchdev_attr *attr); >>> int switchdev_port_attr_set(struct net_device *dev, >>> struct switchdev_attr *attr); >>> +int switchdev_port_attr_set_deferred(struct net_device *dev, >>> +struct switchdev_attr *attr); >> >>Rather than adding another op, use attr->flags and define: >> >>#define SWITCHDEV_F_DEFERRED BIT(x) >> >>So we get: >> >>void br_set_state(struct net_bridge_port *p, unsigned int state) >>{ >>struct switchdev_attr attr = { >>.id = SWITCHDEV_ATTR_ID_PORT_STP_STATE, >>+ .flags = SWITCHDEV_F_DEFERRED, >>.u.stp_state = state, >>}; >>int err; >> >>p->state = state; >>err = switchdev_port_attr_set(p->dev, ); >>if (err && err != -EOPNOTSUPP) >>br_warn(p->br, "error setting offload STP state on >>port %u(%s)\n", >>(unsigned int) p->port_no, >>p->dev->name); >>} >> >>(And add obj->flags to do the same). > > That's what I wanted to avoid. Also because the obj is const and for > call from work, this flag would have to be removed. What did you want to avoid? >>> >>> Having this as a flag. I don't like it too much. >>> But that is cosmetics. Other than that, does the patchset make sense? >>> Do you see some possible issues? >> >>patch 1/3 makes sense, I tested it out and no issues. (Looks like >>there are other places to assert rtnl_lock, are you going to add >>those?) > > Sure, can you pinpoint the places? Isn't every place we use netdev_for_each_lower_dev, like you mentioned in 1/3 patch? >>patch 2/3: Rather than trying to guess the call context in the core, >>make the caller call the right variant for its context. That part is >>good. On the flag vs. no flags, the reasons why I want this as a flag >>are: >> >>a) I want to keep the switchdev ops set to the core set: get/set attr >>and add/del/dump objs. I've pushed back on changing this before. I >>don't want ops explosion (like netdev_ops), and I'd like to avoid the >>1000-line patch when the arg list in an op changes, and we need to >>update N drivers. The flags lets the caller modify the algo behavior, >>while keeping the core call (and args) fixed. >> >>b) the caller can combine flags, where it makes sense. For example, >>maybe I'm in a locked context and I don't want to recurse the device >>tree, so I would make the call with NO_RECURSE | DEFERRED. If we >>didn't use flags, then we need to supply ops for each variant on the >>call, and then things explode. > > Fair enough. I'll process this in. Actually, I realized later that my reply here was only half true. Part b) to combine flags for various calling situation is good. Part a) is bogus because I confused adding a new op or adding a new wrapper to call existing op. You did the latter; but I was complaining about the former. Sorry about that. Regardless, port b) I think justifies using flags. > >> >>patch 3/3 I haven't looked at yet...I'm stuck on 2/3. > > It is very similar to 2/3, only for obj_add/del. Do we have examples of a deferred obj add or del? Maybe we should hold off adding that support until someone finds a use-case. I'm kind of hoping there isn't a use-case, but who knows? -- To unsubscribe
Re: [PATCH net-next v3 3/4] bridge: push bridge setting ageing_time down to switchdev
On Thu, Oct 8, 2015 at 9:38 PM, Premkumar Jonnala <pjonn...@broadcom.com> wrote: > > >> -Original Message- >> From: sfel...@gmail.com [mailto:sfel...@gmail.com] >> Sent: Friday, October 09, 2015 7:53 AM >> To: netdev@vger.kernel.org >> Cc: da...@davemloft.net; j...@resnulli.us; siva.mannem@gmail.com; >> Premkumar Jonnala; step...@networkplumber.org; >> ro...@cumulusnetworks.com; and...@lunn.ch; f.faine...@gmail.com; >> vivien.dide...@savoirfairelinux.com >> Subject: [PATCH net-next v3 3/4] bridge: push bridge setting ageing_time down >> to switchdev >> >> From: Scott Feldman <sfel...@gmail.com> >> >> Use SWITCHDEV_F_SKIP_EOPNOTSUPP to skip over ports in bridge that don't >> support setting ageing_time (or setting bridge attrs in general). >> >> If push fails, don't update ageing_time in bridge and return err to user. >> >> If push succeeds, update ageing_time in bridge and run gc_timer now to >> recalabrate when to run gc_timer next, based on new ageing_time. >> >> Signed-off-by: Scott Feldman <sfel...@gmail.com> >> Signed-off-by: Jiri Pirko <j...@resnulli.us> >> +int br_set_ageing_time(struct net_bridge *br, u32 ageing_time) >> +{ >> + struct switchdev_attr attr = { >> + .id = SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME, >> + .flags = SWITCHDEV_F_SKIP_EOPNOTSUPP, >> + .u.ageing_time = ageing_time, >> + }; >> + unsigned long t = clock_t_to_jiffies(ageing_time); >> + int err; >> + >> + if (t < BR_MIN_AGEING_TIME || t > BR_MAX_AGEING_TIME) >> + return -ERANGE; >> + >> + err = switchdev_port_attr_set(br->dev, ); > > A thought - given that the ageing time is not a per-bridge-port attr, why are > we using a "port based api" > to pass the attribute down? May be I'm missing something here? I think Florian raised the same point earlier. Sigh, I think this should be addressedv4 coming soon...thanks guys for keeping the standard high. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next] bridge: try switchdev op first in __vlan_vid_add/del
On Fri, Oct 9, 2015 at 4:54 AM, Jiri Pirko <j...@resnulli.us> wrote: > From: Jiri Pirko <j...@mellanox.com> > > Some drivers need to implement both switchdev vlan ops and > vid_add/kill ndos. For that to work in bridge code, we need to try > switchdev op first when adding/deleting vlan id. > > Signed-off-by: Jiri Pirko <j...@mellanox.com> > Signed-off-by: Ido Schimmel <ido...@mellanox.com> Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: switchdev and VLAN ranges
On Fri, Oct 9, 2015 at 4:30 PM, Vivien Didelotwrote: > Hi All, > > I understand that specifying a VLAN range on the command line is nice > for the user, and it makes no big deal for software implementation. [Adding Roopa, since she did the original vlan range support in the kernel/iproute2] > However, AFAICT a VLAN range does not make sense at all for hardware > such as Ethernet switch chips. Am I wrong? > > I would suggest to make switchdev directly answer to a bridge request > that the operation is not supported when the user asks for a VLAN range. > > That way, we can simply use a single "vid" member in struct > switchdev_obj_port_vlan instead of "vid_begin" and "vid_end" and thus > avoid making drivers heavier with iteration loops on such range. > > I have two concerns in mind: > > a) if we imagine that drivers like Rocker allocate memory in the prepare > phase for each VID, preparing a range like 100-4000 would definitely not > be recommended. This call should be in process context so it doesn't seem to terrible for the driver to take its time to reserve/allocate resources in prepare phase, even for a vlan range. I think I'm missing your point. > b) imagine that you have two Linux bridges on a switch, one using the > hardware VLAN 100. If you request the VLAN range 99-101 for the other > bridge members, it is not possible for the driver to say "I can > accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for > the whole range. Well, it probably should return -ERANGE to indicate the range can't be added, but that's an aside. The reason why vlan ranges need to work down to the switchdev driver is, from the user's perspective, it's an all-or-nothing request from the user to add the vlan range to the device. So we need to ask the driver in the prepare phase, "can you support this range, completely?", and if yes, then commit it as a whole. The netlink response back to the user isn't equipped to describe what subset of the range was added, and what subset was not. > That's why I think that avoiding VLAN range at the switchdev level would > be a good idea. As a general rule with switchdev, we've tried to keep the user's experience the same when using {Linux} as a soft switch/router vs. using {Linux + offload device} as a hard switch/router. So if native Linux supports some operation, for example vlan ranges, then we should try to extend that to the offload model. In other words, we don't want to re-train the user when moving from soft switch to hard switch! But there are physical limitations when dealing with an offload device Anyway, with your vlan range example, we've got a case where each soft bridge has an independent vlan set, and the vlan sets between soft bridges can overlap. For the (typical) hard switch, there is one vlan set for the whole switch, and trying to overlay the soft bridges' (overlapping) vlan sets on the hard switch fails. That failure is reported to the user. We tried, but due to offload device limitations, we can't support that operation. Of course, if the vlan sets didn't overlap, then we don't have a problem. This will not be the only case where something we can do on a soft Linux switch/router can't be offloaded to some physical offload device. But I think the philosophy has been to try offload what we can, up to the point of failure. In some cases, we can mask that failure from the user by falling back to soft-switch only, but in other cases the failure will pop up right in the user's face, like in your example. One idea to help mitigate the user's confusion would be to limit the number of bridges overlaid on the device to just one. Our drivers know when ports are enslaved to bridges, so is there something we can do there to fail the enslave on a second bridge? Exercise left to the reader. If we had that, now vlan ranges work 1:1 with soft Linux because both soft bridge and device have single vlan set. Sorry for the long-winded response. -scott -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] ipv6 route: use err pointers instead of returning pointer by reference
On Thu, Oct 8, 2015 at 10:26 AM, Roopa Prabhuwrote: > From: Roopa Prabhu > > This patch makes ip6_route_info_create return err pointer instead of > returning the rt pointer by reference as suggested by Dave > > Signed-off-by: Roopa Prabhu > --- > Dave, sorry abt the delay on this one. net-next was closed when i got to it > and its been in my queue since then. > > net/ipv6/route.c | 30 -- > 1 file changed, 16 insertions(+), 14 deletions(-) > int ip6_route_add(struct fib6_config *cfg) > @@ -1980,9 +1976,12 @@ int ip6_route_add(struct fib6_config *cfg) > struct rt6_info *rt = NULL; nit: don't need to init rt since it's now set unconditionally. > int err; > > - err = ip6_route_info_create(cfg, ); > - if (err) > + rt = ip6_route_info_create(cfg); > + if (IS_ERR(rt)) { > + err = PTR_ERR(rt); > + rt = NULL; > goto out; > + } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next] bridge: try switchdev op first in __vlan_vid_add/del
On Fri, Oct 9, 2015 at 3:44 PM, Vivien Didelotwrote: > Hi Jiri, > > On Oct. Friday 09 (41) 01:54 PM, Jiri Pirko wrote: >> From: Jiri Pirko >> >> Some drivers need to implement both switchdev vlan ops and >> vid_add/kill ndos. For that to work in bridge code, we need to try >> switchdev op first when adding/deleting vlan id. > > Just curious, when would a driver need to have both operations? Ya, I was kind of curious of that myself. Is this because the driver wants to support standalone VLANs using 802.1q module and vconfig, as well as bridge vlans? With the vlan support built into the bridge, I've been working under the assumption that 802.1q module (and vconfig) aren't needed, and vlans for a bridged and non-bridge port can be managed using the "bridge" iproute2 cmd. > I kinda have the same question regarding ndo_fdb_{add,del} and the > bridge_{get,set}link equivalent, which is a bit confusing to me. I had to look back at my commit 7f109539 to remind myself about the vid_add/kill ndos and bridge_{get,set}link usage. Maybe that write-up helps? I'm not following you on the ndo_fdb_add/del part of your question. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next v3] bridge: allow adding of fdb entries pointing to the bridge device
On Thu, Oct 8, 2015 at 10:38 AM, Roopa Prabhuwrote: > From: Roopa Prabhu > > This patch enables adding of fdb entries pointing to the bridge device. > This can be used to propagate mac address of vlan interfaces > configured on top of the vlan filtering bridge. > > Before: > $bridge fdb add 44:38:39:00:27:9f dev bridge > RTNETLINK answers: Invalid argument > > After: > $bridge fdb add 44:38:39:00:27:9f dev bridge > > Signed-off-by: Roopa Prabhu > Reviewed-by: Nikolay Aleksandrov > --- > v1 - v2 : fix kbuild warnings > v2 - v3 : address review comments from Nikolay (use of br_vlan_should_use) > > net/bridge/br_fdb.c | 122 > ++- > net/bridge/br_vlan.c | 1 + > 2 files changed, 93 insertions(+), 30 deletions(-) > > diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c > index 7f7d551..f43ce05 100644 > --- a/net/bridge/br_fdb.c > +++ b/net/bridge/br_fdb.c > @@ -608,13 +608,14 @@ void br_fdb_update(struct net_bridge *br, struct > net_bridge_port *source, > } > } > > -static int fdb_to_nud(const struct net_bridge_fdb_entry *fdb) > +static int fdb_to_nud(const struct net_bridge *br, > + const struct net_bridge_fdb_entry *fdb) > { > if (fdb->is_local) > return NUD_PERMANENT; > else if (fdb->is_static) > return NUD_NOARP; > - else if (has_expired(fdb->dst->br, fdb)) > + else if (has_expired(br, fdb)) > return NUD_STALE; > else > return NUD_REACHABLE; > @@ -640,7 +641,7 @@ static int fdb_fill_info(struct sk_buff *skb, const > struct net_bridge *br, > ndm->ndm_flags = fdb->added_by_external_learn ? NTF_EXT_LEARNED : 0; > ndm->ndm_type= 0; > ndm->ndm_ifindex = fdb->dst ? fdb->dst->dev->ifindex : > br->dev->ifindex; > - ndm->ndm_state = fdb_to_nud(fdb); > + ndm->ndm_state = fdb_to_nud(br, fdb); > > if (nla_put(skb, NDA_LLADDR, ETH_ALEN, >addr)) > goto nla_put_failure; > @@ -785,7 +786,7 @@ static int fdb_add_entry(struct net_bridge_port *source, > const __u8 *addr, > } > } > > - if (fdb_to_nud(fdb) != state) { > + if (fdb_to_nud(br, fdb) != state) { Hi Roopa, Are the above changes to fdb_to_nud() related to the patch subject? I was trying to figure out this part of the patch...seems unrelated. Is fdb->dst->br now not valid in some cases? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 5/6] net: dsa: push prepare phase in port_fdb_add
On Wed, Oct 7, 2015 at 4:48 PM, Vivien Didelot <vivien.dide...@savoirfairelinux.com> wrote: > Now that the prepare phase is pushed down to the DSA drivers, propagate > it to the port_fdb_add function. > > Signed-off-by: Vivien Didelot <vivien.dide...@savoirfairelinux.com> Reviewed-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 6/6] net: dsa: use switchdev obj in port_fdb_del
On Wed, Oct 7, 2015 at 4:48 PM, Vivien Didelot <vivien.dide...@savoirfairelinux.com> wrote: > For consistency with the FDB add operation, propagate the > switchdev_obj_port_fdb structure in the DSA drivers. > > Signed-off-by: Vivien Didelot <vivien.dide...@savoirfairelinux.com> Reviewed-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 4/6] net: dsa: add port_fdb_prepare
On Wed, Oct 7, 2015 at 4:48 PM, Vivien Didelot <vivien.dide...@savoirfairelinux.com> wrote: > Push the prepare phase for FDB operations down to the DSA drivers, with > a new port_fdb_prepare function. Currently only mv88e6xxx is affected. > > Signed-off-by: Vivien Didelot <vivien.dide...@savoirfairelinux.com> Reviewed-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next RFC 2/3] switchdev: allow caller to explicitly use deferred attr_set version
On Wed, Oct 7, 2015 at 10:39 PM, Jiri Pirkowrote: > Thu, Oct 08, 2015 at 06:27:07AM CEST, sfel...@gmail.com wrote: >>On Wed, Oct 7, 2015 at 11:30 AM, Jiri Pirko wrote: >>> From: Jiri Pirko >>> >>> Caller should know if he can call attr_set directly (when holding RTNL) >>> or if he has to use deferred version of this function. >>> >>> This also allows drivers to sleep inside attr_set and report operation >>> status back to switchdev core. Switchdev core then warns if status is >>> not ok, instead of silent errors happening in drivers. >>> >>> Signed-off-by: Jiri Pirko >>> --- >>> include/net/switchdev.h | 2 + >>> net/bridge/br_stp.c | 4 +- >>> net/switchdev/switchdev.c | 113 >>> +- >>> 3 files changed, 65 insertions(+), 54 deletions(-) >>> >>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h >>> index 89266a3..320be44 100644 >>> --- a/include/net/switchdev.h >>> +++ b/include/net/switchdev.h >>> @@ -168,6 +168,8 @@ int switchdev_port_attr_get(struct net_device *dev, >>> struct switchdev_attr *attr); >>> int switchdev_port_attr_set(struct net_device *dev, >>> struct switchdev_attr *attr); >>> +int switchdev_port_attr_set_deferred(struct net_device *dev, >>> +struct switchdev_attr *attr); >> >>Rather than adding another op, use attr->flags and define: >> >>#define SWITCHDEV_F_DEFERRED BIT(x) >> >>So we get: >> >>void br_set_state(struct net_bridge_port *p, unsigned int state) >>{ >>struct switchdev_attr attr = { >>.id = SWITCHDEV_ATTR_ID_PORT_STP_STATE, >>+ .flags = SWITCHDEV_F_DEFERRED, >>.u.stp_state = state, >>}; >>int err; >> >>p->state = state; >>err = switchdev_port_attr_set(p->dev, ); >>if (err && err != -EOPNOTSUPP) >>br_warn(p->br, "error setting offload STP state on >>port %u(%s)\n", >>(unsigned int) p->port_no, >>p->dev->name); >>} >> >>(And add obj->flags to do the same). > > That's what I wanted to avoid. Also because the obj is const and for > call from work, this flag would have to be removed. What did you want to avoid? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next v2 1/4] switchdev: add bridge attributes
On Thu, Oct 8, 2015 at 1:39 AM, Jiri Pirko <j...@resnulli.us> wrote: > Thu, Oct 08, 2015 at 08:04:40AM CEST, sfel...@gmail.com wrote: >>From: Scott Feldman <sfel...@gmail.com> >> >>Setting the stage to push bridge-level attributes down to port driver so >>hardware can be programmed accordingly. Bridge-level attribute example is >>ageing_time. This is a per-bridge attribute, not a per-bridge-port attr. >> >>Signed-off-by: Scott Feldman <sfel...@gmail.com> >>--- >> include/net/switchdev.h |5 + >> include/uapi/linux/if_link.h |2 +- >> 2 files changed, 6 insertions(+), 1 deletion(-) >> >>diff --git a/include/net/switchdev.h b/include/net/switchdev.h >>index 89266a3..8d92cd0 100644 >>--- a/include/net/switchdev.h >>+++ b/include/net/switchdev.h >>@@ -43,6 +43,7 @@ enum switchdev_attr_id { >> SWITCHDEV_ATTR_ID_PORT_PARENT_ID, >> SWITCHDEV_ATTR_ID_PORT_STP_STATE, >> SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS, >>+ SWITCHDEV_ATTR_ID_BRIDGE, >> }; >> >> struct switchdev_attr { >>@@ -52,6 +53,10 @@ struct switchdev_attr { >> struct netdev_phys_item_id ppid;/* PORT_PARENT_ID */ >> u8 stp_state; /* PORT_STP_STATE */ >> unsigned long brport_flags; /* PORT_BRIDGE_FLAGS */ >>+ struct switchdev_attr_bridge { /* BRIDGE */ >>+ enum ifla_br attr; > > I don't like pushing down IFLA_BR_* values throught switchdev. I think > it might better to just intruduce: > SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME > > and "u32 ageing_time" here. Something similar to stp_state. Much easier > to read and does not give blank cheque for passing any bridge IFLA_BR_* > down to drivers. > > It also aligns with bridge code nicely, I believe. Done, v3 sent with your suggested change. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next v3 3/4] bridge: push bridge setting ageing_time down to switchdev
On Thu, Oct 8, 2015 at 7:40 PM, Florian Fainelli <f.faine...@gmail.com> wrote: > 2015-10-08 19:23 GMT-07:00 <sfel...@gmail.com>: >> From: Scott Feldman <sfel...@gmail.com> >> >> Use SWITCHDEV_F_SKIP_EOPNOTSUPP to skip over ports in bridge that don't >> support setting ageing_time (or setting bridge attrs in general). >> >> If push fails, don't update ageing_time in bridge and return err to user. >> >> If push succeeds, update ageing_time in bridge and run gc_timer now to >> recalabrate when to run gc_timer next, based on new ageing_time. >> >> Signed-off-by: Scott Feldman <sfel...@gmail.com> >> Signed-off-by: Jiri Pirko <j...@resnulli.us> >> --- >> net/bridge/br_ioctl.c|3 +-- >> net/bridge/br_netlink.c |6 +++--- >> net/bridge/br_private.h |1 + >> net/bridge/br_stp.c | 23 +++ >> net/bridge/br_sysfs_br.c |3 +-- >> 5 files changed, 29 insertions(+), 7 deletions(-) >> >> diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c >> index 8d423bc..263b4de 100644 >> --- a/net/bridge/br_ioctl.c >> +++ b/net/bridge/br_ioctl.c >> @@ -200,8 +200,7 @@ static int old_dev_ioctl(struct net_device *dev, struct >> ifreq *rq, int cmd) >> if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN)) >> return -EPERM; >> >> - br->ageing_time = clock_t_to_jiffies(args[1]); >> - return 0; >> + return br_set_ageing_time(br, args[1]); >> >> case BRCTL_GET_PORT_INFO: >> { >> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c >> index d78b442..544ab96 100644 >> --- a/net/bridge/br_netlink.c >> +++ b/net/bridge/br_netlink.c >> @@ -870,9 +870,9 @@ static int br_changelink(struct net_device *brdev, >> struct nlattr *tb[], >> } >> >> if (data[IFLA_BR_AGEING_TIME]) { >> - u32 ageing_time = nla_get_u32(data[IFLA_BR_AGEING_TIME]); >> - >> - br->ageing_time = clock_t_to_jiffies(ageing_time); >> + err = br_set_ageing_time(br, >> nla_get_u32(data[IFLA_BR_AGEING_TIME])); >> + if (err) >> + return err; >> } >> >> if (data[IFLA_BR_STP_STATE]) { >> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h >> index 09d3ecb..ba0c67b 100644 >> --- a/net/bridge/br_private.h >> +++ b/net/bridge/br_private.h >> @@ -882,6 +882,7 @@ void __br_set_forward_delay(struct net_bridge *br, >> unsigned long t); >> int br_set_forward_delay(struct net_bridge *br, unsigned long x); >> int br_set_hello_time(struct net_bridge *br, unsigned long x); >> int br_set_max_age(struct net_bridge *br, unsigned long x); >> +int br_set_ageing_time(struct net_bridge *br, u32 ageing_time); >> >> >> /* br_stp_if.c */ >> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c >> index 3a982c0..db6d243de 100644 >> --- a/net/bridge/br_stp.c >> +++ b/net/bridge/br_stp.c >> @@ -566,6 +566,29 @@ int br_set_max_age(struct net_bridge *br, unsigned long >> val) >> >> } >> >> +int br_set_ageing_time(struct net_bridge *br, u32 ageing_time) >> +{ >> + struct switchdev_attr attr = { >> + .id = SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME, >> + .flags = SWITCHDEV_F_SKIP_EOPNOTSUPP, >> + .u.ageing_time = ageing_time, >> + }; >> + unsigned long t = clock_t_to_jiffies(ageing_time); >> + int err; >> + >> + if (t < BR_MIN_AGEING_TIME || t > BR_MAX_AGEING_TIME) >> + return -ERANGE; >> + >> + err = switchdev_port_attr_set(br->dev, ); >> + if (err) >> + return err; >> + >> + br->ageing_time = t; >> + mod_timer(>gc_timer, jiffies); > > If the switch driver/HW supports ageing, does it still make sense to > have this software timer ticking? Yes, because the bridge still needs to age out entries it has learned (those not marked with added_by_external_learn), for example entries learned on non-offloaded ports. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next RFC 2/3] switchdev: allow caller to explicitly use deferred attr_set version
On Thu, Oct 8, 2015 at 1:26 AM, Jiri Pirkowrote: > Thu, Oct 08, 2015 at 08:03:35AM CEST, sfel...@gmail.com wrote: >>On Wed, Oct 7, 2015 at 10:39 PM, Jiri Pirko wrote: >>> Thu, Oct 08, 2015 at 06:27:07AM CEST, sfel...@gmail.com wrote: On Wed, Oct 7, 2015 at 11:30 AM, Jiri Pirko wrote: > From: Jiri Pirko > > Caller should know if he can call attr_set directly (when holding RTNL) > or if he has to use deferred version of this function. > > This also allows drivers to sleep inside attr_set and report operation > status back to switchdev core. Switchdev core then warns if status is > not ok, instead of silent errors happening in drivers. > > Signed-off-by: Jiri Pirko > --- > include/net/switchdev.h | 2 + > net/bridge/br_stp.c | 4 +- > net/switchdev/switchdev.c | 113 > +- > 3 files changed, 65 insertions(+), 54 deletions(-) > > diff --git a/include/net/switchdev.h b/include/net/switchdev.h > index 89266a3..320be44 100644 > --- a/include/net/switchdev.h > +++ b/include/net/switchdev.h > @@ -168,6 +168,8 @@ int switchdev_port_attr_get(struct net_device *dev, > struct switchdev_attr *attr); > int switchdev_port_attr_set(struct net_device *dev, > struct switchdev_attr *attr); > +int switchdev_port_attr_set_deferred(struct net_device *dev, > +struct switchdev_attr *attr); Rather than adding another op, use attr->flags and define: #define SWITCHDEV_F_DEFERRED BIT(x) So we get: void br_set_state(struct net_bridge_port *p, unsigned int state) { struct switchdev_attr attr = { .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE, + .flags = SWITCHDEV_F_DEFERRED, .u.stp_state = state, }; int err; p->state = state; err = switchdev_port_attr_set(p->dev, ); if (err && err != -EOPNOTSUPP) br_warn(p->br, "error setting offload STP state on port %u(%s)\n", (unsigned int) p->port_no, p->dev->name); } (And add obj->flags to do the same). >>> >>> That's what I wanted to avoid. Also because the obj is const and for >>> call from work, this flag would have to be removed. >> >>What did you want to avoid? > > Having this as a flag. I don't like it too much. > But that is cosmetics. Other than that, does the patchset make sense? > Do you see some possible issues? patch 1/3 makes sense, I tested it out and no issues. (Looks like there are other places to assert rtnl_lock, are you going to add those?) patch 2/3: Rather than trying to guess the call context in the core, make the caller call the right variant for its context. That part is good. On the flag vs. no flags, the reasons why I want this as a flag are: a) I want to keep the switchdev ops set to the core set: get/set attr and add/del/dump objs. I've pushed back on changing this before. I don't want ops explosion (like netdev_ops), and I'd like to avoid the 1000-line patch when the arg list in an op changes, and we need to update N drivers. The flags lets the caller modify the algo behavior, while keeping the core call (and args) fixed. b) the caller can combine flags, where it makes sense. For example, maybe I'm in a locked context and I don't want to recurse the device tree, so I would make the call with NO_RECURSE | DEFERRED. If we didn't use flags, then we need to supply ops for each variant on the call, and then things explode. patch 3/3 I haven't looked at yet...I'm stuck on 2/3. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next RFC 2/3] switchdev: allow caller to explicitly use deferred attr_set version
On Wed, Oct 7, 2015 at 11:30 AM, Jiri Pirkowrote: > From: Jiri Pirko > > Caller should know if he can call attr_set directly (when holding RTNL) > or if he has to use deferred version of this function. > > This also allows drivers to sleep inside attr_set and report operation > status back to switchdev core. Switchdev core then warns if status is > not ok, instead of silent errors happening in drivers. > > Signed-off-by: Jiri Pirko > --- > include/net/switchdev.h | 2 + > net/bridge/br_stp.c | 4 +- > net/switchdev/switchdev.c | 113 > +- > 3 files changed, 65 insertions(+), 54 deletions(-) > > diff --git a/include/net/switchdev.h b/include/net/switchdev.h > index 89266a3..320be44 100644 > --- a/include/net/switchdev.h > +++ b/include/net/switchdev.h > @@ -168,6 +168,8 @@ int switchdev_port_attr_get(struct net_device *dev, > struct switchdev_attr *attr); > int switchdev_port_attr_set(struct net_device *dev, > struct switchdev_attr *attr); > +int switchdev_port_attr_set_deferred(struct net_device *dev, > +struct switchdev_attr *attr); Rather than adding another op, use attr->flags and define: #define SWITCHDEV_F_DEFERRED BIT(x) So we get: void br_set_state(struct net_bridge_port *p, unsigned int state) { struct switchdev_attr attr = { .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE, + .flags = SWITCHDEV_F_DEFERRED, .u.stp_state = state, }; int err; p->state = state; err = switchdev_port_attr_set(p->dev, ); if (err && err != -EOPNOTSUPP) br_warn(p->br, "error setting offload STP state on port %u(%s)\n", (unsigned int) p->port_no, p->dev->name); } (And add obj->flags to do the same). -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v3 06/14] rocker: introduce worlds infrastructure
On Tue, Oct 6, 2015 at 11:14 PM, Jiri Pirkowrote: > Wed, Oct 07, 2015 at 03:50:08AM CEST, sfel...@gmail.com wrote: > > > >> >>> Also I wonder how this works when a pkt ingresses a port in mode A and >>> egresses a port in mode B? What fib/fdb tables does it cross when this >>> happens? It seems easier to just have two switch devices not a >>> hybrid. If this per port implementation maps to some hardware that >>> would be really interesting though. >> >>In retrospect, I regret adding the port mode feature to rocker. I >>like the world idea, so we can have a device with different >>pipeline/resources, but we should have locked all ports on a switch to >>one mode, or even as you hinted at earlier, use a unique sub-device ID >>for a switch with all ports in a particular mode. If you want to >>ports with different worlds, just instantiate a switch in each world. >>Instantiating new devices is easy. >> >>But, now Jiri has locked on to the dynamic port mode idea with pit >>bull zeal, to the point of being able to switch a port mode at any >>time from one mode to another from the host. I just don't see that as >>a real-world use-case. Life is too short and we need to be focusing >>on switchdev features, not refactoring or adding cool but useless >>features. > > Can can still change this if you want. We can make > ROCKER_TLV_CMD_PORT_SETTINGS_MODE read-only in hw (As it is in fact now > as we have only one world). > > Then we add another property: > static Property rocker_properties[] = { > DEFINE_PROP_STRING("name", Rocker, name), > DEFINE_PROP_STRING("world", Rocker, world), > > > and we use this value in pci_rocker_init instead of r->world_dflt > > Looks straightforward. Yes, perfect, I totally on-board with that change. This puts all ports on the device in the same mode. I you want ports in a different mode, instantiate another switch. Mixing port modes on a switch, either statically or dynamically, would have created another problem. Switchdev uses the switch_id for each port to know when ports belong to the same switch, same switching domain, if you will. Mixing port modes within a switch breaks this. For example, the check we added to avoid double forwarding of pkts by the bridge and the device depends on ports on the switch having the same switch_id. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v2 00/13] rocker: add support for multiple worlds
On Tue, Oct 6, 2015 at 11:03 PM, Jiri Pirkowrote: > Tue, Oct 06, 2015 at 07:14:39PM CEST, sfel...@gmail.com wrote: >>On Tue, Oct 6, 2015 at 12:30 AM, Jiri Pirko wrote: >>> Tue, Oct 06, 2015 at 05:56:12AM CEST, sfel...@gmail.com wrote: On Mon, Oct 5, 2015 at 10:43 AM, Jiri Pirko wrote: > From: Jiri Pirko > > This patchset allows new rocker worlds to be easily added in future (like > eBPF > based one I have been working on). The main part of the patchset is the > OF-DPA > carve-out. It resuts in OF-DPA specific file. Clean cut. > > v1->v2: > - rtnl rocker mode change userspace expose patch was removed > > Jiri Pirko (13): > rocker: remove unused rocker_port param from alloc funcs and shorten > their names > rocker: rename rocker.h to rocker_hw.h > rocker: rename rocker.c to rocker_main.c > rocker: push tlv processing into separate files > rocker: implement set settings mode command > rocker: introduce worlds infrastructure > rocker: introduce OF-DPA world skeleton > rocker: set default world on port probe and clean world on remove > rocker: pass "learning" value as a parameter to > rocker_port_set_learning > rocker: pre-allocate wait structures during cmd ring init > rocker: remove trans parameter to rocker_cmd_exec function > rocker: call rocker_cmd_exec function with "nowait" boolean instead of > flags > rocker: move OF-DPA stuff into separate file A couple of my tests are failing with this patchset. A simple port test is failing and IPv4 routing test is failing. The port test is simple: just connect a port on DUT to a port on another system and assign an IP address to each port and verify IP connectivity. I have this: DUT:sw1p1 (11.0.0.1/24) <---> host1:eth0 (11.0.0.2/24) The IPv4 routing tests is a bit more complicated to setup. I'm using OSPF, but I'm not seeing full routes formed in the topology, so I suspect OSPF hellos aren't getting thru. Please fix find/fix these issues and send v3. I don't want any git bisect issues when running tests. Thanks. >>> >>> I fixed that. Sending v3 in a sec. Thanks. >> >>Sorry, both tests are still broken. Would you send me your tests >>scripts so I can see why your tests are passing? > > I'm trying some smoke tests including bridge setup and just ip-ip > setup by hand. Meybe if you send me your scripts, I can run it locally. My test scripts are already included in the qemu tree. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v3 06/14] rocker: introduce worlds infrastructure
On Tue, Oct 6, 2015 at 2:25 PM, John Fastabendwrote: > [...] > Using void * in these ops is unacceptable, I can't agree to this patch. There is a much cleaner way to architect this. If you look at the ops defined, they're mostly duplicates of the already defined switchdev_ops. It would be much cleaner to: 0) set port mode on qemu/rocker (the device) 1) get the port mode on port probe 2) based on port mode, set the switchdev_ops to point to the port mode world switchdev_ops 3) sub-class rocker_port, like I mentioned in before, to store world-specific stuff in rocker_port I don't buy the argument that we need to change port mode dynamically from the driver. Set it at the device and be done. >>> >>> Maybe as a reference this strikes me as similar to how we do multiple >>> device support in a single driver like ixgbe or fm10k (the two I'm most >>> familiar with). At probe time we read the device id and then stub in >>> the specific callbacks for that device. >> >> Exactly >> >>> Sorry I'm still hung up on the multiple worlds thing, is it really >>> trying to model different devices under a single driver? In which case >>> maybe rather than port mode expose it as its own device id. Just a >>> thought. >> >> Yes, different devices under single driver. New device ID or >> sub-device ID will not work in this case as we're trying to slice it >> at the port level, not the device level. >> > > OK uncovered my next level of suspicion/confusion. > > Do you actually have or seen hardware that has completely different > programming interface per port? And completely different pipelines? I haven't. > This seems really strange to me and perhaps just an artifact of > the qemu implementation? Typically or at least what I expect is you > have a switch pipeline with a set of data structures, tcams, hash > tables, etc all connected together in some (possibly configurable) > topology. Ports feed packets into this pipeline and packets egress > out ports. In my logical view of a "switch" device the pipeline > is a shared resource you can partition it so that ports are isolated > in some sense but you can't use fundamentally different underlying > resources per ports. Its not a per port attribute/mode like this > series sort of hints at. Yes, this is an artifact of the qemu implementation. The idea was ports could run in one mode (world) or another. The device would make sure port traffic stays within the world. Each world would have its own pipeline and resources. So one switch device could have ports in different worlds. > Also I wonder how this works when a pkt ingresses a port in mode A and > egresses a port in mode B? What fib/fdb tables does it cross when this > happens? It seems easier to just have two switch devices not a > hybrid. If this per port implementation maps to some hardware that > would be really interesting though. In retrospect, I regret adding the port mode feature to rocker. I like the world idea, so we can have a device with different pipeline/resources, but we should have locked all ports on a switch to one mode, or even as you hinted at earlier, use a unique sub-device ID for a switch with all ports in a particular mode. If you want to ports with different worlds, just instantiate a switch in each world. Instantiating new devices is easy. But, now Jiri has locked on to the dynamic port mode idea with pit bull zeal, to the point of being able to switch a port mode at any time from one mode to another from the host. I just don't see that as a real-world use-case. Life is too short and we need to be focusing on switchdev features, not refactoring or adding cool but useless features. -scott -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v3 06/14] rocker: introduce worlds infrastructure
On Tue, Oct 6, 2015 at 9:15 AM, Jiri Pirkowrote: > Tue, Oct 06, 2015 at 06:06:45PM CEST, sfel...@gmail.com wrote: >>On Tue, Oct 6, 2015 at 12:51 AM, Jiri Pirko wrote: >>> From: Jiri Pirko >>> >>> This is another step on the way to per-world clean cut. Introduce world >>> ops hooks which each world can implement in world-specific way. >>> Also introduce world infrastructure including function for port worlds >>> change. >>> >>> Signed-off-by: Jiri Pirko >>> --- >>> v1->v2: >>> - removed functions to change worlds based on name, as rtnl mode >>>set patch is removed from patchset. >>> v2->v3: >>> - fix checks in rocker_world_port_open and rocker_world_port_stop >>> --- >>> drivers/net/ethernet/rocker/rocker.h | 56 >>> drivers/net/ethernet/rocker/rocker_main.c | 464 >>> +- >>> 2 files changed, 519 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/net/ethernet/rocker/rocker.h >>> b/drivers/net/ethernet/rocker/rocker.h >>> index 650caa0..d49bc5d 100644 >>> --- a/drivers/net/ethernet/rocker/rocker.h >>> +++ b/drivers/net/ethernet/rocker/rocker.h >>> @@ -12,7 +12,11 @@ >>> #ifndef _ROCKER_H >>> #define _ROCKER_H >>> >>> +#include >>> #include >>> +#include >>> +#include >>> +#include >>> >>> #include "rocker_hw.h" >>> >>> @@ -24,4 +28,56 @@ struct rocker_desc_info { >>> dma_addr_t mapaddr; >>> }; >>> >>> +struct rocker; >>> +struct rocker_port; >>> + >>> +struct rocker_world_ops { >>> + const char *kind; >>> + size_t priv_size; >>> + size_t port_priv_size; >>> + u8 mode; >>> + int (*init)(struct rocker *rocker, void *priv); >>> + void (*fini)(void *priv); >>> + int (*port_init)(struct rocker_port *rocker_port, void *priv, >>> +void *port_priv); >>> + void (*port_fini)(void *port_priv); >>> + int (*port_open)(void *port_priv); >>> + void (*port_stop)(void *port_priv); >>> + int (*port_attr_stp_state_set)(void *port_priv, u8 state, >>> + struct switchdev_trans *trans); >>> + int (*port_attr_bridge_flags_set)(void *port_priv, >>> + unsigned long brport_flags, >>> + struct switchdev_trans *trans); >>> + int (*port_attr_bridge_flags_get)(void *port_priv, >>> + unsigned long *p_brport_flags); >>> + int (*port_obj_vlan_add)(void *port_priv, >>> +const struct switchdev_obj_port_vlan *vlan, >>> +struct switchdev_trans *trans); >>> + int (*port_obj_vlan_del)(void *port_priv, >>> +const struct switchdev_obj_port_vlan >>> *vlan); >>> + int (*port_obj_vlan_dump)(void *port_priv, >>> + struct switchdev_obj_port_vlan *vlan, >>> + switchdev_obj_dump_cb_t *cb); >>> + int (*port_obj_fib4_add)(void *port_priv, >>> +const struct switchdev_obj_ipv4_fib *fib4, >>> +struct switchdev_trans *trans); >>> + int (*port_obj_fib4_del)(void *port_priv, >>> +const struct switchdev_obj_ipv4_fib *fib4); >>> + int (*port_obj_fdb_add)(void *port_priv, >>> + const struct switchdev_obj_port_fdb *fdb, >>> + struct switchdev_trans *trans); >>> + int (*port_obj_fdb_del)(void *port_priv, >>> + const struct switchdev_obj_port_fdb *fdb); >>> + int (*port_obj_fdb_dump)(void *port_priv, >>> +struct switchdev_obj_port_fdb *fdb, >>> +switchdev_obj_dump_cb_t *cb); >>> + int (*port_master_linked)(void *port_priv, struct net_device >>> *master); >>> + int (*port_master_unlinked)(void *port_priv, struct net_device >>> *master); >>> + int (*port_neigh_update)(void *port_priv, struct neighbour *n); >>> + int (*port_neigh_destroy)(void *port_priv, struct neighbour *n); >>> + int (*port_ev_mac_vlan_seen)(void *port_priv, >>> +const unsigned char *addr, >>> +__be16 vlan_id); >>> +}; >> >>Using void * in these ops is unacceptable, I can't agree to this patch. > > Could you please elaborate more why is it unacceptable. I don't see any > problem in that. void * is OK if you're passing around un-typed, TBD-sized objects. Wait, this conversation doesn't even seem real. You're really advocating, publicly, the use of void * in ops callbacks, when the pointer passed could be a well-typed pointer? How could define a new API internal to the driver with void * as the main arg? >>There is a much cleaner way to
Re: [patch net-next v3 06/14] rocker: introduce worlds infrastructure
On Tue, Oct 6, 2015 at 12:51 AM, Jiri Pirkowrote: > From: Jiri Pirko > > This is another step on the way to per-world clean cut. Introduce world > ops hooks which each world can implement in world-specific way. > Also introduce world infrastructure including function for port worlds > change. > > Signed-off-by: Jiri Pirko > --- > v1->v2: > - removed functions to change worlds based on name, as rtnl mode >set patch is removed from patchset. > v2->v3: > - fix checks in rocker_world_port_open and rocker_world_port_stop > --- > drivers/net/ethernet/rocker/rocker.h | 56 > drivers/net/ethernet/rocker/rocker_main.c | 464 > +- > 2 files changed, 519 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/rocker/rocker.h > b/drivers/net/ethernet/rocker/rocker.h > index 650caa0..d49bc5d 100644 > --- a/drivers/net/ethernet/rocker/rocker.h > +++ b/drivers/net/ethernet/rocker/rocker.h > @@ -12,7 +12,11 @@ > #ifndef _ROCKER_H > #define _ROCKER_H > > +#include > #include > +#include > +#include > +#include > > #include "rocker_hw.h" > > @@ -24,4 +28,56 @@ struct rocker_desc_info { > dma_addr_t mapaddr; > }; > > +struct rocker; > +struct rocker_port; > + > +struct rocker_world_ops { > + const char *kind; > + size_t priv_size; > + size_t port_priv_size; > + u8 mode; > + int (*init)(struct rocker *rocker, void *priv); > + void (*fini)(void *priv); > + int (*port_init)(struct rocker_port *rocker_port, void *priv, > +void *port_priv); > + void (*port_fini)(void *port_priv); > + int (*port_open)(void *port_priv); > + void (*port_stop)(void *port_priv); > + int (*port_attr_stp_state_set)(void *port_priv, u8 state, > + struct switchdev_trans *trans); > + int (*port_attr_bridge_flags_set)(void *port_priv, > + unsigned long brport_flags, > + struct switchdev_trans *trans); > + int (*port_attr_bridge_flags_get)(void *port_priv, > + unsigned long *p_brport_flags); > + int (*port_obj_vlan_add)(void *port_priv, > +const struct switchdev_obj_port_vlan *vlan, > +struct switchdev_trans *trans); > + int (*port_obj_vlan_del)(void *port_priv, > +const struct switchdev_obj_port_vlan *vlan); > + int (*port_obj_vlan_dump)(void *port_priv, > + struct switchdev_obj_port_vlan *vlan, > + switchdev_obj_dump_cb_t *cb); > + int (*port_obj_fib4_add)(void *port_priv, > +const struct switchdev_obj_ipv4_fib *fib4, > +struct switchdev_trans *trans); > + int (*port_obj_fib4_del)(void *port_priv, > +const struct switchdev_obj_ipv4_fib *fib4); > + int (*port_obj_fdb_add)(void *port_priv, > + const struct switchdev_obj_port_fdb *fdb, > + struct switchdev_trans *trans); > + int (*port_obj_fdb_del)(void *port_priv, > + const struct switchdev_obj_port_fdb *fdb); > + int (*port_obj_fdb_dump)(void *port_priv, > +struct switchdev_obj_port_fdb *fdb, > +switchdev_obj_dump_cb_t *cb); > + int (*port_master_linked)(void *port_priv, struct net_device *master); > + int (*port_master_unlinked)(void *port_priv, struct net_device > *master); > + int (*port_neigh_update)(void *port_priv, struct neighbour *n); > + int (*port_neigh_destroy)(void *port_priv, struct neighbour *n); > + int (*port_ev_mac_vlan_seen)(void *port_priv, > +const unsigned char *addr, > +__be16 vlan_id); > +}; Using void * in these ops is unacceptable, I can't agree to this patch. There is a much cleaner way to architect this. If you look at the ops defined, they're mostly duplicates of the already defined switchdev_ops. It would be much cleaner to: 0) set port mode on qemu/rocker (the device) 1) get the port mode on port probe 2) based on port mode, set the switchdev_ops to point to the port mode world switchdev_ops 3) sub-class rocker_port, like I mentioned in before, to store world-specific stuff in rocker_port I don't buy the argument that we need to change port mode dynamically from the driver. Set it at the device and be done. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v3 06/14] rocker: introduce worlds infrastructure
On Tue, Oct 6, 2015 at 9:19 AM, John Fastabend <john.fastab...@gmail.com> wrote: > On 15-10-06 09:06 AM, Scott Feldman wrote: >> On Tue, Oct 6, 2015 at 12:51 AM, Jiri Pirko <j...@resnulli.us> wrote: >>> From: Jiri Pirko <j...@mellanox.com> >>> >>> This is another step on the way to per-world clean cut. Introduce world >>> ops hooks which each world can implement in world-specific way. >>> Also introduce world infrastructure including function for port worlds >>> change. >>> >>> Signed-off-by: Jiri Pirko <j...@mellanox.com> >>> --- >>> v1->v2: >>> - removed functions to change worlds based on name, as rtnl mode >>>set patch is removed from patchset. >>> v2->v3: >>> - fix checks in rocker_world_port_open and rocker_world_port_stop >>> --- >>> drivers/net/ethernet/rocker/rocker.h | 56 >>> drivers/net/ethernet/rocker/rocker_main.c | 464 >>> +- >>> 2 files changed, 519 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/net/ethernet/rocker/rocker.h >>> b/drivers/net/ethernet/rocker/rocker.h >>> index 650caa0..d49bc5d 100644 >>> --- a/drivers/net/ethernet/rocker/rocker.h >>> +++ b/drivers/net/ethernet/rocker/rocker.h >>> @@ -12,7 +12,11 @@ >>> #ifndef _ROCKER_H >>> #define _ROCKER_H >>> >>> +#include >>> #include >>> +#include >>> +#include >>> +#include >>> >>> #include "rocker_hw.h" >>> >>> @@ -24,4 +28,56 @@ struct rocker_desc_info { >>> dma_addr_t mapaddr; >>> }; >>> >>> +struct rocker; >>> +struct rocker_port; >>> + >>> +struct rocker_world_ops { >>> + const char *kind; >>> + size_t priv_size; >>> + size_t port_priv_size; >>> + u8 mode; >>> + int (*init)(struct rocker *rocker, void *priv); >>> + void (*fini)(void *priv); >>> + int (*port_init)(struct rocker_port *rocker_port, void *priv, >>> +void *port_priv); >>> + void (*port_fini)(void *port_priv); >>> + int (*port_open)(void *port_priv); >>> + void (*port_stop)(void *port_priv); >>> + int (*port_attr_stp_state_set)(void *port_priv, u8 state, >>> + struct switchdev_trans *trans); >>> + int (*port_attr_bridge_flags_set)(void *port_priv, >>> + unsigned long brport_flags, >>> + struct switchdev_trans *trans); >>> + int (*port_attr_bridge_flags_get)(void *port_priv, >>> + unsigned long *p_brport_flags); >>> + int (*port_obj_vlan_add)(void *port_priv, >>> +const struct switchdev_obj_port_vlan *vlan, >>> +struct switchdev_trans *trans); >>> + int (*port_obj_vlan_del)(void *port_priv, >>> +const struct switchdev_obj_port_vlan >>> *vlan); >>> + int (*port_obj_vlan_dump)(void *port_priv, >>> + struct switchdev_obj_port_vlan *vlan, >>> + switchdev_obj_dump_cb_t *cb); >>> + int (*port_obj_fib4_add)(void *port_priv, >>> +const struct switchdev_obj_ipv4_fib *fib4, >>> +struct switchdev_trans *trans); >>> + int (*port_obj_fib4_del)(void *port_priv, >>> +const struct switchdev_obj_ipv4_fib *fib4); >>> + int (*port_obj_fdb_add)(void *port_priv, >>> + const struct switchdev_obj_port_fdb *fdb, >>> + struct switchdev_trans *trans); >>> + int (*port_obj_fdb_del)(void *port_priv, >>> + const struct switchdev_obj_port_fdb *fdb); >>> + int (*port_obj_fdb_dump)(void *port_priv, >>> +struct switchdev_obj_port_fdb *fdb, >>> +switchdev_obj_dump_cb_t *cb); >>> + int (*port_master_linked)(void *port_priv, struct net_device >>> *master); >>> + int (*port_master_unlinked)(void *port_priv, struct net_device >>> *master); >>> + int (*port_ne
Re: [patch net-next v2 00/13] rocker: add support for multiple worlds
On Tue, Oct 6, 2015 at 12:30 AM, Jiri Pirkowrote: > Tue, Oct 06, 2015 at 05:56:12AM CEST, sfel...@gmail.com wrote: >>On Mon, Oct 5, 2015 at 10:43 AM, Jiri Pirko wrote: >>> From: Jiri Pirko >>> >>> This patchset allows new rocker worlds to be easily added in future (like >>> eBPF >>> based one I have been working on). The main part of the patchset is the >>> OF-DPA >>> carve-out. It resuts in OF-DPA specific file. Clean cut. >>> >>> v1->v2: >>> - rtnl rocker mode change userspace expose patch was removed >>> >>> Jiri Pirko (13): >>> rocker: remove unused rocker_port param from alloc funcs and shorten >>> their names >>> rocker: rename rocker.h to rocker_hw.h >>> rocker: rename rocker.c to rocker_main.c >>> rocker: push tlv processing into separate files >>> rocker: implement set settings mode command >>> rocker: introduce worlds infrastructure >>> rocker: introduce OF-DPA world skeleton >>> rocker: set default world on port probe and clean world on remove >>> rocker: pass "learning" value as a parameter to >>> rocker_port_set_learning >>> rocker: pre-allocate wait structures during cmd ring init >>> rocker: remove trans parameter to rocker_cmd_exec function >>> rocker: call rocker_cmd_exec function with "nowait" boolean instead of >>> flags >>> rocker: move OF-DPA stuff into separate file >> >>A couple of my tests are failing with this patchset. A simple port >>test is failing and IPv4 routing test is failing. >> >>The port test is simple: just connect a port on DUT to a port on >>another system and assign an IP address to each port and verify IP >>connectivity. I have this: >> >> DUT:sw1p1 (11.0.0.1/24) <---> host1:eth0 (11.0.0.2/24) >> >>The IPv4 routing tests is a bit more complicated to setup. I'm using >>OSPF, but I'm not seeing full routes formed in the topology, so I >>suspect OSPF hellos aren't getting thru. >> >>Please fix find/fix these issues and send v3. I don't want any git >>bisect issues when running tests. Thanks. > > I fixed that. Sending v3 in a sec. Thanks. Sorry, both tests are still broken. Would you send me your tests scripts so I can see why your tests are passing? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v2 13/13] rocker: move OF-DPA stuff into separate file
On Mon, Oct 5, 2015 at 10:44 AM, Jiri Pirkowrote: > From: Jiri Pirko > > Carve out OF-DPA would specific code from the common file to the world > file. This change required struct rocker and struct rocker_port split > into world specific struct ofdpa and struct ofdpa_port. Along with this > the world specific functions and defines were renamed from prefix > "rocker_" to "ofdpa_". > > Signed-off-by: Jiri Pirko I'd like a chance to run tests against this patchset and study the patches closer. It's going to take a couple of days, so I'd like to ask that we take a little bit of time before merging this in. It's all good work; just want to do due-diligence before it's pushed. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next 00/14] rocker: add support for multiple worlds
On Mon, Oct 5, 2015 at 9:58 AM, John Fastabendwrote: > On 15-10-05 09:30 AM, Jiri Pirko wrote: >> Mon, Oct 05, 2015 at 05:41:38PM CEST, john.fastab...@gmail.com wrote: >>> On 15-10-04 02:25 PM, Jiri Pirko wrote: From: Jiri Pirko This patchset allows new rocker worlds to be easily added in future (like eBPF based one I have been working on). The main part of the patchset is the OF-DPA carve-out. It resuts in OF-DPA specific file. Clean cut. The user is able to change rocker port world/mode using rtnl. >>> >>> Hi Jiri, >>> >>> I'm not sure I understand the motivation here. Are you thinking the >>> "real" drivers will start to load worlds or what I've been calling >>> profiles on the devices I have here. If this is the case using >>> opaque strings without any other infrastructure around it to expose >>> what the profile is doing is not sufficient in my opinion. What I >>> would rather have is for drivers to expose the actual configuration >>> parameters they are using, preferable these would be both readable >>> and writable so we don't end up with what the firmware/device driver >>> writers think is best. I think we can get there by exposing a model >>> of the device and configuring "tables". I'll post my latest patch >>> set today to give you a better idea what I'm thinking here. Without >>> this I guess you will end up with drivers creating many profiles and >>> in no consistent way so you end up with here is my "vxlan" profile, >>> here is my "geneve" profile, here is my "magic-foo" profile, etc. I >>> wanted to avoid this. >> >> This is just for rocker purposes. I do not want to do something similar >> for real devices. It does not make sense as real hw always have some >> hard-wired topology. Rocker HW does not. I think that this is the main >> part that may cause some misunderstandings. > > I think your underestimating the flexibility of hardware. And > completely missing the hardware that is based on FPGAs and/or cell > architectures. This hardware is available today and could support > topology changes like this. But even less exotic hardware can/will > support parser updates which makes the device behave differently. > > Other hardware can reconfigure the topology within some constraints, > the fm10k device supports this model. An extreme example would put > an ebpf interpreter in a fpga on the nic and expose it via a driver. > > If its just for rocker purposes I'm not really excited about adding > it to the kernel to support a qemu device. If we allow it for one > driver I don't see how/why we should block it for "real" devices. > From the kernels point of view these are all real drivers. I could > build a qemu model that maps 1:1 with real hardware and do a drop > in replacement. > >> >> Rocker has a notion of "worlds". When a port is set to be in a certain >> world, it behaves in completely different way. Now we have just OF-DPA >> world. I will be adding BPF world shortly. >> >> This has nothing to do with profiles as you describe it, this is >> something completely different! >> >> > > I'm missing why its different. > > Would you object to me adding multiple worlds to fm10k > using opaque strings? I'll create a world with a topology that maps > well to ipv4 networks, a world for ipv6 networks, a world for l2 flat > networks, etc. Each world in this example will have a specific table > topology and parser to support it. In this sense the ports will behave > in completely different ways i.e. packets will be processed by > different pipelines. Are you suggesting we do this? > > I'm not sure what you mean by completely different? Is it just a > different parser and table topology? Real hardware can support changing > or at least modifying these today. > >>> >>> But if this is only meant to be a rocker thing then why expose it on >>> the driver side vs just compiling it on the qemu side? If its just >> >> I want user to be able to set the world/mode of the port on fly. No need to >> re-set the hardware if possible to do it from driver. >> > > But the user has no way to know what these strings are doing? > >> >>> for convenience and only meant for the emulated device we should be >>> clear in the documentation and patch set. >> >> This is rocker-only patchset, where do you want to clear it? >> > > I don't think this is reasonable from the kernel side to "know" or > expose a driver is running on qemu like this. The kernel shouldn't > know or care if a device is emulated or not. > >> >>> >>> Final, comment can we abstract the interfaces better? An L2 and L3 >>> table could be mapped generically onto a table pipeline model if the >>> driver gave some small hints like this is my l2 table and this is my l3 >>> table. Then you don't need all the world specific callbacks and the >>> OF-DPA model just looks like an instance of a pipeline with some >>> specific hints where to put l2/l3 rules. >> >> I think you are
Re: [patch net-next 05/14] rocker: implement set settings mode command
On Sun, Oct 4, 2015 at 2:25 PM, Jiri Pirko <j...@resnulli.us> wrote: > From: Jiri Pirko <j...@mellanox.com> > > Introduce a helper to ask HW for change of the port mode (world). > > Signed-off-by: Jiri Pirko <j...@mellanox.com> Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v2 01/13] rocker: remove unused rocker_port param from alloc funcs and shorten their names
On Mon, Oct 5, 2015 at 10:43 AM, Jiri Pirko <j...@resnulli.us> wrote: > From: Jiri Pirko <j...@mellanox.com> > > No need to pass rocker_port around to alloc/free rocker functions, > since they now use switchdev_trans for memory management storage. > With the param removal, shorten the name of the functions since they now > has nothing to do with rocker port. > > Signed-off-by: Jiri Pirko <j...@mellanox.com> Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v2 03/13] rocker: rename rocker.c to rocker_main.c
On Mon, Oct 5, 2015 at 10:43 AM, Jiri Pirko <j...@resnulli.us> wrote: > From: Jiri Pirko <j...@mellanox.com> > > Since "rocker.c" is going to be split into multiple files, start with > renaming original "rocker.c" file to "rocker_main.c". Multiple code > parts are going to be cut from "rocker_main.c" later on. > > Signed-off-by: Jiri Pirko <j...@mellanox.com> Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v2 06/13] rocker: introduce worlds infrastructure
On Mon, Oct 5, 2015 at 10:43 AM, Jiri Pirkowrote: > From: Jiri Pirko > > This is another step on the way to per-world clean cut. Introduce world > ops hooks which each world can implement in world-specific way. > Also introduce world infrastructure including function for port worlds > change. > > Signed-off-by: Jiri Pirko > --- > v1->v2: > - removed functions to change worlds based on name, as rtnl mode >set patch is removed from patchset. > --- > drivers/net/ethernet/rocker/rocker.h | 56 > drivers/net/ethernet/rocker/rocker_main.c | 464 > +- > 2 files changed, 519 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/rocker/rocker.h > b/drivers/net/ethernet/rocker/rocker.h > index 650caa0..d49bc5d 100644 > --- a/drivers/net/ethernet/rocker/rocker.h > +++ b/drivers/net/ethernet/rocker/rocker.h > @@ -12,7 +12,11 @@ > #ifndef _ROCKER_H > #define _ROCKER_H > > +#include > #include > +#include > +#include > +#include > > #include "rocker_hw.h" > > @@ -24,4 +28,56 @@ struct rocker_desc_info { > dma_addr_t mapaddr; > }; > > +struct rocker; > +struct rocker_port; > + > +struct rocker_world_ops { > + const char *kind; > + size_t priv_size; > + size_t port_priv_size; > + u8 mode; > + int (*init)(struct rocker *rocker, void *priv); > + void (*fini)(void *priv); > + int (*port_init)(struct rocker_port *rocker_port, void *priv, > +void *port_priv); > + void (*port_fini)(void *port_priv); > + int (*port_open)(void *port_priv); > + void (*port_stop)(void *port_priv); > + int (*port_attr_stp_state_set)(void *port_priv, u8 state, > + struct switchdev_trans *trans); > + int (*port_attr_bridge_flags_set)(void *port_priv, > + unsigned long brport_flags, > + struct switchdev_trans *trans); > + int (*port_attr_bridge_flags_get)(void *port_priv, > + unsigned long *p_brport_flags); > + int (*port_obj_vlan_add)(void *port_priv, > +const struct switchdev_obj_port_vlan *vlan, > +struct switchdev_trans *trans); > + int (*port_obj_vlan_del)(void *port_priv, > +const struct switchdev_obj_port_vlan *vlan); > + int (*port_obj_vlan_dump)(void *port_priv, > + struct switchdev_obj_port_vlan *vlan, > + switchdev_obj_dump_cb_t *cb); > + int (*port_obj_fib4_add)(void *port_priv, > +const struct switchdev_obj_ipv4_fib *fib4, > +struct switchdev_trans *trans); > + int (*port_obj_fib4_del)(void *port_priv, > +const struct switchdev_obj_ipv4_fib *fib4); > + int (*port_obj_fdb_add)(void *port_priv, > + const struct switchdev_obj_port_fdb *fdb, > + struct switchdev_trans *trans); > + int (*port_obj_fdb_del)(void *port_priv, > + const struct switchdev_obj_port_fdb *fdb); > + int (*port_obj_fdb_dump)(void *port_priv, > +struct switchdev_obj_port_fdb *fdb, > +switchdev_obj_dump_cb_t *cb); > + int (*port_master_linked)(void *port_priv, struct net_device *master); > + int (*port_master_unlinked)(void *port_priv, struct net_device > *master); > + int (*port_neigh_update)(void *port_priv, struct neighbour *n); > + int (*port_neigh_destroy)(void *port_priv, struct neighbour *n); > + int (*port_ev_mac_vlan_seen)(void *port_priv, > +const unsigned char *addr, > +__be16 vlan_id); > +}; Yuck, void *. Can we do better? How about using base struct rocker_port and then sub-classing world-specific rocker ports. And make rocker_world_ops pass struct rocker_port * (for the port-centric ops). struct rocker_port { // common stuff for all worlds }; struct my_world_port { struct rocker_port rocker_port; // world-specific stuff for port }; #define MY_WORLD_PORT(rocker_port) \ container_of(rocker_port, struct my_world_port, rocker_port) Same for world-centric ops: struct rocker_world { // common stuff for all worlds }; struct my_world { struct rocker_world rocker_world; // stuff for this world }; > +static int rocker_world_port_ev_mac_vlan_seen(struct rocker_port > *rocker_port, > + const unsigned char *addr, > + __be16 vlan_id); > > static int rocker_event_mac_vlan_seen(const struct rocker *rocker,
Re: [patch net-next v2 02/13] rocker: rename rocker.h to rocker_hw.h
On Mon, Oct 5, 2015 at 10:43 AM, Jiri Pirko <j...@resnulli.us> wrote: > From: Jiri Pirko <j...@mellanox.com> > > Since "rocker.h" file is going to be used for different purpose, > rename the hardware-specific header to "rocker_hw.h". > > Signed-off-by: Jiri Pirko <j...@mellanox.com> Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v2 04/13] rocker: push tlv processing into separate files
On Mon, Oct 5, 2015 at 10:43 AM, Jiri Pirko <j...@resnulli.us> wrote: > From: Jiri Pirko <j...@mellanox.com> > > Carve out TLV processing helpers into separate files. > > Signed-off-by: Jiri Pirko <j...@mellanox.com> Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v2 00/13] rocker: add support for multiple worlds
On Mon, Oct 5, 2015 at 10:43 AM, Jiri Pirkowrote: > From: Jiri Pirko > > This patchset allows new rocker worlds to be easily added in future (like eBPF > based one I have been working on). The main part of the patchset is the OF-DPA > carve-out. It resuts in OF-DPA specific file. Clean cut. > > v1->v2: > - rtnl rocker mode change userspace expose patch was removed > > Jiri Pirko (13): > rocker: remove unused rocker_port param from alloc funcs and shorten > their names > rocker: rename rocker.h to rocker_hw.h > rocker: rename rocker.c to rocker_main.c > rocker: push tlv processing into separate files > rocker: implement set settings mode command > rocker: introduce worlds infrastructure > rocker: introduce OF-DPA world skeleton > rocker: set default world on port probe and clean world on remove > rocker: pass "learning" value as a parameter to > rocker_port_set_learning > rocker: pre-allocate wait structures during cmd ring init > rocker: remove trans parameter to rocker_cmd_exec function > rocker: call rocker_cmd_exec function with "nowait" boolean instead of > flags > rocker: move OF-DPA stuff into separate file A couple of my tests are failing with this patchset. A simple port test is failing and IPv4 routing test is failing. The port test is simple: just connect a port on DUT to a port on another system and assign an IP address to each port and verify IP connectivity. I have this: DUT:sw1p1 (11.0.0.1/24) <---> host1:eth0 (11.0.0.2/24) The IPv4 routing tests is a bit more complicated to setup. I'm using OSPF, but I'm not seeing full routes formed in the topology, so I suspect OSPF hellos aren't getting thru. Please fix find/fix these issues and send v3. I don't want any git bisect issues when running tests. Thanks. -scott -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next 09/14] rocker: add rtnl ops for port mode [gs]etting
On Mon, Oct 5, 2015 at 9:50 AM, Jiri Pirkowrote: > Mon, Oct 05, 2015 at 06:37:20PM CEST, sfel...@gmail.com wrote: >>On Mon, Oct 5, 2015 at 8:53 AM, Jiri Pirko wrote: >>> Mon, Oct 05, 2015 at 05:41:29PM CEST, sfel...@gmail.com wrote: On Sun, Oct 4, 2015 at 2:25 PM, Jiri Pirko wrote: > From: Jiri Pirko > > Introduce a stub for allowing user to change rocker port world/mode. > This is implemented using rtnl changelink op. > > Signed-off-by: Jiri Pirko [cut] > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h > index 3a5f263..7da768e 100644 > --- a/include/uapi/linux/if_link.h > +++ b/include/uapi/linux/if_link.h > @@ -416,6 +416,17 @@ enum { > }; > #define IFLA_GENEVE_MAX(__IFLA_GENEVE_MAX - 1) > > +/* Rocker section */ > +enum { > + IFLA_ROCKER_UNSPEC, > + IFLA_ROCKER_MODE, > + __IFLA_ROCKER_MAX, > +}; > + > +#define IFLA_ROCKER_MAX(__IFLA_ROCKER_MAX - 1) > + > +#define ROCKER_MODE_MAX 16 Does this mean there is going to be a "ip link set dev DEV type rocker mode MODE" command option? It doesn't seem right to be adding driver-specific IFLA_'s here. I think this sets bad precedence for other drivers to add their own knobs without thinking about a generic shared mechanism. >>> >>> I understand you point. I somehow share it as well. But on the other >>> hand, That is neat way to set mode of the port, and I cannot find other >>> this nice. >>> >>> Actually, I don't see the point of letting the user dynamically change the port mode. I would prefer this knob be moved to qemu/rocker. Let the port mode be specified on device creation. >>> >>> Hmm, interesting, why? I find it great for user to be able to switch the >>> port mode easily on the running system. It is a setting of a port. >>> I don't see why this should be a hard-coded hw setting. We just have >>> to find a way to expose this to user. >> >>You could just as easy restart the qemu session with different port >>mode. Even if rocker was a real device, I don't see (real) users >>wanting to change the port mode at run-time. It just doesn't seem >>like a real-world scenario. I know it's convenient for developers, > > We the hw-iface is already designed for on-fly changes. > I'm convinced that this is one of the situation where we should not do > stuff in HW just because we can. (Or we should do both). I've used that argument myself for other things, such as the ageing timer in rocker driver, so I agree with you on that point. > We should develop in-kernel solution in order to wrap up HW functionality. > Even if not easy. That is the purpose of rocker from day 1, and that is > why we love it :) Well then you need to define a generic interface for this concept of changing port mode that's not rocker-specific. What you have in if_link.h is easy. What's the not-easy solution? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next 09/14] rocker: add rtnl ops for port mode [gs]etting
On Sun, Oct 4, 2015 at 2:25 PM, Jiri Pirkowrote: > From: Jiri Pirko > > Introduce a stub for allowing user to change rocker port world/mode. > This is implemented using rtnl changelink op. > > Signed-off-by: Jiri Pirko [cut] > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h > index 3a5f263..7da768e 100644 > --- a/include/uapi/linux/if_link.h > +++ b/include/uapi/linux/if_link.h > @@ -416,6 +416,17 @@ enum { > }; > #define IFLA_GENEVE_MAX(__IFLA_GENEVE_MAX - 1) > > +/* Rocker section */ > +enum { > + IFLA_ROCKER_UNSPEC, > + IFLA_ROCKER_MODE, > + __IFLA_ROCKER_MAX, > +}; > + > +#define IFLA_ROCKER_MAX(__IFLA_ROCKER_MAX - 1) > + > +#define ROCKER_MODE_MAX 16 Does this mean there is going to be a "ip link set dev DEV type rocker mode MODE" command option? It doesn't seem right to be adding driver-specific IFLA_'s here. I think this sets bad precedence for other drivers to add their own knobs without thinking about a generic shared mechanism. Actually, I don't see the point of letting the user dynamically change the port mode. I would prefer this knob be moved to qemu/rocker. Let the port mode be specified on device creation. -scott -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next 09/14] rocker: add rtnl ops for port mode [gs]etting
On Mon, Oct 5, 2015 at 8:53 AM, Jiri Pirkowrote: > Mon, Oct 05, 2015 at 05:41:29PM CEST, sfel...@gmail.com wrote: >>On Sun, Oct 4, 2015 at 2:25 PM, Jiri Pirko wrote: >>> From: Jiri Pirko >>> >>> Introduce a stub for allowing user to change rocker port world/mode. >>> This is implemented using rtnl changelink op. >>> >>> Signed-off-by: Jiri Pirko >> >>[cut] >> >>> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h >>> index 3a5f263..7da768e 100644 >>> --- a/include/uapi/linux/if_link.h >>> +++ b/include/uapi/linux/if_link.h >>> @@ -416,6 +416,17 @@ enum { >>> }; >>> #define IFLA_GENEVE_MAX(__IFLA_GENEVE_MAX - 1) >>> >>> +/* Rocker section */ >>> +enum { >>> + IFLA_ROCKER_UNSPEC, >>> + IFLA_ROCKER_MODE, >>> + __IFLA_ROCKER_MAX, >>> +}; >>> + >>> +#define IFLA_ROCKER_MAX(__IFLA_ROCKER_MAX - 1) >>> + >>> +#define ROCKER_MODE_MAX 16 >> >>Does this mean there is going to be a "ip link set dev DEV type rocker >>mode MODE" command option? >> >>It doesn't seem right to be adding driver-specific IFLA_'s here. I >>think this sets bad precedence for other drivers to add their own >>knobs without thinking about a generic shared mechanism. > > I understand you point. I somehow share it as well. But on the other > hand, That is neat way to set mode of the port, and I cannot find other > this nice. > > >> >>Actually, I don't see the point of letting the user dynamically change >>the port mode. I would prefer this knob be moved to qemu/rocker. Let >>the port mode be specified on device creation. > > Hmm, interesting, why? I find it great for user to be able to switch the > port mode easily on the running system. It is a setting of a port. > I don't see why this should be a hard-coded hw setting. We just have > to find a way to expose this to user. You could just as easy restart the qemu session with different port mode. Even if rocker was a real device, I don't see (real) users wanting to change the port mode at run-time. It just doesn't seem like a real-world scenario. I know it's convenient for developers, but a qemu knob is just as convenient. Doing this in qemu will keep the rocker driver less complicated, cleaner. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v2 3/6] switchdev: rename switchdev_obj_vlan to switchdev_obj_port_vlan
On Thu, Oct 1, 2015 at 2:03 AM, Jiri Pirko <j...@resnulli.us> wrote: > From: Jiri Pirko <j...@mellanox.com> > > Make the struct name in sync with object id name. > > Suggested-by: Vivien Didelot <vivien.dide...@savoirfairelinux.com> > Signed-off-by: Jiri Pirko <j...@mellanox.com> Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v2 2/6] switchdev: rename SWITCHDEV_ATTR_* enum values to SWITCHDEV_ATTR_ID_*
On Thu, Oct 1, 2015 at 2:03 AM, Jiri Pirko <j...@resnulli.us> wrote: > From: Jiri Pirko <j...@mellanox.com> > > To be aligned with obj. > > Signed-off-by: Jiri Pirko <j...@mellanox.com> Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v2 5/6] switchdev: bring back switchdev_obj and use it as a generic object param
On Thu, Oct 1, 2015 at 2:03 AM, Jiri Pirko <j...@resnulli.us> wrote: > From: Jiri Pirko <j...@mellanox.com> > > Replace "void *obj" with a generic structure. Introduce couple of > helpers along that. > > Signed-off-by: Jiri Pirko <j...@mellanox.com> > --- > v1->v2: > * align object structure getter name with structure name as suggested by > Vivien > --- Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v2 1/6] switchdev: rename SWITCHDEV_OBJ_* enum values to SWITCHDEV_OBJ_ID_*
On Thu, Oct 1, 2015 at 2:03 AM, Jiri Pirko <j...@resnulli.us> wrote: > From: Jiri Pirko <j...@mellanox.com> > > Suggested-by: Vivien Didelot <vivien.dide...@savoirfairelinux.com> > Signed-off-by: Jiri Pirko <j...@mellanox.com> Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v2 6/6] switchdev: push object ID back to object structure
On Thu, Oct 1, 2015 at 2:03 AM, Jiri Pirko <j...@resnulli.us> wrote: > From: Jiri Pirko <j...@mellanox.com> > > Suggested-by: Scott Feldman <sfel...@gmail.com> > Signed-off-by: Jiri Pirko <j...@mellanox.com> Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v2 4/6] switchdev: rename switchdev_obj_fdb to switchdev_obj_port_fdb
On Thu, Oct 1, 2015 at 2:03 AM, Jiri Pirko <j...@resnulli.us> wrote: > From: Jiri Pirko <j...@mellanox.com> > > Make the struct name in sync with object id name. > > Suggested-by: Vivien Didelot <vivien.dide...@savoirfairelinux.com> > Signed-off-by: Jiri Pirko <j...@mellanox.com> Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v3 02/10] switchdev: introduce transaction item queue for attr_set and obj_add
On Wed, Sep 30, 2015 at 11:56 AM, Vivien Didelot <vivien.dide...@savoirfairelinux.com> wrote: > Hi all, > > On Sep. Friday 25 (39) 11:03 AM, Vivien Didelot wrote: >> On Sep. Thursday 24 (39) 10:55 PM, David Miller wrote: >> > From: Scott Feldman <sfel...@gmail.com> >> > Date: Thu, 24 Sep 2015 22:29:43 -0700 >> > >> > > I'd rather keep 2-phase not optional, or at least make it some what of >> > > a pain for drivers to opt-out of 2-phase. Forcing the driver to see >> > > both phases means the driver needs to put some code to skip phase 1 >> > > (and hopefully has some persistent comment explaining why its being >> > > skipped). Something like: >> > > >> > > /* I'm skipping phase 1 prepare for this operation. I have infinite >> > > hardware >> > > * resources and I'm not setting any persistent state in the driver or >> > > device >> > > * and I don't need any dynamic resources from the kernel, so its >> > > impossible >> > > * for me to fail phase 2 commit. Nothing to prepare, sorry. >> > > */ >> > >> > I agree with Scott here. >> > >> > If you can opt out of something, you can not think about it and thus >> > more likely get it wrong. >> > >> > I can just see a driver not implementing prepare at all and then doing >> > stupid things in commit when they hit some resource limit or whatever, >> > rather than taking care of such issues in prepare. >> >> OK, I have no experience with stacked devices nor what it actually looks >> like, but I understand that it is a redundant setup where it makes sense >> to ensure that an operation is feasible before programming the hardware. >> >> I agree with both of you on imposing switchdev drivers such notion. >> >> I was confused with the rtnl lock (from bridge netlink requests) which >> seemed to limit a lot the usage of this prepare phase. >> >> I don't know the batch mode neither, but I can think about a potentially >> powerful usage of the prepare phase in Marvell switches (or any basic >> home router switches), please tell me if the following is feasible: >> >> Every hardware VLANs I know of are programmed with all port membership >> in one shot. This is not feasible today with the bridge command. If I >> could bundle in one request the equivalent of ("VID 100: 0u 1u 5t"): >> >> bridge vlan add master dev swp0 vid 100 pvid untagged >> bridge vlan add master dev swp1 vid 100 pvid untagged >> bridge vlan add master dev swp5 vid 100 # cpu >> >> In such case the prepare phase could be great to allocate and populate a >> VLAN entry structure (i.e. struct mv88e6xxx_vtu_stu_entry) before >> programming the hardware *just once*. Is that doable? > > May I get answers for this? I'd need that in order to suggest a next > step for the prepare phase in DSA drivers. You know we're just making this all up as we go, right? ;) Actually, bringing real-world examples us think about solutions, so thank you. A partial answer to your question, I guess, is to consider switchdev_port_obj_add(dev, obj), which is basically: switchdev_port_obj_add(dev, obj) dev->ops->switchdev_port_obj_add(dev, obj, PREPARE) if no err dev->ops->switchdev_port_obj_add(dev, obj, COMMIT) else // abort transaction Seems you could make a batch version that iterates over list of devs to add the same obj: switchdev_port_obj_add_batch(devs, obj) for dev in devs: dev->ops->switchdev_port_obj_add(dev, obj, PREPARE) if no errs: for dev in devs: dev->ops->switchdev_port_obj_add(dev, obj, COMMIT) else // abort transaction (on all devs) Who calls this _batch version with a list of devs is TBD. Not sure if you can get the -batch and -force options in iproute2 ip cmd to get you here. And as I look back at your example, you really have two diff objs in your cmd list. First obj is vlan 100 pvid untagged, second obj is vlan 100. Would this be two batches? Oh, hmmm, maybe the batch version iterates over list of {dev, obj} tuples. Something like (sorry, my pseudo code is turning into python): switchdev_port_obj_add_batch(dev_objs) for dev, obj in dev_objs.items: dev->ops->switchdev_port_obj_add(dev, obj, PREPARE) if no errs: for dev, obj in dev_objs.items: dev->ops->switchdev_port_obj_add(dev, obj, COMMIT) else // abort transaction (on all devs) Does this help? Maybe we should walk before running and focus on getting non-batch ops working and then revisit? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next] switchdev: bring back switchdev_obj and use it as a generic object param
On Wed, Sep 30, 2015 at 9:00 AM, Jiri Pirkowrote: > From: Jiri Pirko > > Replace "void *obj" with a generic structure. Introduce couple of > helpers along that. > > Signed-off-by: Jiri Pirko Looks good to me, except for the macro/enum collision Vivien pointed out. Vivien's suggestion to use SWITCHDEV_OBJ_ID_xxx for ID enums would work. Actually, id should be moved back into switchdev_obj, otherwise that info is lost in dump callbacks. Could have a common callback that switches on id, for id specific work. I'll submit a patch to move id back into obj if Jiri doesn't add it to this one. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 net-next 6/6] net: switchdev: extract struct switchdev_obj_*
On Wed, Sep 30, 2015 at 3:36 AM, Jiri Pirkowrote: > Tue, Sep 29, 2015 at 06:07:18PM CEST, vivien.dide...@savoirfairelinux.com > wrote: >>Now that switchdev and its drivers directly use specific switchdev_obj_* >>structures, move them out of the switchdev_obj union and get rif of this >>outer structure. >> >>Signed-off-by: Vivien Didelot >>--- >> include/net/switchdev.h | 53 >> - >> 1 file changed, 26 insertions(+), 27 deletions(-) >> >>diff --git a/include/net/switchdev.h b/include/net/switchdev.h >>index bcadac3..e11425e 100644 >>--- a/include/net/switchdev.h >>+++ b/include/net/switchdev.h >>@@ -64,30 +64,29 @@ enum switchdev_obj_id { >> SWITCHDEV_OBJ_PORT_FDB, >> }; >> >>-struct switchdev_obj { >>- enum switchdev_obj_id id; >>- int (*cb)(struct switchdev_obj *obj); >>- union { >>- struct switchdev_obj_vlan { /* PORT_VLAN */ >>- u16 flags; >>- u16 vid_begin; >>- u16 vid_end; >>- } vlan; >>- struct switchdev_obj_ipv4_fib { /* IPV4_FIB */ >>- u32 dst; >>- int dst_len; >>- struct fib_info *fi; >>- u8 tos; >>- u8 type; >>- u32 nlflags; >>- u32 tb_id; >>- } ipv4_fib; >>- struct switchdev_obj_fdb { /* PORT_FDB */ >>- const unsigned char *addr; >>- u16 vid; >>- u16 ndm_state; >>- } fdb; >>- } u; >>+/* SWITCHDEV_OBJ_PORT_VLAN */ >>+struct switchdev_obj_vlan { >>+ u16 flags; >>+ u16 vid_begin; >>+ u16 vid_end; >>+}; >>+ >>+/* SWITCHDEV_OBJ_IPV4_FIB */ >>+struct switchdev_obj_ipv4_fib { >>+ u32 dst; >>+ int dst_len; >>+ struct fib_info *fi; >>+ u8 tos; >>+ u8 type; >>+ u32 nlflags; >>+ u32 tb_id; >>+}; >>+ >>+/* SWITCHDEV_OBJ_PORT_FDB */ >>+struct switchdev_obj_fdb { >>+ const unsigned char *addr; >>+ u16 vid; >>+ u16 ndm_state; >> }; > > > I don't like these structs being passed down as a "void *". I think that > we should have some "common" struct for these objects, event if it would > be empty and pass it down. "void *" does not look good at all, does not > tell the reader what that param is about. How about: > > struct switchdev_obj { > }; > > struct switchdev_obj_vlan { > struct switchdev_obj obj; > u16 flags; > u16 vid_begin; > u16 vid_end; > }; > #define SWITCHDEV_OBJ_VLAN(obj) \ > container_of(obj, struct switchdev_obj_vlan, obj) > > /* SWITCHDEV_OBJ_IPV4_FIB */ > struct switchdev_obj_ipv4_fib { > struct switchdev_obj obj; > u32 dst; > int dst_len; > struct fib_info *fi; > u8 tos; > u8 type; > u32 nlflags; > u32 tb_id; > }; > #define SWITCHDEV_OBJ_IPV4_FIB(obj) \ > container_of(obj, struct switchdev_obj_ipv4_fib, obj) > > /* SWITCHDEV_OBJ_PORT_FDB */ > struct switchdev_obj_fdb { > struct switchdev_obj obj; > const unsigned char *addr; > u16 vid; > u16 ndm_state; > }; > #define SWITCHDEV_OBJ_FDB(obj) \ > container_of(obj, struct switchdev_obj_fdb, obj) > > > then pass struct switchdev_obj *obj down to drivers and in driver, get > original object by SWITCHDEV_OBJ_* ? Yes, I agree with Jiri, keep the struct switchdev_obj, use it within each specific struct, and pass struct switchdev * in core funcs rather than void *. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v3 00/10] switchdev: transaction item queue and cleanup
On Thu, Sep 24, 2015 at 1:02 AM, Jiri Pirko <j...@resnulli.us> wrote: > From: Jiri Pirko <j...@mellanox.com> > > Jiri Pirko (10): > switchdev: rename "trans" to "trans_ph". > switchdev: introduce transaction item queue for attr_set and obj_add > switchdev: move transaction phase enum under transaction structure > switchdev: add switchdev_trans_ph_prepare/commit helpers > rocker: push struct switchdev_trans down through rocker code > rocker: use switchdev transaction queue for allocated memory > switchdev: remove "NONE" transaction phase > switchdev: remove "ABORT" transaction phase > dsa: use prepare/commit switchdev transaction helpers > switchdev: reduce transaction phase enum down to a boolean > > Documentation/networking/switchdev.txt | 19 ++ > drivers/net/ethernet/rocker/rocker.c | 308 > ++--- > include/net/switchdev.h| 40 +++-- > net/dsa/slave.c| 31 ++-- > net/switchdev/switchdev.c | 125 ++++++--- > 5 files changed, 296 insertions(+), 227 deletions(-) Tests pass. ACK on all. Acked-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 0/4] switchdev: push bridge attributes down
On Thu, Sep 24, 2015 at 2:05 PM, Stephen Hemminger <step...@networkplumber.org> wrote: > On Thu, 24 Sep 2015 13:59:26 -0700 > sfel...@gmail.com wrote: > >> From: Scott Feldman <sfel...@gmail.com> >> >> Push bridge-level attributes down to switchdev drivers. This patchset >> adds the infrastructure and then pushes, as an example, ageing_time attribute >> down from bridge to switchdev (rocker) driver. Add some range-checking >> for ageing_time. >> >> # ip link set dev br0 type bridge ageing_time 1000 >> >> # ip link set dev br0 type bridge ageing_time 999 >> RTNETLINK answers: Numerical result out of range >> >> Up until now, switchdev attrs where port-level attrs, so the netdev used in >> switchdev_attr_set() would be a switch port or bond of switch ports. With >> bridge-level attrs, the netdev passed to switchdev_attr_set() is the bridge >> netdev. The same recusive algo is used to visit the leaves of the stacked >> drivers to set the attr, it's just in this case we start one layer higher in >> the stack. One note is not all ports in the bridge may support setting a >> bridge-level attribute, so rather than failing the entire set, we'll skip >> over >> those ports returning -EOPNOTSUPP. > > > Rather than having more in bridge, shouldn't this just be a netlink event? You lost me? Oh, do you mean netdev notifier? Jiri (privately) had suggested that also. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 0/4] switchdev: push bridge attributes down
On Thu, Sep 24, 2015 at 6:23 PM, Florian Fainelli <f.faine...@gmail.com> wrote: > On 24/09/15 13:59, sfel...@gmail.com wrote: >> From: Scott Feldman <sfel...@gmail.com> >> >> Push bridge-level attributes down to switchdev drivers. This patchset >> adds the infrastructure and then pushes, as an example, ageing_time attribute >> down from bridge to switchdev (rocker) driver. Add some range-checking >> for ageing_time. >> >> # ip link set dev br0 type bridge ageing_time 1000 >> >> # ip link set dev br0 type bridge ageing_time 999 >> RTNETLINK answers: Numerical result out of range >> >> Up until now, switchdev attrs where port-level attrs, so the netdev used in >> switchdev_attr_set() would be a switch port or bond of switch ports. With >> bridge-level attrs, the netdev passed to switchdev_attr_set() is the bridge >> netdev. The same recusive algo is used to visit the leaves of the stacked >> drivers to set the attr, it's just in this case we start one layer higher in >> the stack. One note is not all ports in the bridge may support setting a >> bridge-level attribute, so rather than failing the entire set, we'll skip >> over >> those ports returning -EOPNOTSUPP. > > So, without a better device to hold that kind of information (in the > future it could be a global, switch-specific device holding that > information), I agree with your decision to take the bridge device to > hold that attribute, it still feels a bit uncomfortable to have > switchdev_attr_port() take a bridge device parameter, but whatever, here > is a scenario I am wondering how we would want to proceed with: > > - suppose we have a switch which is only able to control ageing > globally, not per port or any other kind of logical domain > > - we have enabled two software bridges on the same physical switch, with > different ageing timeouts > > It does not seem to me like it hurts ageing the other bridge faster than > expected (even though that could be expensive for MDIO devices), but we > would need to have consistent reporting here for the other bridge. It could hurt a little bit by ageing out entries prematurely, forcing relearning. ;) I think if the switch can't support multiple ageing timeouts (which is probably typical), then the switch driver should not implement this switchdev attr at the port level (this patchset). So how does ageing timeout get set for a switch with a global timer? I believe we need the switch-specific device to handle those switch-global attr sets. I'll send out a refresh of my RFC patches for switch device class, and add this ageing_time attr. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v3 02/10] switchdev: introduce transaction item queue for attr_set and obj_add
On Thu, Sep 24, 2015 at 9:36 PM, Vivien Didelotwrote: > Hi Jiri, > > On Sep. Thursday 24 (39) 10:02 AM, Jiri Pirko wrote: >> From: Jiri Pirko >> >> Now, the memory allocation in prepare/commit state is done separatelly >> in each driver (rocker). Introduce the similar mechanism in generic >> switchdev code, in form of queue. That can be used not only for memory >> allocations, but also for different items. Abort item destruction >> is handled as well. >> >> Signed-off-by: Jiri Pirko > > [...] > >> /** >> * struct switchdev_ops - switchdev operations >> * >> @@ -94,9 +110,11 @@ struct switchdev_ops { >> int (*switchdev_port_attr_get)(struct net_device *dev, >> struct switchdev_attr *attr); >> int (*switchdev_port_attr_set)(struct net_device *dev, >> -struct switchdev_attr *attr); >> +struct switchdev_attr *attr, >> +struct switchdev_trans *trans); >> int (*switchdev_port_obj_add)(struct net_device *dev, >> - struct switchdev_obj *obj); >> + struct switchdev_obj *obj, >> + struct switchdev_trans *trans); >> int (*switchdev_port_obj_del)(struct net_device *dev, >> struct switchdev_obj *obj); >> int (*switchdev_port_obj_dump)(struct net_device *dev, > > This version is better than the current state, but it's too bad we > didn't get feedback yet on the real purpose of this 2-phase model... Hmmm...I think I missed the question about the 2-phase model. Let me see if I can answer it here. We got here first when working with stacked drivers and we wanted to propagate an change (attr set or obj add) down the stack in a safe way as to not leave hardware or software in an inconsistent state when unexpected things happen. The simple stacked driver example was two switch ports bonded. Setting an attr on the bond would propagate down to each switch port. If the setting stuck on the first port, but didn't on the second port, we didn't want to have to go back and undo the setting on the first port. It gets messy real fast trying to handle the undo operation. So 2-phase model was suggested by davem to solve the problem in a generic way. First phase was to check with each device in the stack if the operation is OK, and second phase would commit operation. The second phase cannot fail; the first phase already said everything was OK to proceed. That means no running out of hardware resources in phase 2. Phase 1 should have check/reserved the resources. And no OOM situations in phase 2. And so on for other detectable failures. Stacked driver case is one case where the 2-phase model strives to maintain hw/sw consistency. I could see it working for batch operations also, although we don't have any examples of batch processing yet (in switchdev). I'd rather keep 2-phase not optional, or at least make it some what of a pain for drivers to opt-out of 2-phase. Forcing the driver to see both phases means the driver needs to put some code to skip phase 1 (and hopefully has some persistent comment explaining why its being skipped). Something like: /* I'm skipping phase 1 prepare for this operation. I have infinite hardware * resources and I'm not setting any persistent state in the driver or device * and I don't need any dynamic resources from the kernel, so its impossible * for me to fail phase 2 commit. Nothing to prepare, sorry. */ -scott -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next v2 02/10] switchdev: introduce transaction item queue for attr_set and obj_add
On Wed, Sep 23, 2015 at 12:57 AM, Jiri Pirkowrote: > From: Jiri Pirko > > Now, the memory allocation in prepare/commit state is done separatelly > in each driver (rocker). Introduce the similar mechanism in generic > switchdev code, in form of queue. That can be used not only for memory > allocations, but also for different items. Commit/abort item destruction > is handled as well. > > Signed-off-by: Jiri Pirko > --- > v1-v2: > - added comment blocks to exported functions as suggested by Scott > - made switchdev_trans_init and switchdev_trans_items_destroy static > - added documentation for this to switchdev.txt as suggested by Scott [cut] > @@ -232,10 +304,13 @@ static int __switchdev_port_obj_add(struct net_device > *dev, > */ > int switchdev_port_obj_add(struct net_device *dev, struct switchdev_obj *obj) > { > + struct switchdev_trans trans; > int err; > > ASSERT_RTNL(); > > + switchdev_trans_init(); > + > /* Phase I: prepare for obj add. Driver/device should fail > * here if there are going to be issues in the commit phase, > * such as lack of resources or support. The driver/device > @@ -244,7 +319,7 @@ int switchdev_port_obj_add(struct net_device *dev, struct > switchdev_obj *obj) > */ > > obj->trans_ph = SWITCHDEV_TRANS_PREPARE; > - err = __switchdev_port_obj_add(dev, obj); > + err = __switchdev_port_obj_add(dev, obj, ); > if (err) { > /* Prepare phase failed: abort the transaction. Any > * resources reserved in the prepare phase are > @@ -253,7 +328,8 @@ int switchdev_port_obj_add(struct net_device *dev, struct > switchdev_obj *obj) > > if (err != -EOPNOTSUPP) { > obj->trans_ph = SWITCHDEV_TRANS_ABORT; > - __switchdev_port_obj_add(dev, obj); > + __switchdev_port_obj_add(dev, obj, ); > + switchdev_trans_items_destroy(); > } > > return err; > @@ -265,8 +341,9 @@ int switchdev_port_obj_add(struct net_device *dev, struct > switchdev_obj *obj) > */ > > obj->trans_ph = SWITCHDEV_TRANS_COMMIT; > - err = __switchdev_port_obj_add(dev, obj); > + err = __switchdev_port_obj_add(dev, obj, ); > WARN(err, "%s: Commit of object (id=%d) failed.\n", dev->name, > obj->id); > + switchdev_trans_items_destroy(); Before this patchset, rocker would BUG_ON that the list was empty after commit. This was really helpful in finding bugs in the driver as it meant the commit and prepare phases didn't follow the same code path. (Everything the driver queued in prepare was dequeued in commit, so the list should be empty after commit). Should this last switchdev_trans_items_destroy() be replaced with: BUG_ON(!list_empty(_list)); -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] rtnl_fdb_dump: catch errors from ndo_fdb_dump and ndo_dflt_fdb_dump
On Wed, Sep 23, 2015 at 9:21 AM, Roopa Prabhu <ro...@cumulusnetworks.com> wrote: > From: Wilson Kok <w...@cumulusnetworks.com> > > current ndo_fdb_dump and ndo_dflt_fdb_dump always return the current > fdb index. They dont return errors. Which results in fdb dumps > continuing on errors. > > In one such case where bridges and vxlan devices were involved, > bridge driver returned -EMSGSIZE on a bridge, but since it continued > on error, the next vxlan device fdb dump (which was smaller in size) > succeeded, leaving fdb idx at an inconsistent value. This > resulted in the bridge fdb entry getting skipped and vxlan > fdb entry getting dumped twice. > > This patch changes ndo_fdb_dump() to return the status and pass the > idx by reference for update. The dump aborts if non-zero status is > returned. > > Signed-off-by: Wilson Kok <w...@cumulusnetworks.com> > Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com> Reviewed-by: Scott Feldman <sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next 02/10] switchdev: introduce transaction item queue for attr_set and obj_add
On Tue, Sep 22, 2015 at 6:53 AM, Jiri Pirko <j...@resnulli.us> wrote: > From: Jiri Pirko <j...@mellanox.com> > > Now, the memory allocation in prepare/commit state is done separatelly > in each driver (rocker). Introduce the similar mechanism in generic > switchdev code, in form of queue. That can be used not only for memory > allocations, but also for different items. Commit/abort item destruction > is handled as well. > > Signed-off-by: Jiri Pirko <j...@mellanox.com> > diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c > index df5a544..55a8411 100644 > --- a/net/switchdev/switchdev.c > +++ b/net/switchdev/switchdev.c > @@ -1,6 +1,6 @@ > /* > * net/switchdev/switchdev.c - Switch device API > - * Copyright (c) 2014 Jiri Pirko <j...@resnulli.us> > + * Copyright (c) 2014-2015 Jiri Pirko <j...@resnulli.us> > * Copyright (c) 2014-2015 Scott Feldman <sfel...@gmail.com> > * > * This program is free software; you can redistribute it and/or modify > @@ -16,9 +16,56 @@ > #include > #include > #include > +#include > #include > #include > > +void switchdev_trans_item_enqueue(struct switchdev_trans *trans, > + void *data, void (*destructor)(void const > *), > + struct switchdev_trans_item *tritem) > +{ > + tritem->data = data; > + tritem->destructor = destructor; > + list_add_tail(>list, >item_list); > +} > +EXPORT_SYMBOL_GPL(switchdev_trans_item_enqueue); > + > +static struct switchdev_trans_item * > +__switchdev_trans_item_dequeue(struct switchdev_trans *trans) > +{ > + struct switchdev_trans_item *tritem; > + > + if (list_empty(>item_list)) > + return NULL; > + tritem = list_first_entry(>item_list, > + struct switchdev_trans_item, list); > + list_del(>list); > + return tritem; > +} > + > +void *switchdev_trans_item_dequeue(struct switchdev_trans *trans) > +{ > + struct switchdev_trans_item *tritem; > + > + tritem = __switchdev_trans_item_dequeue(trans); > + BUG_ON(!tritem); > + return tritem->data; > +} > +EXPORT_SYMBOL_GPL(switchdev_trans_item_dequeue); Please add header comment block for these EXPORT_SYMBOL_GPL funcs. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next 00/10] switchdev: transaction item queue and cleanup
On Tue, Sep 22, 2015 at 6:53 AM, Jiri Pirkowrote: > From: Jiri Pirko > > Jiri Pirko (10): > switchdev: rename "trans" to "trans_ph". > switchdev: introduce transaction item queue for attr_set and obj_add > switchdev: move transaction phase enum under transaction structure > switchdev: add switchdev_trans_ph_prepare/commit helpers > rocker: push struct switchdev_trans down through rocker code > rocker: use switchdev transaction queue for allocated memory > switchdev: remove "NONE" transaction phase > switchdev: remove "ABORT" transaction phase > dsa: use prepare/commit switchdev transaction helpers > switchdev: reduce transaction phase enum down to a boolean > > drivers/net/ethernet/rocker/rocker.c | 311 > +++ > include/net/switchdev.h | 40 +++-- > net/dsa/slave.c | 31 ++-- > net/switchdev/switchdev.c| 99 --- > 4 files changed, 252 insertions(+), 229 deletions(-) I like this version much better! Thank you for making the adjustments. My main concern about easily opting-out of prepare phase is gone. And I appreciate that you moved trans * as arg to attr_set and obj_add. Would you add a write-up in switchdev.txt about prepare-commit model and how to use the transaction object? Add it to this patchset, if you can. What testing have you done? I'll do testing today on my bench and report back, but I kind of want to know what to expect. Give a day or two for testing and review. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split
On Sat, Sep 19, 2015 at 5:29 AM, Jiri Pirkowrote: > Jiri Pirko (6): > switchdev: rename "trans" to "trans_ph". > switchdev: introduce transaction infrastructure for attr_set and > obj_add > rocker: switch to local transaction phase enum > switchdev: move transaction phase enum under transaction structure > rocker: use switchdev transaction queue for allocated memory > switchdev: split commit and prepare phase into two callbacks Patches compile, but first test bombs. Cut-and-paste of dump at end of this email. I'm not sure I'm liking this patchset because it looks like a way for switchdev drivers to easily opt-out of the prepare-commit transaction model by simply not implementing the *_pre op. I would rather drivers explicitly handle the PREPARE phase in code, even if that means skipping it gracefully (in code) with a comment (in code) explaining why it does not matter for this device/operation. That's what DSA had done, mostly because it was a retro-fit. Also, the patchset removes the ABORT callback in case of a rollback due to a failed PREPARE. We can't make the assumption that it's just a memory list to destroy on ABORT. The driver, on PREPARE, may have reserved device space or staged an operation on the device which we'll need to undo on ABORT. So we need ABORT back, and we need PREPARE to not be optional, so what's left list enqueue/dequeue helpers, which I'm not seeing much value in up-leveling as the driver can do list_add/del itself. Am I missing something? I didn't see a motivation statement for the RFC so I'm not sure where you wanted to take this. -scott [1.998791] BUG: unable to handle kernel NULL pointer dereference at 0008 [2.05] IP: [] rocker_port_kzalloc.isra.50+0x5/0x10 [rocker] [2.05] PGD 0 [2.05] Oops: [#1] SMP [2.05] Modules linked in: floppy(+) ata_piix(+) libata rocker(+) virtio_pci(+) virtio_ring virtio scsi_mod [2.05] CPU: 0 PID: 91 Comm: modprobe Not tainted 4.2.0+ #3 [2.05] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014 [2.05] task: 88000f5e0800 ti: 88000f5f8000 task.ti: 88000f5f8000 [2.05] RIP: 0010:[] [] rocker_port_kzalloc.isra.50+0x5/0x10 [rocker] [2.05] RSP: 0018:88000f5fba20 EFLAGS: 00010246 [2.05] RAX: 88000f17c050 RBX: 88000b40 RCX: 0020 [2.05] RDX: RSI: RDI: [2.05] RBP: R08: R09: a0058680 [2.05] R10: 0001 R11: 9319 R12: [2.05] R13: 88000f5b7000 R14: 88000f5b7840 R15: [2.05] FS: 7fd132237700() GS:88000fc0() knlGS: [2.05] CS: 0010 DS: ES: CR0: 8005003b [2.05] CR2: 0008 CR3: 0f5d5000 CR4: 06f0 [2.05] Stack: [2.05] a005a32a 0001 00ff88000f5b7000 8175725a [2.05] a0057750 a0058680 0020001a [2.05] 88000b40 88000ea06000 88000f5b7000 [2.05] Call Trace: [2.05] [] ? rocker_cmd_exec+0x5a/0x1f0 [rocker] [2.05] [] ? rocker_cmd_get_port_stats_prep+0x90/0x90 [rocker] [2.05] [] ? rocker_cmd_get_port_settings_phys_name_proc+0xb0/0xb0 [rocker] [2.05] [] ? rocker_probe+0xb6a/0xe59 [rocker] [2.05] [] ? local_pci_probe+0x48/0xa0 [2.05] [] ? pci_device_probe+0x112/0x130 [2.05] [] ? driver_probe_device+0x196/0x2a0 [2.05] [] ? driver_probe_device+0x2a0/0x2a0 [2.05] [] ? __driver_attach+0x8d/0x90 [2.05] [] ? driver_probe_device+0x2a0/0x2a0 [2.05] [] ? bus_for_each_dev+0x4c/0x80 [2.05] [] ? bus_add_driver+0x119/0x220 [2.05] [] ? 0xa0065000 [2.05] [] ? driver_register+0x5a/0xe0 [2.05] [] ? 0xa0065000 [2.05] [] ? rocker_module_init+0x33/0x1000 [rocker] [2.05] [] ? do_one_initcall+0x90/0x1e0 [2.05] [] ? do_init_module+0x50/0x1d8 [2.05] [] ? load_module+0x1c1c/0x2240 [2.05] [] ? show_initstate+0x50/0x50 [2.05] [] ? SyS_init_module+0x104/0x130 [2.05] [] ? entry_SYSCALL_64_fastpath+0x16/0x75 [2.05] Code: 48 89 c1 48 c7 c2 10 dc 15 81 48 89 c6 e8 e4 0d 4a e1 48 8b 3b e8 1c 1b 4a e1 eb b6 66 2e 0f 1f 84 00 00 00 00 00 48 89 d1 89 f2 <8b> 77 08 e9 73 ff ff ff 0f 1f 00 48 83 ec 68 4c 89 7c 24 60 41 [2.05] RIP [] rocker_port_kzalloc.isra.50+0x5/0x10 [rocker] -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split
On Mon, Sep 21, 2015 at 1:09 AM, Jiri Pirkowrote: > Mon, Sep 21, 2015 at 09:23:24AM CEST, sfel...@gmail.com wrote: >>On Sat, Sep 19, 2015 at 5:29 AM, Jiri Pirko wrote: >>> Jiri Pirko (6): >>> switchdev: rename "trans" to "trans_ph". >>> switchdev: introduce transaction infrastructure for attr_set and >>> obj_add >>> rocker: switch to local transaction phase enum >>> switchdev: move transaction phase enum under transaction structure >>> rocker: use switchdev transaction queue for allocated memory >>> switchdev: split commit and prepare phase into two callbacks >> >>Patches compile, but first test bombs. Cut-and-paste of dump at end >>of this email. > > Told you :) > > >> >>I'm not sure I'm liking this patchset because it looks like a way for >>switchdev drivers to easily opt-out of the prepare-commit transaction >>model by simply not implementing the *_pre op. I would rather drivers >>explicitly handle the PREPARE phase in code, even if that means >>skipping it gracefully (in code) with a comment (in code) explaining >>why it does not matter for this device/operation. That's what DSA had >>done, mostly because it was a retro-fit. > > Each driver should handle this inside it. If it does not need prepare > state, it simply does not implement it. That is the same for all cb, > ndos, netdev notifiers, etc. It is much cleaner and nicer to have these as > separate callbacks. Implementing multiple callback in one is just ugly, > sorry. > > >> >>Also, the patchset removes the ABORT callback in case of a rollback >>due to a failed PREPARE. We can't make the assumption that it's just >>a memory list to destroy on ABORT. The driver, on PREPARE, may have >>reserved device space or staged an operation on the device which we'll >>need to undo on ABORT. > > Yep, just register an item with custom destructor, there you can do > whatever. Also, I believe much nicer comparing to current code. > > >> >>So we need ABORT back, and we need PREPARE to not be optional, so >>what's left list enqueue/dequeue helpers, which I'm not seeing much >>value in up-leveling as the driver can do list_add/del itself. > > Why would every driver do it itself, over and over when there can be a > clean infrastructure to do that. Including abort phase. Without the driver > needed to be involved. > > >> >>Am I missing something? I didn't see a motivation statement for the >>RFC so I'm not sure where you wanted to take this. > > I want to make current code much nicer, easier to read and implement in > other drivers. Look at rocker.c and how often there is == PREPARE there. > It's nearly impossible to followthe code, sorry. You're going to end up duplicate a lot of code and make it easier to make mistakes. The reason why the == PREPARE checks are there is the same code is run for both PREPARE and COMMIT, with == PREPARE checks right at the last moment to avoid a irreversible change, such as deleting an entry from a table or changing a state variable or bumping a hardware desc pointer. Otherwise the code paths should be the same for PREPARE and COMMIT. If you split these, you're going to duplicate lots of code and one path will deviate from the other over time and now we've broken the prepare-commit model. We want PREPARE to answer the question "is it OK to do this?" and COMMIT to do without failure. Do you have a specific example where you can't follow the code with == PREPARE? > My next patchset is to un-mess rocker.c (that freaking ofdpa stuff is > everywhere) Rocker implements OF-DPA so it's seems natural that freaking OF-DPA stuff would be everywhere. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 7/7] switchdev: update documentation on FDB ageing_time
On Sun, Sep 20, 2015 at 7:24 AM, roopa <ro...@cumulusnetworks.com> wrote: > On 9/19/15, 7:21 PM, Scott Feldman wrote: >> >> Yes, your switch driver is in user-space so you have to use NTF_USE to >> refresh the entry since you cannot use the kernel driver model to >> call_switchdev_notifiers(SWITCHDEV_FDB_ADD, ...). Consequently, your >> entries are not marked with NTF_EXT_LEARNED, so this patch is a no-op >> for you. You can continue to use the bridge driver to age out your >> entries. > > yes, correct. I was not really saying this because it will cause us any > problems. > I was trying to say this for switchdev in general. > >> I'd rather someone add that knob when it's actually needed. When the first >> in-kernel switchdev driver that wants to use the bridge driver's ageing >> function, then we can make that adjustment. > > I was suggesting the other way around. Keep the default to what is in the > kernel today and the first in-kernel switchdev driver that wants to age, > should introduce the ability to not age in the bridge driver (Rocker will > continue to work as it does today). Because, I am only concerned that rocker > may end up being the only device that uses the default behavior introduced > by this patch. And every real hardware uses the bridge driver to age > (because there are no in kernel examples today). I am curious to know who > else is using hardware ageing today. A driver patch for a (real) hardware device which does the ageing in hw is around the corner. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 7/7] switchdev: update documentation on FDB ageing_time
On Sat, Sep 19, 2015 at 6:21 PM, roopa <ro...@cumulusnetworks.com> wrote: > On 9/18/15, 12:55 PM, sfel...@gmail.com wrote: >> >> From: Scott Feldman <sfel...@gmail.com> >> >> Signed-off-by: Scott Feldman <sfel...@gmail.com> >> --- >> Documentation/networking/switchdev.txt | 24 >> 1 file changed, 12 insertions(+), 12 deletions(-) >> >> diff --git a/Documentation/networking/switchdev.txt >> b/Documentation/networking/switchdev.txt >> index 476df04..67e43ee 100644 >> --- a/Documentation/networking/switchdev.txt >> +++ b/Documentation/networking/switchdev.txt >> @@ -239,20 +239,20 @@ The driver should initialize the attributes to the >> hardware defaults. >> FDB Ageing >> ^^ >> -There are two FDB ageing models supported: 1) ageing by the device, and >> 2) >> -ageing by the kernel. Ageing by the device is preferred if many FDB >> entries >> -are supported. The driver calls >> call_switchdev_notifiers(SWITCHDEV_FDB_DEL, >> -...) to age out the FDB entry. In this model, ageing by the kernel >> should be >> -turned off. XXX: how to turn off ageing in kernel on a per-port basis or >> -otherwise prevent the kernel from ageing out the FDB entry? >> - >> -In the kernel ageing model, the standard bridge ageing mechanism is used >> to age >> -out stale FDB entries. To keep an FDB entry "alive", the driver should >> refresh >> -the FDB entry by calling call_switchdev_notifiers(SWITCHDEV_FDB_ADD, >> ...). The >> +The bridge will skip ageing FDB entries marked with NTF_EXT_LEARNED and >> it is >> +the responsibility of the port driver/device to age out these entries. >> If the >> +port device supports ageing, when the FDB entry expires, it will notify >> the >> +driver which in turn will notify the bridge with SWITCHDEV_FDB_DEL. If >> the >> +device does not support ageing, the driver can simulate ageing using a >> +garbage collection timer to monitor FBD entries. Expired entries will be >> +notified to the bridge using SWITCHDEV_FDB_DEL. See rocker driver for >> +example of driver running ageing timer. > > We do rely on the bridge driver ageing out entries. We have gone from > hardware ageing to ageing in the switch driver to ultimately ageing in the > bridge driver. :). And we keep the fdb entries in the bridge driver "alive" > by using 'NTF_USE' from the user-space driver. Yes, your switch driver is in user-space so you have to use NTF_USE to refresh the entry since you cannot use the kernel driver model to call_switchdev_notifiers(SWITCHDEV_FDB_ADD, ...). Consequently, your entries are not marked with NTF_EXT_LEARNED, so this patch is a no-op for you. You can continue to use the bridge driver to age out your entries. >> +To keep an NTF_EXT_LEARNED entry "alive", the driver should refresh the >> FDB >> +entry by calling call_switchdev_notifiers(SWITCHDEV_FDB_ADD, ...). The >> > Even with your current patches, looks like the switch driver will need to > refresh the entries anyways to keep the "last-used" time to now. > In which case is there much value in switch driver doing the ageing ?. "should" not "must". Value is for the many learned FDB entries case, to move the ageing function to hardware. > I am thinking keeping the default behavior of the bridge driver to age and > anything else configurable might be a better option. I'd rather someone add that knob when it's actually needed. When the first in-kernel switchdev driver that wants to use the bridge driver's ageing function, then we can make that adjustment. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 3/7] rocker: adding port ageing_time for ageing out FDB entries
On Fri, Sep 18, 2015 at 11:30 PM, Jiri Pirko <j...@resnulli.us> wrote: > Fri, Sep 18, 2015 at 09:55:47PM CEST, sfel...@gmail.com wrote: >>From: Scott Feldman <sfel...@gmail.com> >> >>Follow-up patcheset will allow user to change ageing_time, but for now >>just hard-code it to a fixed value (the same value used as the default >>for the bridge driver). >> >>Signed-off-by: Scott Feldman <sfel...@gmail.com> >>--- >> drivers/net/ethernet/rocker/rocker.c |2 ++ >> 1 file changed, 2 insertions(+) >> >>diff --git a/drivers/net/ethernet/rocker/rocker.c >>b/drivers/net/ethernet/rocker/rocker.c >>index f55ed2c..eba22f5 100644 >>--- a/drivers/net/ethernet/rocker/rocker.c >>+++ b/drivers/net/ethernet/rocker/rocker.c >>@@ -221,6 +221,7 @@ struct rocker_port { >> __be16 internal_vlan_id; >> int stp_state; >> u32 brport_flags; >>+ unsigned long ageing_time; >> bool ctrls[ROCKER_CTRL_MAX]; >> unsigned long vlan_bitmap[ROCKER_VLAN_BITMAP_LEN]; >> struct napi_struct napi_tx; >>@@ -4975,6 +4976,7 @@ static int rocker_probe_port(struct rocker *rocker, >>unsigned int port_number) >> rocker_port->port_number = port_number; >> rocker_port->pport = port_number + 1; >> rocker_port->brport_flags = BR_LEARNING | BR_LEARNING_SYNC; >>+ rocker_port->ageing_time = 300 * HZ; > > How about to add also "BR_DEFAULT_AGEING_TIME" and use it here? Yes, added for v2 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 4/7] bridge: define some min/max ageing time constants we'll use next
On Fri, Sep 18, 2015 at 11:45 PM, Jiri Pirko <j...@resnulli.us> wrote: > Fri, Sep 18, 2015 at 09:55:48PM CEST, sfel...@gmail.com wrote: >>From: Scott Feldman <sfel...@gmail.com> >> >>Signed-off-by: Scott Feldman <sfel...@gmail.com> >>--- >> include/linux/if_bridge.h |4 >> 1 file changed, 4 insertions(+) >> >>diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h >>index dad8b00..6cc6dbc 100644 >>--- a/include/linux/if_bridge.h >>+++ b/include/linux/if_bridge.h >>@@ -46,6 +46,10 @@ struct br_ip_list { >> #define BR_LEARNING_SYNC BIT(9) >> #define BR_PROXYARP_WIFI BIT(10) >> >>+/* values as per ieee8021QBridgeFdbAgingTime */ >>+#define BR_MIN_AGEING_TIME(10 * HZ) >>+#define BR_MAX_AGEING_TIME(100 * HZ) > > I think that a bridge patch checking against these values should be > introduced along with these values, in the same patchset I need the MIN value for this patchset in rocker's ageing timer, so it's introduced here. MIN/MAX will be used again in follow-on patch Prem is going to send to range check user input. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split
On Sat, Sep 19, 2015 at 5:29 AM, Jiri Pirkowrote: > Jiri Pirko (6): > switchdev: rename "trans" to "trans_ph". > switchdev: introduce transaction infrastructure for attr_set and > obj_add > rocker: switch to local transaction phase enum > switchdev: move transaction phase enum under transaction structure > rocker: use switchdev transaction queue for allocated memory > switchdev: split commit and prepare phase into two callbacks Whew, that's a lot of work! Seems like a good idea to up-level this for other drivers to share. Let me apply the patches and run my tests and get back to you. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH net-next 1/3] net: switchdev: extract switchdev_obj_vlan
On Tue, Sep 8, 2015 at 1:47 PM, Vivien Didelotwrote: > Move the switchdev_obj_vlan structure out of the switchdev_obj union. > > This lightens the switchdev_obj structure and allows drivers to access > the object transaction and callback directly from a switchdev_obj_vlan. > This is more consistent and convenient for add and dump operations. > > The patch updates bridge, the Rocker driver and DSA accordingly. > > Signed-off-by: Vivien Didelot > --- [cut] > diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c > index 16c1c43..9923a97 100644 > --- a/net/switchdev/switchdev.c > +++ b/net/switchdev/switchdev.c > @@ -397,7 +397,7 @@ int call_switchdev_notifiers(unsigned long val, struct > net_device *dev, > EXPORT_SYMBOL_GPL(call_switchdev_notifiers); > > struct switchdev_vlan_dump { > - struct switchdev_obj obj; > + struct switchdev_obj_vlan vlan; /* must be first */ > struct sk_buff *skb; > u32 filter_mask; > u16 flags; > @@ -439,9 +439,8 @@ static int switchdev_port_vlan_dump_put(struct net_device > *dev, > static int switchdev_port_vlan_dump_cb(struct net_device *dev, >struct switchdev_obj *obj) > { > - struct switchdev_vlan_dump *dump = > - container_of(obj, struct switchdev_vlan_dump, obj); > - struct switchdev_obj_vlan *vlan = >obj.u.vlan; > + struct switchdev_vlan_dump *dump = (struct switchdev_vlan_dump *) obj; > + struct switchdev_obj_vlan *vlan = (struct switchdev_obj_vlan *) obj; Same comment as Andrew's: use container_of() rather than casting for both of the use-cases above. (And don't require inner-structure to be fist in outer structure). -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH net-next 0/3] net: switchdev: extract specific object structures
On Tue, Sep 8, 2015 at 1:47 PM, Vivien Didelotwrote: > Hi! > > Current implementations of .switchdev_port_obj_add and > .switchdev_port_obj_dump > must pass the generic switchdev_obj structure instead of a specific one (e.g. > switchdev_obj_fdb) to the related driver accessors, because it contains the > object transaction and callback. This is not very convenient for drivers. > > Instead of having all specific structures included into a switchdev_obj union, > it would be simpler to embed a switchdev_obj structure as the first member of > the more specific ones. That way, a driver can cast the switchdev_obj pointer > to the specific structure and have access to all the information from it. > > As an example, this allows Rocker to change its two FDB accessors: > > rocker_port_fdb_add(rocker_port, obj->trans, >u.fdb); > rocker_port_fdb_dump(rocker_port, obj); > > for these most consistent ones: > > fdb = (struct switchdev_obj_fdb *) obj; > rocker_port_fdb_add(rocker_port, fdb); > rocker_port_fdb_dump(rocker_port, fdb); > > This is what struct netdev_notifier_info and its specific supersets (e.g. > struct netdev_notifier_changeupper_info) do in include/linux/netdevice.h. > > This patchset does that and updates bridge, Rocker and DSA accordingly. > > Note that this patchset was sent as an RFC not to bother David with new > net-next stuffs, but if the changes look good, it is ready to merge. > > Also, please take note that the change sits on top of: > http://patchwork.ozlabs.org/patch/514894/ > HI Vivien, looks good. Some requests for the next version: 1) Move the cb() member out of struct switchdev_obj and into the containing struct switchdev_obj_xxx structure. This way the second arg to cb() can be struct switchdev_obj_xxx *. (And, I have pending patches where the first arg to cb() is different, so making the dump cb() func specific to obj type will help). 2) As already mentioned, use container_of where possible to avoid locking struct member at a position within struct. Should the same treatment be done for switchdev_attr? Maybe that's a second patchset. -scott -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net] switchdev: fix return value of switchdev_port_fdb_dump in case of error
On Thu, Sep 3, 2015 at 5:04 AM, Jiri Pirko <j...@resnulli.us> wrote: > From: Jiri Pirko <j...@mellanox.com> > > switchdev_port_fdb_dump is used as .ndo_fdb_dump. Its return value is > idx, so we cannot return errval. > > Fixes: 45d4122ca7cd ("switchdev: add support for fdb add/del/dump via > switchdev_port_obj ops.") > Signed-off-by: Jiri Pirko <j...@mellanox.com> Acked-by: Scott Feldman<sfel...@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 net-next 0/3] ipv4: Hash-based multipath routing
On Sat, Aug 29, 2015 at 1:46 PM, David Miller da...@davemloft.net wrote: From: Peter Nørlund p...@ordbogen.com Date: Sat, 29 Aug 2015 22:31:15 +0200 On Sat, 29 Aug 2015 13:14:29 -0700 (PDT) David Miller da...@davemloft.net wrote: From: p...@ordbogen.com Date: Fri, 28 Aug 2015 22:00:47 +0200 When the routing cache was removed in 3.6, the IPv4 multipath algorithm changed from more or less being destination-based into being quasi-random per-packet scheduling. This increases the risk of out-of-order packets and makes it impossible to use multipath together with anycast services. Don't even try to be fancy. Simply kill the round-robin stuff off completely, and make hash based routing the one and only mode, no special configuration stuff necessary. I like the sound of that! Just to be clear - are you telling me to stick with L3 and skip the L4 part? For now it seems best to just do L3 and make ipv4 and ipv6 behave the same. That makes mapping the hash to offload hardware easier also. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH net-next 0/2] Add new switchdev device class
On Thu, Aug 27, 2015 at 2:06 AM, Andrew Lunn and...@lunn.ch wrote: On Thu, Aug 27, 2015 at 01:42:24AM -0700, Scott Feldman wrote: On Thu, Aug 27, 2015 at 12:45 AM, Andrew Lunn and...@lunn.ch wrote: I don't know about how this overlaps with DSA platform_class. Florian? There is some overlap with DSA, but the current DSA model, with respect to probing, is broken. So this might be interesting as a way towards fix that. One thing to keep in mind is the D in DSA. You talk about switch, singular. DSA has a number of switches in a cluster. We currently export a single switchdev interface for the cluster, but there are some properties which are per switch, e.g. temperature, eeprom contents, statistics, power management etc. Export a single 'switchdev' or 'netdev' for the cluster? I hope that was a typo. I probably expressed that badly. The hardware i have on my desk has three Marvell switches in a chain, with one end of the chain connected to a host Ethernet interface. From the switchdev ops level, you don't see anything of this chain. But some of the operations do need to be aware of this chain, for example vlans which span multiple chips in this chain. With switchdev device class, you'd instantiate one per phy switch, and have per-switch props (temp, eeprom, etc) thru each switchdev instance. O.K. This is fine, but we need people to understand that a switchdev device class represents some middle layer in the hierarchy, not the top layer. Otherwise false assumptions might be made. So with kobj, a device can have a parent. So I experimented with my RFC patch and changed register_switchdev to take a parent switchdev arg, which is NULL for leaf switchdevs: int register_switchdev(struct switchdev *sdev, const char *name, struct switchdev *parent) { struct device *dev = sdev-dev; int err; device_initialize(dev); dev-class = switchdev_class; if (parent) dev-parent = parent-dev; err = dev_set_name(dev, %s, name); if (err) return err; return device_add(dev); } Then I tried this with rocker and it works as expected. On module load, I create the master switchdev, and then on PCI probe for each phys switch dev, I put the slave switchdev in the master using register_switchdev. Here's one slave in the master rockers switch: tree /sys/class/switchdev/rockers /sys/class/switchdev/rockers ├── 525400123501 │ ├── device - ../../rocker │ ├── foo │ ├── power │ │ ├── async │ │ ├── autosuspend_delay_ms │ │ ├── control │ │ ├── runtime_active_kids │ │ ├── runtime_active_time │ │ ├── runtime_enabled │ │ ├── runtime_status │ │ ├── runtime_suspended_time │ │ └── runtime_usage │ ├── subsystem - ../../../../../class/switchdev │ └── uevent ├── foo ├── power │ ├── async │ ├── autosuspend_delay_ms │ ├── control │ ├── runtime_active_kids │ ├── runtime_active_time │ ├── runtime_enabled │ ├── runtime_status │ ├── runtime_suspended_time │ └── runtime_usage ├── subsystem - ../../../../class/switchdev └── uevent With this, we can stack switchdevs, I guess as high as we want. Does this look usable for DSA? An attr set on the master would get pushed down to the leaves. We'd can do it with the same style of recursive algos we use for switchdev port attrs. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next 3/6] net: add netif_is_ovs_master helper with IFF_OPENVSWITCH private flag
On Wed, Aug 26, 2015 at 10:43 PM, Jiri Pirko j...@resnulli.us wrote: Wed, Aug 26, 2015 at 07:43:18PM CEST, sfel...@gmail.com wrote: On Wed, Aug 26, 2015 at 9:36 AM, Jiri Pirko j...@resnulli.us wrote: From: Jiri Pirko j...@mellanox.com Add this helper so code can easily figure out if netdev is openswitch. Signed-off-by: Jiri Pirko j...@mellanox.com --- include/linux/netdevice.h| 8 net/openvswitch/vport-internal_dev.c | 2 +- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index be625f4..0a884e6 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1264,6 +1264,7 @@ struct net_device_ops { * @IFF_MACVLAN: Macvlan device * @IFF_VRF_MASTER: device is a VRF master * @IFF_NO_QUEUE: device can run without qdisc attached + * @IFF_VRF_OPENVSWITCH: device is a Open vSwitch master */ enum netdev_priv_flags { IFF_802_1Q_VLAN = 10, @@ -1293,6 +1294,7 @@ enum netdev_priv_flags { IFF_IPVLAN_SLAVE= 124, IFF_VRF_MASTER = 125, IFF_NO_QUEUE= 126, + IFF_OPENVSWITCH = 127, }; #define IFF_802_1Q_VLANIFF_802_1Q_VLAN @@ -1322,6 +1324,7 @@ enum netdev_priv_flags { #define IFF_IPVLAN_SLAVE IFF_IPVLAN_SLAVE #define IFF_VRF_MASTER IFF_VRF_MASTER #define IFF_NO_QUEUE IFF_NO_QUEUE +#define IFF_OPENVSWITCHIFF_OPENVSWITCH /** * struct net_device - The DEVICE structure. @@ -3853,6 +3856,11 @@ static inline bool netif_is_bridge_master(const struct net_device *dev) return dev-priv_flags IFF_EBRIDGE; } +static inline bool netif_is_ovs_master(const struct net_device *dev) +{ + return dev-priv_flags IFF_OPENVSWITCH; +} We're going to run out of priv_flags bits. This flag doesn't seem like something that will be checked lots of places. How about using rtnl_link_ops-kind to save a bit in priv_flags? static inline bool netif_is_ovs_master(const struct net_device *dev) { return !strcmp(dev-rtnl_link_ops-kind, openvswitch)); } There are lot of helpers like this for other soft-devices. I think that is okay to have it this way. The thing is that sometimes you need to use thi helper in fast path and in that case, you do not want to strcmp. There is plenty of priv_flags bits for now when I killed the bonding stuff. Ya, but think about the bit: you (and others) used a bit in priv_flags to indicate the netdev type. Can you add an enum field to rtnl_link_ops-type to indicate link type? Then it's not a strcmp. You can write your helper using strcmp first, and then later migrate to using rtnl_link_ops-type. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH net-next 0/2] Add new switchdev device class
On Thu, Aug 27, 2015 at 12:27 AM, Jiri Pirko j...@resnulli.us wrote: Thu, Aug 27, 2015 at 09:16:44AM CEST, sfel...@gmail.com wrote: From: Scott Feldman sfel...@gmail.com In the switchdev model, we use netdevs to represent switchdev ports, but we have no representation for the switch itself. So, introduce a new switchdev device class so we can define semantics and programming interfaces for the switch itself. Switchdev device class isn't tied to any particular bus. This patch set is just the skeleton to get us started. It adds the sysfs object registration for the new class and defines a class-level attr foo. With the new class, we could hook PM functions, for example, to handle power transitions at the switch level. I registered rocker and get: $ ls /sys/class/switchdev/525400123501/ foo power subsystem uevent No, please avoid adding anything to sysfs. If we need to add anything, lets make is accesible using Netlink only. I see no harm in using the device model to define and new device class which just so happens to show up in sysfs. What sysfs attrs get exposed is where we can have some discussion/rules. So what next? I'd rather not build APIs around sysfs, so we need a netlink API we can build on top of this. It's not really rtnl. Maybe genl would work? What ever it is, we'd need to teach iproute2 about a new 'switch' command. Netlink API would allow us to represent switch-wide objects such as registers, tables, stats, firmware, and maybe even control. I think with with netlink TLVs, we can create a framework for these objects but still allow the switch driver provide switch-specific info. For example, a table object: [TABLES] [TABLE] [FIELDS] [FIELD] [ID, TYPE] [DATA] [ID, VALUE] Alert! I feel that someone would like to abuse this iface for writing configuration through. This should be read-only by design. I also think that this should not be something switch-specific. I believe that NIC drivers would benefit from this iface as well when they want to expose something. I think we should use genl for this. Read-only is fine. Look, I'm just trying to dump rocker internal tables in some format I can grep outside the kernel. The tables are get big and complicated fast and printk doesn't cut it. I can use degugfs privately, but I need to be able to dump same for field troubleshooting. I can't use debugfs, so I want some kind of XML-like dump facility. It's going to have device-specific data, so I'm looking for an XML-like way to represent this data in netlink. Maybe iproute2 has pretty-printers for specific switches like ethtool has for reg dumps. I feel like a lot of what you described overlaps with existing interfaces and tools. Why don't we just reuse that? For firmware for example, just take one of the ports. Same for stats (I plan to expose my mlxsw switch-wide stats in ethtool so they are accessible through every port netdevice). Port-centric stats are fine for port netdevs, but I'd like switch-wide stats to show up elsewhere. Thinking ahead: I'd like to put port into namespaces and I don't want to dump stats on a port and see stats on ports in other namespaces. I still do not see the need for new device class. I have strong feeling that it should be avoided. Ok -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH net-next 0/2] Add new switchdev device class
On Thu, Aug 27, 2015 at 12:36 AM, John Fastabend john.fastab...@gmail.com wrote: On 15-08-27 12:16 AM, sfel...@gmail.com wrote: From: Scott Feldman sfel...@gmail.com In the switchdev model, we use netdevs to represent switchdev ports, but we have no representation for the switch itself. So, introduce a new switchdev device class so we can define semantics and programming interfaces for the switch itself. Switchdev device class isn't tied to any particular bus. This patch set is just the skeleton to get us started. It adds the sysfs object registration for the new class and defines a class-level attr foo. With the new class, we could hook PM functions, for example, to handle power transitions at the switch level. I registered rocker and get: $ ls /sys/class/switchdev/525400123501/ foo power subsystem uevent So what next? I'd rather not build APIs around sysfs, so we need a netlink API we can build on top of this. It's not really rtnl. Maybe genl would work? What ever it is, we'd need to teach iproute2 about a new 'switch' command. Netlink API would allow us to represent switch-wide objects such as registers, tables, stats, firmware, and maybe even control. I think with with netlink TLVs, we can create a framework for these objects but still allow the switch driver provide switch-specific info. For example, a table object: Hi Scott, I was going to take the flow-api presented in Feb and submitted to netdev and get it up to date. I started doing this today should be ready end of week. [TABLES] [TABLE] [FIELDS] [FIELD] [ID, TYPE] [DATA] [ID, VALUE] The structure I used previously which looks like your prototype I think, (https://github.com/jrfastab/rocker-net-next/blob/master/include/uapi/linux/if_flow.h) * [NFL_TABLE_IDENTIFIER_TYPE] * [NFL_TABLE_IDENTIFIER] * [NFL_TABLE_TABLES] * [NFL_TABLE] * [NFL_TABLE_ATTR_NAME] * [NFL_TABLE_ATTR_UID] * [NFL_TABLE_ATTR_SOURCE] * [NFL_TABLE_ATTR_APPLY] * [NFL_TABLE_ATTR_SIZE] * [NFL_TABLE_ATTR_MATCHES] * [NFL_FIELD_REF] * [NFL_FIELD_REF_INSTANCE] * [NFL_FIELD_REF_HEADER] * [NFL_FIELD_REF_FIELD] * [NFL_FIELD_REF_MASK] * [NFL_FIELD_REF_TYPE] * [...] * [NFL_TABLE_ATTR_ACTIONS] * [NFL_ACTION_ATTR_UID] * [...] * [NFL_TABLE] * [...] * This is well-typed per Dave's comment. And because its Netlink based it can be easily extended as needed. The above gives basic information on the table. Sure it could stand to have some other entries in it but I never needed them for my capabilities/resource discovery. We could argue about removing some if they are too specific to my use cases. I was looking for something more generic. Not quite as raw as ethtool reg dump, but not too rigid I have to bend the def of fields to make it work. Maybe what I want is impossible. Maybe iproute2 has pretty-printers for specific switches like ethtool has for reg dumps. I don't know about how this overlaps with DSA platform_class. Florian? Comments? A few other interesting (at least to me) structures you can find in the if_flow.h header file give how the tables in the hardware are put together. I found this really useful when trying to sort out where my various ACLs/nexthop entries/etc were in the hardware order. This is important to know for many cases. This is NFL_TABLE_GRAPH. Also I found being able to know' what headers the hardware supports to be very useful. For example I want/need to know if QinQ will work. Or VXLAN/NSH, Geneve, etc. This is NFL_HEADER_GRAPH. Sure if_flow.h is a poor name and FlowAPI is probably not really great. But both those names come from how I model the hardware as match action tables. I can change those names to switchdev_resources or switchdev_info if folks want. My point is the interface is generic and provides a common interface for most hardware I've seen to date. Certainly curious if there is hardware it doesn't map to. Although tables and pipelines seems fairly universal for a large set of devices to me. Further it can be extended as needed. I'll drop the set_rule/del_rule parts to be merged with ebpf/tc/qdisc. Ya, you're talking about tables in hardware. I just want to dump some driver-internal driver-specific data in tabular form to user-space so I can grep thru it and debug. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next 3/6] net: add netif_is_ovs_master helper with IFF_OPENVSWITCH private flag
On Wed, Aug 26, 2015 at 11:30 PM, Jiri Pirko j...@resnulli.us wrote: Thu, Aug 27, 2015 at 08:23:13AM CEST, sfel...@gmail.com wrote: On Wed, Aug 26, 2015 at 10:43 PM, Jiri Pirko j...@resnulli.us wrote: Wed, Aug 26, 2015 at 07:43:18PM CEST, sfel...@gmail.com wrote: On Wed, Aug 26, 2015 at 9:36 AM, Jiri Pirko j...@resnulli.us wrote: From: Jiri Pirko j...@mellanox.com We're going to run out of priv_flags bits. This flag doesn't seem like something that will be checked lots of places. How about using rtnl_link_ops-kind to save a bit in priv_flags? static inline bool netif_is_ovs_master(const struct net_device *dev) { return !strcmp(dev-rtnl_link_ops-kind, openvswitch)); } There are lot of helpers like this for other soft-devices. I think that is okay to have it this way. The thing is that sometimes you need to use thi helper in fast path and in that case, you do not want to strcmp. There is plenty of priv_flags bits for now when I killed the bonding stuff. Ya, but think about the bit: you (and others) used a bit in priv_flags to indicate the netdev type. Can you add an enum field to rtnl_link_ops-type to indicate link type? Then it's not a strcmp. You can write your helper using strcmp first, and then later migrate to using rtnl_link_ops-type. Also, dev can be multiple things, it can be bridge port and vlan dev at the same time. Flags are good for this. priv_flags bits are three types: 1) dev attribute (IFF_XMIT_DST_RELEASE, IFF_DISABLE_NETPOLL, etc) 2) dev type (IFF_802_1Q_VLAN, IFF_EBRIDGE, etc) 3) and dev membership (IFF_BRIDGE_PORT, IFF_TEAM_PORT, etc) Are there types 2 or 3 in any fast paths? Type 2 can move to enum; they're mutually exclusive. Type 3 is the dev's master's type 2, and since dev can have only one master, no flag is needed: just look up master type to know if dev is bridged or ovs'ed. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH net-next 0/2] Add new switchdev device class
On Thu, Aug 27, 2015 at 12:45 AM, Andrew Lunn and...@lunn.ch wrote: I don't know about how this overlaps with DSA platform_class. Florian? There is some overlap with DSA, but the current DSA model, with respect to probing, is broken. So this might be interesting as a way towards fix that. One thing to keep in mind is the D in DSA. You talk about switch, singular. DSA has a number of switches in a cluster. We currently export a single switchdev interface for the cluster, but there are some properties which are per switch, e.g. temperature, eeprom contents, statistics, power management etc. Export a single 'switchdev' or 'netdev' for the cluster? I hope that was a typo. With switchdev device class, you'd instantiate one per phy switch, and have per-switch props (temp, eeprom, etc) thru each switchdev instance. The cluster netdev would stay a netdev which spans the switches. Although ethtool does have options for these, it is not always a natural fit. ethtool --eeprom-dump on a switch port dumps the switch EEPROM, and all ports on the switch can be used. --register-dump on a port is good for showing the per ports registers, but there is no natural interface for showing the global switch registers. Florian's resent L2 interface patch shows we have issues getting access to the 'cpu' port, the port which interfaces to the host. We need to be careful that any new interfaces we add are better at representing the true structure of the hardware, which includes there being multiple physical switches below a switchdev, and they are connected together by ports which are currently not visible as netdevs. Right. I'm thinking one switchdev instance (based on the new switchdev class) per phy switch. Port netdevs for switch ports worth exposing (for user interaction). And cluster netdev for dsa tagging encap/decap. Do I have this right? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next 2/3] mlxsw: expose EMAD transactions statistics via debugfs
On Wed, Aug 26, 2015 at 10:49 AM, David Miller da...@davemloft.net wrote: From: Jiri Pirko j...@resnulli.us Date: Wed, 26 Aug 2015 09:37:57 +0200 I don't think that are much more cases like this. Therefore I think that for this cases, debugfs might be a good way to expose debugging stats. Scott wanted to do similar things in rocker. DSA guys too. Every switch device is going to have some kind of hierarchy like this, it's not a unique situation. We've been able to get buy so far without a user-visible device for the switch. The switch ports are represented by netdevs, so that's easy. How can we create an object for the switch itself, so we can attach common interfaces for the user to dump switch-level stats or tables? Using another netdev doesn't seem right. Do we need a new device class for switches, and then create some common tool/interfaces for switch device class? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next 0/6] rocker: make master change handling nicer
On Wed, Aug 26, 2015 at 9:36 AM, Jiri Pirko j...@resnulli.us wrote: From: Jiri Pirko j...@mellanox.com Jiri Pirko (6): net: introduce change upper device notifier change info net: add netif_is_bridge_master helper net: add netif_is_ovs_master helper with IFF_OPENVSWITCH private flag net: kill long time unused bonding private flags rocker: use new helper to figure out master kind rocker: use change upper info Looks good Jiri, thanks for cleaning this up. Only nit was about conserving netdev-priv_flags. Using rtnl_link_ops-kind to determine netdev type scales better. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next 6/6] rocker: use change upper info
On Wed, Aug 26, 2015 at 9:36 AM, Jiri Pirko j...@resnulli.us wrote: From: Jiri Pirko j...@mellanox.com Since now information about changed upper is passed along, benefit from that and use this info directly. This also fixes possible issues that could happen when non-master device is added (current code does not distinguish between master and non-master upper device). Signed-off-by: Jiri Pirko j...@mellanox.com Acked-by: Scott Feldman sfel...@gmail.com -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next 5/6] rocker: use new helper to figure out master kind
On Wed, Aug 26, 2015 at 9:36 AM, Jiri Pirko j...@resnulli.us wrote: From: Jiri Pirko j...@mellanox.com Looking at rtnl kind string is kind of ugly. So use new helpers to do this in nicer way. Signed-off-by: Jiri Pirko j...@mellanox.com Acked-by: Scott Feldman sfel...@gmail.com -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next 3/6] net: add netif_is_ovs_master helper with IFF_OPENVSWITCH private flag
On Wed, Aug 26, 2015 at 9:36 AM, Jiri Pirko j...@resnulli.us wrote: From: Jiri Pirko j...@mellanox.com Add this helper so code can easily figure out if netdev is openswitch. Signed-off-by: Jiri Pirko j...@mellanox.com --- include/linux/netdevice.h| 8 net/openvswitch/vport-internal_dev.c | 2 +- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index be625f4..0a884e6 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1264,6 +1264,7 @@ struct net_device_ops { * @IFF_MACVLAN: Macvlan device * @IFF_VRF_MASTER: device is a VRF master * @IFF_NO_QUEUE: device can run without qdisc attached + * @IFF_VRF_OPENVSWITCH: device is a Open vSwitch master */ enum netdev_priv_flags { IFF_802_1Q_VLAN = 10, @@ -1293,6 +1294,7 @@ enum netdev_priv_flags { IFF_IPVLAN_SLAVE= 124, IFF_VRF_MASTER = 125, IFF_NO_QUEUE= 126, + IFF_OPENVSWITCH = 127, }; #define IFF_802_1Q_VLANIFF_802_1Q_VLAN @@ -1322,6 +1324,7 @@ enum netdev_priv_flags { #define IFF_IPVLAN_SLAVE IFF_IPVLAN_SLAVE #define IFF_VRF_MASTER IFF_VRF_MASTER #define IFF_NO_QUEUE IFF_NO_QUEUE +#define IFF_OPENVSWITCHIFF_OPENVSWITCH /** * struct net_device - The DEVICE structure. @@ -3853,6 +3856,11 @@ static inline bool netif_is_bridge_master(const struct net_device *dev) return dev-priv_flags IFF_EBRIDGE; } +static inline bool netif_is_ovs_master(const struct net_device *dev) +{ + return dev-priv_flags IFF_OPENVSWITCH; +} We're going to run out of priv_flags bits. This flag doesn't seem like something that will be checked lots of places. How about using rtnl_link_ops-kind to save a bit in priv_flags? static inline bool netif_is_ovs_master(const struct net_device *dev) { return !strcmp(dev-rtnl_link_ops-kind, openvswitch)); } + static inline bool netif_index_is_vrf(struct net *net, int ifindex) { bool rc = false; diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c index c058bbf..80b3e12 100644 --- a/net/openvswitch/vport-internal_dev.c +++ b/net/openvswitch/vport-internal_dev.c @@ -135,7 +135,7 @@ static void do_setup(struct net_device *netdev) netdev-netdev_ops = internal_dev_netdev_ops; netdev-priv_flags = ~IFF_TX_SKB_SHARING; - netdev-priv_flags |= IFF_LIVE_ADDR_CHANGE; + netdev-priv_flags |= IFF_LIVE_ADDR_CHANGE | IFF_OPENVSWITCH; netdev-destructor = internal_dev_destructor; netdev-ethtool_ops = internal_dev_ethtool_ops; netdev-rtnl_link_ops = internal_dev_link_ops; -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html