Re: [patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split

2015-09-21 Thread Jiri Pirko
Tue, Sep 22, 2015 at 03:36:28AM CEST, vivien.dide...@savoirfairelinux.com wrote:
>Hi Jiri,
>
>On Sep. Monday 21 (39) 08:25 PM, Jiri Pirko wrote:
>> Mon, Sep 21, 2015 at 07:13:58PM CEST, vivien.dide...@savoirfairelinux.com 
>> wrote:
>> >Hi Jiri, Scott,
>> >
>> >On Sep. Monday 21 (39) 10:09 AM, Jiri Pirko wrote:
>> >> Mon, Sep 21, 2015 at 09:23:24AM CEST, sfel...@gmail.com wrote:
>> >> >On Sat, Sep 19, 2015 at 5:29 AM, Jiri Pirko  wrote:
>> >> >> Jiri Pirko (6):
>> >> >>   switchdev: rename "trans" to "trans_ph".
>> >> >>   switchdev: introduce transaction infrastructure for attr_set and
>> >> >> obj_add
>> >> >>   rocker: switch to local transaction phase enum
>> >> >>   switchdev: move transaction phase enum under transaction structure
>> >> >>   rocker: use switchdev transaction queue for allocated memory
>> >> >>   switchdev: split commit and prepare phase into two callbacks
>> >> >
>> >> >Patches compile, but first test bombs.  Cut-and-paste of dump at end
>> >> >of this email.
>> >> 
>> >> Told you :)
>> >> 
>> >> 
>> >> >
>> >> >I'm not sure I'm liking this patchset because it looks like a way for
>> >> >switchdev drivers to easily opt-out of the prepare-commit transaction
>> >> >model by simply not implementing the *_pre op.  I would rather drivers
>> >> >explicitly handle the PREPARE phase in code, even if that means
>> >> >skipping it gracefully (in code) with a comment (in code) explaining
>> >> >why it does not matter for this device/operation.  That's what DSA had
>> >> >done, mostly because it was a retro-fit.
>> >> 
>> >> Each driver should handle this inside it. If it does not need prepare
>> >> state, it simply does not implement it. That is the same for all cb,
>> >> ndos, netdev notifiers, etc. It is much cleaner and nicer to have these as
>> >> separate callbacks. Implementing multiple callback in one is just ugly,
>> >> sorry.
>> >
>> >This is true, (in DSA) we don't have to implement the prepare phase if
>> >we fully support the feature in hardware.
>> >
>> >To give a real example, Marvell switch drivers currently implement all
>> >add/del/dump calls for VLAN FDB (where VID 0 means the port itself). No
>> >prepare phase needed.
>> >
>> >Now, I have local patches to enable strict 802.1Q mode in these switches
>> >(all the logic is based on the hardware VLAN table). But it does not use
>> >per-port FDB, so fdb_add with VID 0 doesn't make sense anymore. That's
>> >why we need to push the feature checking down to the drivers in DSA.
>> >
>> >I have another pending patch to add .port_fdb_pre_add, where mv88e6xxx
>> >code will return -EOPNOTSUPP if the given VID is 0.
>> >
>> >Another example: mv88e6xxx support tagged VLANs, so no hardware check
>> >needed. But the Broadcom Starfighter 2 only supports port-based VLANs
>> >(which is today wrongly implemented through "bridge_join/leave"). By
>> >implementing .port_vlan_pre_add (another pending patch for DSA), the
>> >driver will be able to return -EOPNOTSUPP if !BRIDGE_VLAN_INFO_PVID.
>> >
>> >Also, having logic in switchdev drivers to check SWITCHDEV_TRANS_NONE
>> >and SWITCHDEV_TRANS_ABORT is not really nice. Having switchdev handle
>> >the abort phase (calling each destructor) and getting rid of the
>> >SWITCHDEV_TRANS_* flags sounds better to me.
>> 
>> Agree, if pre/commit is going to be in one function, we should have
>> only prepare/commit enums. It can be carried around as a single bool
>> value in switchdev_trans structure. Will include that in my transaction
>> patchset.
>> 
>> 
>> >
>> >> >Also, the patchset removes the ABORT callback in case of a rollback
>> >> >due to a failed PREPARE.  We can't make the assumption that it's just
>> >> >a memory list to destroy on ABORT.  The driver, on PREPARE, may have
>> >> >reserved device space or staged an operation on the device which we'll
>> >> >need to undo on ABORT.
>> >> 
>> >> Yep, just register an item with custom destructor, there you can do
>> >> whatever. Also, I believe much nicer comparing to current code.
>> >> 
>> >> 
>> >> >
>> >> >So we need ABORT back, and we need PREPARE to not be optional, so
>> >> >what's left list enqueue/dequeue helpers, which I'm not seeing much
>> >> >value in up-leveling as the driver can do list_add/del itself.
>> >> 
>> >> Why would every driver do it itself, over and over when there can be a
>> >> clean infrastructure to do that. Including abort phase. Without the driver
>> >> needed to be involved.
>> >
>> >Maybe the term ".destructor" has a too strong meaning to deallocation,
>> >but you can indeed do whatever you need in this function.
>> 
>> It is a destructor. Don't know about a better name, suggestions?
>
>Nope, I'm personally fine with this term.
>
>> >
>> >> >Am I missing something?  I didn't see a motivation statement for the
>> >> >RFC so I'm not sure where you wanted to take this.
>> >> 
>> >> I want to make current code much nicer, easier to read and implement in
>> >> other drivers. Look at rocker.c and how often there is == PREP

Re: [patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split

2015-09-21 Thread Vivien Didelot
Hi Jiri,

On Sep. Monday 21 (39) 08:25 PM, Jiri Pirko wrote:
> Mon, Sep 21, 2015 at 07:13:58PM CEST, vivien.dide...@savoirfairelinux.com 
> wrote:
> >Hi Jiri, Scott,
> >
> >On Sep. Monday 21 (39) 10:09 AM, Jiri Pirko wrote:
> >> Mon, Sep 21, 2015 at 09:23:24AM CEST, sfel...@gmail.com wrote:
> >> >On Sat, Sep 19, 2015 at 5:29 AM, Jiri Pirko  wrote:
> >> >> Jiri Pirko (6):
> >> >>   switchdev: rename "trans" to "trans_ph".
> >> >>   switchdev: introduce transaction infrastructure for attr_set and
> >> >> obj_add
> >> >>   rocker: switch to local transaction phase enum
> >> >>   switchdev: move transaction phase enum under transaction structure
> >> >>   rocker: use switchdev transaction queue for allocated memory
> >> >>   switchdev: split commit and prepare phase into two callbacks
> >> >
> >> >Patches compile, but first test bombs.  Cut-and-paste of dump at end
> >> >of this email.
> >> 
> >> Told you :)
> >> 
> >> 
> >> >
> >> >I'm not sure I'm liking this patchset because it looks like a way for
> >> >switchdev drivers to easily opt-out of the prepare-commit transaction
> >> >model by simply not implementing the *_pre op.  I would rather drivers
> >> >explicitly handle the PREPARE phase in code, even if that means
> >> >skipping it gracefully (in code) with a comment (in code) explaining
> >> >why it does not matter for this device/operation.  That's what DSA had
> >> >done, mostly because it was a retro-fit.
> >> 
> >> Each driver should handle this inside it. If it does not need prepare
> >> state, it simply does not implement it. That is the same for all cb,
> >> ndos, netdev notifiers, etc. It is much cleaner and nicer to have these as
> >> separate callbacks. Implementing multiple callback in one is just ugly,
> >> sorry.
> >
> >This is true, (in DSA) we don't have to implement the prepare phase if
> >we fully support the feature in hardware.
> >
> >To give a real example, Marvell switch drivers currently implement all
> >add/del/dump calls for VLAN FDB (where VID 0 means the port itself). No
> >prepare phase needed.
> >
> >Now, I have local patches to enable strict 802.1Q mode in these switches
> >(all the logic is based on the hardware VLAN table). But it does not use
> >per-port FDB, so fdb_add with VID 0 doesn't make sense anymore. That's
> >why we need to push the feature checking down to the drivers in DSA.
> >
> >I have another pending patch to add .port_fdb_pre_add, where mv88e6xxx
> >code will return -EOPNOTSUPP if the given VID is 0.
> >
> >Another example: mv88e6xxx support tagged VLANs, so no hardware check
> >needed. But the Broadcom Starfighter 2 only supports port-based VLANs
> >(which is today wrongly implemented through "bridge_join/leave"). By
> >implementing .port_vlan_pre_add (another pending patch for DSA), the
> >driver will be able to return -EOPNOTSUPP if !BRIDGE_VLAN_INFO_PVID.
> >
> >Also, having logic in switchdev drivers to check SWITCHDEV_TRANS_NONE
> >and SWITCHDEV_TRANS_ABORT is not really nice. Having switchdev handle
> >the abort phase (calling each destructor) and getting rid of the
> >SWITCHDEV_TRANS_* flags sounds better to me.
> 
> Agree, if pre/commit is going to be in one function, we should have
> only prepare/commit enums. It can be carried around as a single bool
> value in switchdev_trans structure. Will include that in my transaction
> patchset.
> 
> 
> >
> >> >Also, the patchset removes the ABORT callback in case of a rollback
> >> >due to a failed PREPARE.  We can't make the assumption that it's just
> >> >a memory list to destroy on ABORT.  The driver, on PREPARE, may have
> >> >reserved device space or staged an operation on the device which we'll
> >> >need to undo on ABORT.
> >> 
> >> Yep, just register an item with custom destructor, there you can do
> >> whatever. Also, I believe much nicer comparing to current code.
> >> 
> >> 
> >> >
> >> >So we need ABORT back, and we need PREPARE to not be optional, so
> >> >what's left list enqueue/dequeue helpers, which I'm not seeing much
> >> >value in up-leveling as the driver can do list_add/del itself.
> >> 
> >> Why would every driver do it itself, over and over when there can be a
> >> clean infrastructure to do that. Including abort phase. Without the driver
> >> needed to be involved.
> >
> >Maybe the term ".destructor" has a too strong meaning to deallocation,
> >but you can indeed do whatever you need in this function.
> 
> It is a destructor. Don't know about a better name, suggestions?

Nope, I'm personally fine with this term.

> >
> >> >Am I missing something?  I didn't see a motivation statement for the
> >> >RFC so I'm not sure where you wanted to take this.
> >> 
> >> I want to make current code much nicer, easier to read and implement in
> >> other drivers. Look at rocker.c and how often there is == PREPARE there.
> >> It's nearly impossible to followthe code, sorry.
> >> 
> >> My next patchset is to un-mess rocker.c (that freaking ofdpa stuff is
> >> everywhere)
> >
> >I'm basically fo

Re: [patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split

2015-09-21 Thread Jiri Pirko
Mon, Sep 21, 2015 at 06:34:33PM CEST, sfel...@gmail.com wrote:
>On Mon, Sep 21, 2015 at 1:09 AM, Jiri Pirko  wrote:
>> Mon, Sep 21, 2015 at 09:23:24AM CEST, sfel...@gmail.com wrote:
>>>On Sat, Sep 19, 2015 at 5:29 AM, Jiri Pirko  wrote:
 Jiri Pirko (6):
   switchdev: rename "trans" to "trans_ph".
   switchdev: introduce transaction infrastructure for attr_set and
 obj_add
   rocker: switch to local transaction phase enum
   switchdev: move transaction phase enum under transaction structure
   rocker: use switchdev transaction queue for allocated memory
   switchdev: split commit and prepare phase into two callbacks
>>>
>>>Patches compile, but first test bombs.  Cut-and-paste of dump at end
>>>of this email.
>>
>> Told you :)
>>
>>
>>>
>>>I'm not sure I'm liking this patchset because it looks like a way for
>>>switchdev drivers to easily opt-out of the prepare-commit transaction
>>>model by simply not implementing the *_pre op.  I would rather drivers
>>>explicitly handle the PREPARE phase in code, even if that means
>>>skipping it gracefully (in code) with a comment (in code) explaining
>>>why it does not matter for this device/operation.  That's what DSA had
>>>done, mostly because it was a retro-fit.
>>
>> Each driver should handle this inside it. If it does not need prepare
>> state, it simply does not implement it. That is the same for all cb,
>> ndos, netdev notifiers, etc. It is much cleaner and nicer to have these as
>> separate callbacks. Implementing multiple callback in one is just ugly,
>> sorry.
>>
>>
>>>
>>>Also, the patchset removes the ABORT callback in case of a rollback
>>>due to a failed PREPARE.  We can't make the assumption that it's just
>>>a memory list to destroy on ABORT.  The driver, on PREPARE, may have
>>>reserved device space or staged an operation on the device which we'll
>>>need to undo on ABORT.
>>
>> Yep, just register an item with custom destructor, there you can do
>> whatever. Also, I believe much nicer comparing to current code.
>>
>>
>>>
>>>So we need ABORT back, and we need PREPARE to not be optional, so
>>>what's left list enqueue/dequeue helpers, which I'm not seeing much
>>>value in up-leveling as the driver can do list_add/del itself.
>>
>> Why would every driver do it itself, over and over when there can be a
>> clean infrastructure to do that. Including abort phase. Without the driver
>> needed to be involved.
>>
>>
>>>
>>>Am I missing something?  I didn't see a motivation statement for the
>>>RFC so I'm not sure where you wanted to take this.
>>
>> I want to make current code much nicer, easier to read and implement in
>> other drivers. Look at rocker.c and how often there is == PREPARE there.
>> It's nearly impossible to followthe code, sorry.
>
>You're going to end up duplicate a lot of code and make it easier to
>make mistakes.  The reason why the == PREPARE checks are there is the
>same code is run for both PREPARE and COMMIT, with == PREPARE checks
>right at the last moment to avoid a irreversible change, such as
>deleting an entry from a table or changing a state variable or bumping
>a hardware desc pointer.  Otherwise the code paths should be the same
>for PREPARE and COMMIT.  If you split these, you're going to duplicate
>lots of code and one path will deviate from the other over time and
>now we've broken the prepare-commit model.  We want PREPARE to answer
>the question "is it OK to do this?" and COMMIT to do without failure.

Okay, I'll leave the last patch out for now. Let's see what the
future brings :)

I really don't like "NONE" phase, does not make sense for switchdev code
and it is just a hack for rocker. So I will kick it out from switchdev
anyway.

"ABORT" phase also does not allign to what you say about
one-code-two-passes and I think that my patchset is actually handling
abort nicely.

That leaves only "PREPARE" and "COMMIT"

Makes sense?

>
>Do you have a specific example where you can't follow the code with == PREPARE?
>
>> My next patchset is to un-mess rocker.c (that freaking ofdpa stuff is 
>> everywhere)
>
>Rocker implements OF-DPA so it's seems natural that freaking OF-DPA
>stuff would be everywhere.

Well, no. It is only one of many possible worlds. In qemu, the split is
done nicely. However in kernel, the ofdpa stuff is squashed to
the generic rocker stuff. I have a patchset for splitting
that under construction for some time. This week I have some time
allocated for that so I hopefully will send the patchset
this Thursday (fingers crossed).

Thanks for review!


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split

2015-09-21 Thread Jiri Pirko
Mon, Sep 21, 2015 at 07:13:58PM CEST, vivien.dide...@savoirfairelinux.com wrote:
>Hi Jiri, Scott,
>
>On Sep. Monday 21 (39) 10:09 AM, Jiri Pirko wrote:
>> Mon, Sep 21, 2015 at 09:23:24AM CEST, sfel...@gmail.com wrote:
>> >On Sat, Sep 19, 2015 at 5:29 AM, Jiri Pirko  wrote:
>> >> Jiri Pirko (6):
>> >>   switchdev: rename "trans" to "trans_ph".
>> >>   switchdev: introduce transaction infrastructure for attr_set and
>> >> obj_add
>> >>   rocker: switch to local transaction phase enum
>> >>   switchdev: move transaction phase enum under transaction structure
>> >>   rocker: use switchdev transaction queue for allocated memory
>> >>   switchdev: split commit and prepare phase into two callbacks
>> >
>> >Patches compile, but first test bombs.  Cut-and-paste of dump at end
>> >of this email.
>> 
>> Told you :)
>> 
>> 
>> >
>> >I'm not sure I'm liking this patchset because it looks like a way for
>> >switchdev drivers to easily opt-out of the prepare-commit transaction
>> >model by simply not implementing the *_pre op.  I would rather drivers
>> >explicitly handle the PREPARE phase in code, even if that means
>> >skipping it gracefully (in code) with a comment (in code) explaining
>> >why it does not matter for this device/operation.  That's what DSA had
>> >done, mostly because it was a retro-fit.
>> 
>> Each driver should handle this inside it. If it does not need prepare
>> state, it simply does not implement it. That is the same for all cb,
>> ndos, netdev notifiers, etc. It is much cleaner and nicer to have these as
>> separate callbacks. Implementing multiple callback in one is just ugly,
>> sorry.
>
>This is true, (in DSA) we don't have to implement the prepare phase if
>we fully support the feature in hardware.
>
>To give a real example, Marvell switch drivers currently implement all
>add/del/dump calls for VLAN FDB (where VID 0 means the port itself). No
>prepare phase needed.
>
>Now, I have local patches to enable strict 802.1Q mode in these switches
>(all the logic is based on the hardware VLAN table). But it does not use
>per-port FDB, so fdb_add with VID 0 doesn't make sense anymore. That's
>why we need to push the feature checking down to the drivers in DSA.
>
>I have another pending patch to add .port_fdb_pre_add, where mv88e6xxx
>code will return -EOPNOTSUPP if the given VID is 0.
>
>Another example: mv88e6xxx support tagged VLANs, so no hardware check
>needed. But the Broadcom Starfighter 2 only supports port-based VLANs
>(which is today wrongly implemented through "bridge_join/leave"). By
>implementing .port_vlan_pre_add (another pending patch for DSA), the
>driver will be able to return -EOPNOTSUPP if !BRIDGE_VLAN_INFO_PVID.
>
>Also, having logic in switchdev drivers to check SWITCHDEV_TRANS_NONE
>and SWITCHDEV_TRANS_ABORT is not really nice. Having switchdev handle
>the abort phase (calling each destructor) and getting rid of the
>SWITCHDEV_TRANS_* flags sounds better to me.

Agree, if pre/commit is going to be in one function, we should have
only prepare/commit enums. It can be carried around as a single bool
value in switchdev_trans structure. Will include that in my transaction
patchset.


>
>> >Also, the patchset removes the ABORT callback in case of a rollback
>> >due to a failed PREPARE.  We can't make the assumption that it's just
>> >a memory list to destroy on ABORT.  The driver, on PREPARE, may have
>> >reserved device space or staged an operation on the device which we'll
>> >need to undo on ABORT.
>> 
>> Yep, just register an item with custom destructor, there you can do
>> whatever. Also, I believe much nicer comparing to current code.
>> 
>> 
>> >
>> >So we need ABORT back, and we need PREPARE to not be optional, so
>> >what's left list enqueue/dequeue helpers, which I'm not seeing much
>> >value in up-leveling as the driver can do list_add/del itself.
>> 
>> Why would every driver do it itself, over and over when there can be a
>> clean infrastructure to do that. Including abort phase. Without the driver
>> needed to be involved.
>
>Maybe the term ".destructor" has a too strong meaning to deallocation,
>but you can indeed do whatever you need in this function.

It is a destructor. Don't know about a better name, suggestions?

>
>> >Am I missing something?  I didn't see a motivation statement for the
>> >RFC so I'm not sure where you wanted to take this.
>> 
>> I want to make current code much nicer, easier to read and implement in
>> other drivers. Look at rocker.c and how often there is == PREPARE there.
>> It's nearly impossible to followthe code, sorry.
>> 
>> My next patchset is to un-mess rocker.c (that freaking ofdpa stuff is
>> everywhere)
>
>I'm basically for this patchset, but I do have some concerns about the
>general switchdev transaction design.
>
>I certainly don't have the perspective that you guys have, so please
>tell me if I'm totally wrong. From my point of view (with DSA drivers),
>the emphase should be done on asking the hardware if it can handle or
>not 

Re: [patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split

2015-09-21 Thread Vivien Didelot
Hi Jiri, Scott,

On Sep. Monday 21 (39) 10:09 AM, Jiri Pirko wrote:
> Mon, Sep 21, 2015 at 09:23:24AM CEST, sfel...@gmail.com wrote:
> >On Sat, Sep 19, 2015 at 5:29 AM, Jiri Pirko  wrote:
> >> Jiri Pirko (6):
> >>   switchdev: rename "trans" to "trans_ph".
> >>   switchdev: introduce transaction infrastructure for attr_set and
> >> obj_add
> >>   rocker: switch to local transaction phase enum
> >>   switchdev: move transaction phase enum under transaction structure
> >>   rocker: use switchdev transaction queue for allocated memory
> >>   switchdev: split commit and prepare phase into two callbacks
> >
> >Patches compile, but first test bombs.  Cut-and-paste of dump at end
> >of this email.
> 
> Told you :)
> 
> 
> >
> >I'm not sure I'm liking this patchset because it looks like a way for
> >switchdev drivers to easily opt-out of the prepare-commit transaction
> >model by simply not implementing the *_pre op.  I would rather drivers
> >explicitly handle the PREPARE phase in code, even if that means
> >skipping it gracefully (in code) with a comment (in code) explaining
> >why it does not matter for this device/operation.  That's what DSA had
> >done, mostly because it was a retro-fit.
> 
> Each driver should handle this inside it. If it does not need prepare
> state, it simply does not implement it. That is the same for all cb,
> ndos, netdev notifiers, etc. It is much cleaner and nicer to have these as
> separate callbacks. Implementing multiple callback in one is just ugly,
> sorry.

This is true, (in DSA) we don't have to implement the prepare phase if
we fully support the feature in hardware.

To give a real example, Marvell switch drivers currently implement all
add/del/dump calls for VLAN FDB (where VID 0 means the port itself). No
prepare phase needed.

Now, I have local patches to enable strict 802.1Q mode in these switches
(all the logic is based on the hardware VLAN table). But it does not use
per-port FDB, so fdb_add with VID 0 doesn't make sense anymore. That's
why we need to push the feature checking down to the drivers in DSA.

I have another pending patch to add .port_fdb_pre_add, where mv88e6xxx
code will return -EOPNOTSUPP if the given VID is 0.

Another example: mv88e6xxx support tagged VLANs, so no hardware check
needed. But the Broadcom Starfighter 2 only supports port-based VLANs
(which is today wrongly implemented through "bridge_join/leave"). By
implementing .port_vlan_pre_add (another pending patch for DSA), the
driver will be able to return -EOPNOTSUPP if !BRIDGE_VLAN_INFO_PVID.

Also, having logic in switchdev drivers to check SWITCHDEV_TRANS_NONE
and SWITCHDEV_TRANS_ABORT is not really nice. Having switchdev handle
the abort phase (calling each destructor) and getting rid of the
SWITCHDEV_TRANS_* flags sounds better to me.

> >Also, the patchset removes the ABORT callback in case of a rollback
> >due to a failed PREPARE.  We can't make the assumption that it's just
> >a memory list to destroy on ABORT.  The driver, on PREPARE, may have
> >reserved device space or staged an operation on the device which we'll
> >need to undo on ABORT.
> 
> Yep, just register an item with custom destructor, there you can do
> whatever. Also, I believe much nicer comparing to current code.
> 
> 
> >
> >So we need ABORT back, and we need PREPARE to not be optional, so
> >what's left list enqueue/dequeue helpers, which I'm not seeing much
> >value in up-leveling as the driver can do list_add/del itself.
> 
> Why would every driver do it itself, over and over when there can be a
> clean infrastructure to do that. Including abort phase. Without the driver
> needed to be involved.

Maybe the term ".destructor" has a too strong meaning to deallocation,
but you can indeed do whatever you need in this function.

> >Am I missing something?  I didn't see a motivation statement for the
> >RFC so I'm not sure where you wanted to take this.
> 
> I want to make current code much nicer, easier to read and implement in
> other drivers. Look at rocker.c and how often there is == PREPARE there.
> It's nearly impossible to followthe code, sorry.
> 
> My next patchset is to un-mess rocker.c (that freaking ofdpa stuff is
> everywhere)

I'm basically for this patchset, but I do have some concerns about the
general switchdev transaction design.

I certainly don't have the perspective that you guys have, so please
tell me if I'm totally wrong. From my point of view (with DSA drivers),
the emphase should be done on asking the hardware if it can handle or
not a given transaction (not simply an hardware feature!).

The driver can handle any allocation and preparation itself, and also
errors can actually occur *during* a commit phase.

Being able to return -EOPNOTSUPP from any add/del/dump function would be
just perfect.

I ideally imagine the following implementation in switchdev drivers:

int foo_port_obj_add(struct net_device *dev, int id, void *obj)
{
struct foo *foo = netdev_priv(dev);
struct s

Re: [patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split

2015-09-21 Thread Scott Feldman
On Mon, Sep 21, 2015 at 1:09 AM, Jiri Pirko  wrote:
> Mon, Sep 21, 2015 at 09:23:24AM CEST, sfel...@gmail.com wrote:
>>On Sat, Sep 19, 2015 at 5:29 AM, Jiri Pirko  wrote:
>>> Jiri Pirko (6):
>>>   switchdev: rename "trans" to "trans_ph".
>>>   switchdev: introduce transaction infrastructure for attr_set and
>>> obj_add
>>>   rocker: switch to local transaction phase enum
>>>   switchdev: move transaction phase enum under transaction structure
>>>   rocker: use switchdev transaction queue for allocated memory
>>>   switchdev: split commit and prepare phase into two callbacks
>>
>>Patches compile, but first test bombs.  Cut-and-paste of dump at end
>>of this email.
>
> Told you :)
>
>
>>
>>I'm not sure I'm liking this patchset because it looks like a way for
>>switchdev drivers to easily opt-out of the prepare-commit transaction
>>model by simply not implementing the *_pre op.  I would rather drivers
>>explicitly handle the PREPARE phase in code, even if that means
>>skipping it gracefully (in code) with a comment (in code) explaining
>>why it does not matter for this device/operation.  That's what DSA had
>>done, mostly because it was a retro-fit.
>
> Each driver should handle this inside it. If it does not need prepare
> state, it simply does not implement it. That is the same for all cb,
> ndos, netdev notifiers, etc. It is much cleaner and nicer to have these as
> separate callbacks. Implementing multiple callback in one is just ugly,
> sorry.
>
>
>>
>>Also, the patchset removes the ABORT callback in case of a rollback
>>due to a failed PREPARE.  We can't make the assumption that it's just
>>a memory list to destroy on ABORT.  The driver, on PREPARE, may have
>>reserved device space or staged an operation on the device which we'll
>>need to undo on ABORT.
>
> Yep, just register an item with custom destructor, there you can do
> whatever. Also, I believe much nicer comparing to current code.
>
>
>>
>>So we need ABORT back, and we need PREPARE to not be optional, so
>>what's left list enqueue/dequeue helpers, which I'm not seeing much
>>value in up-leveling as the driver can do list_add/del itself.
>
> Why would every driver do it itself, over and over when there can be a
> clean infrastructure to do that. Including abort phase. Without the driver
> needed to be involved.
>
>
>>
>>Am I missing something?  I didn't see a motivation statement for the
>>RFC so I'm not sure where you wanted to take this.
>
> I want to make current code much nicer, easier to read and implement in
> other drivers. Look at rocker.c and how often there is == PREPARE there.
> It's nearly impossible to followthe code, sorry.

You're going to end up duplicate a lot of code and make it easier to
make mistakes.  The reason why the == PREPARE checks are there is the
same code is run for both PREPARE and COMMIT, with == PREPARE checks
right at the last moment to avoid a irreversible change, such as
deleting an entry from a table or changing a state variable or bumping
a hardware desc pointer.  Otherwise the code paths should be the same
for PREPARE and COMMIT.  If you split these, you're going to duplicate
lots of code and one path will deviate from the other over time and
now we've broken the prepare-commit model.  We want PREPARE to answer
the question "is it OK to do this?" and COMMIT to do without failure.

Do you have a specific example where you can't follow the code with == PREPARE?

> My next patchset is to un-mess rocker.c (that freaking ofdpa stuff is 
> everywhere)

Rocker implements OF-DPA so it's seems natural that freaking OF-DPA
stuff would be everywhere.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split

2015-09-21 Thread Jiri Pirko
Mon, Sep 21, 2015 at 09:23:24AM CEST, sfel...@gmail.com wrote:
>On Sat, Sep 19, 2015 at 5:29 AM, Jiri Pirko  wrote:
>> Jiri Pirko (6):
>>   switchdev: rename "trans" to "trans_ph".
>>   switchdev: introduce transaction infrastructure for attr_set and
>> obj_add
>>   rocker: switch to local transaction phase enum
>>   switchdev: move transaction phase enum under transaction structure
>>   rocker: use switchdev transaction queue for allocated memory
>>   switchdev: split commit and prepare phase into two callbacks
>
>Patches compile, but first test bombs.  Cut-and-paste of dump at end
>of this email.

Told you :)


>
>I'm not sure I'm liking this patchset because it looks like a way for
>switchdev drivers to easily opt-out of the prepare-commit transaction
>model by simply not implementing the *_pre op.  I would rather drivers
>explicitly handle the PREPARE phase in code, even if that means
>skipping it gracefully (in code) with a comment (in code) explaining
>why it does not matter for this device/operation.  That's what DSA had
>done, mostly because it was a retro-fit.

Each driver should handle this inside it. If it does not need prepare
state, it simply does not implement it. That is the same for all cb,
ndos, netdev notifiers, etc. It is much cleaner and nicer to have these as
separate callbacks. Implementing multiple callback in one is just ugly,
sorry.


>
>Also, the patchset removes the ABORT callback in case of a rollback
>due to a failed PREPARE.  We can't make the assumption that it's just
>a memory list to destroy on ABORT.  The driver, on PREPARE, may have
>reserved device space or staged an operation on the device which we'll
>need to undo on ABORT.

Yep, just register an item with custom destructor, there you can do
whatever. Also, I believe much nicer comparing to current code.


>
>So we need ABORT back, and we need PREPARE to not be optional, so
>what's left list enqueue/dequeue helpers, which I'm not seeing much
>value in up-leveling as the driver can do list_add/del itself.

Why would every driver do it itself, over and over when there can be a
clean infrastructure to do that. Including abort phase. Without the driver
needed to be involved.


>
>Am I missing something?  I didn't see a motivation statement for the
>RFC so I'm not sure where you wanted to take this.

I want to make current code much nicer, easier to read and implement in
other drivers. Look at rocker.c and how often there is == PREPARE there.
It's nearly impossible to followthe code, sorry.

My next patchset is to un-mess rocker.c (that freaking ofdpa stuff is 
everywhere)


>
>-scott
>
>
>
>[1.998791] BUG: unable to handle kernel NULL pointer dereference
>at 0008
>[2.05] IP: []
>rocker_port_kzalloc.isra.50+0x5/0x10 [rocker]
>[2.05] PGD 0
>[2.05] Oops:  [#1] SMP
>[2.05] Modules linked in: floppy(+) ata_piix(+) libata
>rocker(+) virtio_pci(+) virtio_ring virtio scsi_mod
>[2.05] CPU: 0 PID: 91 Comm: modprobe Not tainted 4.2.0+ #3
>[2.05] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
>[2.05] task: 88000f5e0800 ti: 88000f5f8000 task.ti:
>88000f5f8000
>[2.05] RIP: 0010:[]  []
>rocker_port_kzalloc.isra.50+0x5/0x10 [rocker]
>[2.05] RSP: 0018:88000f5fba20  EFLAGS: 00010246
>[2.05] RAX: 88000f17c050 RBX: 88000b40 RCX: 
>0020
>[2.05] RDX:  RSI:  RDI: 
>
>[2.05] RBP:  R08:  R09: 
>a0058680
>[2.05] R10: 0001 R11: 9319 R12: 
>
>[2.05] R13: 88000f5b7000 R14: 88000f5b7840 R15: 
>
>[2.05] FS:  7fd132237700() GS:88000fc0()
>knlGS:
>[2.05] CS:  0010 DS:  ES:  CR0: 8005003b
>[2.05] CR2: 0008 CR3: 0f5d5000 CR4: 
>06f0
>[2.05] Stack:
>[2.05]  a005a32a 0001 00ff88000f5b7000
>8175725a
>[2.05]  a0057750  a0058680
>0020001a
>[2.05]  88000b40  88000ea06000
>88000f5b7000
>[2.05] Call Trace:
>[2.05]  [] ? rocker_cmd_exec+0x5a/0x1f0 [rocker]
>[2.05]  [] ?
>rocker_cmd_get_port_stats_prep+0x90/0x90 [rocker]
>[2.05]  [] ?
>rocker_cmd_get_port_settings_phys_name_proc+0xb0/0xb0 [rocker]
>[2.05]  [] ? rocker_probe+0xb6a/0xe59 [rocker]
>[2.05]  [] ? local_pci_probe+0x48/0xa0
>[2.05]  [] ? pci_device_probe+0x112/0x130
>[2.05]  [] ? driver_probe_device+0x196/0x2a0
>[2.05]  [] ? driver_probe_device+0x2a0/0x2a0
>[2.05]  [] ? __driver_attach+0x8d/0x90
>[2.05]  [] ? driver_probe_device+0x2a0/0x2a0
>[2.05]  [] ? bus_for_each_dev+0x4c/0x80
>[2.00

Re: [patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split

2015-09-21 Thread Scott Feldman
On Sat, Sep 19, 2015 at 5:29 AM, Jiri Pirko  wrote:
> Jiri Pirko (6):
>   switchdev: rename "trans" to "trans_ph".
>   switchdev: introduce transaction infrastructure for attr_set and
> obj_add
>   rocker: switch to local transaction phase enum
>   switchdev: move transaction phase enum under transaction structure
>   rocker: use switchdev transaction queue for allocated memory
>   switchdev: split commit and prepare phase into two callbacks

Patches compile, but first test bombs.  Cut-and-paste of dump at end
of this email.

I'm not sure I'm liking this patchset because it looks like a way for
switchdev drivers to easily opt-out of the prepare-commit transaction
model by simply not implementing the *_pre op.  I would rather drivers
explicitly handle the PREPARE phase in code, even if that means
skipping it gracefully (in code) with a comment (in code) explaining
why it does not matter for this device/operation.  That's what DSA had
done, mostly because it was a retro-fit.

Also, the patchset removes the ABORT callback in case of a rollback
due to a failed PREPARE.  We can't make the assumption that it's just
a memory list to destroy on ABORT.  The driver, on PREPARE, may have
reserved device space or staged an operation on the device which we'll
need to undo on ABORT.

So we need ABORT back, and we need PREPARE to not be optional, so
what's left list enqueue/dequeue helpers, which I'm not seeing much
value in up-leveling as the driver can do list_add/del itself.

Am I missing something?  I didn't see a motivation statement for the
RFC so I'm not sure where you wanted to take this.

-scott



[1.998791] BUG: unable to handle kernel NULL pointer dereference
at 0008
[2.05] IP: []
rocker_port_kzalloc.isra.50+0x5/0x10 [rocker]
[2.05] PGD 0
[2.05] Oops:  [#1] SMP
[2.05] Modules linked in: floppy(+) ata_piix(+) libata
rocker(+) virtio_pci(+) virtio_ring virtio scsi_mod
[2.05] CPU: 0 PID: 91 Comm: modprobe Not tainted 4.2.0+ #3
[2.05] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[2.05] task: 88000f5e0800 ti: 88000f5f8000 task.ti:
88000f5f8000
[2.05] RIP: 0010:[]  []
rocker_port_kzalloc.isra.50+0x5/0x10 [rocker]
[2.05] RSP: 0018:88000f5fba20  EFLAGS: 00010246
[2.05] RAX: 88000f17c050 RBX: 88000b40 RCX: 0020
[2.05] RDX:  RSI:  RDI: 
[2.05] RBP:  R08:  R09: a0058680
[2.05] R10: 0001 R11: 9319 R12: 
[2.05] R13: 88000f5b7000 R14: 88000f5b7840 R15: 
[2.05] FS:  7fd132237700() GS:88000fc0()
knlGS:
[2.05] CS:  0010 DS:  ES:  CR0: 8005003b
[2.05] CR2: 0008 CR3: 0f5d5000 CR4: 06f0
[2.05] Stack:
[2.05]  a005a32a 0001 00ff88000f5b7000
8175725a
[2.05]  a0057750  a0058680
0020001a
[2.05]  88000b40  88000ea06000
88000f5b7000
[2.05] Call Trace:
[2.05]  [] ? rocker_cmd_exec+0x5a/0x1f0 [rocker]
[2.05]  [] ?
rocker_cmd_get_port_stats_prep+0x90/0x90 [rocker]
[2.05]  [] ?
rocker_cmd_get_port_settings_phys_name_proc+0xb0/0xb0 [rocker]
[2.05]  [] ? rocker_probe+0xb6a/0xe59 [rocker]
[2.05]  [] ? local_pci_probe+0x48/0xa0
[2.05]  [] ? pci_device_probe+0x112/0x130
[2.05]  [] ? driver_probe_device+0x196/0x2a0
[2.05]  [] ? driver_probe_device+0x2a0/0x2a0
[2.05]  [] ? __driver_attach+0x8d/0x90
[2.05]  [] ? driver_probe_device+0x2a0/0x2a0
[2.05]  [] ? bus_for_each_dev+0x4c/0x80
[2.05]  [] ? bus_add_driver+0x119/0x220
[2.05]  [] ? 0xa0065000
[2.05]  [] ? driver_register+0x5a/0xe0
[2.05]  [] ? 0xa0065000
[2.05]  [] ? rocker_module_init+0x33/0x1000 [rocker]
[2.05]  [] ? do_one_initcall+0x90/0x1e0
[2.05]  [] ? do_init_module+0x50/0x1d8
[2.05]  [] ? load_module+0x1c1c/0x2240
[2.05]  [] ? show_initstate+0x50/0x50
[2.05]  [] ? SyS_init_module+0x104/0x130
[2.05]  [] ? entry_SYSCALL_64_fastpath+0x16/0x75
[2.05] Code: 48 89 c1 48 c7 c2 10 dc 15 81 48 89 c6 e8 e4 0d
4a e1 48 8b 3b e8 1c 1b 4a e1 eb b6 66 2e 0f 1f 84 00 00 00 00 00 48
89 d1 89 f2 <8b> 77 08 e9 73 ff ff ff 0f 1f 00 48 83 ec 68 4c 89 7c 24
60 41
[2.05] RIP  []
rocker_port_kzalloc.isra.50+0x5/0x10 [rocker]
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split

2015-09-20 Thread Jiri Pirko
Sat, Sep 19, 2015 at 07:46:14PM CEST, sfel...@gmail.com wrote:
>On Sat, Sep 19, 2015 at 5:29 AM, Jiri Pirko  wrote:
>> Jiri Pirko (6):
>>   switchdev: rename "trans" to "trans_ph".
>>   switchdev: introduce transaction infrastructure for attr_set and
>> obj_add
>>   rocker: switch to local transaction phase enum
>>   switchdev: move transaction phase enum under transaction structure
>>   rocker: use switchdev transaction queue for allocated memory
>>   switchdev: split commit and prepare phase into two callbacks
>
>Whew, that's a lot of work!  Seems like a good idea to up-level this
>for other drivers to share.  Let me apply the patches and run my tests
>and get back to you.

Expect some blow-ups. Untested. Will test, fix some spelling issues and
post v1, most probably during Monday.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split

2015-09-20 Thread Jiri Pirko
Sat, Sep 19, 2015 at 09:02:00PM CEST, vivien.dide...@savoirfairelinux.com wrote:
>Hi Jiri,
>
>On Sep. Saturday 19 (38) 06:23 PM, Jiri Pirko wrote:
>> Sat, Sep 19, 2015 at 03:35:51PM CEST, rami.ro...@intel.com wrote:
>> >Hi,
>> >
>> >>introduce tranction enfra and for pre-commit split
>> >
>> >Typo:
>> >Instead "tranction enfra" should be "transaction infrastructure".
>> 
>> Will fix. Thanks!
>
>Just being picky, there are a couple more typos in:
>
>2/6: s/separatelly/separately/
>6/6: s/nore/more/ and s/separete/separate/


Will fix. Thanks!

>
>
>Thanks,
>-v
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split

2015-09-19 Thread Vivien Didelot
Hi Jiri,

On Sep. Saturday 19 (38) 06:23 PM, Jiri Pirko wrote:
> Sat, Sep 19, 2015 at 03:35:51PM CEST, rami.ro...@intel.com wrote:
> >Hi,
> >
> >>introduce tranction enfra and for pre-commit split
> >
> >Typo:
> >Instead "tranction enfra" should be "transaction infrastructure".
> 
> Will fix. Thanks!

Just being picky, there are a couple more typos in:

2/6: s/separatelly/separately/
6/6: s/nore/more/ and s/separete/separate/

Thanks,
-v
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split

2015-09-19 Thread Scott Feldman
On Sat, Sep 19, 2015 at 5:29 AM, Jiri Pirko  wrote:
> Jiri Pirko (6):
>   switchdev: rename "trans" to "trans_ph".
>   switchdev: introduce transaction infrastructure for attr_set and
> obj_add
>   rocker: switch to local transaction phase enum
>   switchdev: move transaction phase enum under transaction structure
>   rocker: use switchdev transaction queue for allocated memory
>   switchdev: split commit and prepare phase into two callbacks

Whew, that's a lot of work!  Seems like a good idea to up-level this
for other drivers to share.  Let me apply the patches and run my tests
and get back to you.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split

2015-09-19 Thread Jiri Pirko
Sat, Sep 19, 2015 at 03:35:51PM CEST, rami.ro...@intel.com wrote:
>Hi,
>
>>introduce tranction enfra and for pre-commit split
>
>Typo:
>Instead "tranction enfra" should be "transaction infrastructure".

Will fix. Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split

2015-09-19 Thread Rosen, Rami
Hi,

>introduce tranction enfra and for pre-commit split

Typo:
Instead "tranction enfra" should be "transaction infrastructure".

Regards,
Rami Rosen
Intel Corporation
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch net-next RFC 0/6] switchdev: introduce tranction enfra and for pre-commit split

2015-09-19 Thread Jiri Pirko
Jiri Pirko (6):
  switchdev: rename "trans" to "trans_ph".
  switchdev: introduce transaction infrastructure for attr_set and
obj_add
  rocker: switch to local transaction phase enum
  switchdev: move transaction phase enum under transaction structure
  rocker: use switchdev transaction queue for allocated memory
  switchdev: split commit and prepare phase into two callbacks

 drivers/net/ethernet/rocker/rocker.c | 579 ++-
 include/net/switchdev.h  |  39 ++-
 net/dsa/slave.c  | 129 +---
 net/switchdev/switchdev.c| 153 +++--
 4 files changed, 545 insertions(+), 355 deletions(-)

-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html