RE: [PATCH 1/2] NET: Multiple queue network device support
> -----Original Message-----
> From: Jarek Poplawski [mailto:[EMAIL PROTECTED]]
> Sent: Monday, March 12, 2007 1:58 AM
> To: Thomas Graf
> Cc: Kok, Auke-jan H; David Miller; Garzik, Jeff; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; Waskiewicz Jr, Peter P; Brandeburg, Jesse;
> Kok, Auke; Ronciak, John
> Subject: Re: [PATCH 1/2] NET: Multiple queue network device support
>
> On 09-03-2007 14:40, Thomas Graf wrote:
> > * Kok, Auke <[EMAIL PROTECTED]> 2007-02-08 16:09
> >> diff --git a/net/core/dev.c b/net/core/dev.c
> >> index 455d589..42b635c 100644
> >> --- a/net/core/dev.c
> >> +++ b/net/core/dev.c
> >> @@ -1477,6 +1477,49 @@ gso:
> >> 	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
> >> #endif
> >> 	if (q->enqueue) {
> >> +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
> >> +		int queue_index;
> >> +		/* If we're a multi-queue device, get a queue index to lock */
> >> +		if (netif_is_multiqueue(dev)) {
> >> +			/* Get the queue index and lock it. */
> >> +			if (likely(q->ops->map_queue)) {
> >> +				queue_index = q->ops->map_queue(skb, q);
> >> +				spin_lock(&dev->egress_subqueue[queue_index].queue_lock);
> >> +				rc = q->enqueue(skb, q);
>
> I'm not sure Dave Miller thought about this place when he proposed to
> save the mapping, but I think this might not be enough. This place is
> racy: ->map_queue() is called twice, and with some filters (and
> policies/actions) the results could differ. And of course the subqueue
> lock doesn't prevent a filter config change in the meantime.
>
> After a second reading of this patch I doubt it's the proper way to
> solve the problem: there are many subqueues, but we need a top queue
> (prio here) to manage them anyway. So, why not build this functionality
> directly into the queue? There is no difference for a device whether
> skbs are coming from a subqueue or a class; it is only interested in the
> mapping result and in a way to stop and start a subqueue and to query
> its status. All this could be done by adding the callbacks directly to
> any classful scheduler or, if that's not enough, by writing some
> specialized qdisc based on prio. The possibility of locking only a
> subqueue instead of the whole queue could be analyzed independently -
> the current proposal doesn't solve this anyway.
>
> Regards,
> Jarek P.

Thanks again for the feedback. Given some discussions I had last week in the office and the general feedback here, I'm going to remove the new per-queue locking, leave the start/stop functions for each queue, and combine the entry points for hard_start_xmit(). I'll get this out asap for review once it's been tested here. If we see issues in the future with lock contention on the queues, we can revisit the per-queue locking.

Cheers,
-PJ Waskiewicz

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
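The revised direction PJ describes above (keep per-subqueue start/stop, drop the per-queue locks, one xmit entry point) can be sketched in userspace C. All names here are illustrative inventions for this sketch, not the eventual kernel API:

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_TX_QUEUES 4

/* Illustrative device model: a per-subqueue stopped flag the driver
 * toggles when a TX ring fills or drains; no per-queue spinlocks. */
struct net_device_sketch {
    bool subqueue_stopped[NUM_TX_QUEUES];
};

static void netif_stop_subqueue_sketch(struct net_device_sketch *dev, int q)
{
    dev->subqueue_stopped[q] = true;    /* ring q out of descriptors */
}

static void netif_wake_subqueue_sketch(struct net_device_sketch *dev, int q)
{
    dev->subqueue_stopped[q] = false;   /* ring q has room again */
}

/* A single xmit entry point only consults the flag for the skb's
 * already-computed queue mapping; stopping one ring does not block
 * traffic destined for the others. */
static bool can_xmit(const struct net_device_sketch *dev, int queue_index)
{
    return !dev->subqueue_stopped[queue_index];
}
```

A stopped subqueue only gates packets mapped to it; other queues keep flowing, which is the point of keeping the start/stop API while dropping the locks.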
Re: [PATCH 1/2] NET: Multiple queue network device support
On 09-03-2007 14:40, Thomas Graf wrote:
> * Kok, Auke <[EMAIL PROTECTED]> 2007-02-08 16:09
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 455d589..42b635c 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1477,6 +1477,49 @@ gso:
>> 	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
>> #endif
>> 	if (q->enqueue) {
>> +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
>> +		int queue_index;
>> +		/* If we're a multi-queue device, get a queue index to lock */
>> +		if (netif_is_multiqueue(dev)) {
>> +			/* Get the queue index and lock it. */
>> +			if (likely(q->ops->map_queue)) {
>> +				queue_index = q->ops->map_queue(skb, q);
>> +				spin_lock(&dev->egress_subqueue[queue_index].queue_lock);
>> +				rc = q->enqueue(skb, q);

I'm not sure Dave Miller thought about this place when he proposed to save the mapping, but I think this might not be enough. This place is racy: ->map_queue() is called twice, and with some filters (and policies/actions) the results could differ. And of course the subqueue lock doesn't prevent a filter config change in the meantime.

After a second reading of this patch I doubt it's the proper way to solve the problem: there are many subqueues, but we need a top queue (prio here) to manage them anyway. So, why not build this functionality directly into the queue? There is no difference for a device whether skbs are coming from a subqueue or a class; it is only interested in the mapping result and in a way to stop and start a subqueue and to query its status. All this could be done by adding the callbacks directly to any classful scheduler or, if that's not enough, by writing some specialized qdisc based on prio. The possibility of locking only a subqueue instead of the whole queue could be analyzed independently - the current proposal doesn't solve this anyway.

Regards,
Jarek P.
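The race Jarek describes, that ->map_queue() is called once to pick the subqueue lock and again during classification, can be shown with a minimal userspace sketch. The `band_of_prio` table and `map_queue()` helper are hypothetical stand-ins for the qdisc's classifier state:

```c
#include <assert.h>

/* Hypothetical classifier state: a band-to-queue table that a concurrent
 * tc config change may rewrite between two classification calls. */
static int band_of_prio[4] = { 0, 0, 1, 1 };

static int map_queue(int prio)
{
    return band_of_prio[prio];   /* result depends on current config */
}
```

Calling `map_queue()` twice around a config change yields different queues, so the lock taken from the first call no longer covers the queue the skb actually lands in; computing the mapping once and reusing it removes the window.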
RE: [PATCH 1/2] NET: Multiple queue network device support
> -----Original Message-----
> From: Thomas Graf [mailto:[EMAIL PROTECTED]]
> Sent: Friday, March 09, 2007 6:35 PM
> To: Waskiewicz Jr, Peter P
> Cc: Kok, Auke-jan H; David Miller; Garzik, Jeff; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; Brandeburg, Jesse; Kok, Auke; Ronciak, John
> Subject: Re: [PATCH 1/2] NET: Multiple queue network device support
>
> * Waskiewicz Jr, Peter P <[EMAIL PROTECTED]> 2007-03-09 15:27
> > That's the entire point of this extra locking. enqueue() is going to
> > put an skb into a band somewhere that maps to some queue, and there is
> > no way to guarantee the skb I retrieve from dequeue() is headed for
> > the same queue. Therefore, I need to unlock the queue after I finish
> > enqueuing, since holding that lock makes little sense to dequeue().
> > dequeue() will then grab *a* lock on a queue; it may be the same one
> > we had during enqueue(), but it may not be. And the placement of the
> > unlock of that queue is exactly where it happens in non-multiqueue,
> > which is right before the hard_start_xmit().
>
> The lock is already unlocked after dequeue, from your prio_dequeue():
>
> 	if (netif_is_multiqueue(sch->dev)) {
> 		queue = q->band2queue[prio];
> 		if (spin_trylock(&sch->dev->egress_subqueue[queue].queue_lock)) {
> 			qdisc = q->queues[prio];
> 			skb = qdisc->dequeue(qdisc);
> 			if (skb) {
> 				sch->q.qlen--;
> 				skb->priority = prio;
> 				spin_unlock(&sch->dev->egress_subqueue[queue].queue_lock);
> 				return skb;
> 			}
> 			spin_unlock(&sch->dev->egress_subqueue[queue].queue_lock);
> 		}

Ok, now I see what's wrong. Taking Dave M.'s recommendation to store the queue mapping in the skb will let me unlock the queue when dequeue() returns. I'll fix this locking issue; thanks for the feedback and persistent drilling into my thick head.
-PJ Waskiewicz
[EMAIL PROTECTED]
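The fix PJ agrees to here, classify once at enqueue time and record the decision in the skb itself, can be sketched as follows. The struct and field names (`skb_sketch`, `queue_mapping`) are assumptions for illustration; the patch under discussion predates the real kernel field:

```c
#include <assert.h>

struct skb_sketch {
    int priority;
    int queue_mapping;   /* hypothetical: queue chosen at enqueue time */
};

static int band2queue[3] = { 0, 0, 1 };

/* Classify exactly once, at enqueue, and remember the decision. */
static void enqueue_classify(struct skb_sketch *skb)
{
    skb->queue_mapping = band2queue[skb->priority];
}

/* Dequeue/xmit just reads the stored mapping, so the right subqueue
 * lock can be released after dequeue() returns, even if the band
 * configuration changed in between. */
static int queue_for_xmit(const struct skb_sketch *skb)
{
    return skb->queue_mapping;
}
```

Because the mapping is frozen in the skb, a racing reconfiguration of `band2queue` no longer changes which queue's lock must be dropped.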
Re: [PATCH 1/2] NET: Multiple queue network device support
* Waskiewicz Jr, Peter P <[EMAIL PROTECTED]> 2007-03-09 15:27
> That's the entire point of this extra locking. enqueue() is going to
> put an skb into a band somewhere that maps to some queue, and there is
> no way to guarantee the skb I retrieve from dequeue() is headed for the
> same queue. Therefore, I need to unlock the queue after I finish
> enqueuing, since holding that lock makes little sense to dequeue().
> dequeue() will then grab *a* lock on a queue; it may be the same one we
> had during enqueue(), but it may not be. And the placement of the
> unlock of that queue is exactly where it happens in non-multiqueue,
> which is right before the hard_start_xmit().

The lock is already unlocked after dequeue, from your prio_dequeue():

	if (netif_is_multiqueue(sch->dev)) {
		queue = q->band2queue[prio];
		if (spin_trylock(&sch->dev->egress_subqueue[queue].queue_lock)) {
			qdisc = q->queues[prio];
			skb = qdisc->dequeue(qdisc);
			if (skb) {
				sch->q.qlen--;
				skb->priority = prio;
				spin_unlock(&sch->dev->egress_subqueue[queue].queue_lock);
				return skb;
			}
			spin_unlock(&sch->dev->egress_subqueue[queue].queue_lock);
		}
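The "try the lock, else skip the band" pattern in the quoted prio_dequeue() can be mimicked in userspace with pthreads. This is a sketch of the control flow only, with invented names; queue counts and the band-to-queue table are arbitrary:

```c
#include <pthread.h>

#define BANDS 3

static pthread_mutex_t queue_lock[2] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};
static int band2queue[BANDS] = { 0, 0, 1 };
static int band_len[BANDS]   = { 0, 0, 1 };  /* only band 2 holds a packet */

/* Mirrors the quoted logic: walk bands in priority order, trylock the
 * backing subqueue, and skip the band on contention rather than spin. */
static int dequeue_band(void)
{
    for (int prio = 0; prio < BANDS; prio++) {
        int q = band2queue[prio];
        if (pthread_mutex_trylock(&queue_lock[q]) != 0)
            continue;                 /* contended: move to the next band */
        if (band_len[prio] > 0) {
            band_len[prio]--;
            pthread_mutex_unlock(&queue_lock[q]);
            return prio;              /* serviced this band */
        }
        pthread_mutex_unlock(&queue_lock[q]);
    }
    return -1;                        /* nothing dequeued */
}
```

While queue 0's lock is held elsewhere (say, by an enqueue in progress), bands 0 and 1 are skipped but band 2, backed by queue 1, is still serviced, which is exactly the non-blocking behavior the patch aims for.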
RE: [PATCH 1/2] NET: Multiple queue network device support
> * Waskiewicz Jr, Peter P <[EMAIL PROTECTED]> 2007-03-09 11:25
> > > > +			}
> > > > +		} else {
> > > > +			/* We're not a multi-queue device. */
> > > > +			spin_lock(&dev->queue_lock);
> > > > +			q = dev->qdisc;
> > > > +			if (q->enqueue) {
> > > > +				rc = q->enqueue(skb, q);
> > > > +				qdisc_run(dev);
> > > > +				spin_unlock(&dev->queue_lock);
> > > > +				rc = rc == NET_XMIT_BYPASS
> > > > +					? NET_XMIT_SUCCESS : rc;
> > > > +				goto out;
> > > > +			}
> > > > +			spin_unlock(&dev->queue_lock);
> > >
> > > Please don't duplicate already existing code.
> >
> > I don't want to mix multiqueue and non-multiqueue code in the transmit
> > path. This was an attempt to allow MQ and non-MQ devices to coexist in
> > a machine by having separate code paths. Are you suggesting we combine
> > them? That would get very messy trying to determine what type of lock
> > to grab (subqueue lock or dev->queue_lock), not to mention that
> > grabbing dev->queue_lock would block multiqueue devices in that same
> > codepath.
>
> The piece of code I quoted above is the branch executed if multi queue
> is not enabled. The code you added is 100% identical to the already
> existing enqueue logic. Just execute the existing branch if multi queue
> is not enabled.

I totally agree this is identical code if multiqueue isn't enabled. However, when multiqueue is enabled, I don't want to make all network drivers implement the subqueue API just to be able to lock the queues. So the first thing I check is netif_is_multiqueue(dev), and if it *isn't* multiqueue, it will run the existing code. This way non-multiqueue devices don't need any knowledge of the MQ API. An example could be an on-board adapter that isn't multiqueue and an expansion card in the same system that is. If I handled multiqueue being on with just ifdefs, I could use the expansion card, but not the on-board adapter as-is. The on-board driver would need to be updated to have one queue in the multiqueue context, which is what I tried to avoid.

> > This is another attempt to keep the two codepaths separate. The only
> > way I see of combining them is to check netif_is_multiqueue() every
> > time I need to grab a lock, which I think would be excessive.
>
> The code added is 100% identical to the existing code, just be a little
> smarter about how you do the ifdefs.
>
> > > Your modified qdisc_restart() expects the queue_lock to be locked,
> > > how can this work?
> >
> > No, it doesn't expect the lock to be held. Because of the multiple
> > queues, enqueueing and dequeueing are now asynchronous, since I can
> > enqueue to queue 0 while dequeuing from queue 1. dev->queue_lock isn't
> > held, so this can happen. Therefore the spin_trylock() is used in this
> > dequeue, because I don't want to wait for someone to finish with the
> > queue in question (e.g. an enqueue in progress), since that would
> > block all other bands/queues after the band in question. So if the
> > lock isn't available to grab, we move to the next band. If I were to
> > wait for the lock, I'd serialize the enqueue/dequeue completely and
> > block other traffic flows in other queues waiting for the lock.
>
> The first thing you do in qdisc_restart() after dequeue()ing is unlock
> the sub queue lock. You explicitly unlock it before calling
> qdisc_run(), so I assume dequeue() is expected to keep it locked.
> Something doesn't add up.

That's the entire point of this extra locking. enqueue() is going to put an skb into a band somewhere that maps to some queue, and there is no way to guarantee the skb I retrieve from dequeue() is headed for the same queue. Therefore, I need to unlock the queue after I finish enqueuing, since holding that lock makes little sense to dequeue(). dequeue() will then grab *a* lock on a queue; it may be the same one we had during enqueue(), but it may not be. And the placement of the unlock of that queue is exactly where it happens in non-multiqueue, which is right before the hard_start_xmit(). I concede that the locking model is complex and seems really nasty, but to truly separate queue functionality from one another, I see no other feasible option than locking like this. I am very open to suggestions.

> BTW, which lock serializes your write access to qdisc->q.qlen? It used
> to be dev->queue_lock, but that is apparently not true for multi queue.

This is a very good catch, because it isn't being protected on the entire qdisc right now for PRIO. However, Chris Leech pointed out that the LINK_STATE_QDISC_RUNNING bit is serializing things at the qdisc_run() level, so that's
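The serialization point Chris Leech is credited with above, the qdisc-running bit that admits only one CPU into the dequeue loop at a time, can be sketched in userspace with a C11 atomic flag. This is an analogue of the idea, not the kernel's __LINK_STATE_QDISC_RUNNING implementation:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Only one caller at a time may enter the drain loop; that exclusion is
 * what keeps q.qlen updates from racing even without dev->queue_lock. */
static atomic_flag qdisc_running = ATOMIC_FLAG_INIT;
static int qlen = 2;

static bool qdisc_run_sketch(void)
{
    if (atomic_flag_test_and_set(&qdisc_running))
        return false;             /* someone else is already running it */
    while (qlen > 0)
        qlen--;                   /* serialized qlen writes */
    atomic_flag_clear(&qdisc_running);
    return true;
}
```

A second caller that hits the guard while it is set simply backs off; the owner drains the queue and clears the flag on exit.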
Re: [PATCH 1/2] NET: Multiple queue network device support
* Waskiewicz Jr, Peter P <[EMAIL PROTECTED]> 2007-03-09 11:25
> > > +			}
> > > +		} else {
> > > +			/* We're not a multi-queue device. */
> > > +			spin_lock(&dev->queue_lock);
> > > +			q = dev->qdisc;
> > > +			if (q->enqueue) {
> > > +				rc = q->enqueue(skb, q);
> > > +				qdisc_run(dev);
> > > +				spin_unlock(&dev->queue_lock);
> > > +				rc = rc == NET_XMIT_BYPASS
> > > +					? NET_XMIT_SUCCESS : rc;
> > > +				goto out;
> > > +			}
> > > +			spin_unlock(&dev->queue_lock);
> >
> > Please don't duplicate already existing code.
>
> I don't want to mix multiqueue and non-multiqueue code in the transmit
> path. This was an attempt to allow MQ and non-MQ devices to coexist in
> a machine by having separate code paths. Are you suggesting we combine
> them? That would get very messy trying to determine what type of lock
> to grab (subqueue lock or dev->queue_lock), not to mention that
> grabbing dev->queue_lock would block multiqueue devices in that same
> codepath.

The piece of code I quoted above is the branch executed if multi queue is not enabled. The code you added is 100% identical to the already existing enqueue logic. Just execute the existing branch if multi queue is not enabled.

> This is another attempt to keep the two codepaths separate. The only
> way I see of combining them is to check netif_is_multiqueue() every
> time I need to grab a lock, which I think would be excessive.

The code added is 100% identical to the existing code, just be a little smarter about how you do the ifdefs.

> > > 	}
> > >
> > > 	return NULL;
> > > @@ -141,18 +174,53 @@ prio_dequeue(struct Qdisc* sch)
> > > 	struct sk_buff *skb;
> > > 	struct prio_sched_data *q = qdisc_priv(sch);
> > > 	int prio;
> > > +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
> > > +	int queue;
> > > +#endif
> > > 	struct Qdisc *qdisc;
> > >
> > > +	/*
> > > +	 * If we're multiqueue, the basic approach is try the lock on each
> > > +	 * queue. If it's locked, either we're already dequeuing, or
> > > +	 * enqueue is doing something. Go to the next band if we're
> > > +	 * locked. Once we have a packet, unlock the queue. NOTE: the
> > > +	 * underlying qdisc CANNOT be a PRIO qdisc, otherwise we will
> > > +	 * deadlock. FIXME
> > > +	 */
> > > 	for (prio = 0; prio < q->bands; prio++) {
> > > +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
> > > +		if (netif_is_multiqueue(sch->dev)) {
> > > +			queue = q->band2queue[prio];
> > > +			if (spin_trylock(&sch->dev->egress_subqueue[queue].queue_lock)) {
> > > +				qdisc = q->queues[prio];
> > > +				skb = qdisc->dequeue(qdisc);
> > > +				if (skb) {
> > > +					sch->q.qlen--;
> > > +					skb->priority = prio;
> > > +					spin_unlock(&sch->dev->egress_subqueue[queue].queue_lock);
> > > +					return skb;
> > > +				}
> > > +				spin_unlock(&sch->dev->egress_subqueue[queue].queue_lock);
> > > +			}
> >
> > Your modified qdisc_restart() expects the queue_lock to be locked, how
> > can this work?
>
> No, it doesn't expect the lock to be held. Because of the multiple
> queues, enqueueing and dequeueing are now asynchronous, since I can
> enqueue to queue 0 while dequeuing from queue 1. dev->queue_lock isn't
> held, so this can happen. Therefore the spin_trylock() is used in this
> dequeue, because I don't want to wait for someone to finish with the
> queue in question (e.g. an enqueue in progress), since that would block
> all other bands/queues after the band in question. So if the lock isn't
> available to grab, we move to the next band. If I were to wait for the
> lock, I'd serialize the enqueue/dequeue completely and block other
> traffic flows in other queues waiting for the lock.

The first thing you do in qdisc_restart() after dequeue()ing is unlock the sub queue lock. You explicitly unlock it before calling qdisc_run(), so I assume dequeue() is expected to keep it locked. Something doesn't add up.

BTW, which lock serializes your write access to qdisc->q.qlen? It used to be dev->queue_lock, but that is apparently not true for multi queue.
RE: [PATCH 1/2] NET: Multiple queue network device support
> -----Original Message-----
> From: Thomas Graf [mailto:[EMAIL PROTECTED]]
> Sent: Friday, March 09, 2007 5:41 AM
> To: Kok, Auke-jan H
> Cc: David Miller; Garzik, Jeff; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; Waskiewicz Jr, Peter P; Brandeburg, Jesse;
> Kok, Auke; Ronciak, John
> Subject: Re: [PATCH 1/2] NET: Multiple queue network device support
>
> * Kok, Auke <[EMAIL PROTECTED]> 2007-02-08 16:09
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 455d589..42b635c 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -1477,6 +1477,49 @@ gso:
> > 	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
> > #endif
> > 	if (q->enqueue) {
> > +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
> > +		int queue_index;
> > +		/* If we're a multi-queue device, get a queue index to lock */
> > +		if (netif_is_multiqueue(dev)) {
> > +			/* Get the queue index and lock it. */
> > +			if (likely(q->ops->map_queue)) {
> > +				queue_index = q->ops->map_queue(skb, q);
> > +				spin_lock(&dev->egress_subqueue[queue_index].queue_lock);
> > +				rc = q->enqueue(skb, q);
> > +				/*
> > +				 * Unlock because the underlying qdisc
> > +				 * may queue and send a packet from a
> > +				 * different queue.
> > +				 */
> > +				spin_unlock(&dev->egress_subqueue[queue_index].queue_lock);
> > +				qdisc_run(dev);
>
> I really dislike this asymmetric locking logic: while enqueue() is
> called with queue_lock held, dequeue() is responsible for acquiring the
> lock for qdisc_restart().

I don't see how I can make this symmetrical when dealing with more than one queue. If an skb is classified to band 2, which maps to queue 1, we lock queue 1 and enqueue the packet. That queue will not be the first queue checked in dequeue for an skb, so I cannot rely on that lock protecting queue 0 in dequeue. Therefore I need to map the skb, lock, enqueue, unlock, then go into dequeue and lock the correct queue there. If I were to make this symmetrical, the process of enqueuing and dequeuing would be serialized completely, and we'd be back to where we started without multiqueue. Also, in the multiqueue code path, dev->queue_lock isn't held anywhere; it's the subqueue lock.

> > +				rc = rc == NET_XMIT_BYPASS
> > +					? NET_XMIT_SUCCESS : rc;
> > +				goto out;
> > +			} else {
> > +				printk(KERN_CRIT "Device %s is "
> > +					"multiqueue, but map_queue is "
> > +					"undefined in the qdisc!!\n",
> > +					dev->name);
> > +				kfree_skb(skb);
>
> Move this check to tc_modify_qdisc(), it's useless to check this for
> every packet being delivered.

The patches I sent yesterday modified the non-multiqueue qdiscs to not load if the device is multiqueue, so this check can be removed altogether.

> > +			}
> > +		} else {
> > +			/* We're not a multi-queue device. */
> > +			spin_lock(&dev->queue_lock);
> > +			q = dev->qdisc;
> > +			if (q->enqueue) {
> > +				rc = q->enqueue(skb, q);
> > +				qdisc_run(dev);
> > +				spin_unlock(&dev->queue_lock);
> > +				rc = rc == NET_XMIT_BYPASS
> > +					? NET_XMIT_SUCCESS : rc;
> > +				goto out;
> > +			}
> > +			spin_unlock(&dev->queue_lock);
>
> Please don't duplicate already existing code.

I don't want to mix multiqueue and non-multiqueue code in the transmit path. This was an attempt to allow MQ and non-MQ devices to coexist in a machine by having separate code paths. Are you suggesting we combine them? That would get very messy trying to determine what type of lock to grab (subqueue lock or dev->queue_lock), not to mention that grabbing dev->queue_lock would block multiqueue devices in that same codepath.

> > @@ -130,6 +140,72 @@ static inline int qdisc_restart(struct net_device *dev)
> > 	}
Re: [PATCH 1/2] NET: Multiple queue network device support
* Kok, Auke <[EMAIL PROTECTED]> 2007-02-08 16:09
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 455d589..42b635c 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1477,6 +1477,49 @@ gso:
> 	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
> #endif
> 	if (q->enqueue) {
> +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
> +		int queue_index;
> +		/* If we're a multi-queue device, get a queue index to lock */
> +		if (netif_is_multiqueue(dev)) {
> +			/* Get the queue index and lock it. */
> +			if (likely(q->ops->map_queue)) {
> +				queue_index = q->ops->map_queue(skb, q);
> +				spin_lock(&dev->egress_subqueue[queue_index].queue_lock);
> +				rc = q->enqueue(skb, q);
> +				/*
> +				 * Unlock because the underlying qdisc
> +				 * may queue and send a packet from a
> +				 * different queue.
> +				 */
> +				spin_unlock(&dev->egress_subqueue[queue_index].queue_lock);
> +				qdisc_run(dev);

I really dislike this asymmetric locking logic: while enqueue() is called with queue_lock held, dequeue() is responsible for acquiring the lock for qdisc_restart().

> +				rc = rc == NET_XMIT_BYPASS
> +					? NET_XMIT_SUCCESS : rc;
> +				goto out;
> +			} else {
> +				printk(KERN_CRIT "Device %s is "
> +					"multiqueue, but map_queue is "
> +					"undefined in the qdisc!!\n",
> +					dev->name);
> +				kfree_skb(skb);

Move this check to tc_modify_qdisc(), it's useless to check this for every packet being delivered.

> +			}
> +		} else {
> +			/* We're not a multi-queue device. */
> +			spin_lock(&dev->queue_lock);
> +			q = dev->qdisc;
> +			if (q->enqueue) {
> +				rc = q->enqueue(skb, q);
> +				qdisc_run(dev);
> +				spin_unlock(&dev->queue_lock);
> +				rc = rc == NET_XMIT_BYPASS
> +					? NET_XMIT_SUCCESS : rc;
> +				goto out;
> +			}
> +			spin_unlock(&dev->queue_lock);

Please don't duplicate already existing code.

> @@ -130,6 +140,72 @@ static inline int qdisc_restart(struct net_device *dev)
> 	}
>
> 	{
> +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
> +		if (netif_is_multiqueue(dev)) {
> +			if (likely(q->ops->map_queue)) {
> +				queue_index = q->ops->map_queue(skb, q);
> +			} else {
> +				printk(KERN_CRIT "Device %s is "
> +					"multiqueue, but map_queue is "
> +					"undefined in the qdisc!!\n",
> +					dev->name);
> +				goto requeue;
> +			}

Yet another condition completely useless for every transmission.

> +			spin_unlock(&dev->egress_subqueue[queue_index].queue_lock);
> +			/* Check top level device, and any sub-device */
> +			if ((!netif_queue_stopped(dev)) &&
> +			    (!netif_subqueue_stopped(dev, queue_index))) {
> +				int ret;
> +				ret = dev->hard_start_subqueue_xmit(skb, dev, queue_index);
> +				if (ret == NETDEV_TX_OK) {
> +					if (!nolock) {
> +						netif_tx_unlock(dev);
> +					}
> +					return -1;
> +				}
> +				if (ret == NETDEV_TX_LOCKED && nolock) {
> +					spin_lock(&dev->egress_subqueue[queue_index].queue_lock);
> +					goto collision;
> +				}
> +			}
> +			/* NETDEV_TX_BUSY - we need to requeue */
> +			/* Release the driver */
> +			if (!nolock) {
> +				netif_tx_unlock(dev);
> +			}
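Thomas's objection, that checking q->ops->map_queue on every packet is wasted work, amounts to validating the qdisc once at attach time instead of in the hot path. A sketch of that check-at-configuration idea, with invented names standing in for the real tc_modify_qdisc() plumbing:

```c
#include <stdbool.h>
#include <stddef.h>

struct qdisc_ops_sketch {
    int (*map_queue)(int prio);      /* NULL for a classless/legacy qdisc */
};

static int prio_map_queue(int prio) { return prio ? 1 : 0; }

/* Reject a qdisc without a queue mapping once, when it is attached to a
 * multiqueue device, so the per-packet path never needs the branch. */
static bool qdisc_attach_ok(const struct qdisc_ops_sketch *ops,
                            bool dev_is_multiqueue)
{
    if (dev_is_multiqueue && ops->map_queue == NULL)
        return false;                /* would be an error at config time */
    return true;
}
```

The invalid combination is caught once per configuration change rather than once per skb, which is also in the spirit of PJ's follow-up patches that refuse to load non-multiqueue qdiscs on multiqueue devices.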
Re: [PATCH 1/2] NET: Multiple queue network device support
* Kok, Auke [EMAIL PROTECTED] 2007-02-08 16:09 diff --git a/net/core/dev.c b/net/core/dev.c index 455d589..42b635c 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1477,6 +1477,49 @@ gso: skb-tc_verd = SET_TC_AT(skb-tc_verd,AT_EGRESS); #endif if (q-enqueue) { +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE + int queue_index; + /* If we're a multi-queue device, get a queue index to lock */ + if (netif_is_multiqueue(dev)) + { + /* Get the queue index and lock it. */ + if (likely(q-ops-map_queue)) { + queue_index = q-ops-map_queue(skb, q); + spin_lock(dev-egress_subqueue[queue_index].queue_lock); + rc = q-enqueue(skb, q); + /* + * Unlock because the underlying qdisc + * may queue and send a packet from a + * different queue. + */ + spin_unlock(dev-egress_subqueue[queue_index].queue_lock); + qdisc_run(dev); I really dislike this asymmetric locking logic, while enqueue() is called with queue_lock held dequeue() is repsonsible to acquire the lock for qdisc_restart(). + rc = rc == NET_XMIT_BYPASS +? NET_XMIT_SUCCESS : rc; + goto out; + } else { + printk(KERN_CRIT Device %s is + multiqueue, but map_queue is + undefined in the qdisc!!\n, + dev-name); + kfree_skb(skb); Move this check to tc_modify_qdisc(), it's useless to check this for every packet being delivered. + } + } else { + /* We're not a multi-queue device. */ + spin_lock(dev-queue_lock); + q = dev-qdisc; + if (q-enqueue) { + rc = q-enqueue(skb, q); + qdisc_run(dev); + spin_unlock(dev-queue_lock); + rc = rc == NET_XMIT_BYPASS +? NET_XMIT_SUCCESS : rc; + goto out; + } + spin_unlock(dev-queue_lock); Please don't duplicate already existing code. 
> @@ -130,6 +140,72 @@ static inline int qdisc_restart(struct net_device *dev)
> 	}
> 	{
> +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
> +	if (netif_is_multiqueue(dev)) {
> +		if (likely(q->ops->map_queue)) {
> +			queue_index = q->ops->map_queue(skb, q);
> +		} else {
> +			printk(KERN_CRIT "Device %s is "
> +				"multiqueue, but map_queue is "
> +				"undefined in the qdisc!!\n",
> +				dev->name);
> +			goto requeue;
> +		}

Yet another condition completely useless for every transmission.

> +		spin_unlock(&dev->egress_subqueue[queue_index].queue_lock);
> +		/* Check top level device, and any sub-device */
> +		if ((!netif_queue_stopped(dev)) &&
> +		    (!netif_subqueue_stopped(dev, queue_index))) {
> +			int ret;
> +			ret = dev->hard_start_subqueue_xmit(skb, dev, queue_index);
> +			if (ret == NETDEV_TX_OK) {
> +				if (!nolock) {
> +					netif_tx_unlock(dev);
> +				}
> +				return -1;
> +			}
> +			if (ret == NETDEV_TX_LOCKED && nolock) {
> +				spin_lock(&dev->egress_subqueue[queue_index].queue_lock);
> +				goto collision;
> +			}
> +		}
> +		/* NETDEV_TX_BUSY - we need to requeue */
> +		/* Release the driver */
> +		if (!nolock) {
> +			netif_tx_unlock(dev);
> +		}
> +
RE: [PATCH 1/2] NET: Multiple queue network device support
> -----Original Message-----
> From: Thomas Graf [mailto:[EMAIL PROTECTED]]
> Sent: Friday, March 09, 2007 5:41 AM
> To: Kok, Auke-jan H
> Cc: David Miller; Garzik, Jeff; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; Waskiewicz Jr, Peter P;
> Brandeburg, Jesse; Kok, Auke; Ronciak, John
> Subject: Re: [PATCH 1/2] NET: Multiple queue network device support
>
> * Kok, Auke <[EMAIL PROTECTED]> 2007-02-08 16:09
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 455d589..42b635c 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1477,6 +1477,49 @@ gso:
>> 	skb->tc_verd = SET_TC_AT(skb->tc_verd,AT_EGRESS);
>> #endif
>> 	if (q->enqueue) {
>> +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
>> +	int queue_index;
>> +	/* If we're a multi-queue device, get a queue index to lock */
>> +	if (netif_is_multiqueue(dev))
>> +	{
>> +		/* Get the queue index and lock it. */
>> +		if (likely(q->ops->map_queue)) {
>> +			queue_index = q->ops->map_queue(skb, q);
>> +			spin_lock(&dev->egress_subqueue[queue_index].queue_lock);
>> +			rc = q->enqueue(skb, q);
>> +			/*
>> +			 * Unlock because the underlying qdisc
>> +			 * may queue and send a packet from a
>> +			 * different queue.
>> +			 */
>> +			spin_unlock(&dev->egress_subqueue[queue_index].queue_lock);
>> +			qdisc_run(dev);
>
> I really dislike this asymmetric locking logic, while enqueue() is
> called with queue_lock held dequeue() is responsible to acquire the
> lock for qdisc_restart().

I don't see how I can make this symmetrical when dealing with more than one queue. If an skb is classified to band 2 which maps to queue 1, we lock queue 1, and enqueue the packet. That queue will not be the first queue checked in dequeue for an skb, so I cannot rely on that lock protecting queue 0 in dequeue. Therefore I need to map the skb, lock, enqueue, unlock, then go into dequeue and lock the correct queue there. If I were to make this symmetrical, then the process of enqueuing and dequeuing would be serialized completely, and we'd be back to where we started without multiqueue. Also, in the multiqueue code path, dev->queue_lock isn't held anywhere, it's the subqueue lock.
>> +			rc = rc == NET_XMIT_BYPASS
>> +				? NET_XMIT_SUCCESS : rc;
>> +			goto out;
>> +		} else {
>> +			printk(KERN_CRIT "Device %s is "
>> +				"multiqueue, but map_queue is "
>> +				"undefined in the qdisc!!\n",
>> +				dev->name);
>> +			kfree_skb(skb);
>
> Move this check to tc_modify_qdisc(), it's useless to check this for
> every packet being delivered.

The patches I sent yesterday modified the non-multiqueue qdiscs to not load if the device is multiqueue, so this check can be removed altogether.

>> +		}
>> +	} else {
>> +		/* We're not a multi-queue device. */
>> +		spin_lock(&dev->queue_lock);
>> +		q = dev->qdisc;
>> +		if (q->enqueue) {
>> +			rc = q->enqueue(skb, q);
>> +			qdisc_run(dev);
>> +			spin_unlock(&dev->queue_lock);
>> +			rc = rc == NET_XMIT_BYPASS
>> +				? NET_XMIT_SUCCESS : rc;
>> +			goto out;
>> +		}
>> +		spin_unlock(&dev->queue_lock);
>
> Please don't duplicate already existing code.

I don't want to mix multiqueue and non-multiqueue code in the transmit path. This was an attempt to allow MQ and non-MQ devices to coexist in a machine having separate code paths. Are you suggesting to combine them? That would get very messy trying to determine what type of lock to grab (subqueue lock or dev->queue_lock), not to mention grabbing the dev->queue_lock would block multiqueue devices in that same codepath.

>> @@ -130,6 +140,72 @@ static inline int qdisc_restart(struct net_device *dev)
>> 	}
>> 	{
>> +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
>> +	if (netif_is_multiqueue(dev)) {
>> +		if (likely(q->ops->map_queue)) {
>> +			queue_index = q->ops->map_queue(skb, q);
>> +		} else {
>> +			printk(KERN_CRIT "Device %s is "
>> +				"multiqueue, but map_queue is "
>> +				"undefined in the qdisc!!\n",
>> +				dev->name
Re: [PATCH 1/2] NET: Multiple queue network device support
* Waskiewicz Jr, Peter P <[EMAIL PROTECTED]> 2007-03-09 11:25
>>> +		}
>>> +	} else {
>>> +		/* We're not a multi-queue device. */
>>> +		spin_lock(&dev->queue_lock);
>>> +		q = dev->qdisc;
>>> +		if (q->enqueue) {
>>> +			rc = q->enqueue(skb, q);
>>> +			qdisc_run(dev);
>>> +			spin_unlock(&dev->queue_lock);
>>> +			rc = rc == NET_XMIT_BYPASS
>>> +				? NET_XMIT_SUCCESS : rc;
>>> +			goto out;
>>> +		}
>>> +		spin_unlock(&dev->queue_lock);
>>
>> Please don't duplicate already existing code.
>
> I don't want to mix multiqueue and non-multiqueue code in the transmit
> path. This was an attempt to allow MQ and non-MQ devices to coexist in
> a machine having separate code paths. Are you suggesting to combine
> them? That would get very messy trying to determine what type of lock
> to grab (subqueue lock or dev->queue_lock), not to mention grabbing
> the dev->queue_lock would block multiqueue devices in that same
> codepath.

The piece of code I quoted above is the branch executed if multi queue is not enabled. The code you added is 100% identical to the already existing enqueue logic. Just execute the existing branch if multi queue is not enabled.

> This is another attempt to keep the two codepaths separate. The only
> way I see of combining them is to check netif_is_multiqueue() every
> time I need to grab a lock, which I think would be excessive.

The code added is 100% identical to the existing code, just be a little smarter on how you do the ifdefs.

>>> 	}
>>> 	return NULL;
>>> @@ -141,18 +174,53 @@ prio_dequeue(struct Qdisc* sch)
>>> 	struct sk_buff *skb;
>>> 	struct prio_sched_data *q = qdisc_priv(sch);
>>> 	int prio;
>>> +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
>>> +	int queue;
>>> +#endif
>>> 	struct Qdisc *qdisc;
>>> +	/*
>>> +	 * If we're multiqueue, the basic approach is try the lock on each
>>> +	 * queue. If it's locked, either we're already dequeuing, or enqueue
>>> +	 * is doing something. Go to the next band if we're locked. Once we
>>> +	 * have a packet, unlock the queue. NOTE: the underlying qdisc CANNOT
>>> +	 * be a PRIO qdisc, otherwise we will deadlock.
>>> +	 * FIXME
>>> +	 */
>>> 	for (prio = 0; prio < q->bands; prio++) {
>>> +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
>>> +		if (netif_is_multiqueue(sch->dev)) {
>>> +			queue = q->band2queue[prio];
>>> +			if (spin_trylock(&sch->dev->egress_subqueue[queue].queue_lock)) {
>>> +				qdisc = q->queues[prio];
>>> +				skb = qdisc->dequeue(qdisc);
>>> +				if (skb) {
>>> +					sch->q.qlen--;
>>> +					skb->priority = prio;
>>> +					spin_unlock(&sch->dev->egress_subqueue[queue].queue_lock);
>>> +					return skb;
>>> +				}
>>> +				spin_unlock(&sch->dev->egress_subqueue[queue].queue_lock);
>>> +			}
>>
>> Your modified qdisc_restart() expects the queue_lock to be locked,
>> how can this work?
>
> No, it doesn't expect the lock to be held. Because of the multiple
> queues, enqueueing and dequeueing are now asynchronous, since I can
> enqueue to queue 0 while dequeuing from queue 1. dev->queue_lock isn't
> held, so this can happen. Therefore the spin_trylock() is used in this
> dequeue because I don't want to wait for someone to finish with that
> queue in question (e.g. enqueue working), since that will block all
> other bands/queues after the band in question. So if the lock isn't
> available to grab, we move to the next band. If I were to wait for the
> lock, I'd serialize the enqueue/dequeue completely, and block other
> traffic flows in other queues waiting for the lock.

The first thing you do in qdisc_restart() after dequeue()ing is unlock the sub queue lock. You explicitly unlock it before calling qdisc_run() so I assume dequeue() is expected to keep it locked. Something doesn't add up.

BTW, which lock serializes your write access to qdisc->q.qlen? It used to be dev->queue_lock but that is apparently not true for multi queue.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 1/2] NET: Multiple queue network device support
> * Waskiewicz Jr, Peter P <[EMAIL PROTECTED]> 2007-03-09 11:25
>>>> +		}
>>>> +	} else {
>>>> +		/* We're not a multi-queue device. */
>>>> +		spin_lock(&dev->queue_lock);
>>>> +		q = dev->qdisc;
>>>> +		if (q->enqueue) {
>>>> +			rc = q->enqueue(skb, q);
>>>> +			qdisc_run(dev);
>>>> +			spin_unlock(&dev->queue_lock);
>>>> +			rc = rc == NET_XMIT_BYPASS
>>>> +				? NET_XMIT_SUCCESS : rc;
>>>> +			goto out;
>>>> +		}
>>>> +		spin_unlock(&dev->queue_lock);
>>>
>>> Please don't duplicate already existing code.
>>
>> I don't want to mix multiqueue and non-multiqueue code in the
>> transmit path. This was an attempt to allow MQ and non-MQ devices to
>> coexist in a machine having separate code paths. Are you suggesting
>> to combine them? That would get very messy trying to determine what
>> type of lock to grab (subqueue lock or dev->queue_lock), not to
>> mention grabbing the dev->queue_lock would block multiqueue devices
>> in that same codepath.
>
> The piece of code I quoted above is the branch executed if multi queue
> is not enabled. The code you added is 100% identical to the already
> existing enqueue logic. Just execute the existing branch if multi
> queue is not enabled.

I totally agree this is identical code if multiqueue isn't enabled. However, when multiqueue is enabled, I don't want to make all network drivers implement the subqueue API just to be able to lock the queues. So the first thing I check is netif_is_multiqueue(dev), and if it *isn't* multiqueue, it will run the existing code. This way non-multiqueue devices don't have to have any knowledge of the MQ API.

>> This is another attempt to keep the two codepaths separate. The only
>> way I see of combining them is to check netif_is_multiqueue() every
>> time I need to grab a lock, which I think would be excessive.
>
> The code added is 100% identical to the existing code, just be a
> little smarter on how you do the ifdefs.

An example could be a system where the on-board adapter isn't multiqueue and an expansion card in the system is. If I handle multiqueue being on with just ifdef's, then I could use the expansion card, but not the on-board adapter as-is.
The on-board driver would need to be updated to have 1 queue in the multiqueue context, which is what I tried to avoid.

>>> Your modified qdisc_restart() expects the queue_lock to be locked,
>>> how can this work?
>>
>> No, it doesn't expect the lock to be held. Because of the multiple
>> queues, enqueueing and dequeueing are now asynchronous, since I can
>> enqueue to queue 0 while dequeuing from queue 1. dev->queue_lock
>> isn't held, so this can happen. Therefore the spin_trylock() is used
>> in this dequeue because I don't want to wait for someone to finish
>> with that queue in question (e.g. enqueue working), since that will
>> block all other bands/queues after the band in question. So if the
>> lock isn't available to grab, we move to the next band. If I were to
>> wait for the lock, I'd serialize the enqueue/dequeue completely, and
>> block other traffic flows in other queues waiting for the lock.
>
> The first thing you do in qdisc_restart() after dequeue()ing is unlock
> the sub queue lock. You explicitly unlock it before calling
> qdisc_run() so I assume dequeue() is expected to keep it locked.
> Something doesn't add up.

That's the entire point of this extra locking. enqueue() is going to put an skb into a band somewhere that maps to some queue, and there is no way to guarantee the skb I retrieve from dequeue() is headed for the same queue. Therefore, I need to unlock the queue after I finish enqueuing, since holding that lock makes little sense to dequeue(). dequeue() will then grab *a* lock on a queue; it may be the same one we had during enqueue(), but it may not be. And the placement of the unlock of that queue is exactly where it happens in non-multiqueue, which is right before the hard_start_xmit(). I concede that the locking model is complex and seems really nasty, but to truly separate queue functionality from one another, I see no other feasible option than to run locking like this. I am very open to suggestions.

> BTW, which lock serializes your write access to qdisc->q.qlen?
> It used to be dev->queue_lock but that is apparently not true for
> multi queue.

This is a very good catch, because it isn't being protected on the entire qdisc right now for PRIO. However, Chris Leech pointed out the LINK_STATE_QDISC_RUNNING bit is serializing things at the qdisc_run() level, so that's probably protecting this counter right now. I'm looking at pushing that down into the qdisc, but at the same time I can think of something to
Re: [PATCH 1/2] NET: Multiple queue network device support
* Waskiewicz Jr, Peter P <[EMAIL PROTECTED]> 2007-03-09 15:27
> That's the entire point of this extra locking. enqueue() is going to
> put an skb into a band somewhere that maps to some queue, and there is
> no way to guarantee the skb I retrieve from dequeue() is headed for
> the same queue. Therefore, I need to unlock the queue after I finish
> enqueuing, since having that lock makes little sense to dequeue().
> dequeue() will then grab *a* lock on a queue; it may be the same one
> we had during enqueue(), but it may not be. And the placement of the
> unlock of that queue is exactly where it happens in non-multiqueue,
> which is right before the hard_start_xmit().

The lock is already unlocked after dequeue, from your prio_dequeue():

	if (netif_is_multiqueue(sch->dev)) {
		queue = q->band2queue[prio];
		if (spin_trylock(&sch->dev->egress_subqueue[queue].queue_lock)) {
			qdisc = q->queues[prio];
			skb = qdisc->dequeue(qdisc);
			if (skb) {
				sch->q.qlen--;
				skb->priority = prio;
				spin_unlock(&sch->dev->egress_subqueue[queue].queue_lock);
				return skb;
			}
			spin_unlock(&sch->dev->egress_subqueue[queue].queue_lock);
		}
Re: [PATCH 1/2] NET: Multiple queue network device support
On 07-03-2007 23:42, David Miller wrote:
> I didn't say to use skb->priority, I said to shrink skb->priority down
> to a u16 and then make another u16 which will store your queue mapping
> value.

Peter is right: this is fully used by schedulers (prio, CBQ, HTB, HFSC...) and would break users' scripts, so I wouldn't recommend it either.

Regards,
Jarek P.
Re: [PATCH 1/2] NET: Multiple queue network device support
I didn't say to use skb->priority, I said to shrink skb->priority down to a u16 and then make another u16 which will store your queue mapping value.
RE: [PATCH 1/2] NET: Multiple queue network device support
Dave, I did some research based on your feedback. Specifically, I looked at removing ->map_queue() and allowing ->enqueue() to take care of mapping and locking of the queues (and ->dequeue()). I found that it can be done either way, but I'm not sure I like taking the locking out of dev_queue_xmit(), and relying on the qdisc ->enqueue() to make sure locking is completed correctly. Having a map routine also allows various parts of the stack to query where the skb belongs. I also looked into storing the mapped queue value in the skb, namely in skb->priority. If I were to use this value, I'd lose filtering capabilities in sch_prio.c, where if no tc filter exists for the skb in question, it relies on the value of skb->priority to enqueue to a band. The value of skb->priority could also be used in a base driver for further filtering (some of the MAC-based QoS on wireless devices comes to mind), so I'm not sure I feel comfortable using that field to store the queue. I also envision some qdiscs in the future that could change the mapping of bands to queues without reloading the qdisc, so storing the queue information in the skb could lead to out-of-order packets into queues (stale information in the skb). What do you think? Thanks, PJ Waskiewicz Software Engineer LAD - LAN Access Division, New Technologies Intel Corporation [EMAIL PROTECTED] > -Original Message- > From: David Miller [mailto:[EMAIL PROTECTED] > Sent: Monday, February 26, 2007 5:03 PM > To: Kok, Auke-jan H > Cc: [EMAIL PROTECTED]; netdev@vger.kernel.org; > linux-kernel@vger.kernel.org; Waskiewicz Jr, Peter P; > Brandeburg, Jesse; [EMAIL PROTECTED]; Ronciak, John > Subject: Re: [PATCH 1/2] NET: Multiple queue network device support > > From: "Kok, Auke" <[EMAIL PROTECTED]> > Date: Thu, 08 Feb 2007 16:09:50 -0800 > > > From: Peter Waskiewicz Jr <[EMAIL PROTECTED]> > > > > Added an API and associated supporting routines for > multiqueue network > > devices.
This allows network devices supporting multiple > TX queues to > > configure each queue within the netdevice and manage each queue > > independently. Changes to the PRIO Qdisc also allow a user to map > > multiple flows to individual TX queues, taking advantage of > each queue > > on the device. > > > > Signed-off-by: Peter Waskiewicz Jr <[EMAIL PROTECTED]> > > Signed-off-by: Auke Kok <[EMAIL PROTECTED]> > > Thanks for posting this. > > I was wondering if it would be better to have the ->enqueue() > perform the mapping operation, store the queue index selected > inside of the skb, and just use the normal > ->hard_start_xmit() to send the packet out? > > The only thing to take care of is the per-queue locking, but > that should be easily doable. > > We might be able to do this even without making sk_buff any larger. > For example, I suppose skb->priority might be deducable down > to a u16. After a quick audit I really cannot see any > legitimate use of anything larger than 16-bit values, but a > more thorough audit would be necessary to validate this.
RE: [PATCH 1/2] NET: Multiple queue network device support
Dave, Thanks for the feedback. Please see below for my comments / questions. PJ Waskiewicz Intel Corporation > > From: Peter Waskiewicz Jr <[EMAIL PROTECTED]> > > > > Added an API and associated supporting routines for > multiqueue network > > devices. This allows network devices supporting multiple > TX queues to > > configure each queue within the netdevice and manage each queue > > independently. Changes to the PRIO Qdisc also allow a user to map > > multiple flows to individual TX queues, taking advantage of > each queue > > on the device. > > > > Signed-off-by: Peter Waskiewicz Jr <[EMAIL PROTECTED]> > > Signed-off-by: Auke Kok <[EMAIL PROTECTED]> > > Thanks for posting this. > > I was wondering if it would be better to have the ->enqueue() > perform the mapping operation, store the queue index selected > inside of the skb, and just use the normal > ->hard_start_xmit() to send the packet out? One of the reasons I chose to add more functions is for the queue management itself. Specifically, I needed the ability to lock queues, stop queues (netif_stop_queue), and wake them up, independent of the global queue_lock in the netdevice. At first I did think about storing the queue index in the skb, but that won't help with the queue locking, since only netdevice is passed to them outside of the context of the skb. As for adding the additional multiqueue version of hard_start_subqueue_xmit(), I did that so non-MQ devices would have a clean, untouched path in dev_queue_xmit(), and only MQ devices would touch a new codepath. This also removes the need for the device driver to inspect the queue index in the skb for which Tx ring to place the packet in, which is good for performance. For having ->enqueue() take care of the mapping, I thought for a while about this. Originally, I had ->enqueue() doing just that, until I realized I was not locking the individual subqueues correctly.
The thought process was once I receive an skb for Tx, I need to lock a queue, enqueue the skb, call qdisc_run(), then unlock. That grew into lock a queue, enqueue, unlock the queue, have dequeue decide which queue to pull an skb from (since the queue an skb comes from can be different than the one just enqueued to), lock it, dequeue, unlock, and return the skb. The whole issue came down to needing to know what queue to lock before enqueuing. If I called enqueue, I would be going in unlocked. I suppose this would be ok if the lock is done inside the ->enqueue(), and unlock is done before exiting; let me look at that possibility. > > The only thing to take care of is the per-queue locking, but > that should be easily doable. > > We might be able to do this even without making sk_buff any larger. > For example, I suppose skb->priority might be deducable down > to a u16. After a quick audit I really cannot see any > legitimate use of anything larger than 16-bit values, but a > more thorough audit would be necessary to validate this. The skb->priority field right now is used to determine the PRIO / pfifo_fast band to place the skb in on enqueue. I still need to think about if storing the queue index in the skb would help, due to the queue management functions outside of the skb context. Again, thanks for the feedback. I'll respond once I've looked at moving locks into ->enqueue() and removing ->map_queue(). Please note that the PRIO band to queue mapping in this patch has a problem when the number of bands is <= the number of Tx queues on the NIC. I have a fix, and will include that in a repost.
Re: [PATCH 1/2] NET: Multiple queue network device support
From: "Kok, Auke" <[EMAIL PROTECTED]> Date: Thu, 08 Feb 2007 16:09:50 -0800 > From: Peter Waskiewicz Jr <[EMAIL PROTECTED]> > > Added an API and associated supporting routines for multiqueue network > devices. This allows network devices supporting multiple TX queues to > configure each queue within the netdevice and manage each queue > independently. Changes to the PRIO Qdisc also allow a user to map > multiple flows to individual TX queues, taking advantage of each queue > on the device. > > Signed-off-by: Peter Waskiewicz Jr <[EMAIL PROTECTED]> > Signed-off-by: Auke Kok <[EMAIL PROTECTED]> Thanks for posting this. I was wondering if it would be better to have the ->enqueue() perform the mapping operation, store the queue index selected inside of the skb, and just use the normal ->hard_start_xmit() to send the packet out? The only thing to take care of is the per-queue locking, but that should be easily doable. We might be able to do this even without making sk_buff any larger. For example, I suppose skb->priority might be deducable down to a u16. After a quick audit I really cannot see any legitimate use of anything larger than 16-bit values, but a more thorough audit would be necessary to validate this.
Re: [PATCH 1/2] NET: Multiple queue network device support
Stephen Hemminger wrote: On Fri, 23 Feb 2007 04:00:55 -0500 "Sreenivasa Honnur" <[EMAIL PROTECTED]> wrote: Function "map_queue" returns queue index as '0'. There is no support to return different queue indexes. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kok, Auke Sent: Friday, February 09, 2007 5:40 AM To: David Miller; Garzik, Jeff; netdev@vger.kernel.org; linux-kernel@vger.kernel.org Cc: Kok, Auke; Peter Waskiewicz Jr; Brandeburg, Jesse; Kok, Auke; Ronciak, John Subject: [PATCH 1/2] NET: Multiple queue network device support From: Peter Waskiewicz Jr <[EMAIL PROTECTED]> Added an API and associated supporting routines for multiqueue network devices. This allows network devices supporting multiple TX queues to configure each queue within the netdevice and manage each queue independently. Changes to the PRIO Qdisc also allow a user to map multiple flows to individual TX queues, taking advantage of each queue on the device. Signed-off-by: Peter Waskiewicz Jr <[EMAIL PROTECTED]> Signed-off-by: Auke Kok <[EMAIL PROTECTED]> --- +config NET_MULTI_QUEUE_DEVICE + bool "Multiple queue network device support (EXPERIMENTAL)" + depends on NET_SCHED && EXPERIMENTAL + help + Saying Y here will add support for network devices that have more than + one physical TX queue and/or RX queue. + + Multiple queue devices will require qdiscs that understand how to + queue to multiple targets. The default packet scheduler will default + to the first queue in a device. In other words, if you need the + ability to spread traffic across queues, your queueing discipline + needs to know how to do that. + + Note that saying Y here will give preferential treatment to multiple + queue devices in the network stack. A slight drop in single-queue + device performance may be seen. + + Say N here unless you know your network device supports multiple + TX and/or RX queues. + This should not be a user visible configuration option. 
It should either: always be part of the kernel API or be selected by drivers that need/want it. perhaps when it's stable, yes, but right now it's definitely experimental and may result in a slight overhead for single-queue devices, as the text reads. Cheers, Auke
Re: [PATCH 1/2] NET: Multiple queue network device support
On Fri, 23 Feb 2007 04:00:55 -0500 "Sreenivasa Honnur" <[EMAIL PROTECTED]> wrote: > Function "map_queue" returns queue index as '0'. There is no support to > return different queue indexes. > > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > On Behalf Of Kok, Auke > Sent: Friday, February 09, 2007 5:40 AM > To: David Miller; Garzik, Jeff; netdev@vger.kernel.org; > linux-kernel@vger.kernel.org > Cc: Kok, Auke; Peter Waskiewicz Jr; Brandeburg, Jesse; Kok, Auke; > Ronciak, John > Subject: [PATCH 1/2] NET: Multiple queue network device support > > > From: Peter Waskiewicz Jr <[EMAIL PROTECTED]> > > Added an API and associated supporting routines for multiqueue network > devices. This allows network devices supporting multiple TX queues to > configure each queue within the netdevice and manage each queue > independently. Changes to the PRIO Qdisc also allow a user to map > multiple flows to individual TX queues, taking advantage of each queue > on the device. > > Signed-off-by: Peter Waskiewicz Jr <[EMAIL PROTECTED]> > Signed-off-by: Auke Kok <[EMAIL PROTECTED]> > --- > > +config NET_MULTI_QUEUE_DEVICE > + bool "Multiple queue network device support (EXPERIMENTAL)" > + depends on NET_SCHED && EXPERIMENTAL > + help > + Saying Y here will add support for network devices that have > more than > + one physical TX queue and/or RX queue. > + > + Multiple queue devices will require qdiscs that understand how > to > + queue to multiple targets. The default packet scheduler will > default > + to the first queue in a device. In other words, if you need > the > + ability to spread traffic across queues, your queueing > discipline > + needs to know how to do that. > + > + Note that saying Y here will give preferential treatment to > multiple > + queue devices in the network stack. A slight drop in > single-queue > + device performance may be seen. > + > + Say N here unless you know your network device supports > multiple > + TX and/or RX queues. 
This should not be a user visible configuration option. It should either: always be part of the kernel API or be selected by drivers that need/want it. -- Stephen Hemminger <[EMAIL PROTECTED]>
RE: [PATCH 1/2] NET: Multiple queue network device support
map_queue will always return 0 in the pfifo_fast Qdisc. This is because pfifo_fast is the default scheduler when a device is brought up. I didn't want to make any assumptions about a device's functionality, so using queue 0 seems the least dangerous. However, in the prio qdisc, there is a small problem with the queue to band assignment logic. If you have the same number of bands as queues (e.g. 4 prio bands and 4 Tx queues), then you will have all bands assigned to queue 0. I have a fix for that, which I plan to send when I finish up a documentation piece for this patch's repost. Thanks for the feedback, PJ Waskiewicz > -Original Message- > From: Sreenivasa Honnur [mailto:[EMAIL PROTECTED] > Sent: Friday, February 23, 2007 1:01 AM > To: Kok, Auke-jan H; David Miller; Garzik, Jeff; > netdev@vger.kernel.org; linux-kernel@vger.kernel.org > Cc: Waskiewicz Jr, Peter P; Brandeburg, Jesse; Kok, Auke; > Ronciak, John > Subject: RE: [PATCH 1/2] NET: Multiple queue network device support > > Function "map_queue" returns queue index as '0'. There is no > support to return different queue indexes. > > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] > On Behalf Of Kok, Auke > Sent: Friday, February 09, 2007 5:40 AM > To: David Miller; Garzik, Jeff; netdev@vger.kernel.org; > linux-kernel@vger.kernel.org > Cc: Kok, Auke; Peter Waskiewicz Jr; Brandeburg, Jesse; Kok, > Auke; Ronciak, John > Subject: [PATCH 1/2] NET: Multiple queue network device support > > > From: Peter Waskiewicz Jr <[EMAIL PROTECTED]> > > Added an API and associated supporting routines for > multiqueue network devices. This allows network devices > supporting multiple TX queues to configure each queue within > the netdevice and manage each queue independently. Changes > to the PRIO Qdisc also allow a user to map multiple flows to > individual TX queues, taking advantage of each queue on the device. 
> > Signed-off-by: Peter Waskiewicz Jr <[EMAIL PROTECTED]> > Signed-off-by: Auke Kok <[EMAIL PROTECTED]> > --- > > include/linux/netdevice.h | 73 > include/net/sch_generic.h |3 + > net/Kconfig | 20 > net/core/dev.c| 51 > net/sched/sch_generic.c | 117 > + > net/sched/sch_prio.c | 106 > ++--- > 6 files changed, 364 insertions(+), 6 deletions(-) > > diff --git a/include/linux/netdevice.h > b/include/linux/netdevice.h index > 2e37f50..c7f94a8 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -106,6 +106,16 @@ struct netpoll_info; #define MAX_HEADER > (LL_MAX_HEADER + 48) #endif > > +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE > + > +struct net_device_subqueue > +{ > + /* Give a lock and a flow control state for each queue */ > + unsigned long state; > + spinlock_t queue_lock cacheline_aligned_in_smp; > +}; > +#endif > + > /* > * Network device statistics. Akin to the 2.0 ether stats but > * with byte counters. > @@ -339,6 +349,10 @@ struct net_device > #define NETIF_F_GEN_CSUM (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM) > #define NETIF_F_ALL_CSUM (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM) > > +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE > +#define NETIF_F_MULTI_QUEUE 8192/* Has multiple TX/RX queues */ > +#endif > + > struct net_device *next_sched; > > /* Interface index. 
Unique device identifier*/ > @@ -532,6 +546,19 @@ struct net_device > struct device dev; > /* space for optional statistics and wireless sysfs groups */ > struct attribute_group *sysfs_groups[3]; > + > +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE > + /* To retrieve statistics per subqueue - FOR FUTURE USE */ > + struct net_device_stats* (*get_subqueue_stats)(struct net_device > *dev, > + int > queue_index); > + > + /* The TX queue control structures */ > + struct net_device_subqueue *egress_subqueue; > + int egress_subqueue_count; > + int (*hard_start_subqueue_xmit)(struct > sk_buff *skb, > + struct > net_device *dev, > + int > queue_index); > +#endif /* CONFIG_NET_MULTI_QUEUE_DEVICE */ > }; > #define to_net_dev(d) container_of(d, struct net_device, dev) > > @@ -673,6 +700,52 @@ static inline int netif_running(const > struct net_device *dev) >
RE: [PATCH 1/2] NET: Multiple queue network device support
Function "map_queue" returns queue index as '0'. There is no support to return different queue indexes. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kok, Auke Sent: Friday, February 09, 2007 5:40 AM To: David Miller; Garzik, Jeff; netdev@vger.kernel.org; linux-kernel@vger.kernel.org Cc: Kok, Auke; Peter Waskiewicz Jr; Brandeburg, Jesse; Kok, Auke; Ronciak, John Subject: [PATCH 1/2] NET: Multiple queue network device support From: Peter Waskiewicz Jr <[EMAIL PROTECTED]> Added an API and associated supporting routines for multiqueue network devices. This allows network devices supporting multiple TX queues to configure each queue within the netdevice and manage each queue independently. Changes to the PRIO Qdisc also allow a user to map multiple flows to individual TX queues, taking advantage of each queue on the device. Signed-off-by: Peter Waskiewicz Jr <[EMAIL PROTECTED]> Signed-off-by: Auke Kok <[EMAIL PROTECTED]> --- include/linux/netdevice.h | 73 include/net/sch_generic.h |3 + net/Kconfig | 20 net/core/dev.c| 51 net/sched/sch_generic.c | 117 + net/sched/sch_prio.c | 106 ++--- 6 files changed, 364 insertions(+), 6 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2e37f50..c7f94a8 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -106,6 +106,16 @@ struct netpoll_info; #define MAX_HEADER (LL_MAX_HEADER + 48) #endif +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE + +struct net_device_subqueue +{ + /* Give a lock and a flow control state for each queue */ + unsigned long state; + spinlock_t queue_lock cacheline_aligned_in_smp; +}; +#endif + /* * Network device statistics. Akin to the 2.0 ether stats but * with byte counters. 
@@ -339,6 +349,10 @@ struct net_device #define NETIF_F_GEN_CSUM (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM) #define NETIF_F_ALL_CSUM (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM) +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE +#define NETIF_F_MULTI_QUEUE 8192 /* Has multiple TX/RX queues */ +#endif + struct net_device *next_sched; /* Interface index. Unique device identifier*/ @@ -532,6 +546,19 @@ struct net_device struct device dev; /* space for optional statistics and wireless sysfs groups */ struct attribute_group *sysfs_groups[3]; + +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE + /* To retrieve statistics per subqueue - FOR FUTURE USE */ + struct net_device_stats* (*get_subqueue_stats)(struct net_device *dev, + int queue_index); + + /* The TX queue control structures */ + struct net_device_subqueue *egress_subqueue; + int egress_subqueue_count; + int (*hard_start_subqueue_xmit)(struct sk_buff *skb, + struct net_device *dev, + int queue_index); +#endif /* CONFIG_NET_MULTI_QUEUE_DEVICE */ }; #define to_net_dev(d) container_of(d, struct net_device, dev) @@ -673,6 +700,52 @@ static inline int netif_running(const struct net_device *dev) return test_bit(__LINK_STATE_START, &dev->state); } +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE + +/* + * Routines to manage the subqueues on a device. We only need start + * stop, and a check if it's stopped. All other device management is + * done at the overall netdevice level. + * Also test the netif state if we're multi-queue at all. 
+ */ +static inline void netif_start_subqueue(struct net_device *dev, int +queue_index) { + clear_bit(__LINK_STATE_XOFF, +&dev->egress_subqueue[queue_index].state); +} + +static inline void netif_stop_subqueue(struct net_device *dev, int +queue_index) { #ifdef CONFIG_NETPOLL_TRAP + if (netpoll_trap()) + return; +#endif + set_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state); +} + +static inline int netif_subqueue_stopped(const struct net_device *dev, + int queue_index) { + return test_bit(__LINK_STATE_XOFF, + &dev->egress_subqueue[queue_index].state); +} + +static inline void netif_wake_subqueue(struct net_device *dev, int +queue_index) { #ifdef CONFIG_NETPOLL_TRAP + if (netpoll_trap()) + return; +#endif + if (test_and_clear_bit(__LINK_STATE_XOFF, + &dev->egress_subqueue[queue_index].state)) + __netif_schedule(dev); +} + +static inline int netif_is_multiqueue(const struct net_device *dev) { + return (!!(NETIF_F_MULTI_QUEUE & dev->features)); } #endif /* +CONFIG_NET_MULTI_QUEUE_DEVICE */
Re: [PATCH 1/2] NET: Multiple queue network device support
On Fri, 23 Feb 2007 04:00:55 -0500 Sreenivasa Honnur [EMAIL PROTECTED] wrote:

> Function map_queue returns queue index as '0'. There is no support to return different queue indexes.
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kok, Auke
> Sent: Friday, February 09, 2007 5:40 AM
> To: David Miller; Garzik, Jeff; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Cc: Kok, Auke; Peter Waskiewicz Jr; Brandeburg, Jesse; Kok, Auke; Ronciak, John
> Subject: [PATCH 1/2] NET: Multiple queue network device support
>
> From: Peter Waskiewicz Jr [EMAIL PROTECTED]
>
> Added an API and associated supporting routines for multiqueue network devices. This allows network devices supporting multiple TX queues to configure each queue within the netdevice and manage each queue independently. Changes to the PRIO Qdisc also allow a user to map multiple flows to individual TX queues, taking advantage of each queue on the device.
>
> Signed-off-by: Peter Waskiewicz Jr [EMAIL PROTECTED]
> Signed-off-by: Auke Kok [EMAIL PROTECTED]
> ---
>
> +config NET_MULTI_QUEUE_DEVICE
> +	bool "Multiple queue network device support (EXPERIMENTAL)"
> +	depends on NET_SCHED && EXPERIMENTAL
> +	help
> +	  Saying Y here will add support for network devices that have more than
> +	  one physical TX queue and/or RX queue.
> +
> +	  Multiple queue devices will require qdiscs that understand how to
> +	  queue to multiple targets.  The default packet scheduler will default
> +	  to the first queue in a device.  In other words, if you need the
> +	  ability to spread traffic across queues, your queueing discipline
> +	  needs to know how to do that.
> +
> +	  Note that saying Y here will give preferential treatment to multiple
> +	  queue devices in the network stack.  A slight drop in single-queue
> +	  device performance may be seen.
> +
> +	  Say N here unless you know your network device supports multiple
> +	  TX and/or RX queues.

This should not be a user visible configuration option.
It should either: always be part of the kernel API, or be selected by drivers that need/want it.

--
Stephen Hemminger [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
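Stephen's suggestion (a symbol with no prompt, enabled via `select` by the drivers that need it) might look like the following hypothetical Kconfig sketch; the driver entry shown is purely illustrative and not part of the patch:

```kconfig
# Hypothetical sketch only: no prompt string, so the option never
# appears in menuconfig and is enabled implicitly by drivers.
config NET_MULTI_QUEUE_DEVICE
	bool
	depends on NET_SCHED

# An illustrative driver entry pulling the feature in:
config MYDRIVER
	tristate "Example multiqueue NIC driver"
	select NET_MULTI_QUEUE_DEVICE
```

One caveat with this approach: `select` ignores the selected symbol's `depends on`, so a real driver entry would itself need to depend on NET_SCHED to keep the configuration consistent.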
Re: [PATCH 1/2] NET: Multiple queue network device support
Stephen Hemminger wrote:
> On Fri, 23 Feb 2007 04:00:55 -0500 Sreenivasa Honnur [EMAIL PROTECTED] wrote:
>
> [quoted patch description and Kconfig help text snipped]
>
> This should not be a user visible configuration option. It should either: always be part of the kernel API, or be selected by drivers that need/want it.

perhaps when it's stable, yes, but right now it's definitely experimental and may result in a slight overhead for single-queue devices, as the text reads.

Cheers,

Auke
[PATCH 1/2] NET: Multiple queue network device support
From: Peter Waskiewicz Jr <[EMAIL PROTECTED]>

Added an API and associated supporting routines for multiqueue network devices. This allows network devices supporting multiple TX queues to configure each queue within the netdevice and manage each queue independently. Changes to the PRIO Qdisc also allow a user to map multiple flows to individual TX queues, taking advantage of each queue on the device.

Signed-off-by: Peter Waskiewicz Jr <[EMAIL PROTECTED]>
Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---
 include/linux/netdevice.h |   73
 include/net/sch_generic.h |    3 +
 net/Kconfig               |   20
 net/core/dev.c            |   51
 net/sched/sch_generic.c   |  117 +
 net/sched/sch_prio.c      |  106 ++---
 6 files changed, 364 insertions(+), 6 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2e37f50..c7f94a8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -106,6 +106,16 @@ struct netpoll_info;
 #define MAX_HEADER (LL_MAX_HEADER + 48)
 #endif

+#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
+
+struct net_device_subqueue
+{
+	/* Give a lock and a flow control state for each queue */
+	unsigned long	state;
+	spinlock_t	queue_lock ____cacheline_aligned_in_smp;
+};
+#endif
+
 /*
  * Network device statistics. Akin to the 2.0 ether stats but
  * with byte counters.
@@ -339,6 +349,10 @@ struct net_device
 #define NETIF_F_GEN_CSUM	(NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
 #define NETIF_F_ALL_CSUM	(NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM)

+#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
+#define NETIF_F_MULTI_QUEUE	8192	/* Has multiple TX/RX queues */
+#endif
+
 	struct net_device	*next_sched;

 	/* Interface index.  Unique device identifier	*/
@@ -532,6 +546,19 @@ struct net_device
 	struct device		dev;
 	/* space for optional statistics and wireless sysfs groups */
 	struct attribute_group	*sysfs_groups[3];
+
+#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
+	/* To retrieve statistics per subqueue - FOR FUTURE USE */
+	struct net_device_stats* (*get_subqueue_stats)(struct net_device *dev,
+						       int queue_index);
+
+	/* The TX queue control structures */
+	struct net_device_subqueue	*egress_subqueue;
+	int				egress_subqueue_count;
+	int				(*hard_start_subqueue_xmit)(struct sk_buff *skb,
+								    struct net_device *dev,
+								    int queue_index);
+#endif /* CONFIG_NET_MULTI_QUEUE_DEVICE */
 };

 #define	to_net_dev(d) container_of(d, struct net_device, dev)
@@ -673,6 +700,52 @@ static inline int netif_running(const struct net_device *dev)
 	return test_bit(__LINK_STATE_START, &dev->state);
 }

+#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
+
+/*
+ * Routines to manage the subqueues on a device.  We only need start,
+ * stop, and a check if it's stopped.  All other device management is
+ * done at the overall netdevice level.
+ * Also test the netif state if we're multi-queue at all.
+ */
+static inline void netif_start_subqueue(struct net_device *dev, int queue_index)
+{
+	clear_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state);
+}
+
+static inline void netif_stop_subqueue(struct net_device *dev, int queue_index)
+{
+#ifdef CONFIG_NETPOLL_TRAP
+	if (netpoll_trap())
+		return;
+#endif
+	set_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state);
+}
+
+static inline int netif_subqueue_stopped(const struct net_device *dev,
+					 int queue_index)
+{
+	return test_bit(__LINK_STATE_XOFF,
+			&dev->egress_subqueue[queue_index].state);
+}
+
+static inline void netif_wake_subqueue(struct net_device *dev, int queue_index)
+{
+#ifdef CONFIG_NETPOLL_TRAP
+	if (netpoll_trap())
+		return;
+#endif
+	if (test_and_clear_bit(__LINK_STATE_XOFF,
+			       &dev->egress_subqueue[queue_index].state))
+		__netif_schedule(dev);
+}
+
+static inline int netif_is_multiqueue(const struct net_device *dev)
+{
+	return (!!(NETIF_F_MULTI_QUEUE & dev->features));
+}
+#endif /* CONFIG_NET_MULTI_QUEUE_DEVICE */
+
 /* Use this variant when it is known for sure that it
  * is executing from interrupt context.

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 8208639..8751ba6 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -102,6 +102,9 @@ struct Qdisc_ops
 	int			(*dump)(struct Qdisc *, struct sk_buff *);
 	int			(*dump_stats)(struct Qdisc *, struct gnet_dump *);

+#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
+