RE: [PATCH 1/2] NET: Multiple queue network device support

2007-03-12 Thread Waskiewicz Jr, Peter P

> -----Original Message-----
> From: Jarek Poplawski [mailto:[EMAIL PROTECTED] 
> Sent: Monday, March 12, 2007 1:58 AM
> To: Thomas Graf
> Cc: Kok, Auke-jan H; David Miller; Garzik, Jeff; 
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; 
> Waskiewicz Jr, Peter P; Brandeburg, Jesse; Kok, Auke; Ronciak, John
> Subject: Re: [PATCH 1/2] NET: Multiple queue network device support
> 
> On 09-03-2007 14:40, Thomas Graf wrote:
> > * Kok, Auke <[EMAIL PROTECTED]> 2007-02-08 16:09
> >> diff --git a/net/core/dev.c b/net/core/dev.c
> >> index 455d589..42b635c 100644
> >> --- a/net/core/dev.c
> >> +++ b/net/core/dev.c
> >> @@ -1477,6 +1477,49 @@ gso:
> >>skb->tc_verd = SET_TC_AT(skb->tc_verd,AT_EGRESS);
> >>  #endif
> >>if (q->enqueue) {
> >> +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
> >> +  int queue_index;
> >> +  /* If we're a multi-queue device, get a queue index to lock */
> >> +  if (netif_is_multiqueue(dev))
> >> +  {
> >> +  /* Get the queue index and lock it. */
> >> +  if (likely(q->ops->map_queue)) {
> >> +  queue_index = q->ops->map_queue(skb, q);
> >> +  spin_lock(&dev->egress_subqueue[queue_index].queue_lock);
> >> +  rc = q->enqueue(skb, q);
> 
> I'm not sure Dave Miller thought about this place when he 
> proposed to save the mapping, but I think this may not be 
> enough. This place is racy: ->map_queue() is called twice, 
> and with some filters (and policies/actions) the results 
> could differ. And of course the subqueue lock doesn't prevent 
> a filter config change in the meantime.
> 
> After a second reading of this patch I have doubts it's the 
> proper way to solve the problem: there are many subqueues, 
> but we need a top queue (prio here) to manage them anyway. 
> So why not build this functionality directly into the queue? 
> There is no difference for a device whether skbs are coming 
> from a subqueue or a class; it is only interested in the 
> mapping result and in being able to stop and start a subqueue 
> and to query its status. All this could be done by adding the 
> callbacks directly to any classful scheduler or, if that's 
> not enough, by writing some specialized qdisc based on prio. 
> The possibility of locking only a subqueue instead of the 
> whole queue could be analyzed independently - the current 
> proposal doesn't solve this anyway.
> 
> Regards,
> Jarek P.
> 

Thanks again for the feedback.  Given some discussions I had last week
in the office and the general feedback here, I'm going to remove the new
per-queue locking, keep the start/stop functions for each queue, and
combine the entry points for hard_start_xmit().  I'll get this out ASAP
for review once it's been tested here.  If we see issues in the future
with lock contention on the queues, we can revisit the per-queue locking.
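
For reference, the per-queue start/stop helpers in question would look
roughly like the following (a sketch patterned on the existing
netif_{start,stop}_queue() helpers and this patch's egress_subqueue[]
array; the per-subqueue state member and the flag reuse are my
assumptions, not the posted code):

    static inline void netif_start_subqueue(struct net_device *dev, int index)
    {
            clear_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[index].state);
    }

    static inline void netif_stop_subqueue(struct net_device *dev, int index)
    {
            /* driver ran out of descriptors on this ring */
            set_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[index].state);
    }

    static inline int netif_subqueue_stopped(struct net_device *dev, int index)
    {
            return test_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[index].state);
    }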

Cheers,
-PJ Waskiewicz


Re: [PATCH 1/2] NET: Multiple queue network device support

2007-03-12 Thread Jarek Poplawski
On 09-03-2007 14:40, Thomas Graf wrote:
> * Kok, Auke <[EMAIL PROTECTED]> 2007-02-08 16:09
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 455d589..42b635c 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1477,6 +1477,49 @@ gso:
>>  skb->tc_verd = SET_TC_AT(skb->tc_verd,AT_EGRESS);
>>  #endif
>>  if (q->enqueue) {
>> +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
>> +int queue_index;
>> +/* If we're a multi-queue device, get a queue index to lock */
>> +if (netif_is_multiqueue(dev))
>> +{
>> +/* Get the queue index and lock it. */
>> +if (likely(q->ops->map_queue)) {
>> +queue_index = q->ops->map_queue(skb, q);
>> +spin_lock(&dev->egress_subqueue[queue_index].queue_lock);
>> +rc = q->enqueue(skb, q);

I'm not sure Dave Miller thought about this place when he proposed
to save the mapping, but I think this may not be enough. This place
is racy: ->map_queue() is called twice, and with some filters (and
policies/actions) the results could differ. And of course the subqueue
lock doesn't prevent a filter config change in the meantime.
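
Condensed from the patch, the two call sites are:

    /* dev_queue_xmit(): mapping #1 picks the lock for enqueue */
    queue_index = q->ops->map_queue(skb, q);
    spin_lock(&dev->egress_subqueue[queue_index].queue_lock);
    rc = q->enqueue(skb, q);

    /* qdisc_restart(): mapping #2, on the dequeued skb; if a filter
     * or action was reconfigured in between, this index can differ
     * from the one used above */
    queue_index = q->ops->map_queue(skb, q);
    spin_unlock(&dev->egress_subqueue[queue_index].queue_lock);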

After a second reading of this patch I have doubts it's the proper way
to solve the problem: there are many subqueues, but we need a top queue
(prio here) to manage them anyway. So why not build this functionality
directly into the queue? There is no difference for a device whether
skbs are coming from a subqueue or a class; it is only interested in
the mapping result and in being able to stop and start a subqueue and
to query its status. All this could be done by adding the callbacks
directly to any classful scheduler or, if that's not enough, by writing
some specialized qdisc based on prio. The possibility of locking only a
subqueue instead of the whole queue could be analyzed independently -
the current proposal doesn't solve this anyway.
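
Illustratively, the device-facing surface could shrink to something
like this (hypothetical ops, only to show the shape of the idea):

    /* all the device needs from the scheduler: the mapping result plus
     * per-subqueue stop/start/status; these could hang off any
     * classful qdisc, or a specialized prio-based one */
    struct Qdisc_subqueue_ops {
            int  (*map_queue)(struct sk_buff *skb, struct Qdisc *sch);
            void (*stop_subqueue)(struct Qdisc *sch, int index);
            void (*wake_subqueue)(struct Qdisc *sch, int index);
            int  (*subqueue_stopped)(struct Qdisc *sch, int index);
    };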

Regards,
Jarek P.


RE: [PATCH 1/2] NET: Multiple queue network device support

2007-03-10 Thread Waskiewicz Jr, Peter P
> -----Original Message-----
> From: Thomas Graf [mailto:[EMAIL PROTECTED] 
> Sent: Friday, March 09, 2007 6:35 PM
> To: Waskiewicz Jr, Peter P
> Cc: Kok, Auke-jan H; David Miller; Garzik, Jeff; 
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; 
> Brandeburg, Jesse; Kok, Auke; Ronciak, John
> Subject: Re: [PATCH 1/2] NET: Multiple queue network device support
> 
> * Waskiewicz Jr, Peter P <[EMAIL PROTECTED]> 2007-03-09 15:27
> > That's the entire point of this extra locking.  enqueue() is going 
> > to put an skb into a band somewhere that maps to some queue, and 
> > there is no way to guarantee the skb I retrieve from dequeue() is 
> > headed for the same queue.  Therefore, I need to unlock the queue 
> > after I finish enqueuing, since having that lock makes little sense 
> > to dequeue().  dequeue() will then grab *a* lock on a queue; it may 
> > be the same one we had during enqueue(), but it may not be.  And the 
> > placement of the unlock of that queue is exactly where it happens in 
> > non-multiqueue, which is right before the hard_start_xmit().
> 
> The lock is already unlocked after dequeue, from your prio_dequeue():
> 
>    if (netif_is_multiqueue(sch->dev)) {
>        queue = q->band2queue[prio];
>        if (spin_trylock(&sch->dev->egress_subqueue[queue].queue_lock)) {
>            qdisc = q->queues[prio];
>            skb = qdisc->dequeue(qdisc);
>            if (skb) {
>                sch->q.qlen--;
>                skb->priority = prio;
>                spin_unlock(&sch->dev->egress_subqueue[queue].queue_lock);
>                return skb;
>            }
>            spin_unlock(&sch->dev->egress_subqueue[queue].queue_lock);
>        }

Ok, now I see what's wrong.  Taking Dave M.'s recommendation to store
the queue mapping in the skb will let me unlock the queue when dequeue()
returns.  I'll fix this locking issue; thanks for the feedback and
persistent drilling into my thick head.
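
In other words, roughly this (a sketch; queue_mapping stands in for
whatever field ends up carrying the mapping):

    /* prio_dequeue(): tag the skb with the subqueue it came from */
    skb = qdisc->dequeue(qdisc);
    if (skb) {
            sch->q.qlen--;
            skb->queue_mapping = queue;
            spin_unlock(&sch->dev->egress_subqueue[queue].queue_lock);
            return skb;
    }

    /* qdisc_restart(): read the stored mapping instead of calling
     * ->map_queue() a second time */
    queue = skb->queue_mapping;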

-PJ Waskiewicz
[EMAIL PROTECTED]


Re: [PATCH 1/2] NET: Multiple queue network device support

2007-03-09 Thread Thomas Graf
* Waskiewicz Jr, Peter P <[EMAIL PROTECTED]> 2007-03-09 15:27
> That's the entire point of this extra locking.  enqueue() is going to
> put an skb into a band somewhere that maps to some queue, and there is
> no way to guarantee the skb I retrieve from dequeue() is headed for the
> same queue.  Therefore, I need to unlock the queue after I finish
> enqueuing, since having that lock makes little sense to dequeue().
> dequeue() will then grab *a* lock on a queue; it may be the same one we
> had during enqueue(), but it may not be.  And the placement of the
> unlock of that queue is exactly where it happens in non-multiqueue,
> which is right before the hard_start_xmit().

The lock is already unlocked after dequeue, from your prio_dequeue():

   if (netif_is_multiqueue(sch->dev)) {
       queue = q->band2queue[prio];
       if (spin_trylock(&sch->dev->egress_subqueue[queue].queue_lock)) {
           qdisc = q->queues[prio];
           skb = qdisc->dequeue(qdisc);
           if (skb) {
               sch->q.qlen--;
               skb->priority = prio;
               spin_unlock(&sch->dev->egress_subqueue[queue].queue_lock);
               return skb;
           }
           spin_unlock(&sch->dev->egress_subqueue[queue].queue_lock);
       }


RE: [PATCH 1/2] NET: Multiple queue network device support

2007-03-09 Thread Waskiewicz Jr, Peter P
> * Waskiewicz Jr, Peter P <[EMAIL PROTECTED]> 2007-03-09 11:25
> > > > +   }
> > > > +   } else {
> > > > +   /* We're not a multi-queue device. */
> > > > +   spin_lock(&dev->queue_lock);
> > > > +   q = dev->qdisc;
> > > > +   if (q->enqueue) {
> > > > +   rc = q->enqueue(skb, q);
> > > > +   qdisc_run(dev);
> > > > +   spin_unlock(&dev->queue_lock);
> > > > +   rc = rc == NET_XMIT_BYPASS
> > > > +  ? NET_XMIT_SUCCESS : rc;
> > > > +   goto out;
> > > > +   }
> > > > +   spin_unlock(&dev->queue_lock);
> > > 
> > > Please don't duplicate already existing code.
> > 
> > I don't want to mix multiqueue and non-multiqueue code in the 
> > transmit path.  This was an attempt to allow MQ and non-MQ devices 
> > to coexist in a machine having separate code paths.  Are you 
> > suggesting to combine them?  That would get very messy trying to 
> > determine what type of lock to grab (subqueue lock or 
> > dev->queue_lock), not to mention grabbing the dev->queue_lock would 
> > block multiqueue devices in that same codepath.
> 
> The piece of code I quoted above is the branch executed if 
> multi queue is not enabled. The code you added is 100% 
> identical to the already existing enqueue logic. Just execute 
> the existing branch if multi queue is not enabled.
> 

I totally agree this is identical code if multiqueue isn't enabled.
However, when multiqueue is enabled, I don't want to make all network
drivers implement the subqueue API just to be able to lock the queues.
So the first thing I check is netif_is_multiqueue(dev), and if it
*isn't* multiqueue, it will run the existing code.  This way
non-multiqueue devices don't need any knowledge of the MQ API.

> > This is another attempt to keep the two codepaths separate.  The 
> > only way I see of combining them is to check netif_is_multiqueue() 
> > every time I need to grab a lock, which I think would be excessive.
> 
> The code added is 100% identical to the existing code, just 
> be a little smarter on how you do the ifdefs.

An example: an on-board adapter that isn't multiqueue, alongside an
expansion card in the same system that is.  If I handled multiqueue
purely with ifdefs, I could use the expansion card, but not the
on-board adapter as-is.  The on-board driver would need to be updated
to register one queue in the multiqueue context, which is what I tried
to avoid.

> > > Your modified qdisc_restart() expects the queue_lock to be 
> > > locked, how can this work?
> > 
> > No, it doesn't expect the lock to be held.  Because of the multiple 
> > queues, enqueueing and dequeueing are now asynchronous, since I can 
> > enqueue to queue 0 while dequeuing from queue 1.  dev->queue_lock 
> > isn't held, so this can happen.  Therefore the spin_trylock() is 
> > used in this dequeue because I don't want to wait for someone to 
> > finish with that queue in question (e.g. enqueue working), since 
> > that will block all other bands/queues after the band in question.  
> > So if the lock isn't available to grab, we move to the next band.  
> > If I were to wait for the lock, I'd serialize the enqueue/dequeue 
> > completely, and block other traffic flows in other queues waiting 
> > for the lock.
> 
> The first thing you do in qdisc_restart() after dequeue()ing 
> is unlock the sub queue lock. You explicitly unlock it 
> before calling qdisc_run() so I assume dequeue() is expected 
> to keep it locked. Something doesn't add up.

That's the entire point of this extra locking.  enqueue() is going to
put an skb into a band somewhere that maps to some queue, and there is
no way to guarantee the skb I retrieve from dequeue() is headed for the
same queue.  Therefore, I need to unlock the queue after I finish
enqueuing, since having that lock makes little sense to dequeue().
dequeue() will then grab *a* lock on a queue; it may be the same one we
had during enqueue(), but it may not be.  And the placement of the
unlock of that queue is exactly where it happens in non-multiqueue,
which is right before the hard_start_xmit().
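
Condensed, the sequence being defended looks like this (error handling
dropped; i and j are the enqueue-time and dequeue-time mappings):

    /* dev_queue_xmit(): map, lock queue i, enqueue, drop the lock */
    i = q->ops->map_queue(skb, q);
    spin_lock(&dev->egress_subqueue[i].queue_lock);
    rc = q->enqueue(skb, q);
    spin_unlock(&dev->egress_subqueue[i].queue_lock);
    qdisc_run(dev);

    /* dequeue side: queue j may or may not equal i; its lock is held
     * from here until just before hard_start_xmit(), mirroring where
     * dev->queue_lock is dropped in the single-queue path */
    if (spin_trylock(&dev->egress_subqueue[j].queue_lock)) {
            skb = qdisc->dequeue(qdisc);
            /* ... hand skb to the driver, then unlock queue j ... */
    }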

I concede that the locking model is complex and seems really nasty, but
to truly separate queue functionality from one another, I see no other
feasible option than to run locking like this.  I am very open to
suggestions.

> 
> BTW, which lock serializes your write access to 
> qdisc->q.qlen? It used to be dev->queue_lock but that is 
> apparently not true for multi queue.
> 

This is a very good catch, because it isn't being protected on the
entire qdisc right now for PRIO.  However, Chris Leech pointed out the
LINK_STATE_QDISC_RUNNING bit is serializing things at the qdisc_run()
level, so that's probably protecting this counter right now.  I'm
looking at pushing that down into the qdisc, but at the same time I can
think of something to 

Re: [PATCH 1/2] NET: Multiple queue network device support

2007-03-09 Thread Thomas Graf
* Waskiewicz Jr, Peter P <[EMAIL PROTECTED]> 2007-03-09 11:25
> > > + }
> > > + } else {
> > > + /* We're not a multi-queue device. */
> > > + spin_lock(&dev->queue_lock);
> > > + q = dev->qdisc;
> > > + if (q->enqueue) {
> > > + rc = q->enqueue(skb, q);
> > > + qdisc_run(dev);
> > > + spin_unlock(&dev->queue_lock);
> > > + rc = rc == NET_XMIT_BYPASS
> > > +    ? NET_XMIT_SUCCESS : rc;
> > > + goto out;
> > > + }
> > > + spin_unlock(&dev->queue_lock);
> > 
> > Please don't duplicate already existing code.
> 
> I don't want to mix multiqueue and non-multiqueue code in the transmit
> path.  This was an attempt to allow MQ and non-MQ devices to coexist in
> a machine having separate code paths.  Are you suggesting to combine
> them?  That would get very messy trying to determine what type of lock
> to grab (subqueue lock or dev->queue_lock), not to mention grabbing the
> dev->queue_lock would block multiqueue devices in that same codepath.

The piece of code I quoted above is the branch executed if multi queue
is not enabled. The code you added is 100% identical to the already
existing enqueue logic. Just execute the existing branch if multi queue
is not enabled.

> This is another attempt to keep the two codepaths separate.  The only
> way I see of combining them is to check netif_is_multiqueue() every time
> I need to grab a lock, which I think would be excessive.

The code added is 100% identical to the existing code, just be a little
smarter on how you do the ifdefs.

> > >   }
> > >  
> > >   return NULL;
> > > @@ -141,18 +174,53 @@ prio_dequeue(struct Qdisc* sch)
> > >   struct sk_buff *skb;
> > >   struct prio_sched_data *q = qdisc_priv(sch);
> > >   int prio;
> > > +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
> > > + int queue;
> > > +#endif
> > >   struct Qdisc *qdisc;
> > >  
> > > + /*
> > > +  * If we're multiqueue, the basic approach is try the lock on each
> > > +  * queue.  If it's locked, either we're already dequeuing, or enqueue
> > > +  * is doing something.  Go to the next band if we're locked.  Once we
> > > +  * have a packet, unlock the queue.  NOTE: the underlying qdisc CANNOT
> > > +  * be a PRIO qdisc, otherwise we will deadlock.  FIXME
> > > +  */
> > >   for (prio = 0; prio < q->bands; prio++) {
> > > +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
> > > + if (netif_is_multiqueue(sch->dev)) {
> > > + queue = q->band2queue[prio];
> > > + if (spin_trylock(&sch->dev->egress_subqueue[queue].queue_lock)) {
> > > + qdisc = q->queues[prio];
> > > + skb = qdisc->dequeue(qdisc);
> > > + if (skb) {
> > > + sch->q.qlen--;
> > > + skb->priority = prio;
> > > + spin_unlock(&sch->dev->egress_subqueue[queue].queue_lock);
> > > + return skb;
> > > + }
> > > + spin_unlock(&sch->dev->egress_subqueue[queue].queue_lock);
> > > + }
> > 
> > Your modified qdisc_restart() expects the queue_lock to be 
> > locked, how can this work?
> 
> No, it doesn't expect the lock to be held.  Because of the multiple
> queues, enqueueing and dequeueing are now asynchronous, since I can
> enqueue to queue 0 while dequeuing from queue 1.  dev->queue_lock isn't
> held, so this can happen.  Therefore the spin_trylock() is used in this
> dequeue because I don't want to wait for someone to finish with that
> queue in question (e.g. enqueue working), since that will block all
> other bands/queues after the band in question.  So if the lock isn't
> available to grab, we move to the next band.  If I were to wait for the
> lock, I'd serialize the enqueue/dequeue completely, and block other
> traffic flows in other queues waiting for the lock.

The first thing you do in qdisc_restart() after dequeue()ing is unlock
the sub queue lock. You explicitly unlock it before calling qdisc_run()
so I assume dequeue() is expected to keep it locked. Something doesn't
add up.

BTW, which lock serializes your write access to qdisc->q.qlen? It used
to be dev->queue_lock but that is apparently not true for multi queue.


RE: [PATCH 1/2] NET: Multiple queue network device support

2007-03-09 Thread Waskiewicz Jr, Peter P
> -----Original Message-----
> From: Thomas Graf [mailto:[EMAIL PROTECTED] 
> Sent: Friday, March 09, 2007 5:41 AM
> To: Kok, Auke-jan H
> Cc: David Miller; Garzik, Jeff; netdev@vger.kernel.org; 
> linux-kernel@vger.kernel.org; Waskiewicz Jr, Peter P; 
> Brandeburg, Jesse; Kok, Auke; Ronciak, John
> Subject: Re: [PATCH 1/2] NET: Multiple queue network device support
> 
> * Kok, Auke <[EMAIL PROTECTED]> 2007-02-08 16:09
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 455d589..42b635c 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -1477,6 +1477,49 @@ gso:
> > skb->tc_verd = SET_TC_AT(skb->tc_verd,AT_EGRESS);
> >  #endif
> > if (q->enqueue) {
> > +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
> > +   int queue_index;
> > +   /* If we're a multi-queue device, get a queue index to lock */
> > +   if (netif_is_multiqueue(dev))
> > +   {
> > +   /* Get the queue index and lock it. */
> > +   if (likely(q->ops->map_queue)) {
> > +   queue_index = q->ops->map_queue(skb, q);
> > +   spin_lock(&dev->egress_subqueue[queue_index].queue_lock);
> > +   rc = q->enqueue(skb, q);
> > +   /*
> > +* Unlock because the underlying qdisc
> > +* may queue and send a packet from a
> > +* different queue.
> > +*/
> > +   spin_unlock(&dev->egress_subqueue[queue_index].queue_lock);
> > +   qdisc_run(dev);
> 
> I really dislike this asymmetric locking logic: while 
> enqueue() is called with queue_lock held, dequeue() is 
> responsible for acquiring the lock for qdisc_restart().

I don't see how I can make this symmetrical when dealing with more than
one queue.  If an skb is classified to band 2 which maps to queue 1, we
lock queue 1, and enqueue the packet.  That queue will not be the first
queue checked in dequeue for an skb, so I cannot rely on that lock
protecting queue 0 in dequeue.  Therefore I need to map the skb, lock,
enqueue, unlock, then go into dequeue and lock the correct queue there.
If I were to make this symmetrical, then the process of enqueuing and
dequeuing would be serialized completely, and we'd be back to where we
started without multiqueue.

Also, in the multiqueue code path, dev->queue_lock isn't held anywhere,
it's the subqueue lock.

> 
> > +   rc = rc == NET_XMIT_BYPASS
> > +  ? NET_XMIT_SUCCESS : rc;
> > +   goto out;
> > +   } else {
> > +   printk(KERN_CRIT "Device %s is "
> > +   "multiqueue, but map_queue is "
> > +   "undefined in the qdisc!!\n",
> > +   dev->name);
> > +   kfree_skb(skb);
> 
> Move this check to tc_modify_qdisc(), it's useless to check 
> this for every packet being delivered.

The patches I sent yesterday modified the non-multiqueue qdiscs to not
load if the device is multiqueue, so this check can be removed
altogether.

> 
> > +   }
> > +   } else {
> > +   /* We're not a multi-queue device. */
> > +   spin_lock(&dev->queue_lock);
> > +   q = dev->qdisc;
> > +   if (q->enqueue) {
> > +   rc = q->enqueue(skb, q);
> > +   qdisc_run(dev);
> > +   spin_unlock(&dev->queue_lock);
> > +   rc = rc == NET_XMIT_BYPASS
> > +  ? NET_XMIT_SUCCESS : rc;
> > +   goto out;
> > +   }
> > +   spin_unlock(&dev->queue_lock);
> 
> Please don't duplicate already existing code.

I don't want to mix multiqueue and non-multiqueue code in the transmit
path.  This was an attempt to allow MQ and non-MQ devices to coexist in
a machine having separate code paths.  Are you suggesting to combine
them?  That would get very messy trying to determine what type of lock
to grab (subqueue lock or dev->queue_lock), not to mention grabbing the
dev->queue_lock would block multiqueue devices in that same codepath.

> 
> > @@ -130,6 +140,72 @@ static inline int qdisc_restart(struct net_device *dev)
> > }

Re: [PATCH 1/2] NET: Multiple queue network device support

2007-03-09 Thread Thomas Graf
* Kok, Auke <[EMAIL PROTECTED]> 2007-02-08 16:09
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 455d589..42b635c 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1477,6 +1477,49 @@ gso:
>   skb->tc_verd = SET_TC_AT(skb->tc_verd,AT_EGRESS);
>  #endif
>   if (q->enqueue) {
> +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
> + int queue_index;
> + /* If we're a multi-queue device, get a queue index to lock */
> + if (netif_is_multiqueue(dev))
> + {
> + /* Get the queue index and lock it. */
> + if (likely(q->ops->map_queue)) {
> + queue_index = q->ops->map_queue(skb, q);
> + spin_lock(&dev->egress_subqueue[queue_index].queue_lock);
> + rc = q->enqueue(skb, q);
> + /*
> +  * Unlock because the underlying qdisc
> +  * may queue and send a packet from a
> +  * different queue.
> +  */
> + spin_unlock(&dev->egress_subqueue[queue_index].queue_lock);
> + qdisc_run(dev);

I really dislike this asymmetric locking logic: while enqueue() is called
with queue_lock held, dequeue() is responsible for acquiring the lock for
qdisc_restart().

> + rc = rc == NET_XMIT_BYPASS
> +    ? NET_XMIT_SUCCESS : rc;
> + goto out;
> + } else {
> + printk(KERN_CRIT "Device %s is "
> + "multiqueue, but map_queue is "
> + "undefined in the qdisc!!\n",
> + dev->name);
> + kfree_skb(skb);

Move this check to tc_modify_qdisc(), it's useless to check this for
every packet being delivered.
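
I.e. validate once at attach time, something like this (a sketch only;
the function name is hypothetical):

    /* called when a qdisc is attached, e.g. from tc_modify_qdisc()/
     * qdisc_create(), instead of testing q->ops->map_queue per packet */
    static int check_multiqueue_ops(struct net_device *dev,
                                    struct Qdisc_ops *ops)
    {
            if (netif_is_multiqueue(dev) && ops->map_queue == NULL)
                    return -EOPNOTSUPP;
            return 0;
    }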

> + }
> + } else {
> + /* We're not a multi-queue device. */
> + spin_lock(&dev->queue_lock);
> + q = dev->qdisc;
> + if (q->enqueue) {
> + rc = q->enqueue(skb, q);
> + qdisc_run(dev);
> + spin_unlock(&dev->queue_lock);
> + rc = rc == NET_XMIT_BYPASS
> +    ? NET_XMIT_SUCCESS : rc;
> + goto out;
> + }
> + spin_unlock(&dev->queue_lock);

Please don't duplicate already existing code.

> @@ -130,6 +140,72 @@ static inline int qdisc_restart(struct net_device *dev)
>   }
>   
>   {
> +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
> + if (netif_is_multiqueue(dev)) {
> + if (likely(q->ops->map_queue)) {
> + queue_index = q->ops->map_queue(skb, q);
> + } else {
> + printk(KERN_CRIT "Device %s is "
> + "multiqueue, but map_queue is "
> + "undefined in the qdisc!!\n",
> + dev->name);
> + goto requeue;
> + }

Yet another condition completely useless for every transmission.

> + spin_unlock(&dev->egress_subqueue[queue_index].queue_lock);
> + /* Check top level device, and any sub-device */
> + if ((!netif_queue_stopped(dev)) &&
> +   (!netif_subqueue_stopped(dev, queue_index))) {
> + int ret;
> + ret = dev->hard_start_subqueue_xmit(skb, dev, queue_index);
> + if (ret == NETDEV_TX_OK) {
> + if (!nolock) {
> + netif_tx_unlock(dev);
> + }
> + return -1;
> + }
> + if (ret == NETDEV_TX_LOCKED && nolock) {
> + spin_lock(&dev->egress_subqueue[queue_index].queue_lock);
> + goto collision;
> + }
> + }
> + /* NETDEV_TX_BUSY - we need to requeue */
> + /* Release the driver */
> + if (!nolock) {
> + netif_tx_unlock(dev);
> + }

Re: [PATCH 1/2] NET: Multiple queue network device support

2007-03-08 Thread Jarek Poplawski
On 07-03-2007 23:42, David Miller wrote:
> I didn't say to use skb->priority, I said to shrink skb->priority down
> to a u16 and then make another u16 which will store your queue mapping
> value.

Peter is right: this is fully used by schedulers (prio,
CBQ, HTB, HFSC...) and would break users' scripts, so I
wouldn't recommend it either.
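
For instance, sch_prio's fallback path (paraphrased from sch_prio.c,
not quoted exactly) maps skb->priority straight to a band when no tc
filter gives a verdict, so overloading the field would silently
redirect that traffic:

    /* no filter match: band comes straight from skb->priority */
    band = q->prio2band[skb->priority & TC_PRIO_MAX];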

Regards,
Jarek P.


Re: [PATCH 1/2] NET: Multiple queue network device support

2007-03-07 Thread David Miller

I didn't say to use skb->priority, I said to shrink skb->priority down
to a u16 and then make another u16 which will store your queue mapping
value.
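
In diff form, the suggestion amounts to something like this (an
illustration only; "queue_mapping" is a placeholder name, not a field
from the posted patch):

    --- a/include/linux/skbuff.h
    +++ b/include/linux/skbuff.h
    -       __u32                   priority;
    +       __u16                   priority;
    +       __u16                   queue_mapping;  /* set at enqueue time */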



RE: [PATCH 1/2] NET: Multiple queue network device support

2007-03-07 Thread Waskiewicz Jr, Peter P
Dave,
I did some research based on your feedback.  Specifically, I
looked at removing ->map_queue() and allowing ->enqueue() to take care
of mapping and locking the queues (and ->dequeue()).  I found that it
can be done either way, but I'm not sure I like taking the locking out
of dev_queue_xmit() and relying on the qdisc ->enqueue() to make sure
locking is done correctly.  Having a map routine also allows various
parts of the stack to query where an skb belongs.
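
As posted, the hook itself is small (from the patch):

    /* in struct Qdisc_ops: resolve an skb to a device queue index */
    int (*map_queue)(struct sk_buff *skb, struct Qdisc *sch);

Keeping it as an op means the skb-to-queue resolution stays callable
from anywhere the qdisc is visible, not only from inside ->enqueue().
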
I also looked into storing the mapped queue value in the skb,
namely in skb->priority.  If I were to use this value, I'd lose
filtering capabilities in sch_prio.c, where if no tc filter exists for
the skb in question, it relies on the value of skb->priority to enqueue
to a band.  The value of skb->priority could also be used in a base
driver for further filtering (some of the MAC-based QoS on wireless
devices comes to mind), so I'm not sure I feel comfortable using that
field to store the queue.  I also envision some qdiscs in the future
that could change the mapping of bands to queues without reloading the
qdisc, so storing the queue information in the skb could lead to
out-of-order packets into queues (stale information in the skb).
What do you think?

Thanks,

PJ Waskiewicz
Software Engineer
LAD - LAN Access Division, New Technologies
Intel Corporation
[EMAIL PROTECTED]
 

> -----Original Message-----
> From: David Miller [mailto:[EMAIL PROTECTED] 
> Sent: Monday, February 26, 2007 5:03 PM
> To: Kok, Auke-jan H
> Cc: [EMAIL PROTECTED]; netdev@vger.kernel.org; 
> linux-kernel@vger.kernel.org; Waskiewicz Jr, Peter P; 
> Brandeburg, Jesse; [EMAIL PROTECTED]; Ronciak, John
> Subject: Re: [PATCH 1/2] NET: Multiple queue network device support
> 
> From: "Kok, Auke" <[EMAIL PROTECTED]>
> Date: Thu, 08 Feb 2007 16:09:50 -0800
> 
> > From: Peter Waskiewicz Jr <[EMAIL PROTECTED]>
> > 
> > Added an API and associated supporting routines for multiqueue 
> > network devices.  This allows network devices supporting multiple 
> > TX queues to configure each queue within the netdevice and manage 
> > each queue independently.  Changes to the PRIO Qdisc also allow a 
> > user to map multiple flows to individual TX queues, taking 
> > advantage of each queue on the device.
> > 
> > Signed-off-by: Peter Waskiewicz Jr <[EMAIL PROTECTED]>
> > Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
> 
> Thanks for posting this.
> 
> I was wondering if it would be better to have the ->enqueue() 
> perform the mapping operation, store the queue index selected 
> inside of the skb, and just use the normal 
> ->hard_start_xmit() to send the packet out?
> 
> The only thing to take care of is the per-queue locking, but 
> that should be easily doable.
> 
> We might be able to do this even without making sk_buff any larger.
> For example, I suppose skb->priority might be reducible down 
> to a u16.  After a quick audit I really cannot see any 
> legitimate use of anything larger than 16-bit values, but a 
> more thorough audit would be necessary to validate this.
> 


RE: [PATCH 1/2] NET: Multiple queue network device support

2007-03-07 Thread Waskiewicz Jr, Peter P
Dave,
I did some research based on your feedback.  Specifically, I
looked at removing -map_queue() and allowing -enqueue() to take care
of mapping and locking of the queues (and -dequeue()).  I found that it
can be done either way, but I'm not sure I like taking the locking out
of dev_queue_xmit(), and relying on the qdisc -enqueue() to make sure
locking is completed correctly.  Having a map routine also allows
various parts of the stack to query the skb where it belongs.
I also looked into storing the mapped queue value in the skb,
namely in skb-priority.  If I were to use this value, I'd lose
filtering capabilities in sch_prio.c, where if no tc filter exists for
the skb in question, it relies on the value of skb-priority to enqueue
to a band.  The value of skb-priority could also be used in a base
driver for further filtering (some of the MAC-based QoS on wireless
devices comes to mind), so I'm not sure I feel comfortable using that
field to store the queue.  I also envision some qdiscs in the future
that could change the mapping of bands to queues without reloading the
qdisc, so storing the queue information in the skb could lead to
out-of-order packets into queues (stale information in the skb).
What do you think?

Thanks,

PJ Waskiewicz
Software Engineer
LAD - LAN Access Division, New Technologies
Intel Corporation
[EMAIL PROTECTED]
 

 -Original Message-
 From: David Miller [mailto:[EMAIL PROTECTED] 
 Sent: Monday, February 26, 2007 5:03 PM
 To: Kok, Auke-jan H
 Cc: [EMAIL PROTECTED]; netdev@vger.kernel.org; 
 linux-kernel@vger.kernel.org; Waskiewicz Jr, Peter P; 
 Brandeburg, Jesse; [EMAIL PROTECTED]; Ronciak, John
 Subject: Re: [PATCH 1/2] NET: Multiple queue network device support
 
 From: Kok, Auke [EMAIL PROTECTED]
 Date: Thu, 08 Feb 2007 16:09:50 -0800
 
  From: Peter Waskiewicz Jr [EMAIL PROTECTED]
  
  Added an API and associated supporting routines for 
 multiqueue network 
  devices.  This allows network devices supporting multiple 
 TX queues to 
  configure each queue within the netdevice and manage each queue 
  independantly.  Changes to the PRIO Qdisc also allow a user to map 
  multiple flows to individual TX queues, taking advantage of 
 each queue 
  on the device.
  
  Signed-off-by: Peter Waskiewicz Jr [EMAIL PROTECTED]
  Signed-off-by: Auke Kok [EMAIL PROTECTED]
 
 Thanks for posting this.
 
 I was wonding if it would be better to have the -enqueue() 
 perform the mapping operation, store the queue index selected 
 inside of the skb, and just use the normal 
 -hard_start_xmit() to send the packet out?
 
 The only thing to take care of is the per-queue locking, but 
 that should be easily doable.
 
 We might be able to do this even without making sk_buff any larger.
 For example, I suppose skb->priority might be deducible down 
 to a u16.  After a quick audit I really cannot see any 
 legitimate use of anything larger than 16-bit values, but a 
 more thorough audit would be necessary to validate this.
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] NET: Multiple queue network device support

2007-03-07 Thread David Miller

I didn't say to use skb->priority, I said to shrink skb->priority down
to a u16 and then make another u16 which will store your queue mapping
value.
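
Concretely, that amounts to something like this (a sketch only; the
new field's name is up for grabs, nothing in the tree today):

	__u16	priority;	/* shrunk from __u32; still holds TC priority */
	__u16	queue_mapping;	/* new: TX queue chosen at ->enqueue() time */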

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/2] NET: Multiple queue network device support

2007-02-27 Thread Waskiewicz Jr, Peter P
Dave,
Thanks for the feedback.  Please see below for my comments /
questions.

PJ Waskiewicz
Intel Corporation

> > From: Peter Waskiewicz Jr <[EMAIL PROTECTED]>
> > 
> > Added an API and associated supporting routines for 
> multiqueue network 
> > devices.  This allows network devices supporting multiple 
> TX queues to 
> > configure each queue within the netdevice and manage each queue 
> > independently.  Changes to the PRIO Qdisc also allow a user to map 
> > multiple flows to individual TX queues, taking advantage of 
> each queue 
> > on the device.
> > 
> > Signed-off-by: Peter Waskiewicz Jr <[EMAIL PROTECTED]>
> > Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
> 
> Thanks for posting this.
> 
> I was wondering if it would be better to have the ->enqueue() 
> perform the mapping operation, store the queue index selected 
> inside of the skb, and just use the normal 
> ->hard_start_xmit() to send the packet out?

One of the reasons I chose to add more functions is for the queue
management itself.  Specifically, I needed the ability to lock queues,
stop queues (netif_stop_queue), and wake them up, independent of the
global queue_lock in the netdevice.  At first I did think about storing
the queue index in the skb, but that won't help with the queue locking,
since only netdevice is passed to them outside of the context of the
skb.  As for adding the additional multiqueue version of
hard_start_subqueue_xmit(), I did that so non-MQ devices would have a
clean, untouched path in dev_queue_xmit(), and only MQ devices would
touch a new codepath.  This also removes the need for the device driver
to inspect the queue index in the skb for which Tx ring to place the
packet in, which is good for performance.
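
As an illustration of the driver side I have in mind (the ring layout
and helper names here are hypothetical, just to show the shape of it):

	static int mydrv_hard_start_subqueue_xmit(struct sk_buff *skb,
						  struct net_device *dev,
						  int queue_index)
	{
		struct mydrv_priv *priv = netdev_priv(dev);
		struct mydrv_ring *ring = &priv->tx_ring[queue_index];

		if (mydrv_ring_full(ring)) {
			/* Flow-control only this ring; others keep going. */
			netif_stop_subqueue(dev, queue_index);
			return NETDEV_TX_BUSY;
		}
		mydrv_post_skb(ring, skb);
		return NETDEV_TX_OK;
	}

The matching netif_wake_subqueue(dev, queue_index) would then be called
from that ring's TX-clean path, again without touching the other queues.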

For having ->enqueue() take care of the mapping, I thought for a while
about this.  Originally, I had ->enqueue() doing just that, until I
realized I was not locking the individual subqueues correctly.  The
thought process was once I receive an skb for Tx, I need to lock a
queue, enqueue the skb, call qdisc_run(), then unlock.  That grew into
lock a queue, enqueue, unlock the queue, have dequeue decide which queue
to pull an skb from (since the queue an skb comes from can be different
than the one just enqueued to), lock it, dequeue, unlock, and return the
skb.  The whole issue came down to needing to know what queue to lock
before enqueuing.  If I called enqueue, I would be going in unlocked.  I
suppose this would be ok if the lock is done inside the ->enqueue(), and
unlock is done before exiting; let me look at that possibility.
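
A rough sketch of that possibility, using the patch's per-queue lock
(mq_map() and __enqueue_to_band() are hypothetical helpers, standing in
for the mapping and band logic folded into the qdisc):

	static int mq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
	{
		struct net_device *dev = sch->dev;
		int queue = mq_map(skb, sch);	/* mapping now internal */
		int rc;

		spin_lock(&dev->egress_subqueue[queue].queue_lock);
		rc = __enqueue_to_band(skb, sch, queue);
		spin_unlock(&dev->egress_subqueue[queue].queue_lock);
		return rc;
	}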

> 
> The only thing to take care of is the per-queue locking, but 
> that should be easily doable.
> 
> We might be able to do this even without making sk_buff any larger.
> For example, I suppose skb->priority might be deducible down 
> to a u16.  After a quick audit I really cannot see any 
> legitimate use of anything larger than 16-bit values, but a 
> more thorough audit would be necessary to validate this.

The skb->priority field right now is used to determine the PRIO /
pfifo_fast band to place the skb in on enqueue.  I still need to think
about if storing the queue index in the skb would help, due to the queue
management functions outside of the skb context.

Again, thanks for the feedback.  I'll respond once I've looked at moving
locks into ->enqueue() and removing ->map_queue().  Please note that the
PRIO band to queue mapping in this patch has a problem when the number
of bands is <= the number of Tx queues on the NIC.  I have a fix, and
will include that in a repost.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] NET: Multiple queue network device support

2007-02-26 Thread David Miller
From: "Kok, Auke" <[EMAIL PROTECTED]>
Date: Thu, 08 Feb 2007 16:09:50 -0800

> From: Peter Waskiewicz Jr <[EMAIL PROTECTED]>
> 
> Added an API and associated supporting routines for multiqueue network
> devices.  This allows network devices supporting multiple TX queues to
> configure each queue within the netdevice and manage each queue
> independently.  Changes to the PRIO Qdisc also allow a user to map
> multiple flows to individual TX queues, taking advantage of each queue
> on the device.
> 
> Signed-off-by: Peter Waskiewicz Jr <[EMAIL PROTECTED]>
> Signed-off-by: Auke Kok <[EMAIL PROTECTED]>

Thanks for posting this.

I was wondering if it would be better to have the ->enqueue() perform
the mapping operation, store the queue index selected inside of the
skb, and just use the normal ->hard_start_xmit() to send the packet
out?

The only thing to take care of is the per-queue locking, but that
should be easily doable.

We might be able to do this even without making sk_buff any larger.
For example, I suppose skb->priority might be deducible down to
a u16.  After a quick audit I really cannot see any legitimate use
of anything larger than 16-bit values, but a more thorough audit
would be necessary to validate this.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] NET: Multiple queue network device support

2007-02-23 Thread Kok, Auke

Stephen Hemminger wrote:

On Fri, 23 Feb 2007 04:00:55 -0500
"Sreenivasa Honnur" <[EMAIL PROTECTED]> wrote:


Fucntion "map_queue" returns queue index as '0'. There is no support to
return different queue indexes.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Kok, Auke
Sent: Friday, February 09, 2007 5:40 AM
To: David Miller; Garzik, Jeff; netdev@vger.kernel.org;
linux-kernel@vger.kernel.org
Cc: Kok, Auke; Peter Waskiewicz Jr; Brandeburg, Jesse; Kok, Auke;
Ronciak, John
Subject: [PATCH 1/2] NET: Multiple queue network device support


From: Peter Waskiewicz Jr <[EMAIL PROTECTED]>

Added an API and associated supporting routines for multiqueue network
devices.  This allows network devices supporting multiple TX queues to
configure each queue within the netdevice and manage each queue
independently.  Changes to the PRIO Qdisc also allow a user to map
multiple flows to individual TX queues, taking advantage of each queue
on the device.

Signed-off-by: Peter Waskiewicz Jr <[EMAIL PROTECTED]>
Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---


 
+config NET_MULTI_QUEUE_DEVICE
+	bool "Multiple queue network device support (EXPERIMENTAL)"
+	depends on NET_SCHED && EXPERIMENTAL
+	help
+	  Saying Y here will add support for network devices that have
+	  more than one physical TX queue and/or RX queue.
+
+	  Multiple queue devices will require qdiscs that understand how
+	  to queue to multiple targets.  The default packet scheduler will
+	  default to the first queue in a device.  In other words, if you
+	  need the ability to spread traffic across queues, your queueing
+	  discipline needs to know how to do that.
+
+	  Note that saying Y here will give preferential treatment to
+	  multiple queue devices in the network stack.  A slight drop in
+	  single-queue device performance may be seen.
+
+	  Say N here unless you know your network device supports multiple
+	  TX and/or RX queues.
+


This should not be a user visible configuration option.
It should either: always be part of the kernel API
or be selected by drivers that need/want it.


perhaps when it's stable, yes, but right now it's definitely experimental and 
may result in a slight overhead for single-queue devices as the text reads.



Cheers,

Auke
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] NET: Multiple queue network device support

2007-02-23 Thread Stephen Hemminger
On Fri, 23 Feb 2007 04:00:55 -0500
"Sreenivasa Honnur" <[EMAIL PROTECTED]> wrote:

> Function "map_queue" returns queue index as '0'. There is no support to
> return different queue indexes.
> 
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> On Behalf Of Kok, Auke
> Sent: Friday, February 09, 2007 5:40 AM
> To: David Miller; Garzik, Jeff; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org
> Cc: Kok, Auke; Peter Waskiewicz Jr; Brandeburg, Jesse; Kok, Auke;
> Ronciak, John
> Subject: [PATCH 1/2] NET: Multiple queue network device support
> 
> 
> From: Peter Waskiewicz Jr <[EMAIL PROTECTED]>
> 
> Added an API and associated supporting routines for multiqueue network
> devices.  This allows network devices supporting multiple TX queues to
> configure each queue within the netdevice and manage each queue
> independently.  Changes to the PRIO Qdisc also allow a user to map
> multiple flows to individual TX queues, taking advantage of each queue
> on the device.
> 
> Signed-off-by: Peter Waskiewicz Jr <[EMAIL PROTECTED]>
> Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
> ---

>  
> +config NET_MULTI_QUEUE_DEVICE
> +	bool "Multiple queue network device support (EXPERIMENTAL)"
> +	depends on NET_SCHED && EXPERIMENTAL
> +	help
> +	  Saying Y here will add support for network devices that have
> +	  more than one physical TX queue and/or RX queue.
> +
> +	  Multiple queue devices will require qdiscs that understand how
> +	  to queue to multiple targets.  The default packet scheduler will
> +	  default to the first queue in a device.  In other words, if you
> +	  need the ability to spread traffic across queues, your queueing
> +	  discipline needs to know how to do that.
> +
> +	  Note that saying Y here will give preferential treatment to
> +	  multiple queue devices in the network stack.  A slight drop in
> +	  single-queue device performance may be seen.
> +
> +	  Say N here unless you know your network device supports multiple
> +	  TX and/or RX queues.
> +

This should not be a user visible configuration option.
It should either: always be part of the kernel API
or be selected by drivers that need/want it.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/2] NET: Multiple queue network device support

2007-02-23 Thread Waskiewicz Jr, Peter P
map_queue will always return 0 in the pfifo_fast Qdisc.  This is because
pfifo_fast is the default scheduler when a device is brought up.  I
didn't want to make any assumptions about a device's functionality, so
using queue 0 seems the least dangerous.
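
In essence the callback is just this (modulo the exact name in the
patch):

	static int pfifo_fast_map_queue(struct sk_buff *skb, struct Qdisc *sch)
	{
		return 0;	/* default qdisc: always the first TX queue */
	}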

However, in the prio qdisc, there is a small problem with the queue to
band assignment logic.  If you have the same number of bands as queues
(e.g. 4 prio bands and 4 Tx queues), then you will have all bands
assigned to queue 0.  I have a fix for that, which I plan to send when I
finish up a documentation piece for this patch's repost.
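
One illustrative way to spread bands out (not necessarily the actual
fix; band2queue is a made-up name):

	for (band = 0; band < q->bands; band++)
		q->band2queue[band] = band % dev->egress_subqueue_count;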

Thanks for the feedback,

PJ Waskiewicz

> -Original Message-
> From: Sreenivasa Honnur [mailto:[EMAIL PROTECTED] 
> Sent: Friday, February 23, 2007 1:01 AM
> To: Kok, Auke-jan H; David Miller; Garzik, Jeff; 
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Cc: Waskiewicz Jr, Peter P; Brandeburg, Jesse; Kok, Auke; 
> Ronciak, John
> Subject: RE: [PATCH 1/2] NET: Multiple queue network device support
> 
> Function "map_queue" returns queue index as '0'. There is no 
> support to return different queue indexes.
> 
> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED]
> On Behalf Of Kok, Auke
> Sent: Friday, February 09, 2007 5:40 AM
> To: David Miller; Garzik, Jeff; netdev@vger.kernel.org; 
> linux-kernel@vger.kernel.org
> Cc: Kok, Auke; Peter Waskiewicz Jr; Brandeburg, Jesse; Kok, 
> Auke; Ronciak, John
> Subject: [PATCH 1/2] NET: Multiple queue network device support
> 
> 
> From: Peter Waskiewicz Jr <[EMAIL PROTECTED]>
> 
> Added an API and associated supporting routines for 
> multiqueue network devices.  This allows network devices 
> supporting multiple TX queues to configure each queue within 
> the netdevice and manage each queue independantly.  Changes 
> to the PRIO Qdisc also allow a user to map multiple flows to 
> individual TX queues, taking advantage of each queue on the device.
> 
> Signed-off-by: Peter Waskiewicz Jr <[EMAIL PROTECTED]>
> Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
> ---
> 
>  include/linux/netdevice.h |   73 
>  include/net/sch_generic.h |3 +
>  net/Kconfig   |   20 
>  net/core/dev.c|   51 
>  net/sched/sch_generic.c   |  117 +
>  net/sched/sch_prio.c      |  106 ++---
>  6 files changed, 364 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/netdevice.h 
> b/include/linux/netdevice.h index
> 2e37f50..c7f94a8 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -106,6 +106,16 @@ struct netpoll_info;
>  #define MAX_HEADER (LL_MAX_HEADER + 48)
>  #endif
>  
> +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
> +
> +struct net_device_subqueue
> +{
> + /* Give a lock and a flow control state for each queue */
> + unsigned long   state;
> +	spinlock_t	queue_lock ____cacheline_aligned_in_smp;
> +};
> +#endif
> +
>  /*
>   *   Network device statistics. Akin to the 2.0 ether stats but
>   *   with byte counters.
> @@ -339,6 +349,10 @@ struct net_device
>  #define NETIF_F_GEN_CSUM (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
>  #define NETIF_F_ALL_CSUM (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM)
>  
> +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
> +#define NETIF_F_MULTI_QUEUE	8192	/* Has multiple TX/RX queues */
> +#endif
> +
>   struct net_device   *next_sched;
>  
>   /* Interface index. Unique device identifier*/
> @@ -532,6 +546,19 @@ struct net_device
>   struct device   dev;
>   /* space for optional statistics and wireless sysfs groups */
>   struct attribute_group  *sysfs_groups[3];
> +
> +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
> + /* To retrieve statistics per subqueue - FOR FUTURE USE */
> +	struct net_device_stats* (*get_subqueue_stats)(struct net_device *dev,
> +							int queue_index);
> +
> +	/* The TX queue control structures */
> +	struct net_device_subqueue	*egress_subqueue;
> +	int				egress_subqueue_count;
> +	int			(*hard_start_subqueue_xmit)(struct sk_buff *skb,
> +							    struct net_device *dev,
> +							    int queue_index);
> +#endif /* CONFIG_NET_MULTI_QUEUE_DEVICE */
>  };
>  #define to_net_dev(d) container_of(d, struct net_device, dev)
>  
> @@ -673,6 +700,52 @@ static inline int netif_running(const struct net_device *dev)

RE: [PATCH 1/2] NET: Multiple queue network device support

2007-02-23 Thread Sreenivasa Honnur
Fucntion "map_queue" returns queue index as '0'. There is no support to
return different queue indexes.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Kok, Auke
Sent: Friday, February 09, 2007 5:40 AM
To: David Miller; Garzik, Jeff; netdev@vger.kernel.org;
linux-kernel@vger.kernel.org
Cc: Kok, Auke; Peter Waskiewicz Jr; Brandeburg, Jesse; Kok, Auke;
Ronciak, John
Subject: [PATCH 1/2] NET: Multiple queue network device support


From: Peter Waskiewicz Jr <[EMAIL PROTECTED]>

Added an API and associated supporting routines for multiqueue network
devices.  This allows network devices supporting multiple TX queues to
configure each queue within the netdevice and manage each queue
independantly.  Changes to the PRIO Qdisc also allow a user to map
multiple flows to individual TX queues, taking advantage of each queue
on the device.

Signed-off-by: Peter Waskiewicz Jr <[EMAIL PROTECTED]>
Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---

 include/linux/netdevice.h |   73 
 include/net/sch_generic.h |3 +
 net/Kconfig   |   20 
 net/core/dev.c|   51 
 net/sched/sch_generic.c   |  117 +
 net/sched/sch_prio.c      |  106 ++---
 6 files changed, 364 insertions(+), 6 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index
2e37f50..c7f94a8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -106,6 +106,16 @@ struct netpoll_info;
 #define MAX_HEADER (LL_MAX_HEADER + 48)
 #endif
 
+#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
+
+struct net_device_subqueue
+{
+   /* Give a lock and a flow control state for each queue */
+   unsigned long   state;
+	spinlock_t	queue_lock ____cacheline_aligned_in_smp;
+};
+#endif
+
 /*
  * Network device statistics. Akin to the 2.0 ether stats but
  * with byte counters.
@@ -339,6 +349,10 @@ struct net_device
 #define NETIF_F_GEN_CSUM   (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
 #define NETIF_F_ALL_CSUM   (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM)
 
+#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
+#define NETIF_F_MULTI_QUEUE	8192	/* Has multiple TX/RX queues */
+#endif
+
struct net_device   *next_sched;
 
/* Interface index. Unique device identifier*/
@@ -532,6 +546,19 @@ struct net_device
struct device   dev;
/* space for optional statistics and wireless sysfs groups */
struct attribute_group  *sysfs_groups[3];
+
+#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
+   /* To retrieve statistics per subqueue - FOR FUTURE USE */
+	struct net_device_stats* (*get_subqueue_stats)(struct net_device *dev,
+						       int queue_index);
+
+	/* The TX queue control structures */
+	struct net_device_subqueue	*egress_subqueue;
+	int				egress_subqueue_count;
+	int			(*hard_start_subqueue_xmit)(struct sk_buff *skb,
+							    struct net_device *dev,
+							    int queue_index);
+#endif /* CONFIG_NET_MULTI_QUEUE_DEVICE */
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -673,6 +700,52 @@ static inline int netif_running(const struct net_device *dev)
 	return test_bit(__LINK_STATE_START, &dev->state);
 }
 
+#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
+
+/*
+ * Routines to manage the subqueues on a device.  We only need start
+ * stop, and a check if it's stopped.  All other device management is
+ * done at the overall netdevice level.
+ * Also test the netif state if we're multi-queue at all.
+ */
+static inline void netif_start_subqueue(struct net_device *dev, int queue_index)
+{
+	clear_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state);
+}
+
+static inline void netif_stop_subqueue(struct net_device *dev, int queue_index)
+{
+#ifdef CONFIG_NETPOLL_TRAP
+	if (netpoll_trap())
+		return;
+#endif
+	set_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state);
+}
+
+static inline int netif_subqueue_stopped(const struct net_device *dev,
+					 int queue_index)
+{
+	return test_bit(__LINK_STATE_XOFF,
+			&dev->egress_subqueue[queue_index].state);
+}
+
+static inline void netif_wake_subqueue(struct net_device *dev, int queue_index)
+{
+#ifdef CONFIG_NETPOLL_TRAP
+	if (netpoll_trap())
+		return;
+#endif
+	if (test_and_clear_bit(__LINK_STATE_XOFF,
+			       &dev->egress_subqueue[queue_index].state))
+		__netif_schedule(dev);
+}
+
+static inline int netif_is_multiqueue(const struct net_device *dev)
+{
+	return (!!(NETIF_F_MULTI_QUEUE & dev->features));
+}
+#endif /* CONFIG_NET_MULTI_QUEUE_DEVICE */


[PATCH 1/2] NET: Multiple queue network device support

2007-02-08 Thread Kok, Auke

From: Peter Waskiewicz Jr <[EMAIL PROTECTED]>

Added an API and associated supporting routines for multiqueue network
devices.  This allows network devices supporting multiple TX queues to
configure each queue within the netdevice and manage each queue
independently.  Changes to the PRIO Qdisc also allow a user to map
multiple flows to individual TX queues, taking advantage of each queue
on the device.

Signed-off-by: Peter Waskiewicz Jr <[EMAIL PROTECTED]>
Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---

 include/linux/netdevice.h |   73 
 include/net/sch_generic.h |3 +
 net/Kconfig   |   20 
 net/core/dev.c|   51 
 net/sched/sch_generic.c   |  117 +
 net/sched/sch_prio.c  |  106 ++---
 6 files changed, 364 insertions(+), 6 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2e37f50..c7f94a8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -106,6 +106,16 @@ struct netpoll_info;
 #define MAX_HEADER (LL_MAX_HEADER + 48)
 #endif
 
+#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
+
+struct net_device_subqueue
+{
+   /* Give a lock and a flow control state for each queue */
+   unsigned long   state;
+	spinlock_t	queue_lock ____cacheline_aligned_in_smp;
+};
+#endif
+
 /*
  * Network device statistics. Akin to the 2.0 ether stats but
  * with byte counters.
@@ -339,6 +349,10 @@ struct net_device
 #define NETIF_F_GEN_CSUM   (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
 #define NETIF_F_ALL_CSUM   (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM)
 
+#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
+#define NETIF_F_MULTI_QUEUE	8192	/* Has multiple TX/RX queues */
+#endif
+
struct net_device   *next_sched;
 
/* Interface index. Unique device identifier*/
@@ -532,6 +546,19 @@ struct net_device
struct device   dev;
/* space for optional statistics and wireless sysfs groups */
struct attribute_group  *sysfs_groups[3];
+
+#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
+   /* To retrieve statistics per subqueue - FOR FUTURE USE */
+   struct net_device_stats* (*get_subqueue_stats)(struct net_device *dev,
+   int queue_index);
+
+   /* The TX queue control structures */
+   struct net_device_subqueue  *egress_subqueue;
+   int egress_subqueue_count;
+   int (*hard_start_subqueue_xmit)(struct sk_buff *skb,
+   struct net_device *dev,
+   int queue_index);
+#endif /* CONFIG_NET_MULTI_QUEUE_DEVICE */
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -673,6 +700,52 @@ static inline int netif_running(const struct net_device *dev)
 	return test_bit(__LINK_STATE_START, &dev->state);
 }
 
+#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
+
+/*
+ * Routines to manage the subqueues on a device.  We only need start
+ * stop, and a check if it's stopped.  All other device management is
+ * done at the overall netdevice level.
+ * Also test the netif state if we're multi-queue at all.
+ */
+static inline void netif_start_subqueue(struct net_device *dev, int queue_index)
+{
+	clear_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state);
+}
+
+static inline void netif_stop_subqueue(struct net_device *dev, int queue_index)
+{
+#ifdef CONFIG_NETPOLL_TRAP
+   if (netpoll_trap())
+   return;
+#endif
+	set_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state);
+}
+
+static inline int netif_subqueue_stopped(const struct net_device *dev,
+ int queue_index)
+{
+   return test_bit(__LINK_STATE_XOFF,
+			&dev->egress_subqueue[queue_index].state);
+}
+
+static inline void netif_wake_subqueue(struct net_device *dev, int queue_index)
+{
+#ifdef CONFIG_NETPOLL_TRAP
+   if (netpoll_trap())
+   return;
+#endif
+   if (test_and_clear_bit(__LINK_STATE_XOFF,
+			       &dev->egress_subqueue[queue_index].state))
+   __netif_schedule(dev);
+}
+
+static inline int netif_is_multiqueue(const struct net_device *dev)
+{
+   return (!!(NETIF_F_MULTI_QUEUE & dev->features));
+}
+#endif /* CONFIG_NET_MULTI_QUEUE_DEVICE */
+
 
 /* Use this variant when it is known for sure that it
  * is executing from interrupt context.
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 8208639..8751ba6 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -102,6 +102,9 @@ struct Qdisc_ops
 
int (*dump)(struct Qdisc *, struct sk_buff *);
	int			(*dump_stats)(struct Qdisc *, struct gnet_dump *);
+#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
+  
