RE: [PATCH] NET: [UPDATED] Multiqueue network device support implementation.

2007-04-11 Thread Waskiewicz Jr, Peter P
 +  skb-queue_mapping =
 +   
 q-prio2band[q-band2queue[bandTC_PRIO_MAX]];
 
 
 Does this needs to be cleared at some point again? TC actions might 
 redirect or mirror packets to other (multiqueue) devices.
  
  
  If an skb is redirected to another device, the skb should 
 be filtered 
  through that device's qdisc, yes?
 
 
 Yes, but the device might not have a queue or use something 
 different than prio, so the value would stay the same. I 
 think you need to clear it before enqueueing a packet or 
 alternatively when redirecting in the mirred action.
 

I see what you're saying now - I'll set the queue_mapping to zero before
the enqueue occurs.  Thanks for the feedback.

-PJ Waskiewicz
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] NET: [UPDATED] Multiqueue network device support implementation.

2007-04-11 Thread Waskiewicz Jr, Peter P
 Thanks.
 
  However, the PRIO qdisc still uses the priority in the bands for 
  dequeueing priority, and will feed the queues on the NIC.
  The e1000, and any other multiqueue NIC, will schedule Tx 
 based on how 
  the PRIO qdisc feeds the queues.  So the only priority here is the 
  dequeuing priority from the kernel.  The e1000 will use the new API 
  for starting/stopping the individual queues based on the 
 descriptors 
  available, much like it does today for the global queue.
 
 
 Packets will only be dequeued from a band if the associated 
 subqueue is active, which moves the decision from prio to the 
 driver, no?
 What policy does e1000 use for scheduling its internal queues?
 

E1000 is handed the skb's from PRIO to whichever queue the PRIO flows
dictate (based on band2queue mapping, tc filters, or TOS to
skb-priority filtering).  Once the skb hits the e1000, the internal
policy is round-robin on the hardware queues to dequeue and transmit on
the wire.  I agree the NIC does influence the decision of which band an
skb will be dequeued from based on available descriptors, etc., but
that's one of the goals of multiqueue: don't allow another traffic flow
in one queue stop or impact another separate flow.

Once NICs start using hardware-based priority scheduling (wireless?), we
can use a round-robin type qdisc in the kernel to dequeue, and let the
hardware directly decide how important one flow is over another.

Hope this answers your question.

Cheers,
-PJ Waskiewicz
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: [UPDATED] Multiqueue network device support implementation.

2007-04-11 Thread Patrick McHardy
Waskiewicz Jr, Peter P wrote:
Packets will only be dequeued from a band if the associated 
subqueue is active, which moves the decision from prio to the 
driver, no?
What policy does e1000 use for scheduling its internal queues?

 
 
 E1000 is handed the skb's from PRIO to whichever queue the PRIO flows
 dictate (based on band2queue mapping, tc filters, or TOS to
 skb-priority filtering).  Once the skb hits the e1000, the internal
 policy is round-robin on the hardware queues to dequeue and transmit on
 the wire.  I agree the NIC does influence the decision of which band an
 skb will be dequeued from based on available descriptors, etc., but
 that's one of the goals of multiqueue: don't allow another traffic flow
 in one queue stop or impact another separate flow.


Yes, I'm not saying its wrong, but it will surprise users that put a
prio qdisc on a multiqueue device without knowing anything about the
multiqueue capabilities and suddenly don't have strict priority
anymore. If it changes the behaviour, it should be explicitly enabled
by the user in the prio qdisc configuration. This would also allow
to move the band2queue mapping policy to userspace.

 Once NICs start using hardware-based priority scheduling (wireless?), we
 can use a round-robin type qdisc in the kernel to dequeue, and let the
 hardware directly decide how important one flow is over another.


You bring up a good point, it would be good to hear the opinion from
one of the wireless people on this since they have their own multiqueue
scheduler in the wireless-dev tree.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: [UPDATED] Multiqueue network device support implementation.

2007-04-11 Thread Zhu Yi
On Wed, 2007-04-11 at 19:03 +0200, Patrick McHardy wrote:
 
 You bring up a good point, it would be good to hear the opinion from
 one of the wireless people on this since they have their own
 multiqueue scheduler in the wireless-dev tree. 

The one in the wireless-dev is pretty much like this one. It existed
only because there was not such a multiqueue aware qdisc available at
that time.

The requirement for wireless is the same as the strict PRIO with an
addition that the dequeued SKB's corresponding NIC hardware queue must
be active (this is also true for other devices I think, otherwise it has
to be requeued which leads a busy or dead loop in the end). In other
words, the dequeue method should select the SKB with the highest
priority from all the ACTIVE hardware queues (not all queues). The
wireless hardware then schedules all the packets from its 4 hardware TX
queues based on the priority and network environment.

Thanks,
-yi
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: [UPDATED] Multiqueue network device support implementation.

2007-04-10 Thread Patrick McHardy
Waskiewicz Jr, Peter P wrote:
 Thanks Pat for the initial feedback.  I can post a set of patches to
 e1000 using the new API; I'll try to get them out asap (need to apply to
 this kernel tree).


Thanks.

 However, the PRIO qdisc still uses the priority in
 the bands for dequeueing priority, and will feed the queues on the NIC.
 The e1000, and any other multiqueue NIC, will schedule Tx based on how
 the PRIO qdisc feeds the queues.  So the only priority here is the
 dequeuing priority from the kernel.  The e1000 will use the new API for
 starting/stopping the individual queues based on the descriptors
 available, much like it does today for the global queue.


Packets will only be dequeued from a band if the associated subqueue
is active, which moves the decision from prio to the driver, no?
What policy does e1000 use for scheduling its internal queues?

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: [UPDATED] Multiqueue network device support implementation.

2007-04-10 Thread Patrick McHardy
Peter P Waskiewicz Jr wrote:
 + /* To retrieve statistics per subqueue - FOR FUTURE USE */
 + struct net_device_stats* (*get_subqueue_stats)(struct net_device *dev,
 + int queue_index);


Please no future use stuff, just add it when you need it.

 diff --git a/net/core/dev.c b/net/core/dev.c
 index 219a57f..c11c8fa 100644
 --- a/net/core/dev.c
 +++ b/net/core/dev.c
 @@ -3326,12 +3328,23 @@ struct net_device *alloc_netdev(int sizeof_priv, 
 const char *name,
   if (sizeof_priv)
   dev-priv = netdev_priv(dev);
  
 + alloc_size = (sizeof(struct net_device_subqueue) * queue_count);
 + 
 + p = kzalloc(alloc_size, GFP_KERNEL);
 + if (!p) {
 + printk(KERN_ERR alloc_netdev: Unable to allocate queues.\n);
 + return NULL;


This leaks the device. You treat every single-queue device as having
a single subqueue. If it doesn't get too ugly it would be nice to avoid
this and only allocate the subqueue states for real multiqueue devices.

 --- a/net/sched/sch_generic.c
 +++ b/net/sched/sch_generic.c
 @@ -133,7 +133,8 @@ static inline int qdisc_restart(struct net_device *dev)
   /* And release queue */
   spin_unlock(dev-queue_lock);
  
 - if (!netif_queue_stopped(dev)) {
 + if (!netif_queue_stopped(dev) 
 + !netif_subqueue_stopped(dev, skb-queue_mapping)) {
   int ret;
  
   ret = dev_hard_start_xmit(skb, dev);
 @@ -149,7 +150,6 @@ static inline int qdisc_restart(struct net_device *dev)
   goto collision;
   }
   }
 -


Unrelated whitespace change.

   /* NETDEV_TX_BUSY - we need to requeue */
   /* Release the driver */
   if (!nolock) {
 diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
 index 5cfe60b..7365621 100644
 --- a/net/sched/sch_prio.c
 +++ b/net/sched/sch_prio.c
 @@ -43,6 +43,7 @@ struct prio_sched_data
   struct tcf_proto *filter_list;
   u8  prio2band[TC_PRIO_MAX+1];
   struct Qdisc *queues[TCQ_PRIO_BANDS];
 + u16 band2queue[TC_PRIO_MAX + 1];
  };
  
  
 @@ -63,20 +64,26 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int 
 *qerr)
   case TC_ACT_SHOT:
   return NULL;
   };
 -

Same here

   if (!q-filter_list ) {
  #else
   if (!q-filter_list || tc_classify(skb, q-filter_list, res)) {
  #endif
   if (TC_H_MAJ(band))
   band = 0;
 + skb-queue_mapping =
 +   q-prio2band[q-band2queue[bandTC_PRIO_MAX]];
 +


Does this needs to be cleared at some point again? TC actions might
redirect or mirror packets to other (multiqueue) devices.

 @@ -242,6 +259,30 @@ static int prio_tune(struct Qdisc *sch, struct rtattr 
 *opt)
   }
   }
   }
 + /* setup queue to band mapping */
 + if (q-bands  sch-dev-egress_subqueue_count) {
 + qmapoffset = 1;
 + mod = 0;
 + } else {
 + mod = q-bands % sch-dev-egress_subqueue_count;
 + qmapoffset = q-bands / sch-dev-egress_subqueue_count +
 + ((mod) ? 1 : 0);
 + }
 +
 + queue = 0;
 + offset = 0;
 + for (i = 0; i  q-bands; i++) {
 + q-band2queue[i] = queue;
 + if ( ((i + 1) - offset) == qmapoffset) {
 + queue++;
 + offset += qmapoffset;
 + if (mod)
 + mod--;
 + qmapoffset = q-bands /
 + sch-dev-egress_subqueue_count +
 + ((mod) ? 1 : 0);
 + }
 + }


Besides being quite ugly, I don't think this does what you want.
For bands  queues we get band2queue[0] = 0, all others map to 1.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] NET: [UPDATED] Multiqueue network device support implementation.

2007-04-10 Thread Waskiewicz Jr, Peter P
 Peter P Waskiewicz Jr wrote:
  +   /* To retrieve statistics per subqueue - FOR FUTURE USE */
  +   struct net_device_stats* (*get_subqueue_stats)(struct 
 net_device *dev,
  +   int 
 queue_index);
 
 
 Please no future use stuff, just add it when you need it.

Gotcha.  I'll remove this.

 
  diff --git a/net/core/dev.c b/net/core/dev.c index 219a57f..c11c8fa 
  100644
  --- a/net/core/dev.c
  +++ b/net/core/dev.c
  @@ -3326,12 +3328,23 @@ struct net_device *alloc_netdev(int 
 sizeof_priv, const char *name,
  if (sizeof_priv)
  dev-priv = netdev_priv(dev);
   
  +   alloc_size = (sizeof(struct net_device_subqueue) * queue_count);
  + 
  +   p = kzalloc(alloc_size, GFP_KERNEL);
  +   if (!p) {
  +   printk(KERN_ERR alloc_netdev: Unable to 
 allocate queues.\n);
  +   return NULL;
 
 
 This leaks the device. You treat every single-queue device as 
 having a single subqueue. If it doesn't get too ugly it would 
 be nice to avoid this and only allocate the subqueue states 
 for real multiqueue devices.

We went back and forth on this.  The reason we allocate a queue in every
case, even on single-queue devices, was to make the stack not have
branching for multiqueue and non-multiqueue devices.  If we don't have
at least one queue on a device, then we can't have
netif_subqueue_stopped() in the hotpath unless we check if a device is
multiqueue before.  The original patches I released had this branching,
and I was asked to not do that.  I'd also like to see all queue-related
stuff be pulled from net_device and put into net_device_subqueue at some
point, even for single-queue devices.  Thoughts?

 
  --- a/net/sched/sch_generic.c
  +++ b/net/sched/sch_generic.c
  @@ -133,7 +133,8 @@ static inline int qdisc_restart(struct 
 net_device *dev)
  /* And release queue */
  spin_unlock(dev-queue_lock);
   
  -   if (!netif_queue_stopped(dev)) {
  +   if (!netif_queue_stopped(dev) 
  +   !netif_subqueue_stopped(dev, 
 skb-queue_mapping)) {
  int ret;
   
  ret = dev_hard_start_xmit(skb, 
 dev); @@ -149,7 +150,6 @@ static 
  inline int qdisc_restart(struct net_device *dev)
  goto collision;
  }
  }
  -
 
 
 Unrelated whitespace change.

I'll fix that.

 
  /* NETDEV_TX_BUSY - we need to requeue */
  /* Release the driver */
  if (!nolock) {
  diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c index 
  5cfe60b..7365621 100644
  --- a/net/sched/sch_prio.c
  +++ b/net/sched/sch_prio.c
  @@ -43,6 +43,7 @@ struct prio_sched_data
  struct tcf_proto *filter_list;
  u8  prio2band[TC_PRIO_MAX+1];
  struct Qdisc *queues[TCQ_PRIO_BANDS];
  +   u16 band2queue[TC_PRIO_MAX + 1];
   };
   
   
  @@ -63,20 +64,26 @@ prio_classify(struct sk_buff *skb, 
 struct Qdisc *sch, int *qerr)
  case TC_ACT_SHOT:
  return NULL;
  };
  -
 
 Same here

I'll fix that too.

 
  if (!q-filter_list ) {
   #else
  if (!q-filter_list || tc_classify(skb, 
 q-filter_list, res)) {  
  #endif
  if (TC_H_MAJ(band))
  band = 0;
  +   skb-queue_mapping =
  + 
 q-prio2band[q-band2queue[bandTC_PRIO_MAX]];
  +
 
 
 Does this needs to be cleared at some point again? TC actions 
 might redirect or mirror packets to other (multiqueue) devices.

If an skb is redirected to another device, the skb should be filtered
through that device's qdisc, yes?

 
  @@ -242,6 +259,30 @@ static int prio_tune(struct Qdisc 
 *sch, struct rtattr *opt)
  }
  }
  }
  +   /* setup queue to band mapping */
  +   if (q-bands  sch-dev-egress_subqueue_count) {
  +   qmapoffset = 1;
  +   mod = 0;
  +   } else {
  +   mod = q-bands % sch-dev-egress_subqueue_count;
  +   qmapoffset = q-bands / 
 sch-dev-egress_subqueue_count +
  +   ((mod) ? 1 : 0);
  +   }
  +
  +   queue = 0;
  +   offset = 0;
  +   for (i = 0; i  q-bands; i++) {
  +   q-band2queue[i] = queue;
  +   if ( ((i + 1) - offset) == qmapoffset) {
  +   queue++;
  +   offset += qmapoffset;
  +   if (mod)
  +   mod--;
  +   qmapoffset = q-bands /
  +   sch-dev-egress_subqueue_count +
  +   ((mod) ? 1 : 0);
  +   }
  +   }
 
 
 Besides being quite ugly, I don't think this does what you want.
 For bands  queues we get band2queue[0] = 0, all others map to 1.
 

I see what's wrong with this.  If I removed mod = 0;, the algorithm
works as intended.  We 

Re: [PATCH] NET: [UPDATED] Multiqueue network device support implementation.

2007-04-10 Thread Patrick McHardy
Waskiewicz Jr, Peter P wrote:
This leaks the device. You treat every single-queue device as 
having a single subqueue. If it doesn't get too ugly it would 
be nice to avoid this and only allocate the subqueue states 
for real multiqueue devices.
 
 
 We went back and forth on this.  The reason we allocate a queue in every
 case, even on single-queue devices, was to make the stack not have
 branching for multiqueue and non-multiqueue devices.  If we don't have
 at least one queue on a device, then we can't have
 netif_subqueue_stopped() in the hotpath unless we check if a device is
 multiqueue before.  The original patches I released had this branching,
 and I was asked to not do that.


OK, thanks for the explanation.

+skb-queue_mapping =
+ q-prio2band[q-band2queue[bandTC_PRIO_MAX]];


Does this needs to be cleared at some point again? TC actions 
might redirect or mirror packets to other (multiqueue) devices.
 
 
 If an skb is redirected to another device, the skb should be filtered
 through that device's qdisc, yes?


Yes, but the device might not have a queue or use something different
than prio, so the value would stay the same. I think you need to clear
it before enqueueing a packet or alternatively when redirecting in the
mirred action.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: [UPDATED] Multiqueue network device support implementation.

2007-04-09 Thread Patrick McHardy
Peter P Waskiewicz Jr wrote:
 From: Peter P Waskiewicz Jr [EMAIL PROTECTED]
 
 Update: Fixed a typecast in free_netdev() for the egress_subqueue list.
 
 Added an API and associated supporting routines for multiqueue network 
 devices.
 This allows network devices supporting multiple TX queues to configure each
 queue within the netdevice and manage each queue independantly.  Changes to 
 the
 PRIO Qdisc also allow a user to map multiple flows to individual TX queues,
 taking advantage of each queue on the device.


This indeed looks a lot better than the first patch. I'm too tired to
fully review this now, but could you please post the corresponding e1000
patch? From a quick look I'm guessing that this patch changes the
behaviour of the prio qdisc from strict priority to whatever scheduling
mechanism e1000 uses for its queues when the multiqueue config option
is enabled, which might surprise people.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] NET: [UPDATED] Multiqueue network device support implementation.

2007-04-09 Thread Waskiewicz Jr, Peter P
 This indeed looks a lot better than the first patch. I'm too 
 tired to fully review this now, but could you please post the 
 corresponding e1000 patch? From a quick look I'm guessing 
 that this patch changes the behaviour of the prio qdisc from 
 strict priority to whatever scheduling mechanism e1000 uses 
 for its queues when the multiqueue config option is enabled, 
 which might surprise people.
 

Thanks Pat for the initial feedback.  I can post a set of patches to
e1000 using the new API; I'll try to get them out asap (need to apply to
this kernel tree).  However, the PRIO qdisc still uses the priority in
the bands for dequeueing priority, and will feed the queues on the NIC.
The e1000, and any other multiqueue NIC, will schedule Tx based on how
the PRIO qdisc feeds the queues.  So the only priority here is the
dequeuing priority from the kernel.  The e1000 will use the new API for
starting/stopping the individual queues based on the descriptors
available, much like it does today for the global queue.

-PJ Waskiewicz
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html