Re: [c-nsp] WRR Confusion on 6748 blades

2012-06-27 Thread John Neiberger
> Unfortunately, there are no 'absolute' per queue counters, only per queue
> drop counters. So no easy way to determine if other queues are being
> utilized unless you just 'know' (based on your classification policies and
> known application mix) or those queues overflow & drop.
>
>
>> >
>> >
>> <...snip...>
>
>
>
>> > This suggests to me that there is traffic in other queues contending for
>> > the
>> > available bandwidth, and that there's periodically instantaneous
>> > congestion.
>> > Alternatively you could try sizing this queue bigger and using the
>> > original
>> > bandwidth ratio. Or a combination of those two (tweaking both bandwidth
>> > &
>> > queue-limit).
>> >
>> > Is there some issue with changing the bandwidth ratio on this queue (ie,
>> > are
>> > you seeing collateral damage)? Else, seems like you've solved the
>> > problem
>> > already ;)
>>
>> Nope, we don't have a problem with it. That's what we've been doing.
>> We haven't really been adjusting the queue limit ratios, though. In
>> most cases, we were just changing the bandwidth ratio weights. I'm
>> looking at an interface right now where the 30-second weighted traffic
>> rate has never gone above around 150 Mbps but I'm still seeing OQDs in
>> one of the queues only. How do you think we should be interpreting
>> that?
>
>
>
>
> In my opinion, it indicates that:
> 1. there is traffic in the other queues contending for the link bandwidth
> 2. there is instantaneous oversubscription that causes the problem queue to
> fill as it's not being serviced frequently enough and/or is inadequately
> sized
> 3. the other queues are sized/weighted appropriately to handle the amount of
> traffic that maps to them (ie, even under congestion scenarios, there is
> adequate buffer to hold enough packets to avoid drops)
>
> If #1 was not true, then I don't see how changing the bandwidth ratio would
> make any difference at all - if there is no traffic in the other queues,
> then the single remaining active queue would get full unrestricted access to
> the full bandwidth of the link and no queuing would be necessary in the
> first place.
>
> Supposing there is no traffic in the other queues - in that case, you could
> certainly still have oversubscription of the single queue and drops, but
> changing the weight should have no effect on that scenario at all (while
> changing the q-limit certainly could).
>
>
> 2 cents,
> Tim

I just ran across an older thread where someone was having the same
problem. In his case, he had a 1-gig source and a 1-gig receiver on
the same switch with no output drops. He moved the receiver to another
switch that was connected to the first switch via a 10-gig link. That
resulted in output drops toward the receiver, apparently because of
the difference in serialization delay on the second switch: a packet
arrives on the 10-gig link much faster than it can be sent out the
1-gig link, so the buffers were filling with bursty traffic at low
apparent traffic rates.
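To put rough numbers on that serialization-delay effect, here is a small
back-of-the-envelope sketch in Python. The buffer size is an assumed,
illustrative value rather than a measured 6748 figure; the point is only that
a sub-millisecond burst arriving at 10-gig speed can overflow a 1-gig egress
queue while the averaged counters still look nearly idle.

ingress_bps = 10e9          # arrival rate during a burst (10-gig uplink)
egress_bps = 1e9            # drain rate (1-gig link toward the receiver)
queue_bytes = 500_000       # assumed per-queue buffer, for illustration only

fill_bps = ingress_bps - egress_bps           # net rate at which the queue grows
time_to_fill = queue_bytes * 8 / fill_bps     # seconds until tail drop begins

print(f"queue fills in {time_to_fill * 1000:.2f} ms")                   # ~0.44 ms
burst_bytes = ingress_bps / 8 * time_to_fill
print(f"a burst of only ~{burst_bytes / 1e6:.2f} MB starts dropping")   # ~0.56 MB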

This is very interesting stuff. Just a little complicated.  :)


Re: [c-nsp] WRR Confusion on 6748 blades

2012-06-27 Thread Tim Stevenson

At 10:53 AM 6/27/2012, Peter Rathlev pronounced:

On Wed, 2012-06-27 at 10:46 -0700, Tim Stevenson wrote:
> Unfortunately, there are no 'absolute' per queue counters, only per
> queue drop counters.

Any chance of that ever showing up on the Cat6500 platform? :-D


Not on 67xx cards. I'm not sure whether the 69xx cards have capable
hardware; I've been off the c6k platform for quite a while.


Tim



Or as my lolcat would say: "i can haz absolute counters plz kthxby"

--
Peter





Tim Stevenson, tstev...@cisco.com
Routing & Switching CCIE #5561
Distinguished Technical Marketing Engineer, Cisco Nexus 7000
Cisco - http://www.cisco.com
IP Phone: 408-526-6759





Re: [c-nsp] WRR Confusion on 6748 blades

2012-06-27 Thread Peter Rathlev
On Wed, 2012-06-27 at 10:46 -0700, Tim Stevenson wrote:
> Unfortunately, there are no 'absolute' per queue counters, only per 
> queue drop counters.

Any chance of that ever showing up on the Cat6500 platform? :-D

Or as my lolcat would say: "i can haz absolute counters plz kthxby"

-- 
Peter




Re: [c-nsp] WRR Confusion on 6748 blades

2012-06-27 Thread Tim Stevenson

Hi John, please see inline below:

At 10:05 AM 6/27/2012, John Neiberger pronounced:
<...snip...>

> What this should be doing is just causing us to service the queue more
> frequently. That could certainly reduce/eliminate drops in the event of
> congestion, but only if there is traffic in the other queues that is also
> contending for the bandwidth.
>
> In other words, if there is only one active queue (ie only one queue has
> traffic in it), then it can & should get full unrestricted access to the
> entire link bandwidth. Can you confirm whether there's traffic in the other
> queues?
>

I'm not certain whether or not we have traffic in the other queues. In
nearly all cases, the output drops are all in one queue with zero in
the other queues. That seems to indicate that either all of our
traffic is in one queue or there just isn't a lot of traffic in the other
queues.



Unfortunately, there are no 'absolute' per queue counters, only per 
queue drop counters. So no easy way to determine if other queues are 
being utilized unless you just 'know' (based on your classification 
policies and known application mix) or those queues overflow & drop.




>
>
<...snip...>



> This suggests to me that there is traffic in other queues contending for the
> available bandwidth, and that there's periodically instantaneous congestion.
> Alternatively you could try sizing this queue bigger and using the original
> bandwidth ratio. Or a combination of those two (tweaking both bandwidth &
> queue-limit).
>
> Is there some issue with changing the bandwidth ratio on this queue (ie, are
> you seeing collateral damage)? Else, seems like you've solved the problem
> already ;)

Nope, we don't have a problem with it. That's what we've been doing.
We haven't really been adjusting the queue limit ratios, though. In
most cases, we were just changing the bandwidth ratio weights. I'm
looking at an interface right now where the 30-second weighted traffic
rate has never gone above around 150 Mbps but I'm still seeing OQDs in
one of the queues only. How do you think we should be interpreting
that?




In my opinion, it indicates that:
1. there is traffic in the other queues contending for the link bandwidth
2. there is instantaneous oversubscription that causes the problem 
queue to fill as it's not being serviced frequently enough and/or is 
inadequately sized
3. the other queues are sized/weighted appropriately to handle the 
amount of traffic that maps to them (ie, even under congestion 
scenarios, there is adequate buffer to hold enough packets to avoid drops)


If #1 was not true, then I don't see how changing the bandwidth ratio 
would make any difference at all - if there is no traffic in the 
other queues, then the single remaining active queue would get full 
unrestricted access to the full bandwidth of the link and no queuing 
would be necessary in the first place.


Supposing there is no traffic in the other queues - in that case, you 
could certainly still have oversubscription of the single queue and 
drops, but changing the weight should have no effect on that scenario 
at all (while changing the q-limit certainly could).
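The dependence on contention in #1 is easy to see in a toy model. The Python
sketch below is purely illustrative (the weights, burst sizes, and queue limit
are made-up numbers, and it is not a model of the real 1p3q8t scheduler): with
traffic present in the second queue, raising the first queue's weight cuts its
drops dramatically; with the second queue idle, the weight makes essentially no
difference.

import random

def simulate(weights, burst_prob, queue_limit, ticks=200_000, seed=1):
    # Toy weighted-round-robin model. Each tick the link transmits one packet,
    # chosen among the non-empty queues with probability proportional to weight.
    # Each queue independently receives a 3-packet burst with probability
    # burst_prob[i] per tick and tail-drops anything beyond queue_limit packets.
    rng = random.Random(seed)
    occupancy = [0] * len(weights)
    drops = [0] * len(weights)
    for _ in range(ticks):
        for i, p in enumerate(burst_prob):                 # arrivals
            if rng.random() < p:
                space = queue_limit - occupancy[i]
                occupancy[i] += min(3, space)
                drops[i] += max(0, 3 - space)
        active = [i for i, q in enumerate(occupancy) if q > 0]
        if active:                                         # weighted service
            chosen = rng.choices(active, weights=[weights[i] for i in active])[0]
            occupancy[chosen] -= 1
    return drops

print(simulate(weights=[5, 95], burst_prob=[0.25, 0.25], queue_limit=20))
print(simulate(weights=[50, 50], burst_prob=[0.25, 0.25], queue_limit=20))
print(simulate(weights=[5, 95], burst_prob=[0.25, 0.00], queue_limit=20))  # queue 1 idle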



2 cents,
Tim






>
> Hope that helps,
> Tim

It helps a lot! Thanks!

John





Tim Stevenson, tstev...@cisco.com
Routing & Switching CCIE #5561
Distinguished Technical Marketing Engineer, Cisco Nexus 7000
Cisco - http://www.cisco.com
IP Phone: 408-526-6759





Re: [c-nsp] WRR Confusion on 6748 blades

2012-06-27 Thread John Neiberger
> queue-limit and bandwidth values (ratios/weights) are *different* things.
>
> The queue-limit physically sizes the queue. It says how much of the total
> physical buffer on the port is set aside exclusively for each class (where
> class is based on DSCP or COS). Traffic from other classes can NEVER get
> access to the buffer set aside for another class, ie, there could be plenty
> of available buffer in other queues even as you're dropping traffic in one
> of the queues.
>
> The bandwidth ratios, on the other hand, determine how frequently each of
> those queues is serviced, ie, how often the scheduler will dequeue/transmit
> a frame from the queue. If there is nothing sitting in one queue, other
> queues can get access to that bandwidth, ie, "bandwidth" is not a hard
> limit, you can think of it as a minimum guarantee when there is
> congestion/contention.
>

That part I think I understand. Mostly.  :)  When I say bandwidth in
this context, I'm referring to the bandwidth ratio weight.

>> are fairly hard limits. That is in line with what we
>> were experiencing because we were seeing output queue drops when the
>> interface was not fully utilized. Increasing the queue bandwidth got
>> rid of the output queue drops.
>
>
>
> What this should be doing is just causing us to service the queue more
> frequently. That could certainly reduce/eliminate drops in the event of
> congestion, but only if there is traffic in the other queues that is also
> contending for the bandwidth.
>
> In other words, if there is only one active queue (ie only one queue has
> traffic in it), then it can & should get full unrestricted access to the
> entire link bandwidth. Can you confirm whether there's traffic in the other
> queues?
>

I'm not certain whether or not we have traffic in the other queues. In
nearly all cases, the output drops are all in one queue with zero in
the other queues. That seems to indicate that either all of our
traffic is in one queue or there just isn't a lot of traffic in the other
queues.

>
>
>> For one particular application
>> traversing this link, that resulted in a file transfer rate increase
>> from 2.5 MB/s to 25 MB/s. That's a really huge difference and all we
>> did was increase the allocated queue bandwidth. At no point was that
>> link overutilized.
>
>
>
> We frequently see 'microburst' situations where the avg rate measured over
> 30sec etc is well under line rate, but at some instantaneous moment there is a
> burst that exceeds line rate and can cause drops if the queue is not deep
> enough. Having a low bandwidth ratio, with traffic present in other queues,
> is another form of the queue not being deep enough, ie, the queue may have a
> lot of space but if packets are not dequeued frequently enough that queue
> can still fill & drop.
>
>
>
>> In fact, during our testing of that particular
>> application, the link output never went above 350 Mbps. We used very
>> large files so that the transfer would take a while and we'd get a
>> good feel for what was happening. Doing nothing but increasing the
>> queue bandwidth fixed the problem there and has fixed the same sort of
>> issue elsewhere.
>
>
> This suggests to me that there is traffic in other queues contending for the
> available bandwidth, and that there's periodically instantaneous congestion.
> Alternatively you could try sizing this queue bigger and using the original
> bandwidth ratio. Or a combination of those two (tweaking both bandwidth &
> queue-limit).
>
> Is there some issue with changing the bandwidth ratio on this queue (ie, are
> you seeing collateral damage)? Else, seems like you've solved the problem
> already ;)

Nope, we don't have a problem with it. That's what we've been doing.
We haven't really been adjusting the queue limit ratios, though. In
most cases, we were just changing the bandwidth ratio weights. I'm
looking at an interface right now where the 30-second weighted traffic
rate has never gone above around 150 Mbps but I'm still seeing OQDs in
one of the queues only. How do you think we should be interpreting
that?

>
> Hope that helps,
> Tim

It helps a lot! Thanks!

John


Re: [c-nsp] WRR Confusion on 6748 blades

2012-06-27 Thread Tim Stevenson

At 09:20 AM 6/27/2012, Phil Mayers pronounced:

note that queues don't have bandwidth, they have size and weight.


Yes, I've always disliked this term, "bandwidth" - I think "weight" 
would have been better, but that's water under the bridge.


Tim



Tim Stevenson, tstev...@cisco.com
Routing & Switching CCIE #5561
Distinguished Technical Marketing Engineer, Cisco Nexus 7000
Cisco - http://www.cisco.com
IP Phone: 408-526-6759





Re: [c-nsp] WRR Confusion on 6748 blades

2012-06-27 Thread Tim Stevenson

Hi John, please see inline below:

At 08:58 AM 6/27/2012, John Neiberger pronounced:

On Wed, Jun 27, 2012 at 8:24 AM, Janez Novak  wrote:
> 6748 can't do shaping. Would love to have them do that. So you must be
> experiencing drops somewhere else and not from WRR BW settings or WRED
> settings. They both kick in when congestion is happening (queues are
> filling up). For example, the linecard is oversubscribed, etc.
>
> Look at second bullet
> 
(http://www.cisco.com/en/US/docs/routers/7600/ios/12.2SR/configuration/guide/qos.html#wp1728810).

>
> Kind regards,
> Bostjan

This is very confusing and I'm getting a lot of conflicting
information. I've been told by three Cisco engineers that these queue
bandwidth limits



queue-limit and bandwidth values (ratios/weights) are *different* things.

The queue-limit physically sizes the queue. It says how much of the 
total physical buffer on the port is set aside exclusively for each 
class (where class is based on DSCP or COS). Traffic from other 
classes can NEVER get access to the buffer set aside for another 
class, ie, there could be plenty of available buffer in other queues 
even as you're dropping traffic in one of the queues.


The bandwidth ratios, on the other hand, determine how frequently 
each of those queues is serviced, ie, how often the scheduler will 
dequeue/transmit a frame from the queue. If there is nothing sitting 
in one queue, other queues can get access to that bandwidth, ie, 
"bandwidth" is not a hard limit, you can think of it as a minimum 
guarantee when there is congestion/contention.
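A minimal sketch of that carved-buffer behaviour, in generic Python rather
than anything platform-specific (the sizes are invented purely for
illustration): once a queue hits its own limit it drops, even though the rest
of the port buffer is sitting idle.

class CarvedPort:
    def __init__(self, queue_limits):
        self.limits = queue_limits              # bytes reserved per queue
        self.depth = [0] * len(queue_limits)    # current occupancy per queue
        self.drops = [0] * len(queue_limits)

    def enqueue(self, queue, size):
        if self.depth[queue] + size > self.limits[queue]:
            self.drops[queue] += 1              # no borrowing from other queues
        else:
            self.depth[queue] += size

port = CarvedPort(queue_limits=[100_000, 300_000, 600_000])
for _ in range(200):
    port.enqueue(0, 1500)                       # a burst lands entirely in queue 0
print(port.depth, port.drops)                   # queue 0 drops; queues 1-2 stay empty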




are fairly hard limits. That is in line with what we
were experiencing because we were seeing output queue drops when the
interface was not fully utilized. Increasing the queue bandwidth got
rid of the output queue drops.



What this should be doing is just causing us to service the queue 
more frequently. That could certainly reduce/eliminate drops in the 
event of congestion, but only if there is traffic in the other queues 
that is also contending for the bandwidth.


In other words, if there is only one active queue (ie only one queue 
has traffic in it), then it can & should get full unrestricted access 
to the entire link bandwidth. Can you confirm whether there's traffic 
in the other queues?




For one particular application
traversing this link, that resulted in a file transfer rate increase
from 2.5 MB/s to 25 MB/s. That's a really huge difference and all we
did was increase the allocated queue bandwidth. At no point was that
link overutilized.



We frequently see 'microburst' situations where the avg rate measured 
over 30sec etc is well under line rate, but at some instantaneous moment 
there is a burst that exceeds line rate and can cause drops if the 
queue is not deep enough. Having a low bandwidth ratio, with traffic 
present in other queues, is another form of the queue not being deep 
enough, ie, the queue may have a lot of space but if packets are not 
dequeued frequently enough that queue can still fill & drop.
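The arithmetic behind that is short enough to write down. The numbers below
are invented for illustration, and the model assumes the worst case where the
other queues stay busy so the queue in question only ever gets its weight share
of the link:

link_bps = 1e9
weights = [5, 25, 70]                    # example WRR weights (not a recommendation)
queue_bytes = 200_000                    # assumed buffer carved out for queue 0

share = weights[0] / sum(weights)        # 5% -> ~50 Mb/s guaranteed drain
drain_bps = link_bps * share
burst_bps = 300e6                        # burst arriving into queue 0 at 300 Mb/s

fill_seconds = queue_bytes * 8 / (burst_bps - drain_bps)
print(f"drain ~{drain_bps / 1e6:.0f} Mb/s, queue full after {fill_seconds * 1000:.1f} ms")

So a burst at barely a third of line rate can fill a small, lightly-weighted
queue in a handful of milliseconds, which never shows up in a 30-second
average.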




In fact, during our testing of that particular
application, the link output never went above 350 Mbps. We used very
large files so that the transfer would take a while and we'd get a
good feel for what was happening. Doing nothing but increasing the
queue bandwidth fixed the problem there and has fixed the same sort of
issue elsewhere.


This suggests to me that there is traffic in other queues contending 
for the available bandwidth, and that there's periodically 
instantaneous congestion. Alternatively you could try sizing this 
queue bigger and using the original bandwidth ratio. Or a combination 
of those two (tweaking both bandwidth & queue-limit).


Is there some issue with changing the bandwidth ratio on this queue 
(ie, are you seeing collateral damage)? Else, seems like you've 
solved the problem already ;)


Hope that helps,
Tim





I'm still researching this and trying to get to the bottom of it. I
think we're missing something important that would make this all make
more sense. I appreciate everyone's help!

John





Tim Stevenson, tstev...@cisco.com
Routing & Switching CCIE #5561
Distinguished Technical Marketing Engineer, Cisco Nexus 7000
Cisco - http://www.cisco.com
IP Phone: 408-526-6759





Re: [c-nsp] WRR Confusion on 6748 blades

2012-06-27 Thread John Neiberger
On Wed, Jun 27, 2012 at 10:20 AM, Saku Ytti  wrote:
> On (2012-06-27 12:11 -0400), Chris Evans wrote:
>
>> If you don't need QoS features, disable it and you will have the full
>> interface buffer for any traffic. If you do need QoS perhaps remap your
>
> Agreed, no reason to run what you don't need. I view CoPP as a mandatory feature
> for any node with an IP address reachable from the Internet, and CoPP depends on
> 'mls qos'.
> If you do enable MLS QoS, you might want to map all traffic to fewer
> classes, maybe just 2 or even 1; this way you can allocate more buffers
> instead of dividing them evenly among the maximum number of classes the card
> supports.
>
> --
>  ++ytti

We definitely need CoPP, so we can't simply disable QoS. On the devices
that don't otherwise need it, I think we should re-map the classes to one
queue and then tune it accordingly.

This is all fantastic information. I've never had to deal with
queueing at this level before, so much of this is new to me. I
appreciate everyone's help!

John



Re: [c-nsp] WRR Confusion on 6748 blades

2012-06-27 Thread Phil Mayers

On 27/06/12 16:58, John Neiberger wrote:


I'm still researching this and trying to get to the bottom of it. I
think we're missing something important that would make this all make
more sense. I appreciate everyone's help!


Queueing on this platform is complex.

Google "qos srnd" and read the sections on 6500 carefully, if you 
haven't already.


In particular, note that queues don't have bandwidth, they have size and 
weight. The actual rate at which packets leave a queue is a weighted 
function of arrival rate at ALL queues. A queue can absorb a burst in 
excess of its emptying rate, up to the queue size, with a drop threshold (if 
RED is enabled) controlled by queue size & CoS.


If you are seeing a queue dropping packets, and the offered load into 
that queue is less than egress link speed, then some OTHER queue must 
have a weight AND OFFERED LOAD that is causing the dropped queue to be 
under-serviced.


The 6748 does have DWRR, so you shouldn't be suffering from starvation.

At this point, a "sh queueing int ..." on the egress port would help.

Are you running a "default" QoS config? Are you trusting CoS/DSCP or not?


Re: [c-nsp] WRR Confusion on 6748 blades

2012-06-27 Thread Saku Ytti
On (2012-06-27 12:11 -0400), Chris Evans wrote:

> If you don't need QoS features, disable it and you will have the full
> interface buffer for any traffic. If you do need QoS perhaps remap your

Agreed, no reason to run what you don't need. I view CoPP as a mandatory feature
for any node with an IP address reachable from the Internet, and CoPP depends on
'mls qos'.
If you do enable MLS QoS, you might want to map all traffic to fewer
classes, maybe just 2 or even 1; this way you can allocate more buffers
instead of dividing them evenly among the maximum number of classes the card
supports.
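The buffer side of that trade-off is easy to sketch. The total per-port figure
below is an assumed round number (roughly consistent with the ~583 KB-per-queue
figure quoted elsewhere in this thread, but not something verified here); the
point is just that fewer classes means more burst absorption per remaining
queue:

port_buffer_bytes = 1_170_000            # assumed total egress buffer for the port
link_bps = 1e9

for n_queues in (4, 2, 1):
    per_queue = port_buffer_bytes / n_queues         # naive even split
    absorb_ms = per_queue * 8 / link_bps * 1000      # line-rate burst absorbed before drop
    print(f"{n_queues} queue(s): {per_queue / 1000:.0f} KB each, ~{absorb_ms:.1f} ms of burst")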

-- 
  ++ytti


Re: [c-nsp] WRR Confusion on 6748 blades

2012-06-27 Thread Phil Mayers

On 27/06/12 17:11, Chris Evans wrote:


If you don't need QoS features, disable it and you will have the full
interface buffer for any traffic. If you do need QoS perhaps remap your
queues to reduce the number of queues that will be in contention for
bandwidth. In my experience QoS on the 6500 has always caused more issues
than it's solved due to its limited interface queuing capabilities.


Note that CoPP on this platform requires QoS.

I agree that remapping into fewer (one?) queues may be the solution here.


Re: [c-nsp] WRR Confusion on 6748 blades

2012-06-27 Thread Chris Evans
This is where I ask the question: do you need QoS and its queues at all?
At my old employer we never enabled QoS on our 6500s in the data
centers because of this buffer carving issue. When you disable QoS on the
6500 platform it lets the DSCP/802.1p bits pass, which we were fine with.
We never wanted to do tagging for applications; it was always do it
yourself or it's not getting done. Anytime we enabled QoS we ran into issues
such as you are having.

If you don't need QoS features, disable it and you will have the full
interface buffer for any traffic. If you do need QoS perhaps remap your
queues to reduce the number of queues that will be in contention for
bandwidth. In my experience QoS on the 6500 has always caused more issues
than it's solved due to its limited interface queuing capabilities.

On Wed, Jun 27, 2012 at 12:01 PM, John Neiberger wrote:

> On Wed, Jun 27, 2012 at 9:58 AM, John Neiberger 
> wrote:
> > On Wed, Jun 27, 2012 at 8:24 AM, Janez Novak 
> wrote:
> >> 6748 can't do shaping. Would love to have them do that. So you must be
> >> experiencing drops somewhere else and not from WRR BW settings or WRED
> >> settings. They both kick in when congestion is happening (queues are
> >> filling up). For example, the linecard is oversubscribed, etc.
> >>
> >> Look at second bullet
> >> (
> http://www.cisco.com/en/US/docs/routers/7600/ios/12.2SR/configuration/guide/qos.html#wp1728810
> ).
> >>
> >> Kind regards,
> >> Bostjan
> >
> > This is very confusing and I'm getting a lot of conflicting
> > information. I've been told by three Cisco engineers that these queue
> > bandwidth limits are fairly hard limits. That is in line with what we
> > were experiencing because we were seeing output queue drops when the
> > interface was not fully utilized. Increasing the queue bandwidth got
> > rid of the output queue drops. For one particular application
> > traversing this link, that resulted in a file transfer rate increase
> > from 2.5 MB/s to 25 MB/s. That's a really huge difference and all we
> > did was increase the allocated queue bandwidth. At no point was that
> > link overutilized. In fact, during our testing of that particular
> > application, the link output never went above 350 Mbps. We used very
> > large files so that the transfer would take a while and we'd get a
> > good feel for what was happening. Doing nothing but increasing the
> > queue bandwidth fixed the problem there and has fixed the same sort of
> > issue elsewhere.
> >
> > I'm still researching this and trying to get to the bottom of it. I
> > think we're missing something important that would make this all make
> > more sense. I appreciate everyone's help!
> >
> > John
>
> Also, these 6748 linecards are 1p3q8t. According to that doc these use
> DWRR. Does the second bullet apply to DWRR, as well? I'm not quite
> sure of the differences.
>
> Thanks again,
> John
>


Re: [c-nsp] WRR Confusion on 6748 blades

2012-06-27 Thread John Neiberger
On Wed, Jun 27, 2012 at 9:58 AM, John Neiberger  wrote:
> On Wed, Jun 27, 2012 at 8:24 AM, Janez Novak  wrote:
>> 6748 can't do shaping. Would love to have them do that. So you must be
>> experiencing drops somewhere else and not from WRR BW settings or WRED
>> settings. They both kick in when congestion is happening (queues are
>> filling up). For example, the linecard is oversubscribed, etc.
>>
>> Look at second bullet
>> (http://www.cisco.com/en/US/docs/routers/7600/ios/12.2SR/configuration/guide/qos.html#wp1728810).
>>
>> Kind regards,
>> Bostjan
>
> This is very confusing and I'm getting a lot of conflicting
> information. I've been told by three Cisco engineers that these queue
> bandwidth limits are fairly hard limits. That is in line with what we
> were experiencing because we were seeing output queue drops when the
> interface was not fully utilized. Increasing the queue bandwidth got
> rid of the output queue drops. For one particular application
> traversing this link, that resulted in a file transfer rate increase
> from 2.5 MB/s to 25 MB/s. That's a really huge difference and all we
> did was increase the allocated queue bandwidth. At no point was that
> link overutilized. In fact, during our testing of that particular
> application, the link output never went above 350 Mbps. We used very
> large files so that the transfer would take a while and we'd get a
> good feel for what was happening. Doing nothing but increasing the
> queue bandwidth fixed the problem there and has fixed the same sort of
> issue elsewhere.
>
> I'm still researching this and trying to get to the bottom of it. I
> think we're missing something important that would make this all make
> more sense. I appreciate everyone's help!
>
> John

Also, these 6748 linecards are 1p3q8t. According to that doc these use
DWRR. Does the second bullet apply to DWRR, as well? I'm not quite
sure of the differences.

Thanks again,
John


Re: [c-nsp] WRR Confusion on 6748 blades

2012-06-27 Thread John Neiberger
On Wed, Jun 27, 2012 at 8:24 AM, Janez Novak  wrote:
> 6748 can't do shaping. Would love to have them do that. So you must be
> experiencing drops somewhere else and not from WRR BW settings or WRED
> settings. They both kick in when congestion is happening (queues are
> filling up). For example, the linecard is oversubscribed, etc.
>
> Look at second bullet
> (http://www.cisco.com/en/US/docs/routers/7600/ios/12.2SR/configuration/guide/qos.html#wp1728810).
>
> Kind regards,
> Bostjan

This is very confusing and I'm getting a lot of conflicting
information. I've been told by three Cisco engineers that these queue
bandwidth limits are fairly hard limits. That is in line with what we
were experiencing because we were seeing output queue drops when the
interface was not fully utilized. Increasing the queue bandwidth got
rid of the output queue drops. For one particular application
traversing this link, that resulted in a file transfer rate increase
from 2.5 MB/s to 25 MB/s. That's a really huge difference and all we
did was increase the allocated queue bandwidth. At no point was that
link overutilized. In fact, during our testing of that particular
application, the link output never went above 350 Mbps. We used very
large files so that the transfer would take a while and we'd get a
good feel for what was happening. Doing nothing but increasing the
queue bandwidth fixed the problem there and has fixed the same sort of
issue elsewhere.

I'm still researching this and trying to get to the bottom of it. I
think we're missing something important that would make this all make
more sense. I appreciate everyone's help!

John


Re: [c-nsp] WRR Confusion on 6748 blades

2012-06-27 Thread Janez Novak
6748 can't do shaping. Would love to have them do that. So you must be
experiencing drops somewhere else and not from WRR BW settings or WRED
settings. They both kick in when congestion is happening (queues are
filling up). For example, the linecard is oversubscribed, etc.

Look at second bullet
(http://www.cisco.com/en/US/docs/routers/7600/ios/12.2SR/configuration/guide/qos.html#wp1728810).

Kind regards,
Bostjan

On Tue, Jun 26, 2012 at 10:28 PM, Chris Evans  wrote:
> TAC is right. This is a downfall of Ethernet switching QoS. The buffers are
> carved up for the queues. My advice is to disable QoS altogether or remap
> all traffic and buffers back to one queue.
> On Jun 26, 2012 4:22 PM, "John Neiberger"  wrote:
>
>> I'm getting conflicting information about how WRR scheduling and
>> queueing works on 6748 blades. These blades have three regular queues
>> and one priority queue. We've been told by two Cisco TAC engineers
>> that if one queue is full, packets will start being dropped even if
>> you have plenty of link bandwidth available. Our experience over the
>> past few days dealing with related issues seems to bear this out. If a
>> queue doesn't have enough bandwidth allotted to it, bad things happen
>> even when the link has plenty of room left over.
>>
>> However, someone else is telling me that traffic should be able to
>> burst up to the link speed as long as the other queues are not full.
>> Our experience seems to support what we were told by Cisco, but we may
>> just be looking at this the wrong way. It's possible that the queue
>> only seems to be policed, but maybe most of the drops are from RED.
>> I'm just not sure now.
>>
>> Can anyone help clear this up?
>>
>> Thanks!
>> John
>> ___
>> cisco-nsp mailing list  cisco-nsp@puck.nether.net
>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>> archive at http://puck.nether.net/pipermail/cisco-nsp/
>>
> ___
> cisco-nsp mailing list  cisco-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/



Re: [c-nsp] WRR Confusion on 6748 blades

2012-06-26 Thread Peter Rathlev
On Tue, 2012-06-26 at 14:16 -0600, John Neiberger wrote:
> I'm getting conflicting information about how WRR scheduling and
> queueing works on 6748 blades. These blades have three regular queues
> and one priority queue. We've been told by two Cisco TAC engineers
> that if one queue is full, packets will start being dropped even if
> you have plenty of link bandwidth available.

That is correct: if the queue is full, packets are dropped. The question
then is: why does the queue end up full if there's plenty of bandwidth
available?

> Our experience over the past few days dealing with related issues
> seems to bear this out. If a queue doesn't have enough bandwidth
> allotted to it, bad things happen even when the link has plenty of
> room left over.

Can you share the configuration from the interface in question together
with the output from "show interface GiX/Y" and "show queueing interface
GiX/Y"? And maybe "show flowcontrol interface GiX/Y" if you're using
flowcontrol.
> 
> However, someone else is telling me that traffic should be able to
> burst up to the link speed as long as the other queues are not full.

Correct. Keep in mind that queueing and bandwidth are two different
things working together. Packets are put in queues and queues are served
in a weighted round-robin fashion. If packets are enqueued faster than this
queue can transmit them, it starts to drop. As
long as there's available bandwidth all the WRR queues should be able to
send what they have.

> Our experience seems to support what we were told by Cisco, but we may
> just be looking at this the wrong way. It's possible that the queue
> only seems to be policed, but maybe most of the drops are from RED.
> I'm just not sure now.

RED (which is enabled by default) would introduce drops faster than
without. This might not be the best idea for non-core interfaces. If
your traffic is mostly BE (and thus hitting queue 1 threshold 1) you
start RED-dropping at 40% and tail-dropping at 70% of the queue buffer
space. And queue 1 has 50% of the interface buffers, which should be
583KB [0]. If my back-of-the-envelope calculation is right that's ~3.3ms
queuing for BE traffic (q1t1).

[0]: 
http://www.cisco.com/en/US/prod/collateral/switches/ps5718/ps708/prod_white_paper09186a0080131086.html
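For anyone who wants to re-run those numbers, here is the same back-of-the-envelope
calculation as a few lines of Python, using only the figures quoted above (the
583 KB and the 40%/70% thresholds are taken from the text, not re-measured):

queue1_bytes = 583_000        # ~50% of the interface buffer, per [0]
link_bps = 1e9                # gigabit egress

red_start = 0.40 * queue1_bytes * 8 / link_bps    # RED begins dropping
tail_drop = 0.70 * queue1_bytes * 8 / link_bps    # hard tail drop

print(f"RED kicks in after ~{red_start * 1000:.1f} ms of queuing")   # ~1.9 ms
print(f"tail drop after ~{tail_drop * 1000:.1f} ms of queuing")      # ~3.3 ms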

-- 
Peter




Re: [c-nsp] WRR Confusion on 6748 blades

2012-06-26 Thread Chris Evans
TAC is right. This is a downfall of Ethernet switching QoS. The buffers are
carved up for the queues. My advice is to disable QoS altogether or remap
all traffic and buffers back to one queue.
On Jun 26, 2012 4:22 PM, "John Neiberger"  wrote:

> I'm getting conflicting information about how WRR scheduling and
> queueing works on 6748 blades. These blades have three regular queues
> and one priority queue. We've been told by two Cisco TAC engineers
> that if one queue is full, packets will start being dropped even if
> you have plenty of link bandwidth available. Our experience over the
> past few days dealing with related issues seems to bear this out. If a
> queue doesn't have enough bandwidth allotted to it, bad things happen
> even when the link has plenty of room left over.
>
> However, someone else is telling me that traffic should be able to
> burst up to the link speed as long as the other queues are not full.
> Our experience seems to support what we were told by Cisco, but we may
> just be looking at this the wrong way. It's possible that the queue
> only seems to be policed, but maybe most of the drops are from RED.
> I'm just not sure now.
>
> Can anyone help clear this up?
>
> Thanks!
> John
> ___
> cisco-nsp mailing list  cisco-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/
>