We discussed this briefly during today's ARCH call, so I wanted to
raise it here as a topic to explore, and possibly to discuss as part
of the ODP Design Summit at BUD17.

Background
----------------

When packets are received via PktIO, they become odp_packet_t objects
that are stored in a packet pool associated either with the PktIO or
with the CoS that the packet is assigned to by the classifier. ODP
pools have a defined capacity assigned at odp_pool_create() time. The
question is: what happens as pools are depleted and packets continue
to arrive?

Strategies
--------------

Historically, packets that had no place to be stored were simply
dropped at the RX interface. Such "tail dropping" is undesirable as it
results in sharp "edges" in responsiveness and is indiscriminate in
how it is applied.

There are two basic strategies for dealing with this situation,
depending on whether or not it is acceptable to drop packets at all.
Since Ethernet was designed to be "lossy", some sort of drop strategy
is acceptable for most protocols. The most commonly employed is
RED [1] and its variants.

Some protocols such as FCoE [2] cannot tolerate losses, and so
Ethernet Flow Control [3] protocols were developed, the main one being
Priority Flow Control (PFC) [4], which extends the Ethernet Pause
frame to enable up to 8 separate flow classes that can be paused
independently.

Both of these mechanisms rely on watermarking. When a pool depletes to
its low watermark, RED-based systems begin discarding packets while
PFC-based systems issue Pause frames to halt incoming traffic classes.
Given the timing involved in ensuring lossless Ethernet, PFC requires
HW support as well as careful tuning of the watermarks to ensure that
there is sufficient buffer space left to absorb the traffic that
arrives while the pause frame travels to the other end of the link and
while the link drains once the other end stops transmitting.
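
As a rough illustration of the watermark tuning described above, the
sketch below estimates how much buffer space has to remain free when
the pause decision is made. The struct, the function name, and the
simplified formula (bytes in flight during the round trip plus the
peer's response time, plus one maximum-size frame in each direction)
are assumptions made for illustration only. A real implementation
would follow the delay budget in the relevant specification and the HW
vendor's guidance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one link; all fields are assumptions. */
typedef struct {
    uint64_t link_bps;         /* link speed in bits per second */
    uint64_t round_trip_ns;    /* propagation delay, both directions */
    uint64_t peer_response_ns; /* time the peer needs to act on the pause */
    uint32_t max_frame_bytes;  /* largest frame either side may send */
} pfc_link_t;

/* Estimate the bytes that can still arrive after we decide to pause. */
static inline uint64_t pfc_headroom_bytes(const pfc_link_t *l)
{
    assert(l != NULL && l->link_bps > 0);

    uint64_t delay_ns = l->round_trip_ns + l->peer_response_ns;

    /* Bytes received at line rate during the delay, rounded up.
     * Integer division by 8 drops any fractional byte per second. */
    uint64_t in_flight =
        ((l->link_bps / 8) * delay_ns + 999999999ULL) / 1000000000ULL;

    /* Our pause frame may wait behind one frame already being sent,
     * and the peer may have just started a frame it must finish. */
    return in_flight + 2ULL * l->max_frame_bytes;
}
```

For example, a 10 Gb/s link with a 1000 ns round trip, a 1000 ns peer
response time, and 1518-byte frames gives 2500 + 3036 = 5536 bytes
under these assumptions, so the low watermark would need to leave at
least that much room in the pool.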

To provide hysteresis, flow control actions, once initiated, continue
until the pool has recovered to a high watermark, signaling that it
is safe to return to normal operation. In the case of RED, multiple
low watermarks can exist, each one triggering the next level of
aggressiveness in the algorithm, until all packets are dropped if the
pool is depleted entirely. PFC uses only a single low watermark, as a
link is either paused or not paused.
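
The hysteresis and the multi-level RED behavior described above can be
sketched as follows. All type and function names here are hypothetical
and are not part of ODP. The RED part is deliberately simplified to a
step function of free buffers; real RED variants typically work from
an averaged queue depth and a drop probability curve.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hysteresis state for one pool. */
typedef struct {
    uint32_t low_wm;  /* start flow control when free buffers <= low_wm */
    uint32_t high_wm; /* stop flow control when free buffers >= high_wm */
    bool     active;  /* true while dropping or pausing */
} wm_state_t;

static inline void wm_init(wm_state_t *s, uint32_t low_wm, uint32_t high_wm)
{
    assert(s != NULL && low_wm < high_wm);
    s->low_wm = low_wm;
    s->high_wm = high_wm;
    s->active = false;
}

/* Feed the current free-buffer count; returns true while flow control
 * should stay on. Between the two watermarks the state is unchanged. */
static inline bool wm_update(wm_state_t *s, uint32_t free_bufs)
{
    if (!s->active && free_bufs <= s->low_wm)
        s->active = true;
    else if (s->active && free_bufs >= s->high_wm)
        s->active = false;
    return s->active;
}

/* One RED level: at or below this many free buffers, drop this share. */
typedef struct {
    uint32_t free_at_or_below;
    uint8_t  drop_pct;
} red_level_t;

/* Returns the drop percentage for the current free-buffer count:
 * the most aggressive level reached, or 100 when the pool is empty. */
static inline uint8_t red_drop_pct(const red_level_t *levels, size_t n,
                                   uint32_t free_bufs)
{
    uint8_t pct = 0;

    if (free_bufs == 0)
        return 100;

    for (size_t i = 0; i < n; i++) {
        if (free_bufs <= levels[i].free_at_or_below &&
            levels[i].drop_pct > pct)
            pct = levels[i].drop_pct;
    }
    return pct;
}
```

With watermarks of 100 (low) and 400 (high), wm_update() turns on at
100 free buffers, stays on at 300, and turns off again only at 400. A
PFC-style user would need only wm_update(), while a RED-style user
would also consult red_drop_pct() while flow control is active.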

ODP Support
------------------

ODP could offer support for drop/pause policies as capabilities at the
PktIO and pool levels. Pools could have APIs that permit watermarks to
be queried or set, and PktIOs could have drop/pause policies
configured that would be triggered by watermark notifications received
from pools they are filling for RX processing. How a pool notifies a
PktIO of watermark levels would be implementation-dependent, since
pools and PktIOs tend to be tightly integrated in most systems and
these notifications are essentially private. Appropriate statistics would
also need to be defined to accompany this support to permit
applications to report on drop/pause activity.
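
To make the proposal a bit more concrete, here is one possible shape
for such support. None of these names exist in ODP today; the ex_
prefix marks every type and function as hypothetical, and the
notification handler merely stands in for whatever private mechanism
an implementation would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RX policies a PktIO could be configured with. */
typedef enum {
    EX_POLICY_TAIL_DROP,
    EX_POLICY_RED,
    EX_POLICY_PFC_PAUSE
} ex_rx_policy_t;

/* Hypothetical notifications a pool raises toward its PktIO. */
typedef enum {
    EX_WM_LOW_REACHED,
    EX_WM_HIGH_REACHED
} ex_wm_event_t;

/* Hypothetical per-pool watermark settings (in free buffers). */
typedef struct {
    uint32_t low_wm;
    uint32_t high_wm;
} ex_pool_wm_t;

/* Hypothetical counters an application could report on. */
typedef struct {
    uint64_t low_wm_events;
    uint64_t pause_requests;
    uint64_t red_activations;
} ex_rx_stats_t;

typedef struct {
    ex_rx_policy_t policy;
    int            flow_control_on;
    ex_rx_stats_t  stats;
} ex_pktio_rx_t;

/* A setter would reject watermarks that leave no hysteresis band. */
static inline int ex_pool_wm_valid(const ex_pool_wm_t *wm)
{
    return wm != NULL && wm->low_wm < wm->high_wm;
}

/* Apply the configured policy when the pool reports a watermark. */
static inline void ex_pktio_on_wm(ex_pktio_rx_t *rx, ex_wm_event_t ev)
{
    assert(rx != NULL);

    if (ev == EX_WM_HIGH_REACHED) {
        rx->flow_control_on = 0; /* pool recovered: normal operation */
        return;
    }

    rx->stats.low_wm_events++;
    switch (rx->policy) {
    case EX_POLICY_PFC_PAUSE:
        rx->flow_control_on = 1;
        rx->stats.pause_requests++;
        break;
    case EX_POLICY_RED:
        rx->flow_control_on = 1;
        rx->stats.red_activations++;
        break;
    case EX_POLICY_TAIL_DROP:
        break; /* nothing to do until the pool is actually empty */
    }
}
```

The split mirrors the description above: watermarks live with the
pool, the policy and statistics live with the PktIO, and only the
notification path between them is left to the implementation.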

Request for Comments
-------------------------------

It would be useful to know what capabilities exist in the various HW
platforms that ODP is being implemented on, as well as any application
use cases for ODP support for enabling and managing these
capabilities. Please respond to this thread and we'll see whether
this topic is worth further technical discussion as part of the BUD17
design summit.

Thanks.

Bill

---
[1] https://en.wikipedia.org/wiki/Random_early_detection
[2] https://en.wikipedia.org/wiki/Fibre_Channel_over_Ethernet
[3] https://en.wikipedia.org/wiki/Ethernet_flow_control
[4] http://www.cisco.com/c/en/us/products/collateral/switches/nexus-7000-series-switches/white_paper_c11-542809.pdf
