We discussed this briefly during today's ARCH call, so I wanted to raise it as a topic to explore here and possibly to discuss at the ODP Design Summit at BUD17.

Background
----------------
When packets are received via PktIO, they become odp_packet_t objects that are stored in a packet pool associated either with the PktIO or with the CoS to which the classifier assigns the packet. ODP pools have a defined capacity assigned at odp_pool_create() time. The question is what happens as pools are depleted while packets continue to arrive.

Strategies
--------------
Historically, packets that had no place to be stored were simply dropped at the RX interface. Such "tail dropping" is undesirable as it results in sharp "edges" in responsiveness and is indiscriminate in how it is applied.

There are two basic strategies for dealing with this situation, depending on whether or not it is acceptable to drop packets at all. Since Ethernet was designed to be a "lossy" protocol, some sort of drop strategy is acceptable for most protocols. The most commonly employed is RED [1] and its variants. Some protocols, such as FCoE [2], cannot tolerate losses, and so Ethernet Flow Control [3] protocols were developed, the main one being Priority Flow Control (PFC) [4], which extends the Ethernet Pause frame to enable up to 8 separate flow classes that can be paused independently.

Both of these mechanisms rely on watermarking. When a pool depletes to its low watermark, RED-based systems begin discarding packets while PFC-based systems issue Pause frames to halt incoming traffic classes. Given the timing involved in ensuring lossless Ethernet, PFC requires HW support as well as careful tuning of the watermarks, so that enough buffer space remains for the Pause frame to be received at the other end of the link and for the link to drain once the other end stops transmitting. To provide hysteresis, flow control actions, once initiated, continue until the pool has recovered to a high watermark, signaling that it is safe to return to normal operation.
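
As a rough illustration of the low/high watermark hysteresis described above, here is a minimal C sketch. It is not an ODP API: wm_state_t, wm_init() and wm_update() are hypothetical names, and it assumes watermarks are expressed as counts of free buffers remaining in the pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types and functions for illustration only; not ODP APIs. */
typedef struct {
    uint32_t low_wm;   /* free-buffer count at or below which action starts */
    uint32_t high_wm;  /* free-buffer count at or above which action stops */
    bool     active;   /* true while the drop/pause action is in effect */
} wm_state_t;

static void wm_init(wm_state_t *wm, uint32_t low_wm, uint32_t high_wm)
{
    /* Hysteresis needs a gap between the two marks. */
    assert(low_wm < high_wm);
    wm->low_wm = low_wm;
    wm->high_wm = high_wm;
    wm->active = false;
}

/*
 * Feed in the pool's current free-buffer count. Returns true while the
 * flow control action (RED dropping or PFC pause) should be in effect.
 */
static bool wm_update(wm_state_t *wm, uint32_t free_bufs)
{
    if (!wm->active && free_bufs <= wm->low_wm)
        wm->active = true;    /* pool depleted to the low watermark */
    else if (wm->active && free_bufs >= wm->high_wm)
        wm->active = false;   /* pool recovered to the high watermark */

    return wm->active;
}
```

With marks of, say, 100 and 400 free buffers, the action would start when free space falls to 100 and would stay in effect at 250, ending only once free space is back at 400. A real implementation would most likely do this in HW or in the pool/PktIO driver rather than in application code.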

In the case of RED, multiple low watermarks can exist, each one triggering the next level of aggressiveness in the algorithm until, if the pool is depleted entirely, all packets are dropped. PFC uses only a single low watermark, as a link is either paused or not paused.

ODP Support
------------------
ODP could offer support for drop/pause policies as capabilities at the PktIO and pool levels. Pools could have APIs that permit watermarks to be queried or set, and PktIOs could have drop/pause policies configured that would be triggered by watermark notifications received from the pools they are filling for RX processing. How a pool notifies a PktIO of watermark levels would be implementation-dependent, since the two tend to be tightly integrated in most systems and these notifications are essentially private.

Appropriate statistics would also need to be defined to accompany this support, so that applications can report on drop/pause activity.

Request for Comments
-------------------------------
It would be useful to know what capabilities exist in the various HW platforms on which ODP is being implemented, as well as any application use cases for ODP support for enabling and managing these capabilities. Please respond to this thread and we'll see whether this topic merits further technical discussion as part of the BUD17 design summit.

Thanks.

Bill

---
[1] https://en.wikipedia.org/wiki/Random_early_detection
[2] https://en.wikipedia.org/wiki/Fibre_Channel_over_Ethernet
[3] https://en.wikipedia.org/wiki/Ethernet_flow_control
[4] http://www.cisco.com/c/en/us/products/collateral/switches/nexus-7000-series-switches/white_paper_c11-542809.pdf