Thanks for the reply. Comments in line.

On 1/22/2024 3:27 AM, François Michel wrote:
Hi Christian,

Great to hear from you, especially given you expertise in the topic!
Thank you for all your comments. See my answers below.

I am adding Rachel to the loop, that was interested in progressing in the draft together and ensure the design can handle their use-case as well.


Le 19/01/24 à 07:29, Christian Huitema a écrit :
François, Olivier,

I just spent some time studying your draft on QUIC FEC. I like the idea of having an FEC framework independent from the algorithm used to actually compute the FEC data and repair packets. Your draft solves a number of practical problems, such as how to notify peers when FEC helps receive a frame from an otherwise lost packet, or how to identify "symblos" independently of packet numbers using the symbol identifier frame (SID).

The draft is obviously a work in progress.

Yes, the aim of this draft and current papers under submission is to spark the interest on the topic again. I've been working on FEC for years now and was part of the earlier QUIC-FEC work at NWCRG where we already wrote interesting drafts. My intent with this one here is to propose a short and simple specification that people can wrap their head around. We can then make it progress together with quicwg folks instead of proposing a first, exhaustive but complex draft that is difficult to apprehend.


You propose two alternatives for linking frames to a SID. I wish you picked just one, and I prefer your first alternative, in which your SID frames brackets a list of protected frames.

I prefer the first one as well, but I was insure it fits well the design of existing implementations. If people are okay, I'd be more than happy to only keep alternative 1.

  However, I an not quite sure
how this should be parsed. You give an example as:

   | Pkt(6)[STREAM(2, "xyz"),                                    |
   |        SOURCE_SYMBOL(1, { STREAM(8, "def"),                 |
   |                           DATAGRAM("msg") }]                |

In that example, the frame  STREAM(8..) and DATAGRAM() are protected,
while the "STREAM(2)" is not. Fine, but the syntax is described as:

SOURCE_SYMBOL {
   SID (i),
   FEC Protected Payload (..)
}

... and I don't know how to parse that. There is no indication of the length of the "FEC Protected Payload". Do you mean to indicate that the SOURCE_SYMBOL frame extends to the end of the packet, and that all frames following the SID are protected?

Yes, that's right. In the current design, the frame stops at the end of the packet. We can add a length field or a number of protected frames.

OK. It would be simple enough to define packets as:
<non protected frames><SID><protected frames>

You define a framework in which client and server negotiate to use FEC, and also to select a FEC scheme. The syntax of your transport parameter seems a bit restrictive: the client proposes to use FEC and a specific scheme, and the server accepts or refuse. Given the experimental nature of FEC, I expect that we will try several algorithms. It would be nice for the client to propose a list, and for the server to pick one -- or zero, if it does not support any of the proposed values. In fact, I think that you could merge the "enable FEC" parameter that negotiates use of FEC with the "decoder FEC scheme" negotiation.

Agree, we could use the transport parameter to propose a list of FEC schemes, that's indeed how most negotiation mechanisms work. The absence of this parameter indicates FEC is not supported and an empty list would announce that FEC SOURCE_SYMBOL frames are parsed but not used. This might cause problems though as the REPAIR frame format depends on the negotiated scheme.

The classic negotiation is "client proposes many, sender pick one, and that one defines the REPAIR format." The alternative would be to not have a repair format, and let each scheme define its own.

Your draft does not assign identifiers to existing FEC schemes. To facilitate interop tests, I suggest that you define at least one. In fact, I would suggest a very simple one, in which the REPAIR frame identifies a range of SID, and then carries the XOR of all packets in that range.

Agree. I am not a fan of the XOR code as it performs really badly in many scenarios when losses occur in bursts and it might lead implementers to only implement XOR and I would like to avoid that. We could maybe define xor (e.g. "interop_xor") in another draft especially for interop purpose so that it is clear.

Yes, that was my intent too.

The suggestion above brings a discussion of the relative size of the "FEC Protected Payload" and the REPAIR frames. As in the example above, I would expect REPAIR frames to include a small header followed by a combination of the content of several FEC Protected Payload, with that combination being at least as long as the longest FEC Protected Payload in the set. That longest size, by default, can be a full packet payload (per PMTU), minus the length of the SID prefix. But that leave very little room for encoding the prefix of the REPAIR frame, which is likely to require at list the REPAIR frame type (arguably same length as the SOURCE_SYMBOL frame type), and SID identifying the range (same length as the SID parameter of the SOURCE_SYMBOL frame), and an additional parameter indicating the variant of he repair frame according to the selected scheme (arguably the same length as the coding window). Is that the problem that you are discussing in section 4.2.3?

If you refer to the end section 5.3 and not 4.2.3, yes. The REPAIR frame may contain metadata that may increase its overhead. So I see two ways to cope with this problem:

1) Restrict the maximum size of FEC Protected Payload (what we do now in our implems) 2) Make it possible to "stream" REPAIR frames. This is a bit sad as you may need more packets than repair symbols but on average that could work well.

If you see the need to support streaming of repairs, then it would make sense to define how to do that in the generic "framing" document.

Restricting the length of the protected frames feels much simpler, for example not having to worry about receiving only fractions of REPAIR frames. But you would want that to be a function of the PMTU, otherwise you introduce overhead, which is why I was mentioning "maximum overhead of repair"

Should there be some property associated to the FEC scheme, such as the maximum overhead of a REPAIR frame?

If we decide 1) above, yes, that would be really helpful to have the FEC scheme signal the max repair frame overhead, but that may not work with schemes that e.g. explicitly list the IDs of protected symbols.

Make that a "don't care" option, but it requires defining how to split and reassemble repair frames.

(Also, why pad the FEC-protected data at the beginning rather than at the end? Or leave that as a property of the FEC scheme?)

We pad it at the beginning so that padding can be naturally handled at the QUIC layer. The padding can be parsed and handled as classical QUIC padding frames, so the decoded does not have to process the recovered payload before handling it to QUIC.

That does not feel like a very good reason. Padding at the end is "virtual", and does not introduce packet overhead. Yes, this means the final "repaired" frame may contain an arbitrary number of zeroes at the end, but these will be parsed as QUIC padding and ignored by the decoder.

I am not sure that I fully understand how to use the FEC WINDOW frame. You allow it to change, but what if the packet containing that frame is lost? How can the peer know when exactly the use of the new window starts, and which window is associated with a particular SOURCE_SYMBOL or REPAIR frame?

I think our draft is not clear enough about that. The FEC_WINDOW frame announces that maximum amount of symbols that can be stored by the FEC decoder. It has a purpose comparable to QUIC MAX_DATA frames. This prevent the sender to send REPAIR frames that protect more symbols than what the receiver is able to store.

In our view, the actual window protected by a repair symbol should be announced inside the REPAIR frame carrying the repair symbol. Source symbols can be associated with many different windows with different sizes, all being defined by the repair symbols.

If we announce a reduced FEC_WINDOW and if the packet containing it is lost, there will be a period of time where the server may send REPAIR frames that protect more symbols than what the receiver is allowed to store and those REPAIR frames will be useless. In our scenarios, this FEC_WINDOW limit was rarely hit though, as if you send at a limited bitrate (e.g. real-time video), you'll probably protect less packets than the maximum buffer size of the FEC Decoder. Same thing is you protect bulk transfers with mdeium/low BDPs, but this limit will likely be hit with high BDPs or low-memory FEC Decoders.



Reed-Solomon codes are often characterized by two numbers, the length of the coding window and the number of redundant copies -- in our case, the number of REPAIR frames for a given coding window. It seems that in your proposal these two numbers are set arbitrarily by the sender. Should there me some negotiation of maximum values? Or would those maximum values be deduced from the scheme identifier, something like "reed solomon 32 + 8"? Or should the "repair" frame indicate the length of the coding widow over which it operates?

I like the idea of the repair symbols being self-contained. However, I understand that there might be good reasons to define limits, especially on constrained devices, as complex codes (e.g. large reed solomon blocks) might be too heavy CPU-wise.

I have used 40+8 in previous implementations. Yes, there is a CPU cost, but it is smaller than or comparable to the cost of encryption. Some implementations might want to do that.

I am also not sure how the update of the coding window works for a convolutional code

(see my answer in the paragraph below)


One way to understand the coding window is "the number of frames over which a given REPAIR may operate," but we are concerned with correlated losses happening in trains. To protect against that, it is nice to send the repair frames some time after the protected frames, in which case the window would an indication of how long a copy of a given frame has to be kept. This could be expressed as a number of packets, but if multipath is supported we may want to send the repairs on a different path, and then using number of packets is not natural.

In our implem, the sender stores the maximum window size (announced by FEC_WINDOW frames). When generating a repair symbol, the size of the window protected by this repair symbol is set to min(max_window_size, n_symbols_in_flight). That window size is announced as part of the repair symbol. Concerning the receiver, it keeps track of the source symbol with the highest received SID (highest_sid). It keeps in memory the symbols with SID [highest_sid-max_window_size, highest_sid] and can therefore perform decoding for repair symbols protecting windows sitting inside this interval.

I can see a negotiation between sender and receiver. In my mind, there are two critical numbers, both on the receiver side:

* the maximum number of SID that it will remember, because that requires committing memory.

* the maximum number of SID+REPAIR frames that it is willing to combine, i.e., the size of the "repair matrix", because of both CPU and memory. Arguably, that one depends on the scheme.

If we have a negotiation, then I would expect it at the beginning of the connection. The variable window seems like a secondary optimization, as in "we negotiated a maximum of 16+5, but for now i am sending 8+2 because it seems sufficient".

I would rather encode that in the SID frame than in a separate frame, because we have some room in the SID frame (see length of REPAIR frame issue). Also because that removes the need for handling synchronization of FEC_WINDOW, SID and REPAIR.


So using SIDs instead of packet numbers and number of frames seems more natural to me.

Speaking of multipath: I had hopes at some point to be able to define FEC-dedicated paths, even if the FEC-dedicated path is running on the same network path (i.e. like a "virtual path" concept). Let me explain: frames protected by FEC could be sent on packets over the FEC-dedicated path only and frames not protected by FEC would be sent on another path. Both path could actually use the same network path, but this could naturally allow to decouple FEC-protected frames from others and remove the need of an SID frame (and spare space in packets!) if we ensure the packet numbers are sent contiguously, as we could use packet numbers as SIDs. I know contiguous packet numbers is not how QUIC works, and this idea may not be the ideal solution, but I describe it here as it might spark clever ideas from you or other folks reading this thread.

Why specialize paths? You could spray both the SID and the REPAIR over multiple paths. You don't need much there, but you do need a common SID number space between all paths. And you need to handle reordering of out of order SID and REPAIR packets, with a reordering buffer the size of the FEC window.

OK, that's a lot of text. Some of that may be because I did not fully understand your intent. I expect things to get clearer with your next draft, or when we start interop testing of different implementations...


Sorry if the draft is still in a bit of a rough form! We currently concentrate on having implems and papers published (publishing FEC work is a *real* effort, as the ideas behind FEC are old, it is difficult to convince reviewers of the novelty... :-) and that's fine but it takes time)

I will certainly integrate your comments in the next version(s), they are all valuable and totally on-point ! Thank you so much for that.

Waiting to work on that!

Same, after all these years ! Maybe Brisbane is too early to host a hackathon table (especially that I don't know how many folks would be interested), but that might be something we could do for next IETFs. Once I get my work published, I can also release my FEC encoder/decoder lib that has C bindings that could really be valuable for interop as well.

Will look into that, thanks.

-- Christian Huitema

Reply via email to