On Mon, Feb 13, 2023 at 7:20 AM Christian Huitema <huit...@huitema.net> wrote:
> This issue, packet number encryption versus hardware acceleration, was
> discussed in quite some depth during the standardization process. The
> current design was adopted with full knowledge that hardware
> acceleration will require some harder work than if numbers were in clear
> text.
>

I did look through the mailing list archives, and I understand that adding this now is non-trivial. However, I didn't find any previous discussion on the mailing lists of negotiating plaintext packet numbers as part of the handshake, and since that would benefit high-performance users, I would like to draft such a proposal.

>
> Boris, you may try to propose an extension or a new version that changes
> this specification, but such changes will have to be negotiated and
> agreed by the client and server for each connection. Very likely, a
> significant number of these clients and servers are going to reject the
> extension. So, even if you do define this extension, your hardware will
> still have to support the existing specification to talk to these
> unmodified clients or servers.
>

Thanks, Christian. A negotiated extension is exactly what I would like to propose. Indeed, both client and server must support it for it to work, and that's perfectly fine for many use cases, such as high performance computing and data centers.

Do note that hardware support for clients and servers that reject this feature is an orthogonal question: NIC-based encryption acceleration is usually combined with fully compliant software stacks that handle the packets the hardware cannot process. For a discussion of software stacks for NIC-based QUIC encryption acceleration, see:

Linux: https://lwn.net/ml/linux-doc/20220817200940.1656747-1-adel.abush...@gmail.com/
Windows: https://github.com/microsoft/quic-offloads

What would be the next step for such a proposal: an Internet-Draft, or a discussion at IETF 116?
I think it makes more sense to start with DTLS1.3, since it seems to require only an extension, whereas QUIC would need a new version that changes an invariant.

> -- Christian Huitema
>
> On 2/10/2023 12:21 AM, Boris Pismenny wrote:
> > Hi Mikkel.
> >
> > On Thu, Feb 9, 2023 at 8:21 PM Mikkel Fahnøe Jørgensen
> > <mikke...@gmail.com <mailto:mikke...@gmail.com>> wrote:
> >
> >     QUIC does allow for creating a custom version of the protocol, but it
> >     should be registered with IANA. It is also possible to use
> >     unregistered version numbers for test purposes in closed
> >     environments, but this is hardly the case for released hardware.
> >     Weakening the security of official protocol versions through
> >     negotiation is likely not a good idea.
> >
> > Thanks for the information. I didn't know it was possible to register
> > custom versions of QUIC/DTLS. Are there any examples of that? What would
> > be the process of proposing one?
> >
> > In any case, it seems a bit extreme to create a custom/new version just
> > to add an option to negotiate slightly different header formatting. For
> > example, DTLS1.2 connection IDs, which are a far more drastic change in
> > my opinion, extend the DTLS header format without a new version.
> >
> >     As to latency of hardware encryption:
> >
> >     I’m not sure what unique pipeline stage means, but clearly some
> >     special attention is required for the header.
> >
> > By 'unique pipeline stage', I mean a piece of logic in silicon
> > dedicated to PNE. Hardware typically consists of a sequence of
> > logical stages that manipulate data in turn. Dedicated logic in
> > silicon costs money and adds some latency.
> >
> >     Assuming we limit the discussion to AES encryption with a 128-bit
> >     block size, PNE works by sampling somewhere within the first AES block.
> >     We can assume there is enough data to sample, because otherwise the
> >     packet is invalid, so any random padding will do.
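The sampling Mikkel describes can be made concrete with QUIC header protection as specified in RFC 9001, Section 5.4: the sample is 16 bytes of ciphertext starting 4 bytes after the start of the packet number field (the packet number is assumed to be 4 bytes long for sampling), and the mask derived from it is XORed into the low bits of the first byte (4 bits for long headers, 5 for short headers) and into the packet number bytes. A packet too short to supply a full sample cannot be protected, which is why senders pad short packets. The Python below is a minimal sketch under stated assumptions: the caller already knows pn_offset, the function names are hypothetical, and stand_in_mask is a keyed BLAKE2b hash standing in for AES-ECB so the sketch runs without a crypto library. It is not the real cipher and not interoperable.

```python
import hashlib

SAMPLE_LEN = 16  # sample length used by the AES- and ChaCha20-based schemes


def stand_in_mask(hp_key: bytes, sample: bytes) -> bytes:
    # NOT AES: a keyed BLAKE2b hash standing in for AES-ECB(hp_key, sample),
    # used only so this sketch runs without a third-party crypto library.
    return hashlib.blake2b(sample, key=hp_key, digest_size=16).digest()


def header_sample(packet: bytearray, pn_offset: int) -> bytes:
    # The packet number is assumed to be 4 bytes long for sampling purposes,
    # so the sample always starts at pn_offset + 4, whatever the real length.
    start = pn_offset + 4
    if start + SAMPLE_LEN > len(packet):
        raise ValueError("packet too short to sample; the sender must pad it")
    return bytes(packet[start:start + SAMPLE_LEN])


def first_byte_mask_bits(first_byte: int) -> int:
    # Long headers (high bit set) protect 4 bits; short headers protect 5.
    return 0x0F if first_byte & 0x80 else 0x1F


def protect(packet: bytearray, pn_offset: int, hp_key: bytes) -> None:
    mask = stand_in_mask(hp_key, header_sample(packet, pn_offset))
    pn_len = (packet[0] & 0x03) + 1  # read the length before it is masked
    packet[0] ^= mask[0] & first_byte_mask_bits(packet[0])
    for i in range(pn_len):
        packet[pn_offset + i] ^= mask[1 + i]


def unprotect(packet: bytearray, pn_offset: int, hp_key: bytes) -> None:
    mask = stand_in_mask(hp_key, header_sample(packet, pn_offset))
    packet[0] ^= mask[0] & first_byte_mask_bits(packet[0])
    pn_len = (packet[0] & 0x03) + 1  # the length is readable only after unmasking
    for i in range(pn_len):
        packet[pn_offset + i] ^= mask[1 + i]
```

Because the sample is ciphertext, a straightforward pipeline cannot compute the mask until the payload bytes at pn_offset + 4 have been encrypted. That data dependency is what forces the two hardware options described in Boris's original message.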
> >
> >     You can precompute the AES encryption of the first block by
> >     encrypting zeroes, so you have that encryption data before the packet
> >     data arrives. You should do that anyway for maximum throughput
> >     unless you are seriously constrained for memory.
> >
> > This optimization improves things, but we've taken it into
> > consideration and the argument still holds. Putting dedicated hardware
> > exclusively for header protection is a tradeoff between performance and
> > cost vs. security, and arguably important tradeoffs should be negotiable.
> >
> >     Potentially you can also precompute the packet number length and
> >     even the packet number itself, at least optimistically.
> >     Once the data arrives, you can immediately retrieve the plaintext
> >     data segment of the first block and sample the data needed for PNE
> >     encryption without delaying the encryption pipeline.
> >     Now you have the AES block and the cleartext mask data, and you
> >     just need to XOR them onto the packet being encrypted. You may need to
> >     double-encrypt the packet number: both with the mask
> >     and with its AES block at its own location, so that you can XOR it
> >     into place once the encryption stage signals the block is available.
> >     This will undo the packet number encryption by the
> >     encryption stage and apply the packet number encryption as intended.
> >
> >     One caveat is that I do not remember how this affects the signature
> >     at the end of the packet, but I assume it does not include the
> >     encrypted version of the packet number; otherwise
> >     that would complicate things.
> >
> >     This will not necessarily work for encryption schemes that do not
> >     rely on XOR for encryption, but you can limit negotiation to schemes
> >     that have hardware support without changing protocol version.
> >
> > Since DTLS1.3 supports only the AES encryption option, it seems like the
> > most common option to tackle.
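Mikkel's precomputation argument rests on AES-GCM encrypting the payload in counter mode: each keystream block depends only on the key and the nonce, and in QUIC the nonce is the IV XORed with the left-padded packet number (RFC 9001, Section 5.3). Once the packet number is known, the keystream can be generated before the payload arrives, and the header-protection sample is simply plaintext XOR keystream. The sketch below assumes that setup. The function names are hypothetical, and keystream_block is a keyed BLAKE2b stand-in for AES, not the real cipher. With packet numbers shorter than 4 bytes, the sample starts 4 - pn_len bytes into the payload and so spills into the second 16-byte block, which is why two blocks are precomputed in the check.

```python
import hashlib

BLOCK = 16       # AES block size in bytes
SAMPLE_LEN = 16  # QUIC header-protection sample length


def make_nonce(iv: bytes, packet_number: int) -> bytes:
    # QUIC AEAD nonce: the packet number, left-padded to the IV length, XORed with the IV.
    pn = packet_number.to_bytes(len(iv), "big")
    return bytes(a ^ b for a, b in zip(iv, pn))


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Stand-in for AES(key, nonce || counter); NOT AES, just a keyed hash so the sketch runs.
    return hashlib.blake2b(nonce + counter.to_bytes(4, "big"), key=key, digest_size=BLOCK).digest()


def precompute_keystream(key: bytes, nonce: bytes, nbytes: int) -> bytes:
    # GCM encrypts the payload starting at counter 2 (counter 1 is used for the tag).
    out = bytearray()
    counter = 2
    while len(out) < nbytes:
        out += keystream_block(key, nonce, counter)
        counter += 1
    return bytes(out[:nbytes])


def ctr_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    # Reference path: encrypt the whole payload, then sample from the result.
    ks = precompute_keystream(key, nonce, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))


def early_sample(plaintext_prefix: bytes, keystream: bytes, pn_len: int) -> bytes:
    # The sample starts at pn_offset + 4, i.e. (4 - pn_len) bytes into the payload,
    # so only the first few plaintext bytes and the precomputed keystream are needed.
    start = 4 - pn_len
    end = start + SAMPLE_LEN
    if len(plaintext_prefix) < end or len(keystream) < end:
        raise ValueError("not enough payload or keystream to sample")
    return bytes(p ^ k for p, k in zip(plaintext_prefix[start:end], keystream[start:end]))
```

The early sample matches the one taken from the fully encrypted payload, so the mask can be computed in parallel with bulk encryption. On Mikkel's caveat about the signature: RFC 9001 uses the header with the unprotected packet number as the AEAD associated data, so the authentication tag does not cover the masked packet number.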
> >
> >     As I recall, header encryption was a really hard nut to crack
> >     because any trivial encryption was either too weak or too complicated.
> >
> > Indeed, I see no way to make header protection efficient while keeping
> > the current format and size of the header. For instance, one could
> > conceivably use another number for PNE, but that seems wasteful.
> >
> >     Mikkel
> >
> >> On 8 Feb 2023, at 09.25, Boris Pismenny <borispisme...@gmail.com
> >> <mailto:borispisme...@gmail.com>> wrote:
> >>
> >> Hello,
> >>
> >> I work on NIC hardware acceleration for NVIDIA, and we are looking
> >> into QUIC and DTLS1.3 acceleration. QUIC and DTLS employ packet
> >> number encryption (PNE), which increases security. At the same
> >> time, PNE significantly encumbers hardware acceleration, as I’ll
> >> explain next.
> >>
> >> For hardware to encrypt the packet numbers, there are two options:
> >>
> >>  1. Feed the header back into the encryption machine after the data
> >>     has been encrypted. This means storing and forwarding data,
> >>     higher implementation complexity, and greater bandwidth
> >>     requirements on the single encryption machine.
> >>  2. Add an additional unique pipeline stage dedicated to header
> >>     encryption.
> >>
> >> As you may already know, this is not hardware friendly, and for
> >> this reason many vendors will likely refuse to pay the cost of
> >> supporting it. But even if a vendor does implement this feature,
> >> one problem remains: PNE will still cause noticeable latency
> >> and performance degradation for high-speed networks (think >400Gbps).
> >>
> >> Now, in certain use cases, such as high performance computing,
> >> cloud computing, or data-center clusters, the security benefits of
> >> encrypting headers are marginal compared to the latency imposed by
> >> PNE. Would it be possible to consider letting these users
> >> negotiate to disable PNE, and by doing so benefit (more) from
> >> encryption acceleration?
> >>
> >> Best regards,
> >>
> >> Boris
> >>
> >
> > _______________________________________________
> > TLS mailing list
> > t...@ietf.org
> > https://www.ietf.org/mailman/listinfo/tls
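Both Christian's reply and Boris's original question hinge on per-connection negotiation: packet number protection could be disabled only when client and server both opt in, and every other connection keeps the standard behavior. The Python below is a minimal sketch of that rule only. The extension name plaintext_packet_numbers, the Endpoint fields, and the function names are all hypothetical; they do not correspond to any registered QUIC transport parameter or TLS extension, and a real design would carry the offer inside the handshake rather than as Python sets.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical extension name; no such codepoint exists in QUIC or DTLS 1.3 today.
PLAINTEXT_PN = "plaintext_packet_numbers"


@dataclass
class Endpoint:
    supports_plaintext_pn: bool       # implementation knows the hypothetical extension
    policy_allows_plaintext_pn: bool  # e.g. enabled only inside a trusted data center


def client_hello_extensions(client: Endpoint) -> set[str]:
    # The client offers the extension only if it both implements it and may use it.
    if client.supports_plaintext_pn and client.policy_allows_plaintext_pn:
        return {PLAINTEXT_PN}
    return set()


def server_accepted_extensions(server: Endpoint, offered: set[str]) -> set[str]:
    # The server echoes the extension only if it was offered and the server agrees.
    if (PLAINTEXT_PN in offered and server.supports_plaintext_pn
            and server.policy_allows_plaintext_pn):
        return {PLAINTEXT_PN}
    return set()


def packet_number_protection_enabled(client: Endpoint, server: Endpoint) -> bool:
    # Protection stays on unless both sides agreed; an unmodified peer never
    # offers or echoes the extension, so it always gets the default behavior.
    accepted = server_accepted_extensions(server, client_hello_extensions(client))
    return PLAINTEXT_PN not in accepted
```

As Christian points out, the fallback branch is the common case: hardware built around this idea would still have to implement standard header protection for every connection where the check returns True.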