Re: [TLS] record layer limits of TLS1.3
On 11/23/2016 02:46 AM, Judson Wilson wrote:
> I worry about the buffer sizes required on embedded devices. Hopefully
> the other endpoint would be programmed to limit record sizes, but is
> that something we want to rely on? This could be a parameter agreed
> upon during the handshake, but that seems bad.

My understanding is that the original motivation (which admittedly
preceded me) included putting a cap on the amount of data that an
endpoint could be forced to buffer, yes.

Also note the proposal to steal the high bit of the length field to
indicate encrypted records, as part of the proposal to reclaim the three
fixed bytes from the record header.
(https://github.com/tlswg/tls13-spec/pull/762)

-Ben
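[For readers who haven't followed the PR: the idea is that a record's
length never needs all 16 bits, so the top bit can carry a flag. A toy
sketch of the concept in Python; the two-byte header and flag placement
here are illustrative assumptions, not the actual encoding proposed in
PR 762:

import struct

def pack_length(length, encrypted):
    # Toy 2-byte length word: high bit flags an encrypted record,
    # low 15 bits carry the length. Illustrative only.
    assert 0 <= length < 2**15  # reserving the bit halves the range
    return struct.pack("!H", (0x8000 if encrypted else 0) | length)

def unpack_length(word_bytes):
    (word,) = struct.unpack("!H", word_bytes)
    return word & 0x7FFF, bool(word & 0x8000)

# Round-trip check at the current 2^14 limit
assert unpack_length(pack_length(16384, True)) == (16384, True)

One consequence worth noting for this thread: reserving the bit would
cap the length field at 2^15 - 1, which also bounds how far the 2^14
record limit discussed here could be raised.]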
Re: [TLS] record layer limits of TLS1.3
On 23/11/16 19:13, Watson Ladd wrote:
> On Nov 23, 2016 10:22 AM, "Jeremy Harris" wrote:
>> On 23/11/16 08:50, Yoav Nir wrote:
>>> As long as you run over a network that has a smallish MTU, you're
>>> going to incur the packetization costs anyway, either in your code or
>>> in operating system code. If you have a 1.44 GB file you want to
>>> send, it's going to take a million IP packets either way and 100
>>> million AES block operations.
>>
>> Actually, no. Everybody offloads ether-frame packetization and TCP
>> re-segmentation to the NIC, talking 64kB TCP segments across the
>> NIC/OS boundary.
>
> Who is 'everybody'?

Broadcom, Intel, Emulex... anyone producing a high-speed NIC. Perhaps
we're talking past each other; I'm referring to (usually TCP)
segmentation offload.

> Let's look at the cost more exactly. We always have to copy from the
> storage to the network. Packetization copies a tiny bit more data on
> each packet.

The amount of extra data doesn't hurt; having to deal with a larger
number of buffers does - especially on receive, with reassembly.

--
Jeremy
Re: [TLS] record layer limits of TLS1.3
A) OpenSSL does not measure the actual TLS performance (including nonce
construction, additional data, etc.), but rather just the speed of the
main encryption loop.

B) Still, I agree with Yoav. From my experience, the difference in
throughput between 16K records and 64K records is negligible, as is the
network overhead. On the other hand, using larger records increases the
risk of head-of-line blocking.

Cheers,
Vlad

> On Nov 24, 2016, at 6:16 AM, Yoav Nir wrote:
>
>> On 24 Nov 2016, at 15:47, Hubert Kario wrote:
>>
>> On Wednesday, 23 November 2016 10:50:37 CET Yoav Nir wrote:
>>> On 23 Nov 2016, at 10:30, Nikos Mavrogiannopoulos wrote:
>>>> On Wed, 2016-11-23 at 10:05 +0200, Yoav Nir wrote:
>>>>> Hi, Nikos
>>>>>
>>>>> On 23 Nov 2016, at 9:06, Nikos Mavrogiannopoulos
>>>>
>>>> That to my understanding is a way to reduce latency in contrast to
>>>> cpu costs. An increase to packet size targets bandwidth rather than
>>>> latency (speed).
>>>
>>> Sure, but running 'openssl speed' on either aes-128-cbc or hmac or
>>> sha256 (there's no test for AES-GCM or ChaCha-poly) you get smallish
>>> differences in terms of kilobytes per second between 1024-byte
>>> buffers and 8192-byte buffers. And the difference is going to be even
>>> smaller going to 16KB buffers, let alone 64KB buffers.
>>
>> this is not a valid comparison. openssl speed doesn't use the hardware
>> accelerated codepath
>>
>> you need to use `openssl speed -evp aes-128-gcm` to see it (and yes,
>> aes-gcm and chacha20-poly1305 are supported then)
>>
>> What I see is nearly a 1 GB/s throughput increase between 1024 and
>> 8192 byte blocks for AES-GCM:
>>
>> type              16 bytes     64 bytes     256 bytes    1024 bytes   8192 bytes
>> aes-128-gcm       614979.91k   1388369.31k  2702645.76k  3997320.76k  4932512.79k
>>
>> While indeed, for chacha20 there's little to no difference at the
>> high end:
>>
>> type               16 bytes    64 bytes    256 bytes    1024 bytes   8192 bytes   16384 bytes
>> chacha20-poly1305  242518.50k  514356.72k  1035220.57k  1868933.46k  1993609.50k  1997438.98k
>>
>> (aes-128-gcm performance from openssl-1.0.2j-1.fc24.x86_64,
>> chacha20-poly1305 from openssl master, both on
>> Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz)
>
> Cool. So you got a 23% improvement, and I got an 18% improvement for
> AES-GCM. I still claim (but cannot prove without modifying openssl
> code; maybe I'll do that over the weekend) that the jump from 16KB to
> 64KB will be far, far less pronounced.
>
> Yoav
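[Vlad's point (A) can be checked directly by timing the whole
per-record operation, nonce construction and additional data included,
rather than the bare cipher loop. A minimal sketch, assuming the
pyca/cryptography AESGCM API (an assumption of this sketch, not
something used in the thread) and a TLS 1.3-style XOR nonce; absolute
numbers are dominated by Python overhead, so only the relative shape
across record sizes is meaningful:

import os, time
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)
aead = AESGCM(key)
iv = os.urandom(12)  # static per-connection IV

def time_records(record_size, total=1 << 26):  # 64 MiB per run
    data = os.urandom(record_size)
    n = total // record_size
    start = time.perf_counter()
    for seq in range(n):
        # TLS 1.3-style per-record nonce: static IV XOR sequence number
        nonce = (int.from_bytes(iv, "big") ^ seq).to_bytes(12, "big")
        aead.encrypt(nonce, data, b"illustrative additional data")
    return total / (time.perf_counter() - start) / 1e6  # MB/s

for size in (1024, 8192, 16384, 65536):
    print(f"{size:>6}-byte records: {time_records(size):7.1f} MB/s")]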
Re: [TLS] record layer limits of TLS1.3
> On 24 Nov 2016, at 15:47, Hubert Kario wrote:
>
> On Wednesday, 23 November 2016 10:50:37 CET Yoav Nir wrote:
>> On 23 Nov 2016, at 10:30, Nikos Mavrogiannopoulos wrote:
>>> On Wed, 2016-11-23 at 10:05 +0200, Yoav Nir wrote:
>>>> Hi, Nikos
>>>>
>>>> On 23 Nov 2016, at 9:06, Nikos Mavrogiannopoulos
>>>
>>> That to my understanding is a way to reduce latency in contrast to
>>> cpu costs. An increase to packet size targets bandwidth rather than
>>> latency (speed).
>>
>> Sure, but running 'openssl speed' on either aes-128-cbc or hmac or
>> sha256 (there's no test for AES-GCM or ChaCha-poly) you get smallish
>> differences in terms of kilobytes per second between 1024-byte buffers
>> and 8192-byte buffers. And the difference is going to be even smaller
>> going to 16KB buffers, let alone 64KB buffers.
>
> this is not a valid comparison. openssl speed doesn't use the hardware
> accelerated codepath
>
> you need to use `openssl speed -evp aes-128-gcm` to see it (and yes,
> aes-gcm and chacha20-poly1305 are supported then)
>
> What I see is nearly a 1 GB/s throughput increase between 1024 and 8192
> byte blocks for AES-GCM:
>
> type              16 bytes     64 bytes     256 bytes    1024 bytes   8192 bytes
> aes-128-gcm       614979.91k   1388369.31k  2702645.76k  3997320.76k  4932512.79k
>
> While indeed, for chacha20 there's little to no difference at the high
> end:
>
> type               16 bytes    64 bytes    256 bytes    1024 bytes   8192 bytes   16384 bytes
> chacha20-poly1305  242518.50k  514356.72k  1035220.57k  1868933.46k  1993609.50k  1997438.98k
>
> (aes-128-gcm performance from openssl-1.0.2j-1.fc24.x86_64,
> chacha20-poly1305 from openssl master, both on
> Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz)

Cool. So you got a 23% improvement, and I got an 18% improvement for
AES-GCM. I still claim (but cannot prove without modifying openssl code;
maybe I'll do that over the weekend) that the jump from 16KB to 64KB
will be far, far less pronounced.

Yoav
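[The quoted percentages can be reproduced from Hubert's table:

# Throughput gain going from 1024-byte to 8192-byte buffers,
# using the kB/s figures quoted above.
aes_1k, aes_8k = 3997320.76, 4932512.79
cha_1k, cha_8k = 1868933.46, 1993609.50
print(f"aes-128-gcm:       {aes_8k / aes_1k - 1:.0%}")  # ~23%
print(f"chacha20-poly1305: {cha_8k / cha_1k - 1:.0%}")  # ~7%]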
Re: [TLS] record layer limits of TLS1.3
On Wednesday, 23 November 2016 10:50:37 CET Yoav Nir wrote:
> On 23 Nov 2016, at 10:30, Nikos Mavrogiannopoulos wrote:
>> On Wed, 2016-11-23 at 10:05 +0200, Yoav Nir wrote:
>>> Hi, Nikos
>>>
>>> On 23 Nov 2016, at 9:06, Nikos Mavrogiannopoulos
>>
>> That to my understanding is a way to reduce latency in contrast to
>> cpu costs. An increase to packet size targets bandwidth rather than
>> latency (speed).
>
> Sure, but running 'openssl speed' on either aes-128-cbc or hmac or
> sha256 (there's no test for AES-GCM or ChaCha-poly) you get smallish
> differences in terms of kilobytes per second between 1024-byte buffers
> and 8192-byte buffers. And the difference is going to be even smaller
> going to 16KB buffers, let alone 64KB buffers.

this is not a valid comparison. openssl speed doesn't use the hardware
accelerated codepath

you need to use `openssl speed -evp aes-128-gcm` to see it (and yes,
aes-gcm and chacha20-poly1305 are supported then)

What I see is nearly a 1 GB/s throughput increase between 1024 and 8192
byte blocks for AES-GCM:

type              16 bytes     64 bytes     256 bytes    1024 bytes   8192 bytes
aes-128-gcm       614979.91k   1388369.31k  2702645.76k  3997320.76k  4932512.79k

While indeed, for chacha20 there's little to no difference at the high
end:

type               16 bytes    64 bytes    256 bytes    1024 bytes   8192 bytes   16384 bytes
chacha20-poly1305  242518.50k  514356.72k  1035220.57k  1868933.46k  1993609.50k  1997438.98k

(aes-128-gcm performance from openssl-1.0.2j-1.fc24.x86_64,
chacha20-poly1305 from openssl master, both on
Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz)

--
Regards,
Hubert Kario
Senior Quality Engineer, QE BaseOS Security team
Web: www.cz.redhat.com
Red Hat Czech s.r.o., Purkyňova 99/71, 612 45, Brno, Czech Republic
Re: [TLS] record layer limits of TLS1.3
On Nov 23, 2016 10:22 AM, "Jeremy Harris" wrote:
>
> On 23/11/16 08:50, Yoav Nir wrote:
>> As long as you run over a network that has a smallish MTU, you're
>> going to incur the packetization costs anyway, either in your code or
>> in operating system code. If you have a 1.44 GB file you want to send,
>> it's going to take a million IP packets either way and 100 million AES
>> block operations.
>
> Actually, no. Everybody offloads ether-frame packetization and TCP
> re-segmentation to the NIC, talking 64kB TCP segments across the NIC/OS
> boundary.

Who is 'everybody'?

Let's look at the cost more exactly. We always have to copy from the
storage to the network. Packetization copies a tiny bit more data on
each packet. Maybe you can avoid the copy if you have special PCI DMA
between devices, but that is rare.

> Precisely because of the packetization cost.
> --
> Jeremy
Re: [TLS] record layer limits of TLS1.3
On 23/11/16 08:50, Yoav Nir wrote:
> As long as you run over a network that has a smallish MTU, you're going
> to incur the packetization costs anyway, either in your code or in
> operating system code. If you have a 1.44 GB file you want to send,
> it's going to take a million IP packets either way and 100 million AES
> block operations.

Actually, no. Everybody offloads ether-frame packetization and TCP
re-segmentation to the NIC, talking 64kB TCP segments across the NIC/OS
boundary. Precisely because of the packetization cost.

--
Jeremy
Re: [TLS] record layer limits of TLS1.3
> On 23 Nov 2016, at 09:50, Yoav Nir wrote:
>
> On 23 Nov 2016, at 10:30, Nikos Mavrogiannopoulos wrote:
>> On Wed, 2016-11-23 at 10:05 +0200, Yoav Nir wrote:
>>> Hi, Nikos
>>>
>>> On 23 Nov 2016, at 9:06, Nikos Mavrogiannopoulos wrote:
>>>>
>>>> Hi,
>>>> Up to the current draft of TLS 1.3 the record layer is restricted to
>>>> sending 2^14 bytes or less. Is the 2^14 number something we want to
>>>> preserve? 16 KB used to be a lot, but today if one wants to do fast
>>>> data transfers most likely he would prefer to use larger blocks.
>>>> Given that the length field allows for sizes up to 2^16, shouldn't
>>>> the draft allow for 2^16 - 1024 as maximum?
>>>
>>> I am not opposed to this, but looking at real browsers and servers,
>>> we see that they tend to set the size of records to fit IP packets.
>>
>> IP packets can carry up to 64 KB of data. I believe you may be
>> referring to ethernet MTU sizes.
>
> I'm referring to the IP packets that they actually use, and that is set
> by TCP to fit the PMTU, which is <= the ethernet MTU. In practice it is
> 1500 bytes for most network paths.
>
>> That to my understanding is a way to reduce latency in contrast to cpu
>> costs. An increase to packet size targets bandwidth rather than
>> latency (speed).
>
> Sure, but running 'openssl speed' on either aes-128-cbc or hmac or
> sha256 (there's no test for AES-GCM or ChaCha-poly) you get smallish
> differences in terms of kilobytes per second between 1024-byte buffers
> and 8192-byte buffers. And the difference is going to be even smaller
> going to 16KB buffers, let alone 64KB buffers.
>
>>> The gains from increasing the size of records from the ~1460 bytes
>>> that fit in a packet to nearly 64KB are not all that great, and the
>>> gains from increasing records from 16 KB to 64KB are almost
>>> negligible. At that size the block encryption dominates the CPU time.
>>
>> Do you have measurements to support that? I'm quite surprised by such
>> a general statement because packetization itself is a non-negligible
>> cost, especially when encryption is fast (i.e., in most modern CPUs
>> with dedicated instructions).
>
> As long as you run over a network that has a smallish MTU, you're going
> to incur the packetization costs anyway, either in your code or in
> operating system code. If you have a 1.44 GB file you want to send,
> it's going to take a million IP packets either way and 100 million AES
> block operations.
>
> Whether you do the encryption in a million 1440-byte records or 100,000
> 14,400-byte records makes a difference of only a few percent.
>
> Measurements that we've made bear this out. They're with IPsec, so it's
> fragmentation rather than packetization, but I don't think that should
> make much of a difference.
>
> Again, I'm not opposed to this. A few percent is a worthy gain.
>
> Yoav

OK. However, it would be useful to have the possibility to use a record
larger than 16 KB.

This is not relevant when using TLS over TCP, for which the application
does not expect record boundary preservation. For DTLS the story is
different: the application expects record boundary preservation, so the
maximum DTLS record size does limit the maximum message size the
application can transfer. In the case of DTLS over UDP this is not so
critical, as long as you avoid IP fragmentation and have MTUs smaller
than 16 KB. However, in the case of DTLS over SCTP this imposes a
limitation: SCTP does segmentation and reassembly, and therefore the
MTU limit does not apply.

Diameter is a protocol possibly using DTLS/SCTP as its transport, and I
was told that the 16 KB limit is a real restriction in some cases;
relaxing it to 64 KB would be fine for them.

So I'm not saying we need to change the default to 64 KB - we can put it
wherever you find appropriate - as long as both sides can negotiate a
larger limit. For example, one could allow the Maximum Fragment Length
Negotiation to negotiate a value larger than 16 KB. You would only need
to extend the enum in https://tools.ietf.org/html/rfc6066#section-4 to
allow values larger than 2^14; a sketch of that idea follows below.

Best regards
Michael
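[A minimal sketch of the enum extension Michael describes. The RFC 6066
max_fragment_length codepoints (1 through 4) are real; codepoints 5
through 7 are hypothetical values invented here for illustration and
assigned by no spec:

# RFC 6066 max_fragment_length codepoints, plus hypothetical extensions.
MAX_FRAGMENT_LENGTH = {
    1: 2**9,           # defined in RFC 6066
    2: 2**10,          # defined in RFC 6066
    3: 2**11,          # defined in RFC 6066
    4: 2**12,          # defined in RFC 6066
    5: 2**13,          # hypothetical
    6: 2**14,          # hypothetical: the current TLS ceiling
    7: 2**16 - 1024,   # hypothetical: the ceiling proposed upthread
}

def record_size_limit(codepoint=None):
    """Limit after (hypothetical) negotiation; falls back to the
    protocol default of 2^14 when the extension is absent."""
    return MAX_FRAGMENT_LENGTH.get(codepoint, 2**14)]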
Re: [TLS] record layer limits of TLS1.3
Maybe a solution would be a better maximum fragment length extension,
one which allows the size to be negotiated in a more fine-grained way,
as pointed out in:
https://www.ietf.org/mail-archive/web/tls/current/msg12472.html

I also found these requests asking for larger packet sizes:
https://www.ietf.org/mail-archive/web/tls/current/msg13569.html
https://mailarchive.ietf.org/arch/msg/tls/K_6Ug4GtAxbrQJy2CFFFoxxtJ60

On Wed, 2016-11-23 at 00:46 -0800, Judson Wilson wrote:
> I worry about the buffer sizes required on embedded devices.
> Hopefully the other endpoint would be programmed to limit record
> sizes, but is that something we want to rely on? This could be a
> parameter agreed upon during the handshake, but that seems bad.
>
> On Wed, Nov 23, 2016 at 12:41 AM, Nikos Mavrogiannopoulos wrote:
>> On Wed, 2016-11-23 at 00:39 -0800, Judson Wilson wrote:
>>> Can you send multiple records in one data transfer to achieve
>>> whatever gains are desired?
>>
>> The packetization cost still remains even if you do that. However,
>> the question is where the 2^14 limit comes from, and why TLS 1.3
>> should keep it.
>>
>> regards,
>> Nikos
Re: [TLS] record layer limits of TLS1.3
On 23 Nov 2016, at 10:30, Nikos Mavrogiannopoulos wrote:
> On Wed, 2016-11-23 at 10:05 +0200, Yoav Nir wrote:
>> Hi, Nikos
>>
>> On 23 Nov 2016, at 9:06, Nikos Mavrogiannopoulos wrote:
>>>
>>> Hi,
>>> Up to the current draft of TLS 1.3 the record layer is restricted to
>>> sending 2^14 bytes or less. Is the 2^14 number something we want to
>>> preserve? 16 KB used to be a lot, but today if one wants to do fast
>>> data transfers most likely he would prefer to use larger blocks.
>>> Given that the length field allows for sizes up to 2^16, shouldn't
>>> the draft allow for 2^16 - 1024 as maximum?
>>
>> I am not opposed to this, but looking at real browsers and servers,
>> we see that they tend to set the size of records to fit IP packets.
>
> IP packets can carry up to 64 KB of data. I believe you may be
> referring to ethernet MTU sizes.

I'm referring to the IP packets that they actually use, and that is set
by TCP to fit the PMTU, which is <= the ethernet MTU. In practice it is
1500 bytes for most network paths.

> That to my understanding is a way to reduce latency in contrast to cpu
> costs. An increase to packet size targets bandwidth rather than latency
> (speed).

Sure, but running 'openssl speed' on either aes-128-cbc or hmac or
sha256 (there's no test for AES-GCM or ChaCha-poly) you get smallish
differences in terms of kilobytes per second between 1024-byte buffers
and 8192-byte buffers. And the difference is going to be even smaller
going to 16KB buffers, let alone 64KB buffers.

>> The gains from increasing the size of records from the ~1460 bytes
>> that fit in a packet to nearly 64KB are not all that great, and the
>> gains from increasing records from 16 KB to 64KB are almost
>> negligible. At that size the block encryption dominates the CPU time.
>
> Do you have measurements to support that? I'm quite surprised by such a
> general statement because packetization itself is a non-negligible
> cost, especially when encryption is fast (i.e., in most modern CPUs
> with dedicated instructions).

As long as you run over a network that has a smallish MTU, you're going
to incur the packetization costs anyway, either in your code or in
operating system code. If you have a 1.44 GB file you want to send, it's
going to take a million IP packets either way and 100 million AES block
operations.

Whether you do the encryption in a million 1440-byte records or 100,000
14,400-byte records makes a difference of only a few percent.

Measurements that we've made bear this out. They're with IPsec, so it's
fragmentation rather than packetization, but I don't think that should
make much of a difference.

Again, I'm not opposed to this. A few percent is a worthy gain.

Yoav
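[For a sense of scale, the framing bytes in Yoav's example can be
tallied directly. This sketch counts only a 5-byte record header and a
16-byte AEAD tag per record; the per-record CPU work under debate
(nonce setup, buffer handling, syscalls) is not modeled:

# Framing overhead for a 1.44 GB transfer at various record sizes.
FILE_BYTES = 1.44e9
HEADER, TAG = 5, 16  # TLS record header + AES-GCM tag (assumed)

for record in (1440, 14400, 16384, 65536):
    n = FILE_BYTES / record
    overhead = n * (HEADER + TAG) / FILE_BYTES
    print(f"{record:>6}-byte records: {n:>9,.0f} records, "
          f"{overhead:.3%} framing overhead")

# 1440-byte records:  1,000,000 records, 1.458% framing overhead
# 65536-byte records:     21,973 records, 0.032% framing overhead]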
Re: [TLS] record layer limits of TLS1.3
I worry about the buffer sizes required on embedded devices. Hopefully
the other endpoint would be programmed to limit record sizes, but is
that something we want to rely on? This could be a parameter agreed upon
during the handshake, but that seems bad.

On Wed, Nov 23, 2016 at 12:41 AM, Nikos Mavrogiannopoulos wrote:
> On Wed, 2016-11-23 at 00:39 -0800, Judson Wilson wrote:
>> Can you send multiple records in one data transfer to achieve
>> whatever gains are desired?
>
> The packetization cost still remains even if you do that. However, the
> question is where the 2^14 limit comes from, and why TLS 1.3 should
> keep it.
>
> regards,
> Nikos
Re: [TLS] record layer limits of TLS1.3
On Wed, 2016-11-23 at 00:39 -0800, Judson Wilson wrote:
> Can you send multiple records in one data transfer to achieve
> whatever gains are desired?

The packetization cost still remains even if you do that. However, the
question is where the 2^14 limit comes from, and why TLS 1.3 should
keep it.

regards,
Nikos
Re: [TLS] record layer limits of TLS1.3
Can you send multiple records in one data transfer to achieve whatever
gains are desired?

On Wed, Nov 23, 2016 at 12:30 AM, Nikos Mavrogiannopoulos wrote:
> On Wed, 2016-11-23 at 10:05 +0200, Yoav Nir wrote:
>> Hi, Nikos
>>
>> On 23 Nov 2016, at 9:06, Nikos Mavrogiannopoulos wrote:
>>>
>>> Hi,
>>> Up to the current draft of TLS 1.3 the record layer is restricted to
>>> sending 2^14 bytes or less. Is the 2^14 number something we want to
>>> preserve? 16 KB used to be a lot, but today if one wants to do fast
>>> data transfers most likely he would prefer to use larger blocks.
>>> Given that the length field allows for sizes up to 2^16, shouldn't
>>> the draft allow for 2^16 - 1024 as maximum?
>>
>> I am not opposed to this, but looking at real browsers and servers,
>> we see that they tend to set the size of records to fit IP packets.
>
> IP packets can carry up to 64 KB of data. I believe you may be
> referring to ethernet MTU sizes. That to my understanding is a way to
> reduce latency in contrast to cpu costs. An increase to packet size
> targets bandwidth rather than latency (speed).
>
>> The gains from increasing the size of records from the ~1460 bytes
>> that fit in a packet to nearly 64KB are not all that great, and the
>> gains from increasing records from 16 KB to 64KB are almost
>> negligible. At that size the block encryption dominates the CPU time.
>
> Do you have measurements to support that? I'm quite surprised by such a
> general statement because packetization itself is a non-negligible
> cost, especially when encryption is fast (i.e., in most modern CPUs
> with dedicated instructions).
>
> regards,
> Nikos
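[A sketch of what Judson is suggesting: keep the 2^14 cap per record,
but hand many records to the kernel in a single write. The
protect_record stub below only frames the data (no real encryption) and
its header layout is illustrative:

import io
import struct

MAX_PLAINTEXT = 2**14  # TLS 1.3 per-record plaintext cap

def protect_record(fragment):
    # Stand-in for real record protection: frames the fragment with a
    # TLSCiphertext-style header (type 23, legacy version 0x0303) but
    # performs no encryption and adds no tag.
    return struct.pack("!BHH", 23, 0x0303, len(fragment)) + fragment

def send_buffered(sock, data):
    """Split into <= 2^14-byte records, but issue one write, so the
    record boundary is decoupled from the syscall boundary."""
    buf = io.BytesIO()
    for off in range(0, len(data), MAX_PLAINTEXT):
        buf.write(protect_record(data[off:off + MAX_PLAINTEXT]))
    sock.sendall(buf.getvalue())  # one data transfer, many records]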
Re: [TLS] record layer limits of TLS1.3
On Wed, 2016-11-23 at 10:05 +0200, Yoav Nir wrote:
> Hi, Nikos
>
> On 23 Nov 2016, at 9:06, Nikos Mavrogiannopoulos wrote:
>>
>> Hi,
>> Up to the current draft of TLS 1.3 the record layer is restricted to
>> sending 2^14 bytes or less. Is the 2^14 number something we want to
>> preserve? 16 KB used to be a lot, but today if one wants to do fast
>> data transfers most likely he would prefer to use larger blocks.
>> Given that the length field allows for sizes up to 2^16, shouldn't
>> the draft allow for 2^16 - 1024 as maximum?
>
> I am not opposed to this, but looking at real browsers and servers,
> we see that they tend to set the size of records to fit IP packets.

IP packets can carry up to 64 KB of data. I believe you may be referring
to ethernet MTU sizes. That to my understanding is a way to reduce
latency in contrast to cpu costs. An increase to packet size targets
bandwidth rather than latency (speed).

> The gains from increasing the size of records from the ~1460 bytes
> that fit in a packet to nearly 64KB are not all that great, and the
> gains from increasing records from 16 KB to 64KB are almost
> negligible. At that size the block encryption dominates the CPU time.

Do you have measurements to support that? I'm quite surprised by such a
general statement because packetization itself is a non-negligible cost,
especially when encryption is fast (i.e., in most modern CPUs with
dedicated instructions).

regards,
Nikos
Re: [TLS] record layer limits of TLS1.3
Hi, Nikos

On 23 Nov 2016, at 9:06, Nikos Mavrogiannopoulos wrote:
> Hi,
> Up to the current draft of TLS 1.3 the record layer is restricted to
> sending 2^14 bytes or less. Is the 2^14 number something we want to
> preserve? 16 KB used to be a lot, but today if one wants to do fast
> data transfers most likely he would prefer to use larger blocks. Given
> that the length field allows for sizes up to 2^16, shouldn't the draft
> allow for 2^16 - 1024 as maximum?

I am not opposed to this, but looking at real browsers and servers, we
see that they tend to set the size of records to fit IP packets. The
gains from increasing the size of records from the ~1460 bytes that fit
in a packet to nearly 64 KB are not all that great, and the gains from
increasing records from 16 KB to 64 KB are almost negligible. At that
size the block encryption dominates the CPU time.

Yoav
[TLS] record layer limits of TLS1.3
Hi,
Up to the current draft of TLS 1.3 the record layer is restricted to
sending 2^14 bytes or less. Is the 2^14 number something we want to
preserve? 16 KB used to be a lot, but today if one wants to do fast data
transfers most likely he would prefer to use larger blocks. Given that
the length field allows for sizes up to 2^16, shouldn't the draft allow
for 2^16 - 1024 as maximum?

regards,
Nikos
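[As an aside on the arithmetic: the record length field is a 16-bit
integer and it counts ciphertext, which exceeds the plaintext, so a
plaintext cap must leave headroom for per-record expansion. The split
below is illustrative, not taken from the draft:

# The length field is a uint16, so ciphertext <= 2^16 - 1 bytes.
# Capping plaintext at 2^16 - 1024 leaves headroom for expansion;
# the tag/padding split shown is an assumption for illustration.
MAX_CIPHERTEXT = 2**16 - 1
MAX_PLAINTEXT = 2**16 - 1024
headroom = MAX_CIPHERTEXT - MAX_PLAINTEXT   # 1023 bytes
tag = 16                                    # e.g. an AES-GCM tag
padding_allowance = headroom - tag          # whatever policy permits
print(MAX_PLAINTEXT, headroom, padding_allowance)  # 64512 1023 1007]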