Hi Hannes,

On 2019-02-14, 11:50, "Hannes Tschofenig" <hannes.tschofe...@arm.com> wrote:

    Hi Göran,
    
    I will obviously not be able to convince you to change your research 
strategy. So, I will not even try.

This is not just a research topic. But if this means you respect that different companies may have different strategies and want to be able to choose between solutions with different properties, then I'm grateful for that. Also, thanks for the pointers comparing ARM processors.

Göran

    Anyway, thanks for the performance measurements your co-workers created in 
the Excel sheets. I will take a closer look at them.
    
    One item worth responding to is the choice of MCU. You wrote:
    [GS] Nice application of LwM2M. The showcased device didn't seem very 
constrained though, ARM Cortex M4?
    
    The Cortex M4 offers a larger instruction set, including DSP/SIMD 
capabilities, compared to something like the M0+. You can see the differences 
at https://en.wikipedia.org/wiki/ARM_Cortex-M
    In this blog post, see 
https://community.arm.com/processors/b/blog/posts/armv6-m-vs-armv7-m---unpacking-the-microcontrollers,
 Chris Shore shows the difference in the instruction set graphically.
    
    Using these extra instructions, code can execute faster. Compilers already take advantage of them, but hand-crafted assembly code gives an additional performance improvement. My co-workers from the Mbed TLS team have written hand-crafted assembly to speed up bignum computations, see
https://github.com/ARMmbed/mbedtls/blob/development/include/mbedtls/bn_mul.h#L645
    
https://github.com/ARMmbed/mbedtls/blob/development/include/mbedtls/bn_mul.h#L582
    
    Executing code faster allows the device to enter a low-power state sooner.
    
    Additionally, if you use sensor fusion then having floating point support 
in hardware will make your life easier (and the code faster).
    
    Ciao
    Hannes
    
    -----Original Message-----
    From: Göran Selander <goran.selan...@ericsson.com>
    Sent: Montag, 4. Februar 2019 18:41
    To: Hannes Tschofenig <hannes.tschofe...@arm.com>; secdispa...@ietf.org; 
ace@ietf.org
    Subject: Re: [Secdispatch] FW: [secdir] EDHOC and Transports
    
    Hi Hannes, secdispatch, and ace,
    
    (It seems Hannes' original mail only went to secdispatch.)
    
    Apologies for a long mail, and late response. I had to ask some people for 
help with calculations, see end of this mail.
    
    On 2019-01-25, 15:15, "Secdispatch on behalf of Hannes Tschofenig" 
<secdispatch-boun...@ietf.org on behalf of hannes.tschofe...@arm.com> wrote:
    
        Fwd to SecDispatch since it was only posted on the SecDir list
    
        -----Original Message-----
        From: Hannes Tschofenig <hannes.tschofe...@arm.com>
        Sent: Freitag, 25. Januar 2019 14:07
        To: Hannes Tschofenig <hannes.tschofe...@arm.com>; Jim Schaad 
<i...@augustcellars.com>; sec...@ietf.org
        Subject: RE: [secdir] EDHOC and Transports
    
        A minor follow-up: I mentioned that I am aware of a company using the 
energy scavenging devices and it turns out that this information is actually 
public and there is even a short video on YouTube. The company we worked with 
is called Alphatronics and here is the video: 
https://www.youtube.com/watch?v=JHpJV_CPYb4
    
        As you can hear in the video we have been using our Mbed OS together 
with our device management solution (LwM2M with DTLS and CoAP) for these types 
of devices.
    
    [GS] Nice application of LwM2M. The showcased device didn't seem very 
constrained though, ARM Cortex M4?
    
        -----Original Message-----
        From: secdir <secdir-boun...@ietf.org> On Behalf Of Hannes Tschofenig
        Sent: Freitag, 25. Januar 2019 13:52
        To: Jim Schaad <i...@augustcellars.com>; sec...@ietf.org
        Subject: Re: [secdir] EDHOC and Transports
    
    
       [Hannes]  What we are doing here is making an optimization. For some (unknown) reason we have focused our attention on the over-the-wire transmission overhead (not code size, RAM utilization, or developer usability*).
    
    [GS] Exactly my point: it is not enough to reduce transmission overhead. We should also look at additional memory, flash, and configuration effort. These parameters are of course implementation dependent, but they can to some extent be inferred from the bulk of the specification and from what pre-existing code can be reused.
    
       [Hannes]  We are doing this optimization mostly based on what other people tell us rather than on our own experience. The problem is that we have too few people with hands-on knowledge and/or deployment experience, and those who have that experience may not like to talk about it. So, we are stepping around in the dark, addressing mostly perceived problems.
    
    [GS] I don't think this rhetoric is very helpful. Who are "us"? The 
co-workers you quote below, are they "us" or the "other people"? The people 
active in 6tisch, lpwan or 6lo who are supporting the work on an optimized key 
exchange, are they "us" or the "other people"?
    
    
       [Hannes]  Having said that I would like to provide a few remarks to your 
list below:
    
      [Jim]   1.  Low-power devices that are either battery based or scavenge power: these devices pay a power penalty for every byte of data sent and thus want the smallest messages possible.
    
        [Hannes] Low power is a very complex topic since it is a system issue, and boiling it down to the transmission overhead of every byte is an oversimplification. You are making certain assumptions about how the power consumption of radio technologies works, which will be hard to verify. I have been working on power measurements recently (but only focused on the power consumption of crypto, see
https://community.arm.com/arm-research/b/articles/posts/testing-crypto-performance-and-power-consumption).
    
    [GS] These kinds of power measurements of crypto are part of the explanation for why transmission overhead is important to reduce. Optimizations and hardware support make the crypto contribution to power consumption manageable, so there is no reason to deviate from current best practice crypto in security protocols even for constrained devices. The energy cost of transmission, however, is strongly coupled to the laws of physics, which set a limit on how much it can be optimized.
    
    [Hannes] I doubt that many people on this list or in the IETF have enough experience in this field to use this as a basis for an optimization.
    
    [GS] There are people in 6tisch, lpwan and 6lo who know about power consumption and constrained-device characteristics. Some of them were supporting EDHOC in ACE when you were chair.
    
    [Hannes]   My co-workers, who are active in this space, tell me that there is nothing like a "per byte" linear relationship (for small quantities of data) in terms of energy cost. Obviously, if you trigger "an additional transmission", which requires you to ramp up a PLL, turn on radio amplifiers, send lengthy preambles, etc., then the incremental cost of sending 64 bytes in that packet vs 16 bytes might be immeasurably small. The critical thing appears to be how long the RF amplifiers are powered on. Hence, you will often see publications telling you that waiting for incoming packets is actually the most expensive task (in terms of power consumption).
    
    [GS] Energy consumption generally increases with message overhead in wireless systems. This function differs between radio technologies, data rates, etc. Even if we pick a particular technology like 6tisch, LoRaWAN or NB-IoT, events like packet loss and retransmission impact the result. So indeed, this is complicated, but we can still make general claims as well as estimates for particular technologies. I asked a colleague to make some power consumption estimates for NB-IoT devices. NB-IoT uses licensed spectrum, which implies that the devices are allowed to transmit at a higher power compared to unlicensed spectrum. It also means that the application provider in general does not control how good the coverage is, since that depends on the location of the base station and the environment. A comparison [3] between DTLS 1.3 and EDHOC is given at the end of this mail, but since you mentioned the incremental cost of a device sending 64 vs 16 bytes: the difference is indeed measurable, 992 mJ vs 479 mJ, i.e. half a Joule of difference in a case of low coverage (see [3]).
    
    [GS]: About the cost of listening: there are different techniques for decreasing listening time, such as time slots, DRX, etc. These are examples of where the radio guys can be innovative and make optimizations, in contrast to transmission overhead for security, where they just have to accept what the security people decided.
    
      [Jim]  2. CoAP over SMS:  SMS has a 140 byte packet size.  There are two 
approaches for dealing with packets of larger than 140 bytes:  1) There is a 
method of appending multiple packets together to form a single larger packet.  
2) You can use CoAP blockwise transfer.  Using CoAP blockwise would result in 
128 byte packets for the underlying transfer assuming that only 12 bytes are 
needed for the CoAP header itself.
    
        [Hannes] It turns out that CoAP over SMS is rarely used for delivering data for IP-based devices, since SMS is a pretty expensive transport. From my work in the OMA I know that people use SMS to trigger the wake-up of devices and then switch to regular data transmission over IP. IMHO, optimizing for use cases that barely anyone uses appears to be a waste of time.
    
    [GS]  I strongly disagree with the general argument that what is currently deployed is the only thing worth working on. One problem with this type of argument is that it reinforces existing limitations and becomes a self-fulfilling prophecy. The fact that key exchange protocol messages currently do not fit into an SMS contributes to the reason why it is not much used. More SMSs also add to cost, but the cost depends on the agreement with the operator, so it is not necessarily a hard limitation. Who are we to predict what technology will be used given a more efficient key exchange protocol? For EDHOC with PSK or RPK, each message fits into one SMS.
    
    
     [Jim]   3. 6LoWPAN over IEEE 802.15.4:  This has a packet size of 127 bytes. The maximum frame overhead size is 25 bytes, allowing for 102 bytes of message space. If one assumes 20 bytes of overhead for CoAP, then this means a protocol packet size of 82 bytes. If one needs to break the message across multiple packets, then the maximum data size is going to be 64 bytes using CoAP blockwise options.
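    Jim's byte arithmetic can be written out as a small sketch (a minimal illustration using exactly the figures he quotes; real 802.15.4 header sizes vary with addressing mode, and real CoAP overhead varies with options):

```python
# IEEE 802.15.4 payload budget, using the figures quoted above.
FRAME_SIZE = 127         # 802.15.4 physical-layer frame size (bytes)
MAX_FRAME_OVERHEAD = 25  # worst-case link-layer frame overhead (bytes)
COAP_OVERHEAD = 20       # assumed CoAP header + options (bytes)

link_payload = FRAME_SIZE - MAX_FRAME_OVERHEAD    # 102 bytes
protocol_payload = link_payload - COAP_OVERHEAD   # 82 bytes

# CoAP blockwise block sizes are powers of two (16..1024), so the
# largest block that fits within the 82-byte budget is 64.
block_size = max(2**n for n in range(4, 11) if 2**n <= protocol_payload)

print(link_payload, protocol_payload, block_size)  # 102 82 64
```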
    
        [Hannes] For some reason there seems to be a worry that a small MTU size at the link layer will cause a lot of problems. Some radios do have such a small MTU size; IEEE 802.15.4 and Bluetooth Low Energy belong to them. It turns out, however, that higher layers offer fragmentation and reassembly support, so the layers above just don't get to see any of this. In IEEE 802.15.4 this fragmentation & reassembly support is offered by 6lowpan, and in the case of Bluetooth Low Energy the link layer actually consists of various sub-protocols, one of which offers fragmentation & reassembly. As such, the problem you describe is actually not a problem. There is no reason why you always have to put a single application layer payload into a single link layer frame. We have been using LwM2M (which uses DTLS and CoAP) over IEEE 802.15.4 networks successfully for big commercial deployments. We have not run into problems with the smaller MTU size at the lower layers.
    
    [GS] I'm happy to hear you don't experience any problems, but MTU sizes do matter. If message overhead at a higher layer causes fragmentation at a lower layer, then instead of powering up the radio and sending the physical preamble once, it will be necessary to do so once per fragment, at the next transmission opportunity at the MAC layer. On top of this, wireless links can be quite lossy, particularly with low-power radios such as those used with 6tisch. For example, the Packet Delivery Ratio (PDR) you will typically find indoors with 802.15.4 radios is 60-80% [1]. Now, when you go from a single frame to multiple fragments, you also exponentially increase the probability that one of those fragments gets lost and needs to be retransmitted. It often happens that the endpoint performing the reassembly simply drops the whole thing if one of the fragments gets lost. This then results in retransmission of all fragments by the sending endpoint, their link-layer retransmissions, etc., all employing the costly radio operations that you describe. Having this handled by a "lower layer" only means that the application developer does not have to handle it; the energy penalty for the system does not go away!
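    The fragmentation argument can be made concrete with a small sketch (the 60-80% PDR range is from [1]; the all-or-nothing reassembly model, where losing any fragment forces the whole message to be resent, is the simplifying assumption described above):

```python
# Probability that a fragmented message survives a lossy link, assuming
# each fragment is delivered independently with probability `pdr` and
# reassembly fails if any single fragment is lost.

def delivery_probability(pdr: float, n_fragments: int) -> float:
    return pdr ** n_fragments

def expected_attempts(pdr: float, n_fragments: int) -> float:
    # Expected number of full-message transmissions until one succeeds
    # (geometric distribution), each attempt resending every fragment.
    return 1.0 / delivery_probability(pdr, n_fragments)

for n in (1, 2, 4):
    p = delivery_probability(0.7, n)  # 70% PDR, mid-range of [1]
    print(f"{n} fragment(s): P(success) = {p:.2f}, "
          f"expected attempts = {expected_attempts(0.7, n):.1f}")
```

    Going from one frame to four fragments at 70% PDR drops the success probability from 0.70 to about 0.24, so on average the whole handshake message must be sent more than four times.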
    
    [GS] Fragmentation also adds to latency in several ways. For example, LoRaWAN, which operates in an unlicensed band (868 MHz in Europe), has the concept of a 1% duty cycle, meaning that after each transmission the device must wait an interval 100 times as long as the message transmission time before it is allowed to transmit again. LoRaWAN is currently PSK based, and this is one example where a key exchange protocol would improve the overall security in both the PSK and the RPK case; see [2] for an analysis using EDHOC with PSK ECDHE.
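    As a rough illustration of the duty-cycle constraint (a minimal sketch; for a duty cycle d the exact silence factor is 1/d - 1 = 99, which the "100 times" above rounds up; the 0.5 s airtime is a hypothetical value, since actual airtime depends on spreading factor and payload size):

```python
# With a 1% duty cycle, after transmitting for `airtime_s` seconds the
# device must stay silent long enough that transmission accounts for at
# most 1% of the total interval: wait >= airtime * (1/d - 1).

def silent_time(airtime_s: float, duty_cycle: float = 0.01) -> float:
    return airtime_s * (1.0 / duty_cycle - 1.0)

# e.g. a hypothetical 0.5 s transmission forces ~49.5 s of silence
print(silent_time(0.5))  # 49.5
```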
    
    [GS] A comparison [4] of time-on-air between the DTLS 1.3 handshake and EDHOC is given at the end of the mail. Since the maximum MTU for LoRaWAN is 242 bytes, the DTLS handshake with RPK ECDHE does not even fit and would require some fragmentation scheme (plus the 100 times additional delay). Depending on radio conditions, the higher data rates associated with 242-byte frames may incur too much packet loss, requiring the use of a lower data rate with an associated lower frame size and even more severe message overhead restrictions to avoid fragmentation.
    
      [Hannes]  When it comes to energy scavenging devices, it becomes even more challenging since this is a more rarely used case. I know about one company doing this, and I have spoken with a researcher at last year's Arm Research Summit who showcased one device. The device shown by the researcher was a prototype and didn't use any Internet protocol or security mechanism. I wouldn't call myself knowledgeable enough to optimize a system based on this experience, but maybe you have more expertise in this field. I am happy to learn more.
    
    [GS] As mentioned in my previous mail, the scope of this work is optimizing security for deployments that can support some kind of CoAP stack, e.g. CoAP/UDP/IP or CoAP over some link technology.
    
    
    [Hannes] The handshake itself is just a very small part of the overall amount of data transmitted during the lifetime of the device, since the handshake obviously happens extremely rarely.
    
    [GS] How often a handshake is invoked is application dependent; it could for example be the result of the device needing to power off, or of the device rebooting. If one handshake consumes as much energy as months of normal operation, then this contribution may well be noticeable in the lifetime of the battery.
    
    [Hannes] There are much better ways to optimize traffic and you obviously 
have to look at all the data you are transmitting for the device.
    
    [GS] How much further optimization you can do is application dependent, and 
for some applications security overhead matters.
    
        Ciao
        Hannes
    
        *: In my experience, how easily developers can use the available performance optimization techniques is the biggest barrier to gaining performance. Of course, this does not fit nicely into any of the standardization efforts in the IETF, so the focus has to be somewhere else.
    
    [GS] The need for performance optimizations depends on the design of the protocol, so there are definitely efforts in the IETF which can make life easier for developers.
    
    [GS] Now for the comparisons:
    
    NB-IoT
    ======
    Calculations of energy consumption for NB-IoT comparing the EDHOC and DTLS 1.3 handshakes are given in [3].
    
    PSK + ECDHE (normal coverage)
    ----------------
    DTLS 1.3 handshake: 47 mJ
    EDHOC: 19 mJ
    
    PSK + ECDHE (low coverage)
    ----------------
    DTLS 1.3 handshake: 2992 mJ
    EDHOC: 912 mJ
    
    
    RPK + ECDHE (normal coverage)
    ----------------
    DTLS 1.3 handshake: 64 mJ
    EDHOC: 29 mJ
    
    RPK + ECDHE (low coverage)
    ----------------
    DTLS 1.3 handshake: 4326 mJ
    EDHOC: 1677 mJ
    
    
    We see that the factor 4 in message overhead with PSK ECDHE between the DTLS 1.3 handshake and EDHOC (appendix E of EDHOC) translates into a factor 2.5-3.3 in energy consumption for an NB-IoT device, depending on coverage. Analogously, the factor 3 in message overhead with RPK ECDHE translates into a factor 2.2-2.6 in energy consumption.
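    The energy factors quoted above can be reproduced directly from the figures in [3]:

```python
# Energy figures (mJ) from [3]: DTLS 1.3 handshake vs EDHOC on NB-IoT.
figures = {
    "PSK+ECDHE, normal coverage": (47, 19),
    "PSK+ECDHE, low coverage":    (2992, 912),
    "RPK+ECDHE, normal coverage": (64, 29),
    "RPK+ECDHE, low coverage":    (4326, 1677),
}

for case, (dtls_mj, edhoc_mj) in figures.items():
    # DTLS-to-EDHOC energy ratio for each scenario
    print(f"{case}: factor {dtls_mj / edhoc_mj:.1f}")
```

    This prints factors 2.5 and 3.3 for PSK ECDHE and 2.2 and 2.6 for RPK ECDHE, matching the ranges stated above.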
    
    
    LoRaWAN
    ======
    Calculations of time-on-air for the EDHOC and DTLS 1.3 handshakes over LoRaWAN are given in [4].
    
    PSK + ECDHE
    ----------------
    DTLS 1.3
    Message #1: 564 ms
    Message #2: 574 ms
    Message #3: 226 ms
    
    EDHOC:
    Message #1: 195 ms
    Message #2: 205 ms
    Message #3: 113 ms
    
    RPK + ECDHE
    -----------------
    DTLS 1.3: N/A without fragmentation scheme
    
    EDHOC:
    Message #1: 184 ms
    Message #2: 389 ms
    Message #3: 297 ms
    
    As mentioned above, time-on-air is an important property for LoRaWAN deployments since it relates to both power consumption and latency, in particular due to duty cycles.
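    Totalling the PSK + ECDHE figures above gives a quick sanity check (my own summation of the per-message values from [4], not a figure from [4] itself):

```python
# Per-message time-on-air (ms) from [4], PSK + ECDHE case.
dtls_ms  = [564, 574, 226]  # DTLS 1.3 messages #1-#3
edhoc_ms = [195, 205, 113]  # EDHOC messages #1-#3

total_dtls, total_edhoc = sum(dtls_ms), sum(edhoc_ms)
print(total_dtls, total_edhoc)                   # 1364 513
print(f"ratio: {total_dtls / total_edhoc:.2f}")  # ratio: 2.66
```

    So in total the DTLS 1.3 handshake spends roughly 2.7 times as long on air as EDHOC in this case, and any duty-cycle waiting scales with that airtime.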
    
    
    Summary
    =======
    There is a lot that speaks in favor of low message overhead, for example
    
    * Smaller per-byte contribution to power consumption, which has significant 
impact in e.g. licensed spectrum
    * Less latency, in particular due to duty cycles in LoRaWAN
    * Better fit into MTUs with less fragmentation and associated overhead
    * Smaller probability of packet loss
    
    The comparisons presented here show that DTLS 1.3 is far from optimal. Let me reiterate that this should not be interpreted as criticism of TLS/DTLS. We are targeting applications in constrained environments, which the TLS handshake was explicitly not designed to optimize for. We agree that for many IoT applications the performance of the handshake is adequate, so there is no need to change DTLS. We also agree that message overhead is only one aspect, and it is really important to look at other aspects such as memory, code footprint and usability, all of which speak in favor of a protocol with limited functionality that reuses existing code in the devices, such as CBOR and COSE. For certain application providers, current IETF protocols are prohibitive in one or more of these aspects, and unless the performance is drastically improved some still consider (in 2019!) skipping end-to-end security (e.g. terminating security in a gateway), making their own security protocol, or using more pragmatic key exchange constructions like Noise [5].
    
    I would like to leave the comparison exercise soon and focus on the security properties. I hope we have made the point that constrained characteristics matter. Can the IETF support work on a key exchange protocol that is designed for the constrained IoT, or are we restricted to retrofitting some other protocol with other design goals?
    
    
    Göran
    
    
    [1] Muñoz, Jonathan, et al. "Why Channel Hopping Makes Sense, even with 
IEEE802. 15.4 OFDM at 2.4 GHz." 2018 Global Internet of Things Summit (GIoTS). 
IEEE, 2018.
    [2] Sanchez-Iborra, Ramon, et al., "Enhancing LoRaWAN Security through a 
Lightweight and Authenticated Key Management Approach", Sensors, 2018 
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6021899/
    [3] NB-IoT power consumption comparison EDHOC-DTLS 1.3 
https://github.com/EricssonResearch/EDHOC/blob/master/docs/NB%20IoT%20power%20consumption.xlsx
    [4] LoRaWAN Time-of-Air comparison EDHOC-DTLS 1.3 
https://github.com/EricssonResearch/EDHOC/blob/master/docs/LoRaWAN_ToA.xlsx
    [5] The Noise Protocol Framework
    http://www.noiseprotocol.org/
    
    
    

_______________________________________________
Ace mailing list
Ace@ietf.org
https://www.ietf.org/mailman/listinfo/ace
