On 11/12/23 00:44, Gernot Heiser via Devel wrote:
> On 11 Nov 2023, at 07:49, Demi Marie Obenour <demioben...@gmail.com> wrote:
>>
>> On 11/9/23 17:47, Gernot Heiser via Devel wrote:
>>> On 10 Nov 2023, at 06:03, Demi Marie Obenour <demioben...@gmail.com> wrote:
>>>
>>>> - Speculative taint tracking provides complete protection against
>>>> speculative attacks.  This is sufficient to prevent leakage of
>>>> cryptographic key material, even in fully dynamic systems.
>>>> Furthermore, it is compatible with fast context switches between
>>>> protection domains.
>>>
>>> It’s also a point solution that provides zero guarantees against 
>>> unforeseen attacks.
>>
>> Unless I am severely mistaken, it provides complete protection for code
>> that has secret-independent timing, such as cryptographic software.  It
>> is also cheaper than some of the workarounds existing systems must use.
> 
> Well, *if* speculative taint tracking is really completely and correctly 
> implemented, *and* you have such magic hardware. That’s a strong statement 
> (for which there’s no proof). But let’s assume it is true.
> 
> Then you may have complete protection against *speculation* attacks.
> 
> Remember, speculation attacks construct a Trojan in otherwise trustworthy 
> code using speculative execution of gadgets. There are other ways, 
> specifically control-flow attacks.
> 
> So in order to be secure, you then “only” need:
> - the magic complete and performant implementation of taint tracking, AND
> - complete prevention of control-flow attacks, AND
> - all secret-handling code being free from algorithmic timing side channels 
> (i.e. no branching on or indexing by secrets), AND
> - no untrusted code, because any untrusted code may contain a Trojan that 
> actively leaks through caches etc.

The first is a reasonable assumption for hardware with taint tracking,
and the rest are reasonable assumptions for cryptographic code.
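
For concreteness, "no branching on or indexing by secrets" is the usual
constant-time coding discipline.  A minimal sketch in C follows; the
function names ct_equal and ct_select are hypothetical and not taken
from any particular library:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the two n-byte buffers are equal,
 * 0 otherwise.  Every byte is always read, and no branch or array index
 * depends on the buffer contents. */
int ct_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);   /* accumulate any difference */
    /* diff is in 0..255; (diff - 1) computed in 32 bits has bit 8 set
     * only when diff == 0. */
    return (int)((((uint32_t)diff - 1u) >> 8) & 1u);
}

/* Hypothetical helper: mask-based select, used instead of
 * `secret_bit ? x : y`, which could compile to a branch. */
uint32_t ct_select(uint32_t secret_bit, uint32_t x, uint32_t y)
{
    uint32_t mask = (uint32_t)0 - (secret_bit & 1u); /* all ones or all zeros */
    return (x & mask) | (y & ~mask);
}
```

This only shows the source-level pattern.  A compiler is free to
reintroduce branches, so real cryptographic code usually also inspects
the generated assembly or relies on primitives the toolchain documents
as constant-time.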

> If you’re comfortable with all those ifs, fine. I’m not.
> 
>>>> - Full time partitioning eliminates all timing channels, but it is
>>>> possible only in fully static systems, which severely limits its
>>>> applicability.
>>>>
>>> I’m sorry, but this is simply false.
>>>
>>> What you need for time protection (I assume this is what you mean with 
>>> “full time partitioning”) are fixed time slices – “fixed” in that their 
>>> length cannot depend on any events in the system that can be controlled by 
>>> an untrusted domain. It doesn’t mean they cannot be changed as domains 
>>> come and go.
>>
>> Based on what information should I set these time slices?
> 
> That’s OS/hypervisor policy. Every system I know assigns time slices, that’s 
> normal.
> 
> But note, the strictly fixed (in the sense of not influenceable by user code) 
> time slices are only needed if you want to prevent *all* timing channels, in 
> this case leaking by controlling the timing of context switches.
> 
> This is a relatively low-bandwidth channel, i.e. it will need a minute or two 
> to leak an SSL key. If you’re fine with that then there’s no need for fixed 
> time slices.

Is this with both sides cooperating or not?  Covert channel attacks are out of
scope.

>>>> - Time protection without time partitioning does _not_ fully prevent
>>>> Spectre v1 attacks, and still imposes a large penalty on protection
>>>> domain switches.
>>>>
>>> Time protection does *not* impose a large penalty. Its context-switching 
>>> cost is completely hidden by the cost of an L1 D-cache flush – as has been 
>>> demonstrated by published work. And if you don’t flush the L1 cache, you’ll 
>>> have massive leakage, taint-tracking or not.
>>>
>>> Where time protection, *without further hardware support*, does have a cost 
>>> is for partitioning the lower-level caches. This cost is two-fold:
>>>
>>> 1) Average cache utilisation is reduced due to the static partitioning (in 
>>> contrast to the dynamic partitioning that happens as a side effect of the 
>>> hardware’s cache replacement policy). This cost is generally in the 
>>> single-digit percentage range (as per published work), but can also be 
>>> negative – there’s plenty of work that uses static cache partitioning for 
>>> performance *isolation/improvement*.
>>
>> Static partitioning based on _what_?  On a desktop system, the dynamic
>> behavior of a workload is generally the _only_ information known about
>> that workload, so any partitioning _must_ be dynamic.
> 
> Again, if you trust all your code to not intentionally leak secrets, then you 
> don’t have to do this.
> 
> Cache channels are very high bandwidth. Even cache side-channels have high 
> enough bandwidth to steal encryption keys in minutes.

How much would this be reduced by a cache that was fully associative,
or which emulated full associativity?

> If your threat scenario doesn’t care about this, fine. But there’s no way of 
> preventing cache channels other than flushing or partitioning.

Environments that can use cache flushing or partitioning should.
However, the cost of flushing may be more than one can bear, and static
partitioning requires information that desktop (and mobile) OSs simply
don’t have.  The question then becomes how much one can minimize the
channel before the performance (and power consumption) penalty exceeds
what users are willing to tolerate.

> So, it all depends on your threat scenario.

It doesn’t just depend on threat scenario, but also on what one can afford.

Desktop and mobile OSs run workloads their developers may have never dreamed
of.  And users expect them to remain performant and responsive.  I have yet to
see anyone 

> If your threat scenario is that:
> - your hypervisor/kernel is completely trusted.
> - all secret-handling code is trusted to
>    - not have algorithmic timing channels
>    - not be susceptible to control-flow attacks
>    - be free of Trojans
> … then speculation attacks may be your main worry and you can ignore all the 
> other timing channels, and you *may* be covered by (complete and 
> properly-implemented) speculation taint tracking.

Algorithmic timing channels are the only threat on that list I am concerned
about for the purposes of this discussion.

> That’s a lot of ifs – too many for my comfort.
> 
> Note, this tracking adds a fair amount of complexity to the processor, which 
> means there’s a high likelihood the implementation is buggy. This is in 
> contrast to fence.t, which is extremely simple to implement.

If I can’t use it, it is of no use to me.

>>> And, of course, without partitioning the lower-level caches you have 
>>> leakage again, and taint tracking isn’t going to help there either.
>>>
>>> If people want to improve the hardware, focussing on generic mechanisms 
>>> such as support for partitioning L2-LL caches would be far more beneficial 
>>> than point-solutions that will be defeated by the next class of attacks.
>>
>> I would much rather have a theoretically sound solution than an unsound one.
>> However, it is even more important that my desktop actually be able to do the
>> tasks I expect of it.  To the best of my knowledge, time protection and a
>> usable desktop are incompatible with each other.  I do hope you can prove me
>> wrong here.
> 
> If you want a theoretically sound solution – Welcome to Time Protection!

I _want_ a theoretically sound solution.

I _need_ a usable desktop.

If I must choose between a theoretically sound solution and a usable
system, soundness loses.  Every time.

I _know_ that Qubes OS could use speculative taint tracking if the
hardware supported it.  I can say the same about fully-associative
caches and other pure-hardware countermeasures.

Are you equally certain that Qubes OS could take advantage of time
protection, and do so in a way that is completely transparent to the
user?  If so, how?  “Completely transparent to the user” also means
that the performance penalty must not be more than 30% or so, even
when one protection domain unexpectedly needs access to most of the
system CPU and memory.

> In contrast to your implied claim that time protection is unsound: It’s been 
> formalised, and it’s in the process of being proved correct and complete in 
> an seL4 implementation. Its minimal hardware support has also been implemented in 
> RISC-V processors and demonstrated to be cheap and complete (may even be in silicon 
> by now).

I misunderstood time protection as referring to using fence.t on context
switch, as opposed to also ensuring fixed-size timeslices.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)

_______________________________________________
Devel mailing list -- devel@sel4.systems