Re: Restricting rdtsc [was: kernel aslr]

2017-03-28 Thread Mouse
> Having read several papers on the exploitation of cache latency to
> defeat aslr (kernel or not), it appears that disabling the rdtsc
> instruction is a good mitigation on x86.

I don't really know x86.  But, based on the little I do know, my
reactions are:

(1) Please provide a kernel build option to remove the restriction.
There probably aren't very many systems where fast precise time for
nonprivileged users is considered more important than defeating ASLR,
but I'd be astonished if there were none.  (I'm
reminded of the _huge_ performance hit non-executable stack imposed on
a few of my programs back when it came in; it was so bad I ended up
removing that change from the kernel.)

(2) Does that actually help, or does it just compel the attacker to use
cruder timers and thus longer test runs?  (Or is that enough difference
that you believe it would actually help in practice?)

(3) Do you maybe want to log something, and/or print to the process's
tty and/or the console, so that users whose programs start mysteriously
crashing have at least a fighting chance of figuring out why?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Restricting rdtsc [was: kernel aslr]

2017-03-28 Thread David Young
On Tue, Mar 28, 2017 at 04:58:58PM +0200, Maxime Villard wrote:
> Having read several papers on the exploitation of cache latency to defeat
> aslr (kernel or not), it appears that disabling the rdtsc instruction is a
> good mitigation on x86. However, some applications can legitimately use it,
> so I would rather suggest restricting it to root instead.

I may not understand some of your premises.

Why do you single out the rdtsc instruction instead of other time
sources?

What do you mean by "legitimately" use rdtsc?  It seems to me that it
is legitimate for a user to use a high-resolution timer to profile some
code that's under development.  They may want to avoid running that code
with root privileges under most circumstances.

Dave

-- 
David Young
dyo...@pobox.com    Urbana, IL    (217) 721-9981


Re: Restricting rdtsc [was: kernel aslr]

2017-03-28 Thread Manuel Bouyer
On Tue, Mar 28, 2017 at 11:30:52AM -0500, David Young wrote:
> [...]
> What do you mean by "legitimately" use rdtsc?  It seems to me that it
> is legitimate for a user to use a high-resolution timer to profile some
> code that's under development.  They may want to avoid running that code
> with root privileges under most circumstances.
> 

Sure.
At the very least, a sysctl to remove the restriction is needed.

-- 
Manuel Bouyer 
 NetBSD: 26 years of experience will always make the difference
--


Re: Restricting rdtsc [was: kernel aslr]

2017-03-28 Thread Joerg Sonnenberger
On Tue, Mar 28, 2017 at 11:30:52AM -0500, David Young wrote:
> What do you mean by "legitimately" use rdtsc?  It seems to me that it
> is legitimate for a user to use a high-resolution timer to profile some
> code that's under development.  They may want to avoid running that code
> with root privileges under most circumstances.

In fact, one of the open project items is to make it possible to do
gettimeofday and friends without a system call at all on x86. As usual,
questionable gains in security completely defeat performance again...

Joerg


Re: Restricting rdtsc [was: kernel aslr]

2017-03-28 Thread Maxime Villard

Answering each of your questions in one mail, with a few notes at the end.
First of all, this is just a wild idea from when I was on the train the other
day, and I haven't written any code for it. Then:

On 28/03/2017 at 18:01, Mouse wrote:

(1) Please provide a kernel build option to remove the restriction.
[...]


My original plan was to use sysctls - as suggested by Manuel: one to
enable/disable the feature, another to log the segfaults.
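
In rough C pseudocode - all knob and helper names are hypothetical, since
nothing is written yet - the fault path could look like this:

    static int rdtsc_restricted = 1;    /* sysctl: enable/disable */
    static int rdtsc_log = 0;           /* sysctl: log the segfaults */

    int
    trap_rdtsc(struct lwp *l, struct trapframe *tf)
    {
            if (!rdtsc_restricted ||
                kauth_cred_geteuid(l->l_cred) == 0) {
                    /* emulate: put the TSC in %edx:%eax, skip the insn */
                    emulate_rdtsc(tf);
                    return 0;
            }
            if (rdtsc_log)
                    log(LOG_INFO, "pid %d: rdtsc denied\n",
                        l->l_proc->p_pid);
            return SIGSEGV;
    }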

On 28/03/2017 at 18:01, Mouse wrote:

(2) Does that actually help, or does it just compel the attacker to use
cruder timers and thus longer test runs?  (Or is that enough difference
that you believe it would actually help in practice?)


It does help, and that's the conclusion of most papers. There is, however,
another technique (a software clock) for estimating the number of cycles an
operation takes, but the resulting accuracy is very low, and not sufficient
to detect cache misses via latency.

On 28/03/2017 at 18:30, David Young wrote:

Why do you single out the rdtsc instruction instead of other time
sources?


Because of accuracy. As the papers point out, detecting cache misses
requires a timing resolution of roughly ~50 cycles, and only rdtsc offers
that. Syscalls and other software-based timers have a non-deterministic
overhead bigger than ~50 cycles, which pollutes the relevant information.

On 28/03/2017 at 18:30, David Young wrote:

What do you mean by "legitimately" use rdtsc?  It seems to me that it
is legitimate for a user to use a high-resolution timer to profile some
code that's under development.  They may want to avoid running that code
with root privileges under most circumstances.


Just like you said: some users need to profile code they develop.
They may indeed want to avoid running their tests with root privileges, and
that's where the sysctl is useful - they can disable the feature if they
want to.

A few notes now. In fact, the rdpmc instruction can also be used for side-
channel attacks, but we don't currently enable it, so it does not matter.

Regarding serialization, I may not have been clear enough either. rdtsc is
not serializing, which means it does not wait for the previous instructions
to execute completely before being executed. To compensate for that, the
user first executes a serializing instruction like cpuid, and puts the rdtsc
right after it. With the fault approach, serialization is ensured, because
'iret' is used when returning to userland, and it is serializing. So we get
an 'iret+rdtsc', which has the same effect as 'cpuid+rdtsc'.
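
For illustration, the usual serialized measurement pattern in userland C
(a sketch; assumes x86-64 with gcc/clang inline assembly):

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Serialized TSC read: cpuid drains the pipeline, so the rdtsc
     * cannot be reordered before the preceding instructions.
     */
    static inline uint64_t
    rdtsc_serialized(void)
    {
            uint32_t lo, hi;
            __asm__ __volatile__(
                "xor %%eax, %%eax\n\t"
                "cpuid\n\t"
                "rdtsc"
                : "=a"(lo), "=d"(hi) :: "%rbx", "%rcx");
            return ((uint64_t)hi << 32) | lo;
    }

    int
    main(void)
    {
            uint64_t start = rdtsc_serialized();
            /* ... code under measurement ... */
            uint64_t end = rdtsc_serialized();
            printf("elapsed: %llu cycles\n",
                (unsigned long long)(end - start));
            return 0;
    }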

Also, a detail about my remark on accuracy. The basic use case for rdtsc is
the following:
    start = rdtsc
    work
    end = rdtsc
    elapsed = end - start
Here, we will fault on the first rdtsc; so the kernel will be entered, and
many cycles will be consumed there. But it does not matter, since the first
rdtsc is used as the starting point, and we don't care about adding cycles
before it. Therefore, the number of elapsed cycles is the same, with and
without the feature.

Finally, I'll add that there are other mitigations available for rdtsc, which
consist for example in adding a small random delta to the counter directly,
in order to fuzz the results. But then there is the question of how big this
delta needs to be: big enough to mitigate side channels, small enough to
still give relevant - if slightly inaccurate - information back to userland.
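
A rough sketch of that variant, reusing the fault/emulation approach and
NetBSD's in-kernel cprng_fast32(); the name and the delta bound are
hypothetical:

    #define DELTA_MAX   64      /* cycles; the open tuning question */

    static uint64_t
    fuzzed_tsc(void)
    {
            uint64_t tsc = rdtsc();         /* the real counter */
            uint64_t delta = cprng_fast32() % DELTA_MAX;

            /*
             * Note: a real implementation would also have to keep the
             * fuzzed counter monotonic across successive reads.
             */
            return tsc + delta;
    }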


Re: Restricting rdtsc [was: kernel aslr]

2017-03-28 Thread Mouse
> There is, however, another technique (a software clock) for estimating
> the number of cycles an operation takes, but the resulting accuracy
> is very low, and not sufficient to detect cache misses via latency.

> [D]etecting cache misses requires a timing resolution of roughly ~50
> cycles, and only rdtsc offers that.  Syscalls and other
> software-based timers have a non-deterministic overhead bigger than
> ~50 cycles, which pollutes the relevant information.

But, by repeating observations, it is possible to detect signals well
below the apparent noise floor.  NTP does this with timekeeping
(submicrosecond accuracy over network paths having tens of microseconds
of latency jitter).  GPS receivers do similar things with radio
reception.  I have little-to-no faith that this change would do more
than raise the work factor; my question was essentially asking whether
you believe the increase in work factor would be enough to make it
infeasible.  I am pessimistic on that question, given how long stealthy
malware has been known to run on machines without being noticed.
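
To illustrate with the crudest of timers (a quick sketch, nothing tuned):
amortizing over repetitions recovers a per-iteration cost far below the
clock's own resolution.

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    static uint64_t
    coarse_ns(void)
    {
            struct timespec ts;
            clock_gettime(CLOCK_MONOTONIC, &ts);
            return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
    }

    int
    main(void)
    {
            enum { N = 1000000 };
            volatile int sink = 0;
            uint64_t start = coarse_ns();
            for (int i = 0; i < N; i++)
                    sink += i;      /* the probe under measurement */
            uint64_t total = coarse_ns() - start;
            printf("mean cost: %.2f ns/iteration\n", (double)total / N);
            return 0;
    }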

Hence also my question about changing the kernel's location at runtime.
If the address space base changes every second, say, any technique to
discover it that takes longer than a second becomes useless.

Of course, this is all about something that is, really,
belt-and-suspenders.  That doesn't make it useless, but it _is_ a
second layer, not a primary defense.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Restricting rdtsc [was: kernel aslr]

2017-03-28 Thread Taylor R Campbell
> Date: Tue, 28 Mar 2017 16:58:58 +0200
> From: Maxime Villard 
> 
> Having read several papers on the exploitation of cache latency to defeat
> aslr (kernel or not), it appears that disabling the rdtsc instruction is a
> good mitigation on x86. However, some applications can legitimately use it,
> so I would rather suggest restricting it to root instead.

Put barriers in the way of legitimate applications to thwart
hypothetical attackers who will... step around them and use another
time source, of which there are many options in the system?  This
sounds more like cutting off the nose to spite the face than a good
mitigation against real attacks.


Re: Restricting rdtsc [was: kernel aslr]

2017-03-28 Thread David Young
On Tue, Mar 28, 2017 at 06:47:11PM +0200, Manuel Bouyer wrote:
> On Tue, Mar 28, 2017 at 11:30:52AM -0500, David Young wrote:
> > [...]
> > What do you mean by "legitimately" use rdtsc?  It seems to me that it
> > is legitimate for a user to use a high-resolution timer to profile some
> > code that's under development.  They may want to avoid running that code
> > with root privileges under most circumstances.
> > 
> 
> Sure.
> At the very last a sysctl to remove the restriction is needed.

Just to expand on that, an interface to set the restriction on a
per-process (per-thread?) level would be handy.

Capabilities beckon! :-)

Dave

-- 
David Young
dyo...@pobox.com    Urbana, IL    (217) 721-9981


Re: Restricting rdtsc [was: kernel aslr]

2017-03-28 Thread Paul.Koning

> On Mar 28, 2017, at 2:37 PM, Taylor R Campbell wrote:
> 
>> Date: Tue, 28 Mar 2017 16:58:58 +0200
>> From: Maxime Villard 
>> 
>> Having read several papers on the exploitation of cache latency to defeat
>> aslr (kernel or not), it appears that disabling the rdtsc instruction is a
>> good mitigation on x86. However, some applications can legitimately use it,
>> so I would rather suggest restricting it to root instead.
> 
> Put barriers in the way of legitimate applications to thwart
> hypothetical attackers who will... step around them and use another
> time source, of which there are many options in the system?  This
> sounds more like cutting off the nose to spite the face than a good
> mitigation against real attacks.

More generally, it seems to me that the answer to timing attacks is not to
attempt to make timing information unavailable (which is not doable, as has
been explained already) -- but rather to fix the algorithm to remove the
vulnerability.

paul



Re: Restricting rdtsc [was: kernel aslr]

2017-03-28 Thread Maxime Villard

On 28/03/2017 at 20:32, Mouse wrote:

But, by repeating observations, it is possible to detect signals well
below the apparent noise floor. [...] my question was essentially asking
whether you believe the increase in work factor would be enough to make it
infeasible.


Theoretically, it does indeed only increase the work factor. I don't have
a clear answer on how big this factor is; but as a comparison, one of the
papers I found demonstrated that fuzzing the counter with a delta increased
the average computation time from five minutes to ten years, even though it
is a relatively simple technique.

Now, if we also take into account the noise inherent in a software-based
time source (scheduling, interrupts, locks, TLB/cache hits/misses), I would
naively say the increase in work factor is a good deterrent.

But I agree that demonstrating it would require more research than the
effort I'm currently putting into it.

On 28/03/2017 at 20:32, Mouse wrote:

Hence also my question about changing the kernel's location at runtime.
If the address space base changes every second, say, any technique to
discover it that takes longer than a second becomes useless.


I already thought about this a few months ago, and my conclusion back then
was that it is very difficult to achieve if we want both good performance
and good security. This is a little off-topic, but the idea would be to have
two identical kernel text segments mapped at different addresses. Only
one kernel is active at a time. Every once in a while we randomize the other
kernel, wait for interrupts to happen in the currently running lwps, and
migrate these lwps to the new kernel, dropping refcounts along the way. When
the refcount reaches zero, everybody uses the new kernel, and we unmap the
previous one. And we keep jumping between kernels this way regularly. I also
had other magic tricks for .data and .rodata, but that's another debate.

On 28/03/2017 at 20:37, Taylor R Campbell wrote:

Put barriers in the way of legitimate applications to thwart
hypothetical attackers who will... step around them and use another
time source, of which there are many options in the system?  This
sounds more like cutting off the nose to spite the face than a good
mitigation against real attacks.


Kind of, but so far it is the only viable fix. Other mitigations exist, such
as flushing caches on each context switch, preventing two threads of the same
process from being scheduled together on a single cpu, detecting intentional
races that betray a side-channel attack, etc., but each of these seems
complicated to implement and may drastically impact performance, while not
being particularly more effective than just restricting rdtsc.

As Paul said, the real solution would be to change the "algorithm", but it is
up to the vendors to do so. There's nothing else I can tell you here.


Re: Restricting rdtsc [was: kernel aslr]

2017-03-28 Thread Thomas Klausner
On Tue, Mar 28, 2017 at 10:36:52PM +0200, Maxime Villard wrote:
> I already thought about this a few months ago, and my conclusion back then
> was that it is very difficult to achieve if we want both good performance
> and good security. This is a little off-topic, but the idea would be to have
> two identical kernel text segments mapped at different addresses. Only
> one kernel is active at a time. Every once in a while we randomize the other
> kernel, wait for interrupts to happen in the currently running lwps, and
> migrate these lwps to the new kernel, dropping refcounts along the way. When
> the refcount reaches zero, everybody uses the new kernel, and we unmap the
> previous one. And we keep jumping between kernels this way regularly. I also
> had other magic tricks for .data and .rodata, but that's another debate.

This would be a step toward allowing updates of running kernels,
wouldn't it?
 Thomas


Re: Restricting rdtsc [was: kernel aslr]

2017-03-28 Thread Mouse
>> Hence also my question about changing the kernel's location at
>> runtime.  If the address space base changes every second, say, any
>> technique to discover it that takes longer than a second becomes
>> useless.
> I already thought about this a few months ago, and my conclusion back
> then was that it is very difficult to achieve if we want both good
> performance and good security.

I suspect you aren't being imaginative enough. :-)

If I were to do this, I would first make (or arrange for) compiler
options so that all memory references - both data references and
jump/call targets - are done relative to a base register.  The kernel
is built that way.  Then, whenever we want to, we (a) fiddle the MMU
and (b) change that register.  Instant kernel relocation!

Of course, that register is not exposed to userland.  Syscalls and
interrupts need a little more shim code than they have now, but not by
very much.
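
As a concrete sketch of what I mean, using GCC's global register variable
extension (register choice arbitrary; the whole kernel would be built with
-ffixed-r14 so nothing else touches that register):

    /* every kernel memory reference goes through this register */
    register char *kern_base __asm__("r14");

    #define KVAR(off)   (*(volatile uint64_t *)(kern_base + (off)))

    void
    relocate_kernel(char *newbase)
    {
            /* (a) fiddle the MMU: map the kernel at newbase... */
            /* (b) change the register: */
            kern_base = newbase;
    }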

I'm not sure it's actually workable.  But it sounds plausible enough
that I wouldn't discard it without trying it (or discussing it with
someone who has).  The memory access stuff might impose too much
performance penalty, but that too I wouldn't assume without testing.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Restricting rdtsc [was: kernel aslr]

2017-03-28 Thread Alexander Nasonov
Maxime Villard wrote:
> Having read several papers on the exploitation of cache latency to defeat
> aslr (kernel or not), it appears that disabling the rdtsc instruction is a
> good mitigation on x86. However, some applications can legitimately use it,
> so I would rather suggest restricting it to root instead.

Why does root need it? For ntp? Properly implemented ntp should be
privsep'ed.

I think this should be either all-or-nothing. You either have rdtsc as
a time source or you don't. Similar for rdpmc (and other performance
counters).

-- 
Alex


Re: Restricting rdtsc [was: kernel aslr]

2017-03-28 Thread Thor Lancelot Simon
On Tue, Mar 28, 2017 at 04:58:58PM +0200, Maxime Villard wrote:
> Having read several papers on the exploitation of cache latency to defeat
> aslr (kernel or not), it appears that disabling the rdtsc instruction is a
> good mitigation on x86. However, some applications can legitimately use it,
> so I would rather suggest restricting it to root instead.

This will break a ton of stuff.  Code all over the place checks if it's
on x86 and uses rdtsc when it wants a quick timestamp.

Thor


Re: Restricting rdtsc [was: kernel aslr]

2017-03-30 Thread Steffen Nurpmeso
Maxime Villard  wrote:
 |Having read several papers on the exploitation of cache latency to defeat
 |aslr (kernel or not), it appears that disabling the rdtsc instruction is a
 |good mitigation on x86. However, some applications can legitimately use it,
 |so I would rather suggest restricting it to root instead.

I have used it for random noise in user space.  I don't want to
paste it, it is so ridiculous…, but then again it is a nice example of
user space horror – you may skip the rest at will.

 |The idea is simple: we set CR4_TSD in %cr4, the first time an application
 |uses rdtsc it faults, we look at the creds of the lwp, if it is root we

I used it to add noise to my ARC4 random generator at operator()()/call()
time, as in

    // strong (noisy) generator?
    if (m_d.flags & f_strong) {
    #if (__HAVE_RAND_CRYPTOHW)
            if (__RAND_CRYPTOHW_OK) {
                    ret = ::__sf_sys_misc_rand_Strong();
                    goto jout;
            } else
    #endif
                    addNoise();
    }


where this was

    #if (__HAVE_RAND_CRYPTOHW)
            if (!m_d.enpy)
                    goto jout;
    #endif
    #if (!__HAVE_RAND_NOISE)
            ep.now().setSecond(ep.second() ^ ep.microsecond())
                .setMicrosecond(_WEAK(ep.microsecond()));
            addNoise(ep.tv(), szof(Epoch::TimeVal));
    #else
            x = ::__sf_sys_misc_rand_Noise();
            stack[0] = x;
            x = _WEAK(x);
            stack[1] = x;
            addNoise(stack, szof(stack));
    #endif
    #if (__HAVE_RAND_CRYPTOHW)
    jout:
    #endif

and addNoise() with arguments looped over "random" bytes of the
given "stack" as noise additions to the internal entropy (doing one
ARC4 stir after each addition).

 |What about this?

No longer of any value, it seems to me.

--steffen


Re: Restricting rdtsc [was: kernel aslr]

2017-03-31 Thread Andreas Gustafsson
Maxime Villard wrote:
> Having read several papers on the exploitation of cache latency to defeat
> aslr (kernel or not), it appears that disabling the rdtsc instruction is a
> good mitigation on x86. However, some applications can legitimately use it,
> so I would rather suggest restricting it to root instead.

It's ASLR that's broken, not rdtsc, and I strongly object to
restricting the latter just so that people can continue to gain
a false sense of security from the former.
-- 
Andreas Gustafsson, g...@gson.org


Re: Restricting rdtsc [was: kernel aslr]

2017-04-04 Thread Maxime Villard

sorry for the delay

On 31/03/2017 at 19:23, Andreas Gustafsson wrote:

It's ASLR that's broken, not rdtsc, and I strongly object to
restricting the latter just to that people can continue to gain
a false sense of security from the former.


For your information, side channels are not limited to aslr. It has also
been demonstrated that cache latencies can be used to keylog a privileged
process, and to steal cryptographic keys.

On 28/03/2017 at 23:17, Mouse wrote:

I suspect you aren't being imaginative enough. :-)

If I were to do this, I would first make (or arrange for) compiler
options so that all memory references - both data references and
jump/call targets - are done relative to a base register.  The kernel
is built that way.  Then, whenever we want to, we (a) fiddle the MMU
and (b) change that register.  Instant kernel relocation!


This is not a matter of being imaginative; it is called segmentation, and it
was abandoned twenty years ago. The principle was exactly what you are
describing: each memory reference would be indexed by a segment register
(%cs, %ds, etc.) that had a base address and a limit. Originally it was used
to provide privilege separation; later, when paging was introduced, each
operating system switched to the flat-memory model (base = 0, limit =
maximum). Now, the amd64 hardware does not enforce the base and limit on
several segment registers.

If you remember the USER_LDT thread from a few weeks ago, the problem with
our amd64 is precisely that we expect segment registers to be useless, and
therefore we don't allow them to have values other than the static
flat-memory ones. The problem is, under netbsd32+USER_LDT, segment registers
are useful.

Even if segmentation were still available on amd64, a number of different
workarounds would be needed to maintain consistency with paging, which in
short could have a significant performance impact.

Apart from segment registers, you won't find a simple way of indexing memory
references; so no, this is not workable.

On 29/03/2017 at 01:17, Thor Lancelot Simon wrote:

This will break a ton of stuff.  Code all over the place checks if it's
on x86 and uses rdtsc when it wants a quick timestamp.


There is no code "all over the place" that uses rdtsc; just look at NXR: we
don't have any program in base that uses it. It is like RWX pages and
pax_mprotect, or even mapping NULL; some programs need to be special-cased,
but apart from that it does not create a huge mess.

On 29/03/2017 at 00:49, Alexander Nasonov wrote:

I think this should be either all-or-nothing. You either have rdtsc as
a time source or you don't. Similar for rdpmc (and other performance
counters).


Well, the idea was to make the availability more fine-grained.


Seeing the general skepticism that prevails, I guess we can just forget about
this idea.


Re: Restricting rdtsc [was: kernel aslr]

2017-04-04 Thread Thor Lancelot Simon
On Tue, Apr 04, 2017 at 05:39:35PM +0200, Maxime Villard wrote:
> sorry for the delay
> 
> On 31/03/2017 at 19:23, Andreas Gustafsson wrote:
> > It's ASLR that's broken, not rdtsc, and I strongly object to
> > restricting the latter just so that people can continue to gain
> > a false sense of security from the former.
> 
> For your information, side channels are not limited to aslr. It has also
> been demonstrated that cache latencies can be used to keylog a privileged
> process, and to steal cryptographic keys.

Time is a basic operating system service.  Lack of cheap precision time is
not an _advantage_ of NetBSD; it is a disadvantage.

As others have noted, our general intention has been to _reduce_ the cost
to an application of obtaining timestamps in general -- by providing a
commpage with a base value, and allowing libc to use the cycle counter
as a no-system-calls-required way to obtain an offset.  Other operating
systems do this and it is a real advantage for many applications.  If we
block userland access to the cycle counters, this is a nonstarter.
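
A sketch of how such a commpage works (field names are illustrative; a
real version also needs a seqlock or generation counter against concurrent
kernel updates, omitted here):

    #include <stdint.h>
    #include <x86intrin.h>          /* __rdtsc() */

    /* updated periodically by the kernel, mapped read-only in userland */
    struct commpage_time {
            volatile uint64_t base_ns;      /* wall clock at last update */
            volatile uint64_t base_tsc;     /* TSC at last update */
            volatile uint64_t ns_per_cyc;   /* 32.32 fixed point */
    };

    static uint64_t
    fast_gettime_ns(const struct commpage_time *cp)
    {
            uint64_t dt = __rdtsc() - cp->base_tsc;
            /*
             * Cycles -> nanoseconds; assumes updates are frequent
             * enough that the multiplication cannot overflow.
             */
            return cp->base_ns + ((dt * cp->ns_per_cyc) >> 32);
    }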

Yes, the ability of malicious code to measure the behavior of critical
system components and facilities is a problem, but I tend to believe the
solution has to be in the implementation of those components and facilities,
not in removing the ability of non-malicious code to make precision
measurements.

We may not have applications in base that use rdtsc to get quick
timestamps, but they're sure out there.  OpenSSL's MD code used to
use it -- has that changed? -- and I've seen it in database applications,
language runtimes, and numerous other places.  I really don't think it
would be a good idea to cause it to not work in the general case.

Thor


Re: Restricting rdtsc [was: kernel aslr]

2017-04-04 Thread Alexander Nasonov
Maxime Villard wrote:
> On 29/03/2017 at 00:49, Alexander Nasonov wrote:
> > I think this should be either all-or-nothing. You either have rdtsc as
> > a time source or you don't. Similar for rdpmc (and other performance
> > counters).
> 
> Well, the idea was to make the availability more fine-grained.
> 
> 
> Seeing the general skepticism that prevails, I guess we can just forget about
> this idea.

There are two more or less independent things: a fine-grained time source
and userspace rdtsc. The latter is often used directly when a vdso isn't
available. If we implement a vdso, I assume that software that needs rdtsc
can be taught to call it via the vdso.

With vdso implemented, we can have a flag that enables/disables
vdso globally as well as per process (paxctl?). Independently,
the kernel can be configured to use either a fine-grained or a hackerproof
time source for regular (non-vdso) system calls.

Alex


Re: Restricting rdtsc [was: kernel aslr]

2018-01-05 Thread Alexander Nasonov
Taylor R Campbell wrote:
> > Date: Tue, 28 Mar 2017 16:58:58 +0200
> > From: Maxime Villard 
> > 
> > Having read several papers on the exploitation of cache latency to defeat
> > aslr (kernel or not), it appears that disabling the rdtsc instruction is a
> > good mitigation on x86. However, some applications can legitimately use it,
> > so I would rather suggest restricting it to root instead.
> 
> Put barriers in the way of legitimate applications to thwart
> hypothetical attackers who will... step around them and use another
> time source, of which there are many options in the system?  This
> sounds more like cutting off the nose to spite the face than a good
> mitigation against real attacks.

Old thread, but the authors of the Spectre paper did exactly what Taylor said:

https://spectreattack.com/spectre.pdf

"JavaScript does not provide access to the rdtscp instruction, and
Chrome intentionally degrades the accuracy of its high-resolution
timer to dissuade timing attacks using performance.now() [1]. However,
the Web Workers feature of HTML5 makes it simple to create a separate
thread that repeatedly decrements a value in a shared memory location
[18, 32]. This approach yielded a high-resolution timer that provided
sufficient resolution."
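
For illustration, the same counting-thread trick in C (a sketch using
pthreads; the resolution is a few cycles per increment):

    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    static volatile uint64_t ticks;

    static void *
    counter_thread(void *arg)
    {
            (void)arg;
            for (;;)
                    ticks++;        /* advances every few cycles */
            return NULL;
    }

    int
    main(void)
    {
            pthread_t t;
            pthread_create(&t, NULL, counter_thread, NULL);
            uint64_t t0 = ticks;
            /* ... memory access being probed, as one would time
             * with rdtsc ... */
            uint64_t t1 = ticks;
            printf("elapsed: %llu ticks\n", (unsigned long long)(t1 - t0));
            return 0;
    }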

-- 
Alex