On 8/11/2011 12:55 PM, David Barbour wrote:
On Wed, Aug 10, 2011 at 7:35 PM, BGB <cr88...@gmail.com> wrote:
not all code may be from trusted sources.
consider, say, code comes from the internet.
what is a "good" way of enforcing security in such a case?
Object capability security is probably the very best approach
available today - in terms of a wide variety of criteria such
as flexibility, performance, precision, visibility, awareness,
simplicity, and usability.
In this model, ability to send a message to an object is sufficient
proof that you have rights to use it - there are no passwords, no
permissions checks, etc. The security discipline involves controlling
who has access to which objects - i.e. there are a number of
patterns, such as 'revocable forwarders', where you'll provide an
intermediate object that allows you to audit and control access to
another object. You can read about several of these patterns on the
erights wiki [1].
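(as an illustration of the pattern, not code from the erights wiki: a revocable forwarder can be sketched in a few lines of Python. all names here are hypothetical.)

```python
# Sketch of a revocable forwarder: the holder gets a proxy object, and
# the grantor keeps a revoke() capability that can cut off access later.
class Revoked(Exception):
    pass

def make_revocable(target):
    state = {"target": target}

    class Forwarder:
        def __getattr__(self, name):
            if state["target"] is None:
                raise Revoked("capability has been revoked")
            return getattr(state["target"], name)

    def revoke():
        state["target"] = None  # drop the only path to the real object

    return Forwarder(), revoke

# usage: hand out `proxy` to untrusted code, keep `revoke` for yourself
class FileService:
    def read(self, path):
        return "contents of " + path

proxy, revoke = make_revocable(FileService())
print(proxy.read("/etc/motd"))  # works while the grant is live
revoke()
# proxy.read(...) now raises Revoked
```

the point is that the recipient never holds the real object, so taking the right away later is just a matter of clearing one reference.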
the big problem though:
trying to implement this as the sole security model, and expecting it
to be effective, would likely impact language design and programming
strategy, and could require a fair amount of effort WRT "hole
plugging" in an existing project.
granted, code will probably not use logins/passwords for authority, as
this would likely be horridly ineffective for code (about as soon as a
piece of malware learns the login used by a piece of "trusted" code, it
can impersonate that code and do whatever it wants).
"digital signing" is another possible strategy, but poses a similar problem:
how to effectively prevent spoofing (say, one manages to "extract" the
key from a trusted app, and then signs a piece of malware with it).
AFAICT, the usual strategy used with SSL certificates is that they may
expire and are checked against a "certificate authority". although maybe
reasonably effective for the internet, this seems to be a fairly complex
and heavy-weight approach (not ideal for software, especially not FOSS,
as most such authorities want money and require signing individual
binaries, ...).
my current thinking is roughly along the line that each piece of code
will be given a "fingerprint" (possibly an MD5 or SHA hash), and this
fingerprint is either known good to the VM itself (for example, its own
code, or code that is part of the host application), or may be confirmed
as "trusted" by the user (if it requires special access, ...).
it is a little harder to spoof a hash, and tampering with a piece of
code will change its hash (although with simpler hashes, such as
checksums and CRCs, it is often possible to use a glob of "garbage
bytes" to trick the checksum algorithm into giving the desired value).
yes, there is still always the risk of a naive user confirming a piece
of malware, but this is their own problem at this point.
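a rough Python sketch of the fingerprint idea (SHA-256 is used here since MD5 is collision-prone; the trust-store names are hypothetical):

```python
# Sketch: fingerprint each code blob with SHA-256 and check it against
# a trust store before granting elevated access.
import hashlib

TRUSTED = set()  # fingerprints known-good to the VM, or confirmed by the user

def fingerprint(code_bytes):
    return hashlib.sha256(code_bytes).hexdigest()

def mark_trusted(code_bytes):
    TRUSTED.add(fingerprint(code_bytes))

def is_trusted(code_bytes):
    return fingerprint(code_bytes) in TRUSTED

host_module = b"(builtin code shipped with the VM)"
mark_trusted(host_module)                  # e.g. the VM's own code

downloaded = b"(script fetched from the internet)"
assert is_trusted(host_module)
assert not is_trusted(downloaded)          # this is where the user prompt goes
```

any tampering changes the hash, so the prompt only has to happen once per distinct blob.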
Access to FFI and such would be regulated through objects. This leaves
the issue of deciding: how do we decide which objects untrusted code
should get access to? Disabling all of FFI is often too extreme.
potentially.
my current thinking is, granted, that it will disable access to the "FFI
access object" (internally called "ctop" in my VM), which would disable
the ability to fetch new functions/... from the FFI (or perform "native
import" operations with the current implementation).
however, if retrieved functions are still accessible, it might be
possible to retrieve them indirectly and then make them visible this way.
as noted in another message:
native import C.math;
var mathobj={sin: sin, cos: cos, tan: tan, ...};
giving access to "mathobj" will still allow access to these functions,
without necessarily giving access to "the entire C toplevel", which
poses a much bigger security risk.
sadly, there is no really good way to safely "streamline" this in the
current implementation.
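the "mathobj" idea above amounts to exposing a whitelisted facet of a module rather than the module itself. a minimal Python sketch (using the stdlib math module as a stand-in for the FFI-imported C toplevel):

```python
# Sketch: expose only a whitelisted facet of a module, instead of the
# whole toplevel. The facet is a copy, so it holds no reference back to
# the full module.
import math

def make_facet(source, allowed):
    # copy out only the allowed names
    return {name: getattr(source, name) for name in allowed}

mathobj = make_facet(math, ["sin", "cos", "tan"])

print(mathobj["sin"](0.0))
# mathobj gives access to sin/cos/tan, but provides no route back to
# the rest of `math`, let alone "the entire C toplevel"
```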
My current design: FFI is a network of registries. Plugins and
services publish FFI objects (modules) to these registries. Different
registries are associated with different security levels, and there
might be connections between them based on relative trust and
security. A single FFI plugin might provide similar objects at
multiple security levels - e.g. access to HTTP service might be
provided at a low security level for remote addresses, but at a high
security level that allows for local (127, 192.168, 10.0.0, etc.)
addresses. One reason to favor plugin-based FFI is that it is easy to
develop security policy for high-level features compared to low-level
capabilities. (E.g. access to generic 'local storage' is lower
security level than access to 'filesystem'.)
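the registry design above might be sketched roughly like this (Python used for illustration; the levels, names, and PermissionError behavior are my assumptions, not the actual design):

```python
# Sketch of security-leveled FFI registries: plugins publish modules at
# a given level, and callers can only look up modules at or below their
# own level.
LOW, HIGH = 0, 1

class Registry:
    def __init__(self):
        self._modules = {}  # name -> (level, module)

    def publish(self, name, module, level):
        self._modules[name] = (level, module)

    def lookup(self, name, caller_level):
        level, module = self._modules[name]
        if level > caller_level:
            raise PermissionError(name + " requires a higher security level")
        return module

reg = Registry()
reg.publish("http.remote", object(), LOW)   # remote addresses only
reg.publish("http.local", object(), HIGH)   # 127.*, 192.168.*, 10.*, ...

reg.lookup("http.remote", caller_level=LOW)             # allowed
# reg.lookup("http.local", caller_level=LOW)  -> PermissionError
```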
my FFI is based on bulk importing the contents of C headers.
although fairly powerful and convenient, "securing" such a beast is
likely to be a bit of a problem.
easier just to be like "code which isn't trusted can't directly use the
FFI...".
Other than security, my design is to solve other difficult problems
involving code migration [2], multi-process and distributed
extensibility (easy to publish modules to registries even from other
processes or servers; similar to web-server CGI), smooth transitions
from legacy, extreme resilience and self-healing (multiple fallbacks
per FFI dependency), and policy & configuration management [3].
[1] http://wiki.erights.org/wiki/Walnut/Secure_Distributed_Computing
[2] http://wiki.erights.org/wiki/Unum
[3] http://c2.com/cgi/wiki?PolicyInjection
I had done code migration in the past, but sadly my VM's haven't had
this feature in a fairly long time (many years).
even then, it had a few ugly problems:
the migration essentially involved transparently sending the AST, and
recompiling it on the other end. a result of this was that closures
would tend to lose the "identity" of their lexical scope.
...
over a socket, it had used a model where many data types (lists/...)
were essentially passed as copies;
things like builtin and native functions simply bound against their
analogues on the other end (code in C land was unique to each node);
objects were "mirrored" with an asynchronous consistency model (altering
an object would send slot-change messages to the other nodes which held
copies);
other object types were passed-by-handle (basically, it identifies the
NodeID and ObjectID for a remote object);
...
some later ideas (for reviving the above) involved essentially
mirroring a virtual heap over the network (using a system similar to
"far pointers" and "segmented addressing"), but this would have
introduced many nasty problems, and didn't go anywhere.
if I ever do get around to re-implementing something like this, I will
probably use a variation of my original strategy, except that I would
probably leave objects as being remotely accessed via handles, rather
than trying to mirror them and keep them in sync (or, if mirroring is
used, effectively using a "synchronized write" strategy of some sort...).
the second thing seems to be the option of moving the code to a
local toplevel where its ability to see certain things is severely
limited.
Yes, this is equivalent to controlling which 'capabilities' are
available in a given context. Unfortunately, developers lack
'awareness' - i.e. it is not explicit in code that certain
capabilities are needed by a given library, so failures occur much
later when the library is actually loaded. This is part of why I
eventually abandoned dynamic scopes (where 'dynamic scope' would
include the toplevel [4]).
"dynamic scope" in my case refers to something very different.
I generally call the objects+delegation model "object scope", which is
the main model used by the toplevel.
it differs some for import:
by default, "import" actually exists in terms of the lexical scope (it
is internally a delegate lexical variable);
potentially confusingly, for "delegate import" the import is actually
placed into the object scope (directly into the containing package or
toplevel object), which is part of the reason for its unique semantics.
say (at the toplevel):
extern delegate import foo.bar;
actually does something roughly similar to:
load("foo/bar.bs"); //not exactly, but it is a similar idea...
delegate var #'foo/bar'=#:"foo/bar"; //sort of...
in turn invoking more funky semantics in the VM.
note: #'...' and #:"..." is basically syntax for allowing identifiers
and keywords containing otherwise invalid characters (characters invalid
for identifiers).
[4] http://c2.com/cgi/wiki?ExplicitManagementOfImplicitContext
ok.
simply disabling compiler features may not be sufficient
It is also a bad idea. You end up with 2^N languages for N switches.
That's hell to test and verify. Libraries developed for different sets
of switches will consequently prove buggy when people try to compose
them. This is even more documentation to manage.
it depends on the nature of the features and their impact on the language.
if trying to use a feature simply makes code using it invalid ("sorry, I
can't let you do that"), this works.
if it leaves the code still valid but with different semantics, or
enabling a feature changes the semantics of code written with it
disabled, well, this is a bit more ugly...
but, yes, sadly, I am already having enough issues with seemingly
endless undocumented/forgotten features, and features which were mostly
implemented but are subtly broken (for example, I was earlier fixing a
feature which existed in the parser/compiler, but depended on an opcode
which for whatever reason was absent from the bytecode interpreter, ...).
but, with a language/VM existing for approx 8 years and with ~ 540
opcodes, ... I guess things like this are inevitable.
anything still visible may be tampered with, for example, suppose
a global package is made visible in the new toplevel, and the
untrusted code decides to define functions in a system package,
essentially overwriting the existing functions
Indeed. Almost every language built for security makes heavy use of
immutable objects. They're easier to reason about. For example, rather
than replacing the function in the package, you would be forced to
create a new record that is the same as the old one but replaces one
of the functions.
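the "new record instead of in-place patch" discipline can be sketched in Python with a read-only mapping (MappingProxyType is just one way to get immutability; the helper names are hypothetical):

```python
# Sketch of the immutable-record discipline: code cannot patch a
# function in a shared package; it must build a new record with one
# slot replaced, leaving the original untouched.
from types import MappingProxyType

def make_package(**funcs):
    return MappingProxyType(dict(funcs))    # read-only view

def with_slot(pkg, name, fn):
    updated = dict(pkg)
    updated[name] = fn
    return MappingProxyType(updated)        # new record; old one unchanged

sys_pkg = make_package(greet=lambda: "hello")
my_pkg = with_slot(sys_pkg, "greet", lambda: "hijacked?")

assert sys_pkg["greet"]() == "hello"        # system package is unaffected
assert my_pkg["greet"]() == "hijacked?"     # only the new record differs
```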
Access to mutable state is more tightly controlled - i.e. an explicit
capability to inject a new stage in a pipeline, rather than implicit
access to a variable. We don't lose any flexibility, but the 'path of
least resistance' is much more secure.
yes, but this isn't as ideal in a pre-existing language where nearly
everything is highly mutable.
in this case, creation of security may involve... "write protecting"
things...
a basic security mechanism then is that, by default, most non-owned
objects will be marked read-only.
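the "write protecting" idea might look something like this (a Python sketch of a read-only view over an existing mutable object; the class names are made up):

```python
# Sketch of marking a non-owned object read-only: a proxy that forwards
# reads to the real object but rejects all writes.
class ReadOnlyError(Exception):
    pass

class ReadOnly:
    def __init__(self, target):
        object.__setattr__(self, "_target", target)  # bypass our own guard

    def __getattr__(self, name):
        # reads pass through to the underlying object
        return getattr(object.__getattribute__(self, "_target"), name)

    def __setattr__(self, name, value):
        raise ReadOnlyError("object is marked read-only")

class Config:
    path = "/usr"

view = ReadOnly(Config())
print(view.path)          # reads work
# view.path = "/tmp"      # raises ReadOnlyError
```

the owner keeps the raw object and can still mutate it; only the untrusted holder of the view is locked out.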
an exposed API function may indirectly give untrusted code
"unexpected levels of power" if it, by default, has unhindered
access to the system, placing additional burden on library code
not to perform operations which may be exploitable
This is why whitelisting, rather than blacklisting, should be the rule
for security.
but whitelisting is potentially much more effort than blacklisting, even
if somewhat better from a security perspective.
assigning through a delegated object may in-turn move up and
assign the variable in a delegated-to object (at the VM level
there are multiple assignment operators to address these different
cases, namely which object will have a variable set in...).
The security problem isn't delegation, but rather the fact that this
chaining is 'implicit' so developers easily forget about it and thus
leave security holes.
A library of security patterns could help out. E.g. you could ensure
your revocable forwarders and facet-pattern constructors also provide
barriers against propagation of assignment.
potentially, or use cloning rather than delegation chaining (however, in
my VM, it is only possible to clone from a single object, whereas one
may do N-way delegation, making delegation generally more convenient for
building the toplevel).
my current thinking is that basically assignment delegation will stop
once an object is hit which is read-only, forcing the assignment into a
"nearer" object. trying to force-assign into a read-only object will
result in an exception or similar.
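the rule above (assignment walks the delegation chain, stops at read-only objects, and force-assigning into a read-only object throws) can be sketched like this in Python; the Obj slot model is a stand-in for the VM's actual object representation:

```python
# Sketch: on assignment, walk the delegation chain looking for the slot;
# a read-only link stops the walk, forcing the assignment into a
# "nearer" (writable) object instead.
class Obj:
    def __init__(self, delegate=None, read_only=False):
        self.slots = {}
        self.delegate = delegate
        self.read_only = read_only

def delegated_assign(obj, name, value):
    cur = obj
    while cur is not None:
        if name in cur.slots:
            if cur.read_only:
                break                   # stop: fall back to a nearer object
            cur.slots[name] = value     # assign where the slot lives
            return
        cur = cur.delegate
    if obj.read_only:
        raise TypeError("cannot assign into a read-only object")
    obj.slots[name] = value             # nearest writable object wins

toplevel = Obj(read_only=True)
toplevel.slots["print"] = "system print"
local = Obj(delegate=toplevel)

delegated_assign(local, "print", "shadowed print")
assert toplevel.slots["print"] == "system print"   # toplevel untouched
assert local.slots["print"] == "shadowed print"    # shadowed locally
```

so untrusted code can shadow a toplevel binding for itself, but cannot clobber it for everyone else.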
in general though, trying to assign top-level bindings (which are
usually things like API functions) is probably bad practice anyway.
could a variation of, say, the Unix security model, be applied at
the VM level?
Within the VM, this has been done before, e.g. Java introduced thread
capabilities. But the Unix security model is neither simple nor
flexible nor efficient, especially for fine-grained delegation. I
cannot recommend it. But if you do pursue this route: it has been done
before, and there's a lot of material you can learn from. Look up
LambdaMoo, for example.
a search for LambdaMoo turns up a MUD, if this is what was in question...
I partly patched the model in last night, and the performance overhead
should be "modest" in the common case.
as for "simple" or "efficient", a Unix-style security model doesn't look
all that bad. at least I am not looking at implementing ACLs or a
Windows-style security model, which would be a fair amount more complex
and slower (absent static checking and optimization).
luckily, there are only a relatively small number of places I really
need to put in security checks (mostly in the object system and
similar). most of the rest of the typesystem or VM doesn't really need them.
or such...
_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc