On Thu, Aug 11, 2011 at 19:06, BGB <cr88...@gmail.com> wrote:
On 8/11/2011 12:55 PM, David Barbour wrote:
On Wed, Aug 10, 2011 at 7:35 PM, BGB <cr88...@gmail.com> wrote:
not all code may be from trusted sources.
consider, say, code that comes from the internet.
what is a "good" way of enforcing security in such a case?
Object capability security is probably the very best approach
available today - in terms of a wide variety of criteria such
as flexibility, performance, precision, visibility, awareness,
simplicity, and usability.
In this model, the ability to send a message to an object is
sufficient proof that you have rights to use it - there are no
passwords, no permission checks, etc. The security discipline
involves controlling who has access to which objects - i.e.
there are a number of patterns, such as 'revocable forwarders',
where you'll provide an intermediate object that allows you to
audit and control access to another object. You can read about
several of these patterns on the erights wiki [1].
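(a rough Python sketch of a revocable forwarder, just to make
the pattern concrete; the class and names here are invented,
not taken from any particular capability library:)

class RevocableForwarder:
    # forwards messages to the target until revoke() is called
    def __init__(self, target):
        self._target = target
    def __getattr__(self, name):
        if self._target is None:
            raise PermissionError("capability has been revoked")
        return getattr(self._target, name)
    def revoke(self):
        self._target = None

# hand untrusted code the forwarder, never the raw object:
#   files = RevocableForwarder(real_file_service)
#   untrusted_plugin.run(files)
#   files.revoke()   # later: cut off access without touching the plugin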
the big problem though:
trying to implement this as the sole security model, and expecting
it to be effective, would likely impact language design and
programming strategy, and could mean a fair amount of effort spent
"hole plugging" in an existing project.
granted, code will probably not use logins/passwords for
authority, as this would likely be horridly ineffective for code
(as soon as a piece of malware knows the login used by a piece
of "trusted" code, it can impersonate that code and do whatever
it wants).
"digital signing" is another possible strategy, but poses a
similar problem:
how to effectively prevent spoofing (say, one manages to "extract"
the key from a trusted app, and then signs a piece of malware with
it).
AFAICT, the usual strategy used with SSL certificates is that they
may expire and are checked against a "certificate authority".
although maybe reasonably effective for the internet, this seems
to be a fairly complex and heavy-weight approach (not ideal for
software, especially not FOSS, as most such authorities want money
and require signing individual binaries, ...).
my current thinking is roughly along the line that each piece of
code will be given a "fingerprint" (possibly an MD5 or SHA hash),
and this fingerprint is either known good to the VM itself (for
example, its own code, or code that is part of the host
application), or may be confirmed as "trusted" by the user (if it
requires special access, ...).
it is a little harder to spoof a hash, and tampering with a piece
of code will change its hash (although with simpler hashes, such
as checksums and CRCs, it is often possible to use a glob of
"garbage bytes" to trick the checksum algorithm into giving the
desired value).
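(a minimal sketch of the fingerprint/whitelist idea in Python,
assuming SHA-256; purely illustrative, not the VM's actual
mechanism:)

import hashlib

# hashes of known-good code (the VM's own code, host-app code,
# plus anything the user has explicitly confirmed as trusted)
trusted_hashes = set()

def fingerprint(code_bytes):
    return hashlib.sha256(code_bytes).hexdigest()

def is_trusted(code_bytes):
    return fingerprint(code_bytes) in trusted_hashes

def confirm_trust(code_bytes):
    # called only after the user explicitly approves the code
    trusted_hashes.add(fingerprint(code_bytes))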
yes, there is still always the risk of a naive user confirming a
piece of malware, but this is their own problem at this point.
Access to FFI and such would be regulated through objects. This
leaves the issue of deciding: how do we decide which objects
untrusted code should get access to? Disabling all of FFI is
often too extreme.
potentially.
my current thinking is, granted, that it will disable access to
the "FFI access object" (internally called "ctop" in my VM), which
would disable the ability to fetch new functions/... from the FFI
(or perform "native import" operations with the current
implementation).
however, if already-retrieved functions are still accessible, it
might be possible to obtain them indirectly and then expose them
this way.
as noted in another message:
native import C.math;
var mathobj={sin: sin, cos: cos, tan: tan, ...};
giving access to "mathobj" will still allow access to these
functions, without necessarily giving access to "the entire C
toplevel", which poses a much bigger security risk.
sadly, there is no real good way to safely "streamline" this in
the current implementation.
My current design: FFI is a network of registries. Plugins and
services publish FFI objects (modules) to these registries.
Different registries are associated with different security
levels, and there might be connections between them based on
relative trust and security. A single FFI plugin might provide
similar objects at multiple security levels - e.g. access to HTTP
service might be provided at a low security level for remote
addresses, but at a high security level that allows for local
(127, 192.168, 10.0.0, etc.) addresses. One reason to favor
plugin-based FFI is that it is easy to develop security policy
for high-level features compared to low-level capabilities. (E.g.
access to generic 'local storage' is lower security level than
access to 'filesystem'.)
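(a rough Python sketch of the registries-at-different-security-levels
idea; the level names, stub classes, and API are all invented for
illustration:)

class Registry:
    def __init__(self, level):
        self.level = level        # e.g. "untrusted", "user", "system"
        self.modules = {}
    def publish(self, name, module):
        self.modules[name] = module
    def lookup(self, name):
        return self.modules.get(name)

class RemoteOnlyHttp: pass        # stub: would refuse 127.*, 192.168.*, 10.*
class FullHttp: pass              # stub: would also allow local addresses

registries = {lvl: Registry(lvl) for lvl in ("untrusted", "user", "system")}

# the same plugin publishes a restricted object at the low level
# and a full-featured one at the high level:
registries["untrusted"].publish("http", RemoteOnlyHttp())
registries["system"].publish("http", FullHttp())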
my FFI is based on bulk importing the contents of C headers.
although fairly powerful and convenient, "securing" such a beast
is likely to be a bit of a problem.
easier just to be like "code which isn't trusted can't directly
use the FFI...".
Other than security, my design is to solve other difficult
problems involving code migration [2], multi-process and
distributed extensibility (easy to publish modules to registries
even from other processes or servers; similar to web-server CGI),
smooth transitions from legacy, extreme resilience and
self-healing (multiple fallbacks per FFI dependency), and
policy&configuration management [3].
[1] http://wiki.erights.org/wiki/Walnut/Secure_Distributed_Computing
[2] http://wiki.erights.org/wiki/Unum
[3] http://c2.com/cgi/wiki?PolicyInjection
I had done code migration in the past, but sadly my VMs haven't
had this feature in a fairly long time (many years).
even then, it had a few ugly problems:
the migration essentially involved transparently sending the AST
and recompiling it on the other end. a result of this was that
closures would tend to lose the "identity" of their lexical scope.
...
over a socket, it had used a model where many data types
(lists/...) were essentially passed as copies;
things like builtin and native functions simply bound against
their analogues on the other end (code in C land was unique to
each node);
objects were "mirrored" with an asynchronous consistency model
(altering an object would send slot-change messages to the other
nodes which held copies);
other object types were passed-by-handle (basically, it identifies
the NodeID and ObjectID for a remote object);
...
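(a rough Python sketch of the pass-by-copy vs. pass-by-handle
split; the NodeID/ObjectID fields are as described above,
everything else is invented:)

from dataclasses import dataclass

@dataclass(frozen=True)
class RemoteHandle:
    node_id: int      # which node owns the object
    object_id: int    # object identity within that node

LOCAL_NODE = 1
local_exports = {}    # object_id -> object, for objects handed out by handle

def encode_for_wire(value):
    # lists, strings, numbers, etc. are simply passed as copies;
    # everything else is passed by handle and accessed remotely
    if isinstance(value, (int, float, str, list, tuple, dict)):
        return ("copy", value)
    object_id = id(value)
    local_exports[object_id] = value
    return ("handle", RemoteHandle(LOCAL_NODE, object_id))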
some later ideas (for reviving the above) involved essentially
mirroring a virtual heap over the network (using a system similar
to "far pointers" and "segmented addressing"), but this would have
introduced many nasty problems, and the idea didn't go anywhere.
if I ever do get around to re-implementing something like this, I
will probably use a variation of my original strategy, except that
I would probably leave objects as being remotely accessed via
handles, rather than trying to mirror them and keep them in sync
(or, if mirroring is used, effectively using a "synchronized
write" strategy of some sort...).
the second thing seems to be the option of moving the code to
a local toplevel where its ability to see certain things is
severely limited.
Yes, this is equivalent to controlling which 'capabilities' are
available in a given context. Unfortunately, developers lack
'awareness' - i.e. it is not explicit in code that certain
capabilities are needed by a given library, so failures occur
much later when the library is actually loaded. This is part of
why I eventually abandoned dynamic scopes (where 'dynamic scope'
would include the toplevel [4]).
"dynamic scope" in my case refers to something very different.
I generally call the objects+delegation model "object scope",
which is the main model used by the toplevel.
it differs some for import:
by default, "import" actually exists in terms of the lexical scope
(it is internally a delegate lexical variable);
potentially confusingly, for "delegate import" the import is
actually placed into the object scope (directly into the
containing package or toplevel object), which is part of the
reason for its unique semantics.
say (at the toplevel):
extern delegate import foo.bar;
actually does something roughly similar to:
load("foo/bar.bs <http://bar.bs>"); //not exactly, but it is a
similar idea...
delegate var #'foo/bar'=#:"foo/bar"; //sort of...
in turn invoking more funky semantics in the VM.
note: #'...' and #:"..." is basically syntax for allowing
identifiers and keywords containing otherwise invalid characters
(characters invalid for identifiers).
[4] http://c2.com/cgi/wiki?ExplicitManagementOfImplicitContext
ok.
simply disabling compiler features may not be sufficient
It is also a bad idea. You end up with 2^N languages for N
switches. That's hell to test and verify. Libraries developed for
different sets of switches will consequently prove buggy when
people try to compose them. This is even more documentation to
manage.
it depends on the nature of the features and their impact on the
language.
if trying to use a feature simply makes code using it invalid
("sorry, I can't let you do that"), this works.
if it leaves the code still valid but with different semantics, or
enabling a feature changes the semantics of code written with it
disabled, well, this is a bit more ugly...
but, yes, sadly, I am already having enough issues with seemingly
endless undocumented/forgotten features, and features which were
mostly implemented but are subtly broken (for example, I recently
fixed a feature which existed in the parser/compiler but depended
on an opcode which, for whatever reason, was absent from the
bytecode interpreter, ...).
but, with a language/VM existing for approx 8 years and with ~ 540
opcodes, ... I guess things like this are inevitable.
anything still visible may be tampered with, for example,
suppose a global package is made visible in the new toplevel,
and the untrusted code decides to define functions in a
system package, essentially overwriting the existing functions
Indeed. Almost every language built for security makes heavy use
of immutable objects. They're easier to reason about. For
example, rather than replacing the function in the package, you
would be forced to create a new record that is the same as the
old one but replaces one of the functions.
Access to mutable state is more tightly controlled - i.e. an
explicit capability to inject a new stage in a pipeline, rather
than implicit access to a variable. We don't lose any
flexibility, but the 'path of least resistance' is much more secure.
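(a tiny Python illustration of the "new record instead of
in-place mutation" idea; traced_sin is just a hypothetical
stand-in:)

import math
from types import MappingProxyType

# a read-only 'package' record; assignment into it raises TypeError
math_pkg = MappingProxyType({"sin": math.sin, "cos": math.cos})

def traced_sin(x):
    print("sin called")
    return math.sin(x)

# math_pkg["sin"] = traced_sin   # rejected: the shared record is immutable
patched = dict(math_pkg)
patched["sin"] = traced_sin
my_math = MappingProxyType(patched)   # a new record; the original is untouched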
yes, but this isn't as ideal in a pre-existing language where
nearly everything is highly mutable.
in this case, adding security may involve... "write protecting"
things...
a basic security mechanism then is that, by default, most
non-owned objects will be marked read-only.
an exposed API function may indirectly give untrusted code
"unexpected levels of power" if it, by default, has
unhindered access to the system, placing additional burden on
library code not to perform operations which may be exploitable
This is why whitelisting, rather than blacklisting, should be the
rule for security.
but whitelisting is potentially much more effort than
blacklisting, even if somewhat better from a security
perspective.
assigning through a delegated object may in turn move up and
assign the variable in a delegated-to object (at the VM level
there are multiple assignment operators to address these
different cases, namely which object the variable will be set
in...).
The security problem isn't delegation, but rather the fact that
this chaining is 'implicit' so developers easily forget about it
and thus leave security holes.
A library of security patterns could help out. E.g. you could
ensure your revocable forwarders and facet-pattern constructors
also provide barriers against propagation of assignment.
potentially, or use cloning rather than delegation chaining
(however, in my VM, it is only possible to clone from a single
object, whereas one may do N-way delegation, making delegation
generally more convenient for building the toplevel).
my current thinking is that assignment delegation will basically
stop once a read-only object is hit, forcing the assignment into
a "nearer" object. trying to force-assign into a read-only object
will result in an exception or similar.
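(a rough Python sketch of that assignment rule, simplified to a
single delegate per object rather than N-way delegation; all
names invented:)

class Obj:
    def __init__(self, delegate=None, read_only=False):
        self.slots = {}
        self.delegate = delegate
        self.read_only = read_only

def assign(obj, name, value):
    # walk the delegation chain looking for a writable owner of the slot
    cur = obj
    while cur is not None and not cur.read_only:
        if name in cur.slots:
            cur.slots[name] = value     # assign where the slot lives
            return
        cur = cur.delegate
    # hit a read-only object (or ran out of chain): assign into the
    # nearest object instead, unless it is itself read-only
    if obj.read_only:
        raise PermissionError("assignment into read-only object")
    obj.slots[name] = value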
in general though, re-assigning top-level bindings (which are
usually things like API functions) is probably a bad practice
anyway.
could a variation of, say, the Unix security model, be
applied at the VM level?
Within the VM, this has been done before, e.g. Java introduced
thread capabilities. But the Unix security model is neither
simple nor flexible nor efficient, especially for fine-grained
delegation. I cannot recommend it. But if you do pursue this
route: it has been done before, and there's a lot of material you
can learn from. Look up LambdaMoo, for example.
looking up LambdaMoo turned up a MUD, if this is what was in question...
I partly patched it in last night, and the performance overhead
should be "modest" in the common case.
as for "simple" or "efficient", a Unix-style security model
doesn't look all that bad. at least I am not looking at
implementing ACLs or a Windows-style security model, which would
be a fair amount more complex and slower (absent static checking
and optimization).
luckily, there are only a relatively small number of places I
really need to put in security checks (mostly in the object system
and similar). most of the rest of the typesystem or VM doesn't
really need them.
or such...
_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc