On 8/11/2011 12:55 PM, David Barbour wrote:
On Wed, Aug 10, 2011 at 7:35 PM, BGB <cr88...@gmail.com> wrote:

    not all code may be from trusted sources.
    consider, say, code that comes from the internet.

    what is a "good" way of enforcing security in such a case?


Object capability security is probably the very best approach available today - in terms of a wide variety of criteria such as flexibility, performance, precision, visibility, awareness, simplicity, and usability.

In this model, ability to send a message to an object is sufficient proof that you have rights to use it - there are no passwords, no permissions checks, etc. The security discipline involves controlling who has access to which objects - i.e. there are a number of patterns, such as 'revocable forwarders', where you'll provide an intermediate object that allows you to audit and control access to another object. You can read about several of these patterns on the erights wiki [1].
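
A minimal sketch of the revocable-forwarder idea (TypeScript here, purely for illustration; the names are made up and not the erights API):

type Cap = Record<string, (...args: unknown[]) => unknown>;

// hand out 'proxy' instead of the raw object; the owner keeps 'revoke'
function makeRevocable(target: Cap): { proxy: Cap; revoke: () => void } {
  let current: Cap | null = target;
  const proxy = new Proxy({} as Cap, {
    get(_obj, prop) {
      if (current === null) throw new Error("capability revoked");
      return current[String(prop)];
    },
  });
  return { proxy, revoke: () => { current = null; } };
}

const { proxy, revoke } = makeRevocable({ ping: () => "pong" });
proxy.ping();   // works until revoke() is called; afterwards it throws
revoke();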


the big problem though:
trying to implement this as the sole security model, and expecting it to be effective, would likely impact language design and programming strategy, and possibly lead to a fair amount of effort WRT "hole plugging" in an existing project.

granted, code will probably not use logins/passwords for authority, as this would likely be horridly ineffective for code (about as soon as a piece of malware knows the login used by a piece of "trusted" code, it can pose as that code and do whatever it wants).

"digital signing" is another possible strategy, but poses a similar problem:
how to effectively prevent spoofing (say, someone manages to "extract" the key from a trusted app, and then signs a piece of malware with it).

AFAICT, the usual strategy used with SSL certificates is that they may expire and are checked against a "certificate authority". although maybe reasonably effective for the internet, this seems to be a fairly complex and heavy-weight approach (not ideal for software, especially not FOSS, as most such authorities want money and require signing individual binaries, ...).

my current thinking is roughly along the lines that each piece of code will be given a "fingerprint" (possibly an MD5 or SHA hash), and this fingerprint is either known good to the VM itself (for example, its own code, or code that is part of the host application), or may be confirmed as "trusted" by the user (if it requires special access, ...).

it is a little harder to spoof a hash, and tampering with a piece of code will change its hash (although with simpler hashes, such as checksums and CRCs, it is often possible to use a glob of "garbage bytes" to trick the checksum algorithm into giving the desired value).
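
something roughly like this is what I have in mind (sketched in TypeScript just for illustration; the names and the "ask the user" hook are made up):

import { createHash } from "crypto";

// fingerprints known-good to the VM (its own code, host-app code, ...) or
// fingerprints the user has previously confirmed as "trusted"
const trustedFingerprints = new Set<string>();

function fingerprint(code: string): string {
  return createHash("sha256").update(code).digest("hex");
}

function mayRunWithSpecialAccess(code: string, askUser: (fp: string) => boolean): boolean {
  const fp = fingerprint(code);
  if (trustedFingerprints.has(fp)) return true;                   // known good
  if (askUser(fp)) { trustedFingerprints.add(fp); return true; }  // user confirms it
  return false;                                                   // no special access
}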

yes, there is still always the risk of a naive user confirming a piece of malware, but this is their own problem at this point.


Access to FFI and such would be regulated through objects. This leaves the issue of deciding which objects untrusted code should get access to. Disabling all of FFI is often too extreme.


potentially.
my current thinking, granted, is that untrusted code will be denied access to the "FFI access object" (internally called "ctop" in my VM), which would disable the ability to fetch new functions/... from the FFI (or perform "native import" operations with the current implementation).

however, if already-retrieved functions remain accessible, it might be possible to get at them indirectly and make them visible that way.

as noted in another message:
native import C.math;
var mathobj={sin: sin, cos: cos, tan: tan, ...};

giving access to "mathobj" will still allow access to these functions, without necessarily giving access to "the entire C toplevel", which would pose a much bigger security risk.

sadly, there is no really good way to safely "streamline" this in the current implementation.


My current design: FFI is a network of registries. Plugins and services publish FFI objects (modules) to these registries. Different registries are associated with different security levels, and there might be connections between them based on relative trust and security. A single FFI plugin might provide similar objects at multiple security levels - e.g. access to HTTP service might be provided at a low security level for remote addresses, but at a high security level that allows for local (127, 192.168, 10.0.0, etc.) addresses. One reason to favor plugin-based FFI is that it is easy to develop security policy for high-level features compared to low-level capabilities. (E.g. access to generic 'local storage' is lower security level than access to 'filesystem'.)
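
A rough sketch of the registry idea (TypeScript, illustrative only; the level names and the lookup rule are simplifications):

type FfiModule = Record<string, (...args: unknown[]) => unknown>;
type Level = "low" | "high";

// one registry per security level; plugins publish modules into them
const registries: Record<Level, Map<string, FfiModule>> = {
  low: new Map(),
  high: new Map(),
};

function publish(level: Level, name: string, mod: FfiModule): void {
  registries[level].set(name, mod);
}

// a caller only sees registries at or below its own level,
// e.g. remote-only HTTP at "low", local-address HTTP at "high"
function lookup(callerLevel: Level, name: string): FfiModule | undefined {
  const visible: Level[] = callerLevel === "high" ? ["high", "low"] : ["low"];
  for (const lvl of visible) {
    const mod = registries[lvl].get(name);
    if (mod) return mod;
  }
  return undefined;
}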

my FFI is based on bulk importing the contents of C headers.

although it is fairly powerful and convenient, "securing" such a beast is likely to be a bit of a problem.

it is easier to just say "code which isn't trusted can't directly use the FFI...".


Other than security, my design is to solve other difficult problems involving code migration [2], multi-process and distributed extensibility (easy to publish modules to registries even from other processes or servers; similar to web-server CGI), smooth transitions from legacy, extreme resilience and self-healing (multiple fallbacks per FFI dependency), and policy & configuration management [3].

[1] http://wiki.erights.org/wiki/Walnut/Secure_Distributed_Computing
[2] http://wiki.erights.org/wiki/Unum
[3] http://c2.com/cgi/wiki?PolicyInjection

I had done code migration in the past, but sadly my VMs haven't had this feature in a fairly long time (many years).

even then, it had a few ugly problems:
the migration essentially involved transparently sending the AST and recompiling it on the other end. a result of this was that closures would tend to lose the "identity" of their lexical scope.
...

over a socket, it had used a model where many data types (lists/...) were essentially passed as copies; things like builtin and native functions simply bound against their analogues on the other end (code in C land was unique to each node); objects were "mirrored" with an asynchronous consistency model (altering an object would send slot-change messages to the other nodes which held copies); other object types were passed-by-handle (basically, it identifies the NodeID and ObjectID for a remote object);
...
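
the pass-by-handle part was, roughly, something like this (sketched in TypeScript for illustration; the names are made up):

// a remote object is identified by (NodeID, ObjectID) rather than copied
interface RemoteHandle { nodeId: number; objectId: number; }

// whatever transport is in use (a socket, etc.), reduced to one call here
interface Transport {
  send(to: number, msg: { objectId: number; slot: string; args: unknown[] }): Promise<unknown>;
}

// invoking a slot on a handle forwards the call to the owning node
async function invoke(t: Transport, h: RemoteHandle, slot: string, args: unknown[]): Promise<unknown> {
  return t.send(h.nodeId, { objectId: h.objectId, slot, args });
}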


some later ideas (for reviving the above) involved essentially mirroring a virtual heap over the network (using a system similar to "far pointers" and "segmented addressing"), but this would have introduced many nasty problems, and it didn't go anywhere.

if I ever do get around to re-implementing something like this, I will probably use a variation of my original strategy, except that I would likely leave objects being accessed remotely via handles, rather than trying to mirror them and keep them in sync (or, if mirroring is used, effectively using a "synchronized write" strategy of some sort...).



    the second thing seems to be the option of moving the code to a
    local toplevel where its ability to see certain things is severely
    limited.


Yes, this is equivalent to controlling which 'capabilities' are available in a given context. Unfortunately, developers lack 'awareness' - i.e. it is not explicit in code that certain capabilities are needed by a given library, so failures occur much later when the library is actually loaded. This is part of why I eventually abandoned dynamic scopes (where 'dynamic scope' would include the toplevel [4]).

"dynamic scope" in my case refers to something very different.
I generally call the objects+delegation model "object scope", which is the main model used by the toplevel.

it differs somewhat for import:
by default, "import" actually lives in the lexical scope (it is internally a delegate lexical variable); potentially confusingly, for "delegate import" the import is instead placed into the object scope (directly into the containing package or toplevel object), which is part of the reason for its unique semantics.

say (at the toplevel):
extern delegate import foo.bar;

actually does something roughly similar to:
load("foo/bar.bs");    //not exactly, but it is a similar idea...
delegate var #'foo/bar'=#:"foo/bar";    //sort of...
which in turn invokes more funky semantics in the VM.

note: #'...' and #:"..." are basically syntax for allowing identifiers and keywords containing otherwise invalid characters (characters invalid for identifiers).


[4] http://c2.com/cgi/wiki?ExplicitManagementOfImplicitContext


ok.


    simply disabling compiler features may not be sufficient


It is also a bad idea. You end up with 2^N languages for N switches. That's hell to test and verify. Libraries developed for different sets of switches will consequently prove buggy when people try to compose them. This is even more documentation to manage.

it depends on the nature of the features and their impact on the language.

if trying to use a feature simply makes code using it invalid ("sorry, I can't let you do that"), this works. if it leaves the code still valid but with different semantics, or if enabling a feature changes the semantics of code written with it disabled, well, this is a bit uglier...


but, yes, sadly, I am already having enough issues with seemingly endless undocumented/forgotten features, and features which were mostly implemented but are subtly broken (for example, earlier I fixed a feature which existed in the parser/compiler but depended on an opcode which, for whatever reason, was absent from the bytecode interpreter, ...).

but, with a language/VM that has been around for approx 8 years and has ~540 opcodes, ... I guess things like this are inevitable.



    anything still visible may be tampered with, for example, suppose
    a global package is made visible in the new toplevel, and the
    untrusted code decides to define functions in a system package,
    essentially overwriting the existing functions


Indeed. Almost every language built for security makes heavy use of immutable objects. They're easier to reason about. For example, rather than replacing the function in the package, you would be forced to create a new record that is the same as the old one but replaces one of the functions.

Access to mutable state is more tightly controlled - i.e. an explicit capability to inject a new stage in a pipeline, rather than implicit access to a variable. We don't lose any flexibility, but the 'path of least resistance' is much more secure.
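
A small sketch of the difference (TypeScript, illustrative names):

type Pkg = Readonly<Record<string, (x: number) => number>>;

// the shared package is frozen; nobody can overwrite its functions in place
const mathPkg: Pkg = Object.freeze({ sin: Math.sin, cos: Math.cos });

// "replacing" a function means deriving a new frozen record instead
function withReplacement(pkg: Pkg, name: string, fn: (x: number) => number): Pkg {
  return Object.freeze({ ...pkg, [name]: fn });
}

// untrusted code gets its own derived view; mathPkg itself is untouched
const patched = withReplacement(mathPkg, "sin", (x) => -Math.sin(x));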


yes, but this isn't as ideal in a pre-existing language where nearly everything is highly mutable. in this case, adding security may involve... "write protecting" things...

a basic security mechanism then is that, by default, most non-owned objects will be marked read-only.


    an exposed API function may indirectly give untrusted code
    "unexpected levels of power" if it, by default, has unhindered
    access to the system, placing additional burden on library code
    not to perform operations which may be exploitable


This is why whitelisting, rather than blacklisting, should be the rule for security.


but whitelisting is potentially much more effort than blacklisting, even if somewhat better from a security perspective.



    assigning through a delegated object may in turn move up and
    assign the variable in a delegated-to object (at the VM level
    there are multiple assignment operators to address these different
    cases, namely which object the variable will be set in...).


The security problem isn't delegation, but rather the fact that this chaining is 'implicit' so developers easily forget about it and thus leave security holes.

A library of security patterns could help out. E.g. you could ensure your revocable forwarders and facet-pattern constructors also provide barriers against propagation of assignment.


potentially, or use cloning rather than delegation chaining (however, in my VM, it is only possible to clone from a single object, whereas one may do N-way delegation, making delegation generally more convenient for building the toplevel).

my current thinking is that basically assignment delegation will stop once an object is hit which is read-only, forcing the assignment into a "nearer" object. trying to force-assign into a read-only object will result in an exception or similar.
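
roughly the rule I have in mind (sketched in TypeScript for illustration; the slot/delegate names are made up, and the real thing does N-way delegation):

interface VObj {
  slots: Map<string, unknown>;
  readOnly: boolean;
  delegate: VObj | null;   // single parent here, for simplicity
}

function assignSlot(obj: VObj, name: string, value: unknown): void {
  let cur: VObj | null = obj;
  while (cur) {
    if (cur.slots.has(name)) {
      if (cur.readOnly) break;        // barrier: don't write through into a read-only object
      cur.slots.set(name, value);     // assign where the slot already lives
      return;
    }
    cur = cur.delegate;
  }
  if (obj.readOnly) throw new Error("assignment into read-only object");
  obj.slots.set(name, value);         // otherwise define it in the "nearer" object
}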

in general though, trying to assign top-level bindings (which are generally things like API functions) is probably a bad practice anyway.



    could a variation of, say, the Unix security model, be applied at
    the VM level?


Within the VM, this has been done before, e.g. Java introduced thread capabilities. But the Unix security model is neither simple nor flexible nor efficient, especially for fine-grained delegation. I cannot recommend it. But if you do pursue this route: it has been done before, and there's a lot of material you can learn from. Look up LambdaMoo, for example.


looking up LambdaMoo found a MUD, if this is what was in question...

I partly patched it on last night, and the performance overhead should be "modest" in the common case.


as for "simple" or "efficient", a Unix-style security model doesn't look all that bad. at least I am not looking at implementing ACLs or a Windows-style security model, which would be a fair amount more complex and slower (absent static checking and optimization).

luckily, there are only a relatively small number of places I really need to put in security checks (mostly in the object system and similar). most of the rest of the typesystem or VM doesn't really need them.
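
very roughly, the sort of check I have in mind at those places (TypeScript sketch, purely illustrative; group bits and the rest are omitted):

// each object carries an owner id and Unix-style rwx mode bits
interface SecInfo { owner: number; mode: number; }   // e.g. 0o644-style

function canWrite(o: SecInfo, callerUid: number): boolean {
  if (callerUid === 0) return true;                           // VM-internal / "root" code
  if (callerUid === o.owner) return (o.mode & 0o200) !== 0;   // owner write bit
  return (o.mode & 0o002) !== 0;                              // "other" write bit
}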

or such...

_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc
