On 8/11/2011 12:55 PM, David Barbour wrote:
On Wed, Aug 10, 2011 at 7:35 PM, BGB <cr88...@gmail.com> wrote:
not all code may be from trusted sources.
consider, say, code comes from the internet.
what is a "good" way of enforcing security in such a case?
Object capability security is probably the very best approach
available today - in terms of a wide variety of criteria such
as flexibility, performance, precision, visibility, awareness,
simplicity, and usability.
In this model, ability to send a message to an object is sufficient
proof that you have rights to use it - there are no passwords, no
permissions checks, etc. The security discipline involves controlling
who has access to which objects - i.e. there are a number of
patterns, such as 'revocable forwarders', where you'll provide an
intermediate object that allows you to audit and control access to
another object. You can read about several of these patterns on the
erights wiki [1].
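(as an illustration of the pattern, not code from the erights wiki: a revocable forwarder can be sketched in a few lines of Python. all names here are hypothetical.)

```python
# Sketch of a revocable forwarder: the holder gets a proxy object, and
# the grantor keeps a revoke() capability that can cut off access later.
class Revoked(Exception):
    pass

def make_revocable(target):
    state = {"target": target}

    class Forwarder:
        def __getattr__(self, name):
            if state["target"] is None:
                raise Revoked("capability has been revoked")
            return getattr(state["target"], name)

    def revoke():
        state["target"] = None  # drop the only path to the real object

    return Forwarder(), revoke

# usage: hand out `proxy` to untrusted code, keep `revoke` for yourself
class FileService:
    def read(self, path):
        return "contents of " + path

proxy, revoke = make_revocable(FileService())
print(proxy.read("/etc/motd"))  # works while the grant is live
revoke()
# proxy.read(...) now raises Revoked
```

the point is that the recipient never holds the real object, so taking the right away later is just a matter of clearing one reference.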
the big problem though:
trying to implement this as the sole security model, and expecting it
to be effective, would likely impact language design and programming
strategy, and could require a fair amount of effort WRT "hole
plugging" in an existing project.
granted, code will probably not use logins/passwords for authority, as
this would likely be horridly ineffective for code (about as soon as a
piece of malware learns the login used by a piece of "trusted" code, it
can impersonate that code and do whatever it wants).
"digital signing" is another possible strategy, but poses a similar problem:
how to effectively prevent spoofing (say, one manages to "extract" the
key from a trusted app, and then signs a piece of malware with it).
AFAICT, the usual strategy used with SSL certificates is that they may
expire and are checked against a "certificate authority". although maybe
reasonably effective for the internet, this seems to be a fairly complex
and heavy-weight approach (not ideal for software, especially not FOSS,
as most such authorities want money and require signing individual
binaries, ...).
my current thinking is roughly along the line that each piece of code
will be given a "fingerprint" (possibly an MD5 or SHA hash), and this
fingerprint is either known good to the VM itself (for example, its own
code, or code that is part of the host application), or may be confirmed
as "trusted" by the user (if it requires special access, ...).
it is a little harder to spoof a hash, and tampering with a piece of
code will change its hash (although with simpler hashes, such as
checksums and CRCs, it is often possible to use a glob of "garbage
bytes" to trick the checksum algorithm into giving the desired value).
yes, there is still always the risk of a naive user confirming a piece
of malware, but this is their own problem at this point.
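a rough Python sketch of the fingerprint idea (SHA-256 is used here since MD5 is collision-prone; the trust-store names are hypothetical):

```python
# Sketch: fingerprint each code blob with SHA-256 and check it against
# a trust store before granting elevated access.
import hashlib

TRUSTED = set()  # fingerprints known-good to the VM, or confirmed by the user

def fingerprint(code_bytes):
    return hashlib.sha256(code_bytes).hexdigest()

def mark_trusted(code_bytes):
    TRUSTED.add(fingerprint(code_bytes))

def is_trusted(code_bytes):
    return fingerprint(code_bytes) in TRUSTED

host_module = b"(builtin code shipped with the VM)"
mark_trusted(host_module)                  # e.g. the VM's own code

downloaded = b"(script fetched from the internet)"
assert is_trusted(host_module)
assert not is_trusted(downloaded)          # this is where the user prompt goes
```

any tampering changes the hash, so the prompt only has to happen once per distinct blob.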
Access to FFI and such would be regulated through objects. This leaves
the issue of deciding: how do we decide which objects untrusted code
should get access to? Disabling all of FFI is often too extreme.
potentially.
my current thinking is, granted, that it will disable access to the "FFI
access object" (internally called "ctop" in my VM), which would disable
the ability to fetch new functions/... from the FFI (or perform "native
import" operations with the current implementation).
however, if retrieved functions are still accessible, it might be
possible to retrieve them indirectly and then make them visible this way.
as noted in another message:
native import C.math;
var mathobj={sin: sin, cos: cos, tan: tan, ...};
giving access to "mathobj" will still allow access to these functions,
without necessarily giving access to "the entire C toplevel", which
poses a much bigger security risk.
sadly, there is no really good way to safely "streamline" this in the
current implementation.
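the "mathobj" idea above amounts to exposing a whitelisted facet of a module rather than the module itself. a minimal Python sketch (using the stdlib math module as a stand-in for the FFI-imported C toplevel):

```python
# Sketch: expose only a whitelisted facet of a module, instead of the
# whole toplevel. The facet is a copy, so it holds no reference back to
# the full module.
import math

def make_facet(source, allowed):
    # copy out only the allowed names
    return {name: getattr(source, name) for name in allowed}

mathobj = make_facet(math, ["sin", "cos", "tan"])

print(mathobj["sin"](0.0))
# mathobj gives access to sin/cos/tan, but provides no route back to
# the rest of `math`, let alone "the entire C toplevel"
```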
My current design: FFI is a network of registries. Plugins and
services publish FFI objects (modules) to these registries. Different
registries are associated with different security levels, and there
might be connections between them based on relative trust and
security. A single FFI plugin might provide similar objects at
multiple security levels - e.g. access to HTTP service might be
provided at a low security level for remote addresses, but at a high
security level that allows for local (127, 192.168, 10.0.0, etc.)
addresses. One reason to favor plugin-based FFI is that it is easy to
develop security policy for high-level features compared to low-level
capabilities. (E.g. access to generic 'local storage' is lower
security level than access to 'filesystem'.)
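the registry design above might be sketched roughly like this (Python used for illustration; the levels, names, and PermissionError behavior are my assumptions, not the actual design):

```python
# Sketch of security-leveled FFI registries: plugins publish modules at
# a given level, and callers can only look up modules at or below their
# own level.
LOW, HIGH = 0, 1

class Registry:
    def __init__(self):
        self._modules = {}  # name -> (level, module)

    def publish(self, name, module, level):
        self._modules[name] = (level, module)

    def lookup(self, name, caller_level):
        level, module = self._modules[name]
        if level > caller_level:
            raise PermissionError(name + " requires a higher security level")
        return module

reg = Registry()
reg.publish("http.remote", object(), LOW)   # remote addresses only
reg.publish("http.local", object(), HIGH)   # 127.*, 192.168.*, 10.*, ...

reg.lookup("http.remote", caller_level=LOW)             # allowed
# reg.lookup("http.local", caller_level=LOW)  -> PermissionError
```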
my FFI is based on bulk importing the contents of C headers.
although fairly powerful and convenient, "securing" such a beast is
likely to be a bit of a problem.
easier just to be like "code which isn't trusted can't directly use the
FFI...".
Other than security, my design is to solve other difficult problems
involving code migration [2], multi-process and distributed
extensibility (easy to publish modules to registries even from other
processes or servers; similar to web-server CGI), smooth transitions
from legacy, extreme resilience and self-healing (multiple fallbacks
per FFI dependency), and policy & configuration management [3].
[1] http://wiki.erights.org/wiki/Walnut/Secure_Distributed_Computing
[2] http://wiki.erights.org/wiki/Unum
[3] http://c2.com/cgi/wiki?PolicyInjection
I had done code migration in the past, but sadly my VM's haven't had
this feature in a fairly long time (many years).
even then, it had a few ugly problems:
the migration essentially involved transparently sending the AST, and
recompiling it on the other end. a result of this was that closures
would tend to lose the "identity" of their lexical scope.
...
over a socket, it had used a model where many data types (lists/...)
were essentially passed as copies;
things like builtin and native functions simply bound against their
analogues on the other end (code in C land was unique to each node);
objects were "mirrored" with an asynchronous consistency model (altering
an object would send slot-change messages to the other nodes which held
copies);
other object types were passed-by-handle (basically, it identifies the
NodeID and ObjectID for a remote object);
...
some later ideas (for reviving the above) involved essentially
mirroring a virtual heap over the network (using a system similar to
"far pointers" and "segmented addressing"), but this would have
introduced many nasty problems, and didn't go anywhere.
if I ever do get around to re-implementing something like this, I will
probably use a variation of my original strategy, except that I would
probably leave objects as being remotely accessed via handles, rather
than trying to mirror them and keep them in sync (or, if mirroring is
used, effectively using a "synchronized write" strategy of some sort...).
the second thing seems to be the option of moving the code to a
local toplevel where its ability to see certain things is severely
limited.
Yes, this is equivalent to controlling which 'capabilities' are
available in a given context. Unfortunately, developers lack
'awareness' - i.e. it is not explicit in code that certain
capabilities are needed by a given library, so failures occur much
later when the library is actually loaded. This is part of why I
eventually abandoned dynamic scopes (where 'dynamic scope' would
include the toplevel [4]).
"dynamic scope" in my case refers to something very different.
I generally call the objects+delegation model "object scope", which is
the main model used by the toplevel.
it differs some for import:
by default, "import" actually exists in terms of the lexical scope (it
is internally a delegate lexical variable);
potentially confusingly, for "delegate import" the import is actually
placed into the object scope (directly into the containing package or
toplevel object), which is part of the reason for its unique semantics.
say (at the toplevel):
extern delegate import foo.bar;
actually does something roughly similar to:
load("foo/bar.bs"); //not exactly, but it is a similar idea...
delegate var #'foo/bar'=#:"foo/bar"; //sort of...
in turn invoking more funky semantics in the VM.
note: #'...' and #:"..." is basically syntax for allowing identifiers
and keywords containing otherwise invalid characters (characters invalid
for identifiers).
[4] http://c2.com/cgi/wiki?ExplicitManagementOfImplicitContext
ok.
simply disabling compiler features may not be sufficient
It is also a bad idea. You end up with 2^N languages for N switches.
That's hell to test and verify. Libraries developed for different sets
of switches will consequently prove buggy when people try to compose
them. This is even more documentation to manage.
it depends on the nature of the features and their impact on the language.
if trying to use a feature simply makes code using it invalid ("sorry, I
can't let you do that"), this works.
if it leaves the code still valid but with different semantics, or
enabling a feature changes the semantics of code written with it
disabled, well, this is a bit more ugly...
but, yes, sadly, I am already having enough issues with seemingly
endless undocumented/forgotten features, and features which were mostly
implemented but are subtly broken (for example, I was earlier fixing a
feature which existed in the parser/compiler, but depended on an opcode
which for whatever reason was absent from the bytecode interpreter, ...).
but, with a language/VM existing for approx 8 years and with ~ 540
opcodes, ... I guess things like this are inevitable.
anything still visible may be tampered with, for example, suppose
a global package is made visible in the new toplevel, and the
untrusted code decides to define functions in a system package,
essentially overwriting the existing functions
Indeed. Almost every language built for security makes heavy use of
immutable objects. They're easier to reason about. For example, rather
than replacing the function in the package, you would be forced to
create a new record that is the same as the old one but replaces one
of the functions.
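the "new record instead of in-place patch" discipline can be sketched in Python with a read-only mapping (MappingProxyType is just one way to get immutability; the helper names are hypothetical):

```python
# Sketch of the immutable-record discipline: code cannot patch a
# function in a shared package; it must build a new record with one
# slot replaced, leaving the original untouched.
from types import MappingProxyType

def make_package(**funcs):
    return MappingProxyType(dict(funcs))    # read-only view

def with_slot(pkg, name, fn):
    updated = dict(pkg)
    updated[name] = fn
    return MappingProxyType(updated)        # new record; old one unchanged

sys_pkg = make_package(greet=lambda: "hello")
my_pkg = with_slot(sys_pkg, "greet", lambda: "hijacked?")

assert sys_pkg["greet"]() == "hello"        # system package is unaffected
assert my_pkg["greet"]() == "hijacked?"     # only the new record differs
```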
Access to mutable state is more tightly controlled - i.e. an explicit
capability to inject a new stage in a pipeline, rather than implicit
access to a variable. We don't lose any flexibility, but the 'path of
least resistance' is much more secure.
yes, but this isn't as ideal in a pre-existing language where nearly
everything is highly mutable.
in this case, creation of security may involve... "write protecting"
things...
a basic security mechanism then is that, by default, most non-owned
objects will be marked read-only.
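the "write protecting" idea might look something like this (a Python sketch of a read-only view over an existing mutable object; the class names are made up):

```python
# Sketch of marking a non-owned object read-only: a proxy that forwards
# reads to the real object but rejects all writes.
class ReadOnlyError(Exception):
    pass

class ReadOnly:
    def __init__(self, target):
        object.__setattr__(self, "_target", target)  # bypass our own guard

    def __getattr__(self, name):
        # reads pass through to the underlying object
        return getattr(object.__getattribute__(self, "_target"), name)

    def __setattr__(self, name, value):
        raise ReadOnlyError("object is marked read-only")

class Config:
    path = "/usr"

view = ReadOnly(Config())
print(view.path)          # reads work
# view.path = "/tmp"      # raises ReadOnlyError
```

the owner keeps the raw object and can still mutate it; only the untrusted holder of the view is locked out.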
an exposed API function may indirectly give untrusted code
"unexpected levels of power" if it, by default, has unhindered
access to the system, placing additional burden on library code
not to perform operations which may be exploitable
This is why whitelisting, rather than blacklisting, should be the rule
for security.
but whitelisting is potentially much more effort than blacklisting, even
if somewhat better from a security perspective.
assigning through a delegated object may in-turn move up and
assign the variable in a delegated-to object (at the VM level
there are multiple assignment operators to address these different
cases, namely which object will have a variable set in...).
The security problem isn't delegation, but rather the fact that this
chaining is 'implicit' so developers easily forget about it and thus
leave security holes.
A library of security patterns could help out. E.g. you could ensure
your revocable forwarders and facet-pattern constructors also provide
barriers against propagation of assignment.
potentially, or use cloning rather than delegation chaining (however, in
my VM, it is only possible to clone from a single object, whereas one
may do N-way delegation, making delegation generally more convenient for
building the toplevel).
my current thinking is that basically assignment delegation will stop
once an object is hit which is read-only, forcing the assignment into a
"nearer" object. trying to force-assign into a read-only object will
result in an exception or similar.
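the rule above (assignment walks the delegation chain, stops at read-only objects, and force-assigning into a read-only object throws) can be sketched like this in Python; the Obj slot model is a stand-in for the VM's actual object representation:

```python
# Sketch: on assignment, walk the delegation chain looking for the slot;
# a read-only link stops the walk, forcing the assignment into a
# "nearer" (writable) object instead.
class Obj:
    def __init__(self, delegate=None, read_only=False):
        self.slots = {}
        self.delegate = delegate
        self.read_only = read_only

def delegated_assign(obj, name, value):
    cur = obj
    while cur is not None:
        if name in cur.slots:
            if cur.read_only:
                break                   # stop: fall back to a nearer object
            cur.slots[name] = value     # assign where the slot lives
            return
        cur = cur.delegate
    if obj.read_only:
        raise TypeError("cannot assign into a read-only object")
    obj.slots[name] = value             # nearest writable object wins

toplevel = Obj(read_only=True)
toplevel.slots["print"] = "system print"
local = Obj(delegate=toplevel)

delegated_assign(local, "print", "shadowed print")
assert toplevel.slots["print"] == "system print"   # toplevel untouched
assert local.slots["print"] == "shadowed print"    # shadowed locally
```

so untrusted code can shadow a toplevel binding for itself, but cannot clobber it for everyone else.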
in general though, trying to assign top-level bindings (which are
usually things like API functions) is probably bad practice anyway.
could a variation of, say, the Unix security model, be applied at
the VM level?
Within the VM, this has been done before, e.g. Java introduced thread
capabilities. But the Unix security model is neither simple nor
flexible nor efficient, especially for fine-grained delegation. I
cannot recommend it. But if you do pursue this route: it has been done
before, and there's a lot of material you can learn from. Look up
LambdaMoo, for example.
a search for LambdaMoo turns up a MUD, if this is what was in question...
I partly patched the model in last night, and the performance overhead
should be "modest" in the common case.
as for "simple" or "efficient", a Unix-style security model doesn't look
all that bad. at least I am not looking at implementing ACLs or a
Windows-style security model, which would be a fair amount more complex
and slower (absent static checking and optimization).
luckily, there are only a relatively small number of places I really
need to put in security checks (mostly in the object system and
similar). most of the rest of the typesystem or VM doesn't really need them.
or such...
_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc