On Thu, Aug 11, 2011 at 19:06, BGB <cr88...@gmail.com> wrote:
On 8/11/2011 12:55 PM, David Barbour wrote:
On Wed, Aug 10, 2011 at 7:35 PM, BGB <cr88...@gmail.com> wrote:
not all code may be from trusted sources.
consider, say, code that comes from the internet.
what is a "good" way of enforcing security in such a case?
Object capability security is probably the very best approach
available today - in terms of a wide variety of criteria such
as flexibility, performance, precision, visibility, awareness,
simplicity, and usability.
In this model, the ability to send a message to an object is
sufficient proof that you have rights to use it - there are no
passwords, no permission checks, etc. The security discipline
involves controlling who has access to which objects - i.e.
there are a number of patterns, such as 'revocable forwarders',
where you'll provide an intermediate object that allows you to
audit and control access to another object. You can read about
several of these patterns on the erights wiki [1].
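(a rough Python sketch of a revocable forwarder, just to make
the pattern concrete; the class and names here are invented,
not taken from any particular capability library:)

class RevocableForwarder:
    # forwards messages to the target until revoke() is called
    def __init__(self, target):
        self._target = target
    def __getattr__(self, name):
        if self._target is None:
            raise PermissionError("capability has been revoked")
        return getattr(self._target, name)
    def revoke(self):
        self._target = None

# hand untrusted code the forwarder, never the raw object:
#   files = RevocableForwarder(real_file_service)
#   untrusted_plugin.run(files)
#   files.revoke()   # later: cut off access without touching the plugin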
the big problem though:
trying to implement this as the sole security model, and expecting
it to be effective, would likely impact language design and
programming strategy, and could mean a fair amount of effort spent
"hole plugging" in an existing project.
granted, code will probably not use logins/passwords for
authority, as this would likely be horridly ineffective for code
(as soon as a piece of malware knows the login used by a piece
of "trusted" code, it can impersonate that code and do whatever
it wants).
"digital signing" is another possible strategy, but poses a
similar problem:
how to effectively prevent spoofing (say, one manages to "extract"
the key from a trusted app, and then signs a piece of malware with
it).
AFAICT, the usual strategy used with SSL certificates is that they
may expire and are checked against a "certificate authority".
although maybe reasonably effective for the internet, this seems
to be a fairly complex and heavy-weight approach (not ideal for
software, especially not FOSS, as most such authorities want money
and require signing individual binaries, ...).
my current thinking is roughly along the line that each piece of
code will be given a "fingerprint" (possibly an MD5 or SHA hash),
and this fingerprint is either known good to the VM itself (for
example, its own code, or code that is part of the host
application), or may be confirmed as "trusted" by the user (if it
requires special access, ...).
it is a little harder to spoof a hash, and tampering with a piece
of code will change its hash (although with simpler hashes, such
as checksums and CRCs, it is often possible to use a glob of
"garbage bytes" to trick the checksum algorithm into giving the
desired value).
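(a minimal sketch of the fingerprint/whitelist idea in Python,
assuming SHA-256; purely illustrative, not the VM's actual
mechanism:)

import hashlib

# hashes of known-good code (the VM's own code, host-app code,
# plus anything the user has explicitly confirmed as trusted)
trusted_hashes = set()

def fingerprint(code_bytes):
    return hashlib.sha256(code_bytes).hexdigest()

def is_trusted(code_bytes):
    return fingerprint(code_bytes) in trusted_hashes

def confirm_trust(code_bytes):
    # called only after the user explicitly approves the code
    trusted_hashes.add(fingerprint(code_bytes))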
yes, there is still always the risk of a naive user confirming a
piece of malware, but this is their own problem at this point.
Access to FFI and such would be regulated through objects. This
leaves the issue of deciding: how do we decide which objects
untrusted code should get access to? Disabling all of FFI is
often too extreme.
potentially.
my current thinking is, granted, that it will disable access to
the "FFI access object" (internally called "ctop" in my VM), which
would disable the ability to fetch new functions/... from the FFI
(or perform "native import" operations with the current
implementation).
however, if already-retrieved functions are still accessible, it
might be possible to obtain them indirectly and then expose them
this way.
as noted in another message:
native import C.math;
var mathobj={sin: sin, cos: cos, tan: tan, ...};
giving access to "mathobj" will still allow access to these
functions, without necessarily giving access to "the entire C
toplevel", which poses a much bigger security risk.
sadly, there is no real good way to safely "streamline" this in
the current implementation.
My current design: FFI is a network of registries. Plugins and
services publish FFI objects (modules) to these registries.
Different registries are associated with different security
levels, and there might be connections between them based on
relative trust and security. A single FFI plugin might provide
similar objects at multiple security levels - e.g. access to HTTP
service might be provided at a low security level for remote
addresses, but at a high security level that allows for local
(127, 192.168, 10.0.0, etc.) addresses. One reason to favor
plugin-based FFI is that it is easy to develop security policy
for high-level features compared to low-level capabilities. (E.g.
access to generic 'local storage' is lower security level than
access to 'filesystem'.)
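(a rough Python sketch of the registries-at-different-security-levels
idea; the level names, stub classes, and API are all invented for
illustration:)

class Registry:
    def __init__(self, level):
        self.level = level        # e.g. "untrusted", "user", "system"
        self.modules = {}
    def publish(self, name, module):
        self.modules[name] = module
    def lookup(self, name):
        return self.modules.get(name)

class RemoteOnlyHttp: pass        # stub: would refuse 127.*, 192.168.*, 10.*
class FullHttp: pass              # stub: would also allow local addresses

registries = {lvl: Registry(lvl) for lvl in ("untrusted", "user", "system")}

# the same plugin publishes a restricted object at the low level
# and a full-featured one at the high level:
registries["untrusted"].publish("http", RemoteOnlyHttp())
registries["system"].publish("http", FullHttp())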
my FFI is based on bulk importing the contents of C headers.
although fairly powerful and convenient, "securing" such a beast
is likely to be a bit of a problem.
easier just to be like "code which isn't trusted can't directly
use the FFI...".
Other than security, my design is to solve other difficult
problems involving code migration [2], multi-process and
distributed extensibility (easy to publish modules to registries
even from other processes or servers; similar to web-server CGI),
smooth transitions from legacy, extreme resilience and
self-healing (multiple fallbacks per FFI dependency), and
policy&configuration management [3].
[1] http://wiki.erights.org/wiki/Walnut/Secure_Distributed_Computing
[2] http://wiki.erights.org/wiki/Unum
[3] http://c2.com/cgi/wiki?PolicyInjection
I had done code migration in the past, but sadly my VMs haven't
had this feature in a fairly long time (many years).
even then, it had a few ugly problems:
the migration essentially involved transparently sending the AST
and recompiling it on the other end. a result of this was that
closures would tend to lose the "identity" of their lexical scope.
...
over a socket, it had used a model where many data types
(lists/...) were essentially passed as copies;
things like builtin and native functions simply bound against
their analogues on the other end (code in C land was unique to
each node);
objects were "mirrored" with an asynchronous consistency model
(altering an object would send slot-change messages to the other
nodes which held copies);
other object types were passed-by-handle (basically, it identifies
the NodeID and ObjectID for a remote object);
...
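(a rough Python sketch of the pass-by-copy vs. pass-by-handle
split; the NodeID/ObjectID fields are as described above,
everything else is invented:)

from dataclasses import dataclass

@dataclass(frozen=True)
class RemoteHandle:
    node_id: int      # which node owns the object
    object_id: int    # object identity within that node

LOCAL_NODE = 1
local_exports = {}    # object_id -> object, for objects handed out by handle

def encode_for_wire(value):
    # lists, strings, numbers, etc. are simply passed as copies;
    # everything else is passed by handle and accessed remotely
    if isinstance(value, (int, float, str, list, tuple, dict)):
        return ("copy", value)
    object_id = id(value)
    local_exports[object_id] = value
    return ("handle", RemoteHandle(LOCAL_NODE, object_id))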
some later ideas (for reviving the above) involved essentially
mirroring a virtual heap over the network (using a system similar
to "far pointers" and "segmented addressing"), but this would have
introduced many nasty problems, and the idea didn't go anywhere.
if I ever do get around to re-implementing something like this, I
will probably use a variation of my original strategy, except that
I would probably leave objects as being remotely accessed via
handles, rather than trying to mirror them and keep them in sync
(or, if mirroring is used, effectively using a "synchronized
write" strategy of some sort...).
the second thing seems to be the option of moving the code to
a local toplevel where its ability to see certain things is
severely limited.
Yes, this is equivalent to controlling which 'capabilities' are
available in a given context. Unfortunately, developers lack
'awareness' - i.e. it is not explicit in code that certain
capabilities are needed by a given library, so failures occur
much later when the library is actually loaded. This is part of
why I eventually abandoned dynamic scopes (where 'dynamic scope'
would include the toplevel [4]).
"dynamic scope" in my case refers to something very different.
I generally call the objects+delegation model "object scope",
which is the main model used by the toplevel.
it differs some for import:
by default, "import" actually exists in terms of the lexical scope
(it is internally a delegate lexical variable);
potentially confusingly, for "delegate import" the import is
actually placed into the object scope (directly into the
containing package or toplevel object), which is part of the
reason for its unique semantics.
say (at the toplevel):
extern delegate import foo.bar;
actually does something roughly similar to:
load("foo/bar.bs <http://bar.bs>"); //not exactly, but it is a
similar idea...
delegate var #'foo/bar'=#:"foo/bar"; //sort of...
in turn invoking more funky semantics in the VM.
note: #'...' and #:"..." is basically syntax for allowing
identifiers and keywords containing otherwise invalid characters
(characters invalid for identifiers).
[4] http://c2.com/cgi/wiki?ExplicitManagementOfImplicitContext
ok.
simply disabling compiler features may not be sufficient
It is also a bad idea. You end up with 2^N languages for N
switches. That's hell to test and verify. Libraries developed for
different sets of switches will consequently prove buggy when
people try to compose them. This is even more documentation to
manage.
it depends on the nature of the features and their impact on the
language.
if trying to use a feature simply makes code using it invalid
("sorry, I can't let you do that"), this works.
if it leaves the code still valid but with different semantics, or
enabling a feature changes the semantics of code written with it
disabled, well, this is a bit more ugly...
but, yes, sadly, I am already having enough issues with seemingly
endless undocumented/forgotten features, and features which were
mostly implemented but are subtly broken (for example, I recently
fixed a feature which existed in the parser/compiler but depended
on an opcode which, for whatever reason, was absent from the
bytecode interpreter, ...).
but, with a language/VM existing for approx 8 years and with ~ 540
opcodes, ... I guess things like this are inevitable.
anything still visible may be tampered with, for example,
suppose a global package is made visible in the new toplevel,
and the untrusted code decides to define functions in a
system package, essentially overwriting the existing functions
Indeed. Almost every language built for security makes heavy use
of immutable objects. They're easier to reason about. For
example, rather than replacing the function in the package, you
would be forced to create a new record that is the same as the
old one but replaces one of the functions.
Access to mutable state is more tightly controlled - i.e. an
explicit capability to inject a new stage in a pipeline, rather
than implicit access to a variable. We don't lose any
flexibility, but the 'path of least resistance' is much more secure.
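(a tiny Python illustration of the "new record instead of
in-place mutation" idea; traced_sin is just a hypothetical
stand-in:)

import math
from types import MappingProxyType

# a read-only 'package' record; assignment into it raises TypeError
math_pkg = MappingProxyType({"sin": math.sin, "cos": math.cos})

def traced_sin(x):
    print("sin called")
    return math.sin(x)

# math_pkg["sin"] = traced_sin   # rejected: the shared record is immutable
patched = dict(math_pkg)
patched["sin"] = traced_sin
my_math = MappingProxyType(patched)   # a new record; the original is untouched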
yes, but this isn't as ideal in a pre-existing language where
nearly everything is highly mutable.
in this case, adding security may involve... "write protecting"
things...
a basic security mechanism then is that, by default, most
non-owned objects will be marked read-only.
an exposed API function may indirectly give untrusted code
"unexpected levels of power" if it, by default, has
unhindered access to the system, placing additional burden on
library code not to perform operations which may be exploitable
This is why whitelisting, rather than blacklisting, should be the
rule for security.
but whitelisting is potentially much more effort than
blacklisting, even if somewhat better from a security
perspective.
assigning through a delegated object may in turn move up and
assign the variable in a delegated-to object (at the VM level
there are multiple assignment operators to address these
different cases, namely which object the variable will be set
in...).
The security problem isn't delegation, but rather the fact that
this chaining is 'implicit' so developers easily forget about it
and thus leave security holes.
A library of security patterns could help out. E.g. you could
ensure your revocable forwarders and facet-pattern constructors
also provide barriers against propagation of assignment.
potentially, or use cloning rather than delegation chaining
(however, in my VM, it is only possible to clone from a single
object, whereas one may do N-way delegation, making delegation
generally more convenient for building the toplevel).
my current thinking is that assignment delegation will basically
stop once a read-only object is hit, forcing the assignment into
a "nearer" object. trying to force-assign into a read-only object
will result in an exception or similar.
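(a rough Python sketch of that assignment rule, simplified to a
single delegate per object rather than N-way delegation; all
names invented:)

class Obj:
    def __init__(self, delegate=None, read_only=False):
        self.slots = {}
        self.delegate = delegate
        self.read_only = read_only

def assign(obj, name, value):
    # walk the delegation chain looking for a writable owner of the slot
    cur = obj
    while cur is not None and not cur.read_only:
        if name in cur.slots:
            cur.slots[name] = value     # assign where the slot lives
            return
        cur = cur.delegate
    # hit a read-only object (or ran out of chain): assign into the
    # nearest object instead, unless it is itself read-only
    if obj.read_only:
        raise PermissionError("assignment into read-only object")
    obj.slots[name] = value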
in general though, re-assigning top-level bindings (which are
usually things like API functions) is probably a bad practice
anyway.
could a variation of, say, the Unix security model, be
applied at the VM level?
Within the VM, this has been done before, e.g. Java introduced
thread capabilities. But the Unix security model is neither
simple nor flexible nor efficient, especially for fine-grained
delegation. I cannot recommend it. But if you do pursue this
route: it has been done before, and there's a lot of material you
can learn from. Look up LambdaMoo, for example.
looking up LambdaMoo turned up a MUD, if this is what was in question...
I partly patched it in last night, and the performance overhead
should be "modest" in the common case.
as for "simple" or "efficient", a Unix-style security model
doesn't look all that bad. at least I am not looking at
implementing ACLs or a Windows-style security model, which would
be a fair amount more complex and slower (absent static checking
and optimization).
luckily, there are only a relatively small number of places I
really need to put in security checks (mostly in the object system
and similar). most of the rest of the typesystem or VM doesn't
really need them.
or such...
_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc